Most players in the NBA come from college basketball, so NBA franchises have a strong interest in modeling how college statistics translate to NBA success when deciding which college players to draft. As a moderately enthusiastic basketball fan, I am interested in this relationship myself and have kept track of historical trends in draft pick selection and how they change over time. From the mainstream media, I had always heard that wins in college were a surefire indicator of a player's draft position, and therefore of success in the NBA: if a player couldn't make his team win in college, against much weaker opposition, how was he going to lead his team against stronger, smarter, better athletes? However, Ben Simmons went number 1 overall in the 2016 draft and Markelle Fultz went number 1 overall in the 2017 draft, even though neither player's college team performed well (Fultz's University of Washington team even had a winrate below 50%). I was surprised at this seeming departure from the conventional wisdom in recent years.
Based on this admittedly small sample size, I hypothesized that there might be some relationship between the current "era" of the NBA and the influence of a player's college winrate on his draft position. In the modern era, with its focus on analytics, three-point shooting, and efficiency, perhaps wins in college are not as important a metric as various advanced stats, whereas in previous eras, with their emphasis on hand-check defense and post-up play, winrate could have been a more important metric. I decided to test this hypothesis by building my own model.
All of my data was scraped from basketball-reference.com and sports-reference.com/cbb using the BeautifulSoup package in Python and rvest in R. To classify NBA eras, I first compiled a dataframe of NBA season averages of various statistical measures from the 1973-1974 season (the first season in which the NBA tracked steals and blocks; the 3-point shot followed in 1979-1980) up to the present day:
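As a rough illustration of the scraping step, the sketch below parses a season-averages table with BeautifulSoup. The HTML snippet, the table id `league_averages`, the column names, and the cell values are all placeholders invented for this example; they are not the actual basketball-reference.com markup or real statistics, and a real scraper would first download the page and match the site's actual structure.

```python
from bs4 import BeautifulSoup

# Placeholder HTML standing in for a downloaded page. The table id, the
# column names, and the numbers are made up for illustration only.
SAMPLE_HTML = """
<table id="league_averages">
  <thead><tr><th>Season</th><th>Pace</th><th>FGA</th></tr></thead>
  <tbody>
    <tr><th>season_a</th><td>100.0</td><td>85.0</td></tr>
    <tr><th>season_b</th><td>95.5</td><td>80.5</td></tr>
  </tbody>
</table>
"""


def parse_season_table(html, table_id):
    """Return one dict per table row, keyed by the header names."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no table with id {table_id!r}")
    headers = [th.get_text(strip=True) for th in table.thead.find_all("th")]
    rows = []
    for tr in table.tbody.find_all("tr"):
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        row = dict(zip(headers, cells))
        # Convert every column except the season label to a float.
        for key in headers[1:]:
            row[key] = float(row[key])
        rows.append(row)
    return rows


season_rows = parse_season_table(SAMPLE_HTML, "league_averages")
for row in season_rows:
    print(row)
```

A list of dicts like `season_rows` can be passed directly to a dataframe constructor, which is one way to arrive at the kind of season-by-statistic table described above.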
I then used the apcluster package in R, which performs exemplar-based affinity propagation, an algorithm that clusters data without being told the number of clusters in advance, unlike K-means clustering. Based on the data provided, the algorithm sorted NBA seasons into the following four clusters, to which I've added my own descriptions:
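To make the clustering step more concrete, here is a minimal NumPy sketch of the message-passing updates behind affinity propagation (responsibilities and availabilities). It is not the apcluster implementation used for this analysis. The damping factor, the iteration count, the toy feature matrix, and the choice of negative squared Euclidean distance with a median preference are all assumptions made for illustration.

```python
import numpy as np


def affinity_propagation(S, damping=0.8, n_iter=300):
    """Cluster from a similarity matrix S whose diagonal holds the preferences.

    Returns (exemplars, labels), where labels[i] is the index of the
    exemplar that point i is assigned to.
    """
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        best_val = AS[rows, best]
        AS[rows, best] = -np.inf
        second_val = AS.max(axis=1)
        R_new = S - best_val[:, None]
        R_new[rows, best] = S[rows, best] - second_val
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new

    # A point is an exemplar when its self-availability plus
    # self-responsibility is positive.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        raise RuntimeError("no exemplars found; try other damping/preference")
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels


# Toy data: six hypothetical "seasons" described by two made-up features.
X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
              [5.0, 5.0], [5.2, 5.5], [4.6, 5.1]])
# Similarity = negative squared Euclidean distance (an assumed choice).
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
# Shared preference = median off-diagonal similarity (an assumed choice).
off_diag = S[~np.eye(len(X), dtype=bool)]
np.fill_diagonal(S, np.median(off_diag))

exemplars, labels = affinity_propagation(S)
print("exemplars:", exemplars)
print("labels:", labels)
```

The number of clusters is never passed in. It emerges from the preference values on the diagonal of `S`, which is the property that distinguishes this approach from K-means.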
The clustering also produced the following heatmap:
It's interesting to note that the seasons within each era are almost all consecutive, with the exception of the 1993, 1994, and 2011 seasons. This makes contextual sense: the 1993 and 1994 seasons had a shorter 3-point line and thus were quite similar to the modern run-and-gun offense, while the 2011-2012 season was lockout-shortened and injury-ravaged, with a very slow, defensive style of play that was atypical for its era.
Once I obtained rough categorizations for NBA eras, I scraped player draft data for each season in each era, looking only at first-round draft picks (the top 30 picks). For each drafted player who went to college, I scraped their college statistics page on sports-reference.com/cbb to obtain counting stats for their final season in college, and also introduced two additional metrics of my own: their winrate in their final season and the number of seasons they attended college. I then scraped basketball-reference.com to obtain each player's average winshares per 48 minutes over their NBA career, as an estimate of NBA productivity that is not biased by career length. After further cleaning the data, I ran a series of 8 regressions using statsmodels.api in Python: one set of four, one per era, predicting a player's draft position from a set of college parameters, and a second set of four, again one per era, predicting a player's winshares per 48 minutes in the NBA from college parameters. The goal is to observe the relationship between draft position and college statistics (specifically college winrate), see if it varies across eras, and see if there is a similar relationship between college statistics and wins contributed in the NBA.
For regression between draft position and college statistics, here are the four regressions:
General Observations:
For regression between winshares per 48 minutes in the NBA and college statistics, here are the four regressions:
These two graphs summarize the data for the two custom metrics (college winrate and years in college) and their correlation to draft position and wins in the NBA. One can see that, over time, the weight for college winrate seems to have increased in the regressions for both draft position and winshares; that is, a better winrate in college correlates with a relatively better draft position and more wins in the NBA than it did in the past. However, apart from the weight for draft position in Era 3, none of these weights has a statistically significant p-value, which suggests a high degree of variance in the relationship between college winrate and each of the dependent variables (draft position and winshares per 48 minutes in the NBA).
In comparison, the correlation values for years in college decrease in magnitude over the eras, showing that years in college does not correlate as strongly with draft position and wins in the NBA as it did in years past. However, the p-values in recent eras are statistically significant, indicating a more definite relationship to each of our dependent variables than college winrate has.
My analysis of era-based correlation between college statistics and draft position/NBA winshares proved quite fruitful. One of the most interesting results was that, contrary to my initial hypothesis, the strength of correlation between college winrate and draft position increased over time rather than decreased. In addition, even though years spent in college does not measure a player's performance or skills in the NBA in any way, it has a more statistically significant correlation to draft position and winshares in the NBA than college winrate does, across most eras. This sort of era-based approach to handling NBA data can be a very useful tool for NBA executives and general managers: it shows the general drafting patterns that teams have followed over time, how those patterns change with trends across eras, and how those trends can be used to better find players who will perform at a high level.
Spring 2018
Manan Khattar