In the next post of our Knoyd series, we show one way of measuring the impact of NBA players on the game. Everyone knows that all-stars like LeBron James or Stephen Curry have a tremendous impact and show it on the floor in every game. However, role players very often need to step up and save the game. This analysis is dedicated especially to these players and to how their performance can change the game. The cornerstone of the analysis is the basic correlation between a player's impact and the result of the game.
We have data from the 2015/2016 season up to the 7th of March. The datasets we worked with consist of players' per-game statistics and players' overall season averages. We first take a look at the distribution of the main statistics: points, assists and rebounds.
As we can see, there are no outliers at the bottom end. This is expected, as the smallest possible number of points, assists or rebounds (and a very common one at that) is zero. Each outstanding performance shows up as an outlier, for example Anthony Davis's 59 points or Andre Drummond's rebounding performances from the beginning of the season.
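The absence of low outliers can be illustrated with the common 1.5 × IQR boxplot rule. This is only a sketch: it assumes the distributions were inspected with standard boxplots, the helper name `boxplot_fences` is hypothetical, and the point totals are made up for illustration rather than taken from the scraped data.

```python
import statistics


def boxplot_fences(values):
    """Return the (lower, upper) fences of the common 1.5 * IQR boxplot rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Made-up per-game point totals, for illustration only (not the scraped data).
points = [0, 0, 2, 4, 6, 8, 10, 12, 14, 16, 59]

lower, upper = boxplot_fences(points)
outliers = [p for p in points if p < lower or p > upper]
print(lower, upper, outliers)
```

Because the statistics cannot go below zero and zero is so common, the lower fence ends up below zero, so only exceptional games on the high end get flagged.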
In the next part, we compute the impact of each player in every game and its correlation with the performance of the team. The impact is computed from the following metrics: points, assists, rebounds, offensive rebounds, steals and turnovers. We used values relative to the player's season average. We can therefore measure the impact of a player on the final result of the game when he plays above or below his average in that game.
where the impact from points is equal to:
The impact of every metric is calculated similarly. This way we can measure the impact of all players and all metrics, even those with naturally smaller numeric values (e.g. steals or turnovers). The downside of this approach, as already mentioned above, is that if a player delivers at an extraordinary level in every game, he might not be flagged as significant by our analysis. However, the impact of these players is already well appreciated, so we decided to focus on the remaining ones.
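One possible way to turn a relative-to-average impact into code is sketched below. The exact formula used in the analysis is not reproduced here, so several things are assumptions: the relative deviation `(game - average) / average`, the flipped sign for turnovers, the plain sum across metrics, and all names (`metric_impact`, `game_impact`, the stat abbreviations). The numbers are made up.

```python
METRICS = ("PTS", "AST", "REB", "OREB", "STL", "TOV")

# Assumption: turnovers hurt the team, so their sign is flipped.
SIGN = {"TOV": -1.0}


def metric_impact(game_value, season_avg):
    """Relative deviation of one game from the player's season average."""
    if season_avg == 0:
        return 0.0  # avoid dividing by zero for players who average nothing
    return (game_value - season_avg) / season_avg


def game_impact(game_stats, season_avgs):
    """Per-metric impacts plus their (assumed) unweighted sum for one game."""
    per_metric = {
        m: SIGN.get(m, 1.0) * metric_impact(game_stats[m], season_avgs[m])
        for m in METRICS
    }
    return per_metric, sum(per_metric.values())


# Made-up numbers for a hypothetical role player.
season_avgs = {"PTS": 10, "AST": 2, "REB": 4, "OREB": 1, "STL": 1, "TOV": 2}
game_stats = {"PTS": 15, "AST": 2, "REB": 6, "OREB": 1, "STL": 2, "TOV": 1}

per_metric, total = game_impact(game_stats, season_avgs)
print(per_metric, total)
```

Dividing by the season average puts all metrics on the same scale, which is why one extra steal for a player averaging one steal counts as much as five extra points for a player averaging ten.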
The correlations between a player's impact and winning the game can be seen in the table below, in descending order. The highest value represents the most significant impact of a particular player on the final result of the game. When the players in the top rows play above their average, the team has a higher chance of winning.
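A ranking like this could be produced roughly as follows. The sketch assumes a plain Pearson correlation between the per-game impact and a 0/1 win indicator, which is not confirmed by the post. The player names and numbers are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


# Invented data: per-game impact and win flag (1 = win, 0 = loss).
games = {
    "Player A": ([0.5, -0.3, 0.8, -0.6], [1, 0, 1, 0]),
    "Player B": ([0.4, -0.2, 0.1, -0.5], [0, 1, 0, 1]),
}

# Sort players by correlation, highest first.
ranking = sorted(
    ((pearson(impacts, wins), name) for name, (impacts, wins) in games.items()),
    reverse=True,
)
for corr, name in ranking:
    print(f"{name}: {corr:.2f}")
```

A player at the bottom of such a list with a negative value is one whose above-average games coincide with losses more often than with wins.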
We can now further explore the impact of the players with the highest correlation. We visually compared the impact of the top players in both won and lost games.
The first 3 pictures show the players with the biggest impact. On the other hand, Jeremy Lamb came out on top in negative correlation. That means that if he plays a good game, the team is more likely to lose.
In the last part, we used a logistic regression model to identify which metrics the final results of the games depend on the most. The results can be seen in the following table:
Judging by the size of the regression coefficients, points (PTS) and rebounds (REB) have the biggest influence. If the impact from points or rebounds increases by 1, the percentage of won games goes up by 20 percent. Interestingly, the coefficient of offensive rebounds (OREB) is less than 0, meaning that a player grabbing more offensive rebounds actually increases the probability of the team losing the game. This effect was also confirmed by testing the hypothesis on the complete data from last season. We can only speculate about the cause. One reason might be that a team has to miss more shots for its players to collect more offensive rebounds.
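To make the role of the coefficients more concrete, here is a minimal sketch of a logistic regression fitted with plain gradient descent and read through odds ratios. It is not the model from the post: the data is invented, only two of the metrics are used, and the function names are hypothetical. In practice a library such as scikit-learn would normally do the fitting.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, target in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - target  # predicted minus actual
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Invented per-game impacts: columns are [PTS impact, OREB impact].
X = [[0.5, -0.2], [0.3, 0.4], [-0.4, 0.5], [-0.6, -0.1],
     [0.2, -0.5], [-0.1, 0.3], [0.7, 0.1], [-0.3, -0.4]]
y = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = win, 0 = loss

weights, bias = fit_logistic(X, y)

# exp(coefficient) is the factor by which the odds of winning change
# when that impact goes up by 1 and the other one stays fixed.
odds_ratios = [math.exp(w) for w in weights]
for name, w, ratio in zip(("PTS", "OREB"), weights, odds_ratios):
    print(f"{name}: coefficient={w:.2f}, odds ratio={ratio:.2f}")
```

A positive coefficient gives an odds ratio above 1 (winning becomes more likely), while a negative one, like the OREB coefficient discussed above, gives an odds ratio below 1.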
For all steps in this post, including pulling the data from the NBA website, we used Python. Thanks go to Greg Reda for his blog post "Web Scraping 201: finding the API", which nicely describes how to get the data from the NBA website.