Let’s Predict NFL Wins: Discrimination - It can be a good thing

The first model to predict wins for an NFL season is finished. First, I’m going to give a bit of background on the process used to arrive at the model. If you don’t care about the statistics that went into these predictions, please jump ahead to the predicted wins compared with the betting line. I chose the model below using backward elimination (backwards stepwise regression). The goal of backward elimination is usually to end with a model containing only significant predictors, but I instead selected the model in the sequence that had the highest adjusted R². As I said yesterday, I included all variables that were not linearly dependent on one another. However, due to an oversight I did include pass attempts, rush attempts, and total plays, even though total plays is largely made up of pass and rush attempts. In future models, this will need to be corrected.
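Backward elimination with an adjusted R² stopping rule can be sketched in plain Python. This is a minimal, dependency-free sketch under stated assumptions: the fit is ordinary least squares with an intercept, the elimination path is walked all the way down to one predictor, and the names `backward_elimination`, `data`, and `y` are hypothetical rather than taken from the original analysis. At each step it drops the predictor whose removal raises the residual sum of squares the least (the same predictor that has the smallest partial F, i.e. the largest p-value, in the current model), and it keeps whichever step has the highest adjusted R².

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares from an OLS fit of y on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(bj * v for bj, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def adjusted_r2(rss_value, tss, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (rss_value / tss) * (n - 1) / (n - p - 1)


def backward_elimination(data, y):
    """Hypothetical helper: data maps predictor name -> list of values.

    Returns (names in the model with the highest adjusted R^2, that adjusted R^2).
    """
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)

    def score(names):
        return adjusted_r2(rss([data[c] for c in names], y), tss, n, len(names))

    current = list(data)
    best_names, best_score = list(current), score(current)
    while len(current) > 1:
        # Drop the predictor whose removal increases RSS the least
        # (smallest partial F / largest p-value in the current model).
        drop = min(current, key=lambda name: rss(
            [data[c] for c in current if c != name], y))
        current.remove(drop)
        s = score(current)
        if s > best_score:
            best_names, best_score = list(current), s
    return best_names, best_score
```

Solving the normal equations directly is fine for a dozen predictors, but it becomes numerically fragile when predictors are nearly collinear (pass attempts, rush attempts, and total plays are a good example), so a QR-based least-squares solver from a numerical library would be the safer choice for real use.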

The variables included in the final model were:

· Whether a team won the Super Bowl the previous year (SB_W_Prev)

· Whether a team played in a Super Bowl the previous year (Sb_App_Prev)

· Points scored last season (PPoints)

· Pass attempts (POPAtt)

· Rush attempts (PORAtt)

· Offensive plays (POPlays)

· Rush yards (PORYards)

· Passing yards (POPYards)

· Rushing TDs against (PDRTD)

· Passing yards against (PDPYards)

· Takeaways (PDTO)

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate	R Square Change	F Change	df1	df2	Sig. F Change
1	.385	.148	.128	2.854	.148	7.223	11	457	.000

In the above table, the only statistics of interest are R Square and Adjusted R Square. R Square is the proportion of the variation in wins that is explained by the combination of the 11 variables above, so it ranges from 0% to 100%. However, R Square never goes down when more predictors are added, so it tends to overestimate how well a model predicts. Adjusted R Square penalizes the number of predictors and is thus the preferred measure. Currently this model explains 12.8% of the variation in wins, which sucks. A model with an Adjusted R Square of 30% is my overall goal for this project.
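To see how the two measures relate, here is a small Python sketch that recomputes both from the sums of squares in the ANOVA table below, using the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The number of observations, n = 469, is inferred from the table’s total df of 468 (total df is n − 1), and p = 11 is the number of predictors. The (n − 1)/(n − p − 1) factor grows with p, which is where the penalty for extra predictors comes from.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Sums of squares from the ANOVA table; n = 468 + 1 because total df = n - 1.
ss_regression = 647.227
ss_total = 4369.923
r2 = ss_regression / ss_total
adj_r2 = adjusted_r_squared(r2, n=469, p=11)
print(f"R Square = {r2:.3f}, Adjusted R Square = {adj_r2:.3f}")
```

Both values round to the .148 and .128 reported in the Model Summary.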

ANOVA
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	647.227	11	58.839	7.223	.000
	Residual	3722.697	457	8.146
	Total	4369.923	468

The above table is for those that are interested in overall model fit. I won’t provide an explanation here but will gladly explain it for those interested.
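For anyone who wants to check the arithmetic, the derived columns of the ANOVA table follow directly from the sums of squares and degrees of freedom. In this short Python sketch, each mean square is its sum of squares divided by its df, F is the ratio of the regression and residual mean squares, and the square root of the residual mean square is the Std. Error of the Estimate shown in the Model Summary.

```python
import math

ss_reg, df_reg = 647.227, 11    # Regression row
ss_res, df_res = 3722.697, 457  # Residual row

ms_reg = ss_reg / df_reg        # ≈ 58.839
ms_res = ss_res / df_res        # ≈ 8.146
f_stat = ms_reg / ms_res        # ≈ 7.223
std_error = math.sqrt(ms_res)   # ≈ 2.854, the Std. Error of the Estimate
```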

Coefficients
Model		B (Unstandardized)	Std. Error	Beta (Standardized)	t	Sig.
1	(Constant)	9.895	3.597		2.751	.006
	Sb_App_Prev	-.951	.781	-.076	-1.218	.224
	SB_W_Prev	1.741	1.051	.100	1.657	.098
	PDTO	-.047	.024	-.102	-2.009	.045
	POPYards	-.001	.001	-.188	-1.702	.089
	PDPYards	-.001	.000	-.108	-2.283	.023
	PDRTD	-.063	.025	-.116	-2.497	.013
	PORYards	-.001	.001	-.088	-1.011	.312
	POPlays	-.023	.013	-.347	-1.786	.075
	PORAtt	.025	.013	.427	1.954	.051
	POPAtt	.026	.013	.460	1.935	.054
	PPoints	.020	.004	.437	4.412	.000

The coefficients table is the most interesting theoretically. The model formula comes from the B column under Unstandardized Coefficients; each value there is a regression coefficient. A regression coefficient tells us how much predicted wins change when that predictor increases by 1 while all the other predictors are held constant. For instance, take the rushing attempts variable (PORAtt), which has a regression coefficient of .025. This means that for every extra rush attempt, predicted wins increase by .025, which suggests that teams with more rushing attempts win more. Notice that some of the coefficients defy what we would expect. For instance, for every extra takeaway (PDTO), predicted wins drop by .047. This is most likely the result of some of the predictors being correlated with one another (multicollinearity), which can shift the size and even the sign of the regression coefficients.
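The prediction formula is the constant plus each B value times the team’s statistic for that predictor. Below is a minimal Python sketch using the coefficients from the table; they are rounded to three decimals there (several yardage coefficients show only −.001), so this sketch will not exactly reproduce the model’s unrounded estimates. The names `COEFFICIENTS` and `predicted_wins` are hypothetical.

```python
# B column of the Coefficients table (rounded to three decimals).
INTERCEPT = 9.895
COEFFICIENTS = {
    "Sb_App_Prev": -0.951,
    "SB_W_Prev": 1.741,
    "PDTO": -0.047,
    "POPYards": -0.001,
    "PDPYards": -0.001,
    "PDRTD": -0.063,
    "PORYards": -0.001,
    "POPlays": -0.023,
    "PORAtt": 0.025,
    "POPAtt": 0.026,
    "PPoints": 0.020,
}


def predicted_wins(stats):
    """Constant plus each coefficient times the team's value for that predictor."""
    return INTERCEPT + sum(b * stats[name] for name, b in COEFFICIENTS.items())
```

Raising PORAtt by one while leaving every other input unchanged moves the prediction by exactly 0.025, which is what “holding the other predictors constant” means in practice.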

Predictions

So, using the above regression coefficients, I plugged last season’s stats into the model and came up with predicted wins. However, the initial estimates provided very little discrimination (hence the joke in the title). By discrimination, I mean that most of the estimates fell between 6 and 9 wins, which made it hard to tell good teams from bad ones. To fix this, I ordered the teams within each division by their original predicted wins. I rounded the original estimates and then awarded bonuses and penalties based on that order: 1st-place teams had 2 wins added, 2nd-place teams had 1 win added, 3rd-place teams had 1 win subtracted, and 4th-place teams had 2 wins subtracted. The following table includes the original prediction, the revised prediction, and the current line. Note that the site I used is currently not taking bets on the Colts because of the Peyton Manning issue.
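The revision step can be written as a small function. This Python sketch assumes teams are ranked within a four-team division by their original (unrounded) predicted wins and that rounding is half-up; both assumptions reproduce the Revised Wins column in the table below. The function name `revise_division` is hypothetical, and ties are not handled specially.

```python
import math

# Adjustment by finish within the division: 1st +2, 2nd +1, 3rd -1, 4th -2.
ADJUSTMENTS = (2, 1, -1, -2)


def revise_division(predicted):
    """predicted maps team -> original predicted wins for one four-team division."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return {
        # Round half up (Python's round() would use banker's rounding), then adjust.
        team: math.floor(predicted[team] + 0.5) + bonus
        for team, bonus in zip(ranked, ADJUSTMENTS)
    }


print(revise_division({"Dolphins": 7.13, "Bills": 7.01, "Patriots": 9.74, "Jets": 8.3}))
```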

A couple of quick notes. For the most part, I was surprised at how reasonable the predictions look. While some predictions seem really high (hello Raiders) or low (the Cardinals got hosed), overall the model gave a sound order of finish within each division. Also, should anyone decide to use this to actually place bets, please use it only as one extra resource in making a good decision. Common sense should be used along with these predictions. For example, the Colts are predicted to win 12 games; if Manning misses time, that prediction becomes more and more optimistic. I hope you enjoyed this, and I will post possible updates tomorrow.

Division / Team	Predicted Wins	Revised Wins	Line (8/29)*
AFC East
Dolphins	7.13	6	7.5
Bills	7.01	5	5.5
Patriots	9.74	12	11.5
Jets	8.3	9	10
AFC North
Ravens	8.21	9	10
Bengals	7.39	6	5.5
Browns	6.82	5	7
Steelers	9.62	12	10.5
AFC South
Texans	7.21	6	9
Colts	9.66	12	NA
Jags	7	5	6.5
Titans	8.15	9	6.5
AFC West
Broncos	6.55	5	6
Chiefs	8.13	7	7.5
Raiders	8.85	10	6.5
Chargers	9.25	11	10
NFC South
Falcons	9.26	11	10
Panthers	4.82	3	4.5
Saints	8.66	10	10
Buccaneers	7.72	7	8
NFC West
Cardinals	6.28	4	7
49ers	6.97	6	7.5
Seahawks	7.08	8	6
Rams	7.43	9	7.5
NFC East
Giants	8	9	9
Eagles	8.03	10	10.5
Cowboys	7.82	7	9
Redskins	5.6	4	6
NFC North
Lions	7.8	9	8
Packers	9.32	11	11.5
Vikings	6.83	6	7
Bears	6.77	5	8

*I used the following website for the lines: http://www.sportsbook.ag/
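Because the table pairs each revised prediction with the line, the size of each disagreement is a simple subtraction. This minimal Python sketch uses a handful of rows copied from the table (the Colts are left out since no line was posted); the `compare_to_line` helper is hypothetical, and “over”/“under” only indicates which side of the line the revised prediction falls on.

```python
# (revised wins, line) pairs copied from the table above.
ROWS = {
    "Raiders": (10, 6.5),
    "Cardinals": (4, 7),
    "Patriots": (12, 11.5),
    "Texans": (6, 9),
}


def compare_to_line(rows):
    """Return (team, gap, side) tuples sorted by the size of the disagreement.

    gap = revised wins - line; a positive gap means the model leans over.
    """
    result = []
    for team, (revised, line) in rows.items():
        gap = revised - line
        side = "over" if gap > 0 else "under" if gap < 0 else "no lean"
        result.append((team, gap, side))
    return sorted(result, key=lambda row: abs(row[1]), reverse=True)


for team, gap, side in compare_to_line(ROWS):
    print(f"{team}: {side} by {abs(gap)}")
```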

1 comment:

K K, August 29, 2011 at 8:27 PM
Mike and I were talking and I came up with a few ideas... just remember the main thing is this is an OPEN FORUM to get ideas out there so that Mike can get better results.

First thing I thought of was, instead of using team statistics, use the individual player statistics that make up the aggregate team statistics. This is obviously a really difficult dataset to compile, but it’s a suggestion.

Another idea I had was somehow incorporating an "offseason" variable: the Eagles adding Nnamdi, Kolb going to Arizona, Cam Newton being drafted by the Panthers. I have no idea how we could make this work, though, so suggestions are welcome.

Monday, August 29, 2011
