Monday, August 29, 2011

Discrimination - It can be a good thing



The first model to predict wins for an NFL season is finished.  First, I’m going to give a bit of background on the process used to arrive at it.  If you don’t care about the statistics that went into these predictions, please jump ahead to the predicted wins compared with the betting line.  I arrived at this model using backward regression.  While the goal of backward regression is usually to end with a model containing only significant predictors, I instead selected the model with the highest adjusted R².  As I said yesterday, I meant to include only variables that were not linearly related to one another.  However, due to an oversight I did include pass attempts, rush attempts, and total plays, which are linearly related because total plays is largely the sum of the other two.  In future models, this will need to be corrected.

The variables included in the final model were:

· Whether a team won the Super Bowl the previous year (SB_W_Prev)
· Whether a team played in a Super Bowl the previous year (Sb_App_Prev)
· Points scored last season (PPoints)
· Pass attempts (POPAtt)
· Rush attempts (PORAtt)
· Offensive plays (POPlays)
· Rush yards (PORYards)
· Passing yards (POPYards)
· Rushing TDs against (PDRTD)
· Passing yards against (PDPYards)
· Takeaways (PDTO)

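Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .385 | .148 | .128 | 2.854 | .148 | 7.223 | 11 | 457 | .000 |

(The last five columns are the Change Statistics.)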



In the above table, the only statistics of interest are R Square and Adjusted R Square.  R Square can be read as the share of the variation in wins that the combination of the above 11 variables explains, anywhere from 0% to 100%.  However, R Square tends to overestimate how well the model will predict, so Adjusted R Square is the preferred measure.  Currently this model explains 12.8% of the variation in wins, which sucks.  An Adjusted R Square of 30% is my overall goal for this project.




ANOVA

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 647.227 | 11 | 58.839 | 7.223 | .000 |
| | Residual | 3722.697 | 457 | 8.146 | | |
| | Total | 4369.923 | 468 | | | |




The above table is for those who are interested in overall model fit.  I won’t provide an explanation here but will gladly explain it for those interested.
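For readers who do want to check the fit numbers, here is a minimal sketch of how the Model Summary and ANOVA figures relate to each other. It uses only the sums of squares and degrees of freedom printed above and the standard textbook formulas. The variable names are my own, and the sample size of 469 is inferred from the Total df of 468 rather than stated in the post.

```python
# Values copied from the ANOVA table above.
ss_regression = 647.227
ss_residual = 3722.697
ss_total = 4369.923
df_regression = 11      # number of predictors
df_residual = 457
n_obs = 468 + 1         # Total df + 1 (inferred, not stated in the post)

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F compares explained variance per predictor with unexplained variance.
f_statistic = ms_regression / ms_residual

# R Square is the explained share of the total variation in wins.
r_squared = ss_regression / ss_total

# Adjusted R Square penalizes R Square for the number of predictors used.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - df_regression - 1)

print(f"Mean squares: {ms_regression:.3f}, {ms_residual:.3f}")
print(f"F = {f_statistic:.3f}")
print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adj_r_squared:.3f}")
```

The adjusted value is computed from the unrounded R Square. Plugging in the rounded .148 instead can shift the third decimal.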

Coefficients

| Model | Predictor | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 9.895 | 3.597 | | 2.751 | .006 |
| | Sb_App_Prev | -.951 | .781 | -.076 | -1.218 | .224 |
| | SB_W_Prev | 1.741 | 1.051 | .100 | 1.657 | .098 |
| | PDTO | -.047 | .024 | -.102 | -2.009 | .045 |
| | POPYards | -.001 | .001 | -.188 | -1.702 | .089 |
| | PDPYards | -.001 | .000 | -.108 | -2.283 | .023 |
| | PDRTD | -.063 | .025 | -.116 | -2.497 | .013 |
| | PORYards | -.001 | .001 | -.088 | -1.011 | .312 |
| | POPlays | -.023 | .013 | -.347 | -1.786 | .075 |
| | PORAtt | .025 | .013 | .427 | 1.954 | .051 |
| | POPAtt | .026 | .013 | .460 | 1.935 | .054 |
| | PPoints | .020 | .004 | .437 | 4.412 | .000 |



The coefficients table is the most interesting theoretically.  The model formula is taken from the B column, whose values are known as the regression coefficients.  Each regression coefficient tells us how much predicted wins change when that predictor increases by 1 and the other predictors are held constant.  For instance, let’s look at the rushing attempts variable (PORAtt).  It has a regression coefficient of .025, so every extra rush attempt increases predicted wins by .025.  This indicates that teams with more rushing attempts win more.  Notice that some of the coefficients defy what we would expect to see.  For instance, for every extra takeaway we expect teams to lose .047 games.  This is the result of some of the variables being correlated with each other, which can distort the regression coefficients.
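To make the interpretation concrete, here is a rough sketch of how the B column turns a team's stats into predicted wins. The coefficients are the rounded values printed in the table, so the output is only approximate. The stat line in `example_team` is entirely hypothetical (not a real team), and the function name is my own.

```python
# Unstandardized coefficients (B column) as printed in the table.
# They are rounded to three decimals, so predictions are approximate.
INTERCEPT = 9.895
COEFFICIENTS = {
    "Sb_App_Prev": -0.951,
    "SB_W_Prev": 1.741,
    "PDTO": -0.047,
    "POPYards": -0.001,
    "PDPYards": -0.001,
    "PDRTD": -0.063,
    "PORYards": -0.001,
    "POPlays": -0.023,
    "PORAtt": 0.025,
    "POPAtt": 0.026,
    "PPoints": 0.020,
}


def predict_wins(stats):
    """Constant plus each coefficient times the matching stat."""
    return INTERCEPT + sum(COEFFICIENTS[name] * stats[name] for name in COEFFICIENTS)


# Hypothetical stat line, made up purely for illustration.
example_team = {
    "Sb_App_Prev": 0, "SB_W_Prev": 0, "PDTO": 28,
    "POPYards": 3800, "PDPYards": 3500, "PDRTD": 12,
    "PORYards": 1800, "POPlays": 1000, "PORAtt": 450,
    "POPAtt": 520, "PPoints": 400,
}

baseline = predict_wins(example_team)

# One extra rush attempt with every other stat held constant.
one_more_rush = predict_wins({**example_team, "PORAtt": example_team["PORAtt"] + 1})

print(f"Predicted wins: {baseline:.2f}")
print(f"Change from one extra rush attempt: {one_more_rush - baseline:.3f}")
```

The second number printed is the PORAtt coefficient itself, which is exactly what "wins increase by .025 per extra rush attempt" means.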

Predictions

So using the above regression coefficients, I plugged the stats for last season into the model and came up with predicted wins.  However, the initial estimates provided no discrimination (hence the joke in the title).  By discrimination, I mean that the majority of estimates were between 6 and 9 wins, so it was hard to differentiate good and bad teams.  To rectify this, I ordered the teams within their division by predicted wins.  I rounded the original estimates and then awarded bonuses and penalties based on that order.  1st place teams had 2 wins added, 2nd place teams had 1 win added, 3rd place teams had 1 win subtracted, and 4th place teams had 2 wins subtracted.  In the following table I included the original prediction, the revised prediction, and the current line.  Note, the site I used is currently not taking bets on the Colts because of the Peyton Manning issue.
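The adjustment described above can be sketched in a few lines. This is my reading of the rule, not the exact procedure used: it assumes ordinary round-half-up rounding and does not handle ties in predicted wins. The function and constant names are hypothetical, and the AFC East numbers come from the prediction table in this post.

```python
import math

# Bonus or penalty by finishing position within the division (1st to 4th).
BONUS_BY_PLACE = [2, 1, -1, -2]


def revise_division(predicted):
    """Rank teams by predicted wins, round each estimate, apply the bonus."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return {
        # floor(x + 0.5) rounds halves up; the rounding rule is an assumption.
        team: math.floor(predicted[team] + 0.5) + BONUS_BY_PLACE[place]
        for place, team in enumerate(ranked)
    }


# Original model predictions for the AFC East, from the table in this post.
afc_east = {"Dolphins": 7.13, "Bills": 7.01, "Patriots": 9.74, "Jets": 8.3}

revised = revise_division(afc_east)
print(revised)
# {'Patriots': 12, 'Jets': 9, 'Dolphins': 6, 'Bills': 5}
```

The output matches the Revised Wins column for the AFC East, which spreads four teams that were originally within about 2.7 wins of each other across a 7-win range.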

Couple of quick notes.  For the most part I was surprised at how accurate the predictions were.  While some predictions seem really high (hello Raiders) or low (Cardinals got hosed), overall the model gave a sound order of finish within each division.  Also, should anyone decide to use this to actually place bets, please treat it as just an extra resource for making a good decision.  Common sense should be used along with these predictions.  As an example, the Colts are predicted to win 12 games.  The more time Manning misses, the more extreme that prediction becomes.  Hope you enjoyed this and I will post possible updates tomorrow.

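| Division / Team | Predicted Wins | Revised Wins | Line 8/29* |
|---|---|---|---|
| AFC East | | | |
| Dolphins | 7.13 | 6 | 7.5 |
| Bills | 7.01 | 5 | 5.5 |
| Patriots | 9.74 | 12 | 11.5 |
| Jets | 8.3 | 9 | 10 |
| AFC North | | | |
| Ravens | 8.21 | 9 | 10 |
| Bengals | 7.39 | 6 | 5.5 |
| Browns | 6.82 | 5 | 7 |
| Steelers | 9.62 | 12 | 10.5 |
| AFC South | | | |
| Texans | 7.21 | 6 | 9 |
| Colts | 9.66 | 12 | NA |
| Jags | 7 | 5 | 6.5 |
| Titans | 8.15 | 9 | 6.5 |
| AFC West | | | |
| Broncos | 6.55 | 5 | 6 |
| Chiefs | 8.13 | 7 | 7.5 |
| Raiders | 8.85 | 10 | 6.5 |
| Chargers | 9.25 | 11 | 10 |
| NFC South | | | |
| Falcons | 9.26 | 11 | 10 |
| Panthers | 4.82 | 3 | 4.5 |
| Saints | 8.66 | 10 | 10 |
| Buccaneers | 7.72 | 7 | 8 |
| NFC West | | | |
| Cardinals | 6.28 | 4 | 7 |
| 49ers | 6.97 | 6 | 7.5 |
| Seahawks | 7.08 | 8 | 6 |
| Rams | 7.43 | 9 | 7.5 |
| NFC East | | | |
| Giants | 8 | 9 | 9 |
| Eagles | 8.03 | 10 | 10.5 |
| Cowboys | 7.82 | 7 | 9 |
| Redskins | 5.6 | 4 | 6 |
| NFC North | | | |
| Lions | 7.8 | 9 | 8 |
| Packers | 9.32 | 11 | 11.5 |
| Vikings | 6.83 | 6 | 7 |
| Bears | 6.77 | 5 | 8 |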



*I used the following website for the lines http://www.sportsbook.ag/

1 comment:

  1. Mike and I were talking and I came up with a few ideas... just remember the main thing is this is an OPEN FORUM to get ideas out there so that Mike can get better results.

    First thing I thought of was, instead of using team statistics, to use the individual player statistics that add up to the aggregate team statistics. This is obviously a really difficult dataset to compile, but it's a suggestion.

    Another idea I had was somehow incorporating an "offseason" variable. Eagles adding Nnamdi, Kolb going to Arizona, Cam Newton being drafted by the Panthers. I have no idea how we could make this work though, so suggestions are welcomed.
