Monday, September 5, 2011

Constant Development is the Law of Life (Gandhi)

“OJ (Simpson) had to of known he wasn't ever gonna get away with a head that size. Only person u could mistake his silhouette for is stewey griffin.”

-Channing Crowder

The first (and maybe only) major revision of our model predicting NFL wins is here.  Keeping in mind that the foremost goal of this project is to maximize the adjusted R-squared of this multiple regression model, I decided to compare models fit over different lengths of time.  More specifically, I used the same procedure to select the best model using data starting from 1996, from 2001, and from 2006.  The corresponding adjusted R-squared values were as follows:

Year     Adjusted R-squared
1996     .128
2001     .143
2006     .193



As you can see, the model using data from 2006 raised the adjusted R-squared by .065 over the 1996 model (from .128 to .193), which is substantial given my stated goal is to get the adjusted R-squared to .3.  One quick note: there is another model that could have gotten the value up to .197.  However, I made the executive decision to manually include winning last year’s Super Bowl as a variable because the Packers were being predicted very low (as in winning only 5 games low).

Since I was reanalyzing all 30 or so variables, a model with different predictors came out.  Thanks to the inadvertent (maybe) advice of a professor, I was able to use a procedure in SAS that selects the combination of variables that produces the highest adjusted R-squared.  Below is a table comparing the variables in the original model with those in the revised model.

Original Model                   Revised Model
Lost the previous Super Bowl     New Coaching Staff
Won the previous Super Bowl      Making the Playoffs the Previous Year
Defensive Take Aways             Offensive Points Scored
Offensive Passing Yards          Offensive 1st Downs
Passing Yards Allowed            Offensive Passing Yards
Rushing TDs Allowed              Turnovers
Rushing Yards Gained             Offensive Rushing Attempts
Offensive Plays                  Rushing Yards Allowed
Offensive Rushing Attempts       Won the previous Super Bowl
Offensive Passing Attempts       Defensive Take Aways
Offensive Points Scored
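
For readers curious what "select the combination of variables that produces the highest adjusted R-squared" looks like in practice, here is a minimal Python sketch of a brute-force version. It is not the SAS procedure used for this post. The predictor names and the data are made up for illustration (random numbers, not NFL statistics), and the helper functions `solve`, `adjusted_r2`, and `best_subset` are hypothetical names.

```python
import random
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def adjusted_r2(rows, y, cols):
    """Fit least squares with an intercept on the chosen columns; return adjusted R-squared."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1  # number of estimated parameters, intercept included
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    n = len(y)
    return 1 - (sse / (n - k)) / (sst / (n - 1))

def best_subset(rows, y, names):
    """Try every non-empty subset of predictors and keep the highest adjusted R-squared."""
    best_score, best_cols = float("-inf"), ()
    for size in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), size):
            score = adjusted_r2(rows, y, cols)
            if score > best_score:
                best_score, best_cols = score, cols
    return best_score, [names[c] for c in best_cols]

# Made-up data: only the first two predictors actually drive the outcome.
random.seed(0)
names = ["points_scored", "rush_yards_allowed", "filler_a", "filler_b"]
rows = [[random.gauss(0, 1) for _ in names] for _ in range(40)]
y = [8 + 2 * r[0] - 3 * r[1] + random.gauss(0, 0.1) for r in rows]

score, chosen = best_subset(rows, y, names)
print(round(score, 3), chosen)
```

Trying every subset only works for a handful of predictors. With 30 or so candidate variables there are over a billion subsets (2 to the 30th power), so statistical packages rely on smarter search methods rather than a loop like this one.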




As the comparison table shows, some variables appear in both models while others were dropped or replaced.  Of particular note: new coaching staff entered the model, with a new staff predicting about .85 fewer wins (though at p = .14 that estimate is far from certain).  Further, the table below shows that the variable measuring take aways is behaving oddly.  That is, it predicts that the more take aways a team generated, the lower its win total.  This could be a result of inter-correlations between predictors; however, it has appeared in both models, so I’m concerned it may be measuring some other variable.  Below are the regression coefficients:

                                     Parameter Estimates


                    Parameter       Standard
           Variable         DF       Estimate          Error    t Value    Pr > |t|

           Intercept         1       -9.20641        5.54052      -1.66      0.0987
           New_Coach         1       -0.85085        0.58044      -1.47      0.1448
           Playoffs_Prev     1        0.78022        0.64109       1.22      0.2255
           PPoints           1        0.01734        0.00675       2.57      0.0112
           PO1stD            1        0.02443        0.01297       1.88      0.0616
           POPYards          1    -0.00092570     0.00062871      -1.47      0.1430
           POTO              1        0.06768        0.04225       1.60      0.1113
           PDRAtt            1        0.03470        0.01195       2.90      0.0043
           PDRYards          1       -0.00443        0.00142      -3.12      0.0022
           PDTO              1       -0.07588        0.04272      -1.78      0.0777
           SB_W_Prev         1        0.79177        1.33607       0.59      0.5543

A quick reminder of the interpretation: each value in the Parameter Estimate column is the change in Y (wins) expected for a one-unit increase in that X, holding the other variables constant.  For example, for every point scored (PPoints), wins increase by .01734.  Another example is the offensive 1st down variable (PO1stD): for every offensive 1st down gained, we can expect to gain .02443 wins.  Note how this interpretation doesn’t make sense for take aways (PDTO), where every additional take away predicts .07588 fewer wins.
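
To make the interpretation concrete, here is a small Python sketch that plugs a stat line into the coefficients from the table above. The coefficients are copied from the SAS output; the team's numbers are invented for illustration and do not belong to any real team, and `predicted_wins` is a hypothetical helper name.

```python
# Coefficients copied from the Parameter Estimates table.
COEFFICIENTS = {
    "Intercept": -9.20641,
    "New_Coach": -0.85085,
    "Playoffs_Prev": 0.78022,
    "PPoints": 0.01734,
    "PO1stD": 0.02443,
    "POPYards": -0.00092570,
    "POTO": 0.06768,
    "PDRAtt": 0.03470,
    "PDRYards": -0.00443,
    "PDTO": -0.07588,
    "SB_W_Prev": 0.79177,
}

def predicted_wins(team):
    """Intercept plus coefficient * value, summed over the ten predictors."""
    total = COEFFICIENTS["Intercept"]
    for name, beta in COEFFICIENTS.items():
        if name != "Intercept":
            total += beta * team[name]
    return total

# Invented stat line (assumption: illustrative values only, not a real team).
team = {
    "New_Coach": 0, "Playoffs_Prev": 1, "PPoints": 380, "PO1stD": 320,
    "POPYards": 3600, "POTO": 25, "PDRAtt": 430, "PDRYards": 1800,
    "PDTO": 28, "SB_W_Prev": 0,
}

base = predicted_wins(team)
ten_more_points = predicted_wins({**team, "PPoints": team["PPoints"] + 10})
five_more_takeaways = predicted_wins({**team, "PDTO": team["PDTO"] + 5})

print(round(base, 2))
print(round(ten_more_points - base, 4))      # effect of 10 extra points
print(round(five_more_takeaways - base, 4))  # effect of 5 extra take aways
```

Changing one input while leaving the rest alone is exactly the "one unit increase in X" reading: ten extra points move the prediction by ten times .01734, and five extra take aways move it by five times -.07588.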

Below are the model fit statistics.  Again, I won’t go into detail unless someone asks me to.

                                     Analysis of Variance

                                            Sum of           Mean
        Source                   DF        Squares         Square    F Value    Pr > F

        Model                    10      376.91710       37.69171       4.81    <.0001
        Error                   149     1168.07665        7.83944
        Corrected Total         159     1544.99375


                     Root MSE              2.79990    R-Square     0.2440
                     Dependent Mean        7.99375    Adj R-Sq     0.1932
                     Coeff Var            35.02612
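
For anyone who does want a little detail, every summary number in the output above can be rebuilt from the sums of squares and degrees of freedom. The Python sketch below uses the standard textbook formulas with the values copied from the SAS output; the variable names are just labels chosen for this example.

```python
import math

# Values copied from the Analysis of Variance output.
ss_model, df_model = 376.91710, 10
ss_error, df_error = 1168.07665, 149
ss_total, df_total = 1544.99375, 159
dependent_mean = 7.99375

ms_model = ss_model / df_model          # mean square for the model
ms_error = ss_error / df_error          # mean square error
f_value = ms_model / ms_error           # overall F test statistic
r_square = 1 - ss_error / ss_total      # share of variance explained
# Adjusted R-squared penalizes R-squared for the number of predictors used.
adj_r_square = 1 - ms_error / (ss_total / df_total)
root_mse = math.sqrt(ms_error)          # typical size of a prediction miss, in wins
coeff_var = 100 * root_mse / dependent_mean

print(round(f_value, 2), round(r_square, 4), round(adj_r_square, 4))
print(round(root_mse, 4), round(coeff_var, 3))
```

The gap between R-Square (.2440) and Adj R-Sq (.1932) is the price of using 10 predictors on 160 team-seasons, which is why the adjusted figure is the one worth maximizing.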



I may update this twice more by the end of the week.  The first update would add a variable that assigns value to the number of playoff games a team won the previous year.  I may not have time to collect that data.  However, I will definitely be reviewing this model for outliers and influential observations, so expect some predictions to change.  Below are the current predictions.  Like before, I weighted the predictions by order of finish in the division; however, I changed the weights (now 1st place gets 1.5 wins, 2nd gets 1, 3rd loses 1, and 4th loses 1.5).  The last column (My Bet) is what I would bet using the information from this model and common sense.

Division       Prediction   Lines 9/5*   My Bet

AFC East
Dolphins       8            7.5          Under
Bills          6            5.5          Over
Patriots       11           11.5         Under
Jets           10           10           Over

AFC North
Ravens         10           10           Under
Bengals        6            5.5          Over
Browns         6            7            Under
Steelers       9            10.5         Under

AFC South
Texans         11           9            Over
Colts          12           N/A          N/A
Jags           9            6.5          Over
Titans         6            6.5          Under

AFC West
Broncos        6            6            Over
Chiefs         10           N/A          N/A
Raiders        7            6.5          Over
Chargers       12           10           Over

NFC West
49ers          6            7.5          Under
Seahawks       9            6            Over
Rams           4            7.5          Over
Cardinals      8            7.5          Over

NFC South
Falcons        9            10           Under
Panthers       4            4.5          Under
Buccaneers     5            8            Under
Saints         11           10           Under

NFC North
Lions          6            8            Under
Packers        9            11.5         Under
Vikings        5            7            Under
Bears          8            8            Over

NFC East
Giants         9            9            Over
Eagles         11           10.5         Over
Redskins       5            6            Under
Cowboys        6            9            Under
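
The division weighting described just before the table can be sketched in a few lines of Python. This is only an illustration under assumptions: the post does not say which order of finish is used or how the final numbers are rounded, so the caller supplies the order, no rounding is applied, and the team names and raw predictions are placeholders.

```python
# Adjustments in wins for 1st, 2nd, 3rd, and 4th place, as stated in the post.
DIVISION_WEIGHTS = [1.5, 1.0, -1.0, -1.5]

def apply_division_weights(raw_predictions, finish_order):
    """Add the place-based adjustment to each team's raw regression prediction.

    raw_predictions: dict of team name -> predicted wins from the regression.
    finish_order: the four team names listed from 1st place to 4th place.
    """
    return {
        team: raw_predictions[team] + weight
        for team, weight in zip(finish_order, DIVISION_WEIGHTS)
    }

# Placeholder division (assumption: invented numbers, not model output).
raw = {"Team A": 9.0, "Team B": 8.0, "Team C": 7.0, "Team D": 6.0}
adjusted = apply_division_weights(raw, ["Team A", "Team B", "Team C", "Team D"])
print(adjusted)
```

Because the four weights sum to zero, the adjustment spreads teams apart within a division without changing the division's total predicted wins.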



The Colts are still off the table because of the Peyton situation, and for some reason the Chiefs are as well.  Not to toot my own horn, but look at how close the predictions are to the set lines.  Of the 30 teams with a line, my model lands less than one game from the line for 46.7% (14 teams) and matches the line exactly for 16.7% (5 teams).
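
Those two percentages can be checked directly against the prediction table. The Python sketch below copies each team's prediction and line from the table (the Colts and Chiefs are left out because they have no line) and counts the gaps; the variable names are just labels for this example.

```python
# (prediction, line) pairs copied from the table; Colts and Chiefs have no line.
PREDICTION_AND_LINE = {
    "Dolphins": (8, 7.5), "Bills": (6, 5.5), "Patriots": (11, 11.5), "Jets": (10, 10),
    "Ravens": (10, 10), "Bengals": (6, 5.5), "Browns": (6, 7), "Steelers": (9, 10.5),
    "Texans": (11, 9), "Jags": (9, 6.5), "Titans": (6, 6.5),
    "Broncos": (6, 6), "Raiders": (7, 6.5), "Chargers": (12, 10),
    "49ers": (6, 7.5), "Seahawks": (9, 6), "Rams": (4, 7.5), "Cardinals": (8, 7.5),
    "Falcons": (9, 10), "Panthers": (4, 4.5), "Buccaneers": (5, 8), "Saints": (11, 10),
    "Lions": (6, 8), "Packers": (9, 11.5), "Vikings": (5, 7), "Bears": (8, 8),
    "Giants": (9, 9), "Eagles": (11, 10.5), "Redskins": (5, 6), "Cowboys": (6, 9),
}

gaps = [abs(pred - line) for pred, line in PREDICTION_AND_LINE.values()]
n_teams = len(gaps)
exact = sum(gap == 0 for gap in gaps)            # prediction equals the line
under_one = sum(gap < 1 for gap in gaps)         # less than one game apart
one_or_less = sum(gap <= 1 for gap in gaps)      # one game apart or closer

for label, count in [("exact", exact), ("under one game", under_one),
                     ("one game or less", one_or_less)]:
    print(label, count, round(100 * count / n_teams, 1))
```

The 46.7% figure corresponds to gaps of less than one game. Counting the four teams that sit exactly one game from the line as well would give 18 of 30, or 60%.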

Finally, I have some announcements for future plans.  First, I’ll be keeping a summary of the bets I would’ve made if I wasn’t broke as hell and actually trusted the model.  My expectation is that this summary will show I’ve lost money when the season is done.  Second, due to recommendations from people who have read this blog, I am going to attempt to do the same for weekly lines.  Kyle Kelly and Matthew Cornelius Wojay have graciously volunteered to help me create a model and to collect data from weekly games.  I’ve also had colleagues offer to help me with data analysis on weekly lines.  Weekly line prediction is slated to start for Week 3, so look forward to that.

Any Thoughts?

*I used the following website for the lines http://www.sportsbook.ag/
