This case study provides a structured activity that introduces students to basic data preparation, modeling, and analysis, motivated by an engaging topic. The students are provided background information and data for point spreads of National Football League (NFL) games. Using these point spreads, the students model the relationship between point spread and the probability of winning the game using linear and logistic regression. We often teach it during the fall term, when the NFL is in full swing, so it is easy for students to relate to the data.
Essentially, if the point spread is less than double digits, the model states that the probability that a team wins a game equals 50% plus 3% for every unit of point spread. Stern states that the linear approximation for the probability of winning is 0.50 + 0.03p, where p is the point spread. At a certain point, the linear model breaks down as the probability estimate exceeds 100%.
A heavy underdog might be listed as 7/1 or +700, meaning that a $100 bet would win $700 if the underdog wins the game. A heavy favorite to win might be listed as 1/9 or -900, meaning a $100 bet would only win about $11 if the favorite wins the game. Consider a football game where the underdog payoff is 7/1 and the favorite payoff is 1/9. To assess the underdog's probability of winning, divide the denominator by the sum of the numerator and the denominator to get 1/(7 + 1) = 1/8 = 12.5%. Similarly, an estimate for the favorite is 9/(1 + 9) = 9/10 = 90%, which leaves 10% for the underdog. So, the best estimate for the underdog's probability of winning is likely somewhere between 10% and 12.5%.
The probability that an event will occur is the fraction of times you expect to see that event in many trials. Probabilities range between zero and one. The odds are defined as the probability that the event will occur divided by the probability that the event will not occur. For example, if a team has an 80% probability of winning the game, then the odds are 0.8/(1 - 0.8) = 0.8/0.2 = 4.
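
The arithmetic in the passage above can be checked with a short Python sketch. The function names are illustrative and do not come from the case study; the only inputs are the numbers quoted in the text (Stern's 0.50 + 0.03p, the 7/1 and 1/9 payoffs, and the 80% example).

```python
from fractions import Fraction


def linear_win_probability(spread):
    """Stern's linear approximation 0.50 + 0.03p.

    Deliberately not capped, so it can exceed 1 for large spreads.
    """
    return 0.50 + 0.03 * spread


def implied_probability(numerator, denominator):
    """Implied win probability from fractional odds numerator/denominator.

    Divides the denominator by the sum of numerator and denominator.
    """
    return Fraction(denominator, numerator + denominator)


def odds_from_probability(p):
    """Odds = P(event occurs) / P(event does not occur)."""
    return p / (1 - p)


underdog = implied_probability(7, 1)   # payoff 7/1
favorite = implied_probability(1, 9)   # payoff 1/9
underdog_from_favorite = 1 - favorite  # what the favorite's line leaves over

print(float(underdog), float(favorite), float(underdog_from_favorite))
print(odds_from_probability(0.8))
print(linear_win_probability(7), linear_win_probability(17))
```

The two underdog estimates differ because the quoted payoffs imply probabilities that sum to more than 100%; the sketch only reproduces the bracket given in the text and does not attempt to split that difference.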
Other topics related to this case study are real-time or in-game winning probabilities and the concept that point spreads are efficient markets. The sports network ESPN and websites like www. Football Power Index to make these estimates, which relies on a series of factors and then uses simulation to determine how likely it is that a team will win. Further, going in the other direction, ESPN furnishes real-time probabilities during games with estimates for probabilities of each team winning or losing. For example, a team with a lead going into the fourth quarter will have a higher probability of winning than a team with that same lead after the first quarter. We encourage students interested in this line of inquiry to do some online research on whether NFL point spreads are efficient markets, which leads to dozens of papers on the subject.
This case study was developed for an introduction to business analytics course taken by junior and senior business majors. Additional option 2 represents an opportunity for the students to individually show what they have learned. In our class, the students work in groups to achieve the results for the general assignment, but we want to make sure that each member of each group understands the entire process individually. Therefore, each student redirects their focus to the NBA data and repeats the process working on their own. A few students tend to really enjoy this case and want to do more. Additional option 3 allows them to think about how to include information on the over/under or other factors to improve their models. First, an add-in like StatTools could be used, eliminating the need to use Solver for the logistic regression and providing output like the Receiver Operating Characteristic curve and lift chart. Or, if the students are proficient at coding, the analysis could be done in Python or R, allowing the case to take on a heavier emphasis on data science.
This case study allows students to build both linear and logistic models of real sports data. In our experience, students enjoy either the sports aspect of it or fitting the logistic curve, and some enjoy both. There are several highlights to point out for this case as well as an abundance of teachable moments. We describe these below in order of occurrence, assuming students are first doing the general assignment and then working on the additional options. In our experience, the more practice students get with PivotTables, the better. Simple things, like each column of data needing a header, are easily forgotten. This can lead to a great classroom discussion of statistical sampling error. When the students fit the first line to the scatter diagram, they notice that it crosses 100% when the point spread exceeds 17.
Even though the business students in the class are not necessarily mathematically oriented, this rubs them the wrong way and they realize something is off. Both of these games are right at the point where the linear model breaks down, and it is evident that no outcome is guaranteed, whatever the point spread may be. Although most of the students have used the trendline function in Excel, many of them are not familiar with its options, such as forcing the y-intercept, or with why this might be appropriate. This is another opportunity for classroom discussion.
Fitting the logistic curves in the standard and nonstandard ways tends to be a highlight of the case study. The students want a model that fits, that is, one that doesn't predict probabilities that exceed 100% as the linear model does. In our class, we follow standard logistic regression with a nonstandard fit of the general logistic function, minimizing the sum of absolute errors and the sum of the squared errors. Time permitting, there exists an excellent opportunity here to discuss with the students what really constitutes the best fit. Some students really seem to understand for the first time what regression is doing while working through the nonstandard fitting process.
On additional option 1, the students must convert the raw data from 2017 and clean it up. The source data are separated by an uncommon delimiter, a half-space.
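
The nonstandard fitting process described above, in which a logistic curve is chosen by minimizing the sum of squared errors or the sum of absolute errors, might look roughly like the following Python sketch. Several things here are assumptions rather than the authors' method: the curve is a one-parameter logistic forced through 50% at a spread of zero, the target points are generated from Stern's linear approximation on spreads 0 through 9 (not from NFL data), and a golden-section search stands in for Excel's Solver.

```python
import math


def logistic(b, spread):
    """One-parameter logistic curve; equals 0.5 when the spread is zero (assumed form)."""
    return 1.0 / (1.0 + math.exp(-b * spread))


# Target points taken from Stern's linear approximation 0.50 + 0.03p,
# used only where the text says it applies (spreads below double digits).
# These are NOT observed win fractions.
spreads = list(range(10))
targets = [0.50 + 0.03 * s for s in spreads]


def sse(b):
    """Sum of squared errors between the logistic curve and the targets."""
    return sum((logistic(b, s) - t) ** 2 for s, t in zip(spreads, targets))


def sae(b):
    """Sum of absolute errors between the logistic curve and the targets."""
    return sum(abs(logistic(b, s) - t) for s, t in zip(spreads, targets))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize f on [lo, hi] by golden-section search (assumes a single minimum)."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return (lo + hi) / 2


b_sse = golden_section_min(sse, 0.01, 1.0)
b_sae = golden_section_min(sae, 0.01, 1.0)

print("slope minimizing squared errors: ", b_sse)
print("slope minimizing absolute errors:", b_sae)
print("fitted probability at a spread of 30:", logistic(b_sse, 30))
```

Unlike the linear model, the fitted curve stays below 100% for any spread. The two error criteria will generally give slightly different slopes, which is one concrete way to open the discussion of what constitutes the best fit.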
Question: After reading the text, identify and explain the crux of the problem.
