Question: Python and statistics!
Please answer the following coding tasks AND EXPLAIN WHAT YOU DID in order to get good feedback! I need to understand what you did in order to understand the topic :)
the dataset you will be working with: https://1drv.ms/x/s!AtfXPbdjkmO7oJpwOAKVo_fFXxXdBA?e=jdLo1Z
another way to view the data file: https://drive.google.com/file/d/175FdKvreD_qV80wwR6zLyIQ8wWtLOpkj/view?usp=sharing
All substantial questions need explanations. You do not have to explain the simple things like "how many rows are there in the data", but if you make a plot of global temperature, you should explain what you see there!
------------------------------------------------------------------------------------------------------------------------------------------------------------
In this section you will work with basketball data. Basketball is a big business, and a lot of analytics are collected about high-profile games. Game score is one of the popular measures of a player's performance in a game. But how is it calculated? Here we look at one particular dataset about De'Aaron Fox's (see photo) 2023-2024 season (downloaded from basketball-reference.com: https://www.basketball-reference.com/players/f/foxde01/gamelog/2024). We recommend that you familiarize yourself with the basics of basketball, including what field goals, turnovers, and personal fouls are. The dataset contains 30 variables, including field goals, field goal attempts, 3-point field goals, rebounds and personal fouls (see my data repo readme for reference: https://bitbucket.org/otoomet/data/src/master/sports/). The central variable in the current context is GmSc, the game score. It is a summary performance score for the player (given that he played in the game).
Here are the tasks:
1. Load data (fox-deaaron 23-24.csv). Do basic checks.
2. These data also include games where he did not play. Find how many games De'Aaron Fox actually played this season. Hint: there is no general method for answering this. Just look at the data and figure it out based on what you see there. It can be coded in different ways, but first you have to see what the relevant data look like.
3. Clean the data and ensure the relevant variables are of numeric type so we can use those in the regression models. It is your task to find what is wrong with the data in its present form (it is downloaded directly from basketball-reference.com), and fix these issues. Hint: a good way to transform text to number is pd.to_numeric. Hint 2: you do not have to convert variables you are not using.
4. Analyze the game score GmSc. What is its range? Mean? Standard deviation? Which distribution does the histogram resemble?
5. First, let's run a simple regression model explaining game score GmSc by field goal attempts FGA: $\mathrm{GmSc}_g = \beta_0 + \beta_1 \mathrm{FGA}_g + \epsilon_g$ (1), where $g$ indexes games. Display the results and answer the following questions: (a) What is the interpretation of the intercept ($\beta_0$)? (b) What is the interpretation of FGA ($\beta_1$)? Is it statistically significant?
6. (8pt) Next, let's analyse how game score is related to field goals (FG) and field goal attempts (FGA). Estimate the model $\mathrm{GmSc}_g = \beta_0 + \beta_1 \mathrm{FG}_g + \beta_2 \mathrm{FGA}_g + \epsilon_g$ (2). If done correctly, you should see estimates of approximately 4.5, 2.9 and -0.6 here. Answer the following questions: (a) What is the interpretation of FG ($\beta_1$)? Is it statistically significant? (b) What is the interpretation of FGA ($\beta_2$)? Is it statistically significant? (c) How do you explain the fact that model 1 shows a positive and model 2 a negative estimate for FGA? There is a very easy and intuitive explanation that everyone will understand, including those who have no clue about stats. Can you phrase it in that way? Hint: try to understand exactly what the difference is between interpreting the slope in simple regression and in multiple regression. (d) What is the $R^2$ of the model? How does it compare to that of model 1? What do you conclude from this comparison?
7. Now include all the independent numerical variables, i.e. FG, FGA, 3P, 3PA, FT, FTA, ORB, DRB, AST, STL, BLK, TOV, PF, in the model. Estimate it and discuss the results. Answer the following questions: (a) What do the standard errors and t-values look like in this model? (b) What is the $R^2$ of this model? What does it tell you about how game score is calculated? (c) What do the results tell you about turnovers (TOV)? Are they good or bad for the team? Suggestion: check out patsy Q() quoting to include variable names that are not valid Python identifiers.
8. Finally, consult the game score explanation here: https://www.nbastuffer.com/analytics101/game-score/
Did you recover the same formula?
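
As a rough illustration of the pd.to_numeric hint in tasks 2 and 3, here is a minimal sketch on a tiny made-up table. The rows, the numbers, and the text labels for missed games ("Inactive", "Did Not Play") are assumptions for illustration only; check the real file to see how missed games are actually marked.

```python
import pandas as pd

# Hypothetical rows imitating a raw game log: games the player missed
# carry a text label instead of numbers (labels and values are made up).
raw = pd.DataFrame({
    "G": ["1", "2", None, "3", None],
    "FG": ["8", "10", "Inactive", "7", "Did Not Play"],
    "FGA": ["17", "21", "Inactive", "19", "Did Not Play"],
    "GmSc": ["18.3", "25.1", "Inactive", "12.4", "Did Not Play"],
})

cols = ["FG", "FGA", "GmSc"]  # convert only the columns you plan to use
clean = raw.copy()
# errors="coerce" turns anything that cannot be parsed as a number into NaN
clean[cols] = clean[cols].apply(pd.to_numeric, errors="coerce")

# Games actually played are the rows where the game score could be parsed
played = clean.dropna(subset=["GmSc"])
n_played = len(played)

print(n_played)
print(clean[cols].dtypes)
```

Coercing to NaN and then dropping the missing rows is only one way to do it; filtering on the text labels first would work as well.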
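
The models in tasks 5 to 7 can be sketched with the statsmodels formula interface, which is where the patsy Q() quoting comes in. The data below are synthetic and generated from a made-up scoring rule, so the printed numbers say nothing about the real season; only the mechanics (formula syntax, Q('3P'), comparing the FGA slope across models) carry over.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200  # synthetic games, NOT the real season

fga = rng.integers(10, 26, size=n)  # shot attempts per game
fg = rng.binomial(fga, 0.47)        # made shots, never more than attempts
tp = rng.binomial(fg, 0.3)          # three-pointers among the made shots
# Made-up scoring rule: made shots add to the score, attempts cost a little
score = 5 + 3 * fg - 0.6 * fga + 1.0 * tp + rng.normal(0, 2, size=n)

df = pd.DataFrame({"GmSc": score, "FG": fg, "FGA": fga, "3P": tp})

# Model 1: attempts only. More attempts usually mean more made shots,
# so FGA picks up the benefit of the made shots as well.
m1 = smf.ols("GmSc ~ FGA", data=df).fit()
# Model 2: made shots are held fixed, so extra attempts are extra misses.
m2 = smf.ols("GmSc ~ FG + FGA", data=df).fit()
# "3P" is not a valid Python name, so it is wrapped in Q()
m3 = smf.ols("GmSc ~ FG + FGA + Q('3P')", data=df).fit()

print(m1.params["FGA"], m2.params["FGA"])
print(m1.rsquared, m2.rsquared, m3.rsquared)
```

Calling m3.summary() prints the full table with standard errors, t-values and p-values, which is what the interpretation questions ask about.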
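
For the comparison in task 8, it may help to write the game score in terms of the regressors from task 7. The sketch below assumes the commonly quoted form of the game score formula (verify it against the linked page) and uses the fact that points are $\mathrm{PTS} = 2\,\mathrm{FG} + \mathrm{3P} + \mathrm{FT}$, since FG counts all made field goals, three-pointers included.

```latex
\begin{align*}
\mathrm{GmSc} &= \mathrm{PTS} + 0.4\,\mathrm{FG} - 0.7\,\mathrm{FGA}
  - 0.4\,(\mathrm{FTA} - \mathrm{FT}) + 0.7\,\mathrm{ORB} + 0.3\,\mathrm{DRB} \\
 &\quad + \mathrm{STL} + 0.7\,\mathrm{AST} + 0.7\,\mathrm{BLK}
  - 0.4\,\mathrm{PF} - \mathrm{TOV} \\
 &= 2.4\,\mathrm{FG} - 0.7\,\mathrm{FGA} + \mathrm{3P} + 1.4\,\mathrm{FT}
  - 0.4\,\mathrm{FTA} + 0.7\,\mathrm{ORB} + 0.3\,\mathrm{DRB} \\
 &\quad + \mathrm{STL} + 0.7\,\mathrm{AST} + 0.7\,\mathrm{BLK}
  - 0.4\,\mathrm{PF} - \mathrm{TOV}
\end{align*}
```

Under this assumption, 3PA does not enter the formula at all, so its coefficient in the full model would be expected to be close to zero.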
