Question: Your data set is a random list of Data Science salaries from 2020 to 2022. For each row you are given an unknown label X, the job title, and the salary (in ten thousands), which will be your Y.
Summary Statistics
- Mean of X and Y (5 points each)
- Median of X and Y (5 points each)
- Q1 of X and Y (5 points each)
- Q3 of X and Y (5 points each)
- IQR of X and Y (5 points each)
- Outliers of X and Y (5 points each)
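The summary statistics listed above could be computed with a short script like the one below. This is a minimal sketch, not an official solution: the lists `x` and `y` are copied from the table in this question, the helper `summarize` is a hypothetical name, quartiles use the "inclusive" interpolation method (other textbooks use different quartile conventions and may give slightly different Q1 and Q3), and outliers are assumed to mean values outside the 1.5 × IQR fences.

```python
import statistics

# X and Y columns copied from the table in this question.
x = [538, 269, 543, 243, 336, 61, 370, 91, 248, 39, 76, 339, 111]
y = [141, 65, 99, 165, 167, 131, 123, 77, 96, 138, 100, 109, 113]


def summarize(values):
    """Return mean, median, quartiles, IQR, and 1.5*IQR outliers."""
    # Assumption: "inclusive" quartiles; other conventions exist.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: an outlier lies more than 1.5 * IQR beyond Q1 or Q3.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


summary_x = summarize(x)
summary_y = summarize(y)
for name, summary in (("X", summary_x), ("Y", summary_y)):
    print(name, summary)
```

If your course defines quartiles as the medians of the lower and upper halves, compute Q1 and Q3 by hand that way and reuse the same fence logic.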
Simple Linear Regression
- Find the equation for Linear Regression on this dataset - (5 points)
- Does X make sense? Why? Why not? - (10 points)
- Please give a possible label for X - (5 points)
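One way to approach the regression items is an ordinary least-squares fit of Y on X, sketched below. The variable names are illustrative, the data are copied from the table in this question, and the correlation coefficient `r` is included only as a possible aid for judging whether X "makes sense" as a predictor.

```python
import math

# X and Y columns copied from the table in this question.
x = [538, 269, 543, 243, 336, 61, 370, 91, 248, 39, 76, 339, 111]
y = [141, 65, 99, 165, 167, 131, 123, 77, 96, 138, 100, 109, 113]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

# Least-squares line: y_hat = intercept + slope * x
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Pearson correlation: strength of the linear relationship.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"y_hat = {intercept:.3f} + {slope:.4f} * x")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

A value of `r` close to zero would suggest that X explains very little of the variation in salary, which is what you might expect if X were something arbitrary such as a row index or record ID.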
Z-Tables
In your work for Linear Regression, you have calculated the mean and the standard deviation of the salaries for Data Scientists. Using that information, please calculate the following:
- The percentage of data scientists whose income is below 99.5 - (5 points for showing the Z score, and 5 points for showing the percentage)
- The percentage of data scientists whose income is above 137.5 - (5 points for showing the Z score, and 5 points for showing the percentage)
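The two Z-table items can be checked with the sketch below, which uses `statistics.NormalDist` in place of a printed Z-table. It rests on several assumptions that the question does not settle: all 13 salaries in the table are treated as "data scientists", the sample standard deviation (n − 1 denominator) is used, and salaries are treated as approximately normal. The helper `z_score` is a hypothetical name.

```python
import statistics
from statistics import NormalDist

# Salary column (Y) copied from the table in this question.
# Assumption: every row counts as a "data scientist" salary.
y = [141, 65, 99, 165, 167, 131, 123, 77, 96, 138, 100, 109, 113]

mean_y = statistics.mean(y)
# Assumption: sample standard deviation (divides by n - 1).
sd_y = statistics.stdev(y)


def z_score(value):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean_y) / sd_y


z_low = z_score(99.5)
z_high = z_score(137.5)

# The standard normal CDF plays the role of the Z-table lookup.
standard_normal = NormalDist()
pct_below = standard_normal.cdf(z_low) * 100
pct_above = (1 - standard_normal.cdf(z_high)) * 100

print(f"z(99.5)  = {z_low:.3f}, below: {pct_below:.2f}%")
print(f"z(137.5) = {z_high:.3f}, above: {pct_above:.2f}%")
```

If your course uses the population standard deviation instead, swap `statistics.stdev` for `statistics.pstdev`; the Z scores and percentages will shift slightly.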
| X | Job Title | Salary (in ten thousands) |
| 538 | Data Scientist | 141 |
| 269 | Data Engineer | 65 |
| 543 | Data Engineer | 99 |
| 243 | Data Scientist | 165 |
| 336 | Data Analyst | 167 |
| 61 | Data Engineer | 131 |
| 370 | Data Scientist | 123 |
| 91 | Data Science Consultant | 77 |
| 248 | Data Engineer | 96 |
| 39 | Machine Learning Engineer | 138 |
| 76 | BI Data Analyst | 100 |
| 339 | Data Analyst | 109 |
| 111 | Director of Data Engineering | 113 |
