Question: Use any dataset from the Kaggle website as an example.
Please use your project data to do the following tasks:

STEP 1 Data Descriptive Statistics

Q1. Amongst the variables of interest, identify one that is categorical and one that is quantitative, and then provide the following descriptive deliverables. Summaries (do this for at least one categorical and one quantitative variable):
a) For the categorical variable, create a frequency distribution.
b) For the categorical variable, create a bar diagram.
c) For the quantitative variable, create numerical summaries grouped by a categorical variable.
d) For the quantitative variable, create a histogram and a boxplot grouped by the categorical variable.
Provide your analysis of the graphical visualizations. Write your main findings, such as which group has the higher mean/median, standard deviation/interquartile range, and highest or lowest values. Is the data skewed or symmetrical? Provide comparisons across groups.

STEP 2 Correlation and Regression Analysis

Q2. Amongst the quantitative variables, generate Relationships and Associations (Correlation and Regression):
a) Identify two or more quantitative variables that might be correlated.
b) Find the correlation coefficient.
c) Create the scatter diagram under graphs.
d) Provide your rationale and justify your findings regarding the correlation between the two quantitative variables of interest.

STEP 3 and Regression Set up

Work on your project data to start preliminary analysis:

Q3. Prepare data by using the following preprocessing transformation and t:
a) Please standardize the data.
b) Check for null values.
c) Check for outliers.

STEP 4 Implement Regression

Conduct Regression and answer the following questions:

Q4. Implement Regression
a) Objective and rationale of using the specific algorithm to achieve the objective.
b) Steps of implementing the algorithm with regard to the context.
c) Interpretation of the results and prediction accuracy achieved.
d) Performance improvement techniques and improved accuracy achieved. Use feature selection, variable importance, comparison of RMSE (Regression) across models and Information gain (Decision Trees), K-fold cross validation, grid search, etc.
Step by Step Solution
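One possible way to approach Steps 1 and 2 is sketched below with pandas. This is only an illustration under stated assumptions: the question does not name a dataset, so the data here is synthetic, and the column names `segment`, `size` and `price` are hypothetical placeholders for one categorical and two quantitative variables from whatever Kaggle file is actually used.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: synthetic stand-in for a Kaggle CSV. With real data this
# block would be replaced by pd.read_csv(...) on the downloaded file.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "segment": rng.choice(["A", "B", "C"], size=n, p=[0.5, 0.3, 0.2]),
    "size": rng.normal(50, 10, size=n),
})
df["price"] = 3.0 * df["size"] + rng.normal(0, 15, size=n)

# Q1 a) frequency distribution of the categorical variable
freq = df["segment"].value_counts().to_frame("count")
freq["share"] = freq["count"] / len(df)

# Q1 c) numerical summaries of the quantitative variable, by group
grouped = df.groupby("segment")["price"]
summary = grouped.agg(["mean", "median", "std", "min", "max"])
summary["iqr"] = grouped.quantile(0.75) - grouped.quantile(0.25)
summary["skew"] = grouped.skew()  # near 0 suggests a symmetric group

# Q2 b) Pearson correlation coefficient between two quantitative variables
r = df["size"].corr(df["price"])


def draw_plots(frame):
    """Q1 b), Q1 d), Q2 c): bar chart, grouped histogram and boxplot, scatter."""
    import matplotlib.pyplot as plt  # lazy import: only the figures need it

    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    frame["segment"].value_counts().plot.bar(ax=axes[0, 0], title="Count by segment")
    for name, part in frame.groupby("segment"):
        axes[0, 1].hist(part["price"], bins=20, alpha=0.5, label=name)
    axes[0, 1].set_title("Price by segment")
    axes[0, 1].legend()
    frame.boxplot(column="price", by="segment", ax=axes[1, 0])
    frame.plot.scatter(x="size", y="price", ax=axes[1, 1], title="Size vs price")
    fig.tight_layout()
    return fig


print(freq)
print(summary.round(2))
print(f"Pearson r(size, price) = {r:.3f}")
```

For the written analysis, `summary` gives the group comparison directly: a mean well above the median (or a clearly positive `skew`) points to right skew in that group, and `std` and `iqr` show which group is more spread out. Calling `draw_plots(df)` produces the four figures the question asks for.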
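Steps 3 and 4 could be approached along the following lines with scikit-learn. Again this is a hedged sketch, not a definitive solution: the data is synthetic, the feature names `x1`, `x2` and `noise` are hypothetical, and linear regression versus a decision tree is just one reasonable pair of models to compare by RMSE.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# ASSUMPTION: synthetic stand-in for the project data (hypothetical columns).
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "noise": rng.normal(size=n),  # deliberately unrelated to the target
})
y = 2.0 * X["x1"] - 1.0 * X["x2"] + rng.normal(scale=0.5, size=n)

# Q3 b) null values per column
null_counts = X.isna().sum()

# Q3 c) outliers per column using the 1.5 * IQR rule
q1, q3 = X.quantile(0.25), X.quantile(0.75)
iqr = q3 - q1
outlier_counts = ((X < q1 - 1.5 * iqr) | (X > q3 + 1.5 * iqr)).sum()

# Q4: 5-fold cross-validated RMSE, same folds for every model
cv = KFold(n_splits=5, shuffle=True, random_state=0)


def cv_rmse(model):
    """Mean RMSE across the folds (scikit-learn reports it negated)."""
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()


# Q3 a) standardization lives inside the pipeline, so the scaler is fit
# on each training fold only and never sees the held-out fold.
linear = make_pipeline(StandardScaler(), LinearRegression())
rmse = {
    "linear": cv_rmse(linear),
    "tree": cv_rmse(DecisionTreeRegressor(random_state=0)),
}

# Q4 performance improvement: grid search over tree hyperparameters
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 5, 20]},
    cv=cv,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
rmse["tuned_tree"] = -grid.best_score_

# Variable importance from the tuned tree (normalized to sum to 1)
importance = pd.Series(grid.best_estimator_.feature_importances_,
                       index=X.columns).sort_values(ascending=False)

print(null_counts, outlier_counts, rmse, importance, sep="\n\n")
```

Note that a regression tree splits on reduction in squared error rather than information gain, which is the classification analogue; `feature_importances_` reports the normalized total of that reduction per feature. Features with importance near zero are candidates to drop in a feature selection step, after which the RMSE comparison can be rerun.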
