Question: In this assignment, you will perform hypothesis testing, calculate correlation coefficients, build linear regression models, and diagnose potential issues in the models using Python.
Tasks:
Hypothesis: Locate a dataset containing the heights, weights, and ages of at least individual males and females. Conduct a hypothesis test to determine whether there is a significant difference in the mean weight between males and females. Write a report discussing your findings, including relevant statistics, visualizations, and interpretations. Use Python to perform the analysis and include the relevant code in a Jupyter notebook.
Correlation Coefficient: Using the same dataset as in the hypothesis-testing task, calculate the correlation coefficient between height and weight. Interpret the coefficient and visualize the relationship between the two variables using a scatter plot. Write a report discussing your findings, including relevant statistics, visualizations, and interpretations. Use Python to perform the analysis and include the relevant code in a Jupyter notebook.
Linear Regression Model: Using the same dataset as in the hypothesis-testing task, build a linear regression model to predict weight based on height. Perform model verification to determine potential issues, such as heteroscedasticity or multicollinearity, and address any identified issues. Write a report discussing your findings, including relevant statistics, visualizations, and interpretations. Use Python to perform the analysis and include the relevant code in a Jupyter notebook.
Multiple Regression Model: Continuing with the dataset from the hypothesis-testing task, build a multiple regression model to predict weight based on height and age. Perform model verification to determine potential issues, such as heteroscedasticity or multicollinearity, and address any identified issues. Write a report discussing your findings, including relevant statistics, visualizations, and interpretations. Use Python to perform the analysis and include the relevant code in a Jupyter notebook.
Solutions: Identify and discuss two common assumptions of linear regression models. Using a dataset of your choice, build a linear regression model that violates one of these assumptions. Perform model verification to determine any violation and propose a solution to address the issue. Write a report discussing your findings, including relevant statistics, visualizations, and interpretations. Use Python to perform the analysis and include the relevant code in a Jupyter notebook.
Nonlinear Model: Using the same dataset as in Task propose a nonlinear model to predict the response variable. Compare the performance of the nonlinear model to that of the linear regression model built in Task using appropriate metrics. Write a report discussing your findings, including relevant statistics, visualizations, and interpretations. Use Python to perform the analysis and include the relevant code in a Jupyter notebook.
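As a starting point for the hypothesis-testing and correlation tasks above, the following is a minimal sketch, not a full solution. It assumes Welch's two-sample t-test (one reasonable choice when the group variances may differ) and Pearson's correlation coefficient. The data are synthetic placeholders generated with a seeded random generator and would be replaced by the real dataset; all variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder data (NOT a real dataset): heights in cm, weights in kg.
# The group sizes and distribution parameters are assumptions for illustration.
rng = np.random.default_rng(0)
n = 200
height_m = rng.normal(176, 7, n)
height_f = rng.normal(163, 6, n)
weight_m = 0.9 * height_m - 80 + rng.normal(0, 8, n)
weight_f = 0.8 * height_f - 68 + rng.normal(0, 7, n)

# Welch's t-test. H0: mean weight is the same for males and females.
t_stat, p_value = stats.ttest_ind(weight_m, weight_f, equal_var=False)

# Pearson correlation between height and weight on the pooled sample.
height = np.concatenate([height_m, height_f])
weight = np.concatenate([weight_m, weight_f])
r, r_p = stats.pearsonr(height, weight)

print(f"Welch t = {t_stat:.2f}, p = {p_value:.3g}")
print(f"Pearson r = {r:.2f}, p = {r_p:.3g}")
```

A small p-value for the t-test would lead to rejecting the hypothesis of equal mean weights at the chosen significance level. The sign and size of r describe the direction and strength of the linear relationship between height and weight.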
Requirements:
Jupyter notebook containing all Python code used in the tasks.
Include written responses to each task prompt, using markdown cells in the same Jupyter notebook.
Include visual aids to support your answers.
Record a short video minutes explaining your work and highlighting key findings. Use visualization techniques to demonstrate the impact of the outliers on measures of central tendency and variability. Use an online video platform such as Loom, YouTube, or Vimeo to upload your completed video.
Deliverables:
Submit a Jupyter notebook containing all Python code, written responses, and visual aids.
Include the link to your video recording.
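For the regression tasks, a notebook cell along the following lines could serve as a starting point. It is a sketch under stated assumptions: the data are synthetic placeholders, the model is fitted with plain least squares in NumPy, heteroscedasticity is checked with the n·R² (Koenker) form of the Breusch-Pagan test, and multicollinearity is checked with variance inflation factors. The helper `ols` is hypothetical and written only for this example.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder data (NOT a real dataset):
# height in cm, age in years, weight in kg. All parameters are assumptions.
rng = np.random.default_rng(1)
n = 300
height = rng.normal(170, 9, n)
age = rng.uniform(18, 70, n)
weight = -90 + 0.9 * height + 0.1 * age + rng.normal(0, 8, n)


def ols(X, y):
    """Least-squares fit with an intercept; returns coefficients, residuals, R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2


# Multiple regression: weight ~ height + age.
X = np.column_stack([height, age])
beta, resid, r2 = ols(X, weight)

# Breusch-Pagan test (Koenker's n * R^2 form): regress the squared residuals
# on the predictors and compare LM = n * R^2 with a chi-square distribution
# whose degrees of freedom equal the number of predictors.
_, _, aux_r2 = ols(X, resid ** 2)
lm = n * aux_r2
bp_p = stats.chi2.sf(lm, df=X.shape[1])

# Variance inflation factor for each predictor:
# 1 / (1 - R^2) from regressing that predictor on the remaining ones.
vif = []
for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    _, _, r2_j = ols(others, X[:, j])
    vif.append(1 / (1 - r2_j))

print("coefficients (intercept, height, age):", np.round(beta, 3))
print(f"R^2 = {r2:.3f}, Breusch-Pagan p = {bp_p:.3f}")
print("VIF (height, age):", np.round(vif, 3))
```

A small Breusch-Pagan p-value would suggest non-constant residual variance, and VIF values well above 1 would indicate correlated predictors. With a single predictor (weight on height only), the VIF check does not apply.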
