Answer the following: Formulate a prediction question that you want to answer by applying regression modeling. Example prediction question: How accurately can I predict the price of a house given the values of all the other variables?
Search and locate a dataset that is relevant to the question you created in the previous step. You may search repositories such as Data.gov, UCI Machine Learning, Kaggle, or Scikit-Learn.
Find a dataset with at least 10 variables, most of them quantitative. Explain and describe your dataset's variables. List your dependent and independent variables, and identify which scale is used to measure each variable (interval, ordinal, or nominal). Hint: interval is the most appropriate scale for regression analysis.
Import all the necessary libraries and load your dataset into a data frame. Use the transformers from the Feature-engine library, which are compatible with scikit-learn, to feature engineer your dataset variables.
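A minimal sketch of the loading step. It assumes the diabetes dataset bundled with scikit-learn as a stand-in for whatever dataset you pick; it has 10 quantitative predictors and a numeric target, so it meets the size requirement. The variable names `data` and `df` are just illustrative.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Assumption: the bundled diabetes dataset stands in for your own dataset.
# For a CSV file you would use pd.read_csv(...) with your own path instead.
data = load_diabetes(as_frame=True)
df = data.frame  # 10 feature columns plus the "target" column

# Quick look at size, column types, and missing values per column.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
```

The dependent variable here is `target`; the remaining columns are the independent variables.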
Perform the following:
- Missing data imputation: which transformer(s) did you apply and why?
- Categorical variable encoding: which transformer(s) did you apply and why?
- Outliers: which transformer(s) did you apply and why?
- Discretization: which transformer(s) did you apply and why?
- Variable transformation: which transformer(s) did you apply and why?
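The five steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses scikit-learn's own transformers as stand-ins, a made-up toy data frame, and a small hypothetical helper class `IQRCapper` for outlier capping. Feature-engine offers counterparts with the same fit/transform interface (for example `MeanMedianImputer`, `OneHotEncoder`, `Winsorizer`, `EqualFrequencyDiscretiser`, `YeoJohnsonTransformer`), which you could swap in for the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder, PowerTransformer

# Toy data, made up for illustration only.
toy = pd.DataFrame({
    "area": [50.0, 60.0, np.nan, 80.0, 90.0, 1000.0],
    "age": [5.0, 10.0, 15.0, 20.0, 25.0, 30.0],
    "city": ["A", "B", "A", "C", "B", "A"],
})

# 1. Missing data imputation: the median is less affected by the
#    extreme value in "area" than the mean would be.
imputer = SimpleImputer(strategy="median")
area_imputed = imputer.fit_transform(toy[["area"]])

# 2. Categorical encoding: one-hot encoding avoids implying an order
#    between nominal categories such as city names.
encoder = OneHotEncoder(handle_unknown="ignore")
city_encoded = encoder.fit_transform(toy[["city"]]).toarray()


# 3. Outliers: hypothetical helper that caps values outside
#    [Q1 - fold*IQR, Q3 + fold*IQR], with limits learned in fit().
class IQRCapper(BaseEstimator, TransformerMixin):
    def __init__(self, fold=1.5):
        self.fold = fold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1, q3 = np.nanpercentile(X, [25, 75], axis=0)
        iqr = q3 - q1
        self.lower_ = q1 - self.fold * iqr
        self.upper_ = q3 + self.fold * iqr
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


area_capped = IQRCapper().fit_transform(area_imputed)

# 4. Discretization: quantile bins put roughly equal numbers of rows in each bin.
discretizer = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
age_binned = discretizer.fit_transform(toy[["age"]])

# 5. Variable transformation: Yeo-Johnson reduces skew and also
#    works with zero or negative values, unlike a plain log.
power = PowerTransformer(method="yeo-johnson")
area_transformed = power.fit_transform(area_capped)

print(area_imputed.ravel())
print(city_encoded)
print(area_capped.ravel())
print(age_binned.ravel())
print(area_transformed.ravel())
```

Each transformer learns its parameters (median, categories, capping limits, bin edges, power parameter) in `fit`, so in the real workflow they should be fitted on the training data only.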
Split your dataset into training and testing sets.
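A short sketch of the split, again assuming the bundled diabetes dataset as a placeholder. The 80/20 ratio and `random_state=42` are arbitrary choices, not requirements of the assignment.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Assumption: placeholder dataset; X holds the predictors, y the target.
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```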
Import the Pipeline class from the sklearn.pipeline module.
Make a pipeline object and pass it all the transformers you created in step 5 along with a regression model.
Fit the pipeline on the training dataset.
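One way the pipeline steps might look, as a sketch rather than a definitive solution. It assumes the bundled diabetes dataset, scikit-learn transformers in place of the Feature-engine ones, and `LinearRegression` as the regression model. The step names (`"age_bins"`, `"numeric"`, `"preprocess"`, `"model"`) and the choice of which columns get which treatment are illustrative.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_diabetes
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, PowerTransformer

# Assumption: placeholder dataset and an arbitrary 80/20 split.
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Columns that get imputation and a power transformation.
numeric_cols = [c for c in X.columns if c not in ("age", "sex")]

preprocess = ColumnTransformer(
    transformers=[
        # Discretize "age" into 5 quantile bins.
        ("age_bins",
         KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile"),
         ["age"]),
        # Impute, then reduce skew, for the remaining numeric columns.
        ("numeric",
         Pipeline([
             ("impute", SimpleImputer(strategy="median")),
             ("power", PowerTransformer(method="yeo-johnson")),
         ]),
         numeric_cols),
    ],
    remainder="passthrough",  # "sex" is passed through unchanged
)

# Chain the preprocessing and the regression model.
pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", LinearRegression()),
])

# Fit on the training data only, then predict on the held-out rows.
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
print("Test R^2:", pipe.score(X_test, y_test))
```

Because every transformer lives inside the pipeline, calling `fit` learns the preprocessing parameters from the training rows only, which avoids leaking information from the test set.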
Make predictions and evaluate the performance of your model using cross-validation. Report the RMSE and R² values and explain the results.
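A sketch of the evaluation step. It assumes 5-fold cross-validation, the placeholder diabetes dataset, and a deliberately simple pipeline (`StandardScaler` plus `LinearRegression`) so the block stays short; in practice you would pass your full pipeline to `cross_validate` instead.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: placeholder dataset and a simplified pipeline.
X, y = load_diabetes(return_X_y=True, as_frame=True)
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])

# 5 shuffled folds; the pipeline is refitted from scratch on each training fold.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(
    pipe, X, y, cv=cv,
    scoring=("neg_root_mean_squared_error", "r2"),
)

# scikit-learn reports RMSE negated so that higher is always better; flip the sign.
rmse = -scores["test_neg_root_mean_squared_error"]
r2 = scores["test_r2"]

print(f"RMSE per fold: {rmse.round(2)}")
print(f"RMSE mean +/- std: {rmse.mean():.2f} +/- {rmse.std():.2f}")
print(f"R2 mean +/- std:   {r2.mean():.3f} +/- {r2.std():.3f}")
```

When explaining the results, note that RMSE is in the same units as the target, so it reads as a typical prediction error, while R² is the share of the target's variance that the model explains. A large spread across folds suggests the estimate depends heavily on which rows end up in each fold.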
Could you please include the code needed for this as well?