
This needs to be written in R, and the data set is referred to as "SarahsData".
Can someone please give me some examples on how to draft some code for this?
1. Linear Regression
Greg has reached out to you for help in creating the model to predict diamond prices. For this purpose, please answer the following questions:
(a) Load the data file titled "Diamonds Data.xls" into R. Print the structure of the dataset and explain the output. Hint: You can save the data as CSV and use the read.csv and str commands. This can be done in 2 lines of code.
(b) Through an exploratory analysis, identify the variables that appear to be good predictors of the price of a diamond. Present your top 3 findings from the data.
(c) Create your training set with a random selection of 70% of the rows in the dataset and your testing set with the other 30%. Use seed value 123 for this randomization. Print the summary of the outcome variable in both the train and test data. Are the two datasets similar in terms of the distribution of the outcome variable? Explain. Hint: You can use the sample command for the split. You will also need the set.seed command.
(d) Train a linear regression model on the training dataset. How many of the variables are significant? Hint: Use the lm and summary commands for this part.
(e) What is the R-squared (R²) value of the model developed in part (d) above? Hint: Use the summary command for this part.
(f) What is the MAPE of the model on the training data? Hint: Check fitted.values in the model, which already contains predictions for the training dataset.
(g) Which diamond is the most over-priced according to your model? Hint: Use the difference between predicted and actual values to calculate the magnitude of over-prediction.
(h) Generate predictions on the test dataset. What is the MAPE on the test set? Hint: Use the predict function to generate predictions on the test set.
(i) Assuming that Greg is willing to spend $12,000 on the diamond and that the most important variable for Greg is the carats of the diamond, which diamond would you recommend he purchase from the training dataset? Explain your choice.
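Here is one possible way to draft parts (a) through (i) in base R. It is a minimal sketch under stated assumptions, not a reference solution. Because "Diamonds Data.xls" is not available here, the code simulates a small stand-in data set with hypothetical columns `ID`, `Carats`, `Cut`, `Clarity` and `Price`, writes it to a temporary CSV, and reads it back into `SarahsData`. The helper `mape` and the reading of "over-priced" in part (g) (actual price above the model's prediction) are also assumptions. Rename the columns to match the real file.

```r
# Minimal sketch for parts (a)-(i), base R only.
# ASSUMPTION: "Diamonds Data.xls" is not available here, so a simulated data set
# with hypothetical columns (ID, Carats, Cut, Clarity, Price) stands in for it.
# With the real file, replace the simulation and temp-file steps with
#   SarahsData <- read.csv("Diamonds Data.csv", stringsAsFactors = TRUE)
set.seed(42)
n <- 200
sim <- data.frame(
  ID      = 1:n,
  Carats  = round(runif(n, 0.3, 2.5), 2),
  Cut     = sample(c("Fair", "Good", "Ideal"), n, replace = TRUE),
  Clarity = sample(c("SI1", "VS1", "VVS1"), n, replace = TRUE)
)
sim$Price <- round(500 + 6000 * sim$Carats +
                   ifelse(sim$Cut == "Ideal", 800, 0) + rnorm(n, sd = 400))
csv_path <- tempfile(fileext = ".csv")
write.csv(sim, csv_path, row.names = FALSE)

# (a) Load the CSV and print its structure: the number of observations and
#     variables, then one line per column with its type and first few values.
SarahsData <- read.csv(csv_path, stringsAsFactors = TRUE)
str(SarahsData)

# (b) Quick exploratory checks: correlation of price with carats and the mean
#     price for each level of the categorical variables.
print(cor(SarahsData$Carats, SarahsData$Price))
print(aggregate(Price ~ Cut, data = SarahsData, FUN = mean))
print(aggregate(Price ~ Clarity, data = SarahsData, FUN = mean))

# (c) 70/30 random split with seed 123, then compare the outcome distributions.
set.seed(123)
train_idx <- sample(nrow(SarahsData), size = round(0.7 * nrow(SarahsData)))
train <- SarahsData[train_idx, ]
test  <- SarahsData[-train_idx, ]
print(summary(train$Price))
print(summary(test$Price))

# (d) Linear regression on every column except ID; count the coefficients
#     (intercept excluded) with p < 0.05.
model <- lm(Price ~ . - ID, data = train)
print(summary(model))
p_values <- summary(model)$coefficients[-1, "Pr(>|t|)"]
n_significant <- sum(p_values < 0.05)
cat("Significant coefficients:", n_significant, "\n")

# (e) R-squared of the model.
r_squared <- summary(model)$r.squared
cat("R-squared:", r_squared, "\n")

# (f) MAPE on the training data, in percent (hypothetical helper).
mape <- function(actual, predicted) {
  100 * mean(abs(actual - predicted) / abs(actual))
}
train_mape <- mape(train$Price, model$fitted.values)
cat("Train MAPE (%):", train_mape, "\n")

# (g) ASSUMPTION: "most over-priced" = largest amount by which the actual price
#     exceeds the prediction; flip the sign if your course uses predicted - actual.
gap <- train$Price - model$fitted.values
most_overpriced <- train[which.max(gap), ]
print(most_overpriced)

# (h) Predictions and MAPE on the test set.
test_pred <- predict(model, newdata = test)
test_mape <- mape(test$Price, test_pred)
cat("Test MAPE (%):", test_mape, "\n")

# (i) Within a $12,000 budget, pick the training-set diamond with the most carats.
affordable <- train[train$Price <= 12000, ]
pick <- affordable[which.max(affordable$Carats), ]
print(pick)
```

The printed numbers come from simulated values and say nothing about real diamond prices, so rerun everything on the actual file. For part (i), choosing the largest affordable stone is only the simplest reading of Greg's priorities. One possible refinement is to prefer, among stones of similar size, one whose actual price sits below the model's prediction.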
