This needs to be written in R, and the data set is referred to as SarahsData
Fantastic news! We've Found the answer you've been seeking!
Question:
This needs to be written in R, and the data set is referred to as "SarahsData"
Can someone please give me some examples on how to draft some code for this?
Transcribed Image Text:
1. Linear Regression Greg has reached out to you for help in creating the model to predict diamond prices. For this purpose, please answer the following questions: (a) Load the data file titled "Diamonds Data.xls" into R. Print the structure of the dataset and explain the output. Hint: You can save the data as csv and use the read.csv and str commands. This can be done in 2 lines of code. (b) Through an exploratory analysis, identify the variables that appear to be good predictors of the price of a diamond. Present your top 3 findings from the data. (c) Create your training set with a random selection of 70% of the rows in the dataset and your testing set with the other 30%. Use seed value 123 for this randomization. Print the summary of the outcome variable in both train and test data. Are the two datasets similar in terms of the distribution of the outcome variable? Explain. Hint: You can use the sample command for the split. You will also need the set.seed command. (d) Train a linear regression model on the training dataset. How many of the variables are significant? Hint: Use the Im and summary commands to for this part. (e) What is the R value of the model developed in part (e) above? Hint: Use the summary command to for this part. (f) What is the MAPE of the model on the training data? Hint: Check fitted. values in the model which already contains predictions for the training dataset. (g) Which diamond is the most over-priced according to your model? Hint: Use the difference between predicted and actual values to calculate the magnitude of over- prediction. (h) Generate predictions on the test dataset. What is the MAPE on the test set? Hint: Use the predict function to generate predictions on the test set. (i) Assuming that Greg is willing to spend $12000 on the diamond and that the most important variable for Greg is the carats of the diamond, which diamond would you recommend he purchase from the training dataset? Explain your choice. 1. Linear Regression Greg has reached out to you for help in creating the model to predict diamond prices. For this purpose, please answer the following questions: (a) Load the data file titled "Diamonds Data.xls" into R. Print the structure of the dataset and explain the output. Hint: You can save the data as csv and use the read.csv and str commands. This can be done in 2 lines of code. (b) Through an exploratory analysis, identify the variables that appear to be good predictors of the price of a diamond. Present your top 3 findings from the data. (c) Create your training set with a random selection of 70% of the rows in the dataset and your testing set with the other 30%. Use seed value 123 for this randomization. Print the summary of the outcome variable in both train and test data. Are the two datasets similar in terms of the distribution of the outcome variable? Explain. Hint: You can use the sample command for the split. You will also need the set.seed command. (d) Train a linear regression model on the training dataset. How many of the variables are significant? Hint: Use the Im and summary commands to for this part. (e) What is the R value of the model developed in part (e) above? Hint: Use the summary command to for this part. (f) What is the MAPE of the model on the training data? Hint: Check fitted. values in the model which already contains predictions for the training dataset. (g) Which diamond is the most over-priced according to your model? Hint: Use the difference between predicted and actual values to calculate the magnitude of over- prediction. (h) Generate predictions on the test dataset. What is the MAPE on the test set? Hint: Use the predict function to generate predictions on the test set. (i) Assuming that Greg is willing to spend $12000 on the diamond and that the most important variable for Greg is the carats of the diamond, which diamond would you recommend he purchase from the training dataset? Explain your choice.
Expert Answer:
Related Book For
Business Analytics Communicating With Numbers
ISBN: 9781260785005
1st Edition
Authors: Sanjiv Jaggia, Alison Kelly, Kevin Lertwachara, Leida Chen
Posted Date:
Students also viewed these programming questions
-
: (i) What data structures are maintained by the page manager. (ii) What happens when a machine performs a read operation to a page. (iii) What happens when a machine performs a write operation to a...
-
llustrate different ways of connecting these components together to span a range of performance requirements. [10 marks] For each of the performance categories that you identify state today's typical...
-
A car is initially travelling at 3 0 m / s , which is above the speed limit . The driver sees a speed limit trap ahead and applies the brakes for 5 seconds, causing the car to slow down by 2 m / s...
-
Raber Corporation has found that 60% of its sales in any given month are credit sales, while the remainder are cash sales. Of the credit sales, the company has experienced the following collection...
-
Why are stock index futures and options sometimes referred to as derivative products? Why do some investors believe derivative products make the markets more volatile?
-
Your company has just announced a 7 percent price increase on your entire product line and you are meeting with your most important customer. She announces that your competitor has already been to...
-
Santo Corporation experienced a fire on December 31, 2014 in which its financial records were partially destroyed. It has been able to salvage some of the records and has ascertained the following...
-
What role does strategic communication play in crisis mitigation and reputation management, leveraging narrative frameworks to maintain stakeholder trust and confidence?
-
A medical study was conducted to study the relationship between infants systolic blood pressure and two explanatory variables, weight (kgm) and age ( days). The data for 25 infants are shown here. a....
-
A meeting has been arranged with Pietro Yon and other members of the committee and you have been asked to produce a paper for discussion which provides: An interpretation of the financial statements...
-
Is money the only extrinsic incentive? Describe and explain some examples of other extrinsic motivations and incentives.
-
If using heuristics leads to behavioural bias, is this rational or not? Why? Why not?
-
What barriers to career advancement do women and minorities face?
-
What is meant by the term born global?
-
What do expected utility theory, prospect theory and regret theory have in common? How do they differ?
-
Mar 5 Purchased $4,400 of merchandise inventory on account, 310, nEOM and FOB shipping point Mar 9 Returned $700 of defective merchandise purchased on March 5 Mar 11 Paid the freight bill of $250 on...
-
Representative data read from a plot that appeared in the paper Effect of Cattle Treading on Erosion from Hill Pasture: Modeling Concepts and Analysis of Rainfall Simulator Data (Australian Journal...
-
The consumption function is one of the key relationships in economics, where consumption is a function of disposable income. The accompanying table shows a portion of quarterly data for average U.S....
-
The accompanying data file contains 20 observations on the binary response variable y along with the predictor variables x 1 and x 2 . a. Estimate and interpret the linear probability model and the...
-
A manager of a local package delivery store believes that too many of the packages were damaged or lost. She extracts a sample of 75 packages with the following variables: the package number...
-
Which of Yellows statements regarding the trade implementation of non-equity investments is correct? A. Only Statement 4 B. Only Statement 5 C. Both Statement 4 and Statement 5 Robert Harding is a...
-
Based on Exhibit 1, the execution cost for purchasing the 90,000 shares of BYYP is: A. \($60\),000. B. \($82\),500. C. \($127\),500. Robert Harding is a portfolio manager at ValleyRise, a hedge fund...
-
What type of algorithm should be used to purchase the XYZ shares given Hardings priority in building the XYZ position and his belief about potential price movements? A. Scheduled algorithm B. Arrival...
Study smarter with the SolutionInn App