Question 2: Load the ISLR library into your R environment. Within this library is the data...
Question:
Question 2: Load the ISLR library into your R environment. Within this library is the data set you need for this assignment. Load the "Carseats" data into an object called "carseats" and check to ensure that the data loaded correctly (you should have 400 rows and 11 columns). Management has asked you to predict high sales volume, but the sales variable is the number of units sold (in thousands) in the last year. Create a new variable called "High Vol" that has the classes "yes" and "no" to indicate whether the location sold 10,000 units or more in the past year. How many stores produced a high volume?

Question 3: Load the rpart library into your R environment (rpart contains the tree function necessary for fitting CART models). Partition the data set into training (60%) and testing (40%) sets. Build a single classification tree with the training data and High Vol as the target. (Hint: Be sure to exclude the Sales variable from the model since it was used to create our outcome variable.)
1. Which variable(s) were used in the tree model?
2. How would you use the model to predict whether or not a store would produce a high volume?
3. What is the accuracy of the model when using the training and test data? Use the function confusionMatrix to create a misclassification table to include with your answer.
4. Consider the following store: ShelveLoc = Good, Price = 115, no local advertising budget, and local income of $46,000. Based on the classification model, would a store with those features be predicted to be a high-performing store? Explain your answer.

Question 4: Pruning a tree is important to ensure that the model has not overfit the data. Following the example provided in the book, prune the model created in Question 3 to minimize the cross-validation error. How did the tree change? How many levels does the pruned tree include? What are the 3 most important variables and their relative importance according to the pruned tree model?

Question 5: What is the accuracy of the pruned tree model when using the training and test data? Use the function confusionMatrix to create a misclassification table to include with your answer.
Expert Answer:
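No worked answer accompanies the question, so what follows is only a minimal sketch of one way to set up Questions 2 to 5 in R, not a graded solution. It assumes the ISLR, rpart, and caret packages are installed (caret supplies confusionMatrix, and some caret versions also require e1071). The column name HighVol (without the space), the seed 123, the use of sample() for the 60/40 split, and pruning at the lowest cross-validated error in rpart's cp table are all assumptions made for illustration; the book example the assignment refers to may do this differently. Results depend on the random split, so no outputs are shown.

```r
# Packages named in the assignment; confusionMatrix() comes from caret.
library(ISLR)    # provides the Carseats data set
library(rpart)   # provides rpart(), printcp(), prune()
library(caret)   # provides confusionMatrix()

# Question 2: load the data and check its shape.
carseats <- Carseats
stopifnot(nrow(carseats) == 400, ncol(carseats) == 11)

# Sales is recorded in thousands of units, so 10,000 units means Sales >= 10.
# "HighVol" is an assumed spelling of the assignment's "High Vol".
carseats$HighVol <- factor(ifelse(carseats$Sales >= 10, "yes", "no"),
                           levels = c("no", "yes"))
table(carseats$HighVol)  # the "yes" count is the number of high-volume stores

# Question 3: 60/40 split. The seed is arbitrary (assumption) and only
# makes the split reproducible.
set.seed(123)
train_idx <- sample(seq_len(nrow(carseats)),
                    size = round(0.6 * nrow(carseats)))
train <- carseats[train_idx, ]
test  <- carseats[-train_idx, ]

# Classification tree on every predictor except Sales.
tree_fit <- rpart(HighVol ~ . - Sales, data = train, method = "class")
print(tree_fit)  # the printed splits show which variables the tree used

# Helper (hypothetical name): predict classes for a data set and build
# the misclassification table, which also reports accuracy.
evaluate <- function(fit, data) {
  pred <- predict(fit, newdata = data, type = "class")
  confusionMatrix(pred, data$HighVol, positive = "yes")
}
evaluate(tree_fit, train)
evaluate(tree_fit, test)

# Question 4: prune at the complexity parameter with the lowest
# cross-validated error (the "xerror" column of the cp table).
printcp(tree_fit)
cp_table <- tree_fit$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned_fit <- prune(tree_fit, cp = best_cp)
print(pruned_fit)
pruned_fit$variable.importance  # NULL if the tree is pruned back to the root

# Question 5: accuracy of the pruned tree on both sets.
evaluate(pruned_fit, train)
evaluate(pruned_fit, test)
```

The store in Question 3, part 4 can then be classified by tracing its values through the splits printed by print(tree_fit), or by passing a one-row data frame with the same predictor columns to predict().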