Question: You will use the model created in HW-B, part (d) to answer the questions in (a) and (b) below. (a) [0.6 pts.] Print a summary
![You will use the model created in HW-B, part [d] to](https://dsd5zvtm8ll6.cloudfront.net/si.experts.images/questions/2024/10/6703fadf82b79_0796703fadf64607.jpg)
![answer the questions in [a] and [b] below. (a) [0.6 pts.] Print](https://dsd5zvtm8ll6.cloudfront.net/si.experts.images/questions/2024/10/6703fae026117_0796703fadfee461.jpg)


You will use the model created in HW-B, part (d) to answer the questions in (a) and (b) below.

(a) [0.6 pts.] Print a summary of the model and, based on that, answer the following questions: (i) Which of the parameters are statistically significant at an α of at least 0.05? (ii) Which of the parameters is the most important parameter for prediction? (iii) Which of the parameters is the least important parameter for prediction?

(b) [0.6 pts.] Use three digits of precision for this question (Hint: see the round() function). Count how many of the fitted values matched the PTS in the test dataset at a 95% confidence level by creating prediction intervals. To be considered a match, the response value of each observation in the test dataset should be in the prediction interval created by predict. Note that we have the ground truth in our test dataset, so we can count the matches as one measure of accuracy.

Coerce the object returned from the predict() function to be a data frame. To this data frame, add the response variable (PTS) from the held-out test dataset so that the data set has four columns: fit, lwr, upr, and PTS. Your data frame will look like the following (this is just an example; the contents of the data frame below may be different from yours).

| | fit | lwr | upr | PTS |
|---|---|---|---|---|
| 5 | 7.843 | 2.6?2 | 9.891 | 7 |
| 15 | 1?.891 | 1?.234 | 15.991 | 9 |

Print out the data frame. Write R code that will count whether the response (PTS) falls within the lwr and upr limits of the prediction interval. For the output above, the first response (7) falls within the prediction interval, while the second one (9) does not. Your code should do this computation and output an answer as follows:

Number of predictions that are in the prediction interval: XX

(c) [0.6 pts.] Consider the data frame you created in (b). Using the "fit" (prediction) and "PTS" columns, write R code to find out the RSS and RSE of your predictions. Recall that RSS and RSE are defined as shown below:

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - p - 1}}$$

(d) [0.6 pts.] Plot the residuals of your predictions. What can you say about the distribution of the residuals?

(e) [0.6 pts.] Plot a histogram of the residuals of your prediction. Does the histogram follow a Gaussian distribution?

You are provided the Titanic dataset (titanic-new.csv). This dataset contains 1,309 observations across 15 dimensions. The response variable is "survived" (0 = did not survive, 1 = survived). The rest of the dimensions are as follows:

| Variable | Description |
|---|---|
| pclass | Passenger class (1 = 1st class, 2 = 2nd class, 3 = 3rd class) |
| name | Name of passenger |
| sex | Gender of passenger (M, F) |
| age | Age of passenger |
| sibsp | Number of siblings or spouses aboard |
| parch | Number of parents/children aboard |
| ticket | Ticket number |
| fare | Passenger fare |
| cabin | Cabin number |
| embarked | Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton) |
| boat | Lifeboat number (if survived) |
| body | Body number (if did not survive and body was recovered) |
| home.dest | Passenger origin and destination |
| has_cabin_number | Whether passenger had a cabin (1 = Yes, 0 = No) |

Divide the dataset into 80% / 20%: 80% for training and 20% for testing. Use the seed (1122) before dividing the dataset. You will be creating a decision tree to predict who survives and who perishes.

(a) [0.5 pts.] From the table above, list out the predictors you will use for modeling.

(b) [0.5 pts.] In the entire dataset, how many people survived and how many perished? (Hint: use table().)

(c) [0.?5 pts.] Is the dataset class-balanced (i.e., has an equal class distribution in the dataset)? If the dataset is class-balanced, state so; if it is not, what problems do you foresee when you make predictions?

(d) [0.5 pts.] Create a rpart decision tree model to predict "survived" using the predictors chosen in (a).

(e) [0.5 pts.] What are the top 3 important variables from your model?

Plot the decision tree using the following code:

rpart.plot(model, extra=11, fallen.leaves=T, type=4, main="Titanic Survival Model")
print(model)

Using the plot and the output of "print(model)"
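For the first question, parts (b) and (c), the interval check and the RSS/RSE computation can be sketched as below. This is only a minimal sketch: the HW-B data and model are not shown in the question, so the data frame `dat`, the predictors `MIN` and `FGA`, and the simulated coefficients are hypothetical stand-ins. It also assumes that n in the RSE formula is the number of test observations and p the number of predictors in the model.

```r
# Hypothetical stand-in data: the HW-B dataset is not part of the question,
# so a small simulated data set with a PTS response is used instead.
set.seed(1122)
n_obs <- 200
dat <- data.frame(MIN = runif(n_obs, 5, 40), FGA = runif(n_obs, 1, 20))
dat$PTS <- 0.1 * dat$MIN + 1.1 * dat$FGA + rnorm(n_obs, sd = 2)

# 80/20 train/test split
train_idx <- sample(seq_len(n_obs), size = floor(0.8 * n_obs))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Stand-in for the model from HW-B, part (d)
model <- lm(PTS ~ MIN + FGA, data = train)

# predict() returns a matrix with columns fit, lwr, upr; coerce it to a data frame
pred <- as.data.frame(predict(model, newdata = test,
                              interval = "prediction", level = 0.95))
pred$PTS <- test$PTS          # add the held-out response
pred <- round(pred, 3)        # three digits of precision
print(head(pred))

# (b) count responses that fall inside [lwr, upr]
in_interval <- pred$PTS >= pred$lwr & pred$PTS <= pred$upr
cat("Number of predictions that are in the prediction interval:",
    sum(in_interval), "\n")

# (c) RSS and RSE from the fit and PTS columns
res <- pred$PTS - pred$fit
rss <- sum(res^2)
p   <- length(coef(model)) - 1            # predictors, excluding the intercept
rse <- sqrt(rss / (nrow(pred) - p - 1))   # assumes n = number of test rows
cat("RSS:", round(rss, 3), " RSE:", round(rse, 3), "\n")
```

With `res` in hand, parts (d) and (e) would follow from `plot(res)` and `hist(res)`. Roughly symmetric scatter around zero and a bell-shaped histogram would be consistent with approximately Gaussian residuals.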
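For the Titanic question, the split, the class counts and the rpart fit might look roughly like the following. Treat it as a sketch under stated assumptions: titanic-new.csv is not available here, so the data frame is simulated with the same column names, and the predictor set (`pclass`, `sex`, `age`, `sibsp`, `parch`, `fare`) is just one plausible answer to part (a), not the expected one.

```r
library(rpart)  # recommended package shipped with most R installations

# Hypothetical stand-in for titanic-new.csv; with the real file this would be
# something like: titanic <- read.csv("titanic-new.csv")
set.seed(1)
n_obs <- 1309
titanic <- data.frame(
  pclass = sample(1:3, n_obs, replace = TRUE),
  sex    = factor(sample(c("M", "F"), n_obs, replace = TRUE)),
  age    = round(runif(n_obs, 1, 80)),
  sibsp  = sample(0:3, n_obs, replace = TRUE),
  parch  = sample(0:2, n_obs, replace = TRUE),
  fare   = round(runif(n_obs, 5, 250), 2)
)
# Simulated response (an assumption, not the real survival pattern)
p_surv <- plogis(-0.5 + 2 * (titanic$sex == "F") - 0.6 * (titanic$pclass - 1))
titanic$survived <- factor(rbinom(n_obs, 1, p_surv))

# (b)/(c) class counts and proportions in the entire dataset
counts <- table(titanic$survived)
print(counts)
print(round(prop.table(counts), 3))

# 80/20 split, setting the seed first as the question asks
set.seed(1122)
train_idx <- sample(seq_len(n_obs), size = floor(0.8 * n_obs))
train <- titanic[train_idx, ]
test  <- titanic[-train_idx, ]

# (d) classification tree on the chosen predictors
model <- rpart(survived ~ pclass + sex + age + sibsp + parch + fare,
               data = train, method = "class")

# (e) variable importance is stored on the fitted object, sorted in decreasing order
print(head(model$variable.importance, 3))
print(model)

# Confusion matrix on the held-out 20%
pred <- predict(model, newdata = test, type = "class")
print(table(predicted = pred, actual = test$survived))
```

If the two classes turn out to be unequal in size, overall accuracy can look good while the minority class is predicted poorly, so the per-class rows of the confusion matrix are worth reading alongside it.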
