Question

Each year, the American Academy of Motion Picture Arts and Sciences recognizes excellence in the film industry by honoring directors, actors, and writers with awards (called the Oscars) in different categories. The most notable of these awards is the Oscar for Best Picture.
The Data worksheet in the file Oscars contains data on a sample of movies nominated for the Best Picture Oscar. The variables include total number of Oscar nominations across all award categories, number of Golden Globe awards won (the Golden Globe award show precedes the Oscars), whether the movie is a comedy, and whether the movie won the Best Picture Oscar.
There is also a variable called ChronoPartition that specifies how to partition the data into training, validation, and test sets. The value t identifies observations that belong to the training set, the value v identifies observations that belong to the validation set, and the value s identifies observations that belong to the test set. Partition the data using XLMiner's
Standard Partition procedure by selecting Use partition variable in the Partitioning options area and specifying the variable ChronoPartition.
Construct a logistic regression model to classify winners of the Best Picture Oscar. Use Winner as the output variable and OscarNominations, GoldenGlobeWins, and Comedy as input variables. Perform an exhaustive-search best subset selection with the number of best subsets equal to 2.
a. From the generated set of logistic regression models, select one that you believe is a good fit. Express the model as a mathematical equation relating the output variable to the input variables. Do the relationships suggested by the model make sense? Try to explain the relationships.
b. Using the default cutoff value of 0.5 for your logistic regression model, what is the overall error rate on the validation data?
c. Note that each year there is only one winner of the Best Picture Oscar. Knowing this, what is wrong with classifying a movie as a winner or not using a cutoff value?
d. What is the best way to use the model to predict the annual winner? Out of the six years in the validation data, in how many years does the model correctly identify the winner?
e. Use your model to classify the 2011 nominees for Best Picture. In Step 3 of XLMiner's Logistic Regression procedure, check the box next to In worksheet in the Score new data area. In the Match variable in the new range dialog box, (1) specify NewData in the Worksheet: field, (2) enter the cell range A1:E9 in the Data range: field, and (3) click Match variable(s) with same name(s). When completing the procedure, this will result in a LR_NewScore worksheet that contains the predicted probability that each 2011 nominee will win the Best Picture Oscar. What film did the model believe was the most likely to win the 2011 Best Picture Oscar? Was the model correct?


$1.99
Sales1
Views47
Comments0
  • CreatedNovember 21, 2015
  • Files Included
Post your question
5000