Data Mining and Predictive Analytics, 2nd Edition, by Daniel T. Larose and Chantal D. Larose: Solutions
Based on the following information, how many components should be extracted, using (a) the eigenvalue criterion and (b) the proportion of variance explained criterion?
Initial Eigenvalues (Components 1-11; only part of the Total column is reproduced): 1.088, 1.056, 1.040, 1.023, 1.000, 0.989, 0.972. The % of Variance and Cumulative % columns are not shown.
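A minimal sketch, assuming the eigenvalue (Total) column were available as a NumPy array, of how the two criteria are typically applied; the 85% cutoff in (b) is an assumption, not taken from the exercise:

```python
import numpy as np

# Hypothetical: only the eigenvalues reproduced above are listed here,
# so the counts below are illustrative, not the exercise's answers.
eigenvalues = np.array([1.088, 1.056, 1.040, 1.023, 1.000, 0.989, 0.972])
n_vars = 11  # the table lists components 1 through 11

# (a) Eigenvalue criterion: retain components whose eigenvalue is at least 1.
n_by_eigenvalue = int(np.sum(eigenvalues >= 1.0))

# (b) Proportion of variance explained: for standardized data the total
#     variance equals the number of variables, so the cumulative proportion
#     is the running sum of eigenvalues divided by n_vars.
cumulative = np.cumsum(eigenvalues) / n_vars
n_by_proportion = int(np.argmax(cumulative >= 0.85)) + 1 if np.any(cumulative >= 0.85) else None

print(n_by_eigenvalue, n_by_proportion)
```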
Suppose that we go ahead and perform the PCA, in this case using seven components. Considering the following information, which variable or variables might we be well advised to omit from the PCA, and why? If we really need all these variables in the analysis, then what should we do?
Communalities (table values not reproduced)
Based on the following information, does there exist an adequate amount of correlation among the predictors to pursue PCA? Explain how you know this, and how we may be getting mixed signals.
KMO and Bartlett's Test: Kaiser-Meyer-Olkin measure of sampling adequacy = 0.512; Bartlett's test of sphericity: approx. chi-square = 34.908, df = 55, sig. = 0.984.
What is a user-defined composite, and what is the benefit of using it in place of individual variables?
Explain why we perform factor rotation. Describe three different methods for factor rotation.
Describe two tests for determining whether there exists sufficient correlation within a data set for factor analysis to proceed. Which results from these tests would allow us to proceed?
Explain the difference between PCA and factor analysis. What is a drawback of factor analysis?
Explain the concept of communality, so that someone new to the field could understand it.
Describe the four criteria for choosing how many components to extract. Explain the rationale for each.
What is special about the first principal component, in terms of variability?
For what type of data are the covariance and correlation matrices identical? In this case, what is Σ?
Determine whether the following statements are true or false. If false, explain why the statement is false, and how one could alter the statement to make it true.
a. Positive correlation indicates that, as one variable increases, the other variable increases as well.
b. Changing the scale of
Refer to the hypothesis test in the previous exercise. Suppose we now set α = 0.01.
a. What would our conclusion now be? Interpret this conclusion.
b. Note that the conclusion has been reversed simply because we have changed the value of α. But have the data changed? No, simply our level of
A sample of 100 donors to a charity has a mean donation amount of $55 with a sample standard deviation of $25. Test using α = 0.05 whether the population mean donation amount exceeds $50.
a. Provide the hypotheses. State the meaning of μ.
b. What is the rejection rule?
c. What is the meaning of
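A hedged sketch of the test statistic using the numbers stated in the exercise, assuming a one-sided z-approximation (a t statistic with 99 degrees of freedom gives essentially the same conclusion):

\[
z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{55 - 50}{25/\sqrt{100}} = \frac{5}{2.5} = 2.0,
\]

which exceeds the one-sided critical value \(z_{0.05} \approx 1.645\), so \(H_0\colon \mu \le 50\) would be rejected in favour of \(H_a\colon \mu > 50\) at \(\alpha = 0.05\).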
Refer to the previous exercise. Describe the relationship between margin of error and confidence level.
Refer to the previous exercise. Describe the relationship between margin of error and sample size.
For each of the confidence intervals in the previous exercise, calculate and interpret the margin of error.
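For reference in the three margin-of-error exercises above, one common form of the margin of error for a confidence interval for the mean, assuming a z-based interval (a t-based interval replaces \(z_{\alpha/2}\) with \(t_{\alpha/2,\,n-1}\)):

\[
E = z_{\alpha/2}\,\frac{s}{\sqrt{n}},
\]

so the margin of error grows as the confidence level increases (larger \(z_{\alpha/2}\)) and shrinks in proportion to \(1/\sqrt{n}\) as the sample size increases.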
Extract a sample interesting decision rule from the original CART model. Comment on the interpretability of the results from the bagging and boosting models.
Apply each of (i), (ii), and (iii) to the test data set. Produce the contingency tables for each model. Compare the error rates for the bagging and boosting models against that of the original CART model.
Develop three models using the training data set: (i) an original CART model for predicting risk, (ii) a bagging model, where five base models are sampled with replacement from the training set, and (iii) a boosting model, where five iterations of the boosting algorithm are applied.
Partition the data set into training and test data sets.
Change the fifth bootstrap sample in Table 25.6 to the following:
x: 0.5, 0.5, 1
y: 0, 0, 1
Verify that the ensemble classifier correctly predicts the three values of x.
Find the proportion of 1’s, and make the majority prediction for each value of x, similarly to that in Table 25.4.
Provide a table of the predictions for each base classifier, similarly to those found in Table 25.4.
Construct the base classifier for each bootstrap sample, analogous to those found in Table 25.3.
True or false: Unlike bagging, boosting does not suffer from a loss of interpretability of the results.
The boosting algorithm uses a weighted average of a series of classifiers. On what do the weights in this weighted average depend?
Does the boosting algorithm use bootstrap samples?
Explain what we mean when we say that the boosting algorithm is adaptive.
State the three steps of the boosting algorithm.
What is a downside of using bagging?
How does bagging contribute to a reduction in the prediction error?
State the three steps of the bagging algorithm.
What is a bootstrap sample?
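A minimal sketch of drawing a bootstrap sample, assuming a hypothetical ten-record data set:

```python
import numpy as np

rng = np.random.default_rng(0)   # assumed seed, for reproducibility
data = np.arange(10)             # hypothetical 10-record data set

# A bootstrap sample draws n records with replacement from an n-record
# data set, so some records appear more than once and others not at all.
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)
```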
What can happen if we apply bagging to stable models? Why might this happen?
Which classification algorithms are considered unstable? Which are considered stable?
What does it mean for a classification algorithm to be unstable?
True or false: bagging can reduce the variance of classifier models, while boosting can reduce both bias and variance.
Explain what is meant by the following terms: bias, variance, and noise.
What is the equation for the decomposition of the prediction error?
Demonstrate that an ensemble of five independent binary classifiers, each with a base error rate of 0.6, has a higher error rate than 0.6.
Recall the example at the beginning of the chapter, where we show that an ensemble of five independent binary classifiers has a lower error rate than the base error rate of 0.20. Demonstrate that an ensemble of three independent binary classifiers, each of which has a base error rate of 0.10, has a lower error rate than 0.10.
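A short sketch of the calculation behind these two exercises: a majority vote of independent classifiers errs when more than half of the base classifiers err, which is a binomial tail probability.

```python
from math import comb

def ensemble_error(n_classifiers: int, base_error: float) -> float:
    """P(a majority of independent base classifiers are wrong)."""
    k_min = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * base_error**k * (1 - base_error) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )

print(ensemble_error(5, 0.20))  # ~0.058, below the 0.20 base error rate
print(ensemble_error(3, 0.10))  # 0.028, below the 0.10 base error rate
print(ensemble_error(5, 0.60))  # ~0.683, above the 0.60 base error rate
```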
Describe two benefits of using an ensemble of classification models.
Contrast the regression models generated for the two types of wines. Discuss any substantive differences.
Compare the standard deviations of the errors, and the mean absolute errors, for the Global Model versus the combined results (weighted averages) from the Red Wines Model and the White Wines Model.
Evaluate the White Wines Model using the white wines from the test data set. Calculate the standard deviation of the errors, and the mean absolute error.
Evaluate the Red Wines Model using the red wines from the test data set. Calculate the standard deviation of the errors, and the mean absolute error.
Train a regression model to predict Quality using the white wines in the training set. This is the White Wines Model.
Train a regression model to predict Quality using the red wines in the training set. This is the Red Wines Model.
Segment the training data set into red wines and white wines. Do the same for the test data set.
Evaluate the Global Model using the entire test data set, by applying the model generated on the training set to the records in the test set. Calculate the standard deviation of the errors (actual values−predicted values), and the mean absolute error. (For IBM/SPSS Modeler you can use the
Train a regression model to predict Quality using the entire training data set. This is our Global Model.
Perform Z-standardization. Partition the data set into a training set and a test set.
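A minimal sketch of this step, assuming the wine data live in a file named wine_quality.csv with a numeric Quality column (both names are assumptions) and an assumed 75/25 split:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("wine_quality.csv")          # hypothetical file name

# Z-standardize each numeric predictor: z = (x - mean) / standard deviation.
num_cols = df.select_dtypes("number").columns.drop("Quality")
df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()

# Partition into training and test sets; the 75/25 split is an assumption.
train, test = train_test_split(df, test_size=0.25, random_state=1)
```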
What would you say to a marketing manager who wished to use only one global model across his entire clientele, rather than trying segmentation models?
Explain the segmentation modeling process.
Name two methods for identifying useful segments.
Give a thumbnail explanation of segmentation modeling.
Compare the results from the a priori algorithm with those of the GRI algorithm. Which algorithm yields a richer set of rules, and why? Which algorithm is probably preferable for this particular data set? Why?
Apply the GRI algorithm to uncover association rules for predicting either churn or non-churn behavior. Specify reasonable lower bounds for support and confidence.
Compare the results from Exercise 13 with the results from the EDA and decision tree analysis in Chapters 3 and 6. Discuss similarities and differences. Which analysis format do you prefer? Do you find a confluence of results?
Set the minimum antecedent support to 1%, the minimum rule confidence to 5%, and the maximum number of antecedents to 1.
a. Generate rules using confidence ratio as your evaluation measure, with evaluation measure lower bound = 40. Explain what this evaluation measure means.
b. Select the rule
Set the minimum antecedent support to 1%, the minimum rule confidence to 5%, and the maximum number of antecedents to 1.
a. Generate rules using confidence difference as your evaluation measure, with evaluation measure lower bound = 40. Explain what this evaluation measure means.
b. For the rules that
Set the minimum antecedent support to 1%, the minimum rule confidence to 5%, and the maximum number of antecedents to 1. Use rule confidence as your evaluation measure.
a. Find the association rule with the greatest lift.
b. Report the following for the rule in (a): (i) number of instances; (ii)
Find the value of the J-measure for the sixth rule from Figure 23.5.
For each of the association rules found above by the a priori algorithm, find the J-measure. Then order the rules by J-measure. Compare the ordering with that from the a priori support × confidence ordering.
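For reference in the two J-measure exercises above, one common formulation of the J-measure for a rule A ⇒ B (the text's notation, and its choice of logarithm base, may differ slightly):

\[
J(A \Rightarrow B) = p(A)\left[\, p(B \mid A)\,\log\frac{p(B \mid A)}{p(B)} + \bigl(1 - p(B \mid A)\bigr)\,\log\frac{1 - p(B \mid A)}{1 - p(B)} \,\right]
\]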
Verify your manually found results using association rule software.
Multiply the observed support times the confidence for each of the rules in Exercises 7 and 8, and rank them in a table.
Using 75% minimum confidence and 20% minimum support, generate one-antecedent association rules for predicting play.
Let φ = 3. Generate the frequent 3-itemsets.
Restate the a priori property in your own words. For the following several exercises, consider the following data set from Quinlan [5], shown as Table 23.8. The goal is to develop association rules using the a priori algorithm for trying to predict when a certain (evidently indoor) game may be played.
Describe support and confidence. Express the formula for confidence using support.
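As a reminder of the standard definitions for a rule A ⇒ B over a set of transactions:

\[
\text{support}(A \Rightarrow B) = \frac{\#\{\text{transactions containing both } A \text{ and } B\}}{\#\{\text{all transactions}\}},
\qquad
\text{confidence}(A \Rightarrow B) = \frac{\text{support}(A \Rightarrow B)}{\text{support}(A)} .
\]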
Describe the two main methods of representing market basket data. What are the benefits and drawbacks of each?
Summarize your salient EDA findings from the above exercises, just as if you were writing a report.
Refer to the previous exercise. Apply the other two binning methods (equal width, and equal number of records) to this same variable. Compare the results and discuss the differences. Which method do you prefer?
Report on whether anomalous fields exist in this data set, based on your EDA, which fields these are, and what we should do about it.
Why do we need to perform EDA? Why shouldn't we simply proceed directly to the modeling phase and start applying our high-powered data mining software?
Apply the following inverse ln transformation to obtain the original value: original value = e^(ln value) = exp(ln value).
Apply the following inverse Z transformation to obtain the original value: original value = (z-value) · s + x̄. De-transforming a ln value:
Find the mean x̄ and standard deviation s used to perform the standardization.
Report the standard errors (for continuous values) or confidence levels (for categorical values) for your imputations in Exercise 14.
Impute all missing values in the data set. Explain the ordering that you are using.
Open the ClassifyRisk_Missing data set. Impute the missing value for marital status. Use the ClassifyRisk_Missing2 data set for Exercises 14–15.
Compare the standard errors for the imputations obtained in Exercises 9 and 11. Explain what you find.
Impute the sugars value of Quaker Oatmeal.
Impute the carbohydrates value of Quaker Oatmeal.
Impute the potassium content of Cream of Wheat.
Impute the potassium content of Almond Delight using multiple regression.
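A minimal sketch of regression-based imputation, assuming hypothetical column names in a cereals.csv file; the actual field names and predictor set in the data set may differ:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("cereals.csv")                 # hypothetical file name
predictors = ["sugars", "fiber", "carbo"]       # assumed predictor set

# Assumes the chosen predictors have no missing values in these rows.
known = df[df["potassium"].notna()]
to_fill = df[df["potassium"].isna()]

# Fit the multiple regression on complete records, then predict the
# missing potassium values from the same predictors.
model = LinearRegression().fit(known[predictors], known["potassium"])
df.loc[df["potassium"].isna(), "potassium"] = model.predict(to_fill[predictors])
```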
Repeat the procedure using the breast_cancer.arff data set with WEKA by selecting an attribute subset using Genetic Search. This time, however, specify naïve Bayes with use kernel estimator = true for both attribute selection and 10-fold cross-validation. Now, contrast the classification results
(Extra credit). Write a computer program for a simple genetic algorithm. Implement the example discussed in the text, using the Normal (16, 4) fitness function. Let the crossover rate be 0.6 and the mutation rate be 0.01. Start with the population of all integers 0–31. Generate 25 runs and
Calculate the child vectors for the whole arithmetic crossover example in the text. Use the parents indicated in the section on simple arithmetic crossover, with α = 0.5. Comment on your results.
Continue the example in the text, where the fitness is determined by the Normal (16, 4) distribution. Proceed to the end of the third iteration. Suppress mutation, and perform crossover only once, on the second iteration at locus four.
Compare the strengths and weaknesses of using backpropagation and genetic algorithms for optimization in neural networks.
Discuss why the selection operator should be careful to balance fitness with diversity. Describe the dangers of an overemphasis on each.
Match each of the following genetic algorithm terms with its definition or description.
a. Selection: One of the candidate solutions to the problem.
b. Generation: Scales the chromosome fitness by the standard deviation of the fitnesses, thereby maintaining selection pressure at a
Apply a misclassification cost of 5 (rather than the default of 1) for a false negative. Redo Exercises 23–29 using the new misclassification cost. Make sure to evaluate the models using the new misclassification cost rather than the measures mentioned in Exercise 28.
Evaluate all base classifiers, as well as the models defined by the candidate threshold values selected in the previous exercise, using overall error rate, sensitivity, specificity, proportion of false positives, and proportion of false negatives. Deploy the best performing model.
Scan the histogram from left to right, to identify candidate threshold values of the mean propensity for partitioning the test set into churners and non-churners. The goal is to select a set of candidate threshold values, each of which discriminates well between churners to its right and non-churners to its left.
Construct a normalized histogram of mean propensity, with an overlay of Churn. (See Figure 26.3 for an illustration.)
For each record in the test data set, calculate the propensity of that record toward a positive response for Churn, for each of the base classifiers. Compute the mean propensity for each record across all base classifiers.
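A minimal sketch of the mean-propensity step, with purely illustrative numbers; each column holds one base classifier's estimated P(Churn = True) for the test records:

```python
import numpy as np

# Rows = test records, columns = base classifiers (illustrative values only).
propensities = np.array([
    [0.80, 0.65, 0.90],
    [0.10, 0.20, 0.05],
    [0.55, 0.40, 0.60],
])

mean_propensity = propensities.mean(axis=1)   # one mean propensity per record
print(mean_propensity)
```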