Business Analytics: Communicating with Numbers, 1st Edition, Sanjiv Jaggia, Alison Kelly, Kevin Lertwachara, Leida Chen - Solutions
Refer to the previous exercise for a description of the problem and data set. Build a default classification tree to predict whether an individual is likely to attend church. Display the default classification tree. a. How many leaf nodes are in the tree? What are the predictor variable and
The accompanying data set contains four predictor variables (x1 to x4) and one binary target variable (y). Follow the instructions below to create classification trees using the Exercise_10.10_Data worksheet. a. Use the rpart function to build a default classification tree. Display the default
Refer to the previous exercise for a description of the problem and data set. Build a default classification tree to predict whether the gamer will make in-app purchases. Display the classification tree. a. What are the predictor variable and the split value for the first split of the default
Refer to the previous exercise for a description of the problem and data set. Create a classification tree model for predicting whether the student will be able to graduate within four years (Grad). Assign 0 as the success class as we are more interested in identifying students who are at the risk
Monstermash, an online game app development company, wants to be able to predict which gamers are likely to make in-app purchases. Ranon Weatherby, the company’s data analyst, has compiled a data set about customers that contains the following variables: customer age (Age), sex (1 if male, 0
Refer to the previous exercise for a description of the data set. Create a regression tree model for predicting house prices (Price). Select the best-pruned tree for scoring and display the full-grown, best-pruned, and minimum error trees. a. Display the best-pruned tree. How many leaf nodes
The accompanying data set contains two predictor variables, x1 and x2, and one numerical target variable, y. A regression tree will be constructed using the data set. a. List the possible split values for x1 in ascending order. b. List the possible split values for x2 in ascending order. c. Compute
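Candidate split values for a numeric predictor are usually taken as the midpoints between consecutive distinct sorted values. The sketch below assumes that convention, which is standard in CART-style trees, so it is worth checking against the textbook's own definition. It uses hypothetical data because the exercise's data set is not reproduced here. The function name `candidate_splits` is illustrative.

```python
def candidate_splits(values):
    """Midpoints between consecutive distinct values, in ascending order.

    Assumption: split candidates are midpoints (common CART convention).
    """
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

if __name__ == "__main__":
    x1 = [4, 1, 7, 4, 9]           # hypothetical predictor values
    print(candidate_splits(x1))    # [2.5, 5.5, 8.0]
```

Duplicate values collapse to one entry, so a predictor with m distinct values yields m - 1 candidates.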
The accompanying data set contains three predictor variables, x1, x2, and x3, and one numerical target variable, y. A regression tree will be constructed using the data set. a. List the possible split values for x1 in ascending order. b. List the possible split values for x2 in ascending order. c.
Refer to the previous exercise for a description of the data set. Build a default regression tree to predict the customer’s spending during the first three months of the year (Spending). Display the regression tree. a. What are the rules that can be derived from the default regression
The accompanying data set contains two predictor variables, x1 and x2, and one numerical target variable, y. A regression tree will be constructed using the data set. a. Which split on x1 will generate the smallest MSE? b. Which split on x2 will generate the smallest MSE? c. Which variable and split
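To find the split with the smallest MSE, each candidate split can be scored as follows. Take the squared error of every observation around its own child node's mean, sum over both children, and divide by the total number of observations. This equals the size-weighted average of the two child MSEs. The sketch below assumes that definition and the midpoint convention for candidate splits. The names `split_mse` and `best_split`, and the data, are hypothetical.

```python
def split_mse(x, y, split):
    """MSE after splitting at x <= split: squared errors around each child's
    mean, summed over both children and divided by the total count."""
    left = [yi for xi, yi in zip(x, y) if xi <= split]
    right = [yi for xi, yi in zip(x, y) if xi > split]
    sse = 0.0
    for node in (left, right):
        if node:
            mean = sum(node) / len(node)
            sse += sum((v - mean) ** 2 for v in node)
    return sse / len(y)

def best_split(x, y):
    """Candidate midpoint with the smallest split MSE."""
    distinct = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return min(candidates, key=lambda s: split_mse(x, y, s))

if __name__ == "__main__":
    x = [1, 2, 3, 4]        # hypothetical predictor
    y = [10, 12, 30, 32]    # hypothetical target
    print(best_split(x, y), split_mse(x, y, 2.5))  # 2.5 1.0
```

Running `best_split` on each predictor and keeping the one with the lower MSE answers the "which variable and split" comparison.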
Refer to the previous exercise for a description of the data set. Build a default regression tree to predict the customer’s annual household spending on travel products (TravelSpend). Display the regression tree. a. How many leaf nodes are in the default tree? What are the predictor variable
Create a regression tree using the accompanying data set in the Exercise_10.28_Data worksheet (predictor variables: x1 to x5; target: y). Select the best-pruned tree for scoring and display the full-grown, best-pruned, and minimum error trees. a. What is the minimum validation MSE in the prune log? How
Create a regression tree using the accompanying data set (predictor variables: x1 to x4; target: y). Select the best-pruned tree for scoring and display the full-grown, best-pruned and minimum error trees. a. What is the minimum validation MSE in the prune log? How many decision nodes are
An online retail company is trying to predict customer spending in the first three months of the year. Brian Duffy, the marketing analyst of the company, has compiled a data set on 200 existing customers that includes sex (Female: 1 = Female, 0 otherwise), annual income in 1,000s (Income), age
Create a regression tree using the accompanying data set in the Exercise_10.30_Data worksheet (predictor variables: x1 to x4; target: y). a. Use the rpart function to build a default regression tree. Display the default regression tree. How many leaf nodes are in the default regression
Refer to the previous exercise for a description of the data set. Build a default regression tree to predict an NBA player’s salary (salary). Display the regression tree. a. What are the predictor variable and split value for the first split of the default regression tree? b. Build a
Create a regression tree using the accompanying data set (predictor variables: x1 to x4; target: y). Select the best-pruned tree for scoring and display the full-grown, best-pruned, and minimum error trees. a. What is the minimum validation MSE in the prune log? How many decision nodes are
Kyle Robson, an energy researcher for the U.S. Energy Information Administration, is trying to build a model for predicting annual electricity retail sales for states. Kyle has compiled a data set for the 50 states and the District of Columbia that contains average electricity retail price (Price
Create a regression tree using the accompanying data set (predictor variables: x1 to x4; target: y). a. Use the rpart function to build a default regression tree. Display the tree using the prp function. How many leaf nodes are in the default regression tree? b. What are the predictor variable and
Refer to the previous exercise for a description of the data set. Create a regression tree model for predicting per capita electricity retail sales (Sales). Select the best-pruned tree for scoring and display the full-grown, best-pruned, and minimum error trees. a. How many leaf nodes are in
Merrick Stevens is a sports analyst working for ACE Sports Management, a sports agency that represents over 200 athletes. He is interested in understanding the relationship between an NBA player’s salary and his physicality and performance statistics. Merrick has constructed a data set that
New Age Solar sells and installs solar panels for residential homes. The company’s sales representatives contact and pay a personal visit to potential customers to present the benefits of installing solar panels. This high-touch approach works well as the customers feel that they receive personal
Ben Derby is a highly paid scout for a professional baseball team. He attends at least five or six Major League Baseball games a week and watches as many recorded games as he can in order to evaluate potential players for his team. He also keeps detailed records about each prospective player. His
Create a bagging ensemble classification tree model using the accompanying data set (predictor variables: x1 to x4; target: y). a. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data? b. What is the AUC value of the model? c. Score a new
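The ensemble exercises ask for overall accuracy, sensitivity, and specificity on the validation data. Given actual and predicted classes, all three follow directly from the confusion-matrix counts. The sketch below is a minimal version with hypothetical labels. Its `positive` parameter (an illustrative name) stands in for the success class, since some exercises, such as the graduation one, assign 0 as the success class.

```python
def classification_metrics(actual, predicted, positive=1):
    """Accuracy, sensitivity, and specificity with `positive` as the success class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of actual successes identified
        "specificity": tn / (tn + fp),   # share of actual non-successes identified
    }

if __name__ == "__main__":
    actual = [1, 1, 1, 0, 0, 0, 0, 1]      # hypothetical validation labels
    predicted = [1, 0, 1, 0, 0, 0, 0, 1]   # hypothetical model output
    print(classification_metrics(actual, predicted))
    # {'accuracy': 0.875, 'sensitivity': 0.75, 'specificity': 1.0}
```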
Create a boosting ensemble classification tree model using the accompanying data set (predictor variables: x1 to x4; target: y). a. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data? b. What is the AUC value of the model? c. Score a
Create a random forest ensemble classification tree model using the accompanying data set (predictor variables: x1 to x4; target: y). Select two predictor variables randomly to construct each weak learner. a. What are the overall accuracy rate, sensitivity, and specificity of the model on the
Create a bagging ensemble classification tree model using the accompanying data set (predictor variables: x1 to x5; target: y). a. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data? b. What is the lift value of the leftmost bar of the
Create a boosting ensemble classification tree model using the accompanying data set (predictor variables: x1 to x5; target: y). a. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data? b. What is the lift value of the leftmost bar of the
Create a random forest ensemble classification tree model using the accompanying data set (predictor variables: x1 to x5; target: y). Select two predictor variables randomly to construct each weak learner. a. What are the overall accuracy rate, sensitivity, and specificity of the model on the
Consider the following LP problem where x1 and x2 represent the decision variables. Solve the LP problem to answer the following questions. a. What are the values of x1 and x2 at the optimal solution? What is the maximum value of z? b. Identify the binding and nonbinding constraints
Perform k-means clustering on all the variables in the accompanying data set. Do not standardize the variables. a. Specify the k value as 2 and plot the cluster membership using the cluster and silhouette plots. b. Specify the k value as 3 and plot the cluster membership using the cluster
A local coffee shop observes that, on average, four customers enter the store every 5 minutes during the rush hour between 6:30 am and 7:30 am each day. The number of customers arriving at the coffee shop follows a Poisson distribution. Each barista can serve 2 or 3 customers every 8 minutes, a
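With four arrivals every 5 minutes, the number of arrivals in a t-minute window is Poisson with mean λ = 4t/5. The minimal sketch below implements the Poisson probability function and assumes only that rate from the exercise. The helper names are hypothetical, and the service-rate details are cut off above, so they are not modeled.

```python
import math

ARRIVALS_PER_MINUTE = 4 / 5  # four customers every 5 minutes (from the exercise)

def mean_arrivals(minutes):
    """Poisson mean for a window of the given length in minutes."""
    return ARRIVALS_PER_MINUTE * minutes

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k), summing the pmf from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

if __name__ == "__main__":
    lam = mean_arrivals(5)           # about 4 arrivals in 5 minutes
    print(poisson_pmf(0, lam))       # chance of no arrivals in 5 minutes
    print(1 - poisson_cdf(5, lam))   # chance of 6 or more arrivals
```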
Hoping to increase its sales, a pizzeria wants to start a new marketing campaign promising its customers that if their order does not get delivered within an hour, the pizzas are free. Historically, the probability of on-time pizza delivery follows a binomial distribution with n = 50 and p = 0.88.
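For the pizzeria, the number of on-time deliveries X among n = 50 orders is binomial with p = 0.88. That gives E(X) = np = 44 and SD(X) = sqrt(np(1 - p)) ≈ 2.30. The sketch below computes binomial probabilities from the formula. The exercise's actual questions are cut off above, so the usage lines show only generic queries, and the helper names are illustrative.

```python
import math

N, P = 50, 0.88  # binomial parameters from the exercise
MEAN = N * P
SD = math.sqrt(N * P * (1 - P))

def binom_pmf(k, n=N, p=P):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n=N, p=P):
    """P(X <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

if __name__ == "__main__":
    print(MEAN, SD)            # about 44 and 2.30
    print(binom_pmf(50))       # all 50 orders on time
    print(1 - binom_cdf(44))   # more than 44 orders on time
```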
The regression tree below relates credit score to number of defaults (NUM DEF), revolving balance (REV BAL), and years of credit history (YRS HIST). Predict the credit score of each of the following individuals. a. An individual with no defaults, $4,200 revolving balance, and 12 years of
The accompanying data set contains three predictor variables, x1, x2, and x3, and one numerical target variable, y. A regression tree will be constructed using the data set. a. Which split on x1 will generate the smallest MSE? b. Which split on x2 will generate the smallest MSE? c. Which split
Create a regression tree using the accompanying data set in the Exercise_10.31 worksheet (predictor variables: x1 to x4; target: y). a. Use the rpart function to build a default regression tree. Display the default regression tree. How many leaf nodes are in the default regression tree? What
Mateo Derby works as a cyber security analyst at a private equity firm. His colleagues at the firm have been inundated by a large number of spam e-mails. Mateo has been asked to implement a spam detection system on the company’s e-mail server. He reviewed a sample of 500 spam and legitimate
Daniella Lara, a human resources manager at a large tech consulting firm, has been reading about using analytics to predict the success of new employees. With the fast-changing nature of the tech industry, some employees have had difficulties staying current in their field and have missed the
In recent years, medical research has incorporated the use of data analytics to find new ways to detect heart disease in its early stage. Medical doctors are particularly interested in accurately identifying high-risk patients so that preventive care and intervention can be administered in a timely
Admission to medical school in the United States is highly competitive. The acceptance rate to the top medical schools could be as low as 2% or 3%. With such a low acceptance rate, medical school admissions consulting has become a growing business in many cities. In order to better serve his
Credit card fraud is becoming a serious problem for the financial industry and can pose a considerable cost to banks, credit card issuers, and consumers. Fraud detection using data mining techniques has become an indispensable tool for banks and credit card companies to combat fraudulent
Refer to Exercise 11 for a description of the data set. Partition the data into 60% training and 40% validation data. For Analytic Solver, use 12345 as the random seed and create 10 weak learners. For R, use one as the random seed and create 100 weak learners. a. Create a bagging ensemble
Refer to Exercise 13 for a description of the data set. a. Create a boosting ensemble classification tree model. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data? What is the AUC value of the model? b. Compare the performance of the
Refer to Exercise 15 for a description of the data set. a. Create a random forest ensemble classification tree model. Select two predictor variables randomly to construct each weak learner. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data?
Refer to Exercise 19 for a description of the data set. a. Create a random forest ensemble classification tree model. Select three predictor variables randomly to construct each weak learner. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data?
Refer to Exercise 21 for a description of the data set. a. Create a boosting ensemble classification tree model. What are the overall accuracy rate, sensitivity, and specificity of the model on the validation data? What is the AUC value of the model? b. Compare the performance of the
Ramona Kim is a California Highway Patrol (CHP) officer who works in the city of San Diego. Having lost her own uncle in a car accident, she is particularly interested in educating local drivers about driver safety. After discussing this idea with her commanding officer, she learns that since 2005
Perform agglomerative clustering on the accompanying data set. a. Include all five variables, first standardized to z-scores, for the analysis. Choose Euclidean for the distance between observations and single linkage for the distance between clusters. Plot and inspect the dendrogram. How many
Perform agglomerative clustering on the accompanying data set. Include all seven variables, first standardized to z-scores, for the analysis. Choose Euclidean for the distance between observations and Ward’s method for the distance between clusters. Plot and inspect the dendrogram. How many
Perform agglomerative clustering on the accompanying data set. Include all seven variables, first standardized to z-scores, for the analysis. Choose Euclidean for the distance between observations and complete linkage for the distance between clusters. Plot and inspect the dendrogram. How many
Perform agglomerative clustering on the accompanying data set. a. Include all five variables, first standardized to z-scores. Use the Euclidean distance for similarity and single linkage for the clustering method to cluster the data into three clusters. How many observations are in the largest
Perform agglomerative clustering on the accompanying data set, using all 11 binary variables. Use Jaccard’s coefficients for the similarity measure and average linkage for the clustering method. Inspect the dendrogram. How many clusters are generated if the minimum distance between clusters is
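For binary variables, Jaccard's coefficient counts only positive matches: the number of variables where both observations are 1, divided by the number where at least one is 1. Matching 0s are ignored, which suits sparse yes/no data. In the minimal sketch below, the function names are illustrative, and treating two all-zero rows as identical is an assumption.

```python
def jaccard_similarity(u, v):
    """Share of variables equal to 1 in both rows among those equal to 1 in either."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    # Assumption: two all-zero rows are treated as identical.
    return both / either if either else 1.0

def jaccard_distance(u, v):
    """Dissimilarity for the clustering step: 1 minus the similarity."""
    return 1.0 - jaccard_similarity(u, v)

if __name__ == "__main__":
    row_a = [1, 0, 1, 1, 0]  # hypothetical binary records
    row_b = [1, 1, 0, 1, 0]
    print(jaccard_similarity(row_a, row_b))  # 0.5
```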
Perform agglomerative clustering on the accompanying data set, using all 11 binary variables. Use Jaccard’s coefficients for the similarity measure and complete linkage for the clustering method to cluster the data into 5 clusters. How many observations are in the largest cluster? How many 1s are
Perform agglomerative clustering on the accompanying data consisting of both numerical and categorical variables. Use Gower’s coefficient for the distance between observations and Ward’s clustering method. Plot and inspect the dendrogram. How many clusters are generated if the minimum distance
A local pizza store wants to get a better sense of who its customers are. The accompanying table shows a portion of data that it collected on 30 randomly selected customers. Variables include age, female (1 if female, 0 otherwise), annual income, married (1 if married, 0 otherwise), own (1 if own
Refer to the previous exercise for a description of the data set. a. Perform agglomerative clustering on the accompanying data consisting of both numerical and categorical variables. Use Gower’s coefficient for the distance between observations and Ward’s clustering method. How many
Refer to the previous exercise for a description of the data set. a. Perform agglomerative hierarchical clustering to group the 38 countries according to their population measures (i.e., Population Growth, Female Pop, Male Pop, Total Pop, Labor Force, Fertility Rate, and Birth Rate) only. Use
The accompanying data set contains country-level health and population measures for 38 countries from the World Bank’s 2000 Health Nutrition and Population Statistics database. For each country, the measures include death rate per 1,000 people (Death Rate, in %), health expenditure per capita
Internet addiction has been found to be a widespread problem among university students. A small liberal arts college in Colorado conducted a survey of Internet addiction among its students using the Internet Addiction Test (IAT) developed by Dr. Kimberly Young. The IAT contains 20 questions that
Denise Lau is an avid football fan and religiously follows every college football game. During the current season, she meticulously keeps a record of how each quarterback has played throughout the season. Denise is making a presentation at a local college football fan club about these quarterbacks.
Anne Cutberth has just started her new job as an academic adviser at a small liberal arts college in Colorado. She is going through a list of students in an academic department and wants to gain a better understanding about the student body in the department. The accompanying table shows a portion
Peter Lara, an aspiring college student, met with his high school college advisor to discuss potential colleges to which he might apply. He was advised to consult with the College Scorecard information on the Department of Education website. After talking to his family, he downloaded a list of 116
The accompanying table lists a portion of Major League Baseball’s pitchers, their earned run average (ERA), and their salary (in millions of $). a. Perform agglomerative clustering to group the players based on ERA and Salary. Standardize the variables and choose the Euclidean distance and
The accompanying table shows a portion of data consisting of the January, April, July, and October average temperatures of 50 selected U.S. cities. a. Perform agglomerative hierarchical clustering to group the cities based on their January, April, July, and October average temperatures.
Sanjay Johnson is working on a research paper that studies the relationship between the education level and the median income of a community. The accompanying table shows a portion of the data that he has collected on the educational attainment and the median income for 77 areas in the city of
Internet addiction has been found to be a widespread problem among university students. A small liberal arts college in Colorado conducted a survey of Internet addiction among its students using the Internet Addiction Test (IAT) developed by Dr. Kimberly Young. The IAT measures three underlying
The accompanying table contains a portion of data from the National Longitudinal Survey (NLS), which follows over 12,000 individuals in the United States over time. Variables in this analysis include Urban (1 if lives in urban area, 0 otherwise), Siblings (number of siblings), White (1 if white, 0
Perform k-means clustering on the accompanying data set. a. Use all variables in the analysis. Do not standardize the variables. Set the number of clusters to 3. What are the size and average distance for the largest cluster? b. Specify the same settings as in part a, but standardize the
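The k-means exercises contrast raw and standardized variables. The minimal sketch below implements Lloyd's k-means algorithm with z-score standardization, and it reports each cluster's size and average distance to its center. It illustrates the technique under stated assumptions and is not the Analytic Solver or R procedure. R's `kmeans` defaults to the Hartigan-Wong algorithm, and the initial centers here come from Python's `random`, so labels and statistics may differ from the textbook's answers. All function names are hypothetical.

```python
import math
import random
import statistics

def standardize(rows):
    """Convert every column to z-scores using the sample standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(rows, k, max_iter=100, seed=1):
    """Lloyd's algorithm: alternate nearest-center assignment and mean updates."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]  # k randomly chosen rows
    labels = []
    for _ in range(max_iter):
        # Assignment step: Euclidean distance to each center.
        labels = [min(range(k), key=lambda j: math.dist(row, centers[j])) for row in rows]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                new_centers.append([statistics.mean(col) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def cluster_summary(rows, labels, centers):
    """Size and average distance to the center for each cluster."""
    out = {}
    for j, center in enumerate(centers):
        members = [row for row, lab in zip(rows, labels) if lab == j]
        dists = [math.dist(row, center) for row in members]
        out[j] = (len(members), statistics.mean(dists) if dists else 0.0)
    return out
```

For the standardized runs, pass `standardize(rows)` instead of `rows`. Without standardization, variables measured on larger scales dominate the Euclidean distance.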
Perform k-means clustering on both variables in the accompanying data set. Standardize the variables. Experiment with the k values of 2, 3, and 4. Compare the number of observations and distance statistics of the largest cluster for each k value.
Perform k-means clustering on the accompanying data set. a. Use variables x1, x2, and x3 in the analysis. Standardize the data. Specify the k value as 2. What are the cluster center values for the larger cluster? b. Specify the k value as 3. What are the cluster center values for the
Perform k-means clustering on the accompanying data set. Use variables x1, x3, and x5 in the analysis. Do not standardize the variables. a. Set the number of clusters to 3. What are the size and cluster center values for the largest cluster? b. Perform the same analysis as in part a, but
Perform k-means clustering on the accompanying data set. Use variables x4, x5, x6, and x7, standardized to z-scores, in the analysis. a. Specify the k value as 2 and plot the cluster membership using the cluster and silhouette plots. What is the average silhouette width? b. Specify the k
Perform k-means clustering on the accompanying data set. a. Use variables x1, x3, and x5 in the analysis. Do not standardize the variables. Set the number of clusters to 4. What are the size and cluster center values for the largest cluster? b. Perform the same analysis as in part a, but
Perform k-means clustering on all the variables in the accompanying data set. a. Standardize the data. Specify the k value as 2 and plot the cluster membership using the cluster and silhouette plots. What is the average silhouette width? b. Specify the k value as 3 and plot the cluster
British biologist Ronald Fisher studied iris flowers and classified them according to the width and length of the flower’s petals and sepals (a small, green leafy part below the petal). The accompanying table shows a portion of the data that Fisher used in his study. a. Perform k-means
Sanjay Johnson is working on a research paper that studies the relationship between the education level and the median income of a community. The accompanying table shows a portion of the data that he has collected on the educational attainment and the median income for 77 areas in the city of
Refer to the previous exercise for a description of the data set. a. Perform k-means clustering to group the 38 countries into four clusters according to their population measures (i.e., Population Growth, Female Pop, Male Pop, Total Pop, Labor Force, Fertility Rate, and Birth Rate) only. Is
The accompanying data set contains country-level health and population measures for 38 countries from the World Bank’s 2000 Health Nutrition and Population Statistics database. For each country, the measures include death rate per 1,000 people (Death Rate, in %), health expenditure per capita
Ben Derby is a highly paid scout for a professional baseball team. He attends at least five or six Major League Baseball games a week and watches as many recorded games as he can in order to evaluate potential players for his team to recruit at the end of the season. He also keeps detailed records
Jennifer Gomez is moving to a small town in Napa Valley, California, and has been house hunting for her new home. Her Realtor has given her a list of 35 homes with at least 2 bedrooms that were recently sold. Jennifer wants to see if she can group them in some meaningful ways to help her narrow
Denise Lau is an avid football fan and religiously follows every college football game. During the current season, she meticulously keeps a record of how each quarterback has played throughout the season. Denise is making a presentation at the local college football fan club about these
The accompanying data set contains the nutrition facts on 30 common food items; a portion of the data is shown. The values are based on 100 grams of the food items. Perform k-means clustering using k = 3 on the nutritional facts of the food items. Standardize the variables. Describe the
Jake Duffy is the e-commerce manager of a major electronics retailer. He is researching his company’s e-commerce competitors and wants to group the competitors based on their performance data. He has compiled a data set that contains the following four performance measures of the major
The accompanying data set contains economic development indicators for 11 African countries collected by the World Bank in 2015. The economic development indicators include annual % growth in agriculture (Agriculture), annual % growth in exports (Exports), annual % growth in final consumption
The accompanying data set contains the school level average SAT critical reading (CR), math (M), and writing (W) scores for the graduating seniors from 100 high schools in New York City. The data set also records the number of SAT test takers (Test Takers) from each school. a. Perform k-means
A country’s information technology use has been linked to economic and societal development. A non-profit organization collects data that measure the use and impact of information technology in over 100 countries annually. The accompanying table shows a portion of the data collected, with the
A telecommunications company wants to identify customers who are likely to unsubscribe from the telephone service. The company collects the following information from 100 customers: customer ID (ID), age (Age), annual income (Income), monthly usage (Usage, in minutes), tenure (Tenure, in months), and
Consider the following portion of data consisting of 25 transactions. a. Generate association rules with a minimum support of 10 transactions and minimum confidence of 75%. Sort the rules by lift ratio. What is the lift ratio for the top rule? b. Interpret the support count of the top
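Each rule A → C is judged by three measures. Its support count is the number of transactions containing both A and C. Its confidence is that count divided by the number of transactions containing A. Its lift ratio is the confidence divided by the fraction of transactions containing C. The minimal sketch below uses hypothetical transactions, and the function name `rule_stats` is illustrative.

```python
def rule_stats(transactions, antecedent, consequent):
    """Support count, confidence, and lift ratio for antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= set(t))
    count_c = sum(1 for t in transactions if c <= set(t))
    count_ac = sum(1 for t in transactions if (a | c) <= set(t))
    confidence = count_ac / count_a
    lift = count_ac * n / (count_a * count_c)  # = confidence / (count_c / n)
    return count_ac, confidence, lift

if __name__ == "__main__":
    baskets = [  # hypothetical transactions
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
        {"bread", "milk"},
    ]
    print(rule_stats(baskets, {"bread"}, {"milk"}))  # (3, 0.75, 0.9375)
```

Under the exercise's settings, a rule is kept when its support count is at least 10 and its confidence is at least 0.75. A lift above 1 means C appears with A more often than C's overall frequency would suggest. The example rule's lift of 0.9375 falls slightly below that.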
The data are the same as in the previous exercise. Read the data file using the read.transactions function and perform the following tasks. a. Produce an item frequency plot and frequency table. Which item is the most frequent item? b. Generate association rules with a minimum support of
Consider the following portion of data consisting of 100 transactions. a. Generate association rules with minimum support of 10 transactions and minimum confidence of 50%. How many rules are generated? Sort the rules by lift ratio. Which is the top rule? What is the lift ratio for the top
The data are the same as in the previous exercise. Perform association rule analysis using the following settings. a. Generate association rules with a minimum support of 10 transactions and minimum confidence of 75%. How many rules are generated? b. Generate association rules with a
Consider the following portion of data consisting of 40 transactions. Read the data file into R using the read.transactions function and perform the following tasks. a. Produce an item frequency plot and frequency table. Which item is the least frequent item? b. Generate association rules
The data are the same as in the preceding exercise. Read the data file into R using the read.transactions function and perform the following tasks. a. Produce an item frequency plot and a frequency table. Which item is the most frequent item? b. Generate association rules with a minimum
Use Excel’s Analysis ToolPak or R, both with a seed of 1, to simulate 120 random observations of a continuous uniform random variable over the interval [10, 75]. What are the mean, the standard deviation, and the range of the 120 observations? How many observations are greater than 65?
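Here is a Python sketch of the same simulation, which assumes Python's standard `random` module in place of Excel or R. Python seeds its generator differently from both tools, so seed 1 here will not reproduce the textbook's numbers. The summary statistics are computed the same way, and the helper name is illustrative.

```python
import random
import statistics

def simulate_uniform(n=120, low=10, high=75, seed=1):
    """Draw n continuous uniform observations on [low, high]."""
    rng = random.Random(seed)  # seeding differs from Excel and R, so draws will not match
    return [rng.uniform(low, high) for _ in range(n)]

obs = simulate_uniform()
summary = {
    "mean": statistics.mean(obs),
    "sd": statistics.stdev(obs),   # sample standard deviation (n - 1)
    "range": max(obs) - min(obs),
    "above_65": sum(1 for x in obs if x > 65),
}

if __name__ == "__main__":
    for name, value in summary.items():
        print(name, value)
```

As a sanity check, the theoretical mean of U(10, 75) is 42.5 and its standard deviation is 65/sqrt(12) ≈ 18.76. About 10/65 of the draws, roughly 18 of 120, should exceed 65.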
Use Excel’s Analysis ToolPak or R, both with a seed of 1, to simulate 25 random observations based on a binomial distribution with five trials and p = 0.2. What are the mean and standard deviation of the 25 observations?
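A comparable Python sketch builds each binomial draw as the number of successes in five independent Bernoulli(0.2) trials. This avoids `random.binomialvariate`, which exists only in Python 3.12 and later. As with the uniform case, Python's seeding differs from Excel's and R's, so the 25 values will not match the textbook's. The function name is illustrative.

```python
import random
import statistics

def simulate_binomial(n_obs=25, trials=5, p=0.2, seed=1):
    """Each observation counts successes in `trials` Bernoulli(p) draws."""
    rng = random.Random(seed)  # seeding differs from Excel and R
    return [sum(rng.random() < p for _ in range(trials)) for _ in range(n_obs)]

draws = simulate_binomial()
sample_mean = statistics.mean(draws)
sample_sd = statistics.stdev(draws)   # sample standard deviation (n - 1)
```

For reference, the theoretical mean is np = 1 and the standard deviation is sqrt(np(1 - p)) = sqrt(0.8) ≈ 0.894.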
An online music streaming service wants to find out which popular Beatles songs are frequently downloaded together by its users. The service collects the download logs for 100 users over the past month, where the download logs show which songs were downloaded during the same session. A portion of
A local grocery store keeps track of individual products that customers purchase. Natalie Jackson, the manager in charge of the fresh fruits and produce section, wants to learn more about the customer purchasing patterns of apples, bananas, cherries, oranges, and watermelons, the five most
An online movie streaming company conducts a consumer study to find out the movie genres that its customers watch. Eighty-eight households volunteered to participate in the study and allow the company to track the genres of movies they watch over a one-week period. A portion of the data is shown in
Use the movie data set from the previous exercise and R to perform association rule analysis. Make sure to read the data file using the read.transactions function first. a. Explore the data using an item frequency plot and a frequency table. Which genre of movie is the most frequently
The data are the same as in the previous exercise. Perform association rule analysis using the following settings. a. Generate association rules with a minimum support of 10 transactions and minimum confidence of 50%. Sort the rules by lift ratio. Report and interpret the lift ratio for the