Question: Complete data are very rare because some data are usually missing. There are typically four strategies to handle missing data. -Ignore the variables with missing

Complete data are very rare because some data are usually missing. There are typically four strategies to handle missing data.

-Ignore the variables with missing data.

-Delete the records that have some missing values.

-Impute the missing values (i.e., simply fill in the missing values with some other value, such as the mean of similar records with data).

-Use a data mining technique that can handle missing values, such as CART, which is one of a few techniques that handle missing data pretty well.

Suppose you are working on a project to model a hotel company's customer base. The goal is to determine which customer attributes are correlated with a high number of hotel stays. You have access to data from a customer survey, but one field (which is optional) is yearly income and has some missing values equaling roughly 5% of the records. Which of the strategies should be used? Explain why. Suppose you chose the third strategy. Explain how you would impute the missing incomes.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!