# Question: The file DVD Movies.xlsx contains a large data set of

The file DVD Movies.xlsx contains a large data set of 10,000 customer transactions for a fictional chain of video stores in the United States. Each row corresponds to a different customer and lists:

(1) A customer ID number (1–10,000),

(2) The state where the customer lives,

(3) The city where the customer lives,

(4) The customer’s gender,

(5) The customer’s favorite type of movie (drama, comedy, science fiction, or action),

(6) The customer’s next favorite type of movie,

(7) The number of times the customer has rented movies in the past year, and

(8) The total dollar amount the customer has spent on movie rentals during the past year.

The data are sorted by state, then city, then gender. We assume that this data set represents the entire population of customers for this video chain. Imagine that only the data in columns A through D are readily available for this population. The company is interested in summary statistics of the data in columns E through H, such as the percentage of customers whose favorite movie type is drama or the average amount spent annually per customer, but it will have to do some work to obtain the data in columns E through H for any particular customer. Therefore, the company wants to perform sampling. The question is: What form—simple random sampling, systematic sampling, stratified sampling, cluster sampling, or even some type of multistage sampling—is most appropriate?
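To make the candidate designs concrete, here is a minimal Python sketch of how each one could select customer rows from the sampling frame. It does not read DVD Movies.xlsx: the toy frame of 1,000 customers, the gender labels, the `City0`..`City19` names, and all function names are hypothetical stand-ins chosen for illustration.

```python
import random
from collections import defaultdict

def simple_random_sample(ids, n, rng):
    """Draw n IDs without replacement, each ID equally likely."""
    return rng.sample(ids, n)

def systematic_sample(ids, n, rng):
    """Take every k-th ID after a random start among the first k positions."""
    k = len(ids) // n
    start = rng.randrange(k)
    return [ids[start + i * k] for i in range(n)]

def stratified_sample(ids, strata, n, rng):
    """Proportional allocation: sample each stratum in proportion to its size."""
    groups = defaultdict(list)
    for i in ids:
        groups[strata[i]].append(i)
    chosen = []
    for members in groups.values():
        n_h = round(n * len(members) / len(ids))
        chosen.extend(rng.sample(members, min(n_h, len(members))))
    return chosen

def cluster_sample(ids, clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every ID inside them."""
    names = sorted({clusters[i] for i in ids})
    picked = set(rng.sample(names, n_clusters))
    return [i for i in ids if clusters[i] in picked]

# Hypothetical toy frame (NOT the real file): 1,000 customers,
# alternating gender labels, 20 made-up cities of 50 customers each.
rng = random.Random(0)
ids = list(range(1000))
gender = ["F" if i % 2 == 0 else "M" for i in ids]
city = [f"City{i // 50}" for i in ids]

print(len(simple_random_sample(ids, 100, rng)))
print(systematic_sample(ids, 100, rng)[:5])
print(len(stratified_sample(ids, gender, 100, rng)))
print(len(cluster_sample(ids, city, 4, rng)))
```

Because the real data are sorted by state, city, and gender, a systematic sample spreads itself across those groups automatically, while a cluster sample by city concentrates the follow-up work in a few places. A multistage design could be sketched by first calling `cluster_sample` and then drawing a simple random sample within the selected clusters.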

Your job is to investigate the possibilities and to write a report on your findings. For any sampling method, any sample size, and any quantity of interest (such as average dollar amount spent annually), you should be concerned with sampling cost and accuracy.

One way to judge the latter is to generate several random samples from a particular method and calculate the mean and standard deviation of your point estimates from these samples. For example, you might generate 10 systematic samples, calculate the average amount spent (an X̄) for each sample, and then calculate the mean and standard deviation of these 10 X̄s. If your sampling method is accurate, the mean of the X̄s should be close to the population average, and the standard deviation should be small. By doing this for several sampling methods and possibly several sample sizes, you can experiment to see what is most cost-efficient for the company. You can make any reasonable assumptions about the cost of sampling with any particular method.
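The repeated-sampling check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 "amount spent" values is synthetic (uniform between 20 and 400 dollars, an arbitrary choice), and the function names `srs`, `systematic`, and `evaluate` are hypothetical. With the real file, the amounts column would replace the synthetic list.

```python
import random
import statistics

def srs(population, n, rng):
    """Simple random sample of n values without replacement."""
    return rng.sample(population, n)

def systematic(population, n, rng):
    """Every k-th value after a random start among the first k positions."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def evaluate(sampler, population, n, reps, rng):
    """Return (mean, standard deviation) of the sample means from reps samples."""
    means = [statistics.mean(sampler(population, n, rng)) for _ in range(reps)]
    return statistics.mean(means), statistics.stdev(means)

# Synthetic stand-in for the amounts column (ASSUMPTION, not the real data).
rng = random.Random(42)
amounts = [rng.uniform(20, 400) for _ in range(10_000)]
pop_mean = statistics.mean(amounts)

print(f"population mean: {pop_mean:.2f}")
for name, sampler in [("simple random", srs), ("systematic", systematic)]:
    m, sd = evaluate(sampler, amounts, n=100, reps=10, rng=rng)
    print(f"{name}: mean of sample means = {m:.2f}, sd = {sd:.2f}")
```

A method looks accurate when the mean of its sample means sits near the population mean and the standard deviation is small. Note that a systematic design with interval k has only k distinct possible samples, so repeated draws can coincide when k is small.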
