Question: I. Statistical Analysis We start by looking at some summary data of all the regions that I am analyzing in this report. Below you will

I. Statistical Analysis We start by looking at some summary data of all the regions that I am analyzing in this report. Below you will find the mean, standard deviation, and correlation between the healthy and problem area customers. We do not draw any specific conclusions from this data but it provides an overall summary of what we are dealing with and these numbers are used throughout the rest of the analysis in various forms. Region Total Customers Chicago Michigan MN/WI NW St. Louis 617 157 208 289 368 Total # of Customers Purchasing Accessories, Parts, or Supplies with little to no York Purchase History Total # of Customers Purchasing Both (Healthy Customer) 487 125 130 193 236 130 32 78 96 132 ^^ Problem Area Customers Mean: 234.2 Mean: 93.6 STD Dev: 132.959241875095 STD Dev: 36.99513481527 Correlation of Problem Area VS Healthy Customers: 0.712541245830435 The above will show you a few pieces of information that stand right out at us. Less than 40% (93.6/234.2) of customers across all regions are considered to be a healthy customer. You will also notice the mean is slightly skewed by Chicago with its 487 problem customers. However, it is not skewed enough to cause this data to be off as there is still an obvious and general problem amongst all regions when looking at the individual proportions of customer groups. You will also notice later on that there is a natural linear progression of more problem customers as the overall customer population grows. It also seems there is somewhat of a correlation and a linear trend when looking at simple regression of more problem area customers as more healthy customers are present in a region. When looking at the correlation of each customer group in relation to the total customers we find a similar potential for a linear trend. We will analysis these in more detail below. Correlation of Custom Purchasing Both (Healthy) & Customers across Regions: 0.816093 Correlation of Custom Purchasing Acc, Parts, & Supplies & Customers across Regions: 0.986987 We come to the same potential conclusion here that because of the fairly high correlation numbers, there may be a linear trend when looking at simple regression within these two customer groups. Before moving on let's look at the same above analysis on a product category specific level. Product Category Purchase Count Air Conditioner Hear Pump Furnace Air Handler Sum of Category Purchase Count CH MI MN/WI NW 13 4 5 3 10 5 7 1 11 9 9 4 13 9 11 8 St.Louis 14 8 8 5 25 23 33 41 35 5.75 8.25 3.2691 2.5860201 74 08 0.9759 0.907205 21 592 10.25 1.9202 86 0.9678 56 8.75 3.269174 21 0.986572 48 ^^ Total product category purchases from problem area customers Mean STD Dev Correlation 6.25 3.96074 49 0.94398 51 Su m 61 35 40 21 15 7 We see another potential for an obvious linear trend when looking at the correlations between all product categories purchased in a specific region verses total product categories purchased across all regions within the problem area customer group. This could potentially tell us that strong performing product groups overall within the company would correlate to those groups performing stronger in given regions regardless of what the region is doing to help it. The product quality and innovation itself might be seen by customers across the US naturally. Or the company wide marketing might be working better for certain categories across the entire US. This will remain an unknown as we do not have the data to analysis this. Next we will use some descriptive statistics to look a little deeper at the summary data. Region Chicago Michigan MN/WI NW St.Louis Total Customers Mean Standard Error Median Total Customers 617 157 208 289 368 327.8 80.7102224 289 # of Customers Total # of Purchasing Customers Accessories, Purchasing Parts, or Supplies Both with little/no (Healthy) York Purchase History 487 130 125 32 130 78 193 96 236 132 ^^ Problem Area # of Customers Purchasing Accessories, Parts, or Supplies with little/no York Purchase History Mean 234.2 Standard Error 66.47962094 Median 193 Mode Standard Deviation Sample Variance Kurtosis Skewness Range Minimum Maximum Sum Count Confid Level(95.0%) Product Category Air Conditioner Hear Pump Furnace Air Handler Sum CH Mean #N/A 180.473544 32570.7 1.4935496 1.23991745 460 157 617 1639 5 224.087502 CH 13 4 5 3 25 6.25 Mode Standard Deviation Sample Variance Kurtosis Skewness Range Minimum Maximum Sum Count Confid Level(95.0%) MI 10 5 7 1 23 MI Mean #N/A 148.6529515 22097.7 3.128159988 1.731722553 362 125 487 1171 5 184.5770181 MN/WI 11 9 9 4 33 NW 13 9 11 8 41 MN/WI 5.75 Mean Standard Error Median Mode Standard Deviation 2.28673712 Standard Error 4.5 Median #N/A Mode Standard 4.57347424 Deviation 1.887458609 Standard Error 6 Median #N/A Mode Standard 3.774917218 Deviation Sample Variance Kurtosis 20.9166667 Sample Variance 3.3803273 Kurtosis 14.25 Sample Variance 0.257309942 Kurtosis Skewness Range Minimum Maximum Sum Count Confid Level(95.0%) NW Mean Standard Error Median Standard 1.81106705 10 3 13 25 4 0.357856111 9 1 10 23 4 7.27741811 10.25 1.10867789 10 2.21735578 Skewness Range Minimum Maximum Sum Count Confid Level(95.0%) St.Louis Mean Standard Error Median Standard Skewness Range Minimum Maximum Sum Count Confid 6.006735676 Level(95.0%) 8.75 1.887458609 8 3.774917218 St.Louis 14 8 8 5 35 8.25 1.493039 4 9 9 2.986078 8 8.916666 7 2.602498 1.380236 6 7 4 11 33 4 4.751517 7 Sum 61 35 40 21 157 Deviation Sample Variance Kurtosis Skewness Range Minimum Maximum Sum Count Confid Level(95.0%) Deviation 4.91666667 Sample Variance 1.69951163 Kurtosis 0.48156305 Skewness 5 Range 8 Minimum 13 Maximum 41 Sum 4 Count Confid 3.52830786 Level(95.0%) 14.25 2.227146814 1.129338115 9 5 14 35 4 6.006735676 We can make the following observations based on this descriptive statistics summary data. 1. The means of the total customers and the problem area customers are slightly skewed because of Chicago having a significantly larger amount of total customers and therefore total problem area customers. This makes the mean a bad measure for us to go off of alone. 2. In general as total number of customers in a region grows, the total number of problem customers grows with it. That shows us no specific region is doing a great job at converting dealers to healthy customers as they acquire new customers. They seem to be simply getting them on board but not generating equipment sales with a good portion of them. 3. Being that the min and max are so spread out in regards to the customers in the region versus the healthy vs problem area ones you can see that some regions simply have a lot more work do to because they have a larger portion of customers not purchasing equipment. They might need more account managers here to handle the same task another region might be able to do with less account managers. The availability of low hanging fruit will be higher in a few regions. 4. The means of all the product categories sold within a region seem to be fairly close to each other and consistent. This shows us that regardless of the regions the overall products sold to problem customers are close, but vary within each category of product. The difference that is found within these means of the product categories might be due to the skewness of the market having more overall customers and therefore slightly more product category sales within the given category. This could also be due to products simply performing better within a region because of the region it is in and the regulations of the HVAC industry. For example Chicago might sell way less heat pumps than the Northwest would because Chicago is a condenser and furnace market. 5. Because the ranges are fairly large within each region when looking at the product categories sold, we find that there is at least one product category is usually doing way worse than the best performing product category and therefore causing more differences in the mean of the data within a region. Next we will look at the proportion of customers purchasing only parts, supplies, and accessories, against the total number of customers in the region. This should follow the same linear trend that we found when looking at the total purchases in a product category by problem customers versus all purchases of problem area customers across all regions. Region Customers # of Custom Purchasing Accessories, Parts, or Supplies with little to no York Purchase History # of Customers Purchasing Both (Healthy Customer) Chicago 617 Michigan 157 MN/WI 208 NW 289 St.Louis 368 Proportion of Customers that Purchases Accessories, Parts, and Supplies Region Chicago Michigan MN/WI NW ST. Louis 487 125 130 193 236 Value/Total 0.7893031 0.7961783 0.625 0.6678201 0.6413043 You will see that in each region, there is a large amount of problem area customers in relation to total customers, and the proportions are pretty close to the same. This tells us that although some regions do better than others, the problem itself is still spread across all of the regions. The more customers in the region, the more problem area customers there are. The highest proportion of the customers purchasing only accessories, parts, and supplies is in Michigan. Proportion is given by the value divided by the total population in the region. This will give the fraction or proportion of the category in relation to the whole population. Michigan is the region where most of the marketing and dealer conversion activates should be done based on this data. However, we wish to test the hypotheses and find which proportions are significantly higher (measured against a 50% baseline) and where marketing requires to be done. H0: P=0.5 Ha: p0.5 130 32 78 96 132 CI= pZ *S ,=0.51.96*0.0201 ,=(0.4606, 0.5394) This means that we are 95% confident that the proportion should be between 0.4606 and 0.5394. Therefore all the values are significantly above the required half. However, the regions with the highest number of the people buying only accessories, parts, and supplies are Chicago and Michigan. Again giving us some info for our conclusion on which areas might need the most resources. Next we will look at which equipment product categories might be performing the best or worst out of the combined regions and test a hypothesis of whether there are any outliers. We start by looking at the summations and it becomes clear that the air conditioner is doing the best and the air handler is doing the worst. We test the hypothesis of the following: H0: no product is doing better Ha: at least one product is doing better Product Category Purchase Count Air Conditioner Mean 40 Hear Pump Standard 8.346656 Furnace Error Median 39 Air Handler Mode #N/A Sum Standard Deviation 16.69331 Sample Variance Kurtosis Skewness Range Minimum Maximum Sum Count Largest(1) Confidence Level(95.0%) 278.6667 0.472379 mean0.33019 =40 CI=x-barz*s 40 ,=401.96*8.34666 21 ,=(23.6405,6156.3595) 160 4 61 26.56278 CH 13 4 5 3 25 MI 10 5 7 1 23 MN/WI 11 9 9 4 33 NW St.Louis 13 14 9 8 11 8 8 5 41 35 Total 61 35 43 21 157 This means that we are 95% confident that the values should be between 23.6405 to 56.3595 since this is the confidence interval for the mean. The data shows us that the air conditioner (61) is above the mean interval and therefore it is doing well while the air handler (21) is below the interval and therefore is doing poorly. We reject the null hypothesis and accept the alternative. We will now look at whether the product categories are independent on themselves or dependent on the region by testing the null and alternative hypothesis related to this and stated below. We will use a Chi Square test to do this. MI Product Category Purchase Count Air Conditioner Hear Pump Furnace Air Handler Sum Expected Frequencies 9.713375796 5.573248408 6.847133758 3.343949045 8.936305732 5.127388535 6.299363057 3.076433121 12.82165605 7.356687898 CH MN/WI 13 4 5 3 25 Observe d 13 4 5 3 10 5 7 1 11 9 10 5 7 1 23 chi square from test 11 9 9 4 33 NW 13 9 11 8 41 St Louis 14 8 8 5 35 Sum 61 35 43 21 157 6.20303 2 degrees of freedom =(r-1)(c-1) ,=(4-1)(5-1)=3*4=12 level of significance is 0.05 critical chi square at 12 df and 0.05 level of significance is ,=21.026 total 9.038216561 4.414012739 15.92993631 9.140127389 11.22929936 5.484076433 13.59872611 7.802547771 9.585987261 4.681528662 Expected Frequenci es Air Conditione r Hear Pump Furnace Air Handler Sum 9 4 13 9 11 8 14 8 8 5 Hypotheses H0: independent on itself Ha: dependent on the region test calculated is equal to 0.996907 critical value at 12 df and 0.05 level of significance is 21.026 CH MI MN/ WI NW St.Lo uis 9.71 3376 5.57 3248 6.36 9427 3.34 3949 8.93 6306 5.12 7389 5.85 9873 3.07 6433 12.8 2166 7.35 6688 8.40 7643 4.41 4013 15.9 2994 9.14 0127 10.4 4586 5.48 4076 13.5 9873 7.80 2548 8.91 7197 4.68 1529 25 23 33 41 35 To ta l 61 35 40 21 15 7 Since 12 is less than 21.02607, reject the null hypothesis on the basis of the calculated value being less than the critical value, and conclude the product category purchase count is related (dependent) to the region. The region affects the purchase of the various product categories. As you will notice based on the test of the hypothesis, we find that the success of the products is somewhat dependent on the region and it is not just from an overall number of customers standpoint. This means that although we will find obvious product category performers (for better or worse) across all regions, the regional marketing, activity, and plan within the region itself will contribute to this overall effort and required for us to come to a conclusion of action on. Digging deeper into some of the above analysis we find and prove that there is in fact a positive relationship and a linear trend of total number of problem area customers as the overall customer population grows in a region. This is where we find conclusive evidence of this. Region Data Chicago Michigan MN/WI NW St. Louis Product Category Purchases Air Conditioner Hear Pump Furnace Air Handler Sum CH 13 4 5 3 25 Total # of Customers Purchasing Accessories, Parts, or Supplies with little to no York Purchase History 487 125 130 193 236 ^^ Problem Area Total # of Customers Purchasing Both (Healthy Customer) MI 10 5 7 1 23 MN/WI 11 9 9 4 33 130 32 78 96 132 NW 13 9 11 8 41 St.Louis 14 8 8 5 35 ^^ The above shows the total product category purchases from customers purchasing accessories, parts, and supplies with little to no York purchase history SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 0.987 0.9741 0.9655 27.602 5 ANOVA df 1 3 4 Regression Residual Total Coefficients Intercept Total # of Customers 32.28953323 0.812963799 SS MS F 86105.22334 86105.223 113.0199 2285.576662 761.85889 88390.8 Standard Error t Stat 1.15561219 27.94149565 9 10.6310822 0.076470464 9 Lower 95.0% 121.211842 8 0.56960065 2 Significanc eF 0.001779 P-value Upper 95.0% Model fit between overall number of customers in a market and the customers who will only be purchasing accessories, parts and supplies: Total # of Customers Purchasing Accessories, Parts, or Supplies with little to no York Purchase History 600 500 400 300 200 100 0 100 200 300 400 500 600 700 The scatter diagram above shows that the two variables have a positive relationship. The dots show a linear trend. Hypothesis H0: = 0 H1: 0 Significance level, = 0.05 10.6310822 Test statistic 9 0.00177858 p-value 7 Decision rule Reject H0 if p-value is less than Decision/Conclusion Since 0.001779 is less than 0.05, reject the null hypothesis. The intercept is significant implying that the overall number of customers in a market can be used to predict the number of customers who will only be purchasing accessories, parts and supplies. R R^2 1 - R^2 standard error of the estimate 0.986986509 0.974142369 0.025857631 27.60179138 The value of R is near to 1, which indicates that the overall number of customers in a market have a strong correlation (strong positive relationship) with the number of customers who will only be purchasing accessories, parts and supplies. Coefficient of determination, R^2, is the proportion of variability in the response variable that can be explained by the statistical model. In the above model, 97.41% variability in the number of customers who will only be purchasing accessories, parts and supplies is explained by the model. The model fit is good as depicted by the large value of R^2. Overall number of customers in a market appear to be good predictor of how many of these will only be purchasing accessories, parts and supplies. Only 2.59% of the variation in the number of customers who will only be purchasing accessories, parts and supplies is not explained by the model. Standard error measures the accuracy of the predictions, thus the above value of standard error is not large, showing that the prediction is fairly accurate. Model fit between the overall number of customers in a market and product categories: Total Customers 700 600 500 400 300 200 100 0 0 20 40 60 80 100 120 140 160 180 The scatter diagram above shows that the two variables have a positive but weak relationship. The dots are scattered indicating that the correlation between the two variables is not strong. Regression Statistics Multiple R 0.3138839 R Square 0.0985231 Adjusted R Square -0.201969 Standard Error 59.840676 Observations 5 ANOVA df Regression Residual Total Intercept SS 1 1174.080361 3 4 Coefficient s 10742.71964 11916.8 Standard Error 31.681811 60.57715512 MS 1174.08036 1 3580.90654 6 F 0.327872 4 t Stat 0.52299931 5 P-value 0.637144 8 Significance F 0.60701358 4 Total # of Customers 0.0949304 Upper 95% Lower 95% -161.1017329 224.46535 -0.432680824 0.6225416 0.165787946 Lower 95.0% 161.1017329 0.432680824 0.57260141 7 Upper 95.0% 224.465354 1 0.607013 6 0.62254165 Hypothesis H0: = 0 H1: 0 Significance level, = 0.05 0.572601 4 0.607013 6 Test statistic p-value Decision rule: Reject H0 if p-value is less than Decision/conclusion: Since 0.607014 is greater than 0.05, fail to reject the null hypothesis. The intercept is not statistically significant implying that the overall number of customers in a market cannot significantly predict what product categories will be better or worse performing. R R^2 1 - R^2 Standard error of the estimate 0.313883 9 0.098523 1 0.901476 9 59.84067 6 The value of R is small, showing that the overall number of customers in a market have a weak correlation with the product categories that will either be better or worse performing. Coefficient of determination, R^2, is the proportion of variability in the response variable than can be explained by the statistical model. In the above model, only 9.85% variability in what product categories will be better or worse performing is explained by the model. The model fit is not good as depicted by the small value of R^2. Overall number of customers in a market does not appear to be a good predictor of what product categories will be better or worse performing. About 90.15% of the variation in what product categories will be better or worse performing is not explained by the model. The value of standard error is fairly large. The predictions based on the model above are not good. As our last hypothesis and analysis before giving a conclusion we will test whether there is a significant difference in the mean purchase count between the product categories and regions. Product Category Purchases Air Conditioner Hear Pump Furnace Air Handler Sum CH 13 4 5 3 25 MI 10 5 7 1 23 MN/WI 11 9 9 4 33 NW 13 9 11 8 41 St.Louis 14 8 8 5 35 Hypothesis: H01: There is no significant difference in the mean purchase count between the four categories of products. H02: There is no significant difference in the mean purchase count between the five regions. Anova: Two-Factor Without Replication Coun SUMMARY t Air Conditioner 5 Hear Pump 5 Furnace 5 Air Handler 5 Su m 61 35 40 21 Averag e 12.2 7 8 4.2 CH MI 4 4 25 23 6.25 5.75 MN/WI 4 33 8.25 NW St.Louis 4 4 41 35 10.25 8.75 df MS ANOVA Source of Variation SS Rows 165 3 54.983 54.8 24.8 244.6 4 12 19 13.7 2.0667 Columns Error Total Variance 2.7 5.5 5 6.7 20.91666 7 14.25 8.916666 7 4.916666 7 14.25 F P-value 26.60483 9 1.377E-05 6.629032 0.004692 3 5 F crit 3.49029 5 3.25916 7 Looking at the averages, it is easy to see that air conditioner is the best performing product followed by furnace. Air handler is the worst performing product, as it has the lowest mean purchase count over the five regions. It appears the NW region has the highest average purchase count on all product categories while Michigan has the lowest average purchase count. A lot of marketing need to be targeted to Michigan as previously stated. Row difference is significant, F(3, 12) = 26.60, p = 0.00001377. We reject the first null hypothesis and conclude that there is a difference in the mean purchase count between the four categories of products. Column difference is significant F(4, 12) = 6.629, p = 0.00469. We reject the second null hypothesis and conclude that there is significant difference in the mean purchase count between the five regions

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Mathematics Questions!