A2.1 - HBAT Data Analysis
Due: Sunday by 12:59am
Points: 100
Submission type: file upload
Allowed attempts: 1
Available: Sep 18 at 8am to Sep 28 at 12:59am

Assignment Description
This assignment is a hands-on data assessment and evaluation. It exposes you to the steps that data analysts go through when they receive a dataset. This is an individual assignment: each student is expected to complete it and submit their own work.

Assignment Deliverable
The submitted report should include a detailed discussion of the techniques you used to answer each question (see below). Include only relevant graphs and tables, each with a thorough interpretation. Graphs and/or tables with no explanation or interpretation will not be graded. Hand in your work in Word or PDF format.

Dataset for the Assignment
Download the HBAT_Missing file. It has a sample size of 70.

Assignment Objective
The main objective of this assignment is to develop a better understanding of the HBAT dataset. More specifically, you will explore the characteristics of its customers and the relationship between their perceptions of HBAT and their actions towards HBAT. Make sure to address all of the following questions.

1. Run a thorough univariate and bivariate graphical and statistical examination of your data. Do you notice any irregularities in your data? What does your data look like: normal or skewed?
Tips: Make sure to number and label each table and graph (e.g., Table 1. Summary statistics for the HBAT missing data set). Provide a title and a detailed interpretation of each chart. Remember the rule of thumb: any table and/or graph supports at most three genuine findings. If you have more, you have probably made some of them up.

2. Missing values analysis. Do you have any missing values in your data? If so, determine the extent of missing values per variable and per case. Are there any variables/cases that you need to delete?
Use 30% missing values as the threshold for deletion. After deleting variables and/or cases with 30% missing data, construct summary statistics for your data. Do you still observe any missing values? If you do, decide how to impute them. Limit your imputation technique to mean or median substitution, and justify your choice.

3. Detection and treatment of outliers. Are there any univariate outliers in your dataset? Use both Tukey's fences and the z-score approach to identify them. Set the z threshold at 2.5, since you have a small sample size. Do you notice any discrepancy between the two methods? Explain. How many values were detected as outliers? Will you keep these outliers or delete them? Justify your decision and discuss its impact on the remaining data analysis.

4. After treating missing values and outliers, construct summary statistics for your data. Compare and contrast your results with those from question 1. Develop two hypothetical questions that you can answer using graphical and/or empirical techniques, and provide correct answers.

5. If you did not treat missing values and/or outliers, what would the impact be on subsequent data analysis?

Context
HBAT is a manufacturer of paper products. You are presented with a hypothetical dataset based on surveys of HBAT customers. The surveys were completed on a secure website managed by an established marketing research company.

Sample size
There are 70 observations on 14 separate variables. They come from a market segmentation study of HBAT customers in two industries: the newsprint industry and the magazine industry.

Categories of data
Numerical variables: V1 to V9.
Categorical variables: V10 to V14.
Additional information about the variables is available on the Metadata spreadsheet of the Excel file (HBAT missing).

For detecting outliers, you are already familiar with the boxplot method as well as Tukey's fences.
You can also use the z-score approach. To calculate the z value for each observation, you can use Excel's built-in STANDARDIZE function. For missing value analysis, you can use the COUNT function, which counts the cells in a range that contain numbers. Use that count to determine the extent of missing values per case and per variable. The following video can also help you complete the assignment: Introduction to Pivot Tables.
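The missing-value workflow in question 2 can also be sketched outside Excel. It has three steps: measure the share of empty cells per variable and per case, drop anything over the 30% threshold, and then impute what is left. The Python below is a minimal sketch under stated assumptions, not the required method:

- The toy data are hypothetical and not taken from HBAT_Missing.
- `None` stands for an empty cell.
- "Over the threshold" is read as strictly more than 30% missing.
- Variables are screened before cases.
- Median substitution is used.

`missing_share` plays the role of 1 - COUNT(range)/number of cells in Excel.

```python
from statistics import median

THRESHOLD = 0.30  # assumption: drop a variable/case when MORE than 30% of its cells are missing


def missing_share(values):
    """Fraction of cells that are empty (None); Excel analogue: 1 - COUNT(range) / number of cells."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_variables(data, threshold=THRESHOLD):
    """Keep only the variables (columns) whose missing share is within the threshold."""
    return {name: col for name, col in data.items() if missing_share(col) <= threshold}


def drop_sparse_cases(data, threshold=THRESHOLD):
    """Drop cases (rows) whose missing share across the remaining variables exceeds the threshold."""
    n_cases = len(next(iter(data.values())))
    keep = [i for i in range(n_cases)
            if missing_share([col[i] for col in data.values()]) <= threshold]
    return {name: [col[i] for i in keep] for name, col in data.items()}


def impute_median(data):
    """Replace each remaining None with the median of that variable's observed values."""
    filled = {}
    for name, col in data.items():
        fill = median([v for v in col if v is not None])
        filled[name] = [fill if v is None else v for v in col]
    return filled


# Hypothetical toy data (NOT from HBAT_Missing): 5 variables x 6 cases, None = empty cell.
raw = {
    "V1": [4.1, 3.9, 3.8, 5.0, 4.4, 4.6],
    "V2": [2.3, None, 2.5, 3.1, 2.9, 2.7],
    "V3": [None, None, 6.0, None, 5.8, 6.1],
    "V4": [7.0, None, 6.5, 8.0, 7.2, 7.5],
    "V5": [5.5, 5.1, 4.9, None, 5.3, 5.0],
}

# Extent of missing values per variable.
for name, col in raw.items():
    print(f"{name}: {missing_share(col):.0%} missing")

reduced = drop_sparse_cases(drop_sparse_variables(raw))  # variables first, then cases
clean = impute_median(reduced)
print(clean)
```

With this toy data, the steps run as follows:

- V3 (50% missing) is dropped.
- The second case (2 of its 4 remaining cells missing) is dropped.
- The single remaining gap in V5 is filled with the median of its observed values, about 5.15.

Swapping `median` for `statistics.mean` gives mean substitution instead. The median is usually preferred when a variable is skewed or contains outliers, because it is less affected by extreme values.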
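For question 3, the two outlier rules can be compared side by side. This is a minimal sketch that rests on these assumptions:

- Tukey's multiplier is the conventional k = 1.5; the assignment does not state k.
- Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, intended to correspond to Excel's QUARTILE.INC.
- z-scores use the sample standard deviation. That is what Excel's STANDARDIZE returns when its standard_dev argument comes from STDEV.S.
- The sample values are hypothetical.

```python
from statistics import mean, quantiles, stdev

K = 1.5         # conventional Tukey multiplier (assumption: the assignment does not state k)
Z_CUTOFF = 2.5  # z threshold given in the assignment for the small sample


def tukey_outliers(values, k=K):
    """Return values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def z_scores(values):
    """Standardize each value: (x - mean) / sample standard deviation (like STANDARDIZE with STDEV.S)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def z_outliers(values, cutoff=Z_CUTOFF):
    """Return values whose absolute z-score exceeds the cutoff."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > cutoff]


# Hypothetical sample for illustration only (not HBAT data).
sample = [5, 6, 6, 7, 7, 7, 8, 8, 9, 12, 20]

print("Tukey fences flag:", tukey_outliers(sample))
print("z-score rule flags:", z_outliers(sample))
```

On this sample the quartiles are 6.5 and 8.5, so the upper fence is 11.5. Tukey's fences therefore flag both 12 and 20, while only 20 exceeds the |z| > 2.5 cutoff (z of about 2.70). The mean and standard deviation are themselves pulled toward extreme values, whereas the quartiles are not, so the two rules can disagree. This is one kind of discrepancy worth discussing in the report.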