Question: Dataset name:Medical Cost Personal Dataset. please use this data set for below questions https://www.kaggle.com/datasets/mirichoi0218/insurance Dataset and task description Describe the data analytics task for the

Dataset name:"Medical Cost Personal Dataset". please use this data set for below questions

https://www.kaggle.com/datasets/mirichoi0218/insurance

Dataset and task description

Describe the data analytics task for the chosen dataset. How is the dataset used to solve a business problem? Show the dataset characteristics, namely the number of instances, the number of numerical and categorical variables, and the description and the domain for each.

2. Descriptive statistics

a. Categorical variables

2.a.1 Consider one categorical variable and create a frequency table including frequency, relative frequency, cumulative frequency and relative cumulative frequency. [5pts]

Provide a summative interpretation of the values in the frequency table. What is the most frequent category and the least frequent ones with their percentages?
Report any invalid values or errors in the data and explain how to correct them.
Correct the errors in the data and show the frequency table for each categorical variable after the corrections.

2.a.2 Create a pivot table with one categorical variable selected in the ROWS, one numerical variable in the COLUMNS (apply grouping), and the average of the target variable in the VALUES. When a column contains a missing value, it is represented with '?', grouping does not work. You will have to replace those missing values with the mean, save the worksheet and recreate the pivot table. Show the pivot chart and interpret. [10pts]

b. Numerical variables

2.b.1 Provide a table with the list of all numerical variables, their min, max, average, Q1, Q2 and Q3 and Q4. Provide an interpretation of the quartiles.

2.b.2. Create a histogram for one numerical variable. What is the shape of the distribution? Interpret? If it is right or left skewed, relate this to a meaningful sentence explaining what most range most of the values fall in. If it is a normal distribution, use the empirical rule to report the percentage of data points and associated range of values.

2.b.3. Create a box plot for two numerical variables. You may try many numerical variables and keep only the one that exhibits outliers. Interpret the results (how is the variability of the distribution (high/low), are there outliers?

Please share the Excel file as well, or you can share the link to Google share

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Mathematics Questions!

Question: What as the average weekly safety inventory level of refined sugar from the beginning January 2022 to the end of July 2022? A. 512,465.9691 metric tons per week B. 316,002.1474 metric tons...

Planning is one of the most important management functions in any business. A front office managers first step in planning should involve determine the departments goals. Planning also includes...

Table of Contents Main Objective of the assessment 1 Description of the Assessment 1 Learning Outcomes and Marking Criteria. 4 Format of the Assessment 6 Submission Instructions. 7 Avoiding...

Dataset name:"Medical Cost Personal Dataset". please use this data set for below questions : https://www.kaggle.com/datasets/mirichoi0218/insurance *Note: - Please do this in excel and Provide excel...

Analytics Report Overview The purpose of this task is to provide students with practical experience in writing a data analytical report to provide useful insights, pattern and trends in a chosen...

As you can see in the picture there is task given. Please slove the each task please slove the task in the picture Dverview Timelines and Expectations Trevoniage Valin of Take: 3SSY But. Week. 11...

Dataset and task description Describe the data analytics task for the chosen dataset. How does the dataset used to solve a business problem? Show the dataset characteristics, namely the number of...

Description The assignment requires that you analyse a data set, interpret, and draw conclusions from your analysis, and then convey your conclusions in a written report. The assignment must be...

Data Management and Data Analytics . Objective of the project . Base SAS Programming Using SAS Studio on SAS Viya C. SAS Visual Data Mining and machine Learning (In some questions Base SAS...

The resulting bar chart shows that when HMK is the AR Clerk and FKL is the Cash Receipts Clerk, CT is the GL Accounting Clerk for $226,851 of current AR balances. However, there are $25,352 of...

1. How is the approach that BAA has taken here with this project different from the traditional approach to working with suppliers? 2. Why does the company want to own all of the risk all of the...

3. (a) Find the Laplace Transform of g(t), h(t) and k(t) shown in Figure 2. [6] | 9(t) |h(t) |k(t) 15 15 15 20 1- 10 10 10 15 20 Figure 2: Functions for problem 3a (b) Show that the Laplace Transform...

Select all that apply Which of the following are true of capital structure decisions? Multiple select question. Companies in the banking industry choose low debt - equity ratios. Financial economists...

1.What types of molecules pass through the glomerulus? 2.Should blood cells pass through the glomerulus under normal conditions? 3. patient has a eGFR of 10mL/min, what level of kidney disease is...

1. Name the vessel that brings blood out of the kidney and back to the heart. 2. Name the vessel that brings blood into the kidney. 3. Name the inner layer of the kidney. 4. Name the duct that brings...

Use the following information to answer the question on page 2 below: Note: all sales are credit sales. Compute each of the following ratios for 2017 and 2018 and indicate whether each ratio was...

1. Name the endocrine gland above the kidney. 2. Name the organ. 3. Urine leaves the kidney through this duct. 4. Blood leaves the kidney through this major vessel. 5. Blood enters the kidney through...

How does filtrate flow through the kidney? Order the flow of filtrate from glomerulus to toilet (urine) urinary bladder collecting duct minor calyx nephron loop (loop of Henle) descending limb -...

Design a minimum-mass symmetric three-bar truss (the area of member 1 and that of member 3 are the same) to support a load P, as was shown in Fig. 2.9. The following notation may be used: Pu = P cos...

45. Consider an irreducible finite Markov chain with states 0, 1, . . . ,N. (a) Starting in state i, what is the probability the process will ever visit state j? Explain! (b) Let xi = P{visit state N...

62. Consider the particle from Exercise 57. What is the expected number of steps the particle takes to return to the starting position? What is the probability that all other positions are visited...

63. For the Markov chain with states 1, 2, 3, 4 whose transition probability matrix P is as specified below find fi3 and si3 for i = 1, 2, 3. 0.4 0.2 0.1 0.3] 0.1 0.5 0.2 0.2 P = 0.3 0.4 0.2 0.1 0 0...