
Competencies

4159.1.1 : Profiles Data

The learner interprets a data dictionary to understand the data set.

4159.1.2 : Interprets Statistics and Visualization

The learner interprets probability, descriptive and inferential statistics, and visualization.

4159.1.3 : Wrangles Data

The learner wrangles data to ensure accuracy, format, and integrity relevant to the task being performed.

Introduction

Throughout your career in data analytics, you will analyze data according to business and data analytic needs. You will explore basic statistics, examine correlations among variables using visualization, and perform inferential statistical analysis to provide insights relevant to business requirements. In this task, you will aggregate and analyze a large health insurance company's data. Your goal is to uncover patterns, trends, and correlations to offer insights into business performance. You will deliver the results of your analysis to company stakeholders.

Scenario

Refer to the most recent company data provided in the "Health Insurance Dataset" and "Health Insurance Considerations and Dictionary" supporting documents to inform your work.

Requirements

Your submission must represent your original work and understanding of the course material. Most performance assessment submissions are automatically scanned through the WGU similarity checker. Students are strongly encouraged to wait for the similarity report to generate after uploading their work and then review it to ensure Academic Authenticity guidelines are met before submitting the file for evaluation. See Understanding Similarity Reports for more information.

Grammarly Note:

Professional Communication will be automatically assessed through Grammarly for Education in most performance assessments before a student submits work for evaluation. Students are strongly encouraged to review the Grammarly for Education feedback prior to submitting work for evaluation, as the overall submission will not pass without this aspect passing. See Use Grammarly for Education Effectively for more information.

Microsoft Files Note:

Write your paper in Microsoft Word (.doc or .docx) unless another Microsoft product, or pdf, is specified in the task directions. Tasks may not be submitted as cloud links, such as links to Google Docs, Google Slides, OneDrive, etc. All supporting documentation, such as screenshots and proof of experience, should be collected in a pdf file and submitted separately from the main file. For more information, please see Computer System and Technology Requirements.

Your responses to the following task prompts must be provided in a document. Unless otherwise specified, responses to PA requirements that are included in a Python or RStudio notebook will not be accepted.

Python and R Notes:

Work can be performed for this assessment using a locally installed interactive development environment (IDE), such as PyCharm, JupyterLab, RStudio, or the WGU Virtual Lab Environment, which can be accessed using the "WGU Virtual Lab Environment" web link below.

Part I: Univariate and Bivariate Statistical Analysis and Visualization

A. Using the provided dataset, do the following using R or Python:

1. Select four variables (e.g., two quantitative/numeric variables and two qualitative/categorical variables) and provide univariate visualizations for each variable selected.

2. Provide two bivariate visualizations for each variable selected from part A1.
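As a rough illustration of parts A1 and A2, the sketch below draws univariate and bivariate plots with pandas and matplotlib. The column names (`age`, `charges`, `smoker`, `region`) and the synthetic values are assumptions made only for this example; the real variable names come from the "Health Insurance Considerations and Dictionary" document, which is not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "charges": rng.gamma(shape=2.0, scale=5000.0, size=n),
    "smoker": rng.choice(["yes", "no"], size=n, p=[0.2, 0.8]),
    "region": rng.choice(["north", "south", "east", "west"], size=n),
})

# Univariate: histograms for numeric columns, bar charts for categorical ones.
fig_uni, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].hist(df["age"], bins=15)
axes[0, 0].set_title("age (histogram)")
axes[0, 1].hist(df["charges"], bins=15)
axes[0, 1].set_title("charges (histogram)")
for ax, col in [(axes[1, 0], "smoker"), (axes[1, 1], "region")]:
    counts = df[col].value_counts()
    ax.bar(counts.index, counts.values)
    ax.set_title(f"{col} (bar chart)")
fig_uni.tight_layout()

# Bivariate: numeric vs. numeric (scatter) and numeric by category (box plot).
fig_bi, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))
ax_scatter.scatter(df["age"], df["charges"], s=10)
ax_scatter.set_xlabel("age")
ax_scatter.set_ylabel("charges")

labels, groups = [], []
for name, grp in df.groupby("smoker"):
    labels.append(name)
    groups.append(grp["charges"].to_numpy())
ax_box.boxplot(groups)  # boxes are drawn at x positions 1..len(groups)
ax_box.set_xticks(range(1, len(labels) + 1))
ax_box.set_xticklabels(labels)
ax_box.set_xlabel("smoker")
ax_box.set_ylabel("charges")
fig_bi.tight_layout()
```

Calling `fig_uni.savefig("univariate.png")` would export a figure for the written document. The sketch shows only one bivariate plot per pairing; the task asks for two bivariate visualizations for each selected variable, so a real submission would need more pairings than are drawn here.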

B. Complete the following using the attached "Health Insurance Dataset" and R or Python:

1. Provide the descriptive statistics (e.g., mean, median, range, standard deviation, variance, percentiles, quartiles) for all quantitative (i.e., numeric) variables selected in the dataset.

2. Provide the descriptive statistics (e.g., frequency counts and percentages) for all qualitative (i.e., categorical) variables in the dataset.
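The statistics named in parts B1 and B2 can be sketched with pandas as follows. This is a minimal example under stated assumptions: the DataFrame is synthetic and the column names (`age`, `charges`, `smoker`, `region`) are hypothetical placeholders for whatever the real dataset contains.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "charges": rng.gamma(shape=2.0, scale=5000.0, size=n),
    "smoker": rng.choice(["yes", "no"], size=n, p=[0.2, 0.8]),
    "region": rng.choice(["north", "south", "east", "west"], size=n),
})

# B1: mean, median, standard deviation, variance, range, and quartiles.
numeric_cols = ["age", "charges"]
quant = df[numeric_cols].agg(["mean", "median", "std", "var", "min", "max"]).T
quant["range"] = quant["max"] - quant["min"]
quartiles = df[numeric_cols].quantile([0.25, 0.5, 0.75]).T
quartiles.columns = ["q1", "q2", "q3"]
quant_summary = quant.join(quartiles)
print(quant_summary)

# B2: frequency counts and percentages for each categorical column.
categorical_cols = ["smoker", "region"]
freq_tables = {}
for col in categorical_cols:
    counts = df[col].value_counts()
    percents = df[col].value_counts(normalize=True).mul(100).round(1)
    freq_tables[col] = pd.DataFrame({"count": counts, "percent": percents})
    print(f"\n{col}")
    print(freq_tables[col])
```

Note that pandas computes `std` and `var` as sample statistics (dividing by n - 1) by default, which is worth stating in the written document.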

Part II: Parametric Statistical Testing

C. Describe a real-world organizational situation or issue in the attached "Health Insurance Dataset" by doing the following:

1. Create one research question that is relevant to the dataset and any organizational needs that can be answered through data analysis and is appropriate for parametric testing.

D. Analyze the dataset by doing the following:

1. Identify a parametric statistical test that is relevant to your research question from part C1.

2. List the dataset variables relevant to answering your research question from part C1.

3. Justify why you chose the statistical test identified in part D1 based on variables.

4. Develop null and alternative hypotheses related to your chosen parametric test from part D1.

5. Write error-free code in either Python or R to run the parametric test and provide the output and the results of all calculations from the parametric statistical test you perform.

Note: Error-free code includes code that is free from syntax and logic errors.
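One possible shape for part D5 is sketched below, using Welch's two-sample t-test from SciPy. The choice of test, the hypothetical research question (do mean charges differ between smokers and non-smokers?), and the synthetic values are all assumptions for illustration; the appropriate parametric test depends on your own research question and variables.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in samples; group names and values are hypothetical.
rng = np.random.default_rng(2)
smoker_charges = rng.normal(loc=32000, scale=8000, size=60)
nonsmoker_charges = rng.normal(loc=9000, scale=6000, size=240)

# H0: mean charges are equal for smokers and non-smokers.
# H1: mean charges differ between the two groups.
# equal_var=False selects Welch's t-test, which does not assume equal variances.
result = stats.ttest_ind(smoker_charges, nonsmoker_charges, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

print(f"mean (smokers)     = {smoker_charges.mean():.2f}")
print(f"mean (non-smokers) = {nonsmoker_charges.mean():.2f}")
print(f"t statistic        = {t_stat:.3f}")
print(f"p-value            = {p_value:.3g}")
```

Printing the group means alongside the test statistic and p-value keeps all calculations visible in the output, which is what the prompt asks for.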

E. Evaluate parametric test results by doing the following:

1. Discuss the test results, including the decision to reject or fail to reject the null hypothesis from part D4.

2. Create an answer to your research question from part C1 based on the decision to reject or fail to reject the null hypothesis.

3. Explain how stakeholders in the organization benefit from your choice of testing method.
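The reject or fail-to-reject decision in part E1 comes down to comparing the p-value with a significance level chosen before running the test. The helper below is a hypothetical illustration; the name `decide` and the default level of 0.05 are assumptions, and some texts reject at p ≤ alpha rather than p < alpha.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a given p-value and alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.003))  # small p-value: evidence against H0
print(decide(0.410))  # large p-value: not enough evidence against H0
```

Failing to reject the null hypothesis is not the same as showing it is true; it only means the data did not provide enough evidence against it at the chosen level.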

F. Summarize the implications of your parametric statistical testing by doing the following:

1. Recommend a course of action based on your findings.

2. Discuss the limitations of your data analysis.

Note: One notebook can be submitted for both statistical tests.

Part III: Nonparametric Statistical Testing

G. Describe a real-world organizational situation or issue in the provided dataset by doing the following:

1. Create one research question that is relevant to the dataset and any organizational needs that can be answered through data analysis and is appropriate for nonparametric testing.

H. Analyze the dataset further by doing the following:

1. Identify a nonparametric statistical test that is relevant to your question from part G1.

2. List the dataset variables relevant to answering your research question from part G1.

3. Justify why you chose the statistical test identified in part H1 based on variables.

4. Develop null and alternative hypotheses related to your chosen nonparametric test from part H1.

5. Write error-free code in either Python or R to run the nonparametric test and provide a screenshot of the output and the results of all calculations from the nonparametric statistical test you performed.

Note: Error-free code includes code that is free from syntax and logic errors.
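For part H5, a nonparametric counterpart might look like the sketch below, which runs a Mann-Whitney U test with SciPy on two skewed samples. The test choice, the group names, and the synthetic values are assumptions for illustration only; other nonparametric tests may suit your research question better.

```python
import numpy as np
from scipy import stats

# Synthetic, right-skewed stand-in samples; group names are hypothetical.
rng = np.random.default_rng(3)
charges_group_a = rng.gamma(shape=2.0, scale=3000.0, size=100)
charges_group_b = rng.gamma(shape=2.0, scale=9000.0, size=100)

# H0: the two groups' charges come from the same distribution.
# H1: charges in one group tend to be larger than in the other.
result = stats.mannwhitneyu(
    charges_group_a, charges_group_b, alternative="two-sided"
)
u_stat, p_value = result.statistic, result.pvalue

print(f"median (group A) = {np.median(charges_group_a):.2f}")
print(f"median (group B) = {np.median(charges_group_b):.2f}")
print(f"U statistic      = {u_stat:.1f}")
print(f"p-value          = {p_value:.3g}")
```

A rank-based test like this does not require normally distributed data, which is a common justification when the variable of interest is heavily skewed.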

I. Evaluate nonparametric test results by doing the following:

1. Discuss the test results, including the decision to reject or fail to reject the null hypothesis from part H4.

2. Create an answer to your research question from part G1 based on the decision to reject or fail to reject the null hypothesis.

3. Explain how stakeholders in the organization benefit from your choice of testing method.

J. Summarize the implications of your nonparametric statistical testing by doing the following:

1. Recommend a course of action based on your findings.

2. Discuss the limitations of your data analysis.

Part IV: Panopto Video Submission

K. Submit your work by doing the following:

1. Provide a document that includes responses to task prompts through the Assessments section of the student portal.

2. Provide the annotated code for Parts I, II, and III as an executable script file. R files and Python script files are accepted.

Note: Error-free code includes code that is free from syntax and logic errors. Upload this file in the D599 Repository and name it "D599Task2." Provide a link to the GitLab repository that contains a copy of the executable script file using the R or Python language through the Assessments section of the student portal.

3. Provide a link to a Panopto video recording that includes a demonstration of the functionality of the code used for the analysis and an identification of the version of the programming environment. The demonstration must include a vocal presentation of all the listed elements.

Note: One notebook must be submitted for Parts I, II, and III.

Note: The audiovisual recording should feature you visibly presenting the material (i.e., not in voiceover or embedded video) and should simultaneously capture both you and the functioning code.

Note: For instructions on how to access and use Panopto, use the "Panopto How-To Videos" web link provided below. To access Panopto's website, navigate to the web link titled "Panopto Access" and then choose to log in using the "WGU" option. If prompted, log in using your WGU student portal credentials, and then it will forward you to Panopto's website.

To submit your recording, upload it to the Panopto drop box titled "Data Preparation and Exploration TCN2 | D599 (Student Creators)." Once the recording has been uploaded and processed in Panopto's system, retrieve the URL of the recording from Panopto and copy and paste it into the Links option. Upload the remaining task requirements using the Attachments option.

Sources

L. Acknowledge reference sources used to support the Python or R code application. All references listed should also include an in-text citation in the code annotation. Be sure the sources are reliable. If no sources were used for coding, state, "No sources used."

M. Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.

Professional Communication

N. Demonstrate professional communication in the content and presentation of your submission.

File Restrictions

File name may contain only letters, numbers, spaces, and these symbols: ! - _ . * ' ( )

File size limit: 200 MB

File types allowed: doc, docx, rtf, xls, xlsx, ppt, pptx, odt, pdf, csv, txt, qt, mov, mpg, avi, mp3, wav, mp4, wma, flv, asf, mpeg, wmv, m4v, svg, tif, tiff, jpeg, jpg, gif, png, zip, rar, tar, 7z

Rubric

A1:UNIVARIATE VISUALIZATIONS

Not Evident

A distribution of variables using univariate visualizations is not provided in the document.

Approaching Competence

The submitted document identifies the distribution of variables and provides univariate visualizations but does not cover 2 qualitative and 2 quantitative variables.

Competent

The submitted document accurately identifies the distribution of 2 qualitative and 2 quantitative variables and provides univariate visualizations for each variable using R or Python.

A2:BIVARIATE VISUALIZATIONS

Not Evident

A bivariate visualization is not provided in the document.

Approaching Competence

The submitted document accurately provides bivariate visualizations but not of all 4 variables from part A. Or the bivariate visualization contains inaccuracies. Or R or Python are not used.

Competent

The submitted document accurately provides 2 bivariate visualizations of all 4 variables from part A1 and uses R or Python.

B1:DESCRIPTIVE STATISTICS: QUANTITATIVE

Not Evident

The descriptive statistics for quantitative variables selected in the dataset are not provided in the document.

Approaching Competence

The submitted document provides the descriptive statistics for quantitative variables selected in the dataset but does not cover all quantitative variables. Or R or Python are not used.

Competent

The submitted document accurately provides the descriptive statistics for all quantitative variables selected in the dataset using R or Python.

B2:DESCRIPTIVE STATISTICS: QUALITATIVE

Not Evident

The descriptive statistics for qualitative variables are not provided in the document.

Approaching Competence

The submitted document provides descriptive statistics for qualitative variables but does not cover all qualitative variables or use R or Python.

Competent

The submitted document accurately provides the descriptive statistics for all qualitative variables selected in the dataset using R or Python.

C1:RESEARCH QUESTION

Not Evident

A research question is not provided in the document.

Approaching Competence

The research question cannot be addressed through analysis of the dataset. Or the question is not relevant to a realistic organizational need or situation represented in the dataset. Or the question is not appropriate for parametric testing.

Competent

The research question can be addressed through analysis of the dataset. The question is relevant to a realistic organizational need or situation represented in the dataset and is appropriate for parametric testing.

D1:PARAMETRIC TEST METHOD

Not Evident

A parametric test is not provided in the document.

Approaching Competence

The submitted document identifies a parametric test, but it is irrelevant to addressing the research question from part C1.

Competent

The submitted document correctly identifies a parametric test that is relevant to addressing the research question from part C1.

D2:DATASET VARIABLES

Not Evident

The dataset variables are not provided in the document.

Approaching Competence

The submitted document lists the dataset variables, but they are irrelevant to addressing the research question from part C1.

Competent

The submitted document correctly lists the dataset variables that are relevant to addressing the research question from part C1.

D3:JUSTIFICATION FOR PARAMETRIC TEST

Not Evident

A justification is not provided in the document.

Approaching Competence

The submitted document provides a justification, but the justification does not address why the chosen method of parametric testing was selected. Or the justified technique is not the same as the testing method identified in part D1. Or the chosen technique is insufficient or inappropriate for the dataset or does not address the question from part C1.

Competent

The submitted document provides a justification that addresses why the chosen method of parametric testing was selected. The justified technique is the same as the testing method identified in part D1. The technique is appropriate for the chosen dataset and addresses the question from part C1.

D4:DEVELOP PARAMETRIC HYPOTHESES

Not Evident

The submission does not develop null and alternative hypotheses in the document.

Approaching Competence

The submitted document develops null and alternative hypotheses, but they are inaccurate or not related to the chosen parametric test from part D1.

Competent

The submitted document develops accurate null and alternative hypotheses that are related to the chosen parametric test from part D1.

D5:PARAMETRIC TEST CODE

Not Evident

The submission does not provide any code, a screenshot of any output from running the code, or results of calculations in the document.

Approaching Competence

The submitted document includes code that has errors or does not accurately use a parametric statistical technique to analyze the data. Or the submitted document includes a screenshot of either the output from running the code or the results of the calculations but not both. Or the submission includes only some results of the calculations.

Competent

The submitted document includes error-free code to accurately analyze the dataset using the parametric technique identified in part D1 and includes a screenshot of both the output from running the code and the results of all calculations performed.

E1:PARAMETRIC HYPOTHESIS SUPPORT

Not Evident

A discussion of test results in terms of hypotheses is not provided in the document.

Approaching Competence

The submitted document discusses test results but does not provide a decision to reject or fail to reject the null hypothesis identified in part D4.

Competent

The submitted document discusses parametric test results, including the decision to reject or fail to reject the null hypothesis identified in part D4.

E2:ANSWER TO PARAMETRIC RESEARCH QUESTION

Not Evident

An answer to the research question is not provided in the document.

Approaching Competence

The submitted document provides an answer that is discussed but is incomplete or does not answer the question from part C1.

Competent

The submitted document provides an answer that correctly and completely addresses the research question from part C1.

E3:BENEFIT OF PARAMETRIC TESTING

Not Evident

An explanation is not provided in the document.

Approaching Competence

The submitted document provides an explanation that is not specific to stakeholders in the organization. Or the submission does not explain how stakeholders could benefit from the data analysis. Or the explanation includes incorrect information.

Competent

The submitted document provides an explanation that correctly addresses how stakeholders in the organization could benefit from the data analysis.

F1:RECOMMENDED COURSE OF ACTION

Not Evident

A recommended course of action is not provided in the document.

Approaching Competence

The submitted document provides a recommendation that includes only a response to the question from part C1, but the recommendation is missing a course of action that could be taken in response to the analysis. Or the recommendation is irrelevant to the situation or question or would not plausibly address the situation or question.

Competent

The submitted document provides a recommendation that includes both a response to the question from part C1 and a course of action that could be taken in response to the analysis. The recommendation is relevant to the situation and question and would plausibly address the situation and question.

F2:LIMITATIONS OF PARAMETRIC DATA ANALYSIS

Not Evident

An explanation of limitations is not provided in the document.

Approaching Competence

The submitted document provides an explanation that includes inaccurate limitations of the data analysis. Or 1 or more of the limitations provided are not applicable to the analysis.

Competent

The submitted document provides an explanation that includes the limitations of the data analysis, and all limitations provided apply to the analysis.

G1:RESEARCH QUESTION

Not Evident

A research question is not provided in the document.

Approaching Competence

The research question provided in the submitted document is not relevant to a realistic organizational need or situation represented in the dataset, cannot be addressed through analysis of the dataset, or is not appropriate for nonparametric testing.

Competent

The research question provided in the submitted document is relevant to a realistic organizational need or situation represented in the dataset, can be answered through data analysis, and is appropriate for nonparametric testing.

H1:NONPARAMETRIC TEST METHOD

Not Evident

A nonparametric test is not provided in the document.

Approaching Competence

The submitted document identifies a nonparametric test, but it is irrelevant to addressing the research question from part G1.

Competent

The submitted document correctly identifies a nonparametric test that is relevant to addressing the research question from part G1.

H2:DATASET VARIABLES

Not Evident

The dataset variables are not provided in the document.

Approaching Competence

The submitted document lists the dataset variables, but they are irrelevant to addressing the research question from part G1.

Competent

The submitted document accurately lists the dataset variables relevant to answering the research question from part G1.

H3:JUSTIFICATION FOR NONPARAMETRIC TEST

Not Evident

A justification is not provided in the document.

Approaching Competence

The submitted document provides a justification, but the justification does not address why the chosen method of nonparametric testing was selected. Or the justified technique is not the same as the testing method identified in part H1. Or the chosen technique is insufficient or inappropriate for the dataset or does not address the question from part G1.

Competent

The submitted document provides a justification that addresses why the chosen method of nonparametric testing was selected. The justified technique is the same as the testing method identified in part H1. The technique is appropriate for the chosen dataset and addresses the question from part G1.

H4:DEVELOP NONPARAMETRIC HYPOTHESES

Not Evident

The submission does not develop null and alternative hypotheses in the document.

Approaching Competence

The submitted document develops null and alternative hypotheses, but they are not related to the chosen nonparametric test from part H1.

Competent

The submitted document develops null and alternative hypotheses that are related to the chosen nonparametric test from part H1.

H5:NONPARAMETRIC TEST CODE

Not Evident

The submission does not provide any code, any output from running the code, or results of calculations in the document.

Approaching Competence

The submitted document includes code that has errors or does not accurately use a nonparametric statistical technique to analyze the data or includes either the output from running the code or the results of the calculations but not both. Or the submission includes only some results of the calculations.

Competent

The submitted document includes error-free code to accurately analyze the dataset using the nonparametric technique identified in part H1 and includes both the output from running the code and the results of all calculations performed.

I1:NONPARAMETRIC HYPOTHESIS SUPPORT

Not Evident

A discussion of test results in terms of hypotheses is not provided in the document.

Approaching Competence

The submitted document discusses test results but does not provide a decision to reject or fail to reject the null hypothesis identified in part H4.

Competent

The submitted document discusses nonparametric test results, including the decision to reject or fail to reject the null hypothesis identified in part H4.

I2:ANSWER TO NONPARAMETRIC RESEARCH QUESTION

Not Evident

An answer to the research question is not provided in the document.

Approaching Competence

The submitted document provides an answer that is discussed but is incomplete or does not answer the question from part G1.

Competent

The submitted document provides an answer that correctly and completely addresses the research question from part G1.

I3:BENEFIT OF NONPARAMETRIC DATA ANALYSIS

Not Evident

An explanation is not provided in the document.

Approaching Competence

The submitted document provides an explanation that is not specific to stakeholders in the organization. Or the submission does not explain how stakeholders could benefit from the data analysis. Or the explanation includes incorrect information.

Competent

The submitted document provides an explanation that correctly addresses how stakeholders in the organization could benefit from the data analysis.

J1:RECOMMENDED COURSE OF ACTION

Not Evident

A recommended course of action is not provided in the document.

Approaching Competence

The submitted document provides a recommendation that includes only a response to the question from part G1, but the recommendation is missing a course of action that could be taken in response to the analysis. Or the recommendation is irrelevant to the situation or question or would not plausibly address the situation or question.

Competent

The submitted document provides a recommendation that includes both a response to the question from part G1 and a course of action that could be taken in response to the analysis. The recommendation is relevant to the situation and question and would plausibly address the situation and question.

J2:LIMITATIONS OF NONPARAMETRIC DATA ANALYSIS

Not Evident

An explanation of limitations is not provided in the document.

Approaching Competence

The submitted document provides an explanation that includes inaccurate limitations of the data analysis. Or 1 or more of the limitations provided are not applicable to the analysis.

Competent

The submitted document provides an explanation that includes the limitations of the data analysis, and all limitations provided apply to the analysis.

K1:DOCUMENT

Not Evident

No document is provided.

Approaching Competence

Not applicable.

Competent

The submitted document provides responses to task prompts.

K2:SUBMIT CODE

Not Evident

No code is provided.

Approaching Competence

The submitted code for Parts I, II, and III is incomplete or is not in an annotated and executable script file. Or the code provided could not be used to mitigate data quality issues in the dataset.

Competent

The submitted code for Parts I, II, and III is in an annotated and executable script file that could be used to mitigate the data quality issues in the dataset and is error-free. The file is uploaded into the D599 Repository, and the GitLab repository URL is included.

K3:PANOPTO VIDEO

Not Evident

A Panopto video recording is not provided.

Approaching Competence

A Panopto video recording is provided that includes a vocalized demonstration of the functionality of the code but does not include the identification of the version of the programming environment, does not implement a vocal presentation, or does not have all of the listed elements. Or the video does not capture both the presenter and the functioning code for the duration of the video.

Competent

A link to a Panopto video recording is provided that includes a vocalized demonstration of the functionality of the code used for the analysis and an identification of the version of the programming environment. The demonstration includes all of the listed elements. For the duration of the presentation, the video captures both the presenter and the functioning code in a Panopto video recording.

L:SOURCES OF THIRD-PARTY CODE

Not Evident

Referenced sources used are not acknowledged.

Approaching Competence

The submission acknowledges only some of the referenced sources used to acquire data or third-party code. Or the referenced sources are not reliable.

Competent

The submission acknowledges all referenced sources used to acquire data or third-party code, and all the sources are reliable.

M:SOURCES

Not Evident

The submission does not include both in-text citations and a reference list for sources that are quoted, paraphrased, or summarized.

Approaching Competence

The submission includes in-text citations for sources that are quoted, paraphrased, or summarized and a reference list; however, the citations or reference list is incomplete or inaccurate.

Competent

The submission includes in-text citations for sources that are properly quoted, paraphrased, or summarized and a reference list that accurately identifies the author, date, title, and source location as available, or the submission states no sources were used.

N:PROFESSIONAL COMMUNICATION

Not Evident

This submission includes pervasive errors in professional communication related to grammar, sentence fluency, contextual spelling, or punctuation, negatively impacting the professional quality and clarity of the writing. Specific errors have been identified by Grammarly for Education under the Correctness category.

Approaching Competence

This submission includes substantial errors in professional communication related to grammar, sentence fluency, contextual spelling, and/or punctuation. Specific errors have been identified by Grammarly for Education under the Correctness category.

Competent

This submission includes satisfactory use of grammar, sentence fluency, contextual spelling, and punctuation, which promote accurate interpretation and understanding.

Web Links

Panopto Access

Sign in using the "WGU" option. If prompted, log in with your WGU student portal credentials, which should forward you to Panopto's website. If you have any problems accessing Panopto, please contact Assessment Services at a..s@wgu.edu. It may take up to two business days to receive your WGU Panopto recording permissions once you have begun the course.

Panopto Dropbox

Data Preparation and Exploration TCN2 | D599 (Student Creators)

Panopto FAQs

Panopto How-To Videos

WGU Virtual Lab Environment

WGU GitLab Environment - WGU Community

Supporting Documents

Health Insurance Dataset.csv

I can't upload the dataset here.
