Question:
Competencies
4159.1.1 : Profiles Data
The learner interprets a data dictionary to understand the data set.
4159.1.2 : Interprets Statistics and Visualization
The learner interprets probability, descriptive and inferential statistics, and visualization.
4159.1.3 : Wrangles Data
The learner wrangles data to ensure accuracy, format, and integrity relevant to the task being performed.
Introduction
Throughout your career in data analytics, you will analyze data according to business and data analytic needs. You will explore basic statistics, examine correlations among variables using visualization, and perform inferential statistical analysis to provide insights relevant to business requirements. In this task, you will aggregate and analyze a large health insurance company's data. Your goal is to uncover patterns, trends, and correlations to offer insights into business performance. You will deliver the results of your analysis to company stakeholders.
Scenario
Refer to the most recent company data provided in the "Health Insurance Dataset" and "Health Insurance Considerations and Dictionary" supporting documents to inform your work.
Requirements
Your submission must represent your original work and understanding of the course material. Most performance assessment submissions are automatically scanned through the WGU similarity checker. Students are strongly encouraged to wait for the similarity report to generate after uploading their work and then review it to ensure Academic Authenticity guidelines are met before submitting the file for evaluation. See Understanding Similarity Reports for more information.
Grammarly Note:
Professional Communication will be automatically assessed through Grammarly for Education in most performance assessments before a student submits work for evaluation. Students are strongly encouraged to review the Grammarly for Education feedback prior to submitting work for evaluation, as the overall submission will not pass without this aspect passing. See Use Grammarly for Education Effectively for more information.
Microsoft Files Note:
Write your paper in Microsoft Word (.doc or .docx) unless another Microsoft product, or pdf, is specified in the task directions. Tasks may not be submitted as cloud links, such as links to Google Docs, Google Slides, OneDrive, etc. All supporting documentation, such as screenshots and proof of experience, should be collected in a pdf file and submitted separately from the main file. For more information, please see Computer System and Technology Requirements.
Your responses to the following task prompts must be provided in a document. Unless otherwise specified, responses to PA requirements that are included in a Python or RStudio notebook will not be accepted.
Python and R Notes:
Work can be performed for this assessment using a locally installed integrated development environment (IDE), such as PyCharm, JupyterLab, or RStudio, or using the WGU Virtual Lab Environment, which can be accessed using the "WGU Virtual Lab Environment" web link below.
Part I: Univariate and Bivariate Statistical Analysis and Visualization
A. Using the provided dataset, do the following using R or Python:
1. Select four variables (e.g., two quantitative/numeric variables and two qualitative/categorical variables) and provide univariate visualizations for each variable selected.
2. Provide two bivariate visualizations for each variable selected from part A1.
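As a rough illustration of parts A1 and A2, the sketch below draws one univariate plot per variable and two kinds of bivariate plot with pandas and matplotlib. The column names (`age`, `charges`, `smoker`, `region`) and the small inline data frame are hypothetical stand-ins, not the contents of the provided dataset; the real variable names should come from the data dictionary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows; in the real task, load the provided CSV instead,
# e.g. df = pd.read_csv("Health Insurance Dataset.csv")
df = pd.DataFrame({
    "age": [19, 23, 31, 35, 42, 47, 52, 58, 61, 64],
    "charges": [1700.0, 2400.0, 3900.0, 21000.0, 7100.0,
                8200.0, 36000.0, 11900.0, 13400.0, 47000.0],
    "smoker": ["no", "no", "no", "yes", "no", "no", "yes", "no", "no", "yes"],
    "region": ["north", "south", "east", "west", "north",
               "south", "east", "west", "north", "south"],
})

# A1: one univariate plot per variable
# (histograms for numeric columns, bar charts of counts for categorical ones)
fig_uni, axes_uni = plt.subplots(2, 2, figsize=(10, 8))
df["age"].plot.hist(ax=axes_uni[0, 0], bins=5, title="Age (histogram)")
df["charges"].plot.hist(ax=axes_uni[0, 1], bins=5, title="Charges (histogram)")
df["smoker"].value_counts().plot.bar(ax=axes_uni[1, 0], title="Smoker (counts)")
df["region"].value_counts().plot.bar(ax=axes_uni[1, 1], title="Region (counts)")
fig_uni.tight_layout()

# A2: two example bivariate plots
# (numeric vs. numeric as a scatter plot, numeric vs. categorical as a box plot)
fig_bi, axes_bi = plt.subplots(1, 2, figsize=(10, 4))
df.plot.scatter(x="age", y="charges", ax=axes_bi[0], title="Charges vs. age")
df.boxplot(column="charges", by="smoker", ax=axes_bi[1])
axes_bi[1].set_title("Charges by smoker status")
fig_bi.suptitle("")  # drop the automatic "Boxplot grouped by ..." super-title
fig_bi.tight_layout()
```

Calling `fig_uni.savefig(...)` or `fig_bi.savefig(...)` would export the figures for the written document. The sketch shows only two bivariate patterns; part A2 asks for two bivariate visualizations for each of the four selected variables, so further pairings (for example, a categorical-by-categorical stacked bar chart) would still be needed.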
B. Complete the following using the attached "Health Insurance Dataset" and R or Python:
1. Provide the descriptive statistics (e.g., mean, median, range, standard deviation, variance, percentiles, quartiles) for all quantitative (i.e., numeric) variables selected in the dataset.
2. Provide the descriptive statistics (e.g., frequency counts and percentages) for all qualitative (i.e., categorical) variables in the dataset.
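The descriptive statistics in parts B1 and B2 could be produced along the following lines with pandas. This is a minimal sketch under assumptions: the column names and the inline rows are hypothetical placeholders for the provided dataset.

```python
import pandas as pd

# Hypothetical stand-in rows; replace with pd.read_csv("Health Insurance Dataset.csv")
df = pd.DataFrame({
    "age": [19, 23, 31, 35, 42, 47, 52, 58, 61, 64],
    "charges": [1700.0, 2400.0, 3900.0, 21000.0, 7100.0,
                8200.0, 36000.0, 11900.0, 13400.0, 47000.0],
    "smoker": ["no", "no", "no", "yes", "no", "no", "yes", "no", "no", "yes"],
    "region": ["north", "south", "east", "west", "north",
               "south", "east", "west", "north", "south"],
})
numeric_cols = ["age", "charges"]        # assumed quantitative variables
categorical_cols = ["smoker", "region"]  # assumed qualitative variables

# B1: count, mean, std, min, percentiles/quartiles, and max from describe(),
# plus median, sample variance (ddof=1), and range added explicitly
quant_stats = df[numeric_cols].describe(percentiles=[0.1, 0.25, 0.5, 0.75, 0.9]).T
quant_stats["median"] = df[numeric_cols].median()
quant_stats["variance"] = df[numeric_cols].var()
quant_stats["range"] = df[numeric_cols].max() - df[numeric_cols].min()
print(quant_stats.round(2))

# B2: frequency counts and percentages for each categorical variable
freq_tables = {}
for col in categorical_cols:
    counts = df[col].value_counts()
    freq_tables[col] = pd.DataFrame({
        "count": counts,
        "percent": (counts / counts.sum() * 100).round(1),
    })
    print(f"\n{col}\n{freq_tables[col]}")
```

Note that pandas reports the sample standard deviation and variance by default; if population values are wanted, `ddof=0` can be passed to `std()` and `var()`.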
Part II: Parametric Statistical Testing
C. Describe a real-world organizational situation or issue in the attached "Health Insurance Dataset" by doing the following:
1. Create one research question that is relevant to the dataset and any organizational needs that can be answered through data analysis and is appropriate for parametric testing.
D. Analyze the dataset by doing the following:
1. Identify a parametric statistical test that is relevant to your research question from part C1.
2. List the dataset variables relevant to answering your research question from part C1.
3. Justify why you chose the statistical test identified in part D1 based on variables.
4. Develop null and alternative hypotheses related to your chosen parametric test from part D1.
5. Write error-free code in either Python or R to run the parametric test and provide the output and the results of all calculations from the parametric statistical test you perform.
Note: Error-free code includes code that is free from syntax and logic errors.
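To make part D concrete, here is one possible shape for the parametric test code, using an independent two-sample t-test (Welch's variant) from SciPy. Everything specific in it is an assumption made for illustration: the research question (do mean charges differ between smokers and non-smokers?), the column names, and the inline data. Under that assumed question, the null hypothesis is that the two group means are equal, and the alternative is that they differ.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in rows; replace with pd.read_csv("Health Insurance Dataset.csv")
df = pd.DataFrame({
    "smoker": ["yes"] * 6 + ["no"] * 6,
    "charges": [21000.0, 36000.0, 47000.0, 29500.0, 41000.0, 33000.0,
                1700.0, 2400.0, 3900.0, 7100.0, 8200.0, 11900.0],
})

# Split the numeric outcome by the two levels of the grouping variable
smokers = df.loc[df["smoker"] == "yes", "charges"]
non_smokers = df.loc[df["smoker"] == "no", "charges"]

# Welch's t-test: does not assume equal variances in the two groups
result = stats.ttest_ind(smokers, non_smokers, equal_var=False)

alpha = 0.05  # assumed significance level
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4g}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

A different question would call for a different test (for example, a one-way ANOVA for three or more groups or a Pearson correlation for two numeric variables); the choice should follow from the variables listed in part D2.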
E. Evaluate parametric test results by doing the following:
1. Discuss the test results, including the decision to reject or fail to reject the null hypothesis from part D4.
2. Create an answer to your research question from part C1 based on the decision to reject or fail to reject the null hypothesis.
3. Explain how stakeholders in the organization benefit from your choice of testing method.
F. Summarize the implications of your parametric statistical testing by doing the following:
1. Recommend a course of action based on your findings.
2. Discuss the limitations of your data analysis.
Note: One notebook can be submitted for both statistical tests.
Part III: Nonparametric Statistical Testing
G. Describe a real-world organizational situation or issue in the provided dataset by doing the following:
1. Create one research question that is relevant to the dataset and any organizational needs that can be answered through data analysis and is appropriate for nonparametric testing.
H. Analyze the dataset further by doing the following:
1. Identify a nonparametric statistical test that is relevant to your question from part G1.
2. List the dataset variables relevant to answering your research question from part G1.
3. Justify why you chose the statistical test identified in part H1 based on variables.
4. Develop null and alternative hypotheses related to your chosen nonparametric test from part H1.
5. Write error-free code in either Python or R to run the nonparametric test and provide a screenshot of the output and the results of all calculations from the nonparametric statistical test you performed.
Note: Error-free code includes code that is free from syntax and logic errors.
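For part H, a nonparametric counterpart might look like the sketch below, which runs a Kruskal-Wallis H test with SciPy. The research question (do charges differ across regions?), the column names, and the inline data are all hypothetical. Under that assumed question, the null hypothesis is that charges follow the same distribution in every region, and the alternative is that at least one region differs.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in rows; replace with pd.read_csv("Health Insurance Dataset.csv")
df = pd.DataFrame({
    "region": ["north"] * 5 + ["south"] * 5 + ["east"] * 5,
    "charges": [1700.0, 2400.0, 3900.0, 4100.0, 5200.0,
                7100.0, 8200.0, 9300.0, 10400.0, 11900.0,
                21000.0, 29500.0, 33000.0, 36000.0, 47000.0],
})

# One array of charges per region
groups = [group["charges"].to_numpy() for _, group in df.groupby("region")]

# Kruskal-Wallis H test: rank-based, so it does not assume normally distributed charges
result = stats.kruskal(*groups)

alpha = 0.05  # assumed significance level
print(f"H = {result.statistic:.3f}, p = {result.pvalue:.4g}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

Other nonparametric tests may fit better depending on the variables chosen, such as `scipy.stats.mannwhitneyu` for two groups or `scipy.stats.chi2_contingency` for two categorical variables.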
I. Evaluate nonparametric test results by doing the following:
1. Discuss the test results, including the decision to reject or fail to reject the null hypothesis from part H4.
2. Create an answer to your research question from part G1 based on the decision to reject or fail to reject the null hypothesis.
3. Explain how stakeholders in the organization benefit from your choice of testing method.
J. Summarize the implications of your nonparametric statistical testing by doing the following:
1. Recommend a course of action based on your findings.
2. Discuss the limitations of your data analysis.
Part IV: Panopto Video Submission
K. Submit your work by doing the following:
1. Provide a document that includes responses to task prompts through the Assessments section of the student portal.
2. Provide the annotated code for Parts I, II, and III as an executable script file. R files and Python script files are accepted.
Note: Error-free code includes code that is free from syntax and logic errors. Upload this file in the D599 Repository and name it "D599Task2." Provide a link to the GitLab repository that contains a copy of the executable script file using the R or Python language through the Assessments section of the student portal.
3. Provide a link to a Panopto video recording that includes a demonstration of the functionality of the code used for the analysis and an identification of the version of the programming environment. The demonstration must include a vocal presentation of all the listed elements.
Note: One notebook must be submitted for Parts I, II, and III.
Note: The audiovisual recording should feature you visibly presenting the material (i.e., not in voiceover or embedded video) and should simultaneously capture both you and the functioning code.
Note: For instructions on how to access and use Panopto, use the "Panopto How-To Videos" web link provided below. To access Panopto's website, navigate to the web link titled "Panopto Access" and then choose to log in using the "WGU" option. If prompted, log in using your WGU student portal credentials, and then it will forward you to Panopto's website.
To submit your recording, upload it to the Panopto drop box titled "Data Preparation and Exploration TCN2 | D599 (Student Creators)." Once the recording has been uploaded and processed in Panopto's system, retrieve the URL of the recording from Panopto and copy and paste it into the Links option. Upload the remaining task requirements using the Attachments option.
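For the part K3 requirement to identify the version of the programming environment, one option in Python is to print the interpreter and package versions at the top of the script so they appear on screen in the recording. The helper name `environment_versions` and the default package list are assumptions chosen for this sketch.

```python
import platform
from importlib import metadata


def environment_versions(packages=("pandas", "scipy", "matplotlib")):
    """Return the Python version and the installed version of each named package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of stopping the script
            versions[name] = "not installed"
    return versions


for name, version in environment_versions().items():
    print(f"{name}: {version}")
```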
Sources
L. Acknowledge reference sources used to support the Python or R code application. All references listed should also include an in-text citation in the code annotation. Be sure the sources are reliable. If no sources were used for coding, state, "No sources used."
M. Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.
Professional Communication
N. Demonstrate professional communication in the content and presentation of your submission.
File Restrictions
File name may contain only letters, numbers, spaces, and these symbols: ! - _ . * ' ( )
File size limit: 200 MB
File types allowed: doc, docx, rtf, xls, xlsx, ppt, pptx, odt, pdf, csv, txt, qt, mov, mpg, avi, mp3, wav, mp4, wma, flv, asf, mpeg, wmv, m4v, svg, tif, tiff, jpeg, jpg, gif, png, zip, rar, tar, 7z
Rubric
A1:UNIVARIATE VISUALIZATIONS
Not Evident A distribution of variables using univariate visualizations is not provided in the document. | Approaching Competence The submitted document identifies the distribution of variables and provides univariate visualizations but does not cover 2 qualitative and 2 quantitative variables. | Competent The submitted document accurately identifies the distribution of 2 qualitative and 2 quantitative variables and provides univariate visualizations for each variable using R or Python. |
A2:BIVARIATE VISUALIZATIONS
Not Evident A bivariate visualization is not provided in the document. | Approaching Competence The submitted document accurately provides bivariate visualizations but not of all 4 variables from part A. Or the bivariate visualization contains inaccuracies. Or R or Python are not used. | Competent The submitted document accurately provides 2 bivariate visualizations of all 4 variables from part A1 and uses R or Python. |
B1:DESCRIPTIVE STATISTICS: QUANTITATIVE
Not Evident The descriptive statistics for quantitative variables selected in the dataset are not provided in the document. | Approaching Competence The submitted document provides the descriptive statistics for quantitative variables selected in the dataset but does not cover all quantitative variables. Or R or Python are not used. | Competent The submitted document accurately provides the descriptive statistics for all quantitative variables selected in the dataset using R or Python. |
B2:DESCRIPTIVE STATISTICS: QUALITATIVE
Not Evident The descriptive statistics for qualitative variables are not provided in the document. | Approaching Competence The submitted document provides descriptive statistics for qualitative variables but does not cover all qualitative variables or use R or Python. | Competent The submitted document accurately provides the descriptive statistics for all qualitative variables selected in the dataset using R or Python. |
C1:RESEARCH QUESTION
Not Evident A research question is not provided in the document. | Approaching Competence The research question cannot be addressed through analysis of the dataset. Or the question is not relevant to a realistic organizational need or situation represented in the dataset. Or the question is not appropriate for parametric testing. | Competent The research question can be addressed through analysis of the dataset. The question is relevant to a realistic organizational need or situation represented in the dataset and is appropriate for parametric testing. |
D1:PARAMETRIC TEST METHOD
Not Evident A parametric test is not provided in the document. | Approaching Competence The submitted document identifies a parametric test, but it is irrelevant to addressing the research question from part C1. | Competent The submitted document correctly identifies a parametric test that is relevant to addressing the research question from part C1. |
D2:DATASET VARIABLES
Not Evident The dataset variables are not provided in the document. | Approaching Competence The submitted document lists the dataset variables, but they are irrelevant to addressing the research question from part C1. | Competent The submitted document correctly lists the dataset variables that are relevant to addressing the research question from part C1. |
D3:JUSTIFICATION FOR PARAMETRIC TEST
Not Evident A justification is not provided in the document. | Approaching Competence The submitted document provides a justification, but the justification does not address why the chosen method of parametric testing was selected. Or the justified technique is not the same as the testing method identified in part D1. Or the chosen technique is insufficient or inappropriate for the dataset or does not address the question from part C1. | Competent The submitted document provides a justification that addresses why the chosen method of parametric testing was selected. The justified technique is the same as the testing method identified in part D1. The technique is appropriate for the chosen dataset and addresses the question from part C1. |
D4:DEVELOP PARAMETRIC HYPOTHESES
Not Evident The submission does not develop null and alternative hypotheses in the document. | Approaching Competence The submitted document develops null and alternative hypotheses, but they are inaccurate or not related to the chosen parametric test from part D1. | Competent The submitted document develops accurate null and alternative hypotheses that are related to the chosen parametric test from part D1. |
D5:PARAMETRIC TEST CODE
Not Evident The submission does not provide any code, a screenshot of any output from running the code, or results of calculations in the document. | Approaching Competence The submitted document includes code that has errors or does not accurately use a parametric statistical technique to analyze the data. Or the submitted document includes a screenshot of either the output from running the code or the results of the calculations but not both. Or the submission includes only some results of the calculations. | Competent The submitted document includes error-free code to accurately analyze the dataset using the parametric technique identified in part D1 and includes a screenshot of both the output from running the code and the results of all calculations performed. |
E1:PARAMETRIC HYPOTHESIS SUPPORT
Not Evident A discussion of test results in terms of hypotheses is not provided in the document. | Approaching Competence The submitted document discusses test results but does not provide a decision to reject or fail to reject the null hypothesis identified in part D4. | Competent The submitted document discusses parametric test results, including the decision to reject or fail to reject the null hypothesis identified in part D4. |
E2:ANSWER TO PARAMETRIC RESEARCH QUESTION
Not Evident An answer to the research question is not provided in the document. | Approaching Competence The submitted document provides an answer that is discussed but is incomplete or does not answer the question from part C1. | Competent The submitted document provides an answer that correctly and completely addresses the research question from part C1. |
E3:BENEFIT OF PARAMETRIC TESTING
Not Evident An explanation is not provided in the document. | Approaching Competence The submitted document provides an explanation that is not specific to stakeholders in the organization. Or the submission does not explain how stakeholders could benefit from the data analysis. Or the explanation includes incorrect information. | Competent The submitted document provides an explanation that correctly addresses how stakeholders in the organization could benefit from the data analysis. |
F1:RECOMMENDED COURSE OF ACTION
Not Evident A recommended course of action is not provided in the document. | Approaching Competence The submitted document provides a recommendation that includes only a response to the question from part C1, but the recommendation is missing a course of action that could be taken in response to the analysis. Or the recommendation is irrelevant to the situation or question or would not plausibly address the situation or question. | Competent The submitted document provides a recommendation that includes both a response to the question from part C1 and a course of action that could be taken in response to the analysis. The recommendation is relevant to the situation and question and would plausibly address the situation and question. |
F2:LIMITATIONS OF PARAMETRIC DATA ANALYSIS
Not Evident An explanation of limitations is not provided in the document. | Approaching Competence The submitted document provides an explanation that includes inaccurate limitations of the data analysis. Or 1 or more of the limitations provided are not applicable to the analysis. | Competent The submitted document provides an explanation that includes the limitations of the data analysis, and all limitations provided apply to the analysis. |
G1:RESEARCH QUESTION
Not Evident A research question is not provided in the document. | Approaching Competence The research question provided in the submitted document is not relevant to a realistic organizational need or situation represented in the dataset, cannot be addressed through analysis of the dataset, or is not appropriate for nonparametric testing. | Competent The research question provided in the submitted document is relevant to a realistic organizational need or situation represented in the dataset, can be answered through data analysis, and is appropriate for nonparametric testing. |
H1:NONPARAMETRIC TEST METHOD
Not Evident A nonparametric test is not provided in the document. | Approaching Competence The submitted document identifies a nonparametric test, but it is irrelevant to addressing the research question from part G1. | Competent The submitted document correctly identifies a nonparametric test that is relevant to addressing the research question from part G1. |
H2:DATASET VARIABLES
Not Evident The dataset variables are not provided in the document. | Approaching Competence The submitted document lists the dataset variables, but they are irrelevant to addressing the research question from part G1. | Competent The submitted document accurately lists the dataset variables relevant to answering the research question from part G1. |
H3:JUSTIFICATION FOR NONPARAMETRIC TEST
Not Evident A justification is not provided in the document. | Approaching Competence The submitted document provides a justification, but the justification does not address why the chosen method of nonparametric testing was selected. Or the justified technique is not the same as the testing method identified in part H1. Or the chosen technique is insufficient or inappropriate for the dataset or does not address the question from part G1. | Competent The submitted document provides a justification that addresses why the chosen method of nonparametric testing was selected. The justified technique is the same as the testing method identified in part H1. The technique is appropriate for the chosen dataset and addresses the question from part G1. |
H4:DEVELOP NONPARAMETRIC HYPOTHESES
Not Evident The submission does not develop null and alternative hypotheses in the document. | Approaching Competence The submitted document develops null and alternative hypotheses, but they are not related to the chosen nonparametric test from part H1. | Competent The submitted document develops null and alternative hypotheses that are related to the chosen nonparametric test from part H1. |
H5:NONPARAMETRIC TEST CODE
Not Evident The submission does not provide any code, any output from running the code, or results of calculations in the document. | Approaching Competence The submitted document includes code that has errors or does not accurately use a nonparametric statistical technique to analyze the data or includes either the output from running the code or the results of the calculations but not both. Or the submission includes only some results of the calculations. | Competent The submitted document includes error-free code to accurately analyze the dataset using the nonparametric technique identified in part H1 and includes both the output from running the code and the results of all calculations performed. |
I1:NONPARAMETRIC HYPOTHESIS SUPPORT
Not Evident A discussion of test results in terms of hypotheses is not provided in the document. | Approaching Competence The submitted document discusses test results but does not provide a decision to reject or fail to reject the null hypothesis identified in part H4. | Competent The submitted document discusses nonparametric test results, including the decision to reject or fail to reject the null hypothesis identified in part H4. |
I2:ANSWER TO NONPARAMETRIC RESEARCH QUESTION
Not Evident An answer to the research question is not provided in the document. | Approaching Competence The submitted document provides an answer that is discussed but is incomplete or does not answer the question from part G1. | Competent The submitted document provides an answer that correctly and completely addresses the research question from part G1. |
I3:BENEFIT OF NONPARAMETRIC DATA ANALYSIS
Not Evident An explanation is not provided in the document. | Approaching Competence The submitted document provides an explanation that is not specific to stakeholders in the organization. Or the submission does not explain how stakeholders could benefit from the data analysis. Or the explanation includes incorrect information. | Competent The submitted document provides an explanation that correctly addresses how stakeholders in the organization could benefit from the data analysis. |
J1:RECOMMENDED COURSE OF ACTION
Not Evident A recommended course of action is not provided in the document. | Approaching Competence The submitted document provides a recommendation that includes only a response to the question from part G1, but the recommendation is missing a course of action that could be taken in response to the analysis. Or the recommendation is irrelevant to the situation or question or would not plausibly address the situation or question. | Competent The submitted document provides a recommendation that includes both a response to the question from part G1 and a course of action that could be taken in response to the analysis. The recommendation is relevant to the situation and question and would plausibly address the situation and question. |
J2:LIMITATIONS OF NONPARAMETRIC DATA ANALYSIS
Not Evident An explanation of limitations is not provided in the document. | Approaching Competence The submitted document provides an explanation that includes inaccurate limitations of the data analysis. Or 1 or more of the limitations provided are not applicable to the analysis. | Competent The submitted document provides an explanation that includes the limitations of the data analysis, and all limitations provided apply to the analysis. |
K1:DOCUMENT
Not Evident No document is provided. | Approaching Competence Not applicable. | Competent The submitted document provides responses to task prompts. |
K2:SUBMIT CODE
Not Evident No code is provided. | Approaching Competence The submitted code for Parts I, II, and III is incomplete or is not in an annotated and executable script file. Or the code provided could not be used to mitigate data quality issues in the dataset. | Competent The submitted code for Parts I, II, and III is in an annotated and executable script file that could be used to mitigate the data quality issues in the dataset and is error-free. The file is uploaded into the D599 Repository, and the GitLab repository URL is included. |
K3:PANOPTO VIDEO
Not Evident A Panopto video recording is not provided. | Approaching Competence A Panopto video recording is provided that includes a vocalized demonstration of the functionality of the code but does not include the identification of the version of the programming environment, does not implement a vocal presentation, or does not have all of the listed elements. Or the video does not capture both the presenter and the functioning code for the duration of the video. | Competent A link to a Panopto video recording is provided that includes a vocalized demonstration of the functionality of the code used for the analysis and an identification of the version of the programming environment. The demonstration includes all of the listed elements. For the duration of the presentation, the video captures both the presenter and the functioning code in a Panopto video recording. |
L:SOURCES OF THIRD-PARTY CODE
Not Evident Referenced sources used are not acknowledged. | Approaching Competence The submission acknowledges only some of the referenced sources used to acquire data or third-party code. Or the referenced sources are not reliable. | Competent The submission acknowledges all referenced sources used to acquire data or third-party code, and all the sources are reliable. |
M:SOURCES
Not Evident The submission does not include both in-text citations and a reference list for sources that are quoted, paraphrased, or summarized. | Approaching Competence The submission includes in-text citations for sources that are quoted, paraphrased, or summarized and a reference list; however, the citations or reference list is incomplete or inaccurate. | Competent The submission includes in-text citations for sources that are properly quoted, paraphrased, or summarized and a reference list that accurately identifies the author, date, title, and source location as available, or the submission states no sources were used. |
N:PROFESSIONAL COMMUNICATION
Not Evident This submission includes pervasive errors in professional communication related to grammar, sentence fluency, contextual spelling, or punctuation, negatively impacting the professional quality and clarity of the writing. Specific errors have been identified by Grammarly for Education under the Correctness category. | Approaching Competence This submission includes substantial errors in professional communication related to grammar, sentence fluency, contextual spelling, and/or punctuation. Specific errors have been identified by Grammarly for Education under the Correctness category. | Competent This submission includes satisfactory use of grammar, sentence fluency, contextual spelling, and punctuation, which promote accurate interpretation and understanding. |
Web Links
Panopto Access
Sign in using the "WGU" option. If prompted, log in with your WGU student portal credentials, which should forward you to Panopto's website. If you have any problems accessing Panopto, please contact Assessment Services at a..s@wgu.edu. It may take up to two business days to receive your WGU Panopto recording permissions once you have begun the course.
Panopto Dropbox
Data Preparation and Exploration TCN2 | D599 (Student Creators)
Panopto FAQs
Panopto How-To Videos
WGU Virtual Lab Environment
WGU GitLab Environment - WGU Community
Supporting Documents
Health Insurance Dataset.csv
I can't upload the dataset here.
