Question:

Given the two National Health Interview Survey datasets, the data dictionaries, and the scoring
document, achieve the following:
1. Clean the data in R:
a. In the child dataset, identify and print the records that share the same ID value.
Remove the duplicates so that each ID appears only once.
b. In the child dataset, identify and print records with values that are out of range for the
following variables (consult data dictionary). Determine how best to deal with these
values. Document your decisions in the dataset cover sheet (see #3 below).
i. BSCNWPPL_C
ii. BSCNWPLCS_C
iii. BSCCHG_C
iv. BSCHLOPPL_C
v. BSCCRYALT_C
vi. BSCCLMDWN_C
vii. BSCFUSSY_C
viii. BSCSTHE_C
ix. BSCSCHD_C
x. BSCPTSLP_C
xi. BSCSTYSLP_C
xii. BSCPRLKSL_C
c. In the child dataset, convert the above listed variables into factors and define the levels
according to the data dictionary.
d. In the child dataset:
i. Create a new variable AGECAT grouping the age of the child (AGEP_C) into the
following categories:
1. 0-7.99
2. 8-12.99
3. 13-18
ii. Generate frequencies for the new age groups
iii. Generate a list of all records for children who have ever had either borderline
diabetes or prediabetes. Save these records to a new dataset called DIABETES.
2. Generate a permanent cross-sectional analysis data set.
a. Create a new variable called HHX_NUM that holds the HHX value without the leading H0
(that is, only the last 5 digits of the variable).
b. Generate frequencies for sex, age, Hispanic ethnicity, and general health status.
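A rough sketch of how steps 1a through 1c might look in base R is given here. It runs on a small made-up data frame rather than the real NHIS file, and it shows only two of the twelve BSC variables. The valid codes (1-3 plus 7, 8, 9 for nonresponse) and the level labels are assumptions and should be checked against the data dictionary. Recoding out-of-range values to NA is one possible decision, not the only defensible one.

```r
# Toy stand-in for the child dataset (values invented for illustration).
child <- data.frame(
  ID         = c(1, 2, 2, 3, 4),
  BSCNWPPL_C = c(1, 2, 2, 5, 9),
  BSCFUSSY_C = c(3, 1, 1, 2, 0)
)

# 1a. Print every record whose ID occurs more than once,
#     then keep only the first record for each ID.
dup_ids <- child$ID[duplicated(child$ID)]
print(child[child$ID %in% dup_ids, ])
child <- child[!duplicated(child$ID), ]

# 1b. ASSUMED valid codes; confirm them in the data dictionary.
bsc_vars    <- c("BSCNWPPL_C", "BSCFUSSY_C")  # extend to all twelve variables
valid_codes <- c(1, 2, 3, 7, 8, 9)

for (v in bsc_vars) {
  bad <- !is.na(child[[v]]) & !(child[[v]] %in% valid_codes)
  if (any(bad)) print(child[bad, c("ID", v)])  # show the out-of-range records
  child[[v]][bad] <- NA                        # one option: set them to missing
}

# 1c. Convert to factors; the labels below are ASSUMED, not taken from the source.
bsc_labels <- c("Not at all", "Somewhat", "Very much",
                "Refused", "Not ascertained", "Don't know")
child[bsc_vars] <- lapply(child[bsc_vars], factor,
                          levels = valid_codes, labels = bsc_labels)

str(child)
```

Whatever rule is chosen for out-of-range values (set to missing, drop the record, or recode), it should be recorded in the dataset cover sheet as the question requires.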

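Steps 1d and 2 could be approached along the following lines, again on invented data. The column names PREDIB_C and SEX_C, the 1 = Yes coding, and the output file name are assumptions used for illustration; the actual variable names and codes come from the data dictionaries. HHX_NUM is kept as a character string here so that leading zeros survive; wrapping it in as.numeric() would be the alternative if a numeric variable is wanted.

```r
# Toy data (invented); column names other than HHX and AGEP_C are assumptions.
child <- data.frame(
  HHX      = c("H012345", "H067890", "H000321", "H045678"),
  AGEP_C   = c(3, 8, 12, 17),
  PREDIB_C = c(2, 1, 2, 1),   # ASSUMED coding: 1 = Yes, 2 = No
  SEX_C    = c(1, 2, 2, 1),
  stringsAsFactors = FALSE
)

# 1d-i. Age groups; right = FALSE makes the intervals [0,8), [8,13), [13,19).
child$AGECAT <- cut(child$AGEP_C,
                    breaks = c(0, 8, 13, 19), right = FALSE,
                    labels = c("0-7.99", "8-12.99", "13-18"))

# 1d-ii. Frequencies for the new age groups.
print(table(child$AGECAT, useNA = "ifany"))

# 1d-iii. Records flagged as borderline diabetes or prediabetes.
DIABETES <- child[child$PREDIB_C %in% 1, ]
print(DIABETES)

# 2a. Strip the leading "H0" from HHX, leaving the last five digits.
child$HHX_NUM <- sub("^H0", "", child$HHX)

# 2b. Frequencies for selected variables (add ethnicity and health status
#     once their names are confirmed in the data dictionary).
for (v in c("SEX_C", "AGECAT")) {
  cat("\n", v, "\n")
  print(table(child[[v]], useNA = "ifany"))
}

# Save a permanent copy; tempdir() is used only so the sketch has no side
# effects. Replace it with a project folder for real work.
out_file <- file.path(tempdir(), "nhis_analysis.rds")  # hypothetical file name
saveRDS(child, out_file)
```

The saved file can be reloaded later with readRDS(out_file).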