Question: Given the two National Health Interview Survey datasets, the data dictionaries, and the scoring document, achieve the following:
1. Clean the data in R:
a. In the child dataset, identify and print the duplicate records with the same ID value.
Eliminate the duplicate record for each instance so there is only one of each ID.
b. In the child dataset, identify and print records with values that are out of range for the
following variables (consult the data dictionary). Determine how best to deal with these
values. Document your decisions in the dataset cover sheet (see # below).
i. BSCNWPPLC
ii. BSCNWPLCSC
iii. BSCCHGC
iv. BSCHLOPPLC
v. BSCCRYALTC
vi. BSCCLMDWNC
vii. BSCFUSSYC
viii. BSCSTHEC
ix. BSCSCHDC
x. BSCPTSLPC
xi. BSCSTYSLPC
xii. BSCPRLKSLC
c. In the child dataset, convert the variables listed above into factors and define the levels
according to the data dictionary.
d. In the child dataset:
i. Create a new variable AGECAT grouping the age of the child (AGEPC) into the
following categories:
ii. Generate frequencies for the new age groups.
iii. Generate a list of all records for children who have had either borderline diabetes or
prediabetes. Save these records to a new dataset called DIABETES.
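One way the cleaning steps in part 1 could be approached is sketched below in base R. This is only an illustration on a toy data frame, not the expected answer: the ID column name `ID`, the diabetes flag `PREDIB`, the valid codes in `valid_codes`, the labels in `level_labels`, and the age breaks are all placeholder assumptions, since the data dictionary and the age categories are not shown in the question. Only two of the twelve listed variables are included to keep the sketch short.

```r
# Toy stand-in for the child dataset. Column names ID and PREDIB are
# hypothetical; the real names and codes come from the data dictionary.
child <- data.frame(
  ID        = c(1, 2, 2, 3, 4, 5),
  BSCNWPPLC = c(0, 1, 1, 5, 2, 1),
  BSCCHGC   = c(2, 0, 0, 1, 9, 1),
  AGEPC     = c(1, 4, 4, 8, 13, 17),
  PREDIB    = c(2, 1, 1, 2, 2, 1)   # assumed coding: 1 = yes, 2 = no
)

# (a) Print all records whose ID occurs more than once, then keep the first.
dup_ids <- child$ID[duplicated(child$ID)]
print(child[child$ID %in% dup_ids, ])
child <- child[!duplicated(child$ID), ]

# (b) Print records with out-of-range values.
# valid_codes is a placeholder; substitute the range from the dictionary.
bsc_vars    <- c("BSCNWPPLC", "BSCCHGC")
valid_codes <- c(0, 1, 2)
out_of_range <- sapply(child[bsc_vars],
                       function(x) !is.na(x) & !(x %in% valid_codes))
print(child[rowSums(out_of_range) > 0, ])

# One possible decision (to be documented on the cover sheet): set to NA.
for (v in bsc_vars) {
  child[[v]][out_of_range[, v]] <- NA
}

# (c) Convert to factors. level_labels is a placeholder set of labels.
level_labels <- c("Not at all", "Somewhat", "Very much")
child[bsc_vars] <- lapply(child[bsc_vars], factor,
                          levels = valid_codes, labels = level_labels)

# (d.i, d.ii) Group age and tabulate. The breaks here are assumed.
child$AGECAT <- cut(child$AGEPC,
                    breaks = c(-Inf, 4, 11, 17),
                    labels = c("0-4", "5-11", "12-17"))
print(table(child$AGECAT, useNA = "ifany"))

# (d.iii) Keep records flagged for borderline diabetes or prediabetes.
DIABETES <- child[which(child$PREDIB == 1), ]
print(DIABETES)
```

Recoding out-of-range values to NA is just one defensible choice; dropping the records or checking them against the source file are alternatives. If the dictionary stores borderline diabetes and prediabetes in two separate variables, the filter would combine them with `|`.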
2. Generate a permanent cross-sectional analysis data set.
a. Create a new variable called HHXNUM that is the HHX variable value without the H at
the beginning (it contains only the trailing digits of the variable).
b. Generate frequencies for sex, age, Hispanic ethnicity, and general health status.
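A minimal base R sketch of part 2 follows, again on toy data. The column names `SEXC`, `HISPC`, and `PHSTATC` are hypothetical, modeled on the naming used in the question, and the output path is a placeholder in a temporary directory; a real analysis would write to a project folder. `HHXNUM` is kept as a character string here so that any leading zeros survive, which is an assumption about what the assignment wants.

```r
# Toy stand-in for the child dataset; all column names except HHX and
# AGEPC are hypothetical and should be replaced with the dictionary names.
child <- data.frame(
  HHX     = c("H000123", "H004567", "H089012"),
  SEXC    = c(1, 2, 2),
  AGEPC   = c(3, 10, 16),
  HISPC   = c(2, 2, 1),
  PHSTATC = c(1, 3, 2),
  stringsAsFactors = FALSE
)

# (a) Strip the leading "H". The result stays character so leading zeros
# are preserved; as.numeric() would drop them.
child$HHXNUM <- sub("^H", "", child$HHX)

# (b) Frequency table for each variable, including missing values.
freq_vars <- c("SEXC", "AGEPC", "HISPC", "PHSTATC")
freqs <- lapply(child[freq_vars], table, useNA = "ifany")
print(freqs)

# Save a permanent copy. out_path is a placeholder location.
out_path <- file.path(tempdir(), "nhis_child_analysis.rds")
saveRDS(child, out_path)
```

The saved file can be reloaded in a later session with `readRDS(out_path)`.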
