SAS Descriptive Data Analysis Assignment
Purpose & Content of Analysis Paper
The purpose of this assignment is to provide a rough draft of a section of your analysis paper so that you can receive feedback on your methods. It should include the following:
1. Short description of your research question
2. Statement of the dataset you will use to answer the question
3. Names of the variables you will use
4. Descriptions of the variables you will use
5. List of the SAS analyses you did to provide descriptive statistics
6. Discussion of the results of those analyses
7. Your SAS code (copied into a Word document)
8. Your results (printed from SAS and saved as a PDF file)
An example is below. You do not have to use as many variables as I did, but I followed my own advice and did something that was of interest to me. I also wrote much more than would be expected for this assignment, but since some people expressed concern about having enough to say in the final paper I gave a lot of examples of what you could say about statistical results.
Two bits of SAS code that may help you
If you don't want to include everyone in your analysis, you can add a WHERE statement. Note that WHERE keeps the observations that satisfy the condition, so write the condition in terms of the people you want to keep:
WHERE (condition for the observations you want to keep) ;
For example, I created a new data set as follows:
DATA temp ;
SET mydata.coh602 ;
WHERE AHEDUC < 91 ;
RUN ;
The new data set starts out as a copy of the COH602 study data, but because of the WHERE statement it includes only participants whose AHEDUC value is less than 91.
You can also create new variables, for example, age and race categories, as follows:
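DATA temp ;
SET mydata.coh602 ;
/* Declare the length of age first. Otherwise SAS sizes it from the first */
/* value assigned ("Young", 5 characters) and truncates "Midlife" and "Oldest". */
LENGTH age $ 7 ;
IF srage_p < 25 THEN age = "Young" ;
ELSE IF srage_p < 45 THEN age = "Midlife" ;
ELSE IF srage_p < 65 THEN age = "Older" ;
ELSE IF srage_p > 64 THEN age = "Oldest" ;
/* "Hispanic" is the longest race value and is assigned first, so no truncation here. */
IF srh = 1 THEN race = "Hispanic" ;
ELSE IF srw = 1 THEN race = "White" ;
ELSE IF sraa = 1 THEN race = "Af.Amer" ;
RUN ;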
SAS DESCRIPTIVE ANALYSIS EXAMPLE (AN EXAMPLE OF WHAT YOU SHOULD WRITE)
The question addressed by this analysis is, "Is there a relationship between weight status and depression?" To assess this properly, one would need to consider extraneous variables that might affect this relationship. It was hypothesized that people who are obese are more likely to be older, have diabetes, be female, be non-white, be in worse general health and be poor. All of these factors might affect depression as well.
(In an actual study I would have done a literature review to examine all of the possible correlations with obesity and depression, but since this is a statistics project I am just focusing on the statistical analysis.)
This project used the class data set, COH602, which is a subset of variables from the 2012 California Health Interview Survey.
First, a PROC CONTENTS was conducted with SAS Web Editor to determine what variables might be available for the project and in what format. The following variables were selected for this analysis:
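A minimal sketch of that first step, assuming the class data set is stored under the libref mydata as in the code examples earlier in the assignment:

```sas
/* List every variable in the class data set with its type, length and label. */
/* Assumes the libref mydata already points to the folder that holds coh602.  */
PROC CONTENTS DATA = mydata.coh602 ;
RUN ;
```

The output shows which variables are numeric and which are character, which helps in deciding between PROC FREQ and PROC UNIVARIATE for each one.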
Variable Name    Label
AB1              General Health
AHEDUC           Education
AK22_P           Income
Obese
RACE
MARIT            Marital Status
SRAGE_p          Self-reported Age
SRSEX
Diabetes
Depressyr        Score on how depressed they felt in the past year
The 2011-2012 Data Dictionary (Regents of University of California, 2013) provided the following information:
General Health was rated on a scale from 1 (= excellent) to 5 (= poor). A lower score means better health.
Education was rated from 1 (= Grade 1-8) to 10 (= Ph.D.), with 91 as no formal education.
Annual income was coded in dollars.
Obese was coded as 0 = no, 1 = yes.
Race was coded as Black, White or Hispanic.
Marital status was coded as 1 = married, 2 = widowed, separated or divorced, 3 = never married.
Age was measured in years.
Sex was coded as 1 = male, 2 = female.
Diabetes was coded as 0 = no, 1 = yes.
Depression score was the sum of items rated 1 (= All of the time) to 5 (= Not at all). A lower score represents more depression.
First, frequency distributions using the PROC FREQ procedure in SAS were computed for all of the categorical variables. These were obesity status, gender, race and marital status. Although they were ordinal variables, frequency distributions were also computed for general health rating and education.
The general health frequency distribution was computed given that with five categories, useful information might be gained from examining the frequencies within each health rating. With the education variable, the real interest was in the possibility of outliers since some of the participants had no formal education. It was found that 332 people, less than 1% of the sample, had no formal education and these were deleted from the sample for the remainder of the study.
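The frequency step and the exclusion described above might look like the following sketch. It assumes the variable names from the list above exist in mydata.coh602 exactly as written; if race is a category you created yourself in a DATA step, run PROC FREQ on that new data set instead.

```sas
/* Frequency tables for the categorical and ordinal variables. */
PROC FREQ DATA = mydata.coh602 ;
    TABLES obese srsex race marit ab1 aheduc ;
RUN ;

/* Keep only participants with some formal education */
/* (AHEDUC = 91 is the code for no formal education). */
DATA temp ;
    SET mydata.coh602 ;
    WHERE aheduc < 91 ;
RUN ;
```

One caution: SAS treats a missing numeric value as smaller than any number, so the condition aheduc < 91 also keeps any records where AHEDUC is missing. Check the AHEDUC frequency table to see whether that matters for your sample.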
Second, univariate statistics using the SAS PROC UNIVARIATE procedure were computed for education (without the 332), age, general health, income and the depression score.
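A sketch of the univariate step, assuming the reduced data set is named temp as in the WHERE example earlier in the assignment:

```sas
/* Univariate statistics for the numeric and ordinal variables,   */
/* run on the data set that excludes those with no formal education. */
PROC UNIVARIATE DATA = temp ;
    VAR aheduc srage_p ab1 ak22_p depressyr ;
RUN ;
```

For each variable the default output includes the mean, standard deviation, median, mode, skewness and kurtosis, which are the figures discussed in the Results section.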
Both the health and depression variables were graphed using SAS PROC GCHART to gain a clearer picture of their distributions.
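The charts could be produced along these lines. This is a sketch that assumes SAS/GRAPH is available in your SAS environment and that the data set is named temp.

```sas
/* Bar charts of the general health rating and the depression score. */
PROC GCHART DATA = temp ;
    /* DISCRETE draws one bar for each of the five health ratings. */
    VBAR ab1 / DISCRETE ;
    /* Without DISCRETE, SAS groups the depression scores into ranges. */
    VBAR depressyr ;
RUN ;
QUIT ;
```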
For the final descriptive analysis, the data set was sorted by obesity status and means computed for health rating, depression score and age.
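The sort-then-summarize step can be sketched as follows, again assuming the data set is named temp:

```sas
/* BY-group processing requires the data to be sorted by the BY variable. */
PROC SORT DATA = temp ;
    BY obese ;
RUN ;

/* N, mean and standard deviation for each obesity group. */
PROC MEANS DATA = temp N MEAN STD ;
    BY obese ;
    VAR ab1 depressyr srage_p ;
RUN ;
```

An alternative is to replace the BY statement in PROC MEANS with CLASS obese, which gives the same group statistics in a single table and does not require sorting first.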
Results
Descriptive statistics for the sample are shown in Table 1. It should be noted that income was positively skewed, with a median of $50,000. Education was also slightly skewed. A mean educational level of 5.2 was equivalent to the category of "some college or vocational school", while the modal response was 3, or a high school graduate. The sample was disproportionately female and white compared to the state population as a whole, but this may be due to the fact that only adults were sampled. Approximately half of the subjects were married.
It was expected that the income variable would be skewed, but surprisingly, the maximum income in the sample was $300,000 a year. Income was skewed (the mean was over $70,000, the median was $50,000 and the mode was $100,000), but it was not nearly as skewed as reported in some other studies. Another variable that was not as skewed as might be expected was general health. It is generally reported that people tend to rate themselves higher than average, with everyone thinking they are healthier, better-looking and smarter than the average. The non-skewed distribution of the health variable did not fit that assumption.
A perfectly normal distribution would have values for skewness and kurtosis both equal to zero (SAS reports excess kurtosis, which is zero for a normal distribution). Among the variables in this analysis, the most non-normal was depression.
Descriptive statistics for the dependent and independent variables in the study are shown in Table 2. Descriptive statistics for age, depression score and health by obesity status are shown in Table 3. It can be seen that there are slight differences in depression score by obesity status, although the standard deviation for those who are obese seems somewhat larger. The most noticeable difference between the obese and non-obese subjects is in general health rating, with the two groups nearly half of a standard deviation apart.
Table 1
Sample Demographics
(N = 42,603)

                          Mean      S.D.
Age                       55.0      18.0
Income                  70,000    63,973
Education                  5.2       2.5

                             N         %
Gender
  Female                24,876      58.4
Race
  African-American       2,071       5.5
  Hispanic               9,235      24.4
  White                 26,492      70.1
Marital Status
  Married               21,195      50.0
  Widowed/Divorced      13,978      32.8
  Never married          7,430      17.4

Table 2
Descriptive Statistics for Dependent and Independent Variables

                              Mean      SD
Depression Score              26.7     3.8

                                 N       %
Obesity Status
  Obese                     10,732    25.2
Diagnosed with Diabetes      4,613    10.8

Table 3
Depression and health scores by Obesity

                        Obese           Not Obese
                     Mean     SD      Mean     SD
Depression Score     26.2    4.2      26.9    3.6
Age                  54.7   15.8      55.2   18.7
General Health        3.0    1.0       2.5    1.1
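
The statement that the two groups are nearly half of a standard deviation apart on general health can be checked with a few lines of arithmetic on the Table 3 values. The sketch below uses a simple pooled standard deviation (the square root of the average of the two variances), which is an assumption for illustration and ignores the unequal group sizes; the variable names are made up for this example.

```sas
/* Difference in mean general health rating, expressed in SD units, */
/* using the means and standard deviations reported in Table 3.     */
DATA _NULL_ ;
    mean_obese = 3.0 ;  sd_obese = 1.0 ;
    mean_not   = 2.5 ;  sd_not   = 1.1 ;
    /* Simple pooled SD: assumption, treats both groups as equal in size. */
    pooled_sd  = SQRT((sd_obese**2 + sd_not**2) / 2) ;
    diff_in_sd = (mean_obese - mean_not) / pooled_sd ;
    /* Write the result to the SAS log. */
    PUT diff_in_sd= 5.2 ;
RUN ;
```

The value written to the log is a little under 0.5, which is consistent with the wording in the Results section.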
Reference
Regents of University of California (2013). California Health Interview Survey Data Dictionary. http://healthpolicy.ucla.edu/chis/data/public-use-data-file/Documents/CHIS_2011-2012_Data_Dictionary_PUF-ADULT.pdf
