SAS Descriptive Data Analysis Assignment

Purpose & Content of Analysis Paper

The purpose of this assignment is to provide a rough draft of a section of your analysis paper so that you can receive feedback on your methods. It should include the following:

1. Short description of your research question

2. Statement of the dataset you will use to answer the question

3. Names of the variables you will use

4. Descriptions of the variables you will use

5. List of the SAS analyses you did to provide descriptive statistics

6. Discussion of the results of those analyses

7. Your SAS code (copied into a Word document)

8. Your results (printed from SAS and saved as a PDF file)

An example is below. You do not have to use as many variables as I did, but I followed my own advice and did something that was of interest to me. I also wrote much more than would be expected for this assignment, but since some people expressed concern about having enough to say in the final paper I gave a lot of examples of what you could say about statistical results.

Two bits of SAS code that may help you

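If you don't want to include everyone in your analysis, you can add a WHERE statement. Note that the condition describes the observations you want to keep, not the ones you want to exclude:

WHERE (condition that the observations you keep must meet) ;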

For example, I created a new data set as follows:

DATA temp ;
  SET mydata.coh602 ;
  /* keep only participants with an education code below 91 */
  WHERE AHEDUC < 91 ;
RUN ;

The new dataset, temp, starts out as a copy of the COH602 study data, but because of the WHERE statement it includes only participants whose AHEDUC value is less than 91.

You can also create new variables, for example, age and race categories, as follows:

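DATA temp ;
  SET mydata.coh602 ;
  /* declare lengths first; otherwise age takes the length of "Young" (5) and "Midlife" and "Oldest" are truncated */
  LENGTH age $ 7 race $ 8 ;
  /* a missing age compares as less than any number, so handle it before the cutoffs */
  IF MISSING(srage_p) THEN age = " " ;
  ELSE IF srage_p < 25 THEN age = "Young" ;
  ELSE IF srage_p < 45 THEN age = "Midlife" ;
  ELSE IF srage_p < 65 THEN age = "Older" ;
  ELSE age = "Oldest" ;   /* 65 and over */
  IF srh = 1 THEN race = "Hispanic" ;
  ELSE IF srw = 1 THEN race = "White" ;
  ELSE IF sraa = 1 THEN race = "Af.Amer" ;
RUN ;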

SAS DESCRIPTIVE ANALYSIS EXAMPLE (AN EXAMPLE OF WHAT YOU SHOULD WRITE)

The question addressed by this analysis is, "Is there a relationship between weight status and depression?" To assess this properly, one would need to consider extraneous variables that might affect this relationship. It was hypothesized that people who are obese are more likely to be older, have diabetes, be female, be non-white, be in worse general health and be poor. All of these factors might affect depression as well.

(In an actual study I would have done a literature review to examine all of the possible correlations with obesity and depression, but since this is a statistics project I am just focusing on the statistical analysis.)

This project used the class data set, COH602, which is a subset of variables from the 2012 California Health Interview Survey.

First, a PROC CONTENTS was conducted with SAS Web Editor to determine what variables might be available for the project and in what format. The following variables were selected for this analysis:

Variable Name    Label
AB1              General Health
AHEDUC           Education
AK22_P           Income
Obese
RACE
MARIT            Marital Status
SRAGE_p          Self-reported Age
SRSEX
Diabetes
Depressyr        Score on how depressed they felt in the past year
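
As a minimal sketch, the PROC CONTENTS step described above could look like the following. It assumes the libref mydata has already been assigned to the folder that holds the class data set; the output lists each variable's name, type, length and label.

```sas
/* Assumption: the libref mydata was assigned earlier with a LIBNAME statement */
PROC CONTENTS DATA = mydata.coh602 ;
RUN ;
```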

The 2011-2012 Data Dictionary (Regents of University of California, 2013) provided the following information:

General Health was rated on a scale from 1 (= excellent) to 5 (= poor). A lower score means better health.

Education was rated from 1 (= Grade 1-8) to 10 (= Ph.D.), with 91 as no formal education.

Annual income was coded in dollars.

Obese was coded as 0 = no and 1 = yes.

Race was coded as Black, White or Hispanic.

Marital status was coded as 1 = married, 2 = widowed, separated or divorced, 3 = never married.

Age was measured in years.

Sex was coded as 1 = male, 2 = female.

Diabetes was coded as 0 = no, 1 = yes.

Depression score was the sum of items rated 1 (= All of the time) to 5 (= Not at all). A lower score represents more depression.

First, frequency distributions using the PROC FREQ procedure in SAS were computed for all of the categorical variables. These were obesity status, gender, race and marital status. Although they were ordinal variables, frequency distributions were also computed for general health rating and education.

The general health frequency distribution was computed given that with five categories, useful information might be gained from examining the frequencies within each health rating. With the education variable, the real interest was in the possibility of outliers since some of the participants had no formal education. It was found that 332 people, less than 1% of the sample, had no formal education and these were deleted from the sample for the remainder of the study.
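
The frequency distributions described above might be requested as in the sketch below. It assumes the variables exist in the data set under the names given in the variable list, and that the libref mydata has already been assigned. It is run on the full data set because the 332 participants with no formal education had not yet been removed at this stage.

```sas
/* One frequency table per variable listed on the TABLES statement */
/* Assumption: variable names are those shown in the variable list */
PROC FREQ DATA = mydata.coh602 ;
  TABLES obese srsex race marit ab1 aheduc ;
RUN ;
```

In the AHEDUC table, the row for code 91 gives the count of participants with no formal education.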

Second, univariate statistics using the SAS PROC UNIVARIATE procedure were computed for education (without the 332), age, general health, income and the depression score.
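
A sketch of this step, assuming temp is the data set created earlier with WHERE AHEDUC < 91 and that the variable names match the variable list:

```sas
/* Assumption: temp already excludes participants with AHEDUC = 91 */
/* Output includes the mean, median, mode, standard deviation, skewness and kurtosis */
PROC UNIVARIATE DATA = temp ;
  VAR aheduc srage_p ab1 ak22_p depressyr ;
RUN ;
```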

Both the health and depression variables were graphed using SAS PROC GCHART to gain a clearer picture of their distributions.
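
One way the charts could have been requested is sketched below. It assumes SAS/GRAPH is available and that temp is the data set with the 332 participants removed. The DISCRETE option is a choice made for this sketch: it draws one bar per distinct value rather than letting SAS group the values into midpoints.

```sas
/* Assumption: temp and the variable names are as in the earlier steps */
PROC GCHART DATA = temp ;
  VBAR ab1 / DISCRETE ;        /* general health rating, 1 to 5 */
  VBAR depressyr / DISCRETE ;  /* depression score */
RUN ;
QUIT ;
```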

For the final descriptive analysis, the data set was sorted by obesity status and means computed for health rating, depression score and age.
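
The sort-and-summarize step can be sketched as follows, again assuming the data set temp and the variable names from the variable list. The data must be sorted by obese before the BY statement in PROC MEANS will work.

```sas
/* Sort first: BY-group processing requires the data in BY-variable order */
PROC SORT DATA = temp ;
  BY obese ;
RUN ;

/* N, mean and standard deviation for each obesity group */
PROC MEANS DATA = temp N MEAN STD ;
  BY obese ;
  VAR ab1 depressyr srage_p ;
RUN ;
```

Using a CLASS obese statement in place of BY obese would give the same group statistics without the sort.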

Results

Descriptive statistics for the sample are shown in Table 1. It should be noted that income was positively skewed, with a median of $50,000. Education was also slightly skewed. A mean educational level of 5.2 was equivalent to the category of "some college or vocational school", while the modal response was 3, or a high school graduate. The sample was disproportionately female and white compared to the state population as a whole, but this may be due to the fact that only adults were sampled. Approximately half of the subjects were married.

It was expected that the income variable would be skewed, but, surprisingly, the maximum income in the sample was $300,000 a year. Income was skewed (the mean was over $70,000, the median was $50,000 and the mode was $100,000), but not nearly as much as reported in some other studies. Another variable that was not as skewed as might be expected was general health. It is generally reported that people tend to rate themselves higher than average, with everyone thinking they are healthier, better-looking and smarter than the average. The non-skewed distribution of the health variable did not fit that assumption.

A perfectly normal distribution would have values for skewness and kurtosis both equal to zero (SAS reports excess kurtosis, so a normal distribution scores 0 rather than 3). Among the variables in this analysis, the variable that was the most non-normal was depression.

Descriptive statistics for the dependent and independent variables in the study are shown in Table 2. Descriptive statistics for age, depression score and health by obesity status are shown in Table 3. It can be seen that there are slight differences in depression score by obesity status, although the standard deviation for those who are obese seems somewhat larger. The most noticeable difference between the obese and non-obese subjects is in general health rating, with the two groups nearly half of a standard deviation apart.

Table 1

Sample Demographics

(N = 42,603)

                      Mean      S.D.
Age                   55.0      18.0
Income              70,000    63,973
Education              5.2       2.5

                         N         %
Gender
  Female            24,876      58.4
Race
  African-American   2,071       5.5
  Hispanic           9,235      24.4
  White             26,492      70.1
Marital Status
  Married           21,195      50.0
  Widowed/Divorced  13,978      32.8
  Never married      7,430      17.4

Table 2

Descriptive Statistics for Dependent and Independent Variables

                             Mean      SD
Depression Score             26.7     3.8

                                N       %
Obesity Status
  Obese                    10,732    25.2
Diagnosed with Diabetes     4,613    10.8

Table 3

Depression and health scores by Obesity

                      Obese           Not Obese
                    Mean    SD       Mean    SD
Depression Score    26.2   4.2       26.9   3.6
Age                 54.7  15.8       55.2  18.7
General Health       3.0   1.0        2.5   1.1

Reference

Regents of University of California (2013). California Health Interview Survey Data Dictionary. http://healthpolicy.ucla.edu/chis/data/public-use-data-file/Documents/CHIS_2011-2012_Data_Dictionary_PUF-ADULT.pdf
