
8. This exercise relates to the College data set, which can be found in the file College.csv on the book website. It contains a number of variables for 777 different universities and colleges in the US. The variables are

Private : Public/private indicator
Apps : Number of applications received
Accept : Number of applicants accepted
Enroll : Number of new students enrolled
Top10perc : New students from top 10% of high school class
Top25perc : New students from top 25% of high school class
F.Undergrad : Number of full-time undergraduates
P.Undergrad : Number of part-time undergraduates


Outstate : Out-of-state tuition
Room.Board : Room and board costs
Books : Estimated book costs
Personal : Estimated personal spending
PhD : Percent of faculty with Ph.D.'s
Terminal : Percent of faculty with terminal degree
S.F.Ratio : Student/faculty ratio
perc.alumni : Percent of alumni who donate
Expend : Instructional expenditure per student
Grad.Rate : Graduation rate

Before reading the data into R, it can be viewed in Excel or a text editor.

(a) Use the read.csv() function to read the data into R. Call the loaded data college. Make sure that you have the directory set to the correct location for the data.
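This is a minimal sketch of part (a), assuming R 4.0.0 or later. College.csv is not reproduced here. Instead, the code writes a small hypothetical file to a temporary location. The file uses made-up values but has the same layout: an unnamed first column of college names, followed by some of the variables. With the real data you would call read.csv("College.csv", ...) after setting the working directory with setwd(). The stringsAsFactors = TRUE argument is an addition the exercise does not mention. Since R 4.0.0, read.csv() keeps text columns as character by default, but the later pairs() and plot() steps expect Private to be a factor.

```r
# Hypothetical stand-in for College.csv: made-up rows with the same layout
# (unnamed first column of college names, then the variables).
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"","Private","Apps","Accept","Top10perc","Outstate"',
  '"Alpha College","Yes",1200,900,23,9000',
  '"Beta University","No",5400,3100,61,15000',
  '"Gamma Institute","Yes",800,700,12,7000'
), toy_path)

# With the real file: college <- read.csv("College.csv", stringsAsFactors = TRUE)
college <- read.csv(toy_path, stringsAsFactors = TRUE)

str(college)  # Private is read as a factor with levels "No" and "Yes"
```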

(b) Look at the data using the View() function. You should notice that the first column is just the name of each university. We don't really want R to treat this as data. However, it may be handy to have these names for later. Try the following commands:

> rownames(college) <- college[, 1]
> View(college)

You should see that there is now a row.names column with the name of each university recorded. This means that R has given each row a name corresponding to the appropriate university. R will not try to perform calculations on the row names. However, we still need to eliminate the first column in the data where the names are stored. Try

> college <- college[, -1]
> View(college)

Now you should see that the first data column is Private. Note that another column labeled row.names now appears before the Private column. However, this is not a data column but rather the name that R is giving to each row.
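The two commands in part (b) can be checked on a small hypothetical data frame. It uses made-up names and values and is laid out like a freshly read file, with the names in column 1. The last lines show an alternative that is not part of the exercise. The row.names argument of read.csv() can take the names from the first column while the file is being read, which gives the same result in one step.

```r
# Hypothetical data frame mimicking the freshly read file: column 1 holds names.
college <- data.frame(
  X = c("Alpha College", "Beta University", "Gamma Institute"),
  Private = factor(c("Yes", "No", "Yes")),
  Apps = c(1200, 5400, 800)
)

rownames(college) <- college[, 1]  # use the names as row labels
college <- college[, -1]           # then drop the name column itself
print(college)                     # Private is now the first data column

# Alternative: let read.csv() set the row names while reading (made-up file).
toy_path <- tempfile(fileext = ".csv")
writeLines(c('"","Private","Apps"',
             '"Alpha College","Yes",1200',
             '"Beta University","No",5400'), toy_path)
college2 <- read.csv(toy_path, row.names = 1, stringsAsFactors = TRUE)
```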

(c) i. Use the summary() function to produce a numerical summary of the variables in the data set.
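For part (c)i, summary() applied to a data frame reports the following for each numeric column: the minimum, the quartiles, the median and the mean. For each factor column it reports the count per level. The sketch below uses a hypothetical four-row data frame with made-up values, so the output can be checked by hand.

```r
# Hypothetical data frame with made-up values.
college <- data.frame(
  Private = factor(c("Yes", "No", "Yes", "Yes")),
  Apps = c(1200, 5400, 800, 2000)
)

summary(college)          # one summary column per variable
summary(college$Private)  # level counts for a factor: No 1, Yes 3
summary(college$Apps)     # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```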

ii. Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of the data. Recall that you can reference the first ten columns of a matrix A using A[, 1:10].
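The A[, 1:10] indexing also works on a data frame, so pairs(college[, 1:10]) plots the first ten variables. The sketch below uses a hypothetical data frame with only four made-up columns, so it selects 1:4 instead. pairs() converts factor columns to their integer codes but stops with an error on character columns. Private therefore needs to be a factor here, for instance by reading the file with read.csv(..., stringsAsFactors = TRUE).

```r
# Hypothetical data frame with made-up values; the real college data has 18 columns.
college <- data.frame(
  Private = factor(c("Yes", "No", "Yes", "No", "Yes")),
  Apps = c(1200, 5400, 800, 3000, 1500),
  Accept = c(900, 3100, 700, 2000, 1100),
  Outstate = c(9000, 15000, 7000, 12000, 10500)
)

A <- college[, 1:4]  # first four columns; with the real data use 1:10
pairs(A)             # scatterplot matrix; Private is drawn as codes 1 and 2
```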


iii. Use the plot() function to produce side-by-side boxplots of Outstate versus Private.
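When the first argument to plot() is a factor and the second is numeric, plot() draws one boxplot of the numeric variable per factor level. That is the side-by-side display part (c)iii asks for. The sketch uses hypothetical made-up values. The boxplot() formula call at the end draws the same kind of plot and also returns the computed statistics, which makes the result easy to inspect.

```r
# Hypothetical data frame with made-up values.
college <- data.frame(
  Private = factor(c("Yes", "No", "Yes", "No", "Yes")),
  Outstate = c(15000, 7000, 12000, 8000, 10500)
)

# Factor on the x-axis, so plot() produces boxplots.
plot(college$Private, college$Outstate,
     xlab = "Private", ylab = "Out-of-state tuition")

# Formula interface; the returned list holds the box statistics.
bp <- boxplot(Outstate ~ Private, data = college)
bp$n      # observations per group (No, Yes)
bp$stats  # row 3 holds each group's median
```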

iv. Create a new qualitative variable, called Elite, by binning the Top10perc variable. We are going to divide universities into two groups based on whether or not the proportion of students coming from the top 10% of their high school classes exceeds 50%.

> Elite <- rep("No", nrow(college))
> Elite[college$Top10perc > 50] <- "Yes"
> Elite <- as.factor(Elite)
> college <- data.frame(college, Elite)

Use the summary() function to see how many elite universities there are. Now use the plot() function to produce side-by-side boxplots of Outstate versus Elite.
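The commands in part (c)iv work in four steps:

1. Start every college at "No".
2. Overwrite the qualifying entries with "Yes".
3. Convert the result to a factor.
4. Append it to the data frame as a column.

The sketch below runs them on hypothetical made-up Top10perc values so the resulting counts can be checked. The commented ifelse() line is an equivalent one-step alternative and is not part of the exercise. The cutoff is strict, so a college with exactly 50 is not classed as Elite.

```r
# Hypothetical data frame with made-up values.
college <- data.frame(
  Top10perc = c(23, 61, 12, 50, 78),
  Outstate = c(9000, 16000, 7000, 11000, 18000)
)

Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"  # strictly greater than 50
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)

# Equivalent alternative:
# college$Elite <- factor(ifelse(college$Top10perc > 50, "Yes", "No"))

summary(college$Elite)                 # No 3, Yes 2
plot(college$Elite, college$Outstate)  # side-by-side boxplots
```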

v. Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables. You may find the command par(mfrow = c(2, 2)) useful: it will divide the print window into four regions so that four plots can be made simultaneously. Modifying the arguments to this function will divide the screen in other ways.
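Here is a sketch of part (c)v on hypothetical data, where runif() with a fixed seed stands in for a college variable. par(mfrow = c(2, 2)) splits the device into a 2-by-2 grid that is filled row by row. A single number passed as breaks is only a suggestion: hist() rounds to "pretty" break points, so the number of bars can differ from the number requested. An explicit vector of break points gives exact control.

```r
set.seed(1)
apps <- runif(200, min = 500, max = 6000)  # made-up stand-in values

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid; remember the old settings
hist(apps, main = "Default bins")
hist(apps, breaks = 5, main = "breaks = 5 (a suggestion)")
hist(apps, breaks = 30, main = "breaks = 30 (a suggestion)")
h <- hist(apps, breaks = seq(0, 6500, by = 500), main = "Explicit breaks")
par(old_par)  # restore the single-panel layout
```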

vi. Continue exploring the data, and provide a brief summary of what you discover.
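Here is one possible direction for part (c)vi, shown only as a hypothetical sketch with made-up values. It derives a new variable, the acceptance rate (Accept divided by Apps), then uses the row names set in part (b) to identify the most selective college. Which variables to explore, and what the real data show, is left to the exercise.

```r
# Hypothetical data frame with made-up values and college names as row names.
college <- data.frame(
  Apps = c(1200, 5400, 800, 3000),
  Accept = c(900, 1100, 700, 2400),
  Grad.Rate = c(60, 92, 55, 75),
  row.names = c("Alpha College", "Beta University",
                "Gamma Institute", "Delta State")
)

college$AcceptRate <- college$Accept / college$Apps  # share of applicants admitted

# Most selective college (lowest acceptance rate), looked up by row name
rownames(college)[which.min(college$AcceptRate)]

# Correlation between acceptance rate and graduation rate
cor(college$AcceptRate, college$Grad.Rate)
```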

