Question: R Project 02: Data Frame / Data Preparation / Data Mining

Please download the R file "R Project 02.R".

Please download the data files: "Pokeman Data.csv", "Pokemon Info.csv", and "Diabetes Data.csv".

Please write down your code in the file and submit the file in the submission box.

Section one: Pokémon

1. Load two .csv files, "Pokeman Data.csv" and "Pokemon Info.csv", as data frames. The datasets record the basic information and key ratios of 721 Pokémon. The data attributes are listed as follows:

Pokémon Info.csv    Pokémon Data.csv
ID                  PokeID
Pokémon             Pokémon
Height              Attack
Weight              Defense
Type                HP
                    Speed
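
A minimal sketch of the loading step. The real CSV files are not available here, so the block first writes two tiny stand-in files to a temporary directory; their rows, and the ASCII column name `Pokemon`, are assumptions made for illustration only.

```r
# Stand-in files: the rows below are illustrative, not the assignment's data.
info_path <- file.path(tempdir(), "Pokemon Info.csv")
data_path <- file.path(tempdir(), "Pokeman Data.csv")

writeLines(c("ID,Pokemon,Height,Weight,Type",
             "1,bulbasaur,7,69,grass",
             "4,charmander,6,85,fire",
             "7,squirtle,5,90,water"), info_path)
writeLines(c("PokeID,Pokemon,Attack,Defense,HP,Speed",
             "1,bulbasaur,49,49,45,45",
             "4,charmander,52,43,39,65",
             "7,squirtle,48,65,44,43"), data_path)

# read.csv() returns a data frame; keep text columns as character for now
pokemon_info <- read.csv(info_path, stringsAsFactors = FALSE)
pokemon_data <- read.csv(data_path, stringsAsFactors = FALSE)

dim(pokemon_info)   # rows and columns of the info table
dim(pokemon_data)   # rows and columns of the ratios table
```

With the real files, point `read.csv()` at their actual location (or set the working directory with `setwd()` first).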

2. Combine the two data frames into one data frame. (Hint: ID and PokeID are the primary keys for each table.)
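
One way to follow the hint is base R's `merge()`, joining `ID` to `PokeID`. The sketch below uses made-up toy rows; keeping both name columns (via the `suffixes` argument) is a design assumption so that a name missing in one table can later be looked up in the other.

```r
# Toy stand-ins for the two loaded data frames (illustrative rows only)
pokemon_info <- data.frame(
  ID      = c(1, 4, 7),
  Pokemon = c("bulbasaur", "charmander", "squirtle"),
  Height  = c(7, 6, 5),
  Weight  = c(69, 85, 90),
  Type    = c("grass", "fire", "water"),
  stringsAsFactors = FALSE
)
pokemon_data <- data.frame(
  PokeID  = c(7, 1, 4),   # deliberately in a different row order
  Pokemon = c("squirtle", "bulbasaur", "charmander"),
  Attack  = c(48, 49, 52),
  Defense = c(65, 49, 43),
  HP      = c(44, 45, 39),
  Speed   = c(43, 45, 65),
  stringsAsFactors = FALSE
)

# Join on the key columns; the shared "Pokemon" column gets a suffix per table
pokemon <- merge(pokemon_info, pokemon_data,
                 by.x = "ID", by.y = "PokeID",
                 suffixes = c(".info", ".data"))
print(pokemon)
```

Adding `all = TRUE` would keep rows whose key appears in only one table, which may be useful when hunting for missing values.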

3. Missing Values:

   - Print all the rows that include missing values.

   - Pokémon names: If you can find the Pokémon names in the other data frame, you can replace the missing names. If you cannot find the names, you can remove the row.

   - Ratios: Please replace the missing values of the ratios with the median of the same type.
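
The three missing-value steps might look like the sketch below. It assumes a merged data frame with one name column per source table (`Pokemon.info`, `Pokemon.data`); those column names and all rows are hypothetical. The name repair runs before the median fill, so dropped rows do not influence the medians; that ordering is a choice, not something the assignment specifies.

```r
# Toy merged data frame with gaps (illustrative values only)
pokemon <- data.frame(
  ID           = 1:6,
  Pokemon.info = c("bulbasaur", NA, "oddish", "charmander", NA, "vulpix"),
  Pokemon.data = c("bulbasaur", "ivysaur", "oddish", "charmander", NA, "vulpix"),
  Type         = c("grass", "grass", "grass", "fire", "fire", "fire"),
  Attack       = c(49, 62, NA, 52, 64, NA),
  Speed        = c(45, 60, 30, 65, NA, 65),
  stringsAsFactors = FALSE
)

# (a) Rows that contain at least one missing value
print(pokemon[!complete.cases(pokemon), ])

# (b) Take the name from the other table when one is missing,
#     then drop rows whose name is missing in both
pokemon$Pokemon <- ifelse(is.na(pokemon$Pokemon.info),
                          pokemon$Pokemon.data, pokemon$Pokemon.info)
pokemon <- pokemon[!is.na(pokemon$Pokemon), ]

# (c) Replace missing ratios with the median of the same Type
fill_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}
for (col in c("Attack", "Speed")) {
  pokemon[[col]] <- ave(pokemon[[col]], pokemon$Type, FUN = fill_median)
}
print(pokemon)
```

If the CSV stores missing names as empty strings rather than `NA`, reading it with `na.strings = c("", "NA")` makes `is.na()` catch them.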

4. Complete the following tasks:

   - Check the first 10 rows of the combined data frame.

   - Check the structure of the data frame. Make sure Pokémon names and Type are factors.

   - Use the summary command to check how many Pokémon there are in each type. (Hint: summary() can be used for a single column of a data frame.)
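
A short sketch of these checks on a toy data frame (the rows and the column name `Pokemon` are illustrative assumptions). Calling `summary()` on a factor column returns the count per level, which is what the hint relies on.

```r
# Toy combined data frame (illustrative rows only)
pokemon <- data.frame(
  Pokemon = c("bulbasaur", "oddish", "charmander", "vulpix", "squirtle"),
  Type    = c("grass", "grass", "fire", "fire", "water"),
  Attack  = c(49, 50, 52, 41, 48),
  stringsAsFactors = FALSE
)

# First 10 rows (fewer are shown if the data frame is shorter)
head(pokemon, 10)

# Convert the name and type columns to factors, then inspect the structure
pokemon$Pokemon <- as.factor(pokemon$Pokemon)
pokemon$Type    <- as.factor(pokemon$Type)
str(pokemon)

# Number of Pokemon per type
type_counts <- summary(pokemon$Type)
print(type_counts)
```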

5. Calculate a comprehensive score for each Pokémon (= the sum of attack, defense, HP, and speed).
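
The score is a row-wise sum of the four ratio columns. A minimal sketch on toy rows; the new column name `Score` is an assumption.

```r
# Toy data frame with the four ratio columns (illustrative rows only)
pokemon <- data.frame(
  Pokemon = c("bulbasaur", "charmander", "squirtle"),
  Attack  = c(49, 52, 48),
  Defense = c(49, 43, 65),
  HP      = c(45, 39, 44),
  Speed   = c(45, 65, 43),
  stringsAsFactors = FALSE
)

# rowSums() adds the selected columns for each row
pokemon$Score <- rowSums(pokemon[, c("Attack", "Defense", "HP", "Speed")])
print(pokemon)
```

`rowSums()` returns `NA` for a row with any missing ratio, so it is worth running the missing-value step first.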

6. Complete the following filtering tasks:

   - Print all the Pokémon of the "grass" type.

   - Print all the Pokémon that have an "attack" score greater than or equal to 50 and less than or equal to 100.

   - Print all the Pokémon that are of the "grass" type and have a height greater than or equal to 15.

   - Print the data for "pichu".
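
Each filter can be written as a logical index on the rows of the data frame. The sketch below uses toy rows; the lowercase spellings of types and names are assumptions about how the real file stores them.

```r
# Toy data frame (illustrative rows only)
pokemon <- data.frame(
  Pokemon = c("bulbasaur", "venusaur", "charmander", "pichu", "snorlax"),
  Type    = c("grass", "grass", "fire", "electric", "normal"),
  Attack  = c(49, 82, 52, 40, 110),
  Height  = c(7, 20, 6, 3, 21),
  stringsAsFactors = FALSE
)

# All grass-type Pokemon
grass <- pokemon[pokemon$Type == "grass", ]

# Attack between 50 and 100, both ends included
mid_attack <- pokemon[pokemon$Attack >= 50 & pokemon$Attack <= 100, ]

# Grass type AND height of at least 15
tall_grass <- pokemon[pokemon$Type == "grass" & pokemon$Height >= 15, ]

# The single row for pichu
pichu <- pokemon[pokemon$Pokemon == "pichu", ]

print(grass)
print(mid_attack)
print(tall_grass)
print(pichu)
```

If a filter column still contains `NA`, `subset(pokemon, Type == "grass")` is a convenient alternative because it drops rows where the condition is `NA`.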


Section Two: Diabetes Data

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes based on certain diagnostic measurements included in the dataset. In particular, all patients here are females at least 21 years old of Pima Indian heritage. The attributes are listed as follows:

1. Load the .csv file "Diabetes Data.csv" as a data frame.

2. Data Preparation / Missing Values:

   - Check the summary and structure of the data.

   - Encode the target variable as a factor.
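
A sketch of the preparation step. Because "Diabetes Data.csv" is not available here, the block simulates a small stand-in with the four columns the assignment names later (`Age`, `BMI`, `Glucose`, `Outcome`); the simulated values and the 0/1 coding of `Outcome` are assumptions.

```r
# Synthetic stand-in for the diabetes data (NOT the real dataset)
set.seed(42)
n <- 200
diabetes <- data.frame(
  Age     = sample(21:70, n, replace = TRUE),
  BMI     = round(rnorm(n, mean = 32, sd = 6), 1),
  Glucose = round(rnorm(n, mean = 120, sd = 30))
)
diabetes$Outcome <- as.integer(diabetes$Glucose + rnorm(n, sd = 10) > 130)

# Summary statistics, structure, and missing-value count per column
summary(diabetes)
str(diabetes)
na_counts <- colSums(is.na(diabetes))
print(na_counts)

# Encode the target variable as a factor with explicit levels
diabetes$Outcome <- factor(diabetes$Outcome, levels = c(0, 1))
table(diabetes$Outcome)
```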

3. Data Splitting:

   - Split the dataset into the training set and testing set.

   - Pick the variables needed: "Age", "BMI", "Glucose", and "Outcome".
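
A base-R sketch of the split. The 75/25 ratio and the seed are assumptions (the assignment does not state them), and the data are a synthetic stand-in for the real file.

```r
# Synthetic stand-in for the diabetes data (NOT the real dataset)
set.seed(42)
n <- 200
diabetes <- data.frame(
  Age     = sample(21:70, n, replace = TRUE),
  BMI     = round(rnorm(n, mean = 32, sd = 6), 1),
  Glucose = round(rnorm(n, mean = 120, sd = 30))
)
diabetes$Outcome <- factor(as.integer(diabetes$Glucose + rnorm(n, sd = 10) > 130))

# Keep only the variables needed for the model
vars <- diabetes[, c("Age", "BMI", "Glucose", "Outcome")]

# Randomly assign 75% of the rows to training, the rest to testing
set.seed(123)
train_idx    <- sample(seq_len(nrow(vars)), size = floor(0.75 * nrow(vars)))
training_set <- vars[train_idx, ]
test_set     <- vars[-train_idx, ]

nrow(training_set)
nrow(test_set)
```

A stratified split (for example `caTools::sample.split()`, if that package is installed) keeps the class proportions equal in both sets; plain random sampling is used here to stay within base R.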

4. Decision Tree Classification:

   - Fit a decision tree classifier to the training set.

   - Plot the tree.
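
One common choice for this step is the `rpart` package, a recommended package that ships with most R installations; the assignment does not name a package, so treat that as an assumption. The data below are synthetic, and the guard around `plot()` is there because `rpart` refuses to plot a tree that has no splits.

```r
library(rpart)

# Synthetic stand-in for the diabetes data (NOT the real dataset)
set.seed(42)
n <- 200
diabetes <- data.frame(
  Age     = sample(21:70, n, replace = TRUE),
  BMI     = round(rnorm(n, mean = 32, sd = 6), 1),
  Glucose = round(rnorm(n, mean = 120, sd = 30))
)
diabetes$Outcome <- factor(as.integer(diabetes$Glucose + rnorm(n, sd = 10) > 130))

# 75/25 split (the ratio is an assumption)
set.seed(123)
train_idx    <- sample(seq_len(n), size = floor(0.75 * n))
training_set <- diabetes[train_idx, ]

# Fit a classification tree: method = "class" requests a classifier
classifier <- rpart(Outcome ~ Age + BMI + Glucose,
                    data = training_set, method = "class")
print(classifier)

# Draw the tree and label its nodes (only if at least one split exists)
if (nrow(classifier$frame) > 1) {
  plot(classifier, margin = 0.1)
  text(classifier, use.n = TRUE, cex = 0.8)
}
```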

5. Assess the model:

   - Predict the test set results.

   - Make the confusion matrix.

   - Calculate the accuracy ratios.
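
The assessment can be sketched as follows, again with `rpart` and synthetic data as assumptions. Accuracy is computed as the share of test cases on the diagonal of the confusion matrix.

```r
library(rpart)

# Synthetic stand-in for the diabetes data (NOT the real dataset)
set.seed(42)
n <- 200
diabetes <- data.frame(
  Age     = sample(21:70, n, replace = TRUE),
  BMI     = round(rnorm(n, mean = 32, sd = 6), 1),
  Glucose = round(rnorm(n, mean = 120, sd = 30))
)
diabetes$Outcome <- factor(as.integer(diabetes$Glucose + rnorm(n, sd = 10) > 130))

set.seed(123)
train_idx    <- sample(seq_len(n), size = floor(0.75 * n))
training_set <- diabetes[train_idx, ]
test_set     <- diabetes[-train_idx, ]

classifier <- rpart(Outcome ~ Age + BMI + Glucose,
                    data = training_set, method = "class")

# Predicted class labels for the held-out rows
y_pred <- predict(classifier, newdata = test_set, type = "class")

# Confusion matrix: rows are actual classes, columns are predictions
cm <- table(Actual = test_set$Outcome, Predicted = y_pred)
print(cm)

# Accuracy = correctly classified cases / all cases
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```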

6. Pick the variables needed for clustering analysis: "Age" and "BMI".

7. Use the Elbow Method to find the optimal number of clusters.
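
A sketch of the elbow method with base R's `kmeans()`: fit the model for a range of k and plot the total within-cluster sum of squares, then look for the bend. The range 1 to 10, the seed, and the synthetic data are assumptions.

```r
# Synthetic stand-in for the two clustering variables (NOT the real dataset)
set.seed(42)
n <- 200
X <- data.frame(
  Age = sample(21:70, n, replace = TRUE),
  BMI = round(rnorm(n, mean = 32, sd = 6), 1)
)

# Total within-cluster sum of squares for k = 1..10
set.seed(6)
k_values <- 1:10
wcss <- sapply(k_values, function(k) {
  kmeans(X, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})

# The "elbow" is where adding clusters stops reducing WCSS by much
plot(k_values, wcss, type = "b",
     main = "Elbow Method", xlab = "Number of clusters", ylab = "WCSS")
```

Because Age and BMI are on different scales, running the same code on `scale(X)` is worth considering so that neither variable dominates the distances.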

8. Run the clustering analysis.

9. Print the results in a plot (color the clusters).
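
Once a value of k has been read off the elbow plot, the clustering and the colored plot might look like this. The choice `k = 3` is a placeholder assumption, not a result from the real data, and the data are synthetic.

```r
# Synthetic stand-in for the two clustering variables (NOT the real dataset)
set.seed(42)
n <- 200
X <- data.frame(
  Age = sample(21:70, n, replace = TRUE),
  BMI = round(rnorm(n, mean = 32, sd = 6), 1)
)

# Run k-means with the chosen number of clusters (k = 3 is a placeholder)
set.seed(29)
k  <- 3
km <- kmeans(X, centers = k, nstart = 10, iter.max = 50)

# Scatter plot colored by cluster membership, with the centers marked
plot(X$Age, X$BMI, col = km$cluster, pch = 19,
     main = "Clusters of patients", xlab = "Age", ylab = "BMI")
points(km$centers[, "Age"], km$centers[, "BMI"], pch = 8, cex = 2)
```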
