Question:
Task
First, copy the code below to an R script. Enter your student ID into the set.seed() command
and run the whole code. The code will create a subsample that is unique to you.
Use the str command to check that the data type for each feature is correctly specified.
Address the issue if this is not the case.
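As a rough illustration of the type check, the sketch below uses a small made-up data frame in place of mydata. The column names (Port, Protocol, Hits) and their values are hypothetical and are not taken from the dataset; they only show two common problems, a numeric code that is really a category and a number that was read in as text.

```r
# Toy stand-in for mydata; column names and values are hypothetical.
toy <- data.frame(
  Port     = c(80, 443, 80, 22),           # numeric codes that are really labels
  Protocol = c("TCP", "UDP", "TCP", "TCP"),
  Hits     = c("12", "7", "31", "5"),      # numbers read in as text
  stringsAsFactors = TRUE
)

str(toy)  # inspect the type of each column

# Port is a label, not a quantity, so store it as a factor
toy$Port <- factor(toy$Port)

# Hits became a factor; convert via character to recover the numeric values
toy$Hits <- as.numeric(as.character(toy$Hits))

str(toy)  # confirm the corrected types
```

Converting a factor directly with as.numeric() would return the internal level codes rather than the original numbers, which is why the sketch goes through as.character() first.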
You are to clean and perform basic data analysis on the relevant features in mydata, as
well as principal component analysis (PCA) on the continuous variables. This is to be done
using R. You will report on your findings.
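One possible way to run PCA on the continuous variables is sketched below with base R's prcomp(). The data frame and its column names (x1, x2, x3, grp) are simulated stand-ins, not columns of mydata, and the choice to centre and scale is an assumption that suits variables measured on different scales.

```r
# Simulated stand-in for mydata; column names are hypothetical.
set.seed(1)
toy <- data.frame(
  x1  = rnorm(50),
  x2  = rnorm(50, mean = 10, sd = 5),
  x3  = runif(50),
  grp = factor(sample(c("a", "b"), 50, replace = TRUE))
)

# Keep only the numeric columns; PCA is not defined for factors
num <- toy[, sapply(toy, is.numeric)]

# Centre and scale so variables on larger scales do not dominate
pca <- prcomp(num, center = TRUE, scale. = TRUE)

summary(pca)   # proportion of variance explained by each component
pca$rotation   # loadings of each variable on each component
head(pca$x)    # component scores for the first few observations
```

prcomp() stops with an error if the input contains missing values, so rows with NA in the continuous columns would need to be handled (for example with na.omit()) before this step.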
Part Exploratory Data Analysis and Data Cleaning
(i) For each of your categorical or binary variables, determine the number of
instances for each of their categories and summarise them in a table as follows.
State all percentages in decimal places.
Categorical Feature    Category    N
Feature                Category
                       Category
                       Category
                       Missing
Feature (Binary)       YES
                       NO
                       Missing
Feature k              Category
                       Category
                       Category
                       Category
                       Missing
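The table above could be produced along the following lines. This is a sketch on a made-up data frame: the columns Protocol and Flagged are hypothetical, summarise_factor is a helper name introduced here for illustration, and the default of one decimal place is an assumption, since the required precision is not stated above.

```r
# Toy stand-in for mydata; column names and values are hypothetical.
toy <- data.frame(
  Protocol = factor(c("TCP", "UDP", "TCP", NA, "TCP", "ICMP")),
  Flagged  = factor(c("YES", "NO", "NO", "NO", NA, "YES"))
)

# Count each category, including missing values, and express it as a percentage.
# digits = 1 is an assumed precision; adjust it to the assignment's requirement.
summarise_factor <- function(x, digits = 1) {
  counts <- table(x, useNA = "always")
  cats <- names(counts)
  cats[is.na(cats)] <- "Missing"
  data.frame(
    Category = cats,
    N        = as.vector(counts),
    Percent  = round(100 * as.vector(counts) / length(x), digits)
  )
}

# Apply the helper to every factor column
is_cat <- sapply(toy, is.factor)
tables <- lapply(toy[is_cat], summarise_factor)
tables
```

Using useNA = "always" keeps a Missing row in every table even when a feature has no missing values, which matches the layout requested above.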
library(dplyr)  # provides filter() and the %>% pipe
# You may need to change/include the path of your working directory
dat <- read.csv("HealthCareData.csv", stringsAsFactors = TRUE)
# Separate samples of normal and malicious events
dat.class0 <- dat %>% filter(Classification == "Normal")     # normal
dat.class1 <- dat %>% filter(Classification == "Malicious")  # malicious
# Randomly select samples from each class, then combine them to form a working dataset
# (replace <size> with the per-class sample size given in the assignment)
set.seed(Enter your student ID here)
rand.class0 <- dat.class0[sample(1:nrow(dat.class0), size = <size>, replace = FALSE), ]
rand.class1 <- dat.class1[sample(1:nrow(dat.class1), size = <size>, replace = FALSE), ]
# Your subsample of observations
mydata <- rbind(rand.class0, rand.class1)
dim(mydata) # Check the dimension of your subsample
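After running the sampling code above, it may help to confirm that the subsample has the expected composition. The sketch below repeats the same per-class sampling idea on a small made-up data frame; the class sizes, the per-class sample size n_per_class, and the seed are all hypothetical values chosen for illustration.

```r
# Toy stand-in for dat; class sizes are made up.
dat <- data.frame(
  id = 1:30,
  Classification = factor(rep(c("Normal", "Malicious"), times = c(20, 10)))
)

n_per_class <- 5  # hypothetical; use the size given in the assignment

set.seed(123)     # hypothetical seed; the assignment asks for your student ID
normal    <- dat[dat$Classification == "Normal", ]
malicious <- dat[dat$Classification == "Malicious", ]

# Draw the same number of rows from each class without replacement
toy_sample <- rbind(
  normal[sample(1:nrow(normal), size = n_per_class, replace = FALSE), ],
  malicious[sample(1:nrow(malicious), size = n_per_class, replace = FALSE), ]
)

dim(toy_sample)                   # rows should equal 2 * n_per_class
table(toy_sample$Classification)  # each class should appear n_per_class times
anyDuplicated(toy_sample$id)      # 0 means no row was drawn twice
```

Because set.seed() fixes the random number stream, rerunning the script with the same seed reproduces the same subsample.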
