Part 1 - General data preparation and cleaning.

a) Import MLDATASET_PartiallyCleaned.xlsx into R Studio. This dataset is a partially cleaned version of MLDATASET-200000-1612938401.xlsx.

b) Write the appropriate code in R Studio to prepare and clean the MLDATASET_PartiallyCleaned dataset as follows:

   i. For How.Many.Times.File.Seen, set all values equal to 65535 to NA.

   ii. Convert Threads.Started to a factor whose categories are given by:
       1 = 1 thread started
       2 = 2 threads started
       3 = 3 threads started
       4 = 4 threads started
       5 = 5 or more threads started
       Hint: Replace all values greater than 5 with 5, then use the factor() function.

   iii. Log-transform Characters.in.URL using the log() function, and remove the original Characters.in.URL column from the dataset (unless you have overwritten it with the log-transformed data).

   iv. Select only the complete cases using the na.omit() function, and name the dataset MLDATASET.cleaned.

Briefly outline the preparation and cleaning process in your report and explain why you believe the above steps were necessary.
Step by Step Solution
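One possible way to carry out steps i to iv is sketched below. It is a minimal sketch, not a definitive answer: the column names are taken from the question, but the readxl import call, the helper name clean_mldataset, and the toy data frame are assumptions added for illustration. The import lines are left commented out so the example runs without the spreadsheet.

```r
# a) Import (assumption: the readxl package is installed and the file sits in
#    the working directory; the question does not name an import function).
# library(readxl)
# MLDATASET <- as.data.frame(read_excel("MLDATASET_PartiallyCleaned.xlsx"))
# If the sheet's headers contain spaces, make.names() gives the dotted form:
# names(MLDATASET) <- make.names(names(MLDATASET))

# Hypothetical helper wrapping steps i to iv of part b).
clean_mldataset <- function(df) {
  # i. Recode 65535 as missing. 65535 is 2^16 - 1, so it is plausibly a
  #    placeholder or overflow value rather than a real count (an assumption).
  #    %in% is used so rows that are already NA are left untouched.
  is_sentinel <- df$How.Many.Times.File.Seen %in% 65535
  df$How.Many.Times.File.Seen[is_sentinel] <- NA

  # ii. Cap thread counts at 5, then convert to a labelled factor.
  #     Values outside 1..5 after capping (e.g. 0) would become NA.
  capped <- pmin(df$Threads.Started, 5)
  df$Threads.Started <- factor(
    capped,
    levels = 1:5,
    labels = c("1 thread started", "2 threads started", "3 threads started",
               "4 threads started", "5 or more threads started")
  )

  # iii. Overwrite the column with its natural log, so no separate
  #      removal of the original column is needed.
  df$Characters.in.URL <- log(df$Characters.in.URL)

  # iv. Keep only rows with no missing values.
  na.omit(df)
}

# Toy data standing in for the real spreadsheet (values are made up).
toy <- data.frame(
  How.Many.Times.File.Seen = c(3, 65535, 10, 7),
  Threads.Started          = c(1, 2, 9, 5),
  Characters.in.URL        = c(10, 25, 40, NA)
)

MLDATASET.cleaned <- clean_mldataset(toy)
print(MLDATASET.cleaned)
```

With the real data, the last two lines would become MLDATASET.cleaned <- clean_mldataset(MLDATASET). In the toy data, the second row is dropped because its 65535 was recoded to NA, and the fourth because Characters.in.URL was already missing, so two rows remain. For the report, the usual justifications would be along these lines: a placeholder value such as 65535 distorts summaries if treated as a real count; collapsing sparse high thread counts gives a categorical variable with usable group sizes; the log transform reduces right skew in a length-type variable; and many modelling functions require complete cases. Whether each of these holds should be checked against the actual data.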
