Question:
Part 1 - General data preparation and cleaning.

a) Import MLDATASET_PartiallyCleaned.xlsx into RStudio. This dataset is a partially cleaned version of MLDATASET-200000-1612938401.xlsx.

b) Write the appropriate code in RStudio to prepare and clean the MLDATASET_PartiallyCleaned dataset as follows:

i. For How.Many.Times.File.Seen, set all values equal to 65535 to NA.
ii. Convert Threads.Started to a factor whose categories are:
   1 = 1 thread started
   2 = 2 threads started
   3 = 3 threads started
   4 = 4 threads started
   5 = 5 or more threads started
   Hint: Replace all values greater than 5 with 5, then use the factor() function.
iii. Log-transform Characters.in.URL using the log() function, and remove the original Characters.in.URL column from the dataset (unless you have overwritten it with the log-transformed data).
iv. Select only the complete cases using the na.omit() function, and name the dataset MLDATASET.cleaned.

Briefly outline the preparation and cleaning process in your report and explain why you believe the above steps were necessary.
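One way steps b) i to iv could be written in R is sketched below. It is a minimal sketch, not a model answer: it runs on a small made-up data frame that only borrows the column names from the question, so all values are hypothetical. The new column name Log.Characters.in.URL is an illustrative choice the question does not prescribe. The commented-out import line assumes the readxl package is installed and that the spreadsheet headers arrive with the dotted names used in the question.

```r
# Step a) with the real file (assumption: readxl is installed, the file is in
# the working directory, and the headers already use the dotted names):
#   library(readxl)
#   MLDATASET <- as.data.frame(read_excel("MLDATASET_PartiallyCleaned.xlsx"))

# Hypothetical toy data standing in for the real spreadsheet
MLDATASET <- data.frame(
  How.Many.Times.File.Seen = c(3, 65535, 12, 1, 7),
  Threads.Started          = c(1, 2, 8, 5, 3),
  Characters.in.URL        = c(40, 85, 120, 33, 60)
)

# i. 65535 (the largest 16-bit unsigned integer) is treated as a
#    missing-value placeholder, so recode it to NA
seen <- MLDATASET$How.Many.Times.File.Seen
MLDATASET$How.Many.Times.File.Seen[which(seen == 65535)] <- NA

# ii. Cap Threads.Started at 5, then convert it to a labelled factor
MLDATASET$Threads.Started[which(MLDATASET$Threads.Started > 5)] <- 5
MLDATASET$Threads.Started <- factor(
  MLDATASET$Threads.Started,
  levels = 1:5,
  labels = c("1 thread started", "2 threads started", "3 threads started",
             "4 threads started", "5 or more threads started")
)

# iii. Store the log-transformed URL length in a new (hypothetically named)
#      column, then drop the original column
MLDATASET$Log.Characters.in.URL <- log(MLDATASET$Characters.in.URL)
MLDATASET$Characters.in.URL <- NULL

# iv. Keep only rows with no missing values
MLDATASET.cleaned <- na.omit(MLDATASET)

str(MLDATASET.cleaned)
```

On the toy data, the second row is dropped by na.omit() because its 65535 was recoded to NA, and the value 8 in Threads.Started ends up in the "5 or more threads started" category. Note that log() returns -Inf for zeros, which na.omit() does not remove, so it is worth checking that Characters.in.URL is strictly positive before transforming.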
