Question: Programming: Create an R script to explore and clean the raw data. In the file, at a minimum, you must:
a. Read the RawData and Calendar worksheets of RawData.xlsx into R. DO NOT CHANGE the name of the
input file or specify a particular working directory. This ensures that I can run the file on my machine without
making any environment changes to match your script.
b. Add Quarter and Year columns to the RawData sheet by matching the Receipt Date listed in that
worksheet to the date ranges specified in the Calendar sheet. Do not hard-code any of the date values!
c. Calculate Intransit Lead Time and Manufacturing Lead Time for each row in the joined dataset.
d. Clean all columns of RawData to ensure no unusual or missing values are included in the dataset. The date
columns do not need to be cleaned, but other columns, especially the Intransit Lead Time and
Manufacturing Lead Time columns, will require significant cleaning.
e. Only full rows that contain mostly NAs should be deleted from the dataset. For all other unusual or missing
values, impute new values.
f. After the data cleaning process, explore the columns with both numeric and graphical methods, e.g., by finding
the relationships between the Intransit Lead Time and any other columns that may affect it.
g. Calculate and report correlations between all variables. This must be either a correlation plot or a table that
lists each predictor's correlation with Intransit Lead Time in descending order. Since you need to calculate
the correlation between Intransit Lead Time and the other factors that affect it, all variables considered
must be numeric. Therefore, you need to convert LOB, Origin, Ship Mode, and Quarter to numeric variables.
You can consider one-hot encoding those categorical variables. To do so, you can use the dummy_cols
function from the fastDummies package. Please do some self-learning about one-hot encoding and the
aforementioned function.
h. Your R code must include at least one user-created function that is used in your exploration and cleaning
process. That function cannot be the get_upper_tri function that I used in the Exploratory Data Analysis module.
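
For items a and b, one possible approach is a small user-created helper that looks up, for each Receipt Date, the Calendar row whose date range contains it. The sketch below is only an illustration under assumptions: the Calendar sheet's layout is not given in the question, so the column names start_date, end_date, quarter and year, the helper name match_period, and all of the toy values are hypothetical. The read_excel calls (from the readxl package) are shown as comments so that the example runs without the workbook.

```r
# In the real script the two sheets would be read with a relative path, e.g.:
#   library(readxl)
#   raw      <- read_excel("RawData.xlsx", sheet = "RawData")
#   calendar <- read_excel("RawData.xlsx", sheet = "Calendar")
# Toy stand-ins follow (all column names and values are hypothetical).
raw <- data.frame(
  receipt_date = as.Date(c("2020-01-15", "2020-04-02", "2020-06-30", "2021-02-10"))
)
calendar <- data.frame(
  start_date = as.Date(c("2020-01-01", "2020-04-01", "2020-07-01",
                         "2020-10-01", "2021-01-01")),
  end_date   = as.Date(c("2020-03-31", "2020-06-30", "2020-09-30",
                         "2020-12-31", "2021-03-31")),
  quarter    = c("Q1", "Q2", "Q3", "Q4", "Q1"),
  year       = c(2020, 2020, 2020, 2020, 2021),
  stringsAsFactors = FALSE
)

# User-created helper: for each date, return the index of the calendar row
# whose [start_date, end_date] range contains it (NA if none does).
match_period <- function(dates, cal) {
  vapply(seq_along(dates), function(i) {
    d <- dates[i]
    hit <- which(!is.na(d) & d >= cal$start_date & d <= cal$end_date)
    if (length(hit) == 0) NA_integer_ else hit[1]
  }, integer(1))
}

# No date values are hard-coded: the ranges come entirely from `calendar`.
idx <- match_period(raw$receipt_date, calendar)
raw$quarter <- calendar$quarter[idx]
raw$year    <- calendar$year[idx]
print(raw)
```

Because the lookup is driven by the Calendar data frame, it keeps working if the fiscal calendar changes. A range join from a package would be an alternative to this base R loop.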
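
Items c, d and e could be sketched as follows. This is a hedged illustration, not the course's expected solution: the question does not say which date columns define each lead time, so the columns po_date and ship_date, the definitions used (manufacturing = ship date minus PO date, in-transit = receipt date minus ship date), the rule that negative lead times count as "unusual", and the choice of median imputation are all assumptions. The helper impute_median is a hypothetical user-created function.

```r
# Toy data; every column name and value here is hypothetical.
raw <- data.frame(
  po_date      = as.Date(c("2020-01-01", "2020-01-05", NA, "2020-02-01", "2020-02-10")),
  ship_date    = as.Date(c("2020-01-11", "2020-01-20", NA, "2020-02-15", "2020-02-05")),
  receipt_date = as.Date(c("2020-01-16", "2020-01-18", NA, "2020-02-25", "2020-02-20")),
  ship_mode    = c("AIR", "OCEAN", NA, "AIR", "AIR"),
  stringsAsFactors = FALSE
)

# Item c: lead times in days (assumed definitions, see the text above).
raw$manufacturing_lt <- as.numeric(difftime(raw$ship_date, raw$po_date, units = "days"))
raw$intransit_lt     <- as.numeric(difftime(raw$receipt_date, raw$ship_date, units = "days"))

# Item e: delete only rows in which more than half of the values are NA.
mostly_na <- rowMeans(is.na(raw)) > 0.5
clean <- raw[!mostly_na, ]

# Item d: user-created helper that flags unusual values as NA and then
# replaces every NA with the median of the remaining values.
impute_median <- function(x, is_bad = function(v) v < 0) {
  x[!is.na(x) & is_bad(x)] <- NA
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

clean$manufacturing_lt <- impute_median(clean$manufacturing_lt)
clean$intransit_lt     <- impute_median(clean$intransit_lt)
print(clean)
```

The is_bad argument lets the same helper apply a different rule per column, for example an upper bound on plausible lead times, without rewriting the imputation logic.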
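
For item g, the idea of one-hot encoding followed by a sorted correlation table can be illustrated with base R alone. Note the substitution: the question suggests dummy_cols from fastDummies, whereas this sketch uses model.matrix so that it runs without extra packages. The toy data, the column names, and the choice to encode only ship_mode are assumptions made for illustration.

```r
# Toy cleaned data; column names and values are hypothetical.
df <- data.frame(
  intransit_lt     = c(5, 12, 30, 6, 28, 11),
  manufacturing_lt = c(10, 14, 9, 12, 15, 11),
  ship_mode        = c("AIR", "GROUND", "OCEAN", "AIR", "OCEAN", "GROUND"),
  stringsAsFactors = FALSE
)

# One-hot encode: one 0/1 indicator column per level of ship_mode.
# The "- 1" removes the intercept so that no level is dropped.
dummies <- model.matrix(~ ship_mode - 1, data = df)

# Combine the numeric columns with the indicator columns.
numeric_df <- cbind(df[, c("intransit_lt", "manufacturing_lt")], dummies)

# Correlation of every predictor with in-transit lead time.
cors <- cor(numeric_df)[, "intransit_lt"]
cors <- cors[names(cors) != "intransit_lt"]

# Table of predictors in descending order of correlation.
cor_table <- data.frame(predictor = names(cors),
                        correlation = unname(cors),
                        stringsAsFactors = FALSE)
cor_table <- cor_table[order(cor_table$correlation, decreasing = TRUE), ]
print(cor_table, row.names = FALSE)
```

With several categorical columns (LOB, Origin, Ship Mode, Quarter), the same pattern would be repeated per column, or the encoding could be handed to dummy_cols as the question suggests.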
