Programming: Create an R script to explore and clean the raw data. In the file, at a minimum, you must:
a. Read the Raw-Data and Calendar worksheets of RawData.xlsx into R. DO NOT CHANGE the name of the
input file or specify a particular working directory. This ensures that I can run the file on my machine without
making any environment changes to match your script.
b. Add Quarter and Year columns to the Raw-Data sheet by matching the Receipt Date listed in that
worksheet to the date ranges specified in the Calendar sheet. Do not hard code any of the date values!
c. Calculate In-transit Lead Time and Manufacturing Lead Time for each row in the joined dataset.
d. Clean all columns of Raw-Data to ensure no unusual or missing values are included in the dataset. The date
columns do not need to be cleaned, but other columns, especially the In-transit Lead Time and
Manufacturing Lead Time columns, will require significant cleaning.
e. Only full rows that contain mostly NAs should be deleted from the dataset. For all other unusual or missing
values, impute new values.
f. After the data cleaning process, explore the columns with both numeric and graphical methods. E.g., finding
the relationships between the In-transit Lead Time and any other columns that may affect it.
g. Calculate and report correlations between all variables. This must be either a correlation plot or a table that
lists each predictor's correlation with In-transit Lead Time in descending order. Calculating the correlation
between In-transit Lead Time and the other factors requires all of the variables considered to be numeric.
Therefore, you need to convert the LOB, Origin, Ship Mode, and Quarter columns to numeric variables.
Consider one-hot encoding those categorical variables. To do so, you can use the dummy_cols()
function from the fastDummies package. Please do some self-learning about one-hot encoding and the
aforementioned function.
h. Your R code must include at least one user-created function that is used in your exploration and cleaning
process. That function cannot be the get_upper_tri function that I used in the Exploratory Data Analysis module.
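
A minimal sketch of steps b, c, e, and g follows, written in base R on toy data so that it runs without RawData.xlsx. Much of it is an assumption, not part of the assignment: the column names Order_Date, Ship_Date, Start_Date, and End_Date, the definitions of the two lead times, the 50% cutoff for "mostly NAs", and the choice of median imputation are all hypothetical. The one_hot() helper is a base-R stand-in for dummy_cols(), used here only to keep the example free of package dependencies.

```r
# Toy stand-ins for the two worksheets. In the real script these could be read with
# readxl::read_excel("RawData.xlsx", sheet = "Raw-Data") and sheet = "Calendar".
raw <- data.frame(
  LOB          = c("A", "B", "A", "B"),
  Ship_Mode    = c("AIR", "OCEAN", "AIR", "OCEAN"),
  Order_Date   = as.Date(c("2020-10-01", "2020-11-02", "2021-01-04", "2021-02-01")),  # hypothetical column
  Ship_Date    = as.Date(c("2020-10-09", "2020-11-12", "2021-01-14", NA)),            # hypothetical column
  Receipt_Date = as.Date(c("2020-10-14", "2020-12-16", "2021-01-19", "2021-03-30")),
  stringsAsFactors = FALSE
)
calendar <- data.frame(
  Quarter    = c("Q4", "Q1"),
  Year       = c(2020, 2021),
  Start_Date = as.Date(c("2020-10-01", "2021-01-01")),  # hypothetical column
  End_Date   = as.Date(c("2020-12-31", "2021-03-31")),  # hypothetical column
  stringsAsFactors = FALSE
)

# Step e: drop rows that are mostly NA (the 50% threshold is an assumption).
raw <- raw[rowMeans(is.na(raw)) <= 0.5, ]

# User-created helper (step h): for each date, return the index of the calendar
# row whose [start, end] range contains it, or NA if no range matches.
match_period <- function(dates, starts, ends) {
  vapply(seq_along(dates), function(i) {
    hit <- which(dates[i] >= starts & dates[i] <= ends)
    if (length(hit) == 0) NA_integer_ else hit[1]
  }, integer(1))
}

# Step b: add Quarter and Year from the Calendar ranges, with no hard-coded dates.
idx <- match_period(raw$Receipt_Date, calendar$Start_Date, calendar$End_Date)
raw$Quarter <- calendar$Quarter[idx]
raw$Year    <- calendar$Year[idx]

# Step c: lead times in days (these definitions are assumptions).
raw$Manufacturing_Lead_Time <- as.numeric(raw$Ship_Date - raw$Order_Date)
raw$In_Transit_Lead_Time    <- as.numeric(raw$Receipt_Date - raw$Ship_Date)

# Step e: impute remaining missing values, here with the column median.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
raw$Manufacturing_Lead_Time <- impute_median(raw$Manufacturing_Lead_Time)
raw$In_Transit_Lead_Time    <- impute_median(raw$In_Transit_Lead_Time)

# Step g: one-hot encode the categorical columns (base-R stand-in for dummy_cols()).
one_hot <- function(df, cols) {
  for (col in cols) {
    for (lev in sort(unique(df[[col]]))) {
      df[[paste(col, lev, sep = "_")]] <- as.integer(df[[col]] == lev)
    }
    df[[col]] <- NULL  # remove the original categorical column
  }
  df
}
encoded <- one_hot(raw, c("LOB", "Ship_Mode", "Quarter"))

# Correlation of every numeric predictor with In-transit Lead Time, descending.
num  <- encoded[sapply(encoded, is.numeric)]
cors <- cor(num)[, "In_Transit_Lead_Time"]
cors <- sort(cors[names(cors) != "In_Transit_Lead_Time"], decreasing = TRUE)
print(cors)
```

On the real data, the same pattern would apply after reading both worksheets, with Origin added to the columns passed to the encoder. Unusual values in the lead-time columns (for example negative durations) would need to be set to NA before imputation, which this sketch does not attempt.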
