Question: Machine Learning (Regression Models)
Instructions: Write step-by-step Python code to perform all data preparation you consider relevant for the given dataset; you must perform:
An overview of the dataset.
Method for data imputation and/or removal of missing values.
Data transformation: method for numeric variables and/or method for categorical variables.
Exploratory data analysis: In your report you should describe your more interesting findings.
For the univariate analysis include only variables.
For bivariate or multivariate analysis include only plots
Explain the rationale for each of your selections.
Preliminary conclusions and insights derived from your EDA.
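The overview, missing-value and transformation steps listed above can be sketched as follows. This is only a minimal illustration: the toy values are made up, the choice of median imputation, z-score scaling and one-hot encoding of Station is an assumption rather than a requirement of the assignment, and the real data would be loaded from AssignmentIdata.csv instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for pd.read_csv("AssignmentIdata.csv").
# The values are invented; only the column names come from the assignment.
df = pd.DataFrame({
    "Station": [1, 2, 1, 3, 2, 3],
    "PresentTmax": [28.7, 31.9, np.nan, 30.0, 29.4, 31.2],
    "PresentTmin": [21.4, 21.6, 23.3, np.nan, 22.1, 23.5],
    "NextDayAvTemp": [25.1, 26.0, 27.2, 26.4, 25.8, 27.0],
})

# 1) Overview: shape, dtypes and missing-value counts per column.
print(df.shape)
print(df.dtypes)
missing_before = df.isna().sum()
print(missing_before)

# 2) Missing values: impute numeric predictors with the column median
#    (assumed choice; dropping the rows with df.dropna() is the alternative).
numeric_cols = ["PresentTmax", "PresentTmin"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 3) Numeric transformation: z-score standardisation (assumed choice).
scaled = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std(ddof=0)

# 4) Categorical transformation: one-hot encode the station number.
station_dummies = pd.get_dummies(df["Station"], prefix="Station")

# Final design matrix and target.
X = pd.concat([scaled, station_dummies], axis=1)
y = df["NextDayAvTemp"]
print(X.head())
```

In a full workflow the median, mean and standard deviation would normally be computed on the training split only and then reused on the test data, so that no information leaks from the test rows.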
Refine your ML model using R² as the evaluation metric; you must deliver:
The IPYNB script containing ONLY ONE model with the best result.
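Assuming the evaluation metric named above is the coefficient of determination (R squared), it can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up numbers and a hypothetical helper name, `r2_score_manual`, purely to show the arithmetic.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented example values (deg C), not taken from the dataset.
y_true = [25.0, 26.0, 27.0, 28.0]
y_pred = [25.5, 25.5, 27.5, 27.5]
r2 = r2_score_manual(y_true, y_pred)
print(r2)
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean of the observed values.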
Notes:
sampletest.csv can only be used for data preprocessing and testing purposes, NOT for training.
NO tune parameter optimization will be allowed during
Training will be allowed using the AssignmentIdata.csv file, with the same train/test split proportion employed during your research.
Note: The sampletest.csv file is a small sample of the AssignmentIdata.csv file. Do not draw any conclusions regarding the performance of your model using this dataset.
Note: The independent set to be used for the contest consists of rows.
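One way to respect these notes is to fix the split proportion and random seed once, fit a single model with its default settings, and report the score on the held-out part. The sketch below assumes scikit-learn, an 80/20 split, and `LinearRegression` as a stand-in model; none of these choices come from the assignment, and the synthetic arrays replace the prepared features from AssignmentIdata.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature matrix and target (invented data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 25.0 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.1, size=200)

# Assumed values: keep them identical between research and final training.
TEST_SIZE = 0.2
RANDOM_STATE = 42

# Rows are shuffled because the task is stated not to be a time-series problem.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE, shuffle=True
)

# A single model with default settings; no hyperparameter search is run.
model = LinearRegression()
model.fit(X_train, y_train)

r2 = r2_score(y_test, model.predict(X_test))
print(f"Hold-out R^2: {r2:.3f}")
```

The rows of sampletest.csv would only ever be passed to `model.predict`, never to `model.fit`.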
Objective:
To predict the next-day temperature according to the given conditions, using a machine learning model (regression problem). Note: It is NOT a time-series problem.
Dataset Description:
The dataset is composed of several next-day forecast variables, maximum and minimum temperatures of the present day, and geographic auxiliary variables collected for a period of years by the Korean Meteorological Service over Seoul, South Korea. The output variable is the next-day average temperature (NextDayAvTemp).
Datafiles description:
"AssignmentIdata": csv file rows headers included
"sampletest": csv file rows headers included
Dataset Variables:
Station: Used weather station number.
PresentTmax: Maximum air temperature between and h on the present day deg C
PresentTmin: Minimum air temperature between and h on the present day deg C
NextDayPredRHmin: Forecast of next-day minimum relative humidity
NextDayPred RHmax: Forecast of next-day maximum relative humidity
NextDayPred Tmaxlapse: Forecast of next-day maximum air temperature with lapse rate applied deg C
NextDayPred Tminlapse: Forecast of next-day minimum air temperature with lapse rate applied deg C
NextDayPred WS: Forecast of next-day average wind speed m/s
NextDayPred LH: Forecast of next-day average latent heat flux W/m²
NextDayPred CC: Forecast of next-day 1st hour split average cloud cover h
NextDayPred CC: Forecast of next-day 2nd hour split average cloud cover h
NextDayPred CC: Forecast of next-day 3rd hour split average cloud cover h
NextDayPred CC: Forecast of next-day 4th hour split average cloud cover h
NextDayPred PPT: Forecast of next-day 1st hour split average precipitation h
NextDayPred PPT: Forecast of next-day 2nd hour split average precipitation h
NextDayPred PPT: Forecast of next-day 3rd hour split average precipitation h
NextDayPred PPT: Forecast of next-day 4th hour split average precipitation h
Lat: Latitude deg
Lon: Longitude deg
DEM: Elevation m
Slope: Slope deg
Solar radiation: Daily incoming solar radiation Wh/m²
NextDayAvTemp: The next-day average air temperature deg C
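With variables like those listed above, a first exploratory pass could summarise the target on its own (univariate) and rank predictors by their correlation with it (bivariate). The sketch below is an assumption-laden illustration: the data are randomly generated rather than taken from the assignment files, only three of the listed predictors are mimicked, and plots are omitted in favour of numeric summaries.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data (invented); column names follow the variable list.
rng = np.random.default_rng(1)
n = 100
tmax = rng.normal(30, 3, n)
df = pd.DataFrame({
    "PresentTmax": tmax,
    "PresentTmin": tmax - rng.uniform(5, 9, n),
    "DEM": rng.uniform(10, 200, n),
})
df["NextDayAvTemp"] = (
    0.5 * (df["PresentTmax"] + df["PresentTmin"]) + rng.normal(0, 0.5, n)
)

# Univariate: count, mean, spread and quartiles of the target.
summary = df["NextDayAvTemp"].describe()
print(summary)

# Bivariate: Pearson correlation of each predictor with the target, strongest first.
corr_with_target = (
    df.corr()["NextDayAvTemp"].drop("NextDayAvTemp").sort_values(ascending=False)
)
print(corr_with_target)
```

Predictors that rank high here are natural candidates for the bivariate plots requested in the report, while those near zero may need a closer look before being dropped, since correlation only captures linear relationships.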
