Question:
Note that examples of problems for you to find and solve can be:
- Identify which suburb/location had the biggest growth in SalePrice by plotting and examining the sale prices across different suburbs;
- Analyse a possible pattern of SalePrice vs YrSold/MoSold, LotArea and/or some other variables which can reasonably be included;
- Use predictions from your final model to compare suburbs which have shown varying growth, or to identify which suburbs have been growing the most over the last few years.
UG students unit: Generate and address at least five problems.
G students unit: Generate and address at least seven problems, including the last problem listed above, which uses predictions from your final model. For example, find a way to compare the predictions (maybe the median?) between suburbs; these could be the top suburbs which have shown varying growth in your time series plots of growth over time.
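The first example problem (growth in SalePrice by suburb) could be approached roughly as sketched below. This is only an illustration under stated assumptions: the rows are made up, `Suburb` is a hypothetical column name (the brief does not name the location column), and growth is measured here as the relative change in median SalePrice between the first and last YrSold.

```python
import pandas as pd

# Made-up illustrative rows; "Suburb" is a hypothetical column name.
sales = pd.DataFrame({
    "Suburb":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "YrSold":    [2006, 2006, 2010, 2010, 2006, 2006, 2010, 2010],
    "SalePrice": [100_000, 120_000, 150_000, 180_000,
                  200_000, 220_000, 210_000, 252_000],
})

# Median SalePrice per suburb and year: one row per suburb, one column per year.
median_by_year = (
    sales.groupby(["Suburb", "YrSold"])["SalePrice"]
    .median()
    .unstack("YrSold")
)

# Relative growth between the first and last year in the data (an assumed definition).
first_year = median_by_year.columns.min()
last_year = median_by_year.columns.max()
growth = (median_by_year[last_year] / median_by_year[first_year] - 1).sort_values(ascending=False)

fastest_suburb = growth.index[0]
print(growth)
```

If matplotlib is installed, `median_by_year.T.plot()` should give one line per suburb over time, which is one way to produce the time series plots the brief mentions.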
Data preprocessing:
In this section you should:
- Preprocess your data, treat missing values, etc.
- Note at least one key observation, e.g. identified possible missing values or outliers for a particular area/suburb or year (e.g. one that is significantly higher), or perhaps one column that is missing more than a certain percentage of its values.
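A minimal sketch of a missing-value check along these lines is shown below. The data are made up, `PoolQC` is a hypothetical column, and the 50% threshold is an assumption chosen purely for illustration (the brief's own percentage is not given here). Median imputation is likewise just one possible treatment.

```python
import numpy as np
import pandas as pd

# Made-up illustrative data; "PoolQC" is a hypothetical column.
df = pd.DataFrame({
    "SalePrice": [100_000, 120_000, 150_000, 180_000],
    "LotArea":   [8_000, np.nan, 9_500, 10_000],
    "PoolQC":    [np.nan, np.nan, np.nan, "Gd"],
})

# Share of missing values per column, largest first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Assumed cut-off: drop columns missing more than half of their values.
threshold = 0.5
mostly_missing = missing_share[missing_share > threshold].index.tolist()
cleaned = df.drop(columns=mostly_missing)

# One simple treatment for the remaining gaps: fill with the column median.
cleaned["LotArea"] = cleaned["LotArea"].fillna(cleaned["LotArea"].median())
print(missing_share)
```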
EDA:
In this section you should:
- Include tasks such as determining which variables are significant, which observations may be outliers, etc., and other EDA goals.
- Find as much insight as possible to support your modelling decisions later on.
- Use data visualisation techniques taught in the unit to answer your chosen problems of interest.
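Two of the EDA tasks above (screening variables against the target and spotting possible outliers) might look something like the following sketch. The values are made up, and both the Pearson correlation and the 1.5 x IQR rule are assumed choices rather than methods prescribed by the brief.

```python
import pandas as pd

# Made-up illustrative data using column names mentioned in the brief.
df = pd.DataFrame({
    "LotArea":   [8_000, 8_500, 9_000, 9_500, 10_000, 10_500, 60_000],
    "MoSold":    [6, 1, 12, 3, 9, 2, 7],
    "SalePrice": [150_000, 158_000, 171_000, 178_000, 190_000, 199_000, 205_000],
})

# Pearson correlation of each predictor with the target.
corr_with_target = df.corr()["SalePrice"].drop("SalePrice")

# Flag possible LotArea outliers with the 1.5 * IQR rule (an assumed convention).
q1, q3 = df["LotArea"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["LotArea"] < q1 - 1.5 * iqr) | (df["LotArea"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(corr_with_target)
print(outliers)
```

A correlation only captures linear association, so a scatter plot of each predictor against SalePrice is still worth examining before discarding a variable.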
Further preprocessing:
In this section you should:
- Select the final variables for your model based on your EDA (basically, remove the non-significant variables).
- Create any new variables which you think may help, based on your EDA in this section.
- Justify your decisions and provide EDA evidence as to how a variable is insignificant, e.g. no observable relationship to the target variable in a scatter plot.
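As one hedged illustration of creating new variables and dropping an unhelpful one: the sketch below combines YrSold and MoSold into a single date, adds a log-transformed target, and removes a hypothetical `Id` column. The data, the `Id` column and the names `SaleDate` and `LogSalePrice` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Made-up illustrative data; "Id" is a hypothetical column with no relationship to price.
df = pd.DataFrame({
    "YrSold":    [2008, 2009, 2010],
    "MoSold":    [5, 11, 2],
    "LotArea":   [8_000, 9_500, 10_000],
    "SalePrice": [150_000, 180_000, 210_000],
    "Id":        [1, 2, 3],
})

# New variable: a single sale date (first of the month) built from YrSold and MoSold.
date_parts = (
    df[["YrSold", "MoSold"]]
    .rename(columns={"YrSold": "year", "MoSold": "month"})
    .assign(day=1)
)
df["SaleDate"] = pd.to_datetime(date_parts)

# New variable: log of the target, often closer to normal than raw prices.
df["LogSalePrice"] = np.log(df["SalePrice"])

# Drop a variable judged uninformative in EDA (hypothetical choice here).
model_df = df.drop(columns=["Id"])
print(model_df.head())
```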
Modelling:
In this section you should:
- Fit and evaluate a linear model to describe the relationship between your target variable and a number of selected significant predictors.
- Use your model to predict the prices of properties described by your test dataset.
Alternatively, you may use another, more advanced model of your choice. If you do use a linear model, remember what it favours, such as an approximately normal distribution in the target variable.
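A rough sketch of fitting a linear model on a log-transformed target and predicting on held-out rows is given below. Everything here is an assumption for illustration: the data are synthetic, the predictors (lot area in thousands and years since 2006) are arbitrary, and plain least squares via NumPy stands in for whichever modelling library the unit actually uses.

```python
import numpy as np

# Synthetic data (assumption): log(SalePrice) is linear in the predictors plus noise.
rng = np.random.default_rng(0)
n = 200
lot_area_k = rng.uniform(5, 15, n)            # lot area in thousands of square feet
yr_sold = rng.integers(2006, 2011, n)         # sale years 2006..2010
log_price = 11.0 + 0.05 * lot_area_k + 0.02 * (yr_sold - 2006) + rng.normal(0, 0.05, n)
sale_price = np.exp(log_price)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), lot_area_k, yr_sold - 2006])
train, test = slice(0, 150), slice(150, n)

# Ordinary least squares on the log of the target.
coef, *_ = np.linalg.lstsq(X[train], np.log(sale_price[train]), rcond=None)

# Predict on the test rows and transform back to the price scale.
pred_price = np.exp(X[test] @ coef)
rmse = float(np.sqrt(np.mean((sale_price[test] - pred_price) ** 2)))
print(coef, rmse)
```

Fitting on the log scale and exponentiating the predictions is one common way to respect a linear model's preference for a roughly symmetric target; the RMSE is then computed back on the original price scale.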
Evaluation:
You should:
- Evaluate your model against the metric RMSE, given the actual values in the test dataset.
- Plot the residuals, similar to the plot shown in the Week slides. Pick a suitable cut-off value for the red dots.
The data science methodology is an iterative process. Try to minimise your RMSE: always go back and think about what improvements can be made, then fit another model and find your second RMSE, and so on, noting what works and what does not. Compare at least two different models you considered, noting their differences.
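For reference, RMSE over $n$ test observations with actual values $y_i$ and predictions $\hat{y}_i$ is $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$. The sketch below computes it for two sets of predictions and flags large residuals. The numbers are made up, `model_a` and `model_b` are hypothetical prediction arrays, and the cut-off of 15,000 is an assumed value, not one taken from the unit slides.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up test-set prices and predictions from two hypothetical models.
actual = np.array([200_000, 150_000, 320_000, 180_000, 250_000])
model_a = np.array([210_000, 140_000, 300_000, 185_000, 245_000])
model_b = np.array([220_000, 220_000, 220_000, 220_000, 220_000])  # constant baseline

rmse_a = rmse(actual, model_a)
rmse_b = rmse(actual, model_b)

# Residuals for model A, flagged where they exceed an assumed cut-off.
residuals = actual - model_a
cutoff = 15_000
flagged = np.abs(residuals) > cutoff

print(rmse_a, rmse_b, flagged)
```

In a residual plot, the points where `flagged` is true would be the ones drawn in red, for example by plotting predictions against residuals with matplotlib and colouring by `flagged`.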
Recommendations and final conclusions:
You should:
- Summarise your findings and provide your solutions to your problems of interest. Note anything you found particularly interesting and useful to your project.
- State the best RMSE you obtained and why/how, i.e. what variables you used, any applied transformations, etc.
- State any improvements you could make and why/how you could achieve such improvements in future work.
References:
You should:
- Include a reference list and cite your references via in-text referencing or footnotes.