Question:
Introduction
In this assignment you will build the best model possible to predict SalePrice.
Late Submissions
Late submissions are permitted, but 10% will be deducted from the assignment for each day
that the assignment is late. As always, sharing your code with others and copying code is not
permitted.
Deliverables
In your zipped deliverable folder, please submit only three files:
o Your report: firstnamereport.docx. Please use Word; do not submit a PDF document or
marks will be deducted.
o Your modelling code for your best model, which runs from start to finish without
error: firstnametesttrain.py
o Your production code: firstnameproduction.py. The production code must:
   o Prepare the data, use your pretrained top model to make predictions,
   and output the predictions to a CSV file.
   o Contain all data preparation code.
   o Not delete any data.
   o Not contain any model training or testing code.
   o Run without error.
   o Load data from a file that uses the exact format of the
   asgnhousepricesmystery.csv file. The number of rows may vary.
   o Output your predictions in the exact same format as shown in the
   sample asgnhousepricespredictions.csv file.
Please see the hints for this assignment in the assignment folder for an example of
production code.
Marks will be deducted if deliverables are not submitted in the format requested.
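As a rough illustration of the production-code requirements above, here is a minimal sketch. It is not the example from the hints folder. The feature names (GrLivArea, OverallQual), the fill values, and the single SalePrice output column are all assumptions made for illustration; the real ones come from your own training script and from the sample predictions file.

```python
import pickle

import pandas as pd

# Hypothetical names and values: replace with the features and the
# training-set statistics produced by your own training script.
FEATURES = ["GrLivArea", "OverallQual"]
FILL_VALUES = {"GrLivArea": 1500.0, "OverallQual": 6.0}


def prepare_data(df):
    """Return the model inputs; missing values are filled, no rows are dropped."""
    X = df[FEATURES].copy()
    return X.fillna(value=FILL_VALUES)


def load_model(path):
    """Load a model that was trained and pickled by the separate training script."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def predict_to_csv(model, input_csv, output_csv):
    """Read the input file, predict SalePrice, and write one row per input row."""
    df = pd.read_csv(input_csv)
    predictions = model.predict(prepare_data(df))
    # Assumed output layout: a single SalePrice column, no index column.
    pd.DataFrame({"SalePrice": predictions}).to_csv(output_csv, index=False)
    return len(predictions)
```

In a real script, load_model would be given the path of the pickled model saved by the training code, and predict_to_csv the mystery file and the output file name. Filling missing values rather than dropping rows is one way to satisfy the "not delete any data" requirement.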
Competition
The person in the class who gets the lowest RMSE when I test their score code without
error will get a bonus added to their assignment mark.
The next lowest RMSE without error will get a bonus added to their assignment
mark.
All participants who make an honest effort earn good karma.
Disqualification from the competition will result if too many insignificant variables exist in the
score code. Excessive overfitting is not allowed. Late submissions are not eligible for the bonus.
You must use linear regression. You cannot use alternative algorithms.
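For reference, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values. A minimal sketch in plain Python follows; the function and argument names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, rmse([100, 200], [110, 190]) returns 10.0, because both predictions are off by 10.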
Report
Please keep your report within the specified page range. Be creative here about how you fit
the content in it and what you fit in. Your employer is not actually familiar with the attributes in
the dataset, so at least report on the numbers; you do not have to be an expert in the
domain to be a good analyst. Initiative, good self-judgment and professional delivery are
appreciated by your employer. Aim for efficient reader-friendliness with comparison tables and
uncluttered visualizations where appropriate rather than large dumps of data in the report.
Marking scale (each level has its own points value): Not Evident; Little to none; Needs Improvement; About right (as expected); Above average; Extraordinary (well above the expected norm).
Report Introduction marks
o Describe the problem that you are attempting to solve. Mention the best features that
were found at the start of the report.
Exploratory Data Analysis marks
o For the EDA section please assume you are presenting an overview of the data to a
not-very-technical group of managers. Please use good sense when assembling the EDA. Be
mindful of your time and also of the time for the group you are serving. In your report,
please be sure to focus on the variables that make a difference in your model right
away. Do not spend too much time discussing variables that are not relevant.
o Prepare a summary of the data.
o Show the correlations between the target and at least all predictor variables which are
in the final best model.
o Highlight features of interest and how they might impact the predictions positively or
negatively.
o You may show scatter plots, histograms, or other plots for relevant variables where
appropriate.
o Create one or more visual summaries which categorize the target range into three or four
groups. For example: Group A has xx traits and high numbers of yy; Group B has ww
traits and moderate amounts of zz. Your employer is not really sure what is needed here
but has asked you to figure out how to present this information in an easy-to-read
format. Please see the hints folder for suggestions.
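Two of the EDA items above, the correlations with the target and the grouping of the target range, can be sketched with pandas as follows. This is only one possible approach, not the one from the hints folder. The helper names and the Low/Mid/High labels are hypothetical, and equal-sized quantile groups are an assumption; fixed price cut-offs would work just as well.

```python
import pandas as pd


def target_correlations(df, target, predictors):
    """Pearson correlation of each predictor with the target, strongest first."""
    corr = df[list(predictors) + [target]].corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


def summarize_price_groups(df, target, predictors, labels=("Low", "Mid", "High")):
    """Split the target into equal-sized quantile groups and average each predictor."""
    groups = pd.qcut(df[target], q=len(labels), labels=list(labels))
    return df.groupby(groups, observed=True)[list(predictors)].mean()
```

The table returned by summarize_price_groups has one row per price group and one column per predictor, which can go straight into a comparison table or a grouped bar chart in the report.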
Development marks
You are to make at least three models:
o One of the models must include binned and dummy variables. Make a reasonable effort
to find binned and dummy values which ideally boost performance (it may not, though).
Please be creative here.
o Remember categorical variables are good candidates for dummy variables and possibly
binning.
o One model must include outlier treatment of some kind. Please be creative here.
o One model must not include binned and dummy variables.
o Experimentation with many different variable combinations is encouraged and
necessary to discover the top performing models.
o Use cross-fold validation with truly random data; in other words, remove random_state
from your code.
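The three techniques named in the list above (binned and dummy variables, outlier treatment, and unseeded cross-fold validation) might look roughly like this. It is a sketch under stated assumptions: the helper names are hypothetical, capping at 1.5 times the interquartile range is just one of many outlier treatments, and the linear regression is fitted by ordinary least squares with NumPy to keep the example self-contained. With scikit-learn, KFold with shuffle=True and no random_state would play the same role as the shuffling here.

```python
import numpy as np
import pandas as pd


def add_binned_dummies(df, column, bins, labels):
    """Bin a numeric column, then one-hot encode the bins (first level dropped)."""
    binned = pd.cut(df[column], bins=bins, labels=labels)
    dummies = pd.get_dummies(binned, prefix=column, drop_first=True, dtype=int)
    return pd.concat([df, dummies], axis=1)


def clip_outliers(series, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] instead of deleting rows."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def cross_validated_rmse(X, y, n_splits=5):
    """Average RMSE of ordinary least squares over shuffled folds (no fixed seed)."""
    n_rows = len(X)
    # Prepend a column of ones so the fit includes an intercept.
    X = np.column_stack([np.ones(n_rows), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    # Unseeded generator: the fold assignment differs on every run.
    order = np.random.default_rng().permutation(n_rows)
    scores = []
    for test_idx in np.array_split(order, n_splits):
        train_idx = np.setdiff1d(order, test_idx)
        coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        residuals = y[test_idx] - X[test_idx] @ coef
        scores.append(np.sqrt(np.mean(residuals ** 2)))
    return float(np.mean(scores))
```

Because the folds are not seeded, the reported RMSE changes slightly from run to run, so comparing models on the average of several runs gives a steadier picture than a single run.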
Model Eval
Incl
