Question:

Introduction
In this assignment you will build the best model possible to predict SalePrice.
Late Submissions
Late submissions are permitted but 10% will be deducted from the assignment for each day
that the assignment is late. As always, sharing your code with others and copying code is not
permitted.
Deliverables
In your zipped deliverable folder please submit only three files:
1. Your report (firstname_report.docx). Please use Word; do not submit a PDF document or
marks will be deducted.
2. Your modelling code for your best model which runs from start to finish without
error. (firstname_test_train.py)
3. Your production code (firstname_production.py). The production code must:
o Prepare the data, use your pre-trained top model to make predictions,
and output the predictions to a CSV file.
o Contain all data preparation code.
o Not delete any data.
o Not contain any model training or testing code.
o Run without error.
o Load data from a file that uses exactly the same format as the
asgn1_houseprices_mystery.csv file. The number of rows may vary.
o Output your predictions in exactly the same format as the
sample asgn1_houseprices_predictions.csv file.
(Please see the hints for this assignment in the assignment folder for an example of
production code.)
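The production requirements above can be sketched roughly as follows. This is only an illustration, not the example from the hints folder: the predictor names, the imputation values, the model file name and the single SalePrice output column are all assumptions, since the brief does not show the columns of either CSV file.

```python
import pickle

import numpy as np
import pandas as pd

# Placeholder names: replace with the columns and files your own model uses.
PREDICTORS = ["area", "quality"]                   # hypothetical predictor columns
IMPUTE_VALUES = {"area": 1500.0, "quality": 5.0}   # hypothetical values saved at training time
MODEL_PATH = "best_model.pkl"                      # hypothetical pickled model file


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same preparation used in training, without dropping rows."""
    X = df[PREDICTORS].copy()
    # Impute instead of deleting rows, so every input row gets a prediction.
    return X.fillna(value=IMPUTE_VALUES)


def predict_frame(df: pd.DataFrame, model) -> pd.DataFrame:
    """Return a one-column frame of predictions, one row per input row."""
    predictions = np.asarray(model.predict(prepare(df))).ravel()
    # Assumed output layout: a single SalePrice column.
    return pd.DataFrame({"SalePrice": predictions}, index=df.index)


def main(in_path: str, out_path: str) -> None:
    """Load the pre-trained model, score the input file, write the CSV."""
    with open(MODEL_PATH, "rb") as fh:
        model = pickle.load(fh)  # trained elsewhere; no fitting happens here
    predict_frame(pd.read_csv(in_path), model).to_csv(out_path, index=False)
```

A script built this way would end with a call such as main("asgn1_houseprices_mystery.csv", "asgn1_houseprices_predictions.csv"). Keeping prepare separate from main makes it easier to confirm that the production code applies the same preparation as the training code.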
Marks will be deducted if deliverables are not submitted in the format requested.
Competition
The person in the class who gets the lowest RMSE when I test their score code without
error will get 10% added to their assignment mark.
The next lowest RMSE (without error) will get a 5% bonus added to their assignment
mark.
All participants who make an honest effort earn good karma.
Disqualification from the competition will result if too many insignificant variables exist in the
score code. Excessive overfitting is not allowed. Late submissions are not eligible for the bonus.
You must use linear regression. You cannot use alternative algorithms.
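For reference, RMSE is the square root of the mean squared difference between actual and predicted values. The sketch below computes it for an ordinary least squares fit on a few invented numbers; the data and the rmse helper are illustrative assumptions, not part of the assignment files.

```python
import numpy as np


def rmse(actual, predicted) -> float:
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Toy data, invented for illustration only.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([110.0, 205.0, 290.0, 405.0])

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print(rmse(y, design @ coef))
```

Because RMSE is expressed in the units of the target, a lower value means predictions sit closer to the actual sale prices on average.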
Report
Please limit your report to between 8 and 12 pages. Be creative about what content you include
and how you fit it in. Your employer is not familiar with the attributes in the dataset, so at
least report on the numbers; you do not have to be an expert in the domain to be a good
analyst. Initiative, good self-judgment and professional delivery are
appreciated by your employer. Aim for efficient reader-friendliness with comparison tables and
uncluttered visualizations where appropriate rather than large dumps of data in the report.
Not Evident: 0 points
Little to none: 6 points
Needs Improvement: 14 points
About right; as expected: 19 points
Above average: 23 points
Extraordinary - Well above the expected norm: 25 points
Report Introduction (2 marks)
o Describe the problem that you are attempting to solve. Mention, at the start of the
report, the best features that were found.
Exploratory Data Analysis (23 marks)
o For the EDA section please assume you are presenting an overview of the data to a not-
very-technical group of managers. Please use good sense when assembling the EDA. Be
mindful of your time and also of the time for the group you are serving. In your report,
please be sure to focus on the variables that make a difference in your model right
away. Do not spend too much time discussing variables that are not relevant.
o Prepare a summary of the data.
o Show the correlations between the target and, at a minimum, every predictor variable
in the final best model.
o Highlight features of interest and how they might impact the predictions positively or
negatively.
o You may show scatter plots, histograms, or other plots for relevant variables where
appropriate.
o Create one or more visual summaries that categorize the target range into three or four
groups. For example: Group A has xx traits and high numbers of yy. Group B has ww
traits and moderate amounts of zz. Your employer is not really sure what is needed here
but has asked you to figure out how to present this information in an easy-to-read
format. Please see the hints folder for suggestions.
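One possible way to approach the correlation and target-group items above is sketched here with pandas. The data frame is invented and the area and age columns are hypothetical; only SalePrice is named in the brief. Quartile groups are just one choice for splitting the target range.

```python
import pandas as pd

# Toy frame, invented for illustration; area and age are hypothetical columns.
df = pd.DataFrame({
    "SalePrice": [100, 150, 200, 250, 300, 350, 400, 450],
    "area": [800, 950, 1200, 1300, 1700, 1600, 2100, 2400],
    "age": [60, 55, 40, 42, 20, 25, 10, 5],
})

# Correlation of each predictor with the target, strongest (by magnitude) first.
correlations = (
    df[["area", "age"]]
    .corrwith(df["SalePrice"])
    .sort_values(key=abs, ascending=False)
)

# Split the target range into four equal-sized groups and profile each one.
df["PriceGroup"] = pd.qcut(df["SalePrice"], q=4, labels=["A", "B", "C", "D"])
group_profile = df.groupby("PriceGroup", observed=True)[["area", "age"]].mean()

print(correlations)
print(group_profile)
```

The group_profile table has one row per price group and one column per predictor, which could be turned into a small comparison table or a grouped bar chart for a non-technical audience.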
Development (15 marks)
Not Evident: 0 points
Little to none: 3 points
Needs Improvement: 7 points
About right; as expected: 11 points
Above average: 13.5 points
Extraordinary - Well above the expected norm: 15 points
You are to make at least three models:
o One of the models must include binned and dummy variables. Make a reasonable effort
to find binned and dummy variables that boost performance (though they may not).
Please be creative here.
o Remember categorical variables are good candidates for dummy variables and possibly
binning.
o One model must include outlier treatment of some kind. Please be creative here.
o One model must not include binned and dummy variables.
o Experimentation with many different variable combinations is encouraged and
necessary to discover the top performing models.
o Use k-fold cross-validation with truly random data (in other words, remove random_state
from your code).
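The techniques named in the list above (outlier treatment, binning, dummy variables, and cross-validation without a fixed random_state) might look something like the sketch below. It uses synthetic data and hypothetical column names, and it combines everything into one model only for brevity; the brief asks for these in separate models, including one with no binned or dummy variables. Clipping at 1.5 times the IQR is an assumed choice of outlier treatment, not one the brief prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic data, invented for illustration; all columns except SalePrice are hypothetical.
rng = np.random.default_rng()
n = 200
df = pd.DataFrame({
    "area": rng.normal(1500, 400, n),
    "age": rng.integers(0, 80, n),
    "zone": rng.choice(["north", "south", "east"], n),
})
df["SalePrice"] = 100 * df["area"] - 500 * df["age"] + rng.normal(0, 10000, n)

# Outlier treatment: clip area to the 1.5 * IQR fences (one option among many).
q1, q3 = df["area"].quantile([0.25, 0.75])
iqr = q3 - q1
df["area_clipped"] = df["area"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Binning: turn age into bands, then dummy-encode the bands and the zone column.
df["age_band"] = pd.cut(df["age"], bins=[-1, 20, 50, 100], labels=["new", "mid", "old"])
X = pd.get_dummies(df[["area_clipped", "age_band", "zone"]], drop_first=True, dtype=float)
y = df["SalePrice"]

# K-fold cross-validation; shuffle=True with no random_state gives a new split on every run.
fold_rmse = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True).split(X):
    model = LinearRegression().fit(X.iloc[train_idx], y.iloc[train_idx])
    residuals = y.iloc[test_idx] - model.predict(X.iloc[test_idx])
    fold_rmse.append(float(np.sqrt(np.mean(residuals ** 2))))

print(f"mean RMSE: {np.mean(fold_rmse):.0f}, sd: {np.std(fold_rmse):.0f}")
```

Because the folds are reshuffled on every run, the reported RMSE will vary slightly between runs; comparing the mean and standard deviation across folds gives a fairer picture of each candidate model than a single split.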
Model Eval
Incl
