Question:

Deliverables
In your zipped deliverable folder please submit only three files:
Your report (firstname_report.docx). Please use Word; do not submit a PDF
document or marks will be deducted.
Your modelling code for your best model which runs from start to finish without
error. (firstname_test_train.py)
Your production code (firstname_production.py). The production code must:
Prepare the data, use your pre-trained top model to make predictions
and output the predictions to a csv file.
Contain all data preparation code.
Not delete any data.
Not contain any model training or testing code.
Run without error.
Load data from a file that uses the exact format as the
asgn1_houseprices_mystery.csv file. The number of rows may vary.
Output your predictions in exactly the same format as shown in the
sample asgn1_houseprices_predictions.csv file.
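The production requirements above can be sketched roughly as follows. This is only an illustration under stated assumptions: the model file name, the output file name, the feature names, the fill values, and the prediction column name are all hypothetical placeholders, and the real column layout must be copied from the sample predictions file.

```python
import pickle

import pandas as pd

# Every name below except INPUT_CSV is a hypothetical placeholder.
INPUT_CSV = "asgn1_houseprices_mystery.csv"       # file name from the brief
OUTPUT_CSV = "firstname_predictions.csv"          # assumed output name
MODEL_FILE = "firstname_best_model.pkl"           # assumed pickle of the trained model
FILL_VALUES = {"feature_a": 0.0, "feature_b": 0.0}  # assumed values saved at training time
FEATURES = list(FILL_VALUES)


def prepare(df):
    """Repeat the training-time preparation without dropping any rows."""
    X = df[FEATURES].copy()
    # Impute rather than delete so the output keeps one row per input row.
    return X.fillna(FILL_VALUES)


def predict_to_frame(df, model):
    """Return one prediction per input row; no training or testing happens here."""
    X = prepare(df)
    # "price_prediction" is an assumed column name; match the sample file instead.
    return pd.DataFrame({"price_prediction": model.predict(X)}, index=df.index)


def main():
    df = pd.read_csv(INPUT_CSV)
    with open(MODEL_FILE, "rb") as f:
        model = pickle.load(f)  # pre-trained model, saved by the modelling script
    predict_to_frame(df, model).to_csv(OUTPUT_CSV, index=False)
```

Calling `main()` at the bottom of the script would run the whole pipeline. Keeping the preparation in its own function makes it easier to check that the same steps are applied in the modelling code and in production.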
(Please see the hints for this assignment in the assignment folder for an example of
production code.)
Report
Please limit your report to between 8 and 12 pages. Be creative about what you include
and how you fit it in. Your employer is not familiar with the attributes in the dataset, so at
a minimum report on the numbers; you do not have to be an expert in the
domain to be a good analyst. Initiative, good judgment and professional delivery are
appreciated by your employer. Aim for efficient reader-friendliness with comparison tables and
uncluttered visualizations where appropriate rather than large dumps of data in the report.
Report Introduction (2 marks)
Describe the problem that you are attempting to solve. At the start of the report,
mention the best features that you found.
Exploratory Data Analysis (23 marks)
For the EDA section please assume you are presenting an overview of the data to a
not-very-technical group of managers. Please use good sense when assembling the EDA. Be
mindful of your time and also of the time for the group you are serving. In your report,
please be sure to focus on the variables that make a difference in your model right
away. Do not spend too much time discussing variables that are not relevant.
Prepare a summary of the data.
Show the correlations between the target and, at a minimum, every predictor variable
in the final best model.
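One way to produce the correlation summary is sketched below. It assumes the data is already in a pandas DataFrame with numeric columns; the function name and the column names in the usage note are hypothetical.

```python
import pandas as pd


def target_correlations(df, target, predictors):
    """Pearson correlation of each predictor with the target, strongest first."""
    corr = df[predictors].corrwith(df[target])
    # Order by absolute strength so strong negative relationships are not buried.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

For example, `target_correlations(df, "price", ["rooms", "age"])` would return a short Series that can be pasted into the report as a table, where `price`, `rooms` and `age` stand in for the real column names.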
Highlight features of interest and how they might impact the predictions positively or
negatively.
You may show scatter plots, histograms, or other plots for relevant variables where
appropriate.
Create one or more visual summaries that categorize the target range into three or four
groups. For example: "Group A has xx traits and high numbers of yy. Group B has ww
traits and moderate amounts of zz." Your employer is not really sure what is needed here
but has asked you to figure out how to present this information in an easy-to-read
format. Please see the hints folder for suggestions.
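The grouping described above could be approached along these lines. This is a sketch that assumes equal-sized quantile bands are acceptable; the function name and the lettered group labels are illustrative choices, not something the brief prescribes.

```python
import pandas as pd


def summarise_by_target_band(df, target, traits, n_groups=3):
    """Split the target into quantile bands and average each trait per band."""
    labels = [f"Group {chr(65 + i)}" for i in range(n_groups)]  # Group A = lowest band
    bands = pd.qcut(df[target], q=n_groups, labels=labels)
    return df.groupby(bands, observed=True)[traits].mean()
```

The resulting table has one row per group and one column per trait, which can then be drawn as a bar chart or heat map for the visual summary.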
Development (15 marks)
You are to make at least three models:
One of the models must include binned and dummy variables. Make a reasonable effort
to find binned and dummy variables that boost performance (though they may not).
Please be creative here.
Remember categorical variables are good candidates for dummy variables and possibly
binning.
One model must include outlier treatment of some kind. Please be creative here.
One model must not include binned and dummy variables.
Experimentation with many different variable combinations is encouraged and
necessary to discover the top performing models.
Use k-fold cross-validation with truly random data (in other words, remove random_state
from your code).
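As a rough illustration of the binned, dummy and outlier-treated variables mentioned above, the helpers below use pandas. The function names, bin edges and labels are hypothetical, and clipping at 1.5 times the interquartile range is only one of many possible outlier treatments.

```python
import pandas as pd


def add_dummies_and_bins(df, cat_col, num_col, bin_edges, bin_labels):
    """Bin one numeric column, then one-hot encode it and one categorical column."""
    out = df.copy()
    bin_col = num_col + "_bin"
    out[bin_col] = pd.cut(out[num_col], bins=bin_edges, labels=bin_labels)
    # drop_first avoids a redundant column per variable (the dummy variable trap).
    return pd.get_dummies(out, columns=[cat_col, bin_col], drop_first=True, dtype=int)


def clip_outliers(series, k=1.5):
    """Pull extreme values back to the IQR fences instead of deleting rows."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
```

Clipping keeps every row, which also makes the same treatment usable in production code that is not allowed to delete data.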
Model Evaluation (15 marks)
In your report:
Compare the results of your models in a table with any scores that you find helpful.
The table must list all features included in each model alongside the average scores
for RMSE, RMSE SD, R2, adjusted R2, AIC and BIC. Referring readers to other parts of
the document to look up the features will lose marks.
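The scores listed above can be averaged over unseeded folds roughly as follows. This sketch makes several assumptions: the model is ordinary least squares fitted with NumPy, RMSE is measured on the held-out fold, and R2, adjusted R2, AIC and BIC are taken from the training fit using the Gaussian log-likelihood. Other conventions exist, so check which one your course expects. The function names are illustrative.

```python
import numpy as np


def train_fit_scores(y, y_hat, n_params):
    """R2, adjusted R2, AIC and BIC for a fit; n_params includes the intercept."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = len(y)
    ssr = float(np.sum((y - y_hat) ** 2))
    sst = float(np.sum((y - np.mean(y)) ** 2))
    r2 = 1.0 - ssr / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(ssr / n) + 1.0)
    return {"r2": r2, "adj_r2": adj_r2,
            "aic": 2 * n_params - 2 * log_lik,
            "bic": n_params * np.log(n) - 2 * log_lik}


def cross_validate(X, y, k=5):
    """Average the scores over k folds drawn without a fixed seed."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # prepend the intercept column
    order = np.random.permutation(len(y))       # no seed: folds differ on every run
    rows = []
    for test_idx in np.array_split(order, k):
        train_idx = np.setdiff1d(order, test_idx)
        beta, *_ = np.linalg.lstsq(A[train_idx], y[train_idx], rcond=None)
        scores = train_fit_scores(y[train_idx], A[train_idx] @ beta, A.shape[1])
        residuals = y[test_idx] - A[test_idx] @ beta
        scores["rmse"] = float(np.sqrt(np.mean(residuals ** 2)))
        rows.append(scores)
    summary = {key: float(np.mean([r[key] for r in rows])) for key in rows[0]}
    summary["rmse_sd"] = float(np.std([r["rmse"] for r in rows]))
    return summary
```

Because the permutation is unseeded, the averages shift slightly between runs; scikit-learn's `KFold(shuffle=True)` with no `random_state` behaves the same way. Each model's summary becomes one row of the comparison table, next to its feature list.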
For your selected model, show a plot of the predicted versus actual results and a plot of
the residual error versus actual results.
Please make it easy for your readers to understand what you did.
Model Selection (5 marks)
Identify your preferred model and explain why you chose it.
Model Interpretation (4 marks)
In your report:
Write out the equation for your best model.
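If the best model is linear, its equation can be assembled from the fitted intercept and coefficients. The helper below is a small sketch; the function name, the two-decimal rounding and the example feature names are assumptions for illustration only.

```python
def format_equation(target, intercept, coefs):
    """Build a readable linear equation from an intercept and a name-to-coefficient dict."""
    terms = [f"{intercept:.2f}"]
    for name, value in coefs.items():
        sign = "+" if value >= 0 else "-"
        terms.append(f"{sign} {abs(value):.2f}*{name}")
    return f"{target} = " + " ".join(terms)
```

For instance, `format_equation("price", 10.5, {"rooms": 2.0, "age": -0.25})` gives `price = 10.50 + 2.00*rooms - 0.25*age`.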