
Individual Assignment 2: Predictive Analysis
1.0 Introduction
An Asian automobile company aspires to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts.
They have contracted an automobile consulting company to understand the factors on which the pricing of cars depends. Specifically, they want to understand the factors affecting the pricing of cars in the American market, since those may be very different from the Asian market. The company wants to know:
- Which variables are significant in predicting the price of a car
- How well those variables describe the price of a car
Based on various market surveys, the consulting firm has gathered a large dataset of different types of cars across the American market.
Business Goal
You are required to model the price of cars with the available independent variables. Management will use the model to understand exactly how prices vary with the independent variables, so that they can adjust the design of the cars, the business strategy, etc. to meet certain price levels. The model will also help management understand the pricing dynamics of a new market.
2.0 Dataset
The dataset contains information on 205 cars in the American market. The data dictionary below gives a short description and the data type of each feature.
Data Dictionary
Car_ID: Unique id of each observation (Integer)
Car_Company: Name of car company (Categorical)
Car_body: Body of car (Categorical)
Wheel_base: Wheelbase of car (Numeric)
Car_length: Length of car (Numeric)
Car_width: Width of car (Numeric)
Car_height: Height of car (Numeric)
Car_weight: The weight of a car without occupants or baggage. (Numeric)
Cylinder_number: Number of cylinders in the engine (Categorical)
Engine_size: Size of the engine (Numeric)
Bore_ratio: Bore ratio of car (Numeric)
Horse_power: Horsepower (Numeric)
City_mpg: Mileage in city (Numeric)
Highway_mpg: Mileage on highway (Numeric)
price (Dependent variable): Price of car (Numeric)
3.0 Required Analysis
Using descriptive analytics and regression tools, complete the following steps and report on them.
1. Detect and handle missing values (describe your strategy in the report).
2. Fix invalid values: there appear to be some spelling errors in the carcompany column:
(maxda, mazda); (Nissan, nissan); (porsche, porcshce); (toyota, toyouta); (vokswagen, volkswagen, vw). Fix the misspellings! (Provide the formula or part of the results.)
Hint: You may need to use the IF function to automate the misspelling replacement.
3. Using descriptive analytics, complete the following table:
Statistic | wheelbase | carlength | curbweight | enginesize | price
count
mean
standard deviation
min
25th percentile
50th percentile
75th percentile
max
4. Create two new features (share your approach and part of the results):
a) fueleconomy: defined as 0.55 * citympg + 0.45 * highwaympg
b) carsrange: defined according to the following table:
price | carsrange
0-15000 | Budget
15000-25000 | Medium
>25000 | Highend
5. What is the average engine size for each car range? (Provide the results in a table)
6. Encode the categorical columns: cylindernumber, carbody. Explain the type of encoding you applied for each of them.
7. Exclude the car_ID and carcompany columns and fit a linear regression model to the rest of the dataset, with price as the target variable and the remaining columns as features (independent variables). Discuss the results in detail.
Important: make sure you apply the regression to the dataset that you modified in previous steps.
8. What is the regression function?
9. The company wants to predict the price of one of its products, which has the following specs:
car_body Hatchback
wheel_base 94
car_length 155.3
car_width 64.2
car_height 51.1
car_weight 2211
cylinder_number 4
engine_size 100
bore_ratio 3.05
horse_power 104
city_mpg 21
highway_mpg 26
Price ?
What would be a reasonable price based on the predictive model you obtained?
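Steps 1 and 2 can be sketched in Python under a few stated assumptions. The hint points to a nested Excel IF; a dictionary lookup does the same job. The lookup name `CANONICAL_COMPANY`, the helper names, the choice of lowercase canonical spellings, and median imputation for numeric gaps are all illustrative choices, not the assignment's prescribed method, and the rows at the bottom are made up.

```python
import statistics

# Hypothetical lookup built from the misspelling pairs listed in step 2.
# Lowercase canonical names are an assumption (most listed pairs are lowercase).
CANONICAL_COMPANY = {
    "maxda": "mazda",
    "Nissan": "nissan",
    "porcshce": "porsche",
    "toyouta": "toyota",
    "vokswagen": "volkswagen",
    "vw": "volkswagen",
}


def fix_company(name: str) -> str:
    """Return the corrected company name; names not in the lookup pass through."""
    cleaned = name.strip()
    return CANONICAL_COMPANY.get(cleaned, cleaned)


def count_missing(rows, columns):
    """Count cells that are None or blank strings, per column, in a list of dict rows."""
    return {
        c: sum(1 for r in rows if r.get(c) is None or str(r.get(c)).strip() == "")
        for c in columns
    }


def impute_median(values):
    """Fill missing (None) entries of a numeric column with the median of the rest."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


# Made-up rows, for illustration only.
rows = [
    {"carcompany": "maxda", "enginesize": 122},
    {"carcompany": "vw", "enginesize": None},
    {"carcompany": "toyota", "enginesize": 98},
]
print(count_missing(rows, ["carcompany", "enginesize"]))  # {'carcompany': 0, 'enginesize': 1}
print([fix_company(r["carcompany"]) for r in rows])       # ['mazda', 'volkswagen', 'toyota']
print(impute_median([r["enginesize"] for r in rows]))     # [122, 110.0, 98]
```

Median imputation is only one option; dropping incomplete rows is also defensible when few values are missing, and the report should state whichever strategy is used.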
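The statistics in the step 3 table can be computed per column with Python's standard library. This sketch assumes the sample standard deviation (n - 1 denominator) and linearly interpolated quartiles, i.e. `statistics.quantiles(..., method="inclusive")`, which follows the same convention as Excel's QUARTILE.INC. The function name `describe` is hypothetical.

```python
import statistics


def describe(values):
    """Summary statistics matching the rows of the step 3 table.

    std is the sample standard deviation (n - 1 denominator); quartiles use
    linear interpolation between order statistics (method="inclusive").
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }
```

Calling `describe` once for each of wheelbase, carlength, curbweight, enginesize, and price fills one column of the table at a time.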
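Steps 4 and 5 amount to two row-wise formulas and a grouped mean. In this sketch the helper names are hypothetical, and the handling of a price of exactly 15000 is an assumption: the carsrange table lists 15000 in both Budget and Medium, so it is placed in Medium here, while 25000 stays in Medium because Highend is defined as > 25000.

```python
from collections import defaultdict


def fuel_economy(city_mpg: float, highway_mpg: float) -> float:
    """Weighted mileage from step 4a: 0.55 * city + 0.45 * highway."""
    return 0.55 * city_mpg + 0.45 * highway_mpg


def cars_range(price: float) -> str:
    """Price band from step 4b (assumption: exactly 15000 counts as Medium)."""
    if price < 15000:
        return "Budget"
    if price <= 25000:
        return "Medium"
    return "Highend"


def mean_engine_size_by_range(rows):
    """Average enginesize per carsrange band (step 5) for dict rows with price and enginesize."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        band = cars_range(r["price"])
        totals[band] += r["enginesize"]
        counts[band] += 1
    return {band: totals[band] / counts[band] for band in totals}
```

Bands with no cars simply do not appear in the result, so the step 5 table should list only the bands present in the data.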
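For step 6, one reasonable choice is ordinal encoding for cylindernumber, since cylinder counts are ordered quantities, and one-hot (dummy) encoding for carbody, since body types have no natural order. This sketch assumes cylindernumber may be stored either as digits (step 9 lists 4) or as English words such as "four"; the word list, the helper names, and dropping the alphabetically first carbody category as the baseline are all assumptions.

```python
# Assumed word-to-number mapping for cylindernumber values stored as words.
CYLINDER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5,
                  "six": 6, "eight": 8, "twelve": 12}


def encode_cylinders(value) -> int:
    """Ordinal encoding: map a cylinder count (digits or a word) to an integer."""
    text = str(value).strip().lower()
    if text.isdigit():
        return int(text)
    return CYLINDER_WORDS[text]  # raises KeyError for an unexpected word


def one_hot(values, drop_first=True):
    """Dummy-encode a nominal column such as carbody.

    Categories are lowercased and sorted; with drop_first=True the first one
    becomes the all-zeros baseline, which avoids perfect collinearity with the
    regression intercept. Returns (kept category names, list of 0/1 rows).
    """
    categories = sorted({str(v).strip().lower() for v in values})
    kept = categories[1:] if drop_first else categories
    rows = [[1 if str(v).strip().lower() == c else 0 for c in kept] for v in values]
    return kept, rows
```

Label encoding carbody as 0, 1, 2, ... would instead impose an artificial ordering on body types, which is why dummies are the usual choice for nominal columns in linear regression.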
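The model in step 7 can be sketched as ordinary least squares solved through the normal equations, using only the standard library. This is an illustration, not a replacement for Excel's regression tool or a statistics package: it returns coefficients and R² but no standard errors or p-values, which are what answer "which variables are significant". Two cautions apply to this dataset. First, fueleconomy is an exact linear combination of citympg and highwaympg, so keeping all three makes X^T X singular in exact arithmetic; this sketch only raises an error when a pivot is numerically zero, so it is safer to keep either fueleconomy or the two mpg columns. Second, carsrange is computed from price itself, so using it as a feature leaks the target. The function names are hypothetical.

```python
def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y ≈ b0 + b1*x1 + ... + bk*xk.

    X is a list of numeric feature rows (no intercept column); a column of
    ones is prepended here. Solves the normal equations (X^T X) b = X^T y
    by Gauss-Jordan elimination with partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * float(t) for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry in this column up.
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is (numerically) singular: check for collinear columns")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row.
        for i in range(k):
            if i != col:
                factor = A[i][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][k] for i in range(k)]


def r_squared(X, y, coef):
    """Coefficient of determination of the fitted model on (X, y)."""
    preds = [coef[0] + sum(c * float(v) for c, v in zip(coef[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot
```

Because the columns differ in scale (curbweight is in the thousands, boreratio is near 3), the normal equations can be poorly conditioned; a QR-based solver in a numerical library is more robust for real work.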
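For step 8, whatever coefficients the fit produces, the regression function takes the general linear form below, where the x_j are the k encoded feature columns left after dropping car_ID and carcompany, and X is the design matrix with a leading column of ones. Each estimated coefficient is the expected change in price for a one-unit increase in its feature, holding the other features fixed.

```latex
\widehat{\text{price}} = \hat{\beta}_0 + \sum_{j=1}^{k} \hat{\beta}_j \, x_j,
\qquad
\hat{\boldsymbol{\beta}} = \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y}
```

The closed form for the coefficients holds only when X^T X is invertible, i.e. when no feature column is an exact linear combination of the others.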
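Step 9 reduces to encoding the new car exactly as the training rows were encoded and evaluating the fitted function. In this sketch the spec values are copied from the step 9 list, while the key `fuel_economy`, the `car_body_<category>` column names, and the helper names are hypothetical; the intercept and coefficients must come from the model actually fitted in step 7. carsrange is deliberately not built, because it is derived from price, which is the unknown here.

```python
# Specs listed in step 9 (values copied from the assignment).
SPEC = {
    "car_body": "Hatchback",
    "wheel_base": 94,
    "car_length": 155.3,
    "car_width": 64.2,
    "car_height": 51.1,
    "car_weight": 2211,
    "cylinder_number": 4,
    "engine_size": 100,
    "bore_ratio": 3.05,
    "horse_power": 104,
    "city_mpg": 21,
    "highway_mpg": 26,
}


def build_features(spec, carbody_dummies):
    """Encode the spec the same way as the training data.

    carbody_dummies lists the car_body categories kept as dummy columns in
    training; the dropped baseline category ends up as all zeros.
    """
    features = {k: float(v) for k, v in spec.items() if k != "car_body"}
    # Step 4a feature (hypothetical key name).
    features["fuel_economy"] = 0.55 * spec["city_mpg"] + 0.45 * spec["highway_mpg"]
    body = spec["car_body"].strip().lower()
    for cat in carbody_dummies:
        features[f"car_body_{cat}"] = 1.0 if body == cat else 0.0
    return features


def predict(intercept, coefficients, features):
    """Predicted price: intercept + sum(coefficient * feature) over the fitted columns."""
    return intercept + sum(coef * features[name] for name, coef in coefficients.items())
```

For this car the derived fuel economy is 0.55 * 21 + 0.45 * 26 = 23.25. Only the columns that have a fitted coefficient enter the prediction, so any column dropped during fitting (for example, for collinearity) is simply ignored.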
4.0 Methodology
Mention the assumptions/techniques/calculations used to analyze the data (when necessary).
Tabulate the calculations for each question to make the results easier to understand.
Provide necessary explanations/discussions (when necessary).
5.0 Grading Rubric
Data Cleaning (20%): Accuracy in data cleaning, including the identification and removal of outliers.
Data processing and descriptive Analysis (20%)
Regression Analysis (30%): Correct application of regression techniques and interpretation of the results.
Predictive Scenario Analysis (10%): Precision of the predictive model and the depth of scenario analysis.
Report Quality (20%): Quality of writing, adherence to APA style, and overall organization and clarity.
N.B. Failure to comply with the above would result in low grades.
