Question:

Students are required to submit a Mini Project in the field of data science. The mini project will essentially
involve working with a dataset imported from either a CSV or an Excel file. Students should find
a suitable dataset and define their problem statement clearly. The dataset should have a minimum of 1000
rows and contain some missing data, outliers, noise, etc. It is recommended to use the Pandas library in Python
to work on the project.
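As a rough illustration of the import step, the sketch below reads a CSV or Excel source with Pandas and reports whether the data meets the size and "messiness" requirements stated above. The helper names (`load_dataset`, `suitability_report`), the `MIN_ROWS` constant, and the tiny in-memory CSV are assumptions made for this example; they are not part of the assignment.

```python
import io

import pandas as pd

MIN_ROWS = 1000  # minimum dataset size stated in the project brief


def load_dataset(source, excel=False):
    """Read a CSV (default) or an Excel source into a DataFrame."""
    return pd.read_excel(source) if excel else pd.read_csv(source)


def suitability_report(df):
    """Summarise how well a DataFrame matches the brief's dataset requirements."""
    return {
        "rows": len(df),
        "enough_rows": len(df) >= MIN_ROWS,
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Tiny in-memory CSV standing in for a real file (hypothetical data).
demo_csv = io.StringIO("area,price\n50,100\n60,\n50,100\n")
demo_report = suitability_report(load_dataset(demo_csv))
print(demo_report)
```

For a real project, `load_dataset` would be given a file path instead of the in-memory buffer. Reading Excel files with Pandas typically needs an extra engine such as `openpyxl` to be installed.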
The project is divided into four main stages:
1. Data Cleaning: Students must clean the dataset by handling missing data
appropriately, removing duplicates and outliers, and ensuring a consistent data format. Depending on
the dataset, students may perform additional cleaning as required.
2. Exploratory Data Analysis (EDA): After cleaning the data, students are expected to display the
basic statistics of the dataset. Students will perform EDA to understand the distributions in the dataset
and the correlations and relationships between variables. Students are expected to visualize their findings in
at least five ways, including but not limited to scatter plots, bar charts, histograms, and heat maps,
or any other format they prefer.
3. Feature Selection: Based on their EDA findings, students will select the relevant features for
analysis. Any suitable method of feature selection may be used, provided that students can explain why
they selected those features and justify why the other features were excluded.
4. Predictive Modeling: Students will use simple or multiple linear regression to predict the value of the
output variable for new inputs. For this, students should divide the dataset into training and test
sets, train their model on the training set, and validate the results on the test set. They should also
report the accuracy of their model. Students should explain the rationale behind selecting the
regression method and interpret the results obtained.
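The four stages can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not a model answer: the data is synthetic, the column names (`area`, `rooms`, `noise`, `price`) are hypothetical, missing values are filled with the median, outliers are removed with the 1.5 × IQR rule, features are chosen by a correlation threshold of 0.3, and "accuracy" is read as the R² score on the test set. The regression is fitted with NumPy least squares to keep the example dependency-light; a library such as scikit-learn could be used instead.

```python
import numpy as np
import pandas as pd


def clean(df):
    """Stage 1: drop duplicates, fill missing numeric values, remove IQR outliers."""
    df = df.drop_duplicates().copy()
    num_cols = df.select_dtypes("number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    q1, q3 = df[num_cols].quantile(0.25), df[num_cols].quantile(0.75)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    inside = (df[num_cols].ge(low, axis="columns")
              & df[num_cols].le(high, axis="columns")).all(axis=1)
    return df[inside].reset_index(drop=True)


def explore(df):
    """Stage 2: basic statistics and the correlation matrix of numeric columns."""
    return df.describe(), df.select_dtypes("number").corr()


def select_features(corr, target, threshold=0.3):
    """Stage 3: keep features whose |correlation| with the target meets the threshold."""
    strength = corr[target].drop(target).abs()
    return strength[strength >= threshold].index.tolist()


def fit_and_score(df, features, target, test_fraction=0.2, seed=0):
    """Stage 4: shuffle, split, fit least squares on train, return coefficients and test R^2."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled.iloc[:n_test], shuffled.iloc[n_test:]

    def design(frame):
        # Prepend a column of ones so the first coefficient is the intercept.
        X = frame[features].to_numpy(dtype=float)
        return np.column_stack([np.ones(len(X)), X])

    y_train = train[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(design(train), y_train, rcond=None)
    y_test = test[target].to_numpy(dtype=float)
    residual = y_test - design(test) @ coef
    r2 = 1.0 - (residual ** 2).sum() / ((y_test - y_test.mean()) ** 2).sum()
    return coef, r2


# Synthetic stand-in data (hypothetical columns); a real project loads its own file.
rng = np.random.default_rng(0)
n = 1200
raw = pd.DataFrame({
    "area": rng.uniform(30, 200, n),
    "rooms": rng.integers(1, 6, n),
    "noise": rng.normal(size=n),
})
raw["price"] = 3 * raw["area"] + 60 * raw["rooms"] + rng.normal(0, 5, n)
raw.loc[::50, "area"] = np.nan                           # inject missing values
raw.loc[5, "price"] = 1e6                                # inject one extreme outlier
raw = pd.concat([raw, raw.head(20)], ignore_index=True)  # inject duplicate rows

cleaned = clean(raw)
stats, corr = explore(cleaned)
selected = select_features(corr, "price")
coef, r2 = fit_and_score(cleaned, selected, "price")
print(selected, round(r2, 3))
```

The correlation matrix returned by `explore` is also a natural input for a heat map, and Pandas plotting helpers such as `DataFrame.hist` and `DataFrame.plot.scatter` (which rely on Matplotlib) cover several of the other required visualizations. The median fill, the IQR rule, and the 0.3 threshold are illustrative choices only; the report should justify whichever methods are actually used.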
The final project report should include a detailed explanation of the project's problem statement, the data
cleaning process, the EDA findings, the feature selection process, and the regression model. Students should
also include the visualizations they used to communicate their findings.
Students will be evaluated on the quality of the project report, the quality of the code, and their ability to
communicate their findings effectively.
Student-Centered Learning Assignment / Mini Project Statement
Rubric for Evaluating Assignment (Total Marks: 15)
| Criteria | Marks | Comments |
| --- | --- | --- |
| Data Cleaning | 4 | 3 marks for appropriately handling missing data and removing duplicates and outliers. 1 mark for ensuring consistency in data format. |
| Exploratory Data Analysis | 4 | 1 mark for accurately summarizing the dataset distribution. 1 mark for identifying correlations and relationships between variables. 2 marks for using at least five visualizations to effectively communicate findings. |
| Feature Selection | 2 | 1 mark for justifying the selection of relevant features. 1 mark for explaining why other features were excluded. |
| Regression Modeling | 3 | 1 mark for appropriately dividing the dataset into training and test sets. 1 mark for selecting the appropriate regression method and justifying the selection. 1 mark for interpreting the results obtained from the regression model. |
| Presentation & Clarity | 2 | 2 marks for clear and concise writing, effective organization, and appropriate formatting. |
