Question:
Students are required to submit a Mini Project in the field of data science. The mini project will essentially
involve working with a dataset to be imported from either a CSV or Excel file. Students should find
a suitable dataset and define their problem statement clearly. The dataset should have a minimum of
rows and some missing data, outliers, noise, etc. It is recommended to use the Pandas library/package in Python
to work on the project.
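As a rough illustration of the import step, the sketch below reads a small CSV with Pandas and counts the missing values per column. The column names (area, rooms, price) and the inline data are hypothetical and stand in for a real file; with an actual dataset the file path would be passed to pd.read_csv, or to pd.read_excel for an Excel file (which needs an Excel engine such as openpyxl installed).

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk. With a real file you would
# pass its path instead, e.g. pd.read_csv("my_dataset.csv").
raw_csv = io.StringIO(
    "area,rooms,price\n"
    "50,2,150\n"
    "80,3,\n"
    "65,,200\n"
    "120,4,390\n"
)

df = pd.read_csv(raw_csv)

# First look: size, column types, and how much data is missing per column.
print(df.shape)
print(df.dtypes)
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Checking the shape and the missing-value counts up front is one simple way to confirm that a candidate dataset meets the size and "some missing data" requirements before committing to it.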
The project will be divided into four main stages:
Data Cleaning: Students will be required to clean the dataset by handling missing data
appropriately, removing duplicates and outliers, and ensuring consistency in data format. Depending on
the dataset, students may apply further cleaning steps if required.
Exploratory Data Analysis (EDA): After cleaning the data, students are expected to display the
basic statistics about the dataset. Students will perform EDA to understand the dataset's distribution,
correlation, and relationship between variables. Students are expected to visualize their findings in
at least five ways, including but not limited to scatter plots, bar charts, histograms, and heatmaps
or any other format they prefer.
Feature Selection: Based on their EDA findings, students will select the relevant features for
analysis. Any suitable method of feature selection can be used so that students can explain why
they have selected the features and justify why other features were excluded.
Predictive Modeling: Students will use linear or multiple regression to predict the values for the
output variable for new inputs. For this, students should divide the dataset into training and test
sets, train their model on the training set, and validate the results on the test set. They should also
provide the accuracy of their model. Students should explain the rationale behind selecting the
regression method and interpret the results obtained.
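The four stages above can be sketched end to end as follows. This is a minimal illustration under several assumptions, not a model answer: the dataset is synthetic, the column names are hypothetical, missing values are filled with the median, outliers are dropped with the 1.5 x IQR rule, features are chosen by a correlation threshold of 0.3, and "accuracy" is taken to mean R-squared and RMSE on the test set. Plots are omitted, and NumPy least squares is swapped in for the more common scikit-learn LinearRegression only to keep the sketch free of extra dependencies.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic dataset standing in for a real CSV/Excel import.
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(40, 200, n)
rooms = rng.integers(1, 6, n).astype(float)
noise_col = rng.normal(0, 1, n)  # unrelated to the target by construction
price = 3.0 * area + 60.0 * rooms + rng.normal(0, 5, n)
df = pd.DataFrame(
    {"area": area, "rooms": rooms, "noise_col": noise_col, "price": price}
)

# Inject the kinds of problems the assignment asks for.
df.loc[[3, 17, 42], "rooms"] = np.nan  # missing values
df.loc[5, "price"] = 10_000.0  # an obvious outlier
df = pd.concat([df, df.iloc[[0, 1]]], ignore_index=True)  # duplicate rows

# Stage 1: data cleaning (duplicates, missing values, outliers).
clean = df.drop_duplicates().copy()
clean["rooms"] = clean["rooms"].fillna(clean["rooms"].median())
q1, q3 = clean["price"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range].reset_index(drop=True)

# Stage 2: EDA (basic statistics and pairwise correlations).
summary = clean.describe()
corr = clean.corr()
print(summary)
print(corr)

# Stage 3: feature selection by absolute correlation with the target.
target = "price"
corr_with_target = corr[target].drop(target).abs()
selected = corr_with_target[corr_with_target >= 0.3].index.tolist()
print("selected features:", selected)

# Stage 4: 80/20 train/test split and multiple linear regression.
shuffled = clean.sample(frac=1.0, random_state=0).reset_index(drop=True)
split = int(0.8 * len(shuffled))
train, test = shuffled.iloc[:split], shuffled.iloc[split:]


def design_matrix(frame):
    # Column of ones for the intercept, followed by the selected features.
    return np.column_stack([np.ones(len(frame)), frame[selected].to_numpy()])


coef, *_ = np.linalg.lstsq(
    design_matrix(train), train[target].to_numpy(), rcond=None
)
pred = design_matrix(test) @ coef

# Evaluate on the held-out test set only.
residual = test[target].to_numpy() - pred
ss_res = float(np.sum(residual ** 2))
ss_tot = float(np.sum((test[target] - test[target].mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot
rmse = float(np.sqrt(np.mean(residual ** 2)))
print("R^2:", r2, "RMSE:", rmse)
```

In a real project each choice here (median fill, IQR rule, correlation threshold, split ratio) would need its own justification in the report, since the rubric awards marks for explaining why features were kept or excluded and why the regression method was chosen.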
The final project report should include a detailed explanation of the project's problem statement, the data
cleaning process, the EDA findings, the feature selection process, and the regression model. Students should
also include the visualizations they used to communicate their findings.
Students will be evaluated on the quality of the project report, the quality of the code, and their ability to
communicate their findings effectively.
Student-Centered Learning Assignment: Mini Project Statement
Rubric for Evaluating Assignment Total Marks:
Criteria Marks Comments
Data Cleaning:
    marks for appropriately handling missing data and removing duplicates and outliers.
    mark for ensuring consistency in data format.
Exploratory Data Analysis:
    mark for accurately summarizing the dataset distribution.
    mark for identifying correlations and relationships between variables.
    marks for using at least five visualizations to effectively communicate findings.
Feature Selection:
    mark for justifying the selection of relevant features.
    mark for explaining why other features were excluded.
Regression Modeling:
    mark for appropriately dividing the dataset into training and test sets.
    mark for selecting the appropriate regression method and justifying the selection.
    mark for interpreting the results obtained from the regression model.
Presentation & Clarity:
    marks for clear and concise writing, effective organization, and appropriate formatting.
