Question: Missing feature values need to be addressed prior to the model development phase of the CRISP-DM methodology to avoid training on incomplete data.

Missing feature values need to be addressed prior to the model development phase of the
CRISP-DM methodology to avoid training on incomplete data. This task assesses your ability to
navigate the complexities of constructing a data pipeline to transform partially erroneous raw
data to knowledge and evaluate the effectiveness of the proposed solution. The task will require
you to identify one or more suitable publicly available datasets for which there is previous
research that addresses the missing feature value problem. Propose an approach that uses the
Naïve Bayes classifier to address the missing feature value problem in the context of categorical
feature values. Implement this approach on the chosen dataset(s) and report its performance in
the context of a classification problem. Compare the effectiveness of this approach against a
baseline imputation approach that uses the mode value, on two different machine learning
models. Justify and discuss your findings using appropriate metrics and relate your findings to
previous research on data imputation using the same dataset(s). Present your findings in a
written report and a video presentation. State and motivate any assumptions or scope adopted
during the task.
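
To make the two imputation strategies in the brief concrete, here is a minimal sketch in plain Python that fills a missing categorical value once with the column mode and once with a Naïve Bayes prediction from the other columns. The toy rows, the use of `None` as the missing marker, the Laplace smoothing parameter `alpha`, and the function names `mode_impute` and `nb_impute` are all assumptions made for illustration; they are not part of the task statement, and a real submission would run on the chosen public dataset.

```python
import math
from collections import Counter, defaultdict

MISSING = None  # assumed marker for a missing categorical value

def mode_impute(rows, target):
    """Baseline: fill missing values in column `target` with its most frequent value."""
    observed = [r[target] for r in rows if r[target] is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [r[:target] + (mode,) + r[target + 1:] if r[target] is MISSING else r
            for r in rows]

def nb_impute(rows, target, alpha=1.0):
    """Fill missing values in column `target` with a Naive Bayes prediction
    from the other columns, using Laplace smoothing `alpha`."""
    train = [r for r in rows if r[target] is not MISSING]
    classes = Counter(r[target] for r in train)
    others = [j for j in range(len(rows[0])) if j != target]
    # levels[j]: distinct observed values of column j (needed for smoothing)
    levels = {j: {r[j] for r in rows if r[j] is not MISSING} for j in others}
    # counts[j][c][v]: training rows with target value c and value v in column j
    counts = {j: defaultdict(Counter) for j in others}
    for r in train:
        for j in others:
            if r[j] is not MISSING:
                counts[j][r[target]][r[j]] += 1

    def predict(r):
        best, best_score = None, -math.inf
        for c, n_c in classes.items():
            score = math.log(n_c / len(train))  # log prior
            for j in others:
                if r[j] is MISSING:
                    continue  # ignore predictors that are themselves missing
                n_cj = sum(counts[j][c].values())
                score += math.log((counts[j][c][r[j]] + alpha)
                                  / (n_cj + alpha * len(levels[j])))
            if score > best_score:
                best, best_score = c, score
        return best

    return [r[:target] + (predict(r),) + r[target + 1:] if r[target] is MISSING else r
            for r in rows]

# Hypothetical toy data: (outlook, windy, play); the last row lacks `outlook`.
rows = [
    ("sunny", "no", "yes"),
    ("sunny", "no", "yes"),
    ("sunny", "yes", "yes"),
    ("rainy", "yes", "no"),
    ("rainy", "yes", "no"),
    (MISSING, "yes", "no"),
]

print(mode_impute(rows, 0)[-1])  # mode fills in "sunny"
print(nb_impute(rows, 0)[-1])    # Naive Bayes fills in "rainy"
```

On this toy data the two strategies disagree: the mode ignores the other columns, while Naïve Bayes uses them as evidence. One way to measure imputation quality on a real dataset, under the same assumptions, is to mask known values, impute them with each method, and compare the recovered values and the downstream classifier metrics.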
