Question: Missing feature values need to be addressed prior to the model development phase of the CRISP-DM methodology to avoid training on incomplete data. This task assesses your ability to navigate the complexities of constructing a data pipeline that transforms partially erroneous raw data into knowledge, and to evaluate the effectiveness of the proposed solution. The task requires you to identify a suitable publicly available dataset for which there is previous research that addresses the missing feature value problem. Propose an approach that uses the Naive Bayes classifier to address the missing feature value problem in the context of categorical feature values. Implement this approach on the chosen dataset and report its performance in the context of a classification problem. Compare the effectiveness of this approach against a baseline imputation approach that uses the mode value, on two different machine learning models. Justify and discuss your findings using appropriate metrics, and relate your findings to previous research on data imputation using the same dataset. Present your findings in a written report and a video presentation. State and motivate any assumptions or scope adopted during the task.
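To make the two imputation strategies named in the task concrete, here is a minimal sketch in plain Python. It is not a solution to the task, only one way the idea could look: the column with missing values is treated as the class of a categorical Naive Bayes model, and the remaining columns act as features. The function names `mode_impute` and `nb_impute`, the `MISSING` marker, the Laplace smoothing constant `alpha`, and the toy weather rows are all assumptions made for illustration; none of them come from the question. In a real pipeline the classification target would normally be kept out of the feature set used for imputation.

```python
from collections import Counter, defaultdict
import math

MISSING = None  # assumed marker for a missing categorical value


def mode_impute(rows, col):
    """Baseline: fill missing entries of `col` with its most frequent observed value."""
    observed = [r[col] for r in rows if r[col] is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, col: mode} if r[col] is MISSING else dict(r) for r in rows]


def nb_impute(rows, col, alpha=1.0):
    """Fill missing entries of `col` with a categorical Naive Bayes prediction.

    `col` plays the role of the class; every other column is a feature.
    `alpha` is the Laplace smoothing constant (an illustrative default).
    """
    train = [r for r in rows if r[col] is not MISSING]
    features = [c for c in rows[0] if c != col]
    prior = Counter(r[col] for r in train)

    # counts[f][cls][v]: training rows of class `cls` where feature f == v
    counts = {f: defaultdict(Counter) for f in features}
    levels = {f: set() for f in features}
    for r in train:
        for f in features:
            if r[f] is not MISSING:
                counts[f][r[col]][r[f]] += 1
                levels[f].add(r[f])

    def predict(r):
        best_cls, best_score = None, -math.inf
        for cls, n_cls in prior.items():
            score = math.log(n_cls / len(train))  # log prior
            for f in features:
                if r[f] is MISSING:
                    continue  # skip features that are themselves missing
                k = len(levels[f] | {r[f]})  # level count, including an unseen value
                n_f = sum(counts[f][cls].values())
                score += math.log((counts[f][cls][r[f]] + alpha) / (n_f + alpha * k))
            if score > best_score:
                best_cls, best_score = cls, score
        return best_cls

    return [{**r, col: predict(r)} if r[col] is MISSING else dict(r) for r in rows]


# Toy data invented for this sketch; the last row has a missing "outlook".
rows = [
    {"outlook": "sunny", "windy": "no", "humidity": "normal"},
    {"outlook": "sunny", "windy": "no", "humidity": "normal"},
    {"outlook": "sunny", "windy": "no", "humidity": "normal"},
    {"outlook": "rainy", "windy": "yes", "humidity": "high"},
    {"outlook": "rainy", "windy": "yes", "humidity": "high"},
    {"outlook": MISSING, "windy": "yes", "humidity": "high"},
]

mode_filled = mode_impute(rows, "outlook")
nb_filled = nb_impute(rows, "outlook")
print(mode_filled[-1]["outlook"], nb_filled[-1]["outlook"])
```

On this toy data the mode baseline fills the gap with "sunny" because it is the most frequent value, while the Naive Bayes sketch picks "rainy" because the row's other features match the rainy rows. To carry out the comparison the task asks for, each imputed dataset would then be used to train the two chosen classifiers and scored with the same metrics.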
