Question:
Suppose you have a diabetes dataset collected from 200,000 patients. The data contains 8 features with numeric values, and the last column is the class label. The value '0' denotes missing data. To clean the data, we need to:
Requirement (1): handle the missing data for each feature,
Requirement (2): reduce the number of features,
Requirement (3): normalize the 'plas' attribute using min-max normalization so that its values lie between 0 and 1, and
Requirement (4): use stratified sampling to reduce the number of records from 200,000 to 10,000. Note that positive and negative class labels make up 70% and 30% of the records, respectively.
Describe the possible methods for the first two requirements, provide a sample of the resulting data for the third requirement, and give the number of records to be sampled from each class for the fourth requirement.
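
The requirements above can be illustrated with a minimal Python sketch. It is not a definitive answer. It assumes that mean imputation is the chosen method for Requirement (1), although median imputation or dropping sparse rows are equally valid choices. The 'plas' values in the usage example are invented for illustration and are not taken from the actual dataset, and all function names are hypothetical. Requirement (2) is not shown, because methods such as PCA or correlation-based feature selection depend on the real data.

```python
import random
from collections import defaultdict

MISSING = 0  # the question states that 0 marks a missing value


def impute_with_mean(values, missing=MISSING):
    """Requirement (1), one possible method: replace missing entries
    with the mean of the observed (non-missing) entries."""
    observed = [v for v in values if v != missing]
    if not observed:
        return list(values)  # nothing to estimate the mean from
    mean = sum(observed) / len(observed)
    return [mean if v == missing else v for v in values]


def min_max(values):
    """Requirement (3): rescale values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column, avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def stratified_counts(class_sizes, sample_size):
    """Requirement (4): records to draw per class so that the sample
    keeps the class proportions of the full dataset."""
    total = sum(class_sizes.values())
    return {label: round(size * sample_size / total)
            for label, size in class_sizes.items()}


def stratified_sample(records, label_of, sample_size, seed=0):
    """Draw a proportionate random sample from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    total = len(records)
    sample = []
    for group in by_class.values():
        k = round(len(group) * sample_size / total)
        sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    plas = [148, 85, 0, 89, 137]           # hypothetical values, 0 = missing
    cleaned = impute_with_mean(plas)       # the 0 becomes the observed mean
    print([round(v, 3) for v in min_max(cleaned)])

    # 70% of 200,000 are positive and 30% are negative.
    sizes = {"positive": 140_000, "negative": 60_000}
    print(stratified_counts(sizes, 10_000))
```

With the proportions given in the question, the stratified sample holds 7,000 positive and 3,000 negative records, since 70% and 30% of 10,000 are 7,000 and 3,000.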