Question: Write a program in Python that will open a data file (data.csv) containing multiple rows and columns with no missing values. Each column in the

Write a program in Python that will open a data file (data.csv") containing multiple rows and columns with no missing values. Each column in the dataset is a feature and each row is an instance. All features are continuous. Your program will conduct min-max feature scaling on the data set and save the scaled data as data_scaled.csv". The program will then create artificial missingness and conduct imputation in both "data.csv" and "data_scaled.csv" in the following manner: 1) Randomly choose 50% of the instances for creating missingness. In these 50% instances, for each feature, create missingness by randomly removing 50% of values. 2) Impute missing data using 3 methods: mean, k-nn & weighted k-nn. Choose 3 values for k: 1,3,5. For weighted k-nn, use any valid approach to assign weights based on the distance while calculating the weighted mean. 3) For each of the 7 imputation methods, calculate and output the overall imputation accuracy in the dataset. Imputation accuracy is defined as the Mean Squared Error (MSE) between the original values and the imputed values. Search online for the definition of MSE. Write a program in Python that will open a data file (data.csv") containing multiple rows and columns with no missing values. Each column in the dataset is a feature and each row is an instance. All features are continuous. Your program will conduct min-max feature scaling on the data set and save the scaled data as data_scaled.csv". The program will then create artificial missingness and conduct imputation in both "data.csv" and "data_scaled.csv" in the following manner: 1) Randomly choose 50% of the instances for creating missingness. In these 50% instances, for each feature, create missingness by randomly removing 50% of values. 2) Impute missing data using 3 methods: mean, k-nn & weighted k-nn. Choose 3 values for k: 1,3,5. For weighted k-nn, use any valid approach to assign weights based on the distance while calculating the weighted mean. 3) For each of the 7 imputation methods, calculate and output the overall imputation accuracy in the dataset. Imputation accuracy is defined as the Mean Squared Error (MSE) between the original values and the imputed values. Search online for the definition of MSE
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
