Question: Help me find the solution before the submission date, please!
Write a program in Python that will open a data file (data.csv) containing multiple rows and columns with no missing values. Each column in the dataset is a feature and each row is an instance. All features are continuous. Your program will conduct min-max feature scaling on the data set and save the scaled data as data_scaled.csv. The program will then create artificial missingness and conduct imputation in both data.csv and data_scaled.csv in the following manner:
1) Randomly choose 50% of the instances for creating missingness. In these 50% instances, for each feature, create missingness by randomly removing 50% of values.
2) Impute missing data using 3 methods: mean, k-nn & weighted k-nn. Choose 3 values for k: 1, 3, 5. For weighted k-nn, use any valid approach to assign weights based on the distance while calculating the weighted mean.
3) For each of the 7 imputation methods (mean, plus k-nn and weighted k-nn at each of the 3 values of k), calculate and output the overall imputation accuracy in the dataset. Imputation accuracy is defined as the Mean Squared Error (MSE) between the original values and the imputed values. Search online for the definition of MSE.
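The scaling and missingness steps described above could be sketched as follows. This is only one possible reading of the assignment, not an official solution: the function names (read_csv, write_csv, min_max_scale, make_missing), the seed parameter, the use of None to mark a removed value, and the assumption that data.csv has a single header row are all illustrative choices. Only the standard csv and random modules plus the built-in min and max are used, so the scaling itself is computed by hand as x' = (x - min) / (max - min).

```python
import csv
import random


def read_csv(path):
    """Read a numeric CSV. Assumes one header row (an assumption, not stated in the task)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [[float(v) for v in row] for row in reader if row]
    return header, rows


def write_csv(path, header, rows):
    """Write a header row followed by the data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def min_max_scale(rows):
    """Scale each column to [0, 1] with x' = (x - min) / (max - min).

    A constant column (max == min) is mapped to 0.0 to avoid dividing by zero.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def make_missing(rows, seed=0):
    """Return a copy of rows with artificial missingness (None marks a removed value).

    Half of the rows are chosen at random; then, for each feature separately,
    half of those chosen rows lose their value for that feature.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    chosen = rng.sample(range(n_rows), n_rows // 2)
    holed = [list(r) for r in rows]  # copy so the original stays intact
    for j in range(n_cols):
        for i in rng.sample(chosen, len(chosen) // 2):
            holed[i][j] = None
    return holed
```

With these helpers, reading data.csv, passing the rows through min_max_scale and handing the result to write_csv with the path data_scaled.csv would produce the scaled file. Calling make_missing with the same seed on both the original and the scaled rows removes the same cells in each, which keeps the two sets of results comparable.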

Due Date: Friday, February 12, 10:00 pm. Late submission is allowed up to 24 hours after the due time with a 20% deduction.

You are not allowed to use any library for imputation, scaling, distance calculation or MSE calculation. However, you can use libraries/packages for conducting basic statistical calculations such as minimum, maximum and mean.

Submit the following files on mycourselink:
1) Source code: Add enough comments in the code explaining your program.
2) A sample data file ("data.csv") that you have used to test your program. The data file must have a minimum of 5 continuous features and 100 instances.
3) The scaled data file ("data_scaled.csv")
4) Screenshots of the output after you execute your program on "data.csv" & "data_scaled.csv"
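Given the restriction on libraries, the three imputation methods and the MSE could be written in plain Python along the following lines. This is a hedged sketch under several assumptions that the assignment leaves open: the data is already loaded as a list of rows of floats with missing entries marked as None; distance is Euclidean over the features observed in both rows, rescaled by the fraction of features used; weighted k-nn uses inverse-distance weights; and a cell with no usable neighbour falls back to the column mean. All function names are hypothetical.

```python
def column_means(holed):
    """Mean of the observed (non-None) values in each column."""
    means = []
    for col in zip(*holed):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return means


def impute_mean(holed):
    """Replace every missing value with its column mean."""
    means = column_means(holed)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in holed]


def distance(a, b):
    """Euclidean distance over features present in both rows.

    The squared sum is rescaled by (total features / shared features) so rows
    sharing few features are not favoured. Returns None if nothing is shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    sq = sum((x - y) ** 2 for x, y in shared)
    return (sq * len(a) / len(shared)) ** 0.5


def impute_knn(holed, k, weighted=False):
    """Fill each missing cell from the k nearest rows that have that feature."""
    means = column_means(holed)
    filled = [list(r) for r in holed]
    for i, row in enumerate(holed):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Candidate neighbours: other rows where feature j is observed.
            cands = []
            for m, other in enumerate(holed):
                if m != i and other[j] is not None:
                    d = distance(row, other)
                    if d is not None:
                        cands.append((d, other[j]))
            cands.sort(key=lambda t: t[0])
            near = cands[:k]
            if not near:
                filled[i][j] = means[j]  # fallback: no comparable neighbour
            elif weighted:
                # Inverse-distance weights; the epsilon avoids division by zero.
                ws = [1.0 / (d + 1e-9) for d, _ in near]
                total = sum(w * val for w, (_, val) in zip(ws, near))
                filled[i][j] = total / sum(ws)
            else:
                filled[i][j] = sum(val for _, val in near) / len(near)
    return filled


def mse(original, holed, filled):
    """Mean squared error over the cells that were made missing."""
    errs = [(o - f) ** 2
            for orow, hrow, frow in zip(original, holed, filled)
            for o, h, f in zip(orow, hrow, frow) if h is None]
    return sum(errs) / len(errs) if errs else 0.0


def evaluate(original, holed):
    """MSE for all 7 methods: mean, k-nn and weighted k-nn with k = 1, 3, 5."""
    results = {"mean": mse(original, holed, impute_mean(holed))}
    for k in (1, 3, 5):
        results[f"knn k={k}"] = mse(original, holed, impute_knn(holed, k))
        results[f"weighted knn k={k}"] = mse(
            original, holed, impute_knn(holed, k, weighted=True))
    return results
```

Calling evaluate once on the original table and once on the scaled table, each with its own copy containing the removed values, gives the seven MSE values per file that the assignment asks to print. On the unscaled data, features with large ranges dominate the distance, so the k-nn results may differ noticeably between the two files.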
