Question: In Python, please write a skeleton template into which I can load my data (there are two .csv files, test and train) and get to the prediction.
- Check for missing values within the training data.
- If the training data contains missing values, you must describe and implement an approach to handle those missing values. (Go about it as if there were missing values)
- Check for outliers within the training data.
- If the training data contains outliers, you must describe and implement an approach to handle those outliers. (Go about it as if there are outliers)
- Determine whether or not you will implement normalization or standardization, and explain your decision. (Please explain when I'd use either)
- Build and train a k-nearest neighbors model on the training data.
- Report the best ROC AUC score, F1 score, and accuracy score that you were able to obtain from your model.
- These scores must be shown for all of your data segments.
- Predict the target vector for the test data (from test.csv) using your model.
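The missing-value and outlier steps in the list above could be sketched as follows. This is only one reasonable approach, not the required one: it assumes all feature columns are numeric (as in the snippets), fills gaps with the training median, and caps outliers at the 1.5 × IQR fences instead of dropping rows. The function names are hypothetical helpers introduced for illustration.

```python
import numpy as np
import pandas as pd


def report_missing(df):
    """Return the number of missing values per column, for columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def fit_medians(train_features):
    """Learn per-column medians from the training features only."""
    return train_features.median()


def iqr_bounds(train_features, k=1.5):
    """Learn per-column lower/upper fences (Q1 - k*IQR, Q3 + k*IQR) from training data."""
    q1 = train_features.quantile(0.25)
    q3 = train_features.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(features, lower, upper):
    """Count values outside the fences, per column."""
    return ((features < lower) | (features > upper)).sum()


def clip_outliers(features, lower, upper):
    """Cap values at the fences rather than dropping rows (keeps every id)."""
    return features.clip(lower=lower, upper=upper, axis=1)
```

A typical call order would be `medians = fit_medians(X_train)`, then `X_train = X_train.fillna(medians)` and `X_test = X_test.fillna(medians)`, then `lower, upper = iqr_bounds(X_train)` followed by `clip_outliers` on both sets. Learning the medians and fences from the training data alone keeps information from test.csv out of the preprocessing. Median imputation and capping are chosen here because k-nearest neighbors is distance-based, so a few extreme values can dominate the distance; dropping rows would be an alternative if the outliers are believed to be data errors.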
Test.csv snippet
| id | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides |
| 0 | 7.3 | 0.67 | 0.05 | 3.6 | 0.107 |
| 1 | 7.6 | 0.49 | 0.26 | 1.6 | 0.236 |
| free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol |
| 6.0 | 20.0 | 0.9972 | 3.4 | 0.63 | 10.1 |
| 10.0 | 88.0 | 0.9968 | 3.11 | 0.8 | 9.3 |
Train.csv snippet
| id | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides |
| 0 | 8.5 | 0.4 | 0.4 | 6.3 | 0.05 |
| 1 | 11.5 | 0.18 | 0.51 | 4.0 | 0.104 |
| 2 | 8.2 | 0.34 | 0.37 | 1.9 | 0.057 |
| 3 | 10.7 | 0.43 | 0.39 | 2.2 | 0.106 |
| 4 | 7.6 | 0.42 | 0.25 | 3.9 | 0.104 |
| 5 | 10.6 | 0.28 | 0.39 | 15.5 | 0.069 |
| free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
| 3.0 | 10.0 | 0.99566 | 3.28 | 0.56 | 12.0 | 0 |
| 4.0 | 23.0 | 0.9996 | 3.28 | 0.97 | 10.1 | 1 |
| 43.0 | 74.0 | 0.99408 | 3.23 | 0.81 | 12.0 | 1 |
| 8.0 | 32.0 | 0.9986 | 2.89 | 0.5 | 9.6 | 0 |
| 28.0 | 90.0 | 0.99784 | 3.15 | 0.57 | 9.1 | 0 |
| 6.0 | 23.0 | 1.0026 | 3.12 | 0.66 | 9.2 | 0 |
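Since each table above is shown in two halves, a loader that checks the full column layout may be useful. The sketch below assumes the real files have the columns from both halves in a single header row, in the order shown; `load_csv` and `FEATURES` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Feature columns as they appear in the two halves of each snippet.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def load_csv(path_or_buffer, expect_target):
    """Read a CSV and verify that id, the features, and (optionally) quality are present."""
    df = pd.read_csv(path_or_buffer)
    expected = ["id"] + FEATURES + (["quality"] if expect_target else [])
    missing = [col for col in expected if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Return the columns in a fixed order so train and test line up.
    return df[expected]
```

With the files in the working directory, this would be called as `train_df = load_csv("train.csv", expect_target=True)` and `test_df = load_csv("test.csv", expect_target=False)`; the file names are taken from the question and may need a path in front of them.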
Prediction should have id and quality
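A minimal sketch of the modelling steps, ending in a table with `id` and `quality`, might look like the following. It assumes `quality` is binary (0/1, as in the train snippet), that missing values and outliers have already been handled, and that "all data segments" means a training split and a held-out validation split; the 80/20 split, the parameter grid, and all function names are illustrative choices, not requirements. On scaling: k-nearest neighbors relies on distances, so features on large scales (such as total sulfur dioxide) would otherwise dominate small ones (such as density). Standardization (zero mean, unit variance) is used here because it is less distorted by extreme values than min-max normalization; normalization to [0, 1] tends to suit features with known bounds and few outliers.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def split_columns(df, id_col="id", target_col="quality"):
    """Separate the id column, the feature matrix, and the target (None if absent)."""
    ids = df[id_col]
    y = df[target_col] if target_col in df.columns else None
    X = df.drop(columns=[c for c in (id_col, target_col) if c in df.columns])
    return ids, X, y


def fit_knn(X_train, y_train):
    """Tune a scaled KNN classifier with 5-fold cross-validation on ROC AUC."""
    pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
    grid = {
        "knn__n_neighbors": [3, 5, 7, 9, 11, 15],
        "knn__weights": ["uniform", "distance"],
    }
    search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=5)
    search.fit(X_train, y_train)
    return search.best_estimator_


def score_segment(model, X, y):
    """Compute ROC AUC, F1, and accuracy for one data segment (binary 0/1 target)."""
    proba = model.predict_proba(X)[:, 1]  # probability of class 1
    pred = model.predict(X)
    return {
        "roc_auc": roc_auc_score(y, proba),
        "f1": f1_score(y, pred),
        "accuracy": accuracy_score(y, pred),
    }


def run(train_df, test_df, seed=42):
    """Train on an 80% split, score train and validation, and predict the test set."""
    _, X, y = split_columns(train_df)
    test_ids, X_test, _ = split_columns(test_df)
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = fit_knn(X_tr, y_tr)
    scores = {
        "train": score_segment(model, X_tr, y_tr),
        "validation": score_segment(model, X_val, y_val),
    }
    predictions = pd.DataFrame(
        {"id": test_ids.to_numpy(), "quality": model.predict(X_test[X.columns])}
    )
    return scores, predictions
```

Calling `scores, predictions = run(train_df, test_df)` and then `predictions.to_csv("predictions.csv", index=False)` would write the two requested columns; the output file name is an assumption. Putting the scaler inside the `Pipeline` means it is refit on each cross-validation fold, so the validation folds do not influence the scaling. Training scores from KNN are usually optimistic (with `weights="distance"` they can be perfect), so the validation scores are the ones to report as the realistic estimate.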
Please let me know if anything else is missing from this. There are almost 800 rows in each .csv file, which is why I only included snippets. The system wouldn't let me put much data in one table, so I divided each table in two (the first table is the left half of the columns and the table under it is the right half). I couldn't copy and paste as I wanted because the system then says the question is too long, and for some reason the table setting in here is stuck at 3 rows and won't let me change it. If there is another way for me to send the data, let me know.
