Question: In Python please. Write a skeleton template where I can then input my data (there's two .csv files in this, test and train) and get

In Python please. Write a skeleton template where I can then input my data (there's two .csv files in this, test and train) and get to the prediction.

  1. Check for missing values within the training data.
  2. If the training data contains missing values, you must describe and implement an approach to handle those missing values. (Go about it as if there were missing values)
  3. Check for outliers within the training data.
  4. If the training data contains outliers, you must describe and implement an approach to handle those outliers. (Go about it as if there are outliers)
  5. Determine whether or not you will implement normalization or standardization, and explain your decision. (Please explain when I'd use either)
  6. Build and train a k-nearest neighbors model on the training data.
  7. Report the best ROC AUC score, F1 score, and accuracy score that you were able to obtain form your model.
    • These scores must be shown for all of your data segments.
  8. Predict the target vector for the test data (from test.csv) using your model.

Test.csv snippet

id fixed acidity volatile acidity citric acid residual sugar chlorides
0 7.3 0.67 0.05 3.6 0.107
1 7.6 0.49 0.26 1.6 0.236
free sulfur dioxide total sulfur dioxide density pH sulphates alcohol
6.0 20.0 0.9972 3.4 0.63 10.1
10.0 88.0 0.9968 3.11 0.8 9.3

Train.csv snippet

id

fixed acidity

volatile acidity

citric acid

residual sugar

chlorides

0

8.5

0.4

0.4

6.3

0.05

1

11.5

0.18

0.51

4.0

0.104

2

8.2

0.34

0.37

1.9

0.057

3

10.7

0.43

0.39

2.2

0.106

4

7.6

0.42

0.25

3.9

0.104

5

10.6

0.28

0.39

15.5

0.069

free sulfur dioxide

total sulfur dioxide

density

pH

sulphates

alcohol

quality

3.0

10.0

0.99566

3.28

0.56

12.0

0

4.0

23.0

0.9996

3.28

0.97

10.1

1

43.0

74.0

0.99408

3.23

0.81

12.0

1

8.0

32.0

0.9986

2.89

0.5

9.6

0

28.0

90.0

0.99784

3.15

0.57

9.1

0

6.0

23.0

1.0026

3.12

0.66

9.2

0

Prediction should have id and quality

Please let me know if there is anything else missing form this. There are almost 800 rows in each .csv file, hence why I did the snippets. The system wouldn't let me put much data in one table so I divided each table in 2 (1st table is the first half to the left and the table under is the other half to the right), I couldn't copy paste as I wanted because the system then says the question is too long, and for some reason the table setting in here is stuck at 3 rows and won't let me change it. If there is another way from me send the data let me know.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!