Q1. A Python script (NumPy, pandas, scikit-learn) that preprocesses the OpenML diabetes dataset. It imports fetch_openml, StandardScaler, KNNImputer, and LocalOutlierFactor, then: loads the diabetes dataset (version 1, as a DataFrame); prints descriptive statistics; normalizes the columns 'age' and 'insu'; detects outliers in 'age' and 'insu' using LOF or Isolation Forest; filters out the rows flagged as outliers (labeled -1); prints the length of the dataset with and without outliers; and introduces missing values in the 'mass' and 'insu' columns (seed 42, 20 rows in 'mass'), printing the missing indices in those columns.



import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import StandardScaler
from sklearn.impute import KNNImputer
from sklearn.neighbors import LocalOutlierFactor

# Load the diabetes dataset
diabetes = fetch_openml('diabetes', version=1, as_frame=True)
diabetes_df = diabetes.frame

# Create descriptive statistics for the dataset
print(diabetes_df.describe())

# Normalize the columns 'age' and 'insu'
scaler = StandardScaler()
diabetes_df[columns_to_normalize] = scaler.fit_transform(diabetes_df[columns_to_normalize])

# Detect outliers in two columns 'age' and 'insu' using LOF or Isolation Forest
diabetes_df['outlier'] = lof.fit_predict(diabetes_df[columns_for_outlier_detection])

# Filter out rows identified as outliers (rows with outliers will be -1)
filtered_df = diabetes_df[diabetes_df['outlier'] != -1]

# Print the length of the dataset with and without outliers
print(f"Length of dataset with outliers: {len(diabetes_df)}")
print(f"Length of dataset without outliers: {len(filtered_df)}")
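The listing above uses columns_to_normalize, columns_for_outlier_detection, and lof without defining them. The following is a minimal sketch of one way to fill those gaps, not the question's official answer. It assumes both column lists are ['age', 'insu'] (taken from the comments) and that lof is a LocalOutlierFactor; the n_neighbors and contamination values are illustrative assumptions. A small synthetic frame named demo_df (hypothetical) stands in for diabetes_df so the example runs without downloading from OpenML.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for diabetes_df (assumption: the real frame has
# numeric 'age' and 'insu' columns).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "age": rng.normal(35, 10, size=200),
    "insu": rng.normal(80, 30, size=200),
})
# Plant two obvious outliers so the detector has something to find.
demo_df.loc[0, ["age", "insu"]] = [120.0, 900.0]
demo_df.loc[1, ["age", "insu"]] = [-40.0, -500.0]

# Assumed definitions for the names the script leaves undefined.
columns_to_normalize = ["age", "insu"]
columns_for_outlier_detection = ["age", "insu"]
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)  # assumed settings

# Standardize the two columns to zero mean and unit variance.
scaler = StandardScaler()
demo_df[columns_to_normalize] = scaler.fit_transform(demo_df[columns_to_normalize])

# fit_predict returns 1 for inliers and -1 for outliers.
demo_df["outlier"] = lof.fit_predict(demo_df[columns_for_outlier_detection])
filtered_df = demo_df[demo_df["outlier"] != -1]

print(f"Length of dataset with outliers: {len(demo_df)}")
print(f"Length of dataset without outliers: {len(filtered_df)}")
```

Because the comment in the listing also allows Isolation Forest, sklearn.ensemble.IsolationForest could be swapped in for lof; its fit_predict uses the same 1 / -1 labeling, so the filter line would stay unchanged.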

# Introduce missing values in the 'mass' and 'insu' columns and print the missing indices in those columns
np.random.seed(42)
diabetes_df['mass_with_missing'] = diabetes_df['mass'].copy()
# Assign through DataFrame.loc directly; chained indexing (df[col].loc[rows] = ...) may not modify the frame
diabetes_df.loc[np.random.choice(diabetes_df.index, size=20), 'mass_with_missing'] = np.nan
diabetes_df['insu_with_missing'] = diabetes_df['insu'].copy()
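The listing stops partway through the missing-value step. The sketch below shows one possible way to finish it and print the missing indices, again on a synthetic stand-in frame (demo_df, hypothetical) so that it runs offline. It deviates from the listing in two labeled ways: it draws rows with numpy's default_rng and replace=False so that exactly 20 distinct rows are blanked per column, and it adds a KNNImputer step, which is only an assumption based on the unused import at the top of the script.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Synthetic stand-in for diabetes_df (assumption: numeric 'mass', 'insu', 'age').
rng = np.random.default_rng(42)
demo_df = pd.DataFrame({
    "mass": rng.normal(32, 7, size=100),
    "insu": rng.normal(80, 30, size=100),
    "age": rng.normal(35, 10, size=100),
})

for col in ["mass", "insu"]:
    missing_col = f"{col}_with_missing"
    demo_df[missing_col] = demo_df[col].copy()
    # replace=False guarantees 20 distinct rows (differs from the listing,
    # where np.random.choice samples with replacement).
    rows = rng.choice(demo_df.index.to_numpy(), size=20, replace=False)
    demo_df.loc[rows, missing_col] = np.nan
    # Collect and print the row labels that are now missing.
    missing_idx = demo_df.index[demo_df[missing_col].isna()].tolist()
    print(f"Missing indices in {missing_col}: {missing_idx}")

# Assumed follow-up step: fill each gap from the 5 most similar rows.
features = ["age", "mass_with_missing", "insu_with_missing"]
imputer = KNNImputer(n_neighbors=5)
imputed = pd.DataFrame(
    imputer.fit_transform(demo_df[features]),
    columns=features,
    index=demo_df.index,
)
print(imputed.isna().sum())
```

Keeping the original 'mass' and 'insu' columns untouched makes it possible to compare the imputed values against the true ones afterwards.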

