Question:

Dealing with Raw Data

Raw data often contains inconsistencies, missing values, and outliers. Data cleaning is therefore a crucial prerequisite for making accurate and informative visualizations.

There are several common strategies for dealing with missing values. First, identify the missing data points by counting them per column:

    data.isnull().sum()

Delete (try dropna)

    data.dropna(inplace=True)

If only a few data points are missing, you may want to delete the affected rows directly. If a particular feature has no data across many samples, you may want to delete that whole feature instead.

Replace by mean/median/majority (try fillna)

    data.fillna(data.mean(numeric_only=True), inplace=True)

Replacing missing values with the mean or median applies to integers (int) and floating point numbers (float). For these types of data, the mean can be calculated directly and used to fill in missing values. If the data is categorical (e.g., strings denoting different categories), the mode can be used to replace the missing values.

By model (nearest neighbor method)

A more sophisticated method for data imputation. Determine the K samples closest to the sample with missing data, based on Euclidean distance or correlation analysis, and then use the weighted average of these K values to estimate the missing data of the sample.
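The mode-based and nearest-neighbor strategies are described above without code, so here is a minimal sketch of both. The helper names `fill_mode` and `knn_impute`, the toy columns, and the choice of inverse-distance weights are assumptions made for illustration; the text only says "weighted average" and does not specify the weighting. The sketch uses Euclidean distance over the columns that are observed in the incomplete row, and it assumes the columns passed to `knn_impute` are all numeric.

```python
import numpy as np
import pandas as pd


def fill_mode(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Fill missing values in categorical columns with each column's mode."""
    out = df.copy()
    for col in columns:
        modes = out[col].mode(dropna=True)
        if not modes.empty:  # skip columns that are entirely missing
            out[col] = out[col].fillna(modes.iloc[0])
    return out


def knn_impute(df: pd.DataFrame, k: int = 2) -> pd.DataFrame:
    """Impute numeric NaNs from the k nearest rows (hypothetical helper).

    Assumption: neighbors are weighted by inverse Euclidean distance.
    """
    values = df.to_numpy(dtype=float)
    filled = values.copy()
    for i, row in enumerate(values):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        for j in np.flatnonzero(missing):
            # Candidate neighbors: other rows that have a value in column j
            # and in every column observed in row i.
            candidates = [
                r for r in range(len(values))
                if r != i
                and not np.isnan(values[r, j])
                and not np.isnan(values[r, observed]).any()
            ]
            if not candidates:
                continue  # nothing to borrow from; leave the NaN in place
            diffs = values[candidates][:, observed] - row[observed]
            dists = np.sqrt((diffs ** 2).sum(axis=1))
            nearest = np.argsort(dists)[:k]
            # Small epsilon avoids division by zero for identical rows.
            weights = 1.0 / (dists[nearest] + 1e-9)
            neighbor_values = values[candidates, j][nearest]
            filled[i, j] = np.average(neighbor_values, weights=weights)
    return pd.DataFrame(filled, index=df.index, columns=df.columns)


# Toy data (made up for this example): one missing weight, one missing color.
data = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 10.0],
    "weight": [10.0, 20.0, np.nan, 100.0],
    "color": ["red", None, "red", "blue"],
})

data = fill_mode(data, ["color"])
data[["height", "weight"]] = knn_impute(data[["height", "weight"]], k=2)
print(data)
```

In this toy example the missing weight is estimated from the two rows whose height is closest to 3.0, with the closer row counting more. For real work, scikit-learn's `KNNImputer` offers a maintained implementation of the same idea; the hand-written version here is only meant to make the steps visible.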
