Question: PYTHON LANGUAGE ONLY. In this assignment, you will continue working on the Microsoft Malware Prediction problem. Here is the link to download the data from Kaggle. Please load the data set into a pandas DataFrame and see how many variables are in the data set and what their data types are. Since the dataset is too big to fit in your memory, you can try reading a small sample of records using the following code:
pd.read_csv("train.csv", nrows=...)
Examine data types of the variables
Show the top rows of the data frame
Encode string values, if any, as integers
Once again, examine data types of the variables
Produce some histograms of the variables
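The inspection and encoding steps above can be sketched as follows. This is a minimal sketch, not the method taught in the class: the toy DataFrame and its column names (`os_name`, `is_beta`, `label`) are hypothetical stand-ins for a sample of train.csv, and `pd.factorize` is just one possible way to map strings to integers.

```python
import pandas as pd

# Hypothetical toy data standing in for a small sample of train.csv.
df = pd.DataFrame({
    "os_name": ["win10", "win8", "win10", "win7"],
    "is_beta": [0, 0, 1, 0],
    "label": [1, 0, 1, 0],
})

print(df.shape)    # (number of rows, number of variables)
print(df.dtypes)   # data types before encoding
print(df.head())   # top rows of the data frame

# Encode every non-numeric column as integer codes.
# pd.factorize numbers values in order of first appearance; missing values become -1.
encoded = df.copy()
for col in encoded.columns:
    if not pd.api.types.is_numeric_dtype(encoded[col]):
        codes, uniques = pd.factorize(encoded[col])
        encoded[col] = codes

print(encoded.dtypes)  # data types after encoding
```

With matplotlib installed, calling `encoded.hist()` is one way to produce a histogram per numeric column.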
You need to provide an analysis of the missing value percentage in each variable. You can use the following code:
dataframe.isnull().sum()
You need to show the total number of missing values in all variables using the following code:
# The sum of the missing values in each variable
dataset.isnull().sum()
dataframe.isnull().sum().sum()
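Since `isnull().sum()` returns counts rather than percentages, a short sketch of turning them into percentages may help. The toy `dataset` below is hypothetical; the idea is that the mean of a boolean mask is the fraction of missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing entries.
dataset = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": ["x", None, None, "y"],
    "c": [1, 2, 3, 4],
})

missing_count = dataset.isnull().sum()        # missing values per variable
missing_pct = dataset.isnull().mean() * 100   # missing percentage per variable
total_missing = dataset.isnull().sum().sum()  # missing values over all variables

print(missing_count)
print(missing_pct)
print(total_missing)
```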
Perform missing value imputation as we explained in the class, and verify again that you don't have any missing values in the dataset using this code:
# The sum of the missing values in each variable
dataset.isnull().sum()
dataframe.isnull().sum().sum()
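The imputation method explained in the class is not stated here, so the sketch below assumes a common default: the median for numeric columns and the most frequent value for everything else. The toy `dataset` and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column.
dataset = pd.DataFrame({
    "num": [1.0, np.nan, 3.0, 5.0],
    "cat": ["x", None, "x", "y"],
})

imputed = dataset.copy()
for col in imputed.columns:
    if pd.api.types.is_numeric_dtype(imputed[col]):
        # Assumption: fill numeric columns with the column median.
        imputed[col] = imputed[col].fillna(imputed[col].median())
    else:
        # Assumption: fill other columns with the most frequent value.
        imputed[col] = imputed[col].fillna(imputed[col].mode().iloc[0])

# Verify that no missing values remain.
remaining = imputed.isnull().sum().sum()
print(imputed.isnull().sum())
print(remaining)
```

If a column consists entirely of missing values, neither the median nor the mode exists, so such a column would need to be dropped or filled with a constant before this loop.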
Second, modeling:
Split the data into training and testing sets
Build different machine learning models such as Decision Tree, Support Vector Machine, and Naive Bayes. Show the performance of each model in terms of accuracy, precision, and recall. Which model is the best, and why?
Experiment with different train/test split ratios and observe how they affect the model's performance. Make sure to download the train.csv file.
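The modeling steps above can be sketched with scikit-learn. This is a sketch under stated assumptions, not a reference solution: `make_classification` generates synthetic data that stands in for the encoded, imputed features and a binary target, and the three split ratios, the default hyperparameters, and `GaussianNB` as the Naive Bayes variant are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared features X and binary target y.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Naive Bayes": GaussianNB(),
}

results = []
for test_size in (0.2, 0.3, 0.4):  # assumed split ratios to compare
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0, stratify=y
    )
    for name, model in models.items():
        model.fit(X_train, y_train)   # refit from scratch on each split
        pred = model.predict(X_test)
        results.append({
            "test_size": test_size,
            "model": name,
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
        })

for row in results:
    print(row)
```

Which model comes out best depends on the data and on which metric matters more. On the real sample, it is worth comparing precision and recall rather than accuracy alone, and scaling the features before fitting the SVM may change its ranking.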
