Question: Use Python to code the following questions. The credit card dataset can be downloaded from the link below. I need code that can run in Python. For certain questions, just answer in words and put screenshots of the visuals here. Thanks!
https://drive.google.com/file/d/152R8gP69HxogaeF3cMBE7yxj-wadxud_/view?usp=sharing
In this question, we will use the Credit Card dataset (creditcard.csv).
- Read in the dataset and create a DataFrame. Do some Exploratory Data Analysis (EDA). Use pictures, graphs, descriptive statistics, correlations, etc. to tell a story about the data. Is your response (Class) balanced? Is the data well behaved? Think about whether you may need to do any feature preprocessing, such as StandardScaler(), MinMaxScaler() or others.
- Display the distribution of your target/response variable. What is the number of 0s and 1s? If a naive model were to always guess 0, what would the accuracy be?
- Now, split your data into training and testing sets, with 80% for training and 20% for testing.
- First, employ the RandomForest classifier to get a sense of the ballpark accuracy you could get out of this dataset.
- Use k-Fold Cross Validation to split the dataset into 5 folds and print the CV accuracy scores. Does your initial accuracy measure appear reasonable? Set up a pipeline and start to add more complexity. What happens when you add PCA to your pipeline?
- Examine the cross-validation scores for several algorithms, such as Logistic Regression and KNN. Which are most promising? Why?
- Fine-tune your top algorithm using GridSearch to determine the best hyperparameters, then recalculate the cross-validation scores. Plot the ROC curve and report the AUC for your top algorithm. How well does it perform? Report the confusion matrix results.
- Determine your best algorithm. What did you consider? What did you learn?
- Redo the analysis, this time using SMOTE.
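The sketch below shows one way to handle the EDA and target-distribution items. It rests on a few assumptions:

- creditcard.csv has a binary `Class` column, with 0 as the majority class.
- `load_credit_data`, `eda_summary`, `class_balance` and `plot_target` are hypothetical helper names.
- A synthetic imbalanced frame from scikit-learn's `make_classification` stands in for the real file, so the snippet runs on its own.

The naive baseline is the share of 0s, because always guessing 0 is correct exactly as often as the label is 0. Large skew values suggest the data is not well behaved, and that a scaler or transform may be worth considering.

```python
import pandas as pd
from sklearn.datasets import make_classification


def load_credit_data(path="creditcard.csv"):
    """Read the CSV into a DataFrame (the path is an assumption)."""
    return pd.read_csv(path)


def eda_summary(df, target="Class"):
    """Print shape, missing values, basic stats, skew, and the features most correlated with the target."""
    print("shape:", df.shape)
    print("missing values:", int(df.isna().sum().sum()))
    print(df.describe().T[["mean", "std", "min", "max"]])
    # Strongly skewed columns are candidates for scaling or a transform
    print(df.drop(columns=target).skew().sort_values())
    corr = df.corr()[target].drop(target)
    # Order by absolute correlation so strong negative relationships also appear
    corr = corr.reindex(corr.abs().sort_values(ascending=False).index)
    print(corr.head(10))
    return corr


def class_balance(df, target="Class"):
    """Return the 0/1 counts and the accuracy of a model that always predicts 0."""
    counts = df[target].value_counts().sort_index()
    naive_accuracy = counts.get(0, 0) / len(df)
    return counts, naive_accuracy


def plot_target(df, target="Class"):
    """Bar chart of class counts on a log scale so the minority bar stays visible."""
    import matplotlib.pyplot as plt  # lazy import: only needed for the figure
    df[target].value_counts().sort_index().plot(kind="bar", logy=True)
    plt.title("Distribution of " + target)
    plt.ylabel("count (log scale)")
    plt.show()


# Synthetic stand-in with about 3% positives; use load_credit_data() for the real file
X, y = make_classification(n_samples=2000, n_features=6, n_informative=3,
                           weights=[0.97], flip_y=0, random_state=0)
df = pd.DataFrame(X, columns=[f"x{i}" for i in range(1, 7)])
df["Class"] = y

corr = eda_summary(df)
counts, naive_acc = class_balance(df)
print(counts.to_dict(), f"naive accuracy = {naive_acc:.4f}")
```

With the real file, replace the synthetic lines with `df = load_credit_data()`. Calling `plot_target(df)` then produces the class-distribution figure.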
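Here is a possible sketch for the 80/20 split and the RandomForest ballpark. It uses synthetic stand-in data, and the `random_state` values and `n_estimators=100` are arbitrary choices that the assignment does not specify.

`stratify=y` is used because, with a rare class, an unstratified split can leave the test set with very few 1s. Comparing the forest's accuracy with the always-0 baseline on the same test set shows how much of that accuracy comes from the imbalance alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in; with the real data use X = df.drop(columns="Class"), y = df["Class"]
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           weights=[0.97], flip_y=0, random_state=0)

# 80/20 split; stratify=y keeps the rare class at the same rate in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

rf_accuracy = accuracy_score(y_test, y_pred)
naive_accuracy = (y_test == 0).mean()  # always-predict-0 baseline on the same test set
print(f"RandomForest accuracy: {rf_accuracy:.4f} (naive baseline: {naive_accuracy:.4f})")
# Per-class precision and recall show how many of the rare 1s are actually caught
print(classification_report(y_test, y_pred, digits=3))
```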
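For the 5-fold cross-validation and pipeline items, the sketch below compares a scaler + RandomForest pipeline with and without PCA. It uses `StratifiedKFold` instead of plain `KFold` so that each fold keeps roughly the same class ratio. `n_components=5` is an arbitrary illustrative value.

Because the scaler and PCA sit inside the `Pipeline`, they are refit on each training fold, so no information leaks from the validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card features and Class label
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           weights=[0.97], flip_y=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 5 stratified folds so every fold contains some of the rare class
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

pipelines = {
    "scaler+rf": Pipeline([
        ("scaler", StandardScaler()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ]),
    # PCA is fit inside each training fold, never on the validation fold
    "scaler+pca+rf": Pipeline([
        ("scaler", StandardScaler()),
        ("pca", PCA(n_components=5)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ]),
}

cv_scores = {}
for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: {scores.round(4)} mean={scores.mean():.4f} std={scores.std():.4f}")
```

Whether PCA helps is an empirical question for the real data. Random forests do not need decorrelated inputs, so dropping components may simply discard signal.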
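To compare several algorithms, `cross_validate` can score each model on the same folds with more than one metric. This sketch uses synthetic data, and the model settings (`max_iter=1000`, `n_neighbors=5`) are assumptions rather than tuned values.

Logistic Regression and KNN are both wrapped with `StandardScaler`, because KNN is distance-based and Logistic Regression also converges more reliably on scaled inputs. With a heavily imbalanced target, accuracy barely separates the models, so ROC AUC, recall and F1 are usually more informative when judging which look most promising.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card features and Class label
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           weights=[0.97], flip_y=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "rf": RandomForestClassifier(n_estimators=50, random_state=42),
}
scoring = ["accuracy", "roc_auc", "recall", "f1"]

results = {}
for name, model in models.items():
    # Same folds for every model, several metrics per fold
    cv_res = cross_validate(model, X_train, y_train, cv=cv, scoring=scoring)
    results[name] = {m: cv_res[f"test_{m}"].mean() for m in scoring}
    print(name, {m: round(v, 4) for m, v in results[name].items()})
```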
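For the GridSearch, ROC/AUC and confusion-matrix items, this sketch tunes a RandomForest. The RandomForest is assumed here to be the top model, so substitute whichever model wins your comparison.

- The grid values are small illustrative choices.
- Scoring by `roc_auc` instead of accuracy is an assumption motivated by the class imbalance.
- `GridSearchCV` refits the best setting on the whole training set, and that model is then evaluated once on the held-out 20%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)

# Synthetic stand-in for the credit card features and Class label
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           weights=[0.97], flip_y=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Small illustrative grid; widen it for the real data if time allows
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 8]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                    scoring="roc_auc", cv=cv)
grid.fit(X_train, y_train)  # refits the best setting on all of X_train
best = grid.best_estimator_
print("best params:", grid.best_params_, "best CV AUC:", round(grid.best_score_, 4))

# Recalculate the cross-validation scores for the tuned model
tuned_scores = cross_val_score(best, X_train, y_train, cv=cv, scoring="roc_auc")
print("tuned CV AUC per fold:", tuned_scores.round(4))

# Evaluate once on the held-out 20%
proba = best.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, proba)
test_auc = roc_auc_score(y_test, proba)
cm = confusion_matrix(y_test, best.predict(X_test))
tn, fp, fn, tp = cm.ravel()
print(f"test AUC = {test_auc:.4f}; TN={tn} FP={fp} FN={fn} TP={tp}")


def plot_roc(fpr, tpr, auc_value):
    """ROC curve with the chance diagonal for reference."""
    import matplotlib.pyplot as plt  # lazy import: only needed for the figure
    plt.plot(fpr, tpr, label=f"tuned model (AUC = {auc_value:.3f})")
    plt.plot([0, 1], [0, 1], linestyle="--", color="grey", label="chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()
```

Calling `plot_roc(fpr, tpr, test_auc)` draws the figure. The FN count in the confusion matrix (1s predicted as 0) is usually the number to watch on data like this.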
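For the SMOTE item, the imbalanced-learn package provides `imblearn.over_sampling.SMOTE` together with `imblearn.pipeline.Pipeline`, which applies resampling only when fitting. To keep the sketch dependency-free, the version below hand-rolls a minimal SMOTE instead. The `smote` helper is hypothetical and is not imbalanced-learn's API.

The helper is applied to the training folds only. Oversampling before splitting would put synthetic points derived from validation rows into training and inflate the scores.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors


def smote(X, y, k=5, random_state=0):
    """Minimal SMOTE sketch: add synthetic class-1 rows until both classes are equal in size.
    Each new row lies between a random minority row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(random_state)
    X_min = X[y == 1]
    n_new = int((y == 0).sum()) - len(X_min)
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    # Ask for k + 1 neighbours because each row is typically its own nearest neighbour (column 0)
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    partner = idx[base, rng.integers(1, idx.shape[1], size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    X_new = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.ones(n_new, dtype=y.dtype)])


# Synthetic stand-in for the credit card features and Class label
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           weights=[0.97], flip_y=0, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42)

fold_scores = {"without SMOTE": [], "with SMOTE": []}
for train_idx, val_idx in cv.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]
    for name in fold_scores:
        # Resample the training fold only; the validation fold keeps the real class ratio
        X_fit, y_fit = smote(X_tr, y_tr) if name == "with SMOTE" else (X_tr, y_tr)
        proba = clone(model).fit(X_fit, y_fit).predict_proba(X_val)[:, 1]
        pred = (proba >= 0.5).astype(int)
        fold_scores[name].append((recall_score(y_val, pred), roc_auc_score(y_val, proba)))

for name, scores in fold_scores.items():
    recall, auc = np.mean(scores, axis=0)
    print(f"{name}: mean recall = {recall:.3f}, mean AUC = {auc:.3f}")
```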
