Question: Make an R Markdown document from this code:

```r
library(tidyverse)
library(caret)
library(cluster)
library(factoextra)

# Load the dataset
# (path reconstructed from garbled text; check the folder and file name)
data <- read.csv("C:/Users/arthu/Downloads/bank_transactions_data.csv")

# Explore Your Data Points
## a. Basic Exploration

# View summary statistics of the dataset
cat("Summary of the dataset:\n")
summary(data)

# Check for missing values
cat("\nMissing values:\n")
print(colSums(is.na(data)))

# Histograms to view the distribution of key variables
# (bin counts were lost in the copy; 30 is the ggplot2 default)
ggplot(data, aes(x = TransactionAmount)) +
  geom_histogram(bins = 30, fill = "blue", color = "black") +
  labs(title = "Distribution of Transaction Amount")

ggplot(data, aes(x = AccountBalance)) +
  geom_histogram(bins = 30, fill = "green", color = "black") +
  labs(title = "Distribution of Account Balance")

# Scatterplot to explore relationships between variables
ggplot(data, aes(x = TransactionAmount, y = AccountBalance)) +
  geom_point(alpha = 0.5) +  # placeholder alpha; the original value was lost
  labs(title = "Scatterplot: Transaction Amount vs Account Balance")

# Outlier detection using the IQR method for TransactionAmount
Q1 <- quantile(data$TransactionAmount, 0.25, na.rm = TRUE)
Q3 <- quantile(data$TransactionAmount, 0.75, na.rm = TRUE)
iqr_value <- Q3 - Q1  # renamed so it does not mask stats::IQR()
outliers <- data %>%
  filter(TransactionAmount < Q1 - 1.5 * iqr_value |
         TransactionAmount > Q3 + 1.5 * iqr_value)
cat("\nOutliers detected for TransactionAmount:\n")
```
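The IQR rule in the outlier step flags any value below `Q1 - 1.5 * IQR` or above `Q3 + 1.5 * IQR` (Tukey's fences). A minimal sketch of the same rule as a reusable function follows; the helper name `iqr_outliers` and the toy vector are hypothetical, not part of the original code.

```r
# Hypothetical helper (not in the original code): TRUE for values outside
# Tukey's fences [Q1 - k * IQR, Q3 + k * IQR]; NA values are never flagged.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  lower <- q[1] - k * spread
  upper <- q[2] + k * spread
  !is.na(x) & (x < lower | x > upper)
}

# Made-up amounts: Q1 = 11.5 and Q3 = 13.5, so the fences are 8.5 and 16.5
amounts <- c(10, 12, 11, 13, 12, 14, 100)
iqr_outliers(amounts)  # only the last value (100) is TRUE
```

Applied to the dataset, `data %>% filter(iqr_outliers(TransactionAmount))` selects the same rows as `outliers`.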
```r
print(outliers)

# Normalize the dataset
numeric_cols <- sapply(data, is.numeric)
numeric_data <- data %>% select(where(is.numeric))
numeric_scaled <- scale(numeric_data)

# Step: handle missing values
numeric_cols <- sapply(data, is.numeric)
categorical_cols <- sapply(data, is.character)

# Fill numeric columns with the median
data[numeric_cols] <- lapply(data[numeric_cols], function(x)
  ifelse(is.na(x), median(x, na.rm = TRUE), x))

# Fill categorical columns with the mode (most frequent value);
# names() is needed because sort(table(...))[1] alone returns the count
for (col in names(data)[categorical_cols]) {
  mode_value <- names(sort(table(data[[col]]), decreasing = TRUE))[1]
  data[[col]] <- ifelse(is.na(data[[col]]), mode_value, data[[col]])
}

# Step: detect fraudulent transaction patterns using clustering
numeric_scaled <- scale(data %>% select(where(is.numeric)))

# Perform k-means clustering
# (seed, centers and nstart were lost in the copy; these values are placeholders)
set.seed(123)
kmeans_result <- kmeans(numeric_scaled, centers = 3, nstart = 25)
data$KMeansCluster <- kmeans_result$cluster

# Calculate each transaction's distance to the nearest centroid.
# Centroids are the central points of the clusters found by k-means.
centroids <- kmeans_result$centers
distances <- apply(numeric_scaled, 1, function(x)
  min(sqrt(colSums((t(centroids) - x)^2))))
data$KMeansDistance <- distances

# Mean + k standard deviations: identifying fraud
# The threshold is the mean distance from the cluster centroids plus k times
# the standard deviation of those distances. This is a common anomaly-detection
# rule: points farther out than that are treated as outliers (here, potentially
# fraudulent transactions). k = 3 is a placeholder; the original value was lost.
threshold <- mean(distances) + 3 * sd(distances)

# Flag fraudulent transactions:
# 'distances' holds each transaction's distance from its cluster centroid.
# Any transaction farther than the threshold is flagged in the new column
# KMeansFraud (TRUE = potential fraud, FALSE = normal transaction).
data$KMeansFraud <- distances > threshold

# Summary
fraud_summary <- table(data$KMeansFraud)
cat("\nFraud Summary:\n")
```
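The missing-value step fills numeric columns with their median and character columns with their most frequent value. A minimal standalone sketch on a made-up data frame follows; the helper names `impute_median` and `impute_mode` are hypothetical. It also shows why the mode has to come from `names()` of the sorted frequency table: indexing the table itself returns a count, not a category.

```r
# Hypothetical helpers (not in the original code).
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

impute_mode <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)  # table() ignores NA by default
  x[is.na(x)] <- names(counts)[1]              # the most frequent level, not its count
  x
}

# Made-up toy data
toy <- data.frame(
  amount  = c(10, NA, 30, 20),
  channel = c("ATM", "Online", NA, "Online"),
  stringsAsFactors = FALSE
)
toy[] <- lapply(toy, function(col) {
  if (is.numeric(col)) impute_median(col) else impute_mode(col)
})
toy  # amount: 10 20 30 20; channel: ATM Online Online Online

# Without names(), the first element of the sorted table is the count:
as.character(sort(table(c("ATM", "Online", "Online")), decreasing = TRUE)[1])  # "2"
```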
```r
print(fraud_summary)

# The objective of this part is to visualize how the data points were grouped
# into clusters, using TransactionAmount and AccountBalance. Both are among the
# scaled numeric features given to k-means, but the model used all numeric
# columns, so a 2-D plot shows only part of how well the clusters separate.

# Visualize k-means clusters with two key features
ggplot(data, aes(x = TransactionAmount, y = AccountBalance,
                 color = as.factor(KMeansCluster))) +
  geom_point(alpha = 0.5) +  # placeholder alpha
  labs(title = "KMeans clustering algorithm",
       x = "Transaction Amount", y = "Account Balance") +
  scale_color_viridis_d() +
  theme_minimal()

# Highlight fraud points: red points are flagged as potential fraud
fraud_points <- data %>% filter(KMeansFraud)

ggplot(data, aes(x = TransactionAmount, y = AccountBalance,
                 color = as.factor(KMeansCluster))) +
  geom_point(alpha = 0.5) +
  geom_point(data = fraud_points, aes(x = TransactionAmount, y = AccountBalance),
             color = "red", size = 2) +  # placeholder size
  labs(title = "KMeans Clusters with Fraud Points",
       x = "Transaction Amount", y = "Account Balance") +
  scale_color_viridis_d() +
  theme_minimal()

# Fraud detection logic is based on distance from centroids
cat("\nTotal Fraudulent Transactions Detected Using KMeans clustering:",
    nrow(fraud_points), "\n")
```
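The fraud rule measures each transaction's Euclidean distance to its nearest k-means centroid and flags distances above the mean plus 3 standard deviations. The sketch below reproduces that computation on made-up 2-D data: two tight grids of points plus one isolated point. Passing explicit starting centers to `kmeans()` makes the toy example deterministic, and the multiplier `k = 3` is an assumption, since the original value was lost.

```r
# Made-up data: two 9 x 9 grids of points (around (0, 0) and (5, 5))
# plus one isolated point at (20, 20), which becomes row 163.
offsets <- seq(-0.4, 0.4, by = 0.1)
grid <- as.matrix(expand.grid(dx = offsets, dy = offsets))
x <- rbind(grid, grid + 5, c(20, 20))

# Fixed starting centers so the toy example is reproducible
km <- kmeans(x, centers = rbind(c(0, 0), c(5, 5)))

# Distance from each row to its nearest centroid (same formula as the clustering step)
centroids <- km$centers
distances <- apply(x, 1, function(p) min(sqrt(colSums((t(centroids) - p)^2))))

# Mean + k * SD threshold; k = 3 is an assumed multiplier
k <- 3
threshold <- mean(distances) + k * sd(distances)
flagged <- which(distances > threshold)
flagged  # only row 163, the isolated point
```

Because k-means assigns each point to its closest centroid, the minimum over all centroids equals the distance to the point's own cluster center once the algorithm has converged.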
```r
cat("\nFraudulent Transactions Detected:\n")
print(fraud_points)
```
