

Question: Make an R Markdown document from this code

Make an R Markdown document from this code:

library(tidyverse)
library(caret)
library(cluster)
library(factoextra)

# Load the dataset
data <- read.csv("C:/Users/arthu/Downloads/bank_transactions_data_2.csv")

# 2. Explore Your Data (5 Points)
## a. Basic Exploration

# View summary statistics of the dataset
cat("Summary of the dataset:\n")
summary(data)

# Check for missing values
cat("Missing values:\n")
print(colSums(is.na(data)))

# Histograms to view the distribution of key variables
ggplot(data, aes(x = TransactionAmount)) +
  geom_histogram(bins = 30, fill = "blue", color = "black") +
  labs(title = "Distribution of Transaction Amount")

ggplot(data, aes(x = AccountBalance)) +
  geom_histogram(bins = 30, fill = "green", color = "black") +
  labs(title = "Distribution of Account Balance")

# Scatterplot to explore relationships between variables
ggplot(data, aes(x = TransactionAmount, y = AccountBalance)) +
  geom_point(alpha = 0.7) +
  labs(title = "Scatterplot: Transaction Amount vs Account Balance")

# Outlier detection using IQR method for TransactionAmount
Q1 <- quantile(data$TransactionAmount, 0.25)
Q3 <- quantile(data$TransactionAmount, 0.75)
IQR <- Q3 - Q1
outliers <- data |>
  filter(TransactionAmount < (Q1 - 1.5 * IQR) | TransactionAmount > (Q3 + 1.5 * IQR))
cat("Outliers detected for TransactionAmount:\n")
print(outliers)
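The IQR rule used above flags any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. A minimal base-R sketch of that rule on a small made-up vector (the `amounts` values are hypothetical and do not come from the dataset):

```r
# Hypothetical transaction amounts; the last value is deliberately extreme.
amounts <- c(10, 12, 14, 15, 16, 18, 20, 500)

q1 <- quantile(amounts, 0.25, names = FALSE)
q3 <- quantile(amounts, 0.75, names = FALSE)
iqr_value <- q3 - q1  # a separate name, so stats::IQR() is not masked

# Fences of the IQR rule.
lower <- q1 - 1.5 * iqr_value
upper <- q3 + 1.5 * iqr_value

# Values outside the fences are treated as outliers.
is_outlier <- amounts < lower | amounts > upper
amounts[is_outlier]
```

Only the extreme value falls outside the fences here; on skewed real data such as transaction amounts, the rule may flag a sizeable share of legitimate large payments.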

# Normalize the dataset
numeric_cols <- sapply(data, is.numeric)
numeric_data <- data |> select(where(is.numeric))
numeric_scaled <- scale(numeric_data)

# Step 3:
# Handle missing values
numeric_cols <- sapply(data, is.numeric)
categorical_cols <- sapply(data, is.character)

# Numeric columns with median
data[numeric_cols] <- lapply(data[numeric_cols], function(x) ifelse(is.na(x), median(x, na.rm = TRUE), x))

# Fill categorical columns with mode
for (col in names(data)[categorical_cols]) {
  # names() returns the label of the most frequent value;
  # as.character() on the table entry would return its count instead
  mode_value <- names(sort(table(data[[col]]), decreasing = TRUE))[1]
  data[[col]] <- ifelse(is.na(data[[col]]), mode_value, data[[col]])
}
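Mode imputation depends on picking the label of the most frequent level, not its frequency. The sketch below, on a made-up `channel` vector (hypothetical, not a column of the dataset), shows the difference between `as.character()` applied to a table entry and `names()` applied to the sorted table:

```r
# Hypothetical categorical column with one missing value.
channel <- c("Online", "ATM", "Online", NA, "Branch", "Online")

# table() ignores NA by default; sort so the most frequent level comes first.
counts <- sort(table(channel), decreasing = TRUE)

count_as_text <- as.character(counts[1])  # the count of the top level, as a string
mode_value <- names(counts)[1]            # the label of the top level

# Fill missing entries with the most frequent label.
channel_filled <- ifelse(is.na(channel), mode_value, channel)
channel_filled
```

If the string version of the count were used as the fill value, missing entries would be replaced by a number-like string rather than by a category that actually occurs in the column.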

# Step 4:
# Objective: Detect fraudulent transaction patterns using clustering
numeric_scaled <- scale(data |> select(where(is.numeric)))

# Perform K-Means clustering
set.seed(42)
kmeans_result <- kmeans(numeric_scaled, centers = 5, nstart = 25)
data$KMeans_Cluster <- kmeans_result$cluster

# Calculate distances from centroids.
# Centroids refer to the central points of clusters in a clustering algorithm.
centroids <- kmeans_result$centers
distances <- apply(numeric_scaled, 1, function(x) min(sqrt(colSums((t(centroids) - x)^2))))
data$KMeans_Distance <- distances

# Mean + 3 * Standard Deviation: Identifying Fraud
# The threshold for identifying potential fraud is set by calculating the mean distance from the centroids (the center of the clusters)
# and adding 3 times the standard deviation of the distances. This threshold is commonly used in anomaly detection, as points that
# fall further than 3 standard deviations from the mean are considered outliers (fraudulent transactions, in this case).
threshold <- mean(distances) + 3 * sd(distances)
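The distance-to-centroid computation and the mean + 3 standard deviations cut-off can be tried in isolation. The following is a sketch on synthetic two-feature data (the matrix `x` and the choice of two clusters are assumptions for illustration only); it also computes the distance to each point's assigned centroid in vectorised form for comparison:

```r
set.seed(42)

# Synthetic data: two well-separated groups of 50 points, two features each.
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2)
)
x_scaled <- scale(x)

km <- kmeans(x_scaled, centers = 2, nstart = 25)

# Distance from each point to its nearest centroid (same idea as the apply() call).
d_nearest <- apply(x_scaled, 1, function(p) {
  min(sqrt(colSums((t(km$centers) - p)^2)))
})

# Distance from each point to the centroid of the cluster it was assigned to.
d_assigned <- unname(sqrt(rowSums((x_scaled - km$centers[km$cluster, ])^2)))

# Mean + 3 standard deviations cut-off, then flag points beyond it.
threshold <- mean(d_nearest) + 3 * sd(d_nearest)
flagged <- d_nearest > threshold
table(flagged)
```

A flagged point is only unusually far from its cluster centre. Whether it is actually fraudulent would need labelled data or manual review to confirm.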

# Flagging Fraudulent Transactions:
# The 'distances' variable contains the distance of each point (transaction) from its cluster centroid.
# Any transaction with a distance greater than the threshold is flagged as potentially fraudulent.
# The result is stored in the new column 'KMeans_Fraud', where TRUE indicates a fraud, and FALSE indicates a normal transaction.
data$KMeans_Fraud <- distances > threshold

# Summary
fraud_summary <- table(data$KMeans_Fraud)
cat("Fraud Summary:\n")
print(fraud_summary)

# The objective of this part of the code is to visualize how the data points have been grouped into clusters based on
# the features TransactionAmount and AccountBalance, which were used in the
# K-Means clustering algorithm. This helps in understanding how well the clustering algorithm has separated the data.

# Visualize K-Means Clusters with two key features: TransactionAmount and AccountBalance
ggplot(data, aes(x = TransactionAmount, y = AccountBalance, color = as.factor(KMeans_Cluster))) +
  geom_point(alpha = 0.7) +
  labs(title = "K-Means clustering algorithm", x = "Transaction Amount", y = "Account Balance") +
  scale_color_viridis_d() +
  theme_minimal()

# Highlight fraud points
# Red points are fraud detected
fraud_points <- data |> filter(KMeans_Fraud == TRUE)

ggplot(data, aes(x = TransactionAmount, y = AccountBalance, color = as.factor(KMeans_Cluster))) +
  geom_point(alpha = 0.7) +
  geom_point(data = fraud_points, aes(x = TransactionAmount, y = AccountBalance), color = "red", size = 3) +
  labs(title = "K-Means Clusters with Fraud Points", x = "Transaction Amount", y = "Account Balance") +
  scale_color_viridis_d() +
  theme_minimal()

# Fraud detection logic is based on distance from centroids
cat("Total Fraudulent Transactions Detected (Using K-Means clustering)", nrow(fraud_points), "\n")
cat("Fraudulent Transactions Detected:\n")
print(fraud_points)
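One possible layout for the requested R Markdown document is a YAML header, a setup chunk that loads the packages, and one chunk per step of the script. The sketch below writes such a skeleton to a temporary file from R. The title, the output format, the chunk labels, the `chunk()` helper, and the file name are all assumptions, and the chunk bodies are placeholder comments to be replaced with the corresponding code from the script.

```r
# Three backticks, built programmatically so this listing contains no literal fence.
fence <- strrep("`", 3)

# Hypothetical helper: wrap lines in an R chunk with the given label/options.
chunk <- function(label, body) c(paste0(fence, "{r ", label, "}"), body, fence, "")

rmd_lines <- c(
  "---",
  'title: "Bank Transactions: K-Means Fraud Detection"',  # assumed title
  "output: html_document",                                 # assumed output format
  "---",
  "",
  chunk("setup, message=FALSE, warning=FALSE",
        c("library(tidyverse)", "library(caret)",
          "library(cluster)", "library(factoextra)")),
  "## 2. Explore Your Data",
  "",
  chunk("explore", "# summary statistics, missing values, histograms, scatterplot, IQR outliers"),
  "## Step 3: Handle missing values",
  "",
  chunk("impute", "# median for numeric columns, mode for categorical columns"),
  "## Step 4: Detect fraudulent transaction patterns using clustering",
  "",
  chunk("kmeans", "# scaling, kmeans(), centroid distances, threshold, fraud flag"),
  chunk("plots", "# cluster scatterplots with flagged points highlighted")
)

rmd_path <- file.path(tempdir(), "bank_fraud_kmeans.Rmd")  # hypothetical file name
writeLines(rmd_lines, rmd_path)
cat(rmd_lines, sep = "\n")
```

Once the placeholders are filled in, the file could be knitted with `rmarkdown::render(rmd_path)`, assuming the rmarkdown package is installed and the CSV path in the load step is reachable. The explanatory comments from the script would read more naturally as prose between the chunks than as `#` comments inside them.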
