Question: Hello, I don't know how to do this question. Can you help me to solve it with code in Python? Thank you so much! Getting

Hello, I don't know how to do this question. Can you help me to solve it with code in Python? Thank you so much!

Hello, I don't know how to do this question. Can you help me to solve it with code in Python? Thank you so much! Getting Started Take a look at the columns in the dataset credit_card.csv. We have V1-V20, 'Amount', and 'Time' as input features, and 'Class' as output which takes the value 1 if it's fraud and 0 otherwise. This dataset presents 492 frauds out of 284,807 transactions. That means, there

Getting Started Take a look at the columns in the dataset credit_card.csv. We have V1-V20, 'Amount', and 'Time' as input features, and 'Class' as output which takes the value 1 if it's fraud and 0 otherwise. This dataset presents 492 frauds out of 284,807 transactions. That means, there are 284,808 rows (including the header) in the csv file. We will use this dataset to implement logistic regression using batch and stochastic gradient descent for binary classification. \#Initial imports and setup import numpy as np import os import pandas as pd from sklearn import svm from sklearn import model_selection \# Read credit card data into a pandas dataframe for Large tests dirname = os.getcwd() credit_card_data_filepath = os.path.join(dirname, 'credit_card.csv') restaurant_data_filepath = os.path.join(dirname, 'restaurant_data.csv') credit_df = pd.read_csv(credit_card_data_filepath) X= credit_df.values [:,:1] y= credit_df.values[ :,1:] credit_df_head() NameError Traceback (most recent call last) 1 credit_df. head() NameError: name "credit_df" is not defined Next, we can inspect the value counts of the 'Class' property in the dataframe to know the number of fraudulent and non-fraudulent transactions. I: Inspect the number of froudulent and non-fraudulent transactions. credit_df['Cless'].value_counts() Indexing and selecting data Similar to NumPy, we can also index and select data on Pandas dataframes. For example, we can select columns in the dataframe by their labels. In the following example, we use to index the 'Class' column in the credit dataframe. * Select the 'class' column in the credit dataframe credit_df['Class'] Or, we can use integer indexing. H Cr We can also select colunins in the dataframe that fulfils some condition. In the example below, credit_df [' Class'] == returns a Pandas Series of length 284807 , which is the size of our dataset. It contains the value True if and only if the class value for the particular entry is of value 0 , and False otherwise. We can use this Boolean series to index the credit dataframe to return the rows where the Class field is 0 . Does this remind you of how NumPy arrays operate? * obtain the credit actafrane where the 'class' fiela is 0 credit_df[credit_df['Class'] ==0] You can also concatenate Pandas series or dataframes together! In this example. We explored how we can concat the first 2 rows and last 2 rows of the dataframe. Similar to NumPy. you would also need to specify the axis for concatenation. pd. concat([credit_df [:2], credit_df [2:]], axis =0) Task 1.2: Resampling methods When we are faced with the issue of imbalanced data, there are several ways to deal with it. A more direct way might be just to collect more data instances. We realized that in our case this doesn't work well because the events unevenly occur. We then look at how to resample the existing instances. In this problem set, you are introduced with three resampling methods: undersampling, oversampling, and SMOTE. Concept 1.2.1: Undersampling rigute 1. VIsualisaliun ui umuersanimg. The figure above illustrates undersampling. In undersampling. we remove samples from the majority class. More specifically, we randomly take subsamples of the majority class such that the size of two classes is the same. Task 1.2.1: Random undersampling in practice Your task is to observe the data in credit_card.C5V] and randomly remove some observations of the majority class until the two classes balance out. ]: def random_undersampling(df: pd.Dataframe) pd.DataFrame: Given credit card dataset with o as the majority for 'class' column. Returns-credit card data with two classes having the same shape, given raw data read from csv. Parameters df: pd. Dataframe The potentially imbalanced dataset. Returns Undersampled dataset with equal number of fraudulent and legitimate data. * Tooo: add your solution here and remove raise Notimplementederror raise notimplementederror: Now you might realize the drawback of undersamping - by removing clala randonly you might have removed some valuable information from the dataset. Is there any way to do better without losing the valuable information? Getting Started Take a look at the columns in the dataset credit_card.csv. We have V1-V20, 'Amount', and 'Time' as input features, and 'Class' as output which takes the value 1 if it's fraud and 0 otherwise. This dataset presents 492 frauds out of 284,807 transactions. That means, there are 284,808 rows (including the header) in the csv file. We will use this dataset to implement logistic regression using batch and stochastic gradient descent for binary classification. \#Initial imports and setup import numpy as np import os import pandas as pd from sklearn import svm from sklearn import model_selection \# Read credit card data into a pandas dataframe for Large tests dirname = os.getcwd() credit_card_data_filepath = os.path.join(dirname, 'credit_card.csv') restaurant_data_filepath = os.path.join(dirname, 'restaurant_data.csv') credit_df = pd.read_csv(credit_card_data_filepath) X= credit_df.values [:,:1] y= credit_df.values[ :,1:] credit_df_head() NameError Traceback (most recent call last) 1 credit_df. head() NameError: name "credit_df" is not defined Next, we can inspect the value counts of the 'Class' property in the dataframe to know the number of fraudulent and non-fraudulent transactions. I: Inspect the number of froudulent and non-fraudulent transactions. credit_df['Cless'].value_counts() Indexing and selecting data Similar to NumPy, we can also index and select data on Pandas dataframes. For example, we can select columns in the dataframe by their labels. In the following example, we use to index the 'Class' column in the credit dataframe. * Select the 'class' column in the credit dataframe credit_df['Class'] Or, we can use integer indexing. H Cr We can also select colunins in the dataframe that fulfils some condition. In the example below, credit_df [' Class'] == returns a Pandas Series of length 284807 , which is the size of our dataset. It contains the value True if and only if the class value for the particular entry is of value 0 , and False otherwise. We can use this Boolean series to index the credit dataframe to return the rows where the Class field is 0 . Does this remind you of how NumPy arrays operate? * obtain the credit actafrane where the 'class' fiela is 0 credit_df[credit_df['Class'] ==0] You can also concatenate Pandas series or dataframes together! In this example. We explored how we can concat the first 2 rows and last 2 rows of the dataframe. Similar to NumPy. you would also need to specify the axis for concatenation. pd. concat([credit_df [:2], credit_df [2:]], axis =0) Task 1.2: Resampling methods When we are faced with the issue of imbalanced data, there are several ways to deal with it. A more direct way might be just to collect more data instances. We realized that in our case this doesn't work well because the events unevenly occur. We then look at how to resample the existing instances. In this problem set, you are introduced with three resampling methods: undersampling, oversampling, and SMOTE. Concept 1.2.1: Undersampling rigute 1. VIsualisaliun ui umuersanimg. The figure above illustrates undersampling. In undersampling. we remove samples from the majority class. More specifically, we randomly take subsamples of the majority class such that the size of two classes is the same. Task 1.2.1: Random undersampling in practice Your task is to observe the data in credit_card.C5V] and randomly remove some observations of the majority class until the two classes balance out. ]: def random_undersampling(df: pd.Dataframe) pd.DataFrame: Given credit card dataset with o as the majority for 'class' column. Returns-credit card data with two classes having the same shape, given raw data read from csv. Parameters df: pd. Dataframe The potentially imbalanced dataset. Returns Undersampled dataset with equal number of fraudulent and legitimate data. * Tooo: add your solution here and remove raise Notimplementederror raise notimplementederror: Now you might realize the drawback of undersamping - by removing clala randonly you might have removed some valuable information from the dataset. Is there any way to do better without losing the valuable information

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Hello, I don't know how to do this question 1.2.2. Can you help me to solve this question with code in Python? Also, Ignore the "credit_card.csv". Just do the given data1 part and the pd is basically...

Hello, I don't know how to start this question, can you help me to solve this question with pseudocode in Python? The MSE is the mean square error. Thank you so much. Task 3.4: Find number of epochs...

Hello, This is the Fina 210 Course about Real estate. Could you help me to solve this problem and show me a solution? because i want to know how to solve this type of question in exam. Thank you! ps:...

(JAVA) Hello, this is a project for a Data Structures class. I need help finishing up the code for Maze.java and MazeSquare.java. Please send me your email so I can share with you the provided code...

Instructions: Please create an ERD diagram for this scneario(create entities, include the attributes in the entities, indicate the relationships of entities with crows foot notation). Scenario: 1....

"We got our company back," said Addison (Tad) Piper, with an air of satisfaction, one spring morning in 2005. Over the course of more than thirty years, Piper had overseen the expansion of Piper...

Hello, I'm stuck on this question. Can you help me to solve this question with code using Python? Thank you for helping me. This is the full instruction of this question. 2-D Rubik's Cube 'The...

One easy python question, can anyone help me to solve it and show me a code screenshot? Question 3 (10 points): Purpose: To build a program and test it. To get warmed up with Python, in case you are...

Hello, I'm stuck in this question. Can you help me to solve with code using Python. Pls make sure to run the test cases first. Thank you so much for the help!! Task 2.6: Implement hill-climbing Using...

Hello, I'm stuck at this programming question. Can you help me to solve it with code using Python? Please run the test cases first before posting it. Thank you so much! You currently have 3 pitchers...

You are a trainee accountant working for a small firm of accountants. You have been asked to work on a new client, John Oteng. John recently opened a small shop through rented premises. John has...

Suppose that the inventor of a weather-forecasting technique determines that the weather during the growing season will be perfect, causing a bumper harvest. Explain how the inventor could use this...

Which of the following forms is used to report total wages, tips, and other compensation paid to all employees? a . Form W - 4 b . Form W - 2 G c . Form W - 2 d . Form 1 0 9 9 - MISC e . Form W - 3

Compared with half a century ago, adoption has become _ _ _ _ _ _ _ _ _ common, but it is more open and acceptabl e , so we probably discuss it _ _ _ _ _ _ _ . fill in the blanks more or much less or...

Suppose government spending increases. Would the effect on aggregate demand be larger if the Federal Reserve took no action in response or if the Fed were committed to maintaining a fixed interest...

Suppose that people suddenly wanted to hold more money balances. a. What would be the effect of this change on the economy if the Federal Reserve followed a rule of increasing the money supply by 3...

Suppose the federal government cuts taxes and increases spending, raising the budget deficit to 12 percent of GDP. If nominal GDP is rising 5 percent per year, are such budget deficits sustainable...