Question: Task 1: Exploratory Analysis. Using the Pandas DataFrame inventory_purchases_df, employ SweetViz, a Python library designed for initial data exploration

Task 1: Exploratory Analysis

Using the Pandas DataFrame inventory_purchases_df, employ SweetViz, a Python library designed for initial data exploration and profiling. We used SweetViz in a previous lesson on Column Profiling. SweetViz generates a comprehensive report with a detailed profile of each column in your DataFrame. This report will help you identify cleaning opportunities, such as the number of missing values in each column, among other issues. By the end of this task, you should have a solid understanding of the dataset's structure, which will guide you in the data cleaning tasks that follow.

[insert your code here]
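A minimal sketch of the Task 1 cell, assuming SweetViz is installed (pip install sweetviz). It uses SweetViz's analyze() entry point and the report's show_html() method; the small sample DataFrame and its column names are hypothetical stand-ins for inventory_purchases_df. Because the SweetViz report is an HTML page, the sketch also builds a plain pandas table of the same per-column basics (type, missing values, distinct values) that can be copied into your notes.

```python
import pandas as pd


def column_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary of basics that a SweetViz report also shows."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "distinct": df.nunique(dropna=True),
    })


def sweetviz_report(df: pd.DataFrame, path: str = "inventory_report.html") -> None:
    """Generate a SweetViz HTML report (requires: pip install sweetviz)."""
    import sweetviz as sv  # imported here so the rest of the sketch runs without it

    report = sv.analyze(df)
    report.show_html(path)  # in Jupyter, report.show_notebook() renders inline instead


# Hypothetical sample standing in for inventory_purchases_df.
sample = pd.DataFrame({
    "Product": ["Chicken", "Beef", "Beef", None],
    "Price": [4.5, 7.25, 7.25, 3.0],
    "Quantity": [10, 5, 5, None],
})

print(column_profile(sample))
# sweetviz_report(inventory_purchases_df)  # run this on the real DataFrame
```

The pandas table is only a cross-check; the SweetViz report remains the deliverable the task asks for.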

After generating the SweetViz report, it's essential to document your initial observations regarding the state of data cleanliness for each column in the dataset. This step is crucial because it serves as a roadmap for the data cleaning tasks you'll undertake later. In the space provided below, note any inconsistencies, missing values, potential outliers, etc. that you observe for each variable. Your documentation should be concise but detailed enough to guide your cleaning process. This initial assessment will not only help you strategize your cleaning approach but also serve as a valuable reference for any future data projects involving similar datasets.

[Insert your notes here about each of the columns in the dataset]

Task 2: Remove Duplicates

Use Python to remove all complete duplicate records from the dataset. A complete duplicate means that every field in the record is identical to another record in the dataset. Do not modify the original DataFrame because we will use it later. Instead, place all non-duplicate records in a new DataFrame named inventory__dups_.

Retain only the first occurrence of each duplicate record and remove the subsequent ones. This step is crucial for ensuring the integrity and reliability of your dataset, as duplicate records can skew your analysis and lead to incorrect conclusions.

After performing this operation, confirm that the duplicates have been successfully removed by comparing the new DataFrame's shape (use the .shape attribute) to the original shape. Document the number of records removed and any observations you may have.

[insert your code here]
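The pandas method DataFrame.drop_duplicates() covers this step: by default it compares every column (so only complete duplicates are dropped), keep="first" retains the first occurrence, and it returns a new DataFrame rather than changing the original. A minimal sketch on hypothetical data; the name inventory_no_dups is a placeholder, so substitute the real source DataFrame (presumably inventory_purchases_df) and the name your assignment specifies.

```python
import pandas as pd

# Hypothetical sample standing in for the real inventory data.
inventory = pd.DataFrame({
    "Product": ["Chicken", "Beef", "Beef", "Tofu", "Chicken"],
    "Price": [4.5, 7.25, 7.25, 2.0, 4.5],
    "Quantity": [10, 5, 5, 8, 12],
})

# Compares all columns by default; returns a NEW DataFrame, so `inventory` is unchanged.
# keep="first" keeps the first occurrence and drops the later copies.
inventory_no_dups = inventory.drop_duplicates(keep="first")

removed = inventory.shape[0] - inventory_no_dups.shape[0]
print("Original shape:", inventory.shape)
print("After removing duplicates:", inventory_no_dups.shape)
print("Complete duplicates removed:", removed)
```

Note that the last "Chicken" row is kept: it matches the first one on Product and Price but not on Quantity, so it is not a complete duplicate.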

[Use this space to document the number of duplicates removed]

Task 5: Address Outliers

In this task, you will focus on identifying and removing outliers from the 'Price' and 'Quantity' columns in the dataset. Outliers can significantly skew the results of your data analysis, so it's crucial to handle them appropriately. Start by making a deep copy of the 'inventory_correct_format_' DataFrame and name it 'inventory__outliers_'.

You will use Z-scores to identify outliers. A Z-score represents how many standard deviations an element is from the mean.
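Written out, with \bar{x} the column mean and s its standard deviation, the Z-score of value x_i and the rule applied below are:

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\text{keep record } i \iff -3 \le z_i \le 3 \text{ for both Price and Quantity}
```

For example, if a column has mean 20 and standard deviation 5, a value of 40 gives z = (40 - 20) / 5 = 4, which falls outside the range, so that record would be removed.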

Calculate the mean and standard deviation for the Price and Quantity columns. Then, compute the Z-scores for these columns. Once you have the Z-scores, filter out the records where the Z-score is outside the range of -3 to 3 for either Price or Quantity.

After you've performed these steps, print the shape of both 'inventory_correct_format_' and 'inventory__outliers_'. Compare the number of records in these two DataFrames to determine how many were removed due to outliers. Record your answer in the space provided below.

[insert your code here]
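A minimal sketch of the Task 5 steps: a deep copy, a mean and standard deviation per column, Z-scores, and a filter that keeps only rows whose Z-scores lie within -3 to 3 for both columns. The DataFrame names (inventory_correct_format, inventory_no_outliers), the helper remove_zscore_outliers, and the data are hypothetical placeholders; use the names the assignment specifies.

```python
import pandas as pd


def remove_zscore_outliers(df: pd.DataFrame, columns, threshold: float = 3.0) -> pd.DataFrame:
    """Return a deep copy of df without rows whose |z| exceeds threshold in any column."""
    cleaned = df.copy(deep=True)  # deep copy: the source DataFrame is never modified
    keep = pd.Series(True, index=cleaned.index)
    for col in columns:
        mean = cleaned[col].mean()
        std = cleaned[col].std()  # pandas default: sample standard deviation (ddof=1)
        z = (cleaned[col] - mean) / std
        # between() is inclusive, so z = -3 or z = 3 is kept; NaN z-scores
        # (missing values) are kept rather than treated as outliers.
        keep &= z.between(-threshold, threshold) | z.isna()
    return cleaned[keep]


# Hypothetical data: 19 ordinary prices plus one extreme value.
inventory_correct_format = pd.DataFrame({
    "Price": [10.0] * 19 + [1000.0],
    "Quantity": list(range(1, 21)),
})

inventory_no_outliers = remove_zscore_outliers(inventory_correct_format, ["Price", "Quantity"])
print(inventory_correct_format.shape)
print(inventory_no_outliers.shape)
print("Removed due to outliers:",
      inventory_correct_format.shape[0] - inventory_no_outliers.shape[0])
```

The statistics are computed once on the full copy, not recomputed after each column is filtered. If you use scipy.stats.zscore instead, note that it defaults to the population standard deviation (ddof=0), so records very close to the cutoff may be classified differently.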

[Enter the number of records removed due to outliers]

Task 6: Assess the Impact of Data Cleaning

Great, you've made it through the data cleaning process! Now it's time to revisit the initial analysis with the cleaned dataset. This will allow us to evaluate whether the data cleaning has made a significant impact on our understanding of Rick's Brick & Cue's protein purchases over the last six months.

For this task, please copy the code from the initial analysis and paste it into the code block below. Make sure to change the DataFrame name to match the name of your cleaned DataFrame, then re-run the analysis.

After running the code, take a moment to evaluate the results. Do the proteins now reach the $50,000 mark in purchases over the last six months? Would it be advantageous for Rick's Brick & Cue to enter into an exclusive contract with the supplier based on this new analysis? Write your response in the space provided below.

[insert your code here]
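The initial analysis code is not reproduced in this excerpt, so the sketch below is only an assumption about what such a check might look like: it sums Price * Quantity for one category over the trailing six months and compares the total with the $50,000 mark. The Date and Category columns, the "Protein" label, the helper category_spend, and the sample rows are all hypothetical; Price and Quantity are the columns named in Task 5.

```python
import pandas as pd

PROTEIN_TARGET = 50_000  # purchase threshold mentioned in the task


def category_spend(df: pd.DataFrame, category: str = "Protein", months: int = 6, end=None) -> float:
    """Sum of Price * Quantity for one category over the trailing `months` months."""
    dates = pd.to_datetime(df["Date"])
    end = dates.max() if end is None else pd.Timestamp(end)  # default: latest date in data
    start = end - pd.DateOffset(months=months)
    in_window = (dates > start) & (dates <= end)
    rows = df[(df["Category"] == category) & in_window]
    return float((rows["Price"] * rows["Quantity"]).sum())


# Hypothetical rows; the real columns and category labels may differ.
sample = pd.DataFrame({
    "Date": ["2024-01-15", "2024-03-01", "2024-06-30", "2023-11-01"],
    "Category": ["Protein", "Protein", "Dairy", "Protein"],
    "Price": [100.0, 200.0, 50.0, 300.0],
    "Quantity": [10, 5, 4, 2],
})

total = category_spend(sample)
print(f"Protein purchases, last 6 months: ${total:,.2f}")
print("Reaches $50,000:", total >= PROTEIN_TARGET)
```

Running the same function once on the original DataFrame and once on the cleaned one isolates the effect of cleaning on the $50,000 question.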

[write your response here]

Congratulations on completing this assignment! You've not only honed your data cleaning skills but also made a significant impact on the decision-making process at Rick's Brick & Cue. Your meticulous work in identifying and rectifying issues
