Question: Task 1: Exploratory Analysis Using the Pandas DataFrame inventory_purchases_df, employ SweetViz, a Python library designed for initial data exploration

Task 1: Exploratory Analysis
Using the Pandas DataFrame inventory_purchases_df, employ SweetViz, a Python library designed for initial data exploration and profiling. We used SweetViz in a previous lesson on Column Profiling. SweetViz will generate a comprehensive report that provides a detailed profile of each column in your DataFrame. This report will be instrumental in identifying cleaning opportunities, such as the number of missing values in each column and other data-quality issues. By the end of this task, you should have a solid understanding of the dataset's structure, which will guide you in the data cleaning tasks that follow.
[]: [insert your code here]
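One possible way to fill in this cell is sketched below. It assumes the sweetviz package is installed, and it uses a tiny made-up stand-in for inventory_purchases_df: the 'Item' column and all values are hypothetical, and only the 'Price' and 'Quantity' column names come from the assignment. The helper names profile_with_sweetviz and missing_summary are illustrative, not part of SweetViz.

```python
import pandas as pd


def profile_with_sweetviz(df, html_path="inventory_profile.html"):
    """Build a SweetViz report for df and save it as an HTML file."""
    # Imported here so the rest of the sketch runs even without sweetviz.
    import sweetviz as sv

    report = sv.analyze(df)  # profiles every column of the DataFrame
    report.show_html(filepath=html_path, open_browser=False)
    return report


def missing_summary(df):
    """Count missing values per column as a quick cross-check of the report."""
    return df.isna().sum()


# Hypothetical stand-in data; in the notebook the real DataFrame already exists.
inventory_purchases_df = pd.DataFrame({
    "Item": ["Brisket", "Ribs", "Ribs"],
    "Price": [12.5, None, 9.0],
    "Quantity": [10, 4, None],
})

missing_counts = missing_summary(inventory_purchases_df)
print(missing_counts)

# In the notebook, the report itself would be produced with:
# profile_with_sweetviz(inventory_purchases_df)
```

The per-column missing counts printed here should agree with what the SweetViz report shows, which gives a simple sanity check before writing the notes.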
After generating the SweetViz report, it's essential to document your initial observations regarding the state of data cleanliness for each column in the dataset. This step is crucial because it serves as a roadmap for the data cleaning tasks you'll undertake later. In the space provided below, note any inconsistencies, missing values, potential outliers, etc. that you observe for each variable. Your documentation should be concise but detailed enough to guide your cleaning process. This initial assessment will not only help you strategize your cleaning approach but also serve as a valuable reference for any future data projects involving similar datasets.
[Insert your notes here about each of the columns in the dataset]
Task 2: Remove Duplicates
Use Python to remove all complete duplicate records from the dataset. A complete duplicate means that every field in the record is identical to another record in the dataset. Do not modify the original DataFrame because we will use it later. Instead, place all non-duplicate records in a new DataFrame named inventory_no_dups_df. Retain only the first occurrence of each duplicate record and remove the subsequent ones. This step is crucial for ensuring the integrity and reliability of your dataset, as duplicate records can skew your analysis and lead to incorrect conclusions.
After performing this operation, confirm that the duplicates have been successfully removed by examining the shape of the new DataFrame (use the .shape attribute; it is not a method, so it takes no parentheses) and comparing it to the original shape. Document the number of records removed and any observations you may have.
[]: [insert your code here]
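A minimal sketch of this step follows, again on made-up stand-in data (the 'Item' column and all values are hypothetical). It relies on drop_duplicates returning a new DataFrame by default, so the original is left unmodified.

```python
import pandas as pd

# Hypothetical stand-in data: the second row is a complete duplicate of the first.
inventory_purchases_df = pd.DataFrame({
    "Item": ["Brisket", "Brisket", "Ribs", "Ribs"],
    "Price": [12.5, 12.5, 9.0, 9.5],
    "Quantity": [10, 10, 4, 4],
})

# With no subset argument, rows count as duplicates only if every column matches.
# keep="first" retains the first occurrence and drops the later ones.
inventory_no_dups_df = inventory_purchases_df.drop_duplicates(keep="first")

# .shape is an attribute returning (rows, columns).
rows_removed = inventory_purchases_df.shape[0] - inventory_no_dups_df.shape[0]
print("Original shape:", inventory_purchases_df.shape)
print("Deduplicated shape:", inventory_no_dups_df.shape)
print("Records removed:", rows_removed)
```

Note that the two 'Ribs' rows differ in price, so they are not complete duplicates and both are kept.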
[Use this space to document the number of duplicates removed]
Task 5: Address Outliers
In this task, you will focus on identifying and removing outliers from the 'Price' and 'Quantity' columns in the dataset. Outliers can significantly skew the results of your data analysis, so it's crucial to handle them appropriately. Start by making a deep copy of the 'inventory_correct_format_df' DataFrame and name it 'inventory_no_outliers_df'.
You will use 'Z-scores' to identify outliers. A Z-score represents how many standard deviations an element is from the mean.
Calculate the mean and standard deviation for the Price and Quantity columns. Then, compute the Z-scores for these columns. Once you have the Z-scores, filter out the records where the Z-score is outside the range of -3 to 3 for either Price or Quantity.
After you've performed these steps, print the shape of both 'inventory_correct_format_df' and 'inventory_no_outliers_df'. Compare the number of records in these two DataFrames to determine how many were removed due to outliers. Record your answer in the space provided below.
[]: [insert your code here]
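The steps described for this task could be sketched as follows. The data is a made-up stand-in with one deliberately extreme price, and the sketch assumes the sample standard deviation (pandas' default, ddof=1); the assignment does not say which variant is intended.

```python
import pandas as pd

# Hypothetical stand-in data: 19 ordinary prices and one extreme value.
inventory_correct_format_df = pd.DataFrame({
    "Price": [10.0] * 19 + [1000.0],
    "Quantity": [4, 5, 6, 5] * 5,
})

# Deep copy so the source DataFrame stays untouched.
inventory_no_outliers_df = inventory_correct_format_df.copy(deep=True)

cols = ["Price", "Quantity"]
means = inventory_no_outliers_df[cols].mean()
stds = inventory_no_outliers_df[cols].std()  # sample std (ddof=1), an assumption

# Z-score: how many standard deviations each value lies from its column mean.
z_scores = (inventory_no_outliers_df[cols] - means) / stds

# Keep a row only if both Z-scores fall within [-3, 3].
within_range = (z_scores.abs() <= 3).all(axis=1)
inventory_no_outliers_df = inventory_no_outliers_df[within_range]

records_removed = (
    inventory_correct_format_df.shape[0] - inventory_no_outliers_df.shape[0]
)
print("Before:", inventory_correct_format_df.shape)
print("After:", inventory_no_outliers_df.shape)
print("Records removed:", records_removed)
```

Both Z-score columns are computed from the full copy before any filtering, so removing a Price outlier does not change the mean and standard deviation used for Quantity.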
[Enter the number of records removed due to outliers]
Task 6: Assess the Impact of Data Cleaning
Great, you've made it through the data cleaning process! Now it's time to revisit the initial analysis with the cleaned dataset. This will allow us to evaluate whether the data cleaning has made a significant impact on our understanding of Rick's Brick & Cue's protein purchases over the last six months.
For this task, please copy the code from the initial analysis and paste it into the code block below. Make sure to change the DataFrame name to match the name of your cleaned DataFrame, then re-run the analysis.
After running the code, take a moment to evaluate the results. Do the proteins now reach the $50,000 mark in purchases over the last six months? Would it be advantageous for Rick's Brick & Cue to enter into an exclusive contract with the supplier based on this new analysis? Write your response in the space provided below.
[]: [insert your code here]
[write your response here]
Congratulations on completing this assignment! You've not only honed your data cleaning skills but also made a significant impact on the decision-making process at
Rick's Brick & Cue. Your meticulous work in identifying and rectifying issues