Data Processing using Python and Pandas
The data is presented in CSV format and has the following attributes:
Attribute - Description
Date - Date in format dd/mm/yyyy
Time - Time in format hh:mm:ss
Global_active_power - Household global minute-averaged active power (in kilowatt)
Global_reactive_power - Household global minute-averaged reactive power (in kilowatt)
Voltage - Minute-averaged voltage (in volt)
Global_intensity - Household global minute-averaged current intensity (in ampere)
Sub_metering_1 - Energy submetering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas-powered).
Sub_metering_2 - Energy submetering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing machine, a tumble-drier, a refrigerator and a light.
Sub_metering_3 - Energy submetering No. 3 (in watt-hour of active energy). It corresponds to an electric water heater and an air-conditioner.
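The attribute list suggests a natural parsing step: combine `Date` and `Time` into one timestamp index and make sure every measurement column is numeric. This is a minimal sketch, assuming the cleaned file is comma-separated with a header row that uses the column names above; `load_power_data` and `MEASUREMENT_COLUMNS` are hypothetical names chosen for illustration, and the one-row sample is illustrative input only.

```python
import io

import pandas as pd

# Numeric columns, assuming the column names listed in the attribute table
MEASUREMENT_COLUMNS = [
    "Global_active_power",
    "Global_reactive_power",
    "Voltage",
    "Global_intensity",
    "Sub_metering_1",
    "Sub_metering_2",
    "Sub_metering_3",
]


def load_power_data(source) -> pd.DataFrame:
    """Read a cleaned, comma-separated file and index it by timestamp.

    `source` may be a path or any file-like object that pandas accepts.
    """
    frame = pd.read_csv(source, dtype={"Date": str, "Time": str})
    # Merge Date (dd/mm/yyyy) and Time (hh:mm:ss) into one datetime value
    frame["DateTime"] = pd.to_datetime(
        frame["Date"] + " " + frame["Time"], format="%d/%m/%Y %H:%M:%S"
    )
    # Force the measurements to numbers; anything unparsable becomes NaN
    frame[MEASUREMENT_COLUMNS] = frame[MEASUREMENT_COLUMNS].apply(
        pd.to_numeric, errors="coerce"
    )
    return frame.set_index("DateTime").drop(columns=["Date", "Time"])


# Usage with a one-row in-memory sample instead of a real file
sample = io.StringIO(
    "Date,Time,Global_active_power,Global_reactive_power,Voltage,"
    "Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3\n"
    "16/12/2006,17:24:00,4.216,0.418,234.84,18.4,0,1,17\n"
)
df = load_power_data(sample)
```

Indexing by a `DatetimeIndex` makes the later date-based filters (a given month, a weekday, a year) straightforward.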
These tasks are designed to give you an opportunity to demonstrate the following learning outcomes and to satisfy the assessment criteria:
Clean irregularities in the raw data file to convert it into a proper CSV format.
Read data from the cleaned CSV file into a Pandas DataFrame.
Convert between different datetime formats.
Filter/restrict the rows and columns in Pandas DataFrames to help answer the queries.
Use aggregation operations such as mean, median, sum and max to summarize data.
Use group by to summarize data for various categories.
Create new columns that are computed from other existing columns.
Demonstrate appropriate use of a variety of plot types to visualize data using Pandas.
All plots should have meaningful titles, axis labels and user-friendly data labels, and be scaled large enough to see the required details easily.
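The plotting requirement above lends itself to one small helper, so every chart gets a title, axis labels and a readable size in a single place instead of repeating the same calls per task. This is a minimal sketch, assuming pandas' built-in matplotlib plotting; `plot_with_labels` and its default `figsize` are hypothetical choices, not part of the assignment.

```python
import pandas as pd


def plot_with_labels(data, title, xlabel, ylabel, kind="line", figsize=(14, 6)):
    """Plot a pandas Series or DataFrame with its title and axis labels set.

    Keeping the labelling in one function makes every chart consistent.
    """
    ax = data.plot(kind=kind, figsize=figsize, title=title, grid=True)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return ax


# Usage with a small illustrative series
demo = pd.Series(
    [1.2, 3.4, 2.2],
    index=pd.date_range("2007-02-01", periods=3, freq="min"),
    name="Global_active_power",
)
ax = plot_with_labels(demo, "Global active power", "Time", "Power (kW)")
```

Returning the `Axes` lets a notebook cell add task-specific touches (a legend, annotations) after the common labelling is done.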
Markdown headings should be added to clearly separate and explain each of the tasks, and markdown should be provided to discuss/summarize the key observations.
Don't repeat yourself: use functions to avoid duplicating the same logic in multiple places.
Use programming best practice: write clear, simple Python code and use well-chosen identifier names for all variables and functions.
Note that the raw CSV data may require "cleaning" before it can be processed. Everything should be included in a single Jupyter notebook, which you will need to create yourself (there is no skeleton solution for this assignment).
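A minimal sketch of the cleaning step, assuming the raw file is semicolon-separated with a header row (as the task hint about missing values between semicolons implies) and that incomplete rows are simply dropped; keeping them as NaN or filling them are equally valid choices, as long as the markdown documents the decision. `clean_raw_file` is a hypothetical helper name, and treating `?` as missing is an extra assumption about how the raw file may mark gaps.

```python
import io

import pandas as pd


def clean_raw_file(raw_source, cleaned_target) -> int:
    """Rewrite a semicolon-separated raw file as a comma-separated CSV.

    Rows with any missing measurement are dropped. Returns how many rows
    were removed so the cleaning can be reported in markdown.
    """
    raw = pd.read_csv(
        raw_source,
        sep=";",
        dtype={"Date": str, "Time": str},  # keep dates as text, parse later
        na_values=["?"],  # assumption: '?' may also mark a gap
    )
    # Empty fields between two semicolons are read as NaN by default
    cleaned = raw.dropna()
    cleaned.to_csv(cleaned_target, index=False)
    return len(raw) - len(cleaned)


# Usage with an in-memory sample: the second row has no measurements
raw_text = (
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000\n"
    "21/12/2006;11:23:00" + ";" * 7 + "\n"
)
cleaned_buffer = io.StringIO()
rows_removed = clean_raw_file(io.StringIO(raw_text), cleaned_buffer)
```

With a real file, `cleaned_target` would be an output path such as a new `.csv` file next to the raw data.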
Tasks:
Use markdown to document the data cleaning that you performed. Hint: All calendar timestamps are present in the dataset but for some timestamps, the measurement values are missing: a missing value is represented by the absence of value between two consecutive semicolon attribute separators.
Read the cleaned CSV file into a Pandas data frame
Determine the maximum household global minute-averaged active power (in kilowatt)
Determine the average household global minute-averaged current intensity (in ampere)
Add a column that shows the accumulated reactive power in megawatts
Show the global active power and energy submetering for the th of February in one plot
Plot the submetering value for every Tuesday in October in a graph
Add a column that indicates the percentage of submetering of the global household active energy
Plot a cumulative curve of active power used in the year
Plot the average voltage used in a week during the month of May compared to October
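The computations behind the tasks above can be sketched as a few reusable functions. This is a minimal sketch under several assumptions: the DataFrame is indexed by a `DatetimeIndex` and uses the column names from the attribute table; "accumulated reactive power" is read as a running sum of the kW readings divided by 1000; a one-minute average of P kW is converted to P × 1000 / 60 watt-hours so it can be compared with the submetering values; and "average voltage in a week" is interpreted as the mean voltage per weekday. The function names are hypothetical, and the demonstration uses a synthetic year of constant readings, not the real dataset.

```python
import numpy as np
import pandas as pd

TUESDAY = 1  # pandas numbers weekdays from Monday=0 to Sunday=6
SUB_METERING_COLUMNS = ["Sub_metering_1", "Sub_metering_2", "Sub_metering_3"]


def add_derived_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the accumulated-power and percentage columns."""
    result = frame.copy()
    # Running total of the kW readings, converted to megawatts
    result["Accumulated_reactive_power_MW"] = (
        result["Global_reactive_power"].cumsum() / 1000
    )
    # One-minute average power in kW -> active energy in watt-hours
    active_energy_wh = result["Global_active_power"] * 1000 / 60
    sub_metered_wh = result[SUB_METERING_COLUMNS].sum(axis=1)
    # Rows with zero active power would give inf or NaN and may need filtering
    result["Sub_metering_percentage"] = sub_metered_wh / active_energy_wh * 100
    return result


def tuesdays_in_month(frame: pd.DataFrame, month: int) -> pd.DataFrame:
    """Keep only rows recorded on a Tuesday of the given month."""
    index = frame.index
    return frame[(index.month == month) & (index.dayofweek == TUESDAY)]


def mean_voltage_by_weekday(frame: pd.DataFrame, month: int) -> pd.Series:
    """Average voltage for each weekday (0 = Monday) within one month."""
    in_month = frame[frame.index.month == month]
    return in_month["Voltage"].groupby(in_month.index.dayofweek).mean()


# Synthetic one-minute readings for 2007, used only to exercise the functions
minutes = pd.date_range("2007-01-01", "2007-12-31 23:59", freq="min")
df = pd.DataFrame(
    {
        "Global_active_power": 1.2,
        "Global_reactive_power": 0.1,
        "Voltage": np.where(minutes.month == 5, 240.0, 230.0),
        "Global_intensity": 5.0,
        "Sub_metering_1": 2.0,
        "Sub_metering_2": 3.0,
        "Sub_metering_3": 5.0,
    },
    index=minutes,
)

max_active_power_kw = df["Global_active_power"].max()
mean_intensity_a = df["Global_intensity"].mean()
df = add_derived_columns(df)
october_tuesdays = tuesdays_in_month(df, 10)
cumulative_active_power = df.loc["2007", "Global_active_power"].cumsum()
voltage_may_vs_october = pd.DataFrame(
    {"May": mean_voltage_by_weekday(df, 5), "October": mean_voltage_by_weekday(df, 10)}
)
```

Each result can then be plotted with `Series.plot` or `DataFrame.plot`, for example `cumulative_active_power.plot(title="Cumulative active power")` or `voltage_may_vs_october.plot(kind="bar")`, adding the titles and axis labels the assignment requires.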
