Task 4, Part A

This task uses a dataset containing some physical measurements of three penguin species: Adelie, Chinstrap and Gentoo. You can view the dataset online by clicking here. Run the following code cell to load the dataset into a Pandas dataframe and look at the first few rows. Read through the code before executing it.

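import pandas as pd

# Location of the penguin measurements CSV file
URL = 'https://gist.githubusercontent.com/anibali/c2abc8cab4a2f7b0a6518d11a67c693c/raw/3b1bb5264736bb762584104c9e7a828bef0f6ec8/penguins.csv'

# Load the CSV into a dataframe and show the first five rows
df = pd.read_csv(URL)
print(df.head())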

  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    MALE
1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  FEMALE
2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  FEMALE
3  Adelie  Torgersen            36.7           19.3              193.0       3450.0  FEMALE
4  Adelie  Torgersen            39.3           20.6              190.0       3650.0    MALE

As you can see, the dataset consists of seven columns. To get an idea of how these values differ between species, in the next cell you are to print out the mean value of each column, grouped by 'species'. This only takes a single line of code.

This should produce a table which contains the mean value of each numerical column for each of the three species. There is no need to include a print statement: Google Colab will automatically display the result (like an interactive interpreter session does). Take note of how the mean values vary between species; we'll visualise it in Part B.

To achieve full marks for this task, you must follow the instructions above when writing your solution. Additionally, your solution must adhere to the following requirements:

  • You must use Pandas aggregation on the df object.
  • You must present your solution as a single line of code.
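
One way the aggregation could look is sketched below. To keep it runnable without a network connection, it uses a small stand-in dataframe called sample whose values are made up for illustration and are not taken from the dataset; in the notebook the same line would be applied to df. The numeric_only=True argument is an assumption about the Pandas version in use: recent versions raise an error when asked to average text columns such as 'island' and 'sex'.

```python
import pandas as pd

# Hypothetical stand-in for df: the values below are invented for illustration.
sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "island": ["Torgersen", "Torgersen", "Dream", "Dream", "Biscoe", "Biscoe"],
    "bill_length_mm": [39.0, 41.0, 48.0, 50.0, 46.0, 48.0],
    "bill_depth_mm": [18.0, 19.0, 18.0, 19.0, 14.0, 16.0],
    "body_mass_g": [3600.0, 3800.0, 3700.0, 3900.0, 5000.0, 5200.0],
    "sex": ["MALE", "FEMALE", "MALE", "FEMALE", "MALE", "FEMALE"],
})

# The single-line aggregation: group rows by species, then average each numeric column.
species_means = sample.groupby("species").mean(numeric_only=True)

print(species_means)
```

In a Colab cell the bare expression df.groupby('species').mean(numeric_only=True) is enough, because the notebook displays the value of the last expression in the cell.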

Task 4, Part B

Before commencing this part of the task, ensure that you have run the code in Part A, which loads the dataframe.

For this part, you are to produce a scatter plot which relates penguin mass and bill length. A successful solution should be identical to the example below, which clearly shows how these measurements are a good indicator of penguin species.

To achieve full marks for this task, you must follow the instructions above when writing your solution. Additionally, your solution must adhere to the following requirements:

  • You must create a different dataframe for each species by filtering the data appropriately. (Hint: you should end up with three new dataframes.)
  • You must plot each of these dataframes in the same graph, giving each data series an appropriate label.
  • You must give the plot a title, axis labels and a legend.
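
The three requirements above can be sketched as follows. This is a minimal illustration and not the reference solution: it uses Matplotlib, a small made-up dataframe called sample in place of df, and assumes body mass goes on the x-axis and bill length on the y-axis. The axis orientation, title and label wording in the expected example plot may differ.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for df: the values below are invented for illustration.
sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.0, 41.0, 48.0, 50.0, 46.0, 48.0],
    "body_mass_g": [3600.0, 3800.0, 3700.0, 3900.0, 5000.0, 5200.0],
})

# One filtered dataframe per species, using boolean masks.
adelie = sample[sample["species"] == "Adelie"]
chinstrap = sample[sample["species"] == "Chinstrap"]
gentoo = sample[sample["species"] == "Gentoo"]

# Plot all three series on the same axes, each with its own label.
fig, ax = plt.subplots()
ax.scatter(adelie["body_mass_g"], adelie["bill_length_mm"], label="Adelie")
ax.scatter(chinstrap["body_mass_g"], chinstrap["bill_length_mm"], label="Chinstrap")
ax.scatter(gentoo["body_mass_g"], gentoo["bill_length_mm"], label="Gentoo")

# Title, axis labels and legend (wording here is an assumption).
ax.set_title("Penguin body mass vs. bill length")
ax.set_xlabel("Body mass (g)")
ax.set_ylabel("Bill length (mm)")
ax.legend()
```

In a notebook the figure is rendered when the cell finishes; in a standalone script, a call to plt.show() would be needed at the end.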

Task 4, Part C

The graph from Part B shows a nice separation between the species when plotting penguin weight and bill length. To improve upon this, you are to add another column to the dataset, and plot this instead of the penguin bill length.

In the next cell, write code which adds another column to the dataframe df called bill_proportion, which is the penguins' bill length divided by their bill depth. This new column represents the ratio between bill length and depth, and serves as an indicator of bill shape.
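
As a rough sketch of the column arithmetic, dividing one Pandas column by another works element by element, so the new column can be created in a single assignment. The dataframe sample below is a made-up stand-in for df, with values chosen only so that the ratios are easy to check by hand.

```python
import pandas as pd

# Hypothetical stand-in for df: the values below are invented for illustration.
sample = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "bill_length_mm": [40.0, 48.0, 45.0],
    "bill_depth_mm": [20.0, 16.0, 15.0],
})

# Element-wise division of two columns creates the new ratio column.
sample["bill_proportion"] = sample["bill_length_mm"] / sample["bill_depth_mm"]

print(sample.head())
```

Because the column is added to the dataframe before it is filtered, each per-species dataframe created afterwards also contains bill_proportion.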

Copy and paste your code from Part B into the cell below, and add the new column (you can call df.head() to check). Then, update your graphing code to use this new column; don't forget to update the labels and title! A successful solution should be identical to the example below, which does an even better job of separating penguin species than Part B.

Requirements

To achieve full marks for this task, you must follow the instructions above when writing your solution. Additionally, your solution must adhere to the following requirements:

  • The new column must be created using Pandas.

As your solution is based upon your code from Part B, the same requirements apply:

  • You must create a different dataframe for each species by filtering the data appropriately. (Hint: you should end up with three new dataframes.)
  • You must plot each of these dataframes in the same graph, giving each data series an appropriate label.
  • You must give the plot a title, axis labels and a legend.
