Question:

In this assignment, you will use pandas to process a data set of artists and their tracks downloaded from the Spotify Web API. The data come in two files, artists.tsv and tracks.tsv, both of which hold tab-separated values. They contain 240K uniquely identified artists and 450K tracks performed by these artists, respectively. The former holds basic information such as the number of followers and the genres of each artist; the latter holds the popularity of each track and the IDs of the artists who performed it.
Note that the id_artists column of tracks.tsv holds one or more IDs separated by ", " (a comma and a space). These IDs can be matched against the id column of artists.tsv to uniquely identify the artists who performed each track. Another multi-value column is genres in artists.tsv. It lists the genre(s) assigned to each artist, again separated by ", " (a comma and a space). Below, we define a jazz artist as an artist with at least one value in this column that mentions jazz (e.g., jazz pop, soul jazz, etc.). Similarly, a pop artist is an artist with at least one genre value containing the string pop. The same definition applies to other genres.
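The loading and splitting described above could be sketched as follows. This is only an illustration under stated assumptions: the rows are made up and read from in-memory strings so the snippet runs on its own, and the column names name, followers, and popularity are guesses (only id, genres, and id_artists are named in the assignment text).

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real files. The column names
# `name`, `followers`, and `popularity` are assumptions; `id`, `genres`,
# and `id_artists` come from the assignment text.
ARTISTS_TSV = (
    "id\tname\tfollowers\tgenres\n"
    "a1\tAda\t100\tjazz pop, soul jazz\n"
    "a2\tBen\t300\tpop\n"
    "a3\tCy\t50\t\n"
)
TRACKS_TSV = (
    "id\tname\tpopularity\tid_artists\n"
    "t1\tSong One\t40\ta1\n"
    "t2\tSong Two\t60\ta1, a2\n"
    "t3\tSong Three\t20\ta2\n"
)

# With the real data this would be pd.read_csv("artists.tsv", sep="\t").
artists = pd.read_csv(io.StringIO(ARTISTS_TSV), sep="\t")
tracks = pd.read_csv(io.StringIO(TRACKS_TSV), sep="\t")

# Split the multi-value columns on ", " into Python lists.
artists["genre_list"] = artists["genres"].fillna("").str.split(", ")
tracks["artist_ids"] = tracks["id_artists"].str.split(", ")

# A jazz artist has at least one genre value that mentions "jazz", so a
# plain substring test on the raw genres string is enough.
is_jazz = artists["genres"].str.lower().str.contains("jazz", regex=False, na=False)
jazz_artists = artists.loc[is_jazz, "name"].tolist()
print(jazz_artists)
```

The substring test follows the assignment's definition literally, so "pop" would also match values such as "jazz pop".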
Load the two .tsv files into two Pandas dataframes and use Pandas methods and functions to address the following questions:
Identify and print the name and genre of the artist with the maximum number of followers.
Identify and print the name of the most productive artist in terms of the number of tracks s/he performed.
Write a function called summarize_genres(genres) that takes a list of genres and returns a dataframe with three columns: "genre" (name of each input genre), "total N" (total number of artists in each genre), and "Av. followers" (average number of followers of artists in each genre).
Write a function called get_genre_variants(genre) that takes a genre string and returns an array that includes all variants of that genre (i.e., strings in which that genre is mentioned). Try it on jazz. How many variants of jazz can you find in this data set?
Write a function called summarize_artist_performance(name) that takes an artist's name and prints the following values: number of tracks, number of solo tracks, number of collaborative tracks, average popularity of total/solo/collaborative tracks, and number of people with whom the artist has collaborated. Try it on Michael Jackson. Are his average total/solo/collaborative track popularities very different?
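One possible approach to the first four tasks is sketched below. It is not a reference solution: the sample rows are made up, the column names name and followers are assumptions, and genre_mask is a hypothetical helper introduced only for this example.

```python
import io

import pandas as pd

# Made-up rows; `name` and `followers` are assumed column names.
artists = pd.read_csv(io.StringIO(
    "id\tname\tfollowers\tgenres\n"
    "a1\tAda\t100\tjazz pop, soul jazz\n"
    "a2\tBen\t300\tpop, dance pop\n"
    "a3\tCy\t50\tsoul jazz\n"
), sep="\t")
tracks = pd.read_csv(io.StringIO(
    "id\tpopularity\tid_artists\n"
    "t1\t40\ta1\n"
    "t2\t60\ta1, a2\n"
    "t3\t20\ta2\n"
    "t4\t10\ta2, a3\n"
), sep="\t")

# Artist with the most followers: idxmax gives the row label of the maximum.
top = artists.loc[artists["followers"].idxmax(), ["name", "genres"]]

# Most productive artist: one row per (track, artist ID), then count per ID.
track_counts = tracks["id_artists"].str.split(", ").explode().value_counts()
most_productive = artists.set_index("id").loc[track_counts.idxmax(), "name"]


def genre_mask(genre):
    """Hypothetical helper: True for artists with a genre value mentioning `genre`."""
    return artists["genres"].str.lower().str.contains(
        genre.lower(), regex=False, na=False
    )


def summarize_genres(genres):
    """Build one summary row per input genre."""
    rows = []
    for genre in genres:
        subset = artists[genre_mask(genre)]
        rows.append({
            "genre": genre,
            "total N": len(subset),
            "Av. followers": subset["followers"].mean(),
        })
    return pd.DataFrame(rows, columns=["genre", "total N", "Av. followers"])


def get_genre_variants(genre):
    """Return the distinct genre values in which `genre` is mentioned."""
    values = artists["genres"].dropna().str.split(", ").explode()
    mentioned = values.str.lower().str.contains(genre.lower(), regex=False)
    return values[mentioned].unique()


print(top["name"], "|", top["genres"])
print(most_productive)
print(summarize_genres(["pop", "jazz"]))
print(get_genre_variants("jazz"))
```

On the real data, the number of jazz variants would be len(get_genre_variants("jazz")). Ties for the maximum are not handled here: idxmax returns only the first maximal row.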
Record a three-minute video in which you run the code. Then, present your code. Specifically, answer the following questions:
How did you identify the artists with the maximum number of followers or the maximum number of tracks?
How did you detect all artists in a given genre in the summarize_genres function? Show the output of this function for the input list [pop, hip hop, rock, metal, jazz, blues, country, folklore].
Explain how you classify a track as either a solo or a collaborative performance.
Explain how you identified all distinct collaborators of an artist.
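For the last two questions, one way to classify tracks and collect collaborators is sketched here, under assumptions: the rows are made up, name and popularity are assumed column names, and a track is treated as solo when its id_artists value holds exactly one ID. The assignment only asks for printing; the returned dictionary is an addition that makes the result easier to check.

```python
import io

import pandas as pd

# Made-up rows; `name` and `popularity` are assumed column names.
artists = pd.read_csv(io.StringIO(
    "id\tname\n"
    "a1\tAda\n"
    "a2\tBen\n"
    "a3\tCy\n"
), sep="\t")
tracks = pd.read_csv(io.StringIO(
    "id\tpopularity\tid_artists\n"
    "t1\t40\ta1\n"
    "t2\t60\ta1, a2\n"
    "t3\t20\ta2\n"
    "t4\t10\ta2, a3\n"
    "t5\t30\ta1, a2\n"
), sep="\t")


def summarize_artist_performance(name):
    # Assumes the name identifies a single artist; the first match is used.
    artist_id = artists.loc[artists["name"] == name, "id"].iloc[0]

    # Keep only the tracks whose ID list contains this artist.
    id_lists = tracks["id_artists"].fillna("").str.split(", ")
    with_ids = tracks.assign(artist_ids=id_lists)
    mine = with_ids[with_ids["artist_ids"].apply(lambda ids: artist_id in ids)]

    # Solo: exactly one artist ID on the track. Collaborative: more than one.
    is_solo = mine["artist_ids"].apply(len) == 1
    solo, collab = mine[is_solo], mine[~is_solo]

    # Distinct collaborators: every other ID found on the collaborative tracks.
    collaborators = set(collab["artist_ids"].explode()) - {artist_id}

    summary = {
        "tracks": len(mine),
        "solo tracks": len(solo),
        "collaborative tracks": len(collab),
        "avg popularity (all)": mine["popularity"].mean(),
        "avg popularity (solo)": solo["popularity"].mean(),
        "avg popularity (collaborative)": collab["popularity"].mean(),
        "collaborators": len(collaborators),
    }
    for label, value in summary.items():
        print(f"{label}: {value}")
    return summary


result = summarize_artist_performance("Ben")
```

Using a set for the collaborators means an artist who appears on several shared tracks is counted once.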
