Question 2
Objectives:
- Understand the dataset with a data scientist mindset
- Design computation logic and routines in Python
- Conduct visualization in an appropriate way
- Assess the design and use of database ORM / SQLite methods to perform extract, load, transformation and calculation operations
(a) Store social_graph.csv in a SQLite database. (4 marks)
(b) Use SQLite3 with SQL statements to find the number of unique users. (8 marks)
(c) For each user, use SQLite3 with SQL statements to compute the number of users whom she/he directly knows. (8 marks)
(d) Use SQLite3 with SQL statements to produce the result required in Q1(d). (8 marks)
(e) Load data into a new SQLite table called users, which contains 3 columns: (5 marks)
   1. user_id: the user ID indicated in social_graph.csv
   2. popular_score: the output of Question 1.7
   3. friend_count: the number of direct friends (edges) of the user indicated by user_id, as computed in Q2(c)
(f) From the users table, use SQLite3 with SQL statements to find: (8 marks)
   1. The most popular user
   2. The least popular user
   3. The user who has the greatest number of friends
   4. The user who has the fewest friends
(g) Is there any correlation between popular_score and friend_count? Visualize the correlation on a diagram and justify your findings. (5 marks)
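A minimal sketch of parts (a) to (c) using only Python's standard sqlite3 and csv modules. It assumes social_graph.csv is comma-separated with a `from,to` header and that each row is an undirected friendship, so a user's direct friends can appear in either column; if the edges are meant to be directed, drop the second half of the UNION ALL in friend_counts. The table name `edges`, its columns `src`/`dst` (renamed because FROM and TO are SQL keywords) and the helper names are illustrative choices, not part of the assignment.

```python
import csv
import io
import sqlite3


def store_edges(conn, csv_file):
    """(a) Copy the from/to pairs of social_graph.csv into an `edges` table."""
    rows = [(int(r["from"]), int(r["to"])) for r in csv.DictReader(csv_file)]
    # Recreate the table so re-running the script does not duplicate rows.
    # "from" and "to" are SQL keywords, so the columns are named src/dst instead.
    conn.execute("DROP TABLE IF EXISTS edges")
    conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
    conn.executemany("INSERT INTO edges (src, dst) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def count_unique_users(conn):
    """(b) A user is any id that appears in either column; UNION removes duplicates."""
    sql = """
        SELECT COUNT(*) FROM (
            SELECT src AS user_id FROM edges
            UNION
            SELECT dst FROM edges
        )
    """
    return conn.execute(sql).fetchone()[0]


def friend_counts(conn):
    """(c) (user_id, number of users directly known), treating each edge as undirected."""
    sql = """
        SELECT user_id, COUNT(DISTINCT friend_id) AS friend_count
        FROM (
            SELECT src AS user_id, dst AS friend_id FROM edges
            UNION ALL
            SELECT dst, src FROM edges
        )
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(sql).fetchall()


# Usage on a handful of rows copied from the CSV (in-memory database).
conn = sqlite3.connect(":memory:")
sample = io.StringIO("from,to\n0,1\n0,2\n0,3\n1,2\n1,3\n2,3\n4,6\n")
store_edges(conn, sample)
print(count_unique_users(conn))  # → 6
print(friend_counts(conn))  # → [(0, 3), (1, 3), (2, 3), (3, 3), (4, 1), (6, 1)]
```

To run it on the full file, pass `open("social_graph.csv", newline="")` instead of the sample and use a file-backed connection (for example a hypothetical `sqlite3.connect("social_graph.db")`) so the table persists for the later parts. With the 78 edges listed in the CSV, part (b) should report 34 unique users (ids 0 to 33).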
The CSV is as follows:
from,to
0,1
0,2
0,3
0,4
0,5
0,6
0,7
0,8
0,10
0,11
0,12
0,13
0,17
0,19
0,21
0,31
1,2
1,3
1,7
1,13
1,17
1,19
1,21
1,30
2,3
2,7
2,8
2,9
2,13
2,27
2,28
2,32
3,7
3,12
3,13
4,6
4,10
5,6
5,10
5,16
6,16
8,30
8,32
8,33
9,33
13,33
14,32
14,33
15,32
15,33
18,32
18,33
19,33
20,32
20,33
22,32
22,33
23,25
23,27
23,29
23,32
23,33
24,25
24,27
24,31
25,31
26,29
26,33
27,33
28,31
28,33
29,32
29,33
30,32
30,33
31,32
31,33
32,33
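Parts (e) to (g) can be sketched in the same style, as an illustration rather than a reference answer. The sketch assumes the edges are already stored in an `edges(src, dst)` table (part (a)) and that the Question 1.7 scores are available as a Python dict mapping user_id to score; `popular_scores`, `users_at`, `extremes`, `pearson`, `plot_correlation` and the output file name are hypothetical. Ties are kept, because several users may share the smallest friend count. Pearson's r is one reasonable correlation measure here; Spearman's rank correlation would be a natural alternative if the relationship looks monotonic but not linear. Only the plotting helper needs matplotlib.

```python
import math
import sqlite3


def create_users_table(conn, popular_scores):
    """(e) Build users(user_id, popular_score, friend_count) from an existing edges(src, dst) table.

    popular_scores is a hypothetical {user_id: score} dict holding the Question 1.7 output.
    """
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute(
        "CREATE TABLE users ("
        "user_id INTEGER PRIMARY KEY, popular_score REAL, friend_count INTEGER)"
    )
    # friend_count uses the same undirected count as part (c).
    conn.execute(
        """
        INSERT INTO users (user_id, friend_count)
        SELECT user_id, COUNT(DISTINCT friend_id)
        FROM (SELECT src AS user_id, dst AS friend_id FROM edges
              UNION ALL
              SELECT dst, src FROM edges)
        GROUP BY user_id
        """
    )
    # Attach the externally computed scores; users without a score keep NULL.
    conn.executemany(
        "UPDATE users SET popular_score = ? WHERE user_id = ?",
        [(score, user_id) for user_id, score in popular_scores.items()],
    )
    conn.commit()


def users_at(conn, column, aggregate):
    """Return every user_id whose column equals the MAX/MIN of that column (keeps ties)."""
    if column not in ("popular_score", "friend_count") or aggregate not in ("MAX", "MIN"):
        raise ValueError("unsupported column or aggregate")  # whitelist before f-string SQL
    sql = (
        f"SELECT user_id FROM users "
        f"WHERE {column} = (SELECT {aggregate}({column}) FROM users) ORDER BY user_id"
    )
    return [row[0] for row in conn.execute(sql)]


def extremes(conn):
    """(f) The four lookups from part (f); MAX/MIN skip NULL scores."""
    return {
        "most_popular": users_at(conn, "popular_score", "MAX"),
        "least_popular": users_at(conn, "popular_score", "MIN"),
        "most_friends": users_at(conn, "friend_count", "MAX"),
        "fewest_friends": users_at(conn, "friend_count", "MIN"),
    }


def pearson(conn):
    """(g) Pearson correlation of popular_score and friend_count, with the sums done in SQL."""
    n, sx, sy, sxx, syy, sxy = conn.execute(
        """
        SELECT COUNT(*), SUM(popular_score), SUM(friend_count),
               SUM(popular_score * popular_score), SUM(friend_count * friend_count),
               SUM(popular_score * friend_count)
        FROM users WHERE popular_score IS NOT NULL
        """
    ).fetchone()
    if n < 2:
        raise ValueError("need at least two users with a popular_score")
    # n-scaled covariance and variances; the n factors cancel in the ratio.
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)


def plot_correlation(conn, path="popularity_vs_friends.png"):
    """(g) Scatter plot of friend_count against popular_score, saved as a PNG file."""
    import matplotlib.pyplot as plt  # imported here so the SQL helpers run without matplotlib

    rows = conn.execute(
        "SELECT friend_count, popular_score FROM users WHERE popular_score IS NOT NULL"
    ).fetchall()
    xs, ys = zip(*rows)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("friend_count")
    ax.set_ylabel("popular_score")
    ax.set_title(f"Pearson r = {pearson(conn):.2f}")
    fig.savefig(path)
    plt.close(fig)
```

A typical sequence is `create_users_table(conn, scores)`, then `extremes(conn)` for part (f), and `pearson(conn)` with `plot_correlation(conn)` for part (g). When justifying the finding, an r close to +1 would indicate that users with more direct friends tend to have a higher popular_score, a value near 0 would indicate no linear relationship, and the scatter plot helps show whether a few highly connected users drive the result.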
