Question: This task should be done on a dataset from the Stack Exchange database dump; any dataset will do.

Create a single Jupyter/IPython notebook (see the Artefacts section below for all the requirements), where you perform what follows.

1. Convert all the data tables (Badges, Comments, PostHistory, PostLinks, Posts, Tags, Users, Votes) from XML to CSV, using custom code that you write yourself. Ideally, you should write a Python function that takes a single input file name (.xml) and output file name (.csv) and performs the conversion of a single dataset.
2. Load the CSV files as pandas data frames.
3. Create at least five nontrivial data visualisations and/or tables, at least three of which are based on the extraction of information from text (e.g., tags, keywords, locations, etc.).
4. Draw insightful and interesting conclusions. Do not forget to reflect on the potential data privacy and ethics issues that arise during the data analysis process.

The PDF version of the report must be at least 10 pages long (not including the data conversion/import part). Make it aesthetic and interesting to read.
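As a starting point for steps 1 and 3, here is a minimal sketch using only the Python standard library. It is not the expert answer from this page. It assumes each dump file is a single root element containing `<row ... />` children whose attributes are the column values, and that the `Tags` attribute of Posts looks like `<python><pandas>` or `|python|pandas|`. The function names `xml_to_csv` and `count_tags` are made up for illustration.

```python
import csv
import re
import xml.etree.ElementTree as ET
from collections import Counter


def xml_to_csv(xml_path, csv_path):
    """Convert one dump table (.xml) to a CSV file; return the number of rows written.

    Assumption: the file is a root element whose children are <row .../>
    elements, with one attribute per column.
    """
    # Pass 1: collect every attribute name, because rows may omit optional
    # attributes. A dict keeps the columns in order of first appearance.
    columns = {}
    for _, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "row":
            for name in elem.attrib:
                columns.setdefault(name, None)
        elem.clear()  # drop the element's attributes to limit memory use

    # Pass 2: stream the rows into the CSV; missing attributes become "".
    n_rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(columns), restval="")
        writer.writeheader()
        for _, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "row":
                writer.writerow(elem.attrib)
                n_rows += 1
            elem.clear()
    return n_rows


def count_tags(posts_csv_path):
    """Count tag occurrences in the Tags column of a converted Posts table.

    Assumption: tags are delimited by '<', '>' or '|' characters.
    """
    counts = Counter()
    with open(posts_csv_path, newline="", encoding="utf-8") as f:
        for record in csv.DictReader(f):
            counts.update(re.findall(r"[^<>|]+", record.get("Tags") or ""))
    return counts
```

With this in place, `xml_to_csv("Posts.xml", "Posts.csv")` would convert a single table, and the same call can be repeated for each of the eight tables. Step 2 is then `pandas.read_csv("Posts.csv")`, and `count_tags("Posts.csv").most_common(20)` gives one possible basis for a text-derived table or bar chart.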

