
Question

First, research/gather the data:

1. Choose one Stack Exchange site dealing with topics that you find interesting; see https://stackexchange.com/sites?view=list#traffic for a list. The site cannot be too small, but also avoid selecting any of the largest ones (especially StackOverflow, Mathematics) unless you really want to challenge yourself. As a rule of thumb, let's say that the site must have at least 10,000 questions and 10,000 answers.
2. Download the site's most recent data dump from https://archive.org/details/stackexchange.
3. Read the description of all the data tables published at https://meta.stackexchange.com/questions/2677.

Then, create a single Quarto .qmd file that you will be rendering to a PDF report (how to do that, you will have to learn yourself; this is part of this HD-level task), where you perform what follows:

1. Convert all the data tables (Badges, Comments, PostHistory, PostLinks, Posts, Tags, Users, Votes) from XML to CSV, using custom code that you write yourself. Ideally, you should write a Python function that takes a single input file name (.xml) and output file name (.csv) and performs the conversion of a single dataset.
2. Load the CSV files as pandas data frames.
3. Create at least five non-trivial data visualisations and/or tables, at least three of which are based on the extraction of information from text (e.g., tags, keywords, locations, etc.). You must demonstrate that you have learned how to write your own regular expressions (regexes).
4. Draw insightful and interesting conclusions. Do not forget to reflect on the potential data privacy and ethics issues that arise during the data analysis process.

This HD-level task is purposely under-defined: you will not be told precisely what to do. Your aim is to generate some interesting insights into data featuring lots of textual information. In the course of the report preparation, you should apply a wide range of data frame wrangling and text processing techniques. In particular, you must demonstrate that you mastered regular expressions. Do not use pie charts (as we discussed during the lecture). Go beyond the basic plots that we have covered in this course. Draw at least one map (e.g., of the world) and a word cloud.

This document was originally developed by Dr Marek Gagolewski. It was subsequently revised by Dr Yang Li (Kelvin) during the work at the School of Information Technology, Deakin University, for the unit SIT220/731 Data Wrangling, Trimester 1, 2024.
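The conversion step in the brief (one function, one input .xml name, one output .csv name) could be sketched roughly as follows. This is only an illustration, not a reference solution. It assumes each dump file consists of self-closing `<row ... />` elements whose attributes are the table columns, and that the `Tags` field of Posts is stored as `<tag1><tag2>`. Both assumptions should be checked against the schema description and the actual dump, since newer dumps may use a different tag delimiter. The names `xml_to_csv` and `extract_tags` are hypothetical.

```python
import csv
import re
import xml.etree.ElementTree as ET


def xml_to_csv(xml_path, csv_path):
    """Convert one data-dump table from XML to CSV; return the row count.

    Assumption: the data live in the attributes of <row .../> elements.
    """
    # Pass 1: collect the union of attribute names, in first-seen order,
    # because a row may omit attributes that are empty for that record.
    columns, seen = [], set()
    for _, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "row":
            for name in elem.attrib:
                if name not in seen:
                    seen.add(name)
                    columns.append(name)
        elem.clear()  # drop attributes already processed to limit memory use

    # Pass 2: stream the rows into the CSV file, leaving missing fields blank.
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        for _, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "row":
                writer.writerow(elem.attrib)
                count += 1
            elem.clear()
    return count


# Assumed tag format: "<python><regex>"; adjust the pattern if the dump differs.
TAG_RE = re.compile(r"<([^<>]+)>")


def extract_tags(tags_field):
    """Return the list of tag names found in one Tags value."""
    return TAG_RE.findall(tags_field or "")
```

The function would be called once per table, for example `xml_to_csv("Posts.xml", "Posts.csv")`, after which each CSV file can be loaded with `pandas.read_csv`. Reading the file twice trades some speed for a fixed header row and low memory use, which matters for the larger tables such as PostHistory.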
