Question:
- ● Stage 1: Install the following packages in the virtual environment (venv/)
- ■ pip install beautifulsoup4
- ■ pip install requests
- ■ pip install pandas
- ■ pip install numpy
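After running the installs, a quick check from inside the activated virtual environment can confirm that the packages are importable. This is a minimal sketch, not part of the assignment: the helper name `missing_packages` is hypothetical. Note that `beautifulsoup4` is imported as `bs4`, and that `lxml` is included in the check because Stage 2 asks for the lxml parser even though it does not appear in the install list above.

```python
from importlib.util import find_spec

# Import names, which can differ from the pip package names
# (beautifulsoup4 is imported as bs4).
REQUIRED = ["bs4", "requests", "pandas", "numpy", "lxml"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If `lxml` is reported as missing, it can be installed the same way as the other packages (`pip install lxml`).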
- ● Stage 2: Crawl and Scrape
- ○ Schulich wants to have an integrated dataset of all Electrical and Engineering department professors in one place. As a data engineer, you are asked to gather information about engineering professors by crawling the faculty website of the University of Calgary, then scrape their information, load it into a pandas DataFrame, and eventually save it as a CSV file.
- ○ In the first step, get the HTML text of the website using the requests library, then use the BeautifulSoup4 library with the lxml parser to parse the HTML and extract the needed information.
- ○ Then, get the HTML text of the webpage and scrape the information of all its newest faculty members and professors into a DataFrame with the following columns:
- firstname, lastname, title, homepage
- ○ Tip: Use `Inspect Element` in Chrome to see how HTML tags map to objects on a webpage.
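The Stage 2 steps could be sketched roughly as follows. The CSS selectors (`div.profile-card`, `a.profile-name`, `span.profile-title`) are hypothetical placeholders, since the question does not show the real page markup; replace them with whatever `Inspect Element` reveals. The function names are also illustrative, and the name split (first word versus the rest) is a naive assumption.

```python
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup

COLUMNS = ["firstname", "lastname", "title", "homepage"]


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_professors(html, base_url="", parser="lxml"):
    """Build a DataFrame of professors from a faculty listing page."""
    soup = BeautifulSoup(html, parser)
    rows = []
    # Hypothetical selectors: adjust to the real page structure.
    for card in soup.select("div.profile-card"):
        link = card.select_one("a.profile-name")
        title = card.select_one("span.profile-title")
        if link is None:
            continue  # skip cards without a name link
        # Naive split: first word is the first name, the rest is the last name.
        parts = link.get_text(strip=True).split(maxsplit=1)
        rows.append({
            "firstname": parts[0] if parts else "",
            "lastname": parts[1] if len(parts) > 1 else "",
            "title": title.get_text(strip=True) if title else "",
            # Resolve relative links against the listing page URL.
            "homepage": urljoin(base_url, link.get("href", "")),
        })
    return pd.DataFrame(rows, columns=COLUMNS)
```

With a listing URL stored in a variable such as `faculty_url`, the two functions would be combined as `parse_professors(fetch_html(faculty_url), base_url=faculty_url)`. Keeping the download and the parsing separate makes it possible to test the parsing on a saved HTML file without hitting the website.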
- ● Stage 3: Explore the Data
- ○ In this part, iterate over the professors' DataFrame, request the HTML of each professor's homepage, find the phone number and office (building and room), and add them to your previous DataFrame as new columns. Finally, save the DataFrame as a CSV file in the data directory (uofc_prof.csv).
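One possible shape for Stage 3 is sketched below. Both regular expressions are assumptions: the phone pattern targets ten-digit North American numbers, and the office pattern assumes the homepage shows a label like `Office: ICT 402`, which the question does not confirm. The function names are hypothetical, and `fetch` is any callable that takes a URL and returns HTML text.

```python
import re
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

# Assumed formats; adjust after inspecting a real homepage.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
OFFICE_RE = re.compile(r"Office:\s*([A-Za-z]+\s*\d+[A-Za-z]?)")


def extract_contact(html, parser="lxml"):
    """Return the first phone number and office found in a homepage, or None."""
    text = BeautifulSoup(html, parser).get_text(" ", strip=True)
    phone = PHONE_RE.search(text)
    office = OFFICE_RE.search(text)
    return {
        "phone": phone.group(0) if phone else None,
        "office": office.group(1) if office else None,
    }


def add_contact_columns(df, fetch, parser="lxml"):
    """Visit each homepage and return a copy of df with phone and office columns."""
    contacts = [extract_contact(fetch(url), parser) for url in df["homepage"]]
    out = df.copy()
    out["phone"] = [c["phone"] for c in contacts]
    out["office"] = [c["office"] for c in contacts]
    return out


def save_csv(df, path="data/uofc_prof.csv"):
    """Write the DataFrame to CSV, creating the data directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
```

Passing the download function in as `fetch` keeps the scraping logic testable offline. In a real run it would be a function that performs the HTTP request, ideally with error handling so that one unreachable homepage does not stop the whole loop.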
- ● Stage 4: Generating Report
- ○ In this part, you need to generate the following reports:
- ■ Number of Assistant Professors
- ■ Number of Professors
- ■ Number of Senior Instructors
- ■ Number of Instructors
- ■ Number of Associate Professors
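The counts listed above can be computed from the `title` column, as in this minimal sketch. It assumes the scraped titles match the five labels exactly after trimming whitespace; the function name `title_report` is hypothetical. Exact matching matters here, because a substring search for "Professor" would also count Assistant and Associate Professors.

```python
import pandas as pd

REPORT_TITLES = [
    "Assistant Professor",
    "Professor",
    "Senior Instructor",
    "Instructor",
    "Associate Professor",
]


def title_report(df):
    """Count how many rows carry each of the reported titles (exact match)."""
    counts = df["title"].str.strip().value_counts()
    # Titles that never occur are reported as 0 rather than omitted.
    return {title: int(counts.get(title, 0)) for title in REPORT_TITLES}


if __name__ == "__main__":
    sample = pd.DataFrame({"title": ["Professor", "Assistant Professor", "Professor"]})
    for title, count in title_report(sample).items():
        print(f"Number of {title}: {count}")
```

In the actual assignment the DataFrame would be loaded from the saved file, for example with `pd.read_csv("data/uofc_prof.csv")`.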