
● Stage 1: Install Required Packages
○ Install the following packages in the virtual environment (venv/):
■ pip install beautifulsoup4
■ pip install requests
■ pip install pandas
■ pip install numpy
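Once the packages are installed, a quick check from inside the activated virtual environment can confirm they are importable. The following is a minimal sketch using only the standard library; the `REQUIRED` mapping and the `check_packages` helper are illustrative names, not part of the assignment. Note that `beautifulsoup4` is imported under the module name `bs4`, and that the lxml parser named in Stage 2 is a separate package (`pip install lxml`).

```python
import importlib.util

# Map each pip distribution name to the module name it is imported under.
# beautifulsoup4 is imported as "bs4"; the others share their pip name.
REQUIRED = {
    "beautifulsoup4": "bs4",
    "requests": "requests",
    "pandas": "pandas",
    "numpy": "numpy",
}


def check_packages(required=REQUIRED):
    """Return {pip_name: bool} telling whether each module can be found."""
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in required.items()
    }


status = check_packages()
for pip_name, installed in status.items():
    hint = "ok" if installed else f"missing, run: pip install {pip_name}"
    print(f"{pip_name}: {hint}")
```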
● Stage 2: Crawl and Scrape
○ Schulich wants an integrated dataset of all Electrical and Engineering department professors in one place. As a data engineer, you are asked to gather information about engineering professors by crawling the faculty website of the University of Calgary, then scrape their information, load it into a pandas DataFrame, and finally save it as a CSV file.
○ First, get the HTML text of the website using the requests library, then use the beautifulsoup4 library with the lxml parser to parse the HTML and extract the needed information.
○ Then, from the HTML text of the webpage, scrape the information of all its newest faculty members and professors and put it in a DataFrame with the columns shown below:
firstname | lastname | title | homepage
○ Tip: Use `Inspect Element` in Chrome to see how HTML tags map to the objects on a webpage.
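The Stage 2 steps (fetch, parse, scrape into a DataFrame) can be sketched as follows. This is a hedged illustration, not the site's real structure: the inline `SAMPLE_HTML`, the `example.edu` URLs, and the CSS selectors `div.profile`, `a.name`, and `span.title` are all hypothetical, and the real selectors have to be found with Inspect Element. The helper names `fetch_html`, `parse_html`, and `scrape_professors` are likewise illustrative. The parser falls back to Python's built-in `html.parser` if lxml is not installed.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical markup standing in for the faculty page; the real page differs.
SAMPLE_HTML = """
<div class="profile">
  <a class="name" href="https://example.edu/profiles/jane-doe">Jane Doe</a>
  <span class="title">Associate Professor</span>
</div>
<div class="profile">
  <a class="name" href="https://example.edu/profiles/john-smith">John Smith</a>
  <span class="title">Professor</span>
</div>
"""


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text; raises on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_html(html):
    """Parse with lxml as the assignment asks, else the built-in parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python.
        return BeautifulSoup(html, "html.parser")


def scrape_professors(html):
    """Build a DataFrame with firstname, lastname, title, homepage."""
    soup = parse_html(html)
    rows = []
    for card in soup.select("div.profile"):  # hypothetical selector
        link = card.select_one("a.name")
        title = card.select_one("span.title")
        if link is None or title is None:
            continue  # skip cards that lack the expected fields
        # Split at the first space: everything after it is the last name.
        first, _, last = link.get_text(strip=True).partition(" ")
        rows.append({
            "firstname": first,
            "lastname": last,
            "title": title.get_text(strip=True),
            "homepage": link.get("href"),
        })
    return pd.DataFrame(
        rows, columns=["firstname", "lastname", "title", "homepage"]
    )


# Offline demonstration; for the real site use scrape_professors(fetch_html(url)).
df = scrape_professors(SAMPLE_HTML)
print(df)
```

Splitting the name at the first space is a simplification; names with middle names or multi-word surnames may need different handling.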
● Stage 3: Explore the Data
○ In this part, iterate over the professors' DataFrame, request each professor's homepage HTML, find the phone number and office (building and room), and add them to the DataFrame as new columns. Finally, save the DataFrame as a CSV file in the data directory (uofc_prof.csv).
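A minimal sketch of Stage 3 is shown below, run offline against a made-up homepage so that no network access is needed. The `.office` class, the phone-number pattern (a North American 10-digit format), the `FAKE_PAGES` dictionary, and the helper names `extract_contact` and `add_contact_columns` are assumptions for illustration; the real homepages must be inspected to find where these fields actually live.

```python
import re
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

# Assumed pattern: 10-digit numbers such as (403) 555-0100 or 403-555-0100.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def extract_contact(html):
    """Return (phone, office) from a homepage, with None where missing."""
    soup = BeautifulSoup(html, "html.parser")
    match = PHONE_RE.search(soup.get_text(" ", strip=True))
    phone = match.group(0) if match else None
    office_tag = soup.select_one(".office")  # hypothetical class name
    office = office_tag.get_text(strip=True) if office_tag else None
    return phone, office


def add_contact_columns(df, get_html):
    """Add phone and office columns; get_html maps a URL to HTML text."""
    contacts = [extract_contact(get_html(url)) for url in df["homepage"]]
    out = df.copy()
    out["phone"] = [phone for phone, _ in contacts]
    out["office"] = [office for _, office in contacts]
    return out


# Made-up homepage content keyed by URL, standing in for real requests.
FAKE_PAGES = {
    "https://example.edu/profiles/jane-doe":
        '<p>Phone: +1 (403) 555-0100</p><p class="office">ICT 402</p>',
}

df = pd.DataFrame([{
    "firstname": "Jane",
    "lastname": "Doe",
    "title": "Associate Professor",
    "homepage": "https://example.edu/profiles/jane-doe",
}])
df = add_contact_columns(df, FAKE_PAGES.get)

# Save to data/uofc_prof.csv, creating the data directory if needed.
Path("data").mkdir(exist_ok=True)
df.to_csv(Path("data") / "uofc_prof.csv", index=False)
print(df)
```

Against the live site, `get_html` could instead be a function that calls `requests.get(url, timeout=10)` and returns `response.text`.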
● Stage 4: Generating Report
○ In this part, generate the following reports:
■ Number of Assistant Professors
■ Number of Professors
■ Number of Senior Instructors
■ Number of Instructors
■ Number of Associate Professors
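The counts above can be sketched with pandas `value_counts`, assuming the `title` column holds exactly the strings listed (the sample rows and the `title_report` helper are made up for illustration). Exact matching matters here: counting by substring would wrongly fold "Assistant Professor" and "Associate Professor" into "Professor".

```python
import pandas as pd

# Titles to report, in the order the assignment lists them.
TITLES = [
    "Assistant Professor",
    "Professor",
    "Senior Instructor",
    "Instructor",
    "Associate Professor",
]


def title_report(df):
    """Count rows per exact title; titles with no rows are reported as 0."""
    counts = df["title"].value_counts()
    return counts.reindex(TITLES, fill_value=0)


# Made-up sample data standing in for the scraped DataFrame.
df = pd.DataFrame({
    "title": ["Professor", "Associate Professor", "Professor", "Instructor"],
})

report = title_report(df)
for title, count in report.items():
    print(f"Number of {title}: {count}")
```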

Step by Step Solution


Let's break down the tasks into stages. Stage 1: Install Required Packages. Open your command prompt and navigate to your project directory. Then activate your virtual environment (venv) if it's not already a...
