Question: Write in Python. We will compute the PageRank of the articles of the Hawaiian Wikipedia, which is available at haw.wikipedia.org. Additional information on the Hawaiian wiki can be found here.
Hints: If you don't speak Hawaiian, you might want to learn the wiki logic from the English Wikipedia and translate your findings. Also, caching is recommended.
(a) Use the special AllPages page and understand its logic to retrieve the URLs of all articles in the Hawaiian Wikipedia. Make sure to skip redirects.
How many articles did you find? (I found a bit more than 2541.)
# a)
import requests
import requests_cache
import lxml.html as lx
import re
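One possible way to approach part (a) is sketched below. It is not the hidden expert answer, only a minimal sketch under stated assumptions: that redirect entries on Special:AllPages carry the CSS class `allpagesredirect`, that article entries sit inside `ul.mw-allpages-chunk`, and that the previous/next links sit inside `div.mw-allpages-nav`. Check these against the live page source before relying on them. To stay runnable without extra packages, the sketch swaps in the standard library (`html.parser`, `urllib`) for `requests` and `lxml`; the names `AllPagesParser`, `fetch_html`, and `all_articles` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

START = "https://haw.wikipedia.org/wiki/Special:AllPages"


class AllPagesParser(HTMLParser):
    """Collect article links and navigation links from one listing page."""

    def __init__(self):
        super().__init__()
        self.articles = []    # hrefs of non-redirect entries
        self.nav_links = []   # hrefs of previous/next listing pages
        self._in_chunk = False
        self._in_nav = False
        self._in_redirect = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "ul" and "mw-allpages-chunk" in classes:   # assumed class
            self._in_chunk = True
        elif tag == "div" and "mw-allpages-nav" in classes:  # assumed class
            self._in_nav = True
        elif tag == "li":
            # Assumed: redirects are marked with class "allpagesredirect".
            self._in_redirect = "allpagesredirect" in classes
        elif tag == "a" and attr.get("href"):
            if self._in_nav:
                self.nav_links.append(attr["href"])
            elif self._in_chunk and not self._in_redirect:
                self.articles.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "ul":
            self._in_chunk = False
        elif tag == "div":
            self._in_nav = False
        elif tag == "li":
            self._in_redirect = False


def fetch_html(url):
    """Download one page; wrap or replace this with a cached fetcher."""
    req = Request(url, headers={"User-Agent": "pagerank-exercise/0.1"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def all_articles(start=START, fetch=fetch_html):
    """Visit every listing page reachable through the navigation links."""
    seen, queue, articles = {start}, [start], {}
    while queue:
        url = queue.pop()
        parser = AllPagesParser()
        parser.feed(fetch(url))
        for href in parser.articles:
            articles[urljoin(url, href)] = True   # dict for fast lookups
        for href in parser.nav_links:
            nxt = urljoin(url, href)
            if nxt not in seen:                   # avoids prev/next loops
                seen.add(nxt)
                queue.append(nxt)
    return articles
```

Calling `len(all_articles())` would then give the article count asked for in (a). Following every unseen navigation link, rather than looking for a "next page" label, avoids depending on the Hawaiian wording of that label.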
(b, i) Write a function that scans an article given by its URL and retrieves all links to other articles in the Hawaiian Wikipedia. Avoid links to special pages, images, or other websites. For links that point to a specific section, count only the article itself. Use regular expressions to manage these cases.
(ii) Make sure to match redirects to their correct destination article. To this end, find out how Wikipedia treats redirects and retrieve the true article. (Help: Try searching for 'uc davis' on en.wikipedia.org.) I used the collection of article URLs obtained in (a), which I stored in a dict object to allow for fast lookups. Then, for each newly found link, I checked whether that link appeared in the dict. If not, it might be a redirect and needs special attention.
(iii) Request all articles and obtain all links to other articles.
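The steps in (b) could be sketched as follows. This is an illustration under assumptions, not the original solution: it treats any title containing a colon as a namespaced page (special pages, files, categories), which would also drop the occasional real article with a colon in its title, and it assumes that a redirect page is served with a `<link rel="canonical" href="...">` element naming the true article. The helper names (`extract_links`, `resolve`, `outlinks`) and the `fetch` and `redirects` parameters are hypothetical; `fetch` is any function that maps a URL to its HTML, ideally a cached one.

```python
import re
from urllib.parse import unquote

BASE = "https://haw.wikipedia.org"

# Internal links only: href="/wiki/Title", with an optional #section that is
# matched but not captured. External and query-string links do not match.
WIKI_LINK = re.compile(r'href="/wiki/([^"#?]+)(?:#[^"]*)?"')
# Assumed: the served page names its true URL in a canonical link element.
CANONICAL = re.compile(r'<link rel="canonical" href="([^"]+)"')


def extract_links(html):
    """Return the set of candidate article titles linked from one page."""
    titles = set()
    for match in WIKI_LINK.finditer(html):
        title = match.group(1)
        if ":" in unquote(title):   # heuristic: skip Special:, File:, ...
            continue
        titles.add(title)
    return titles


def resolve(title, articles, fetch, redirects):
    """Map a linked title to a known article URL, following redirects."""
    url = BASE + "/wiki/" + title
    if url in articles:             # fast dict lookup from part (a)
        return url
    if title not in redirects:      # remember each redirect once
        match = CANONICAL.search(fetch(url))
        target = match.group(1) if match else None
        redirects[title] = target if target in articles else None
    return redirects[title]


def outlinks(url, articles, fetch, redirects):
    """Return the set of article URLs that the article at `url` links to."""
    targets = set()
    for title in extract_links(fetch(url)):
        target = resolve(title, articles, fetch, redirects)
        if target:
            targets.add(target)
    return targets
```

For (iii), one would call `outlinks(url, articles, fetch, redirects)` for every `url` in the dict from (a), sharing a single `redirects` dict across calls so that each redirect is fetched only once. Note that this sketch scans the whole page, so sidebar and footer links are included unless the HTML is first narrowed to the article body.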