Question: _ ______1__'_'__ Please read section 4.9 Application to M crises Chains in your text before starting this project. 3 Markov Chains and Google PageRank When

_ ______1__'_'__ Please read section_ ______1__'_'__ Please read section_ ______1__'_'__ Please read section_ ______1__'_'__ Please read section_ ______1__'_'__ Please read section_ ______1__'_'__ Please read section
_ ______1__'_'__ Please read section 4.9 Application to M crises Chains in your text before starting this project. 3 Markov Chains and Google PageRank When Google went online in the late 1990s, one thing that set it apart from other search engines was that its search results listings always seemed to deliver the \"good stu'\" up front. With other search engines, you often had to wade through screen after screen of links to irrelevant web pages that just happened to match the search text. Part of the magic behind Google is its PageRank algorithm, which quantitatively rates the importance of each page on the web, allowing Google to rank the pages and thereby present to the user the more important {and typically most relevant and helpful) page: rst. Understanding how Google ranks its search results is essential for anyone designing a webpage that they want people to access frequently, since getting listed rst in a Google search leads to many people looking at your page. With Linear Algebra, you can understand the rst and bestrknown search algorithm used by Google, now a trillion dollar company! The PageRank algorithm, developed at Stanford University in 1996 by Google founders Sergey Erin and Larry Page, ranks webpago: in Coogle search results. The basic idea is that a webpage's PageRank should depend on how many other pagu link to it, how many pages link to those pages, and so on. A page with many inbound links is likely important, and a page with many inbound links from pages which themselves have many inbound links (and so on) is even more likely to be important. Example 1: Consider ve webpages A, B1 C, D, and E that link to each other according to the graph below. For instance, the arrow from A to C indicates that page A links to page C , and the twoway aiTow between A and B indicates that A and B link to each other. Kl The PageRank algorithm assigns ranks ifA) *r{B),r Lr(D), and r(E) that are all between 0 and 1 (inclusive) and sun] to 1 Each page equally distribute: its own PageRank along its outbound links. For example, page B has two outbound links, so page B \"donates" i? to both page A and page D. Page D also gives half its PageRank to page A, so the PageRank of A satises: 1(3) MD) MA) = 2 2 Since page B has inbound links from page A {which has four total outbound links) and page C (which has two total outbound links]1 the PageRank of page 3 satises: my) 2 m4) + r10) 4 2 The full list of equations is: 11A) _ r12 r129) AB) = 4:14) + if) rm) 2 \"TA + \"23) an) 2 g __ as) E 1(0) \"(0) 4 2 2 The unique solution of this system of linear equations for which the sum of the PageRanks is 1 is: 11A) 0.211 1113) 0.105 5(0) 2 0.105 5(1)) 0.316 (13) 0.253 We always normalize our solution vector so that the sum of the PageRanks is 1. Interpreting these results, we see that page D would show up rst in a search since it has the highut rank (0.316), followed by page E, with a rank of 0.263, and so on. It is interesting to note that even though page E has the most inbound links, it doun't have the highest PageRank. This is because page E only has one outbound link, to page D, so page D receives the entirety of page E 's PageRanlc, plus a small contribution from page A. Exercise 1: {6 pts) Use the technique of the example above to nd the PageRanks of pages A, B, and C in the graph below. A B \\/ Example 1 (con't): Returning to our example, we may write the system of linear equations we found in matrix form: 0 0.5 0 0.5 0 11A) 0.25 0 0.5 0 0 1(3) 0.25 0.5 {II t] 0 f": \"', where F: MC) 0.25 0 0 0 1 7113) 0.25 0 0.5 0.5 0 ms) Denote the 5 x 5 matrix above by P. The columns of P describe the outbound links from each webpage (the rst column of P, for example, indicates that A has outbound links to B, C, D, and E, and equally distributes its PageRank among them). Because of this, the sum of the entries in each column of P is 1. In other words, P is a stochastic matrix, and can be thought of as the transition metric of a Markov chain. Additionally, the vector 1" of PageRanks we are searching for is an eigenvector of P corresponding to the eigenvalue )1 = 1. Since the entriu of 1?" sum to 1, f" is actually a steadystate vector of P! This Markov connection is not accidental. Suppose we start on a random page (A through E) and begin randomly clicking links. If To in R is the probability vector for our initial location (i.e., the components of To are the probabilities that we begin at page A, or page B, etc.), then PTo is the probability vector for our location after 50 random clicks. As we'll see later, in "nice" cases the sequence of vectors P* To approaches the steady-state (PageRank) vector F as k -+ co, no matter what To is. In this way, we can interpret the PageRank vector as a vector of probabilities for our location after many random clicks. In Example 1, r(C) = 0.105 means that if we start on a random page and click a large number of random links, then there is about a 10.5% chance that our final destination is page C. We'll see later that this realization makes it possible to approximate the PageRank vector in cases where it is infeasible to calculate the PageRank vector directly, but we are still left with a question: can we always frame the problem of finding PageRanks in terms of finding the steady-state vector of a related stochastic matrix? The following example will help settle the question. Example 2: Consider the simple graph of links below: B The corresponding matrix is: 0 0 0 P = 1/2 0 0 1/2 The third column of P is the zero vector, so P is not the transition matrix of a Markov chain! If we access a random page (A, B, or C') and click links randomly, then we always end up stuck at C after at most two clicks. To remedy this, whenever we have a page that doesn't link to any other pages, we modify the link diagram so that the page links to all pages, even itself. In this example, we draw arrows from C to all three pages: BThe new matrix is: P: 1/2 c 1/3 1/2 1 1/3 o c 1/3] We call F the altered transition mam. In general, if there are n, webpages, then we form F from P by replacing any columns of zeroes by the vector g, %, . . . , %). An altered transition matrix is always a stochastic matrix (why?), and should be used to nd PageRanks if at least one webpage does not link to other pages. Exercise 2: (12 pts) Consider the link diagram below. AHB l l CHID a. {4 pts) Show that the method you used in Exercise 1 to nd PageRanks doesn't work here. [Note that D has no inbound links, so HID) = 0.] b. {2 pts) Find the altered transition matrix F. c. {6 pts) Find the PageRank vector by nding the unique steadystate vector of F. Exercise 3: {8 pts) Consider the very simple link diagram below. Ralph has a g chance of starting at page A and a % chance of starting at page B. Once per minute, he clicks on a random link, if possible. Aa-B a. {2 pts) What is the probability that Ralph will be on page B after four minutes? b. (6 pts) Now suppose that whenever Ralph accesses page B1 after one minute he will randomly (with equal probability) jump to either page A or stay on page E. Find the probability that Ralph will be on page B after four minutes. Finding a steadystate vector of an altered transition matrix works well when the number of web pages is small, but is completely unfeasible when attempting to rank millions of webpages. In such a case, the best we can do is to approximate the PageRank vector. The following denition and theorem from your text help us to do this. The new matrix is: 0 0 1/37 P = 1/2 0 1/3 1/2 1 1/3 We call P the altered transition matrix. In general, if there are n webpages, then we form P from P by replacing any columns of zeroes by the vector (2, 1,.. 1). An altered transition matrix is always a stochastic matrix (why?), and should be used to find PageRanks if at least one webpage does not link to other pages. Exercise 2: (12 pts) Consider the link diagram below. B a. (4 pts) Show that the method you used in Exercise 1 to find PageRanks doesn't work here. [Note that D has no inbound links, so r(D) = 0.] b. (2 pts) Find the altered transition matrix P. c. (6 pts) Find the PageRank vector by finding the unique steady-state vector of P. Exercise 3: (8 pts) Consider the very simple link diagram below. Ralph has a , chance of starting at page A and a ; chance of starting at page B. Once per minute, he clicks on a random link, if possible. A- - B a. (2 pts) What is the probability that Ralph will be on page B after four minutes? b. (6 pts) Now suppose that whenever Ralph accesses page B, after one minute he will randomly (with equal probability) jump to either page A or stay on page B. Find the probability that Ralph will be on page B after four minutes. Finding a steady-state vector of an altered transition matrix works well when the number of web- pages is small, but is completely unfeasible when attempting to rank millions of webpages. In such a case, the best we can do is to approximate the PageRank vector. The following definition and theorem from your text help us to do this

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Mathematics Questions!