MAT103

Franklin College

Erich Prisner

"PageRank" is Google's method to measure "importance" of web pages. The position where web pages are displayed in a Google search depends on the occurence of the search phrase in the document, but also on the PageRank, therefore achieving a high rank for their pages is of immense importance for web masters.

"PageRank" was introduced by Sergey Brin and Lawrence Page. It is a concept patented by Stanford University (since both were students at Stanford when deriving it) but leased to Google, the company both founded right after graduation. The term "Page" refers to Lawrence Page [].

So what makes a web page "important"? It is clear, that many other web pages would link towards such a page, therefore an obvious but too simple approach would be to just count the number of all incoming links and take this as the measure. Since people mostly only link to pages they consider to be important, by linking to it they perform a kind of vote [3].

This first approach is too simplistic for two reasons: A page with hundreds of outgoing links would have hundreds of votes, much more voting power than a page with only few outgoing links. When somebody assures you "I love you, but also I also love Alice and Alma and Beth and ...", this may carry less weight than if somebody loves only you. So it seems fair to relate these links to the total number of outgoing links. A link from a page with 100 outgoing links would count as 1/100, whereas a link from a page with only 10 outgoing links would have the higher count of 1/10.

This first modification made sure that the total number of all votes from any web page would add up to 1. But this democracy is maybe not what we want in this kind of voting. The second modification gives more "important" pages more voting power. It is like a shareholder meeting, where everybody has as many votes as shares. Note that our process now gets recursive---the importance of a page depends on the importance of the pages that link to that page, which themselves depend on the importance of other pages, ..., and so on without an end, since there may be link cycles.

So we arrive at the following formula. Assume the web pages
B_{1}, B_{2}, ... , B_{n} link towards the web pages A.
Let the variables A, B_{1}, B_{2}, ... , B_{n}
also denote the PageRank of these pages.
Assume the total number of outbound links of pages
B_{1}, B_{2}, ... , B_{n} are
d_{1}, d_{2}, ... , d_{n} respectively.
Then

The formula needs a last, technical modification (as otherwise the system of equations may not have a solution). Let m be a constant between 0 and 1, (usually m=0.85 is chosen [1,2,3]). Then the formula reads

Only m of the voting power of a web page is distributed.

Let us treat a simple example of three pages, A, B, and C. A is linking to both B and C, and both are linking back to A. The equations are

B = m*A/2 + (1-m)

C = m*A/2 + (1-m)

We solve this system (note that m is a constant) and get, by substituting first B and then C,

= m

= m

A = (1 + m - 2m

A = (2m+1)/(m+1) =1.459...

B = m*(2m+1)/(2m+2) + 1-m = (m*(2m+1)+(1-m)(2m+2))/(2m+2) =

= (2m

B =(m+2)/(2m+2) = 0.77...

C = (m+2)/(2m+2) = 0.77...

Note also that, for every possible m, A is larger than B and C,

Note that there are much more details. It may be that multiple links from one page to another are neglected. So-called "dangling pages", pages without a single link outside, are reduced to minimum PageRank [4]. Actually the exact formula (equations) used by Google nowadays is a well-kept secret.

Google also doesn't solve these 8 Billions equations using the substitution method, but rather uses a fast method approximating the values closer and closer [1,2].

Moreover, the numbers (usually 1 to 10) displayed in the Google toolbar are probably the logarithms of that numbers we created above (to an unknown base) [2,3].

[1] Chris Ridings, Mike Shishigin, PageRank Uncovered, 2002,

[2] Ian Rodgers, "Page Rank Explained---The Google Pagerank Algorithm and how
it works", __PageRank
Explained correctly with examples__, September 20, 2005 <http://www.iprcom.com/papers/pagerank/>

[3] Phil Craven, Google's PageRank explained and how to make the most of it, September 20, 2005, <http://www.webworkshop.net/pagerank.html>

[4] Sergey Brin, Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web
Search Engine, __The
Anatomy of a Search Engine__, September 19, 2005 <http://www-db.stanford.edu/~backrub/google.html>

Erich Prisner, September 2005