The Mind of God Google Math and More

“The Mind of God”: Google, Math, and More
Professor A. Shilepsky
WLLS 102, May 7, 2004
Sergey Brin and Larry Page

Google is the most widely used web search engine.
Definitions:
• Search engine: searches documents (webpages) for keywords and returns a list of those containing them.
• Spider (crawler): fetches web pages to be examined.
• Indexer: creates an index of documents based on the words they contain.
• Proprietary algorithm: determines which pages will be listed when a search (query) is made.
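The spider, indexer, and search steps can be illustrated with a toy inverted index. This is a minimal sketch, not Google's proprietary implementation: the function names `build_index` and `search` and the sample pages are hypothetical, and tokenization is simplified to lowercase whitespace splitting.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Indexer (toy version): map each word to the set of page URLs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Search (toy version): return pages containing every keyword, sorted by URL, not ranked."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical pages standing in for what a spider (crawler) would fetch.
pages = {
    "a.html": "Markov chains and PageRank",
    "b.html": "PageRank for search engines",
    "c.html": "Cookies at the seminar",
}
index = build_index(pages)
print(search(index, "pagerank"))         # ['a.html', 'b.html']
print(search(index, "markov pagerank"))  # ['a.html']
```

Choosing the order in which matching pages are listed is the job of the ranking algorithm; this sketch simply sorts the matches by URL.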

Confession: I have been a Google addict for years.
Professionally: invaluable.
Personally: rewarding and exciting, for example:
• A medical journal paper that listed my daughter as a co-author
• Information about my daughter’s [now ex] boyfriend
• A novel that was made into a movie that had Tillie Shilepsky as the main character
• A short biography and picture of Mrs. S’s great-grandfather as she and her father were talking about him

How did I get interested in knowing more about Google, rather than just using it?
While on sabbatical, I was happily working on another project when one of my searches led to the following response:

Forbidden
Your client does not have permission to get URL /search?hl=. . .
Dear Google User:
We apologize that you're not currently able to access Google's search service. Unfortunately, someone using the same Internet Service Provider (ISP) that you use is violating our conditions of use and is sending us numerous automated search queries. As a result of this abuse, we have been forced to shut off access to Google's services for a number of the people using your ISP.
Please note that we are not accusing you personally of having violated our Terms of Service; you are most likely an innocent victim of someone else's bad behavior here. If it were possible for us to identify and isolate the individual or individuals at fault, we would certainly do so and deal with them directly. This is regrettably not the case, and we need assistance from your ISP to solve the problem.
We have notified your ISP about this problem and we are awaiting their response. We encourage you to contact your ISP or system administrator as well.
. . .
Sincerely,
The Google Team

Seminar notice from the College of Charleston:
Markov Chains, Information Retrieval and the Updating Problem
Amy Langville, North Carolina State University
Friday, November 1, 2002, 219 Maybank
One recent popular application of Markov chains is information retrieval. The successful search engine Google uses a very, very large Markov chain to rank the list of relevant documents retrieved for a user query. As documents on the Web are changed, new documents are added, or old documents are deleted, the Markov chain matrix changes, and Google's PageRank measurement must be updated accordingly. Currently, Google only updates PageRank on a monthly basis. We have developed a numerical algorithm to efficiently update PageRank, enabling Google to update on a much more frequent basis with less work. In so doing, we have also solved the long-time problem of updating a general Markov chain in the presence of both element-updates and state-updates. Such an updating algorithm should have great impact among Markov chain practitioners.
P.S. Please join us for cookies at 3:00 p.m., followed by the talk at 3:10 p.m.

Facts about Google
• 1995-98: Larry Page and Sergey Brin, Stanford graduate students
• Original name was BackRub; changed to Google
• Google comes from googol = $10^{100}$ [1 followed by 100 zeros]
• $100,000 to Google Inc. from Andy Bechtolsheim (Sun Micro)
• 10,000 searches a day in 1998; over 200 million now
• Google indexes over 3 billion web pages
• Handles over 75% of internet searches; licensed by Yahoo and AOL
• Google has been sued for lowering a company’s position in its results list
• Some want to regulate Google as a quasi-public utility
• Google’s lawyers requested that “to google” not be added to a dictionary

Why is Google so successful?
The quality of a search engine is measured by how appropriate the results it returns for a query are. When asked what a perfect search engine would be like, Larry Page said: “It would be like the mind of God. It would know exactly what you want and give you back exactly what you need.”

What makes Google so effective?
The key innovation was PageRank. Every page that Google indexes is given a rank from 0 to 10 that is used in deciding which pages to return for a query. To understand PageRank, one must examine the structure of the web.

Link Structure of the Web
[Figure: Page A links to Page B and Page C; Page C links to Page A; Page B has no outgoing links.]

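Directed Graph and Resulting Matrix
Rows and columns are ordered A, B, C. The entry in row X, column Y is 1 (yes) if page X links to page Y and 0 (no) otherwise, giving a 3 by 3 matrix:
$$\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}$$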
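Building this matrix from a list of links is mechanical. The sketch below is illustrative only; the function name `link_matrix` and the dictionary-of-lists input format are assumptions, not something from the talk.

```python
def link_matrix(pages: list[str], links: dict[str, list[str]]) -> list[list[int]]:
    """Entry [i][j] is 1 if pages[i] links to pages[j] (yes = 1), otherwise 0 (no = 0)."""
    position = {name: i for i, name in enumerate(pages)}
    n = len(pages)
    matrix = [[0] * n for _ in range(n)]
    for source, targets in links.items():
        for target in targets:
            matrix[position[source]][position[target]] = 1
    return matrix


# The three-page example: A links to B and C, C links to A, B has no outgoing links.
pages = ["A", "B", "C"]
links = {"A": ["B", "C"], "C": ["A"]}
for name, row in zip(pages, link_matrix(pages, links)):
    print(name, row)
# A [0, 1, 1]
# B [0, 0, 0]
# C [1, 0, 0]
```

A page with no outgoing links, like B, simply produces a row of zeros.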

PageRank Explained [by Google]
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
i.e., you want links into your page from important pages.
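The difference between counting votes and weighing them can be shown with made-up numbers. This sketch assumes each voting page splits its own importance equally among the pages it links to; the page names and importance scores are hypothetical, chosen only to make the contrast visible.

```python
def raw_votes(links: dict[str, list[str]]) -> dict[str, int]:
    """Count incoming links only: the 'sheer volume of votes'."""
    votes: dict[str, int] = {}
    for targets in links.values():
        for target in targets:
            votes[target] = votes.get(target, 0) + 1
    return votes


def weighted_votes(links: dict[str, list[str]],
                   importance: dict[str, float]) -> dict[str, float]:
    """Each voter splits its importance equally among the pages it links to."""
    votes: dict[str, float] = {}
    for source, targets in links.items():
        if not targets:  # a page with no links casts no votes
            continue
        share = importance[source] / len(targets)
        for target in targets:
            votes[target] = votes.get(target, 0.0) + share
    return votes


# Hypothetical: X gets two votes from minor pages, Y gets one vote from a major page.
links = {"minor1": ["X"], "minor2": ["X"], "major": ["Y"]}
importance = {"minor1": 1.0, "minor2": 1.0, "major": 5.0}
print(raw_votes(links))                   # {'X': 2, 'Y': 1}
print(weighted_votes(links, importance))  # {'X': 2.0, 'Y': 5.0}
```

Counting alone favors X; weighing each vote by the voter's importance favors Y, which is the effect the explanation describes. The catch is that the importance scores were simply assumed here.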

There is a problem calculating PageRank
To calculate a page’s PageRank, you must know the PageRank of all the pages that link to it. What if two pages point to each other? How do you start? Page or Brin probably remembered something from a mathematics class that might help solve this dilemma: Markov chains.
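The two-pages-pointing-at-each-other dilemma can be made concrete with the Markov chain (random surfer) reading of links, in which a surfer on a page follows one of its outgoing links at random. In this minimal sketch the pages P and Q and the helper names `transition_matrix` and `step` are hypothetical. Naively passing scores along links from an unequal starting guess just swaps the values forever, while the even split (0.5, 0.5) is a stationary distribution: one step of the chain leaves it unchanged.

```python
def transition_matrix(pages: list[str], links: dict[str, list[str]]) -> list[list[float]]:
    """Row i: probability of moving from pages[i] to each page via a randomly chosen link.

    A page with no outgoing links gets a row of zeros (not handled further here).
    """
    position = {name: i for i, name in enumerate(pages)}
    n = len(pages)
    matrix = [[0.0] * n for _ in range(n)]
    for source, targets in links.items():
        for target in targets:
            matrix[position[source]][position[target]] += 1.0 / len(targets)
    return matrix


def step(dist: list[float], matrix: list[list[float]]) -> list[float]:
    """One step of the chain: new[j] = sum over i of dist[i] * matrix[i][j]."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical pages P and Q that link only to each other.
pages = ["P", "Q"]
links = {"P": ["Q"], "Q": ["P"]}
matrix = transition_matrix(pages, links)  # [[0.0, 1.0], [1.0, 0.0]]

dist = [1.0, 0.0]  # an unequal starting guess
for _ in range(4):
    dist = step(dist, matrix)
    print(dist)  # alternates: [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]

print(step([0.5, 0.5], matrix))  # [0.5, 0.5]: unchanged, so it is stationary
```

The Markov chain view replaces "which page do I compute first?" with "which distribution does not change under one step?", although, as the oscillation shows, repeated stepping does not by itself guarantee that you reach that distribution.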

Page and Brin used Markov chains to define a PageRank that would
1) calculate a PageRank for each page, and
2) depend on the PageRanks of the pages linking to it.
Their original formula:
$$PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$
where
PR(A): the PageRank of page A
T_1, T_2, . . ., T_n: the pages with links pointing to A
C(T): the number of outgoing links from page T
d: a damping factor that they added to make the calculation converge
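One straightforward way to compute the formula is to start every page at PageRank 1 and apply the formula to all pages repeatedly until the values stop changing. This is a sketch, not Google's production method. It assumes d = 0.85, the value Brin and Page reported usually using; the function name `pagerank` and the `tol`/`max_iter` stopping parameters are hypothetical choices; and pages with no outgoing links get no special treatment.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Iterate PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))."""
    # Collect every page that appears as a source or a target of a link.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    # C(T): number of outgoing links from page T.
    out_count = {p: len(links.get(p, [])) for p in pages}
    # T1 ... Tn: the pages with links pointing to each page.
    incoming: dict[str, list[str]] = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)
    # Starting guess: every page gets PageRank 1.
    pr = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new_pr = {
            a: (1 - d) + d * sum(pr[t] / out_count[t] for t in incoming[a])
            for a in pages
        }
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr


# The three-page example: A links to B and C, C links to A.
ranks = pagerank({"A": ["B", "C"], "C": ["A"]})
print({p: round(v, 4) for p, v in sorted(ranks.items())})
# {'A': 0.4344, 'B': 0.3346, 'C': 0.3346}
```

For two pages that point only to each other, `pagerank({"P": ["Q"], "Q": ["P"]})` settles at 1.0 for both, which answers "how do you start?": any starting guess works, because each pass multiplies the old values by d < 1. In the three-page example, B has no outgoing links, so the ranks add up to less than the number of pages; handling such dangling pages is one of the details this sketch leaves out.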

Original Concern of Investors: Is the PageRank calculation scalable?
“The World’s Largest Matrix Computation: Google’s PageRank is an eigenvector of a matrix of order 2.7 billion” [Matlab News] (Using 10,000 computers!)
Amy Langville and Carl Meyer’s work: PageRank can be calculated more efficiently.
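A short derivation shows how Page and Brin's original formula turns into an eigenvector problem. It assumes every page has at least one outgoing link (real crawls contain pages with none, which need extra handling). The notation is introduced here, not taken from the talk: $N$ is the number of pages, $r$ is the column vector of all PageRanks, $e$ is the all-ones vector, and $M$ is the $N \times N$ matrix with $M_{ij} = 1/C(j)$ when page $j$ links to page $i$ and $M_{ij} = 0$ otherwise. Row $i$ of the first line is exactly the original formula for page $i$.

$$
\begin{aligned}
r &= (1-d)\,e + d\,M r && \text{(original formula, one row per page)}\\
e^{T} r &= (1-d)\,N + d\,e^{T} r && \text{(each column of } M \text{ sums to } 1\text{, so } e^{T} M = e^{T}\text{)}\\
e^{T} r &= N && \text{(since } d < 1\text{)}\\
r &= \left(\frac{1-d}{N}\,e\,e^{T} + d\,M\right) r && \text{(substitute } e = \tfrac{1}{N}\,e\,e^{T} r\text{)}
\end{aligned}
$$

So, under this assumption, the vector of PageRanks is an eigenvector with eigenvalue 1 of the $N \times N$ matrix $\frac{1-d}{N}\,e\,e^{T} + d\,M$. With $N$ around 2.7 billion, this is the kind of matrix the quotation refers to.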

How to find the PageRank of any page: the Google Toolbar (works with IE but not Netscape)
Example of PageRank and backward links
Raising your PageRank:
• Have an important page link to you
• [anything else?]

“The Mind of God” Google, Math, and More Sergey Brin and Larry Page

From a poem written by Michael Brin to his son Sergey on his 25th birthday in 1998. He seemed unhappy that Sergey was working on a search engine:
“You are tough, you mine data,
You surf first and think later,
And your crawler fast as light
Wanders madly in the night.
You work hard to squeeze a thesis
From the world wide web of feces.
You live abroad on the sunny coast
To you, my son I propose a toast.”
Google Information