CMU 15-505: Internet Search Technologies

15-505 Internet Search Technologies
• Instructors:
  – Alona Fyshe
  – Scott Larsen
  – Chris Monson
  – Kamal Nigam
• http://www.cs.cmu.edu/~knigam/15-505

What does it take to build a world-class search engine and related services?
• Lots of computer science
• Massively parallel computation
• Special-purpose data storage
• Information retrieval
• Machine learning
• Language analysis
• User interface design

• Study each of these topics in narrow but deep fashion
• Format: small seminar, readings, interactive discussions, programming practicum
• Grading:
  – 55% programming homework
  – 30% reading response
  – 15% class participation

What are reading responses?
• Practice for reading and thinking about computer science research papers
• Meant to be open-ended, fairly short (1 page)
• Can be:
  – Summary of paper
  – Critique of theory, experiments, approach
  – Suggestions for follow-on studies

Collaboration and Cheating
• Please collaborate on ideas, approaches, diagnosing problems
  – Use the mailing list
• All words and code must be your own
• Disclose all collaborations
• Clarify any doubts

What will make this class enjoyable?
• Interactive
• Flexibility to explore fun domains and data
• Early feedback to us about what works and what doesn’t

Problems in Internet Search Technology:
• Huge Problems
  – E.g. what changed in the web since this time yesterday?
• Classic Problems
  – E.g. sorting a gazillion numbers fast
• New Problems
  – E.g. making sense of dynamic Cyrillic web pages
• Practical Problems
  – E.g. how do we make both advertisers and consumers happier at the same time?
• Non-practical Problems
  – E.g. what do you see if you zoom all the way in on the moon?
• Beautiful Problems
  – And Fun Problems

A Taste
• Sorting
  – Scaling size up
  – Scaling time requirements down
• Matrix Operations
  – Thinking about the problem in a blend of old ways and new ways

Classic Sorting Algorithms
• Quick
• Merge
• Selection
• Shell
• Heap
• Radix
• Bucket
• ….

Enlarge the Problem:
• 1,000x too many keys for a single machine
• 1024 machines to use

Sorting: Parallel
• How would you do it?
  – Quick?
  – Merge?
  – Selection?
  – Shell?
  – Heap?
  – Radix?
  – Bucket?
  – ….

Bitonic Sort: Batcher (1968)
• Bitonic Sequence: $\langle a_0, a_1, \ldots, a_{n-1} \rangle$
  – There exists an $i$ such that $\langle a_0, \ldots, a_i \rangle$ is monotonically increasing and $\langle a_{i+1}, \ldots, a_{n-1} \rangle$ is monotonically decreasing
  – Or: there exists a cyclic shift of indices such that the above is satisfied
  – E.g. $\langle 8, 9, 2, 1, 0, 4 \rangle$ is a bitonic sequence
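
The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names `is_up_then_down` and `is_bitonic` are hypothetical, and the sketch assumes "monotonically increasing" allows equal neighbours. It simply tries every cyclic shift.

```python
def is_up_then_down(seq):
    """True if seq rises (non-strictly) and then falls (non-strictly)."""
    n = len(seq)
    i = 0
    # Walk up the increasing prefix.
    while i + 1 < n and seq[i] <= seq[i + 1]:
        i += 1
    # Walk down the decreasing suffix.
    while i + 1 < n and seq[i] >= seq[i + 1]:
        i += 1
    # Bitonic (without shifting) only if the two walks consumed everything.
    return i >= n - 1


def is_bitonic(seq):
    """True if some cyclic shift of seq rises and then falls."""
    seq = list(seq)
    if len(seq) <= 2:
        return True
    return any(is_up_then_down(seq[k:] + seq[:k]) for k in range(len(seq)))


if __name__ == "__main__":
    print(is_bitonic([8, 9, 2, 1, 0, 4]))  # the example from the slide
    print(is_bitonic([1, 3, 2, 4, 0, 5]))  # too many direction changes
```

The slide's example passes only because of the cyclic-shift clause: shifted to start at 0, it reads 0, 4, 8, 9, 2, 1.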

Bitonic Merging Network

Bitonic Merge on a Hypercube

Bitonic Sort

Bitonic Sort

Procedure BitonicSort
    for i = 0 to d-1
        for j = i downto 0
            if (i + 1)st bit of iproc <> jth bit of iproc
                comp_exchange_max(j, item)
            else
                comp_exchange_min(j, item)
            endif
        endfor
    endfor

comp_exchange_max and comp_exchange_min compare and exchange the item with the neighbor on the jth dimension
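
The procedure above can be simulated on a single machine. This is a sketch under stated assumptions rather than the course's implementation: each of the p = 2^d simulated processors holds exactly one item, bits are numbered from 0 at the least significant end, and the neighbor on dimension j is the processor whose label differs in bit j. The function name `bitonic_sort_hypercube` is hypothetical.

```python
def bitonic_sort_hypercube(items):
    """Simulate the per-processor procedure with one item per processor."""
    p = len(items)
    d = p.bit_length() - 1
    if p == 0 or (1 << d) != p:
        raise ValueError("number of items must be a power of two")
    items = list(items)
    for i in range(d):
        for j in range(i, -1, -1):
            # All processors exchange simultaneously, so read from the old
            # state and write into a fresh copy.
            new_items = list(items)
            for iproc in range(p):
                partner = iproc ^ (1 << j)  # neighbor on dimension j
                bit_i1 = (iproc >> (i + 1)) & 1
                bit_j = (iproc >> j) & 1
                if bit_i1 != bit_j:
                    # comp_exchange_max: keep the larger of the pair
                    new_items[iproc] = max(items[iproc], items[partner])
                else:
                    # comp_exchange_min: keep the smaller of the pair
                    new_items[iproc] = min(items[iproc], items[partner])
            items = new_items
    return items


if __name__ == "__main__":
    print(bitonic_sort_hypercube([3, 1, 2, 0]))
```

In the last outer iteration the (i + 1)st bit is 0 for every label, so each processor keeps the maximum exactly when its jth bit is 1, which leaves the items in ascending order by processor label.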

Bitonic Sort Demo
http://www.inf.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/bitonicen.htm

Parallel Sort: Beauty or a Beast? • What does it take to implement this?

Bitonic Sort: Why?
• $O(n \log^2 n)$
• Data independent
• Resource needs are perfectly defined
• Very parallel friendly
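
The points above can be made concrete by counting. A bitonic sorting network on n = 2^d inputs has d(d + 1)/2 stages, each with n/2 compare-exchange operations, and that count does not depend on the data. The helper below is a small sketch; the name `bitonic_network_size` is hypothetical.

```python
def bitonic_network_size(n):
    """Return (stages, comparators) for a bitonic network on n = 2**d inputs."""
    d = n.bit_length() - 1
    if n < 1 or (1 << d) != n:
        raise ValueError("n must be a power of two")
    stages = d * (d + 1) // 2        # 1 + 2 + ... + d merge steps
    comparators = stages * (n // 2)  # every stage pairs up all n inputs
    return stages, comparators


if __name__ == "__main__":
    for n in (8, 1024):
        print(n, bitonic_network_size(n))
```

For the 1024 machines mentioned earlier, with one key per machine, this gives 55 stages, which is why the resource needs can be stated in advance.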

Matrix Multiplication
[Figure: worked numeric example of a matrix product, shown as two matrices combined with * to give a result]

Matrix Pipeline
[Figure: numeric example showing matrix terms combined with + to give the result]

Visualization
[Figures: the matrix product (* =) shown as images]

Matrix Multiplication
• A cube of processors
• Each does a chunk of the computation
  – Each needs different (and overlapping) portions of the input
  – Each passes intermediate results to certain neighbors
• Result is stored across multiple machines
• Seems kinda heavy for a simple algorithm!
• Look up Fox’s algorithm and Cannon’s algorithm
  – Very pretty at one level
  – Gory at another level
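
One plausible reading of the "cube of processors" above is sketched below: simulated processor (i, j, k) multiplies one entry of each input, and the partial products are then summed along k. This is an assumption made for illustration, with scalar entries standing in for blocks. It is not Fox’s or Cannon’s algorithm, both of which shift blocks around a two-dimensional grid. The function name `cube_matmul` is hypothetical.

```python
def cube_matmul(a, b):
    """Multiply matrices a (rows x inner) and b (inner x cols) on a simulated cube."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Step 1: processor (i, j, k) needs a[i][k] and b[k][j]. The same input
    # entry is needed by many processors, hence the overlapping portions.
    partial = [
        [[a[i][k] * b[k][j] for k in range(inner)] for j in range(cols)]
        for i in range(rows)
    ]
    # Step 2: intermediate results are passed along the k axis and summed;
    # entry (i, j) of the result ends up with the processors on that line.
    return [[sum(partial[i][j]) for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    print(cube_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Even in this toy form the bookkeeping (which processor holds which input, and where each sum lands) is most of the code, which matches the remark that the approach feels heavy for a simple algorithm.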

A Different View
Courtesy http://www.unrealtournament3.com/

Multiplication
[Figure: multi-texturing, images combined with *]

Addition
[Figure: blending, images combined with + to give the result]

Graphics Pipeline
[Figure: Multiply, Add, Image (Frame Buffer)]

How the Algorithm Works
[Figures: step-by-step sequence of images combined with * and +]
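
The figures for these slides did not survive, so the following is only a hedged sketch of how multiplication by multi-texturing and addition by blending could combine into a matrix product. It assumes one rendering pass per inner index k: column k of the first matrix and row k of the second are stretched into full-size "textures", multiplied pixel by pixel, and added into the frame buffer. The name `multitexture_matmul` is hypothetical, and plain Python lists stand in for textures.

```python
def multitexture_matmul(a, b):
    """Accumulate a @ b in a simulated frame buffer, one pass per inner index."""
    rows, inner, cols = len(a), len(b), len(b[0])
    frame_buffer = [[0.0] * cols for _ in range(rows)]
    for k in range(inner):
        # Texture 1: column k of a, stretched across every column of the image.
        tex_a = [[a[i][k]] * cols for i in range(rows)]
        # Texture 2: row k of b, stretched down every row of the image.
        tex_b = [list(b[k]) for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                # Multi-texturing multiplies the two textures per pixel;
                # blending adds the product to what is already in the buffer.
                frame_buffer[i][j] += tex_a[i][j] * tex_b[i][j]
    return frame_buffer


if __name__ == "__main__":
    print(multitexture_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Written this way the product is a sum of outer products, so each pass needs only the two operations the graphics pipeline already provides.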

Performance

GPU Sorting

Problems in Internet Search Technology:
• Huge Problems
• Classic Problems
• New Problems
• Practical Problems
• Non-practical Problems
• Beautiful Problems
• Fun Problems

Questions?
CMU 15-505: Internet Search Technologies
  – Kamal Nigam (knigam@google.com)
  – Chris Monson
  – Alona Fyshe
  – Scott Larsen

Bitonic Rearranging (cycling)