COMP 8330/7336 Advanced Parallel and Distributed Computing
Communication Costs (cont.)
Dr. Xiao Qin
Auburn University
http://www.eng.auburn.edu/~xqin@auburn.edu
Recap: Routing Techniques
Recap: Packet Routing
• The total communication time for packet routing is approximated by:
• Compare: The total communication time for store-and-forward routing:
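The two expressions this slide refers to do not appear in the text. As a reference point, the cost model commonly used for these routing schemes can be sketched as follows. The symbols are assumptions taken from that standard model, not from the slide: $t_s$ is the startup time, $t_h$ the per-hop time, $t_w$ the per-word transfer time, $m$ the message size in words, and $l$ the number of links traversed.

```latex
\begin{align*}
t_{\text{comm}}^{\text{packet}} &= t_s + t_h\, l + t_w\, m \\
t_{\text{comm}}^{\text{store-and-forward}} &= t_s + (m\, t_w + t_h)\, l
\end{align*}
```

Under these assumptions, store-and-forward multiplies the message size by the number of links, while packet routing adds the two terms. For packet routing, $t_w$ is usually read as an effective per-word time that absorbs packet header overhead.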
Cut-Through Routing
• Takes packet routing to an extreme by further dividing messages into basic units called flits.
• All flits are forced to take the same path.
Q1: Why the same path? Any benefit?
• Because flits are typically small, the header information must be minimized.
• A tracer message first programs all intermediate routers. All flits then take the same route.
• Error checks are performed on the entire message, as opposed to individual flits.
• No sequence numbers are needed.
Cut-Through Routing
• The total communication time for cut-through routing is approximated by:
• The expression is identical in form to that of packet routing.
Q2: What's new? $t_w$ is typically much smaller.
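A minimal sketch comparing the two cost expressions numerically, assuming the standard model in which cut-through time is $t_s + l\,t_h + t_w\,m$ and store-and-forward time is $t_s + (m\,t_w + t_h)\,l$. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
def store_and_forward_time(t_s, t_h, t_w, m, l):
    """Whole message is received at each hop before being forwarded."""
    return t_s + (m * t_w + t_h) * l


def cut_through_time(t_s, t_h, t_w, m, l):
    """Flits are pipelined: the per-hop cost and the message cost add."""
    return t_s + l * t_h + t_w * m


# Hypothetical parameter values, in arbitrary time units.
t_s, t_h, t_w = 100.0, 2.0, 1.0
m, l = 1000, 8  # message size in words, number of links traversed

sf = store_and_forward_time(t_s, t_h, t_w, m, l)
ct = cut_through_time(t_s, t_h, t_w, m, l)
print(f"store-and-forward: {sf}")
print(f"cut-through:       {ct}")
```

With these made-up values the message term dominates, so the gap grows with the number of links under store-and-forward but not under cut-through.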
COMP 8330/7336 Advanced Parallel and Distributed Computing
Decomposition and Parallel Tasks
Dr. Xiao Qin
Auburn University
http://www.eng.auburn.edu/~xqin@auburn.edu
Decomposition
• To decompose a problem is to divide it into smaller tasks that can be executed in parallel.
Example 1: Multiplying a Dense Matrix with a Vector
Q3: How to divide this computation into small tasks? A dense matrix-vector product can be decomposed into n tasks, one per element of the output vector.
Q4: Any shared data among the tasks? The tasks share the vector b.
Q5: Any control dependency among the tasks? No. No task needs to wait for the completion of any other.
Q6: Is this the maximum number of tasks we could decompose this problem into?
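The n-task decomposition can be sketched in Python as one task per matrix row, with every task reading the shared vector b. The names `row_task` and `matvec_by_rows` and the sample data are hypothetical. The thread pool only illustrates the task structure; in standard CPython it will not speed up this arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def row_task(row, b):
    """Task i: dot product of one matrix row with the shared vector b."""
    return sum(a_ij * b_j for a_ij, b_j in zip(row, b))


def matvec_by_rows(A, b):
    """Run one independent task per row; results come back in row order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda row: row_task(row, b), A))


# Hypothetical 3x3 example.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [1, 0, 2]
y = matvec_by_rows(A, b)
print(y)
```

No task reads another task's result, which is why there is no dependency between them.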
Example 2: Database Query Processing
Consider the execution of the query:
MODEL = "CIVIC" AND YEAR = 2001 AND (COLOR = "GREEN" OR COLOR = "WHITE")
on the following database:
ID#: 4523, 3476, 7623, 9834, 6734, 5342, 3845, 8354, 4395, 7352
Model: Civic, Corolla, Camry, Prius, Civic, Altima, Maxima, Accord, Civic
Year: 2002, 1999, 2001, 2001, 2000, 2001, 2002
Color: Blue, White, Green, Blue, Green, Red
Dealer: MN, IL, NY, CA, OR, FL, NY, VT, CA, WA
Price: $18,000, $15,000, $21,000, $18,000, $17,000, $19,000, $22,000, $18,000, $17,000, $18,000
Example 2: Database Query Processing
Each task can be thought of as generating an intermediate table of entries that satisfy a particular clause.
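One possible decomposition of the query can be sketched as follows, where each task produces an intermediate set of matching record ids. The records, the `select` helper, and the way the clauses are grouped are all assumptions made for illustration; the slide's own task graph is not reproduced in the text.

```python
# Hypothetical records (id, model, year, color); not the table from the slide.
cars = [
    (1, "Civic", 2001, "White"),
    (2, "Civic", 2002, "Green"),
    (3, "Camry", 2001, "Green"),
    (4, "Civic", 2001, "Red"),
    (5, "Civic", 2001, "Green"),
]


def select(records, field, value):
    """Leaf task: ids of the records whose given field equals value."""
    return {r[0] for r in records if r[field] == value}


# Four independent leaf tasks, one per clause.
civic = select(cars, 1, "Civic")
y2001 = select(cars, 2, 2001)
green = select(cars, 3, "Green")
white = select(cars, 3, "White")

# Two combining tasks; each depends on two leaf tasks.
civic_2001 = civic & y2001
green_or_white = green | white

# Final task: depends on both combining tasks.
result = civic_2001 & green_or_white
print(sorted(result))
```

The four leaf tasks can run concurrently, the two combining tasks can run concurrently once their inputs exist, and the final task must wait for both.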
Example 2: An Alternate Decomposition
Task Granularity
Q7: What factor affects task granularity? The number of tasks into which a problem is decomposed.
• A large number of tasks results in a fine-grained decomposition.
• A small number of tasks results in a coarse-grained decomposition.
Q8: Can you provide a coarse-grained decomposition?
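As one illustration of a coarse-grained decomposition, the matrix-vector product can be split into a few tasks that each compute a contiguous block of output entries. This is a sketch under assumptions: `block_task`, `matvec_by_blocks`, and the sample data are hypothetical, and the tasks are run one after another here only to keep the example short.

```python
def block_task(rows, b):
    """Coarse-grained task: compute several consecutive output entries."""
    return [sum(a * x for a, x in zip(row, b)) for row in rows]


def matvec_by_blocks(A, b, num_tasks):
    """Split the rows of A into contiguous blocks, one task per block."""
    n = len(A)
    size = -(-n // num_tasks)  # ceiling division
    blocks = [A[i:i + size] for i in range(0, n, size)]
    y = []
    for block in blocks:  # each iteration is an independent task
        y.extend(block_task(block, b))
    return y, len(blocks)


# Hypothetical 4x4 example split into 2 tasks of 2 rows each.
A = [[1, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 3, 0],
     [0, 0, 0, 4]]
b = [1, 1, 1, 1]
y, tasks_used = matvec_by_blocks(A, b, num_tasks=2)
print(y, tasks_used)
```

Setting `num_tasks` equal to the number of rows recovers the fine-grained, one-task-per-row decomposition.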
Degree of Concurrency
• The number of tasks that can be executed in parallel is the degree of concurrency of a decomposition.
• Since the number of tasks that can be executed in parallel may change over program execution, the maximum degree of concurrency is the maximum number of such tasks at any point during execution.
Q9: What is the maximum degree of concurrency of the database query examples?
Degree of Concurrency
• The average degree of concurrency is the average number of tasks that can be processed in parallel over the execution of the program.
• As the decomposition becomes finer in granularity, the degree of concurrency increases.
Critical Path Length
• A directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other.
• The longest such path determines the shortest time in which the program can be executed in parallel.
• The length of the longest path in a task dependency graph is called the critical path length.
Consider the task dependency graphs of the two database query decompositions:
Q10: What are the critical path lengths for the two task dependency graphs?
Q11: How many processors are needed in each case to achieve this minimum parallel execution time?
Q12: What is the maximum degree of concurrency?
Q13: What is the average degree of concurrency in each decomposition?
Average degree of concurrency = total amount of work / critical path length
(a) Critical path length = 27, total amount of work = 63. Average degree of concurrency = 63/27 = 2.33
(b) Critical path length = 34, total amount of work = 64. Average degree of concurrency = 64/34 = 1.88
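The calculation can be sketched as a longest-path computation over a task dependency graph. The task weights and edges below are assumptions: the slide's graphs are not reproduced in the text, so the values were chosen only so that the totals match the figures stated above (63 and 27 for (a), 64 and 34 for (b)). The function names are hypothetical.

```python
def critical_path_length(work, deps):
    """Longest weighted path in a task dependency graph.

    work: task -> amount of work; deps: task -> tasks it must wait for.
    """
    memo = {}

    def finish(task):
        # Earliest finish = own work + latest finish among prerequisites.
        if task not in memo:
            memo[task] = work[task] + max(
                (finish(d) for d in deps.get(task, ())), default=0)
        return memo[task]

    return max(finish(t) for t in work)


def average_concurrency(work, deps):
    """Total amount of work divided by the critical path length."""
    return sum(work.values()) / critical_path_length(work, deps)


# Assumed weights and edges, chosen to match the totals stated above.
work_a = {1: 10, 2: 10, 3: 10, 4: 10, 5: 6, 6: 9, 7: 8}
deps_a = {5: [1, 2], 6: [3, 4], 7: [5, 6]}

work_b = {1: 10, 2: 10, 3: 10, 4: 10, 5: 6, 6: 11, 7: 7}
deps_b = {5: [3, 4], 6: [2, 5], 7: [1, 6]}

for name, work, deps in (("a", work_a, deps_a), ("b", work_b, deps_b)):
    print(name, critical_path_length(work, deps),
          round(average_concurrency(work, deps), 2))
```

Under these assumed graphs, the chain-like structure of (b) lengthens the critical path even though the total work is almost the same, which is what lowers its average degree of concurrency.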
Limits on Parallel Performance
• It would appear that the parallel time can be made arbitrarily small by making the decomposition finer in granularity.
• However, there is an inherent bound on how fine the granularity of a computation can be.
Q14: What is the upper bound on the number of concurrent tasks?
• Multiplying a dense matrix with a vector: there can be no more than $n^2$ concurrent tasks.
Q15: The larger the number of concurrent tasks, the better?
• Communication overhead: Concurrent tasks may have to exchange data with other tasks.
• The tradeoff between the granularity of a decomposition and associated overheads often determines performance bounds.
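The tradeoff can be illustrated with a deliberately simple toy model. This model is an assumption, not something stated on the slides: the work divides perfectly among the tasks, and the overhead grows linearly with the number of tasks. The function name and all numbers are hypothetical.

```python
def parallel_time(total_work, num_tasks, overhead_per_task):
    """Toy model (an assumption): perfectly divided work plus an overhead
    term that grows linearly with the number of tasks."""
    return total_work / num_tasks + overhead_per_task * num_tasks


W, c = 10_000, 1.0  # hypothetical total work and per-task overhead
times = {k: parallel_time(W, k, c) for k in (1, 10, 100, 1000)}
best = min(times, key=times.get)
for k, t in times.items():
    print(k, t)
print("best:", best)
```

In this toy model, going from 100 to 1000 tasks makes the run slower, because the added overhead outweighs the reduced work per task. Real overheads depend on the communication pattern and the machine.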
Summary
• Decomposition
• Task Granularity
• Degree of Concurrency
• Critical Path