The hybrid approach to programming clusters of multi-core architectures: Hybrid OpenMP/MPI
High-performance computing
• Cluster computing
  ○ To achieve high performance, many computers are networked together to work as a whole on computation-heavy problems
• Multi-core computers
  ○ As the maximum speed of single-core processors began to level off, multi-core computers became the popular solution
High-performance computing
• Multi-core clusters
  ○ With multi-core computers becoming widespread, cluster nodes began to adopt this design
  ○ Multi-core nodes promise more powerful nodes while saving considerable power and space
  ○ Programming these multi-core nodes, however, may call for a different approach
Programming
• Cluster computers
  ○ The standard model for programming clusters is MPI
  ○ MPI, the Message Passing Interface, is used mostly with distributed memory
• Multi-core computers
  ○ Multi-core computers have shared memory
  ○ An efficient way to program shared memory is OpenMP (see the sketch below)
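To make the shared-memory side concrete, here is a minimal OpenMP sketch; the array and its size are illustrative. Every thread works on the same data in the node's memory, so no explicit messages are needed.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1000000 };
    static double a[N];          /* one array, visible to every thread */
    double sum = 0.0;

    /* Split the loop across the node's cores; the reduction combines
       each thread's partial sum through shared memory. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    printf("sum = %.0f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```

Compiled with, e.g., gcc -fopenmp; the number of threads is typically set with OMP_NUM_THREADS.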
Multi-core clusters
• When programming multi-core clusters, MPI treats cores on the same node the same way it treats cores on different nodes
• MPI completely ignores the fact that cores on a single node work on shared memory
• This causes problems because data is duplicated across the multiple MPI processes sharing a single node
MPI
• MPI uses communicator objects, which connect groups of processes in the MPI session
• MPI supports point-to-point communication between two specific processes
• Collective functions communicate among all processes in a process group (both are sketched below)
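A hedged sketch of these three ideas using the default communicator, one point-to-point exchange, and one collective; the message contents are arbitrary.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* MPI_COMM_WORLD: the communicator
                                               connecting every process in the session */
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Point-to-point: rank 0 sends one integer to rank 1. */
    if (rank == 0 && size > 1) {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", msg);
    }

    /* Collective: every process in the communicator contributes to a sum. */
    int local = rank, total = 0;
    MPI_Allreduce(&local, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) printf("sum of all ranks = %d\n", total);

    MPI_Finalize();
    return 0;
}
```

Built with mpicc and launched with mpirun/mpiexec; each process runs the same program and is distinguished only by its rank.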
Problems with MPI
• Collectives
  ○ The vector collectives take array arguments whose size equals the number of processes in the communicator
  ○ Examples of such collectives are MPI_Gatherv, MPI_Scatterv, and MPI_Alltoallv
  ○ MPI_Alltoallv takes these arrays for both sends and receives; with a million processes, each array of a million 4-byte integers is already 4 MB on every process (illustrated below)
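A small sketch of the scaling problem, under the assumption that every process sends one integer to every other process: MPI_Alltoallv needs four integer arrays whose length is the communicator size, so their memory footprint grows linearly with the number of processes.

```c
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI_Alltoallv needs one count and one displacement per peer, for
       sends and for receives: four arrays of length `size`.  At a million
       processes each array alone is 1,000,000 * 4 bytes = 4 MB. */
    int *sendcounts = malloc(size * sizeof(int));
    int *sdispls    = malloc(size * sizeof(int));
    int *recvcounts = malloc(size * sizeof(int));
    int *rdispls    = malloc(size * sizeof(int));
    int *sendbuf    = malloc(size * sizeof(int));
    int *recvbuf    = malloc(size * sizeof(int));

    for (int i = 0; i < size; i++) {
        sendcounts[i] = recvcounts[i] = 1;   /* one int to/from every peer */
        sdispls[i]    = rdispls[i]    = i;
        sendbuf[i]    = i;
    }

    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                  recvbuf, recvcounts, rdispls, MPI_INT, MPI_COMM_WORLD);

    free(sendcounts); free(sdispls); free(recvcounts); free(rdispls);
    free(sendbuf);    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```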
Solution
• Combining OpenMP and MPI is a worthwhile solution (a minimal hybrid program follows)
• MPI handles the communication between nodes over distributed memory
• OpenMP handles the work within a single node over shared memory
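A minimal sketch of that division of labour: one MPI process per node and several OpenMP threads inside it. The requested thread level and the printed text are only for illustration.

```c
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    /* Ask for a thread support level that allows OpenMP threads
       inside each MPI process. */
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* MPI ranks live on different nodes (distributed memory);
       the threads below share one node's memory. */
    #pragma omp parallel
    printf("MPI rank %d, OpenMP thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}
```

Typically built with mpicc -fopenmp, launched with one process per node, and OMP_NUM_THREADS set to the number of cores per node.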
Implementation
• Since OpenMP is not an all-or-nothing model, it can be injected into selected parts of the program
• One can identify the most time-consuming loops and place OpenMP directives on them (as in the sketch below)
• One can also place directives on loops over undistributed dimensions
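A sketch of that incremental approach; the stencil update, buffer sizes, and iteration count are placeholders. Only the hot loop gets a directive, and if the code is built without OpenMP the pragma is simply ignored, so nothing else in the MPI program has to change.

```c
#include <stdlib.h>
#include <mpi.h>

#define NLOCAL 100000   /* illustrative local problem size per MPI process */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    double *u     = calloc(NLOCAL, sizeof(double));
    double *u_new = calloc(NLOCAL, sizeof(double));
    u[0] = 1.0;                                  /* arbitrary boundary value */

    for (int step = 0; step < 100; step++) {
        /* (The halo exchange with neighbouring ranks would go here.) */

        /* The most time-consuming loop: the only place touched by OpenMP. */
        #pragma omp parallel for
        for (int i = 1; i < NLOCAL - 1; i++) {
            u_new[i] = 0.5 * (u[i - 1] + u[i + 1]);
        }

        double *tmp = u; u = u_new; u_new = tmp; /* swap buffers */
    }

    free(u);
    free(u_new);
    MPI_Finalize();
    return 0;
}
```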
Hybrid masteronly
• In this model only one MPI process is used per node (sketched below)
• All communication within the node happens through OpenMP and shared memory
• Problems
  ○ The other threads in the node idle while the MPI communication takes place
  ○ The MPI bandwidth is not always fully used by a single communicating thread
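A hedged sketch of the masteronly pattern, with a placeholder ring exchange and update loop: MPI calls happen only outside the OpenMP parallel region, so while the single process per node communicates, the node's other cores are idle.

```c
#include <mpi.h>
#include <omp.h>

#define N 100000   /* illustrative buffer size */

int main(int argc, char **argv) {
    int provided;
    /* FUNNELED: the process is threaded, but only the master thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    static double halo[N], work[N];

    for (int step = 0; step < 10; step++) {
        /* Communication phase: the one MPI process per node exchanges data
           with its neighbours while all other threads sit idle. */
        int right = (rank + 1) % size, left = (rank - 1 + size) % size;
        MPI_Sendrecv(work, N, MPI_DOUBLE, right, 0,
                     halo, N, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Computation phase: every core of the node joins in via OpenMP. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            work[i] += 0.25 * halo[i];
    }

    MPI_Finalize();
    return 0;
}
```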
Hybrid with overlap
• This method avoids idling compute threads during communication
• The communication is assigned to one or more OpenMP threads, which handle it in parallel with useful computation on the remaining threads (sketched below)
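A hedged sketch of the overlap pattern, again with a placeholder ring exchange: inside the parallel region, thread 0 performs the halo exchange while the other threads update interior points that do not depend on the incoming data, and a barrier resynchronises everyone before the next step.

```c
#include <mpi.h>
#include <omp.h>

#define N 100000   /* illustrative buffer size */

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    static double sendhalo[N], recvhalo[N], inner[N];

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();

        for (int step = 0; step < 10; step++) {
            if (tid == 0) {
                /* The communicating thread exchanges halos with neighbours... */
                int right = (rank + 1) % size, left = (rank - 1 + size) % size;
                MPI_Sendrecv(sendhalo, N, MPI_DOUBLE, right, 0,
                             recvhalo, N, MPI_DOUBLE, left,  0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                /* ...while the remaining threads do useful work on interior
                   points that do not need the incoming halo. */
                for (int i = tid - 1; i < N; i += nthreads - 1)
                    inner[i] += 1.0;
            }
            /* Resynchronise before the halo-dependent work of the next step. */
            #pragma omp barrier
        }
    }

    MPI_Finalize();
    return 0;
}
```

The manual split of threads into one communicator and several workers is what distinguishes this scheme from masteronly, at the cost of more careful work partitioning.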
Conclusion
• The hybrid approach has advantages over pure MPI in some cases
• This is not true for all cases; some applications will see the reverse effect
• If programmed well, hybrid applications can be very scalable and achieve a large performance increase