MULTI-CORE SOFTWARE DEVELOPMENT WITH EXAMPLES IN C++ By Jon Nosacek

Why should you care?
• Multi-core systems are becoming the standard for all devices
  ◦ Less heat
• Two cores at half the frequency can match one core's throughput using roughly ¼ of the power!
  ◦ P = C × V² × F (a lower frequency also permits a lower voltage)
• Designing a new system around a multi-core architecture can be quite difficult.
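A rough worked version of the ¼-power claim. It assumes (this is not stated on the slide) that supply voltage can be scaled down in proportion to frequency and that the capacitance C is the same for each core:

```latex
\begin{align*}
P_{\text{1 core}}  &= C \, V^2 \, F \\
P_{\text{2 cores}} &= 2 \cdot C \left(\tfrac{V}{2}\right)^2 \left(\tfrac{F}{2}\right)
                    = 2 \cdot \tfrac{1}{8} \, C \, V^2 \, F
                    = \tfrac{1}{4} \, P_{\text{1 core}}
\end{align*}
```

Real chips rarely let voltage scale that far, so ¼ is best read as an idealized upper bound on the savings.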

Why should you care? (cont.)
• Technology isn’t evolving the way it did before
  ◦ No more automatic performance gains
• We want fast software!
  ◦ Our users deserve the same

Multi-threaded vs. Multi-core
• Same basic principle, but can yield very different results
  ◦ Multi-threaded assumes no knowledge of the release environment and can make the program slower on a single-core platform
  ◦ Multi-core means specifically designing your system for a platform that you know has two or more cores; it can yield significant performance boosts if done correctly

Hardware
• To understand how the software works, you must first understand how the hardware works
• Very much a hardware-oriented evolution (hardware could not keep up with our increasing demands)

Why transition to multi-core?
• Higher processor frequencies necessitated better cooling
  ◦ There is a limit based on materials and methods
• Computers are replacing us
  ◦ The brain is not sequential

Why multi-core? (cont.)
• Traditional:
• Multi-core:

Intel Core 2 Extreme Quad
http://www.techspot.com/articlesinfo/23/images/img2.jpg
Intel Core i7 965 quad core (8 threads)
http://tinyurl.com/3tgfygn

Terminology
• Thread
  ◦ Smallest unit of execution that a program can be broken down into
  ◦ Contains all the info that is needed for it to run
• Atomic statement
  ◦ A single operation by the processor; it cannot be interrupted (time-sliced) partway through execution
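To make "thread" and "atomic" concrete, here is a minimal sketch using the standard C++11 `std::thread` and `std::atomic`. The function name `count_atomically` is made up for illustration. Several threads increment one shared counter, and because each increment is atomic, no update is lost.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each add 1 to a shared counter
// `increments` times. std::atomic makes every increment indivisible, so
// no update is lost even when the threads interleave.
long count_atomically(int num_threads, int increments) {
    assert(num_threads > 0 && increments >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i) {
                counter.fetch_add(1);  // one atomic read-modify-write
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for every thread to finish
    return counter.load();
}
```

With a plain `long` instead of `std::atomic<long>`, the same loop would be a data race and the final count would be unpredictable.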

Terminology (cont.)
• Hyper-Threading (SMT)
  ◦ Intel’s approach of running 2 threads per core to simulate more cores and reduce CPU waste
  ◦ Virtual processors are not necessarily tied to physical ones
  ◦ Example of hardware helping software

How to design a multi-core system
• Planning
• Implementation
• Testing
• Deployment
• Maintenance

Planning
• A “code-and-fix” laissez-faire mentality WILL NOT WORK
• Too many things can go wrong, and problems are hard to pinpoint after the fact
• Single most important step
  ◦ Problems here will cascade into other steps and become worse
• Clear vision is a must
• How deep into threading do you want to go?

Planning (cont.)
• Opportunity comes during the decomposition phase
• Need to model
  ◦ The state of the threads and which combinations affect each other
  ◦ Thread interaction
• Number of threads
  ◦ More threads => more problems
  ◦ Balance performance with understandability, maintainability, time
  ◦ Fairness and priority
  ◦ More threads => more communication

Planning (cont.)
• Error handling is more important
  ◦ Who handles the errors? Other threads might take a while to respond, and what if everyone responds?
• Synchronization and semaphores should be used sparingly
  ◦ Threads should be as independent as possible
• Need to make rules on memory access
  ◦ Dataflow diagrams!
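One possible answer to "who handles the errors?" is to let a single thread own error handling. The sketch below uses the standard `std::async` and `std::future`; the names `risky_task` and `run_and_count_failures` are hypothetical. An exception thrown in a worker is stored in its future and rethrown only when the caller asks for the result.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <vector>

// Hypothetical worker: fails for negative input, otherwise doubles it.
int risky_task(int input) {
    if (input < 0) throw std::runtime_error("negative input");
    return input * 2;
}

// Runs one task per input and returns how many of them failed.
// Exceptions thrown in a worker are stored in its future and rethrown
// by get(), so only the calling thread ever handles errors.
int run_and_count_failures(const std::vector<int>& inputs) {
    std::vector<std::future<int>> results;
    for (int value : inputs) {
        results.push_back(std::async(std::launch::async, risky_task, value));
    }
    int failures = 0;
    for (auto& r : results) {
        assert(r.valid());
        try {
            r.get();     // rethrows the worker's exception, if any
        } catch (const std::runtime_error&) {
            ++failures;  // single, well-defined place for error handling
        }
    }
    return failures;
}
```

This is only one design; a shared error queue or a dedicated supervisor thread would be other reasonable choices.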

Concurrent vs. Parallel Design
• Which do you think is better?
http://blog.rednael.com/content/binary/parallel%20vs%20concurrent.jpg

Concurrent
• Easy to design and implement
• Works well for IO
• Minimal interaction to plan and synchronize
Parallel
• Less CPU waste
• Even more difficult to track
• CPU has to keep track and time slice more (swap time)

Implementation
• Languages are becoming more and more open to multi-core programming
• There are libraries for C++ that help ease the workload
  ◦ A lot of threading is tied to the OS, and Microsoft knows theirs better than anyone
  ◦ Support usually comes to Linux and Microsoft first, then Macs
• Watch for CPU-specific instructions that can improve performance

Implementation (cont.)
• Make sure resources are being managed
• Update the models as the system changes
• The IDE you choose during this phase can be very important and affects what you see your system doing
• Using existing libraries usually reduces the workload, and they are often more efficient
• Make sure all basic/shared initializations are done before threads are created
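A minimal sketch of "initialize before creating threads", using standard C++11 only. The function name `partitioned_sum` is invented for illustration. The input and the per-thread result slots exist before any thread starts, and each worker writes only its own slot, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` with num_threads workers. All shared state (the input and
// the per-thread result slots) is set up before any thread starts, and
// each worker writes only its own slot, so no locking is needed.
long long partitioned_sum(const std::vector<int>& data, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);  // initialized up front
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        // Clamp the slice so surplus threads get an empty range.
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();
    // Only after every worker has joined are the slots combined.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```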

Implementation (cont.)
• Watch for evolving trends
  ◦ If a lot of communication is going on between two threads, see if things can be merged/swapped
  ◦ See which threads take up the most resources and what will increase program responsiveness
• Keep the future in mind
  ◦ More cores will always be added
  ◦ Think about the simplest case and expand into the complex
  ◦ Also realize that more features are being added to C++ to help abstract multithreading
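Since more cores will keep being added, one small habit is to ask the platform how many hardware threads it has instead of hard-coding a number. A sketch using the standard `std::thread::hardware_concurrency()`; the function name `pick_worker_count` and its fallback of 2 are arbitrary assumptions.

```cpp
#include <cassert>
#include <thread>

// Picks a worker count from the hardware instead of hard-coding it.
// hardware_concurrency() may return 0 when the value cannot be
// determined, so a fallback (2 here, an arbitrary assumption) is used.
unsigned pick_worker_count(unsigned fallback = 2) {
    assert(fallback > 0);
    const unsigned detected = std::thread::hardware_concurrency();
    return detected != 0 ? detected : fallback;
}
```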

```cpp
// Basic example:
#include <iostream>
#include <pthread.h>
using namespace std;

void *task1(void *X)  // define task to be executed by ThreadA
{
    cout << "Thread A complete" << endl;
    return (NULL);
}

void *task2(void *X)  // define task to be executed by ThreadB
{
    cout << "Thread B complete" << endl;
    return (NULL);
}

int main(int argc, char *argv[])
{
    pthread_t ThreadA, ThreadB;                   // declare threads
    pthread_create(&ThreadA, NULL, task1, NULL);  // create threads
    pthread_create(&ThreadB, NULL, task2, NULL);
    pthread_join(ThreadA, NULL);                  // wait for threads to "join up"
    pthread_join(ThreadB, NULL);
    return (0);
}
```
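For comparison, roughly the same two-thread structure can be written with the C++11 `std::thread` class, which is one of the language features that abstracts the OS threading API. This is a sketch, not a drop-in replacement. The function name `run_two_tasks` and the completion counter are additions for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <iostream>
#include <thread>

// Same two-thread structure as the pthread example, using std::thread.
// Returns the number of tasks that finished (added for illustration).
int run_two_tasks() {
    std::atomic<int> completed{0};
    std::thread threadA([&completed] {  // create and start Thread A
        std::cout << "Thread A complete\n";
        ++completed;
    });
    std::thread threadB([&completed] {  // create and start Thread B
        std::cout << "Thread B complete\n";
        ++completed;
    });
    threadA.join();  // wait for both threads, as pthread_join does
    threadB.join();
    assert(completed == 2);
    return completed;
}
```

The order of the two output lines is not guaranteed, which is the same non-determinism the pthread version has.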

```cpp
// Doing little things can make a big difference too.
// Uses <ppl.h> and <concurrent_vector.h> (Microsoft Concurrency Runtime);
// time_call() and fibonacci() are helper functions not shown on this slide.
array<int, 4> a = { 24, 26, 41, 42 };
vector<tuple<int, int>> results1;
concurrent_vector<tuple<int, int>> results2;

// Sequential version
elapsed = time_call([&] {
    for_each(a.begin(), a.end(), [&](int n) {
        results1.push_back(make_tuple(n, fibonacci(n)));
    });
});

// Parallel version
elapsed = time_call([&] {
    parallel_for_each(a.begin(), a.end(), [&](int n) {
        results2.push_back(make_tuple(n, fibonacci(n)));
    });
});
// a 4 core system outputs: 9250 ms, 5726 ms
```
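The slide does not show `time_call` or `fibonacci`. The versions below are a guess at what such helpers could look like, written with portable `std::chrono` rather than whatever the original used. The deliberately slow recursive Fibonacci just serves as a CPU-heavy workload.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Deliberately slow recursive Fibonacci, used only as a CPU-heavy workload.
int fibonacci(int n) {
    assert(n >= 0);
    return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <class Function>
std::int64_t time_call(Function&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start)
        .count();
}
```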

Testing
• Race conditions are the most prevalent problem
• Identify critical paths
• Balance threads and tweak for performance
• Non-determinism (for a given initial state, the final state is not uniquely determined)
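A small sketch of the most common race and its usual fix, using the standard `std::mutex` and `std::lock_guard`. The function name `collect_results` is hypothetical. Several threads append to one `std::vector`. Without the lock this would be a data race, and with it the final size is always the same.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Each thread appends items to one shared vector. push_back on a plain
// std::vector is not thread-safe, so every access is guarded by a mutex.
std::size_t collect_results(int num_threads, int items_per_thread) {
    assert(num_threads > 0 && items_per_thread >= 0);
    std::vector<int> results;  // shared, not thread-safe on its own
    std::mutex results_mutex;  // guards every access to `results`
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&results, &results_mutex, t, items_per_thread] {
            for (int i = 0; i < items_per_thread; ++i) {
                // The lock is released at the end of each loop iteration.
                std::lock_guard<std::mutex> lock(results_mutex);
                results.push_back(t * items_per_thread + i);
            }
        });
    }
    for (auto& w : workers) w.join();
    return results.size();
}
```

The element order still varies from run to run, so a test should check properties that do not depend on scheduling, such as the size or the sorted contents.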

Deployment
• Mostly the same
• See what platforms are actually using your program and tune as necessary

Maintenance
• Need to keep up with the changing tech (still pretty new)
• Adding new functionality will be more difficult, especially when it’s very different from what already exists
• Much more testing needed
• Going back to the original plan to see how new features fit in and what is affected is much more important

Maintenance (cont.)
• What about adding to an existing system?
  ◦ Very difficult
  ◦ Should focus on the largest time consumers (IO, disk, complex algorithms)
  ◦ Applications with low coupling are the best candidates for adding parallel aspects

Challenges
• Lots of planning needed
  ◦ Thorough understanding of the environment
• Very hard to debug
• Built-in support is hit-and-miss (language & IDE)
• Security concerns (from other programs as well as your own)
• A lot of life-critical embedded systems are sticking with single-core platforms

What apps can help me out?
• Intel’s Threading Building Blocks
• OpenMP
• Microsoft Visual Studio
• MULTI (Green Hills)
• TotalView (Rogue Wave)

Intel’s Threading Building Blocks
• Template library
  ◦ Algorithms, containers, mutexes, atomic statements, timing, scheduling
• Implements “task stealing”
  ◦ If one core is idle, it will take a scheduled task from another to reduce CPU waste
• Automatically creates the threads for you to maximize performance
  ◦ Much like parallel_for
• Tries to be like the STL
  ◦ Ease of use, generality, but more aggressive

Intel’s Threading Building Blocks (cont.)
• A bit more memory/cache oriented than the STL
• Intel knows their own cores and how to schedule on them
• Adds a lot more concurrency-oriented data types (concurrent_queue, concurrent_vector, concurrent_hash_map)
  ◦ Also geared for easy scalability
• More atomic operations (also from knowing their own cores)
• Follows a pipeline architecture, like graphics

OpenMP

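OpenMP
```cpp
#include <iostream>
#include <omp.h>
using namespace std;

int main()
{
    int th_id, nthreads;
    #pragma omp parallel private(th_id) shared(nthreads)
    {
        th_id = omp_get_thread_num();
        #pragma omp critical  // one thread at a time may print
        {
            cout << "Hello World from thread " << th_id << '\n';
        }
        #pragma omp barrier   // wait until every thread has printed
        #pragma omp master    // only the master thread runs this block
        {
            nthreads = omp_get_num_threads();
            cout << "There are " << nthreads << " threads" << '\n';
        }
    }
    return 0;
}
```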
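OpenMP is probably more often used on loops than on "hello world" regions. The sketch below sums a vector with the standard `parallel for` and `reduction` clauses. The function name `omp_sum` is made up for illustration. If the compiler is run without OpenMP enabled, the pragma is ignored and the loop simply runs sequentially with the same result.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Sums a vector with an OpenMP reduction. Each thread accumulates into a
// private copy of `sum`; OpenMP combines the copies when the loop ends.
long long omp_sum(const std::vector<int>& values) {
    // A signed int index is used because older OpenMP implementations
    // require a signed loop variable.
    assert(values.size() <= static_cast<std::size_t>(INT_MAX));
    const int n = static_cast<int>(values.size());
    long long sum = 0;
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += values[static_cast<std::size_t>(i)];
    }
    return sum;
}
```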

Microsoft Visual Studio
• Thread View

Microsoft Visual Studio (cont.)

MULTI IDE – Green Hills
• Cool debugging/recording features
http://www.ghs.com/products/MULTI_IDE.html

TotalView - Rogue Wave
• Thread viewer:

Sources:
• Buttari, Alfredo, Jack Dongarra, Jakub Kurzak, et al. The Impact of Multicore on Math Software.
• Hughes, Cameron, and Tracey Hughes. Professional Multicore Programming: Design and Implementation for C++ Developers. Indianapolis, IN: Wiley Pub., 2008.
• http://msdn.microsoft.com/en-us/concurrency/default.aspx
• http://channel9.msdn.com/search?term=concurrency
• http://www.cs.kent.edu/~farrell/amc09/lectures/

Any Questions?
• This all sounds like a lot of work. Why should we bother when something easier might come along?
  ◦ It’s very much a game of figuring out how much effort gets the largest returns.
  ◦ True progress will take both EEs and SEs (and CSs too, if any showed up today)
  ◦ Might be a long time before we see change