Unconventional applications of Intel Xeon Phi Processor KNL

Unconventional applications of Intel® Xeon Phi™ Processor (KNL)
Antonio Cisternino (@cisterni)



Intel® Xeon Phi™ Processor Knights Landing (KNL)
• Unlike its predecessor (Knights Corner), KNL offers full x86 support and is self-bootable
• Each core has two 512-bit-wide FPUs for vectorization
• The core is derived from Silvermont, though significantly changed
• Different modes suit different workloads, although using MCDRAM as a cache is usually preferred
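A back-of-the-envelope reading of the "two 512-bit FPUs" bullet (our arithmetic, not a figure from the slides), assuming double precision and one fused multiply-add (FMA) per FPU per cycle:

```latex
\[
\underbrace{2}_{\text{FPUs per core}} \times
\underbrace{\tfrac{512}{64}}_{\text{FP64 lanes per FPU}} \times
\underbrace{2}_{\text{FLOPs per FMA}}
= 32~\text{FP64 FLOPs per cycle per core}
\]
```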


How does the non-FPU part perform?
• We tested software stacks as far as possible from the typical HPC stack
• We used the following software stack:
  • Linux
  • Mono (open-source .NET implementation)
  • F# (using fsharpi, the language REPL)
  • Visual Studio Code + Ionide
  • Firefox
• This platform lets us stress both the CPU (JIT compiler and generated code) and memory (garbage collection)

Visual Studio Code running on KNL, accessed through the X Window System (data science using FSLab)


(Easier) parallel programming with the Task Parallel Library
Sequential vs. parallel execution: 52x speedup

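The slide shows only timings, not code. As a rough analogue of the pattern the Task Parallel Library enables (a sketch in Python, not the deck's F#/.NET code), the same per-item function is applied to every input, first in a plain loop and then across a pool of workers. The task `work` is hypothetical, and processes are used instead of threads because CPython threads do not run CPU-bound Python code in parallel.

```python
"""Sequential vs. parallel evaluation of the same loop (illustrative sketch)."""
import os
import time
from concurrent.futures import ProcessPoolExecutor


def work(n: int) -> int:
    """Hypothetical CPU-bound task: sum of squares of 0..n-1."""
    return sum(i * i for i in range(n))


def run_sequential(inputs):
    """Apply `work` to every input in a plain loop."""
    return [work(n) for n in inputs]


def run_parallel(inputs, workers=None):
    """Apply `work` across a pool of processes; results keep input order."""
    with ProcessPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(work, inputs, chunksize=4))


def timed(fn, *args):
    """Return (result, elapsed seconds) for fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [100_000] * 64  # 64 independent tasks of equal size
    seq, t_seq = timed(run_sequential, inputs)
    par, t_par = timed(run_parallel, inputs)
    assert seq == par  # same answers, different schedule
    print(f"sequential {t_seq:.2f}s, parallel {t_par:.2f}s, "
          f"speedup {t_seq / t_par:.1f}x on {os.cpu_count()} CPUs")
```

The speedup this prints depends on core count and task size; the 52x on the slide comes from the authors' .NET setup and is not something this sketch reproduces.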


A platform for microservices?
• We used the Suave F# web server to create a process exposing a simple function over HTTP
• We used the http_load tool to stress-test a set of URLs with different degrees of parallelism
• We tested a large number (up to 128) of web servers running on a Dell C6320p with an Intel® Xeon Phi™ Processor 7210 and on a Dell R730 with two Xeon E5-2680 v4 CPUs
• Measured:
  • Latency of connection and of data reception
  • Number of fetches made in 10 seconds by a number of concurrent threads across all the servers
• We also tried 256 web servers: the Xeon system failed to start all the instances properly, whereas KNL started all of them
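The measurements above were taken with http_load. The sketch below is not that tool; it is a minimal Python probe in the same spirit, assuming N client threads that fetch the URLs round-robin for a fixed duration and record, per fetch, the TCP connect time and the time to the first response byte. Measuring first-response time from the end of connect is our assumption about the metric's definition.

```python
"""Minimal http_load-style probe (a sketch; not the http_load tool itself)."""
import socket
import threading
import time
from statistics import mean
from urllib.parse import urlsplit


def fetch_once(url, timeout=5.0):
    """Return (connect_ms, first_response_ms) for one HTTP/1.0 GET."""
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    t0 = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        t_conn = time.perf_counter()
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        sock.recv(1)  # blocks until the first response byte arrives
        t_first = time.perf_counter()
        while sock.recv(65536):  # drain; an HTTP/1.0 server closes when done
            pass
    return (t_conn - t0) * 1000.0, (t_first - t_conn) * 1000.0


def run_load(urls, threads, duration_s):
    """Fetch `urls` from `threads` concurrent threads for `duration_s` seconds."""
    samples, errors = [], [0]
    lock = threading.Lock()
    deadline = time.perf_counter() + duration_s

    def client(offset):
        i = offset  # stagger start points so threads spread over the servers
        while time.perf_counter() < deadline:
            try:
                sample = fetch_once(urls[i % len(urls)])
            except OSError:  # refused, reset, or timed out
                with lock:
                    errors[0] += 1
            else:
                with lock:
                    samples.append(sample)
            i += 1

    workers = [threading.Thread(target=client, args=(k,)) for k in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    nan = float("nan")
    return {
        "fetches": len(samples),
        "errors": errors[0],
        "connect_ms_mean": mean(s[0] for s in samples) if samples else nan,
        "first_response_ms_mean": mean(s[1] for s in samples) if samples else nan,
    }
```

For example, `run_load([f"http://127.0.0.1:{8000 + i}/" for i in range(64)], threads=64, duration_s=10.0)` would approximate the 64-servers/64-threads setup, assuming (hypothetically) one server per port starting at 8000.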

The (trivial) web server

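The slide's server code is not reproduced in this text. As a stand-in, here is a Python analogue built only on the standard library's `http.server` (not Suave): one process exposes one hypothetical function, `square(x)`, over HTTP, and the port is a parameter so that many instances can run side by side.

```python
"""A trivial single-function web server (Python stand-in for the Suave one)."""
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlsplit


def square(x: int) -> int:
    """Hypothetical function exposed over HTTP."""
    return x * x


class FunctionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the argument from the query string, e.g. /?x=7
        query = parse_qs(urlsplit(self.path).query)
        try:
            x = int(query.get("x", ["0"])[0])
        except ValueError:
            self.send_error(400, "x must be an integer")
            return
        body = str(square(x)).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep stdout quiet under load


def make_server(port: int) -> ThreadingHTTPServer:
    """Bind one instance to localhost:port (port 0 picks a free port)."""
    return ThreadingHTTPServer(("127.0.0.1", port), FunctionHandler)
```

Each instance would run in its own process, e.g. `make_server(8000 + i).serve_forever()` for each i in 0..63; the port numbers are illustrative.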

64 web servers accessed by 64 threads
[Charts for Xeon Phi and for Xeon E5-2680 v4 (2x): msec/connect mean, msecs/first-response mean, and scaleup (connection and first-response)]

64 web servers accessed by 8 threads
[Charts for Xeon Phi (msec/connect max, msecs/first-response mean) and for Xeon E5-2680 v4 (2x) (msec/connect mean, msecs/first-response mean), each with connection and first-response scaleup]

Comparisons of KNL vs Xeon
[Charts for the 8/64, 64/64, 8/128, and 128/128 configurations: msecs/first-response mean and #Fet. KNL / #Fet. Xeon]
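The "#Fet. KNL / #Fet. Xeon" metric divides the number of fetches each machine completed in the same window. A small sketch of that computation, assuming configurations are labelled "threads/servers" (for example 8/64 meaning 8 threads against 64 servers) and that each machine reports one total per configuration; the values in any call are placeholders, not measurements from the slides.

```python
"""Fetch-count ratio KNL vs. Xeon per configuration (illustrative sketch)."""
from typing import Dict


def fetch_ratio(knl: Dict[str, int], xeon: Dict[str, int]) -> Dict[str, float]:
    """Return #fetches(KNL) / #fetches(Xeon) for every configuration both report.

    A ratio above 1.0 means KNL completed more fetches in the same window.
    Configurations missing on either side, or with zero Xeon fetches, are skipped.
    """
    return {
        cfg: knl[cfg] / xeon[cfg]
        for cfg in knl
        if cfg in xeon and xeon[cfg] > 0
    }
```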


Conclusions
• A single Intel® Xeon Phi™ Processor core is less powerful than a Xeon core (a Xeon core is around 5-6x faster)
• I/O-bound workloads may nevertheless benefit from the massively parallel architecture (including the benefits of MCDRAM)
• Mixed workloads are possible, with vector-aware code coordinated by more traditional languages
• Productivity benefits from mature software stacks
• The Intel® Xeon Phi™ Processor looks viable as a platform for microservices, especially thanks to its reduced power consumption