AVX-512 (Advanced Vector Extensions)

- Slides: 11

AVX-512 (Advanced Vector Extensions)

1. F (Foundation)
2. CDI (Conflict Detection Instructions)
3. ERI (Exponential and Reciprocal Instructions)
4. PFI (Prefetch Instructions)
5. BW (Byte and Word Instructions)
6. DQ (Doubleword and Quadword Instructions)
7. VL (Vector Length Extensions)

Diagram: instruction sets supported by each processor family.
KNL Xeon Phi: SSE, AVX, AVX2, AVX-512 F, CDI, ERI & PFI
Skylake Xeon: SSE, AVX, AVX2, AVX-512 F, CDI, AVX-512 VL, BW, DQ

MIEM NRU HSE, 2018
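
Because the two processor families expose different AVX-512 subsets, code that selects an implementation at run time has to query the CPU first. The following is a minimal sketch of such a query, assuming GCC or Clang on x86 and their `__builtin_cpu_supports` builtin. The flag names (`HAS_F` and so on) and the function names are invented for this illustration and are not part of any library discussed here. ERI and PFI are left out because not every compiler version may accept those feature names.

```c
#include <stdio.h>

/* Bit flags for the AVX-512 subsets checked below.
   These names are local to this sketch (hypothetical). */
enum {
    HAS_F  = 1 << 0,  /* Foundation */
    HAS_CD = 1 << 1,  /* Conflict Detection */
    HAS_VL = 1 << 2,  /* Vector Length Extensions */
    HAS_BW = 1 << 3,  /* Byte and Word */
    HAS_DQ = 1 << 4   /* Doubleword and Quadword */
};

/* Returns a bitmask of the AVX-512 subsets reported for the running CPU.
   On non-x86 targets, or with other compilers, it returns 0. */
unsigned avx512_subsets(void)
{
    unsigned mask = 0;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))  mask |= HAS_F;
    if (__builtin_cpu_supports("avx512cd")) mask |= HAS_CD;
    if (__builtin_cpu_supports("avx512vl")) mask |= HAS_VL;
    if (__builtin_cpu_supports("avx512bw")) mask |= HAS_BW;
    if (__builtin_cpu_supports("avx512dq")) mask |= HAS_DQ;
#endif
    return mask;
}

/* Prints one line per subset: 1 if reported, 0 otherwise. */
void print_avx512_subsets(void)
{
    unsigned m = avx512_subsets();
    printf("AVX-512 F : %d\n", (m & HAS_F)  != 0);
    printf("AVX-512 CD: %d\n", (m & HAS_CD) != 0);
    printf("AVX-512 VL: %d\n", (m & HAS_VL) != 0);
    printf("AVX-512 BW: %d\n", (m & HAS_BW) != 0);
    printf("AVX-512 DQ: %d\n", (m & HAS_DQ) != 0);
}
```

A dispatcher could call `avx512_subsets()` once at start-up and fall back to an AVX2, SSE or ANSI C code path when `HAS_F` is not set.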

Pseudorandom number generator libraries

Library                 Generators                                        Implementations             Parallel streams
GNU Scientific Library  GFSR4, TAUS2, RANLXS0, RANLXS1, RANLXS2,          ANSI C
                        MT19937, MRG
Intel MKL Library       GFSR, MRG32k3a, MT19937, MT2203, SFMT19937,       SSE2, SSE3, SSSE3
                        PHILOX4X32X10, ARS5
RNGSSELIB               MT19937, MRG32K3A, LFSR113, GM19, GM31, GM61      ANSI C, SSE2
SPRNG                   PMLCG, LFG, MLFG, CMRG, PMLCG                     C++
TRNG                    MRG_, MRG_S, YARN_S, LAGFIB_XOR, LAGFIB_PLUS,     ANSI C, CUDA
                        MT19937
RNGAVXLIB               GM19, GM31, GM61, GM29, GM55, GQ58.1, GQ58.3,     ANSI C, SSE2, SSE4.1, AVX2
                        GQ58.4, MT19937, MRG32K3A, LFSR113

RNGAVXLIB (AVX2)

Generators: GM19, GM29, GM31, GM55, GM61, GQ58.3, GQ58.4, LFSR113, MRG32K3A, MT19937
Interface:
    void rng_init_(rng_state* state);
    void rng_skipahead_(rng_state* state, unsigned long offset);
    unsigned int rng_ansi_generate_(rng_state* state);
    unsigned int rng_sse_generate_(rng_state* state);
    unsigned int rng_avx_generate_(rng_state* state);
SIMD: ANSI C, SSE2, SSE4.1, AVX2
Languages: C, Fortran

Chart: speed (Gbit/sec) by generator (GM19, GM29, GM31, GM55, GM61, GQ58.3, GQ58.4, LFSR113, MRG32K3A, MT19937).
Intel Core i7-4790K (4 GHz); Compiler: gcc; Optimization level: -O3
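
The init / skipahead / generate pattern of the interface can be illustrated with a toy generator. The sketch below is not RNGAVXLIB code: it uses a plain 32-bit linear congruential generator as a stand-in, and the names `demo_state`, `demo_init_`, `demo_skipahead_` and `demo_ansi_generate_` are hypothetical, chosen only to mirror the shape of the prototypes on the slide. It shows one common way to implement skip-ahead: composing the step x -> a*x + c with itself by repeated squaring, so that jumping forward by `offset` steps costs O(log offset) operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in state: a 32-bit LCG, NOT one of the library's generators. */
typedef struct { uint32_t x; } demo_state;

#define DEMO_A 1664525u      /* LCG multiplier (illustrative choice) */
#define DEMO_C 1013904223u   /* LCG increment  (illustrative choice) */

/* Puts the generator into a fixed initial state. */
void demo_init_(demo_state* state)
{
    assert(state != NULL);
    state->x = 1u;
}

/* Advances the state by one step and returns the new 32-bit value. */
unsigned int demo_ansi_generate_(demo_state* state)
{
    assert(state != NULL);
    state->x = DEMO_A * state->x + DEMO_C;   /* wraps modulo 2^32 */
    return state->x;
}

/* Advances the state by `offset` steps without generating the values in between.
   (A, C) accumulates the affine map x -> A*x + C for the steps taken so far;
   (a, c) holds the map for 2^i steps and is squared on every iteration. */
void demo_skipahead_(demo_state* state, unsigned long offset)
{
    uint32_t A = 1u, C = 0u;            /* identity map */
    uint32_t a = DEMO_A, c = DEMO_C;    /* map for a single step */
    assert(state != NULL);
    while (offset != 0) {
        if (offset & 1ul) {             /* apply the 2^i-step map after (A, C) */
            A = A * a;
            C = C * a + c;
        }
        c = (a + 1u) * c;               /* compose the 2^i-step map with itself */
        a = a * a;
        offset >>= 1;
    }
    state->x = A * state->x + C;
}
```

With such a function, parallel stream number k can be started by calling the init function and then skipping ahead by k times a fixed block length, so that the streams draw from non-overlapping parts of one sequence.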

Results 1
Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30 GHz; Compiler: gcc; Optimization level: -O0

Generator       ANSI C   SSE    AVX2   AVX512
GM19            0.02     0.30   0.46   0.82
GM29            0.02     0.36   0.50   0.90
GM31            0.01     0.22   0.38   0.67
GM55            0.08     0.48   0.53   0.72
GM61            0.01     0.12   0.18   0.33
GQ58X1          0.02     0.14   0.21   0.36
GQ58X3          0.05     0.31   0.33   0.37
GQ58X4          0.09     0.42   0.46   0.66
MRG32K3A        0.18     0.83   0.92   1.07
MT19937         0.30     1.36   1.45   1.47
LFSR113         0.42     1.15   2.48   4.04
PHILOX4X32X10   0.24     0.59   0.81   0.98

Results 2
Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30 GHz; Compiler: gcc; Optimization level: -O3

Generator       ANSI C   SSE    AVX2   AVX512
GM19            0.02     0.29   0.43   0.70
GM29            0.02     0.34   0.45   0.74
GM31            0.01     0.22   0.36   0.57
GM55            0.06     0.43   0.44   0.65
GM61            0.01, 0.17, 0.30 (only three values survive; their columns are uncertain)
GQ58X1          0.02     0.14   0.21   0.34
GQ58X3          0.03, 0.30, 0.33 (only three values survive; their columns are uncertain)
GQ58X4          0.06     0.38   0.40   0.59
MRG32K3A        0.97     1.37   1.71   2.57
MT19937         1.57     3.16   3.80   4.26
LFSR113         1.89     1.41   1.52   6.27
PHILOX4X32X10   1.35     0.97   1.51   2.33
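
To compare columns of the results, one can divide the AVX512 entry by the AVX2 entry for each generator. The sketch below does this for four rows of the -O3 results that have all four values. It assumes the entries are throughputs (larger is better), which the results slides do not state explicitly. The type and function names are invented for this illustration.

```c
#include <stdio.h>

/* One row of the -O3 results: generator name, AVX2 and AVX512 entries. */
typedef struct {
    const char* name;
    double avx2;
    double avx512;
} result_row;

/* Ratio of the AVX512 entry to the AVX2 entry. */
double column_ratio(double avx512, double avx2)
{
    return avx512 / avx2;
}

/* Prints the ratio for a few complete rows of the -O3 results. */
void print_ratios(void)
{
    static const result_row rows[] = {
        { "MRG32K3A",      1.71, 2.57 },
        { "MT19937",       3.80, 4.26 },
        { "LFSR113",       1.52, 6.27 },
        { "PHILOX4X32X10", 1.51, 2.33 }
    };
    size_t n = sizeof rows / sizeof rows[0];
    for (size_t i = 0; i < n; i++) {
        printf("%-14s AVX512/AVX2 = %.2f\n",
               rows[i].name, column_ratio(rows[i].avx512, rows[i].avx2));
    }
}
```

Under that assumption, a ratio near 1 (as for MT19937) would mean the wider vectors bring little extra speed, while a ratio near 4 (as for LFSR113) would indicate a large gain.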