Parallel Programming Methodology
Intel Software College

Objectives
By the end of this module:
• You will be able to build a prototype and estimate the effort required to parallelize time-consuming regions

Threaded Programming Methodology. Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Agenda
• A generic development cycle
• Case study: prime number generation
• Some common performance problems

What Is Parallelism?
Two or more processes or threads execute at the same time.
Parallelism for multi-core architectures:
• Multiple processes
  – Communication through IPC (Inter-Process Communication)
• A single process, multiple threads
  – Communication through shared memory

Amdahl's Law
Describes the upper bound on speedup from parallel execution.

$T_{parallel} = \{(1 - P) + P/n\} \, T_{serial}$
$Speedup = T_{serial} / T_{parallel}$

Here P is the fraction of the serial run time that can be parallelized and n is the number of processors.

Example with P = 0.5:
• n = 2: $T_{parallel} = (0.5 + 0.25) \, T_{serial}$, so the speedup is 1.0/0.75 = 1.33
• n = ∞: $T_{parallel} = (0.5 + 0) \, T_{serial}$, so the speedup is 1.0/0.5 = 2.0

Serial code limits speedup.

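As a rough illustration of the formula above, the following sketch evaluates Amdahl's Law for a given parallel fraction. The function name AmdahlSpeedup and its parameter names are assumptions made for this example; they do not come from the course labs.

```cpp
#include <cassert>

// Hypothetical helper (not part of the course code): predicted speedup for a
// program whose fraction p of the serial run time can be parallelized, when it
// runs on n processors. Implements Speedup = 1 / ((1 - p) + p / n).
double AmdahlSpeedup(double p, double n)
{
    assert(p >= 0.0 && p <= 1.0);  // p is a fraction of the serial run time
    assert(n >= 1.0);              // at least one processor
    return 1.0 / ((1.0 - p) + p / n);
}
```

With p = 0.5, AmdahlSpeedup(0.5, 2) evaluates to about 1.33, and the result approaches 2.0 as n grows, which matches the figures on the slide.
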
Processes and Threads
[Figure: a process whose code segment and data segment are shared by the main() thread and the other threads, each thread with its own stack]
Modern operating systems load programs as processes
• Holds resources
• Execution
A process starts by executing its entry point as a thread.
Threads can create other threads within the process
• Each thread gets its own stack
All threads within a process share the code and data segments.

Threads – Benefits and Risks
Benefits
• Higher performance and better resource utilization
  – Even on single-processor systems, to hide latency and improve responsiveness
• Communication through shared memory is more efficient
Risks
• Increases application complexity
• Hard to debug (race conditions, deadlocks, etc.)

Common Questions When Parallelizing Applications
• Where to parallelize?
• How long does it take to parallelize?
• How much redesign effort is required?
• Is it worth parallelizing a specific region?
• How much speedup is expected?
• Does the performance meet my expectations?
• Will it scale as more threads or data are added?
• Which parallelization model should be used?

Prime Number Generation
[Figure: table listing each odd candidate i from 61 to 79 alongside the factors tested for it]

    bool TestForPrime(int val)
    {
        // let's start checking from 3
        int limit, factor = 3;
        limit = (long)(sqrtf((float)val) + 0.5f);
        while ((factor <= limit) && (val % factor))
            factor++;
        return (factor > limit);
    }

    void FindPrimes(int start, int end)
    {
        int range = end - start + 1;
        for (int i = start; i <= end; i += 2) {
            if (TestForPrime(i))
                globalPrimes[gPrimesFound++] = i;
            ShowProgress(i, range);
        }
    }

Activity 1
Run the serial version of the prime number code
• Locate the PrimeSingle directory
• Compile with the Intel compiler
• Run it a few times with different ranges

Development Methodology
Analysis
• Find the code where compute-intensive work is done
Design (introduce threads)
• Determine how to implement a parallel solution
Debug
• Detect any problems that result from using threads
Tune for performance
• Achieve the best parallel performance

Development Cycle
Analysis
  – VTune™ Performance Analyzer
Design (introduce threads)
  – Intel® Performance Libraries: IPP and MKL
  – OpenMP* (Intel® Compiler)
  – Explicit threading (Win32*, Pthreads*)
Debugging
  – Intel® Thread Checker
  – Intel Debugger
Tune for performance
  – Intel® Thread Profiler
  – VTune™ Performance Analyzer

Analysis – Sampling
Use VTune Sampling on the application to find hotspots.
We will use the PrimeSingle project for the analysis
• Usage: PrimeSingle <start> <end>, for example ./PrimeSingle 1 1000000

    bool TestForPrime(int val)
    {
        // let's start checking from 3
        int limit, factor = 3;
        limit = (long)(sqrtf((float)val) + 0.5f);
        while ((factor <= limit) && (val % factor))
            factor++;
        return (factor > limit);
    }

    void FindPrimes(int start, int end)
    {
        // start is always odd
        int range = end - start + 1;
        for (int i = start; i <= end; i += 2) {
            if (TestForPrime(i))
                globalPrimes[gPrimesFound++] = i;
            ShowProgress(i, range);
        }
    }

Sampling identifies the time-consuming regions.

Analysis – Call Graph
[Figure: call graph, annotated "This is the level in the call tree where we need to parallelize"]
Used to find the appropriate level in the call tree at which to parallelize.

Analysis
Where to parallelize?
• FindPrimes()
Is it worth parallelizing the selected region?
• It appears to have minimal dependencies
• It appears to be data parallel
• It consumes over 95% of the run time
Baseline measurement

Activity 2
Run the code with the range '1 5000000' to obtain the baseline measurement
• Note it down for future reference
Run the VTune analysis tool on the serial code
• Which function takes the most time?

Foster's Design Methodology
From "Designing and Building Parallel Programs" by Ian Foster
Four steps:
• Partitioning
  – Divide computation and data
• Communication
  – Exchange data between computations
• Agglomeration
  – Group tasks to improve performance
• Mapping
  – Assign tasks to processors/threads

Designing Parallel Programs
Partition
• Divide the problem into tasks
Communicate
• Determine the amount and pattern of communication
Agglomerate
• Combine tasks
Map
• Assign the agglomerated tasks to the created threads
[Figure: Problem → Initial tasks → Communication → Combined tasks → Final program]

Parallel Programming Models
Functional decomposition
• Task parallelism
• Divide the computation, then associate the data
• Independent tasks of the same problem
Data decomposition
• The same operation applied to different data
• Divide the data into pieces, then associate the computation

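To make "divide the data into pieces" concrete, here is a minimal sketch that splits an index range into nearly equal contiguous pieces, one per task. The Range type and the PartitionRange function are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Half-open index range [begin, end) owned by one task or thread.
struct Range {
    int begin;
    int end;
};

// Hypothetical helper: split [begin, end) into `parts` contiguous pieces whose
// sizes differ by at most one element.
std::vector<Range> PartitionRange(int begin, int end, int parts)
{
    assert(begin <= end);
    assert(parts > 0);
    const int total = end - begin;
    const int base = total / parts;   // minimum size of every piece
    const int extra = total % parts;  // the first `extra` pieces get one more
    std::vector<Range> pieces;
    pieces.reserve(parts);
    int next = begin;
    for (int p = 0; p < parts; ++p) {
        const int size = base + (p < extra ? 1 : 0);
        pieces.push_back(Range{next, next + size});
        next += size;
    }
    return pieces;
}
```

For example, PartitionRange(0, 10, 3) yields [0, 4), [4, 7), and [7, 10). Each piece could then be handed to a thread that applies the same operation to its own slice.
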
Decomposition Methods
Functional decomposition
• Focusing on the computation can reveal structure in a problem
[Figure: atmosphere model, hydrology model, ocean model, and land model. Grid reprinted with permission of Dr. Phu V. Luong, Coastal and Hydraulics Laboratory, ERDC]
Domain decomposition
• Focus on the largest or most frequently accessed data structure
• Data parallelism
  – The same operation applied to all the data

Pipeline Decomposition
The computation is done in independent stages.
Functional decomposition
• Threads are assigned to a stage to compute
• Automobile assembly line
Data decomposition
• Each thread processes all stages of a single instance
• One worker builds an entire car

LAME Encoder Strategy
LAME MP3 encoder
• Open source project
• Educational tool
The goal of this project is to
• Improve quality
• Improve the speed of MP3 encoding

LAME Pipeline Strategy
Pipeline stages for each frame:
• Prelude: fetch the next frame, characterize the frame, set the encoder parameters
• Acoustics: analysis, FFT long/short, assemble the filter
• Encoding: apply filters, suppress noise, quantize and count bits
• Other: add the frame header, check correctness, write to disk
[Figure: pipeline timeline for T1 through T4, synchronized by a hierarchical barrier; Prelude N+3, Acoustics N+2, Encoding N+1, and Other N overlap as frames N and N+1 complete]

Design
What is the expected benefit?
• Speedup(2P) = 100/(96/2 + 4) = ~1.92X, that is, 96% of the run time split across two processors plus the 4% that stays serial
How do we achieve this with the least effort?
• Rapid prototyping with OpenMP
How long does it take to parallelize?
How much redesign effort is required?

OpenMP
Fork-join parallelism:
• The master thread spawns a team of threads as needed
• Parallelism is added incrementally
• A sequential program evolves into a parallel program
[Figure: master thread forking into parallel regions]

Design

    #pragma omp parallel for
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i))
            globalPrimes[gPrimesFound++] = i;
        ShowProgress(i, range);
    }

• parallel: OpenMP creates threads here for this parallel region
• for: divides the iterations of the for loop among those threads

Activity 3
Run the OpenMP version of the code
• Locate the PrimeOpenMP directory and the solution
• Compile the code
• Run with '1 5000000' to compare
• What is the speedup?

Design
What is the expected benefit?
How do you achieve this with the least effort?
• Speedup of 1.40X (less than 1.92X)
How long did it take to parallelize?
How much redesign effort is required?
Is this the best possible speedup?

Debugging
Is the parallel implementation correct?
No! The results are different on every run…

Debugging
Intel® Thread Checker pinpoints notorious threading errors such as race conditions, stalls, and deadlocks.
[Figure: workflow inside VTune™ Performance Analyzer: Primes.exe → binary instrumentation by Intel® Thread Checker → instrumented Primes.exe + instrumented DLLs → runtime data collector → threadchecker.thr (result file)]

Thread Checker
[Figure: Thread Checker screenshot]

Activity 4
Use Thread Checker to analyze the parallelized application
• Create a "Thread Checker activity"
• Run the application
• Are any errors reported?

Debugging
How much effort is required to redesign?
• Thread Checker reported only 2 dependencies, so the effort required should be low
How long will it take to parallelize?

Debugging

    #pragma omp parallel for
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i))
    #pragma omp critical
            globalPrimes[gPrimesFound++] = i;
        ShowProgress(i, range);
    }

• The critical pragma in the loop creates a critical section for this one reference (the update of globalPrimes and gPrimesFound)

    #pragma omp critical
    {
        gProgress++;
        // cast before dividing so the ratio is not truncated to an integer
        percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
    }

• The critical block in ShowProgress creates a critical section for both references

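The fragments above depend on globals defined elsewhere in the lab code. A self-contained sketch of the same idea follows, assuming a std::vector in place of the globalPrimes/gPrimesFound pair and omitting ShowProgress; the name FindPrimesCritical is hypothetical. If the compiler's OpenMP option is not enabled, the pragmas are ignored and the function simply runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Same trial-division test as in the slides; valid for odd val >= 3.
static bool TestForPrime(int val)
{
    int factor = 3;
    const int limit = (int)(std::sqrt((float)val) + 0.5f);
    while ((factor <= limit) && (val % factor))
        factor++;
    return (factor > limit);
}

// Sketch: collect the primes among the odd numbers in [start, end].
// A std::vector stands in for the slides' globalPrimes/gPrimesFound pair.
std::vector<int> FindPrimesCritical(int start, int end)
{
    assert(start >= 3 && start % 2 == 1);  // start is always odd
    std::vector<int> primes;
    #pragma omp parallel for
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i)) {
            // Only one thread at a time may modify the shared container.
            #pragma omp critical
            primes.push_back(i);
        }
    }
    std::sort(primes.begin(), primes.end());  // threads finish in any order
    return primes;
}
```

Because every thread serializes on the same critical section, the result is now deterministic, but the lock sits inside the hot loop, which is the cost the later slides set out to reduce.
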
Activity 5
Modify and run the OpenMP version of the code
• Add critical region pragmas to the code
• Compile the code
• Run it within Thread Checker
  – If errors remain, make the appropriate corrections to the code and run it again in Thread Checker
• Run with '1 5000000' for comparison
  – Compile and run outside Thread Checker
  – What is the speedup?

Debugging
The answer is correct, but the speedup dropped to ~1.33X.
Is this the best we can expect from this algorithm?
No! According to Amdahl's Law, we can expect a speedup close to 1.9X.

Common Performance Problems
Parallel overhead
• Caused by thread creation, scheduling…
Synchronization
• Excessive global data, contention for the same synchronization objects
Load imbalance
• Improper distribution of the parallel work
Granularity
• Not enough parallel work

Tuning for Performance
Thread Profiler pinpoints bottlenecks in parallel applications.
[Figure: workflow inside VTune™ Performance Analyzer: Primes.c → compiler with /Qopenmp_profile (source instrumentation) → Primes.exe → binary instrumentation by Thread Profiler → instrumented Primes.exe + instrumented DLLs → runtime data collector → Bistro.tp/guide.gvs (result files)]

Thread Profiler for OpenMP
[Figure: Thread Profiler screenshot]

Thread Profiler for OpenMP
Speedup graph
• Estimates the threading speedup and the potential speedup
  – Based on Amdahl's Law
• Gives lower and upper bounds

Thread Profiler for OpenMP
[Figure: timeline with regions labeled serial, parallel, serial]

Thread Profiler for OpenMP
[Figure: per-thread view for Thread 0, Thread 1, Thread 2, and Thread 3]

Thread Profiler (for Explicit Threads)
[Figure: Thread Profiler screenshot]

Thread Profiler (for Explicit Threads)
Why so many transitions?

Performance
This implementation has implicit synchronization calls.
This limits performance scaling because of the resulting context switches.
Back to the design stage.

Activity 6
Use Thread Profiler to analyze a parallelized application
• Use /Qopenmp_profile to compile and link
• Create a "Thread Profiler Activity (for explicit threads)"
• Run the application in Thread Profiler
• Find the line in the source code that is causing the threads to be idle

Performance
Is this much contention expected?

Original version:

    void ShowProgress(int val, int range)
    {
        int percentDone;
    #pragma omp critical
        {
            gProgress++;
            percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
        }
        if (percentDone % 10 == 0)
            printf("\b\b\b\b%3d%%", percentDone);
    }

• This algorithm performs many more updates than the 10 needed to show progress

Revised version:

    void ShowProgress(int val, int range)
    {
        int percentDone;
        static int lastPercentDone = 0;
    #pragma omp critical
        {
            gProgress++;
            percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
        }
        // print only the first time each 10% step is reached
        if (percentDone % 10 == 0 && lastPercentDone < percentDone / 10) {
            printf("\b\b\b\b%3d%%", percentDone);
            lastPercentDone++;
        }
    }

• This change should fix the contention problem

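To see how many redundant updates the original test produces, the following serial sketch counts the printf calls each variant would make instead of printing. CountProgressPrints and its throttled flag are hypothetical names used only for this illustration; the percentage arithmetic is copied from the slide.

```cpp
#include <cassert>

// Hypothetical serial model of ShowProgress: instead of printing, count how
// many times printf would be called while the loop walks the odd numbers in
// a span of `range` values.
int CountProgressPrints(int range, bool throttled)
{
    assert(range > 0);
    int progress = 0;         // plays the role of gProgress
    int lastPercentDone = 0;  // plays the role of the static in the slide
    int prints = 0;
    for (int i = 1; i <= range; i += 2) {
        progress++;
        const int percentDone =
            (int)((float)progress / (float)range * 200.0f + 0.5f);
        if (percentDone % 10 != 0)
            continue;
        if (!throttled) {
            prints++;  // original test: print on every multiple of 10
        } else if (lastPercentDone < percentDone / 10) {
            prints++;  // revised test: print once per 10% step
            lastPercentDone++;
        }
    }
    return prints;
}
```

For a range of 1000, the original condition fires 50 times while the revised one fires 10 times, and the gap widens with the range, since the unthrottled count grows with the number of iterations while the throttled count stays at 10.
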
Design
Goals
• Eliminate the implicit contention caused by synchronization
The speedup is 2.32X! Is that correct?

Performance
Our baseline measurement used the "flawed" progress update algorithm.
Is this the best we can expect from this algorithm?
The actual speedup is 1.40X (<<1.9X)!

Activity 7
Modify the ShowProgress function (serial and OpenMP) so that it prints only the output that is needed

    if (percentDone % 10 == 0 && lastPercentDone < percentDone / 10) {
        printf("\b\b\b\b%3d%%", percentDone);
        lastPercentDone++;
    }

• Recompile and run the code
  – Make sure that no instrumentation flags are used
• What is the speedup over the serial version?

Revisiting Performance
Still 62% of the execution time is spent in locks and synchronization.

Revisiting Performance
Let's look at the OpenMP locks…

    void FindPrimes(int start, int end)
    {
        // start is always odd
        int range = end - start + 1;
    #pragma omp parallel for
        for (int i = start; i <= end; i += 2) {
            if (TestForPrime(i))
    #pragma omp critical
                globalPrimes[gPrimesFound++] = i;
            ShowProgress(i, range);
        }
    }

• The lock is inside a loop

Replacement for the critical section and the statement it guards:

    // InterlockedIncrement returns the incremented value, so subtract 1
    // to keep the same 0-based index that gPrimesFound++ produced
    globalPrimes[InterlockedIncrement(&gPrimesFound) - 1] = i;

Revisiting Performance
Let's look at the second lock

    void ShowProgress(int val, int range)
    {
        int percentDone;
        static int lastPercentDone = 0;
    #pragma omp critical
        {
            gProgress++;
            percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
        }
        if (percentDone % 10 == 0 && lastPercentDone < percentDone / 10) {
            printf("\b\b\b\b%3d%%", percentDone);
            lastPercentDone++;
        }
    }

• This lock is also being called inside a loop

Version with the critical section replaced:

    void ShowProgress(int val, int range)
    {
        long percentDone, localProgress;
        static int lastPercentDone = 0;
        localProgress = InterlockedIncrement(&gProgress);
        percentDone = (int)((float)localProgress/(float)range*200.0f+0.5f);
        if (percentDone % 10 == 0 && lastPercentDone < percentDone / 10) {
            printf("\b\b\b\b%3ld%%", percentDone);  // percentDone is a long
            lastPercentDone++;
        }
    }

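InterlockedIncrement is a Win32 call. As a portable stand-in, and not what the slides use, the sketch below swaps in OpenMP's atomic capture construct, which likewise replaces a lock with a single atomic read-and-increment. The function name FindPrimesAtomic and the preallocated std::vector are assumptions made for this example; without OpenMP enabled the pragmas are ignored and the code runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Same trial-division test as in the slides; valid for odd val >= 3.
static bool TestForPrime(int val)
{
    int factor = 3;
    const int limit = (int)(std::sqrt((float)val) + 0.5f);
    while ((factor <= limit) && (val % factor))
        factor++;
    return (factor > limit);
}

// Sketch: each thread reserves its output slot with one atomic operation
// instead of entering a critical section.
std::vector<int> FindPrimesAtomic(int start, int end)
{
    assert(start >= 3 && start % 2 == 1);  // start is always odd
    if (end < start)
        return std::vector<int>();
    // One slot per candidate is an upper bound on the number of primes.
    std::vector<int> primes((end - start) / 2 + 1);
    int* slots = primes.data();
    int found = 0;  // plays the role of gPrimesFound
    #pragma omp parallel for
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i)) {
            int mySlot;
            // Atomically read the old counter value and increment it.
            #pragma omp atomic capture
            mySlot = found++;
            slots[mySlot] = i;
        }
    }
    primes.resize(found);
    std::sort(primes.begin(), primes.end());  // threads finish in any order
    return primes;
}
```

Note that the capture form yields the value before the increment, so the slot index is 0-based without any adjustment.
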
Activity 8
Modify the OpenMP critical regions, replacing them with InterlockedIncrement
• Recompile and run the code
• What is the speedup relative to the serial version?

Thread Profiler for OpenMP
[Figure: the range up to 1000000 split into four contiguous blocks, with one sample value per thread]
• Thread 0 (block ending at 250000): 342 factors to test for 116747
• Thread 1 (block ending at 500000): 612 factors to test for 373553
• Thread 2 (block ending at 750000): 789 factors to test for 623759
• Thread 3 (block ending at 1000000): 934 factors to test for 873913

Fixing the Load Imbalance
Distribute the work more evenly

    void FindPrimes(int start, int end)
    {
        // start is always odd
        int range = end - start + 1;
    #pragma omp parallel for schedule(static, 8)
        for (int i = start; i <= end; i += 2) {
            if (TestForPrime(i))
                // InterlockedIncrement returns the incremented value, so
                // subtract 1 to keep a 0-based index
                globalPrimes[InterlockedIncrement(&gPrimesFound) - 1] = i;
            ShowProgress(i, range);
        }
    }

The speedup achieved is 1.68X.

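The effect of schedule(static, 8) can be estimated without running any threads. With a static schedule and a chunk size, OpenMP deals chunks of iterations to the threads round-robin in thread-number order. The sketch below models that mapping and compares it with one contiguous block per thread, which is assumed here as the behavior without a schedule clause (the actual default is implementation-defined). All function names are hypothetical, and work is measured as the number of trial divisions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Trial divisions the slides' TestForPrime performs for an odd val >= 3.
static long DivisionsFor(int val)
{
    int factor = 3;
    const int limit = (int)(std::sqrt((float)val) + 0.5f);
    long divisions = 0;
    while (factor <= limit) {
        divisions++;
        if (val % factor == 0)
            break;
        factor++;
    }
    return divisions;
}

// Thread that owns logical iteration k under schedule(static, chunk):
// chunks of `chunk` iterations are dealt to the threads round-robin.
int StaticChunkOwner(long k, int chunk, int threads)
{
    assert(k >= 0 && chunk > 0 && threads > 0);
    return (int)((k / chunk) % threads);
}

// Thread that owns iteration k when `total` iterations are split into one
// contiguous block per thread.
int BlockOwner(long k, long total, int threads)
{
    assert(k >= 0 && k < total && threads > 0);
    const long blockSize = (total + threads - 1) / threads;
    return (int)(k / blockSize);
}

// Ratio of the busiest thread's work to the least busy thread's work over the
// odd numbers in [3, end]; 1.0 means perfectly balanced. Pass chunk <= 0 to
// select contiguous blocks. Meaningful only for ranges much larger than
// threads * chunk.
double Imbalance(int end, int threads, int chunk)
{
    assert(end >= 3 && threads > 0);
    const long total = (end - 3) / 2 + 1;
    std::vector<long> work(threads, 0L);
    for (long k = 0; k < total; ++k) {
        const int owner = chunk > 0 ? StaticChunkOwner(k, chunk, threads)
                                    : BlockOwner(k, total, threads);
        work[owner] += DivisionsFor((int)(3 + 2 * k));
    }
    const long busiest = *std::max_element(work.begin(), work.end());
    const long leastBusy = *std::min_element(work.begin(), work.end());
    return (double)busiest / (double)leastBusy;
}
```

Comparing Imbalance(200001, 4, 0) with Imbalance(200001, 4, 8) shows the blocked split leaving the last thread with far more divisions than the first, while the chunked split spreads large and small candidates across all four threads.
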
Activity 9
Modify the code to improve the load balance
• Add the schedule(static, 8) clause to the OpenMP parallel for pragma
• Recompile and run the code
• What is the speedup relative to the serial code?
Final Thread Profiler Run
The speedup achieved is 1.80X
Comparative Analysis
Parallel applications require several iterations through the software development cycle
Parallel Programming Methodology: What Was Covered
Four steps of the development cycle for writing parallel applications starting from serial code, and the Intel® tools that support each step
• Analysis
• Design (introduce threads)
• Debug for correctness
• Tune for performance
Parallel applications require multiple iterations of design, debugging, and performance tuning
Use the tools to improve productivity
Additional Slides
Parallel Overhead
Thread creation overhead
• Overhead increases as the number of active threads increases
Solution
• Use reusable threads and thread pools
• Amortizes the cost of creating threads
• Keeps the number of active threads relatively constant
Synchronization
Contention from dynamic memory allocation
• Dynamic memory allocation causes implicit synchronization
• Allocate on the stack to use thread-local storage
Atomic updates versus critical sections
• Some updates to global data can use atomic operations (the Interlocked family)
• Use atomic updates whenever possible
Critical sections versus mutexes
• Critical Section objects reside in user space
• Use CRITICAL_SECTION objects when visibility beyond process boundaries is not required
• Introduces less overhead
• Has a spin-wait variant that is useful for some applications
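As a small illustration of the first two points, the sketch below keeps a per-thread counter on the stack and merges it into the shared total with a single atomic update per thread, rather than synchronizing on every iteration. It uses OpenMP pragmas and also runs correctly as serial code when OpenMP is disabled. The names is_prime and count_odd_primes are hypothetical and not part of the course code.

```c
#include <assert.h>

/* Simple trial-division primality test (illustrative only). */
static int is_prime(int n)
{
    if (n < 2) return 0;
    for (int d = 2; (long)d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

/* Count primes among the odd values start, start+2, ..., up to end. */
static long count_odd_primes(int start, int end)
{
    assert(start % 2 != 0);          /* start is expected to be odd */
    long total = 0;                  /* shared by all threads */

    #pragma omp parallel
    {
        long local = 0;              /* private: lives on this thread's stack */

        #pragma omp for nowait
        for (int i = start; i <= end; i += 2)
            if (is_prime(i))
                local++;             /* no synchronization needed here */

        #pragma omp atomic
        total += local;              /* one atomic update per thread */
    }
    return total;
}
```

For example, count_odd_primes(3, 99) counts the 24 odd primes below 100 while touching the shared variable only once per thread.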
Unbalanced Work
Unequal workloads lead to idle threads and wasted time
[Figure: per-thread timeline. Axis: Time. Legend: Busy, Idle]
Granularity
Coarse grain (serial portion plus parallelizable portion). Scaling: ~2.5X, ~3X
Fine grain (serial portion plus parallelizable portion). Scaling: ~1.05X, ~1.10X