Condor: Overview and User Guide to the Condor Biostatistics Environment
Authorship • Authors – Patrícia Kayser Vargas – September 2002 – Talk at Biostat, Wisconsin, USA • Revisions – V1 • C. Geyer • PDP/2005-2, PPGC, UFRGS • December 2005
Topics • Introduction – What is Condor? – Why and when to use Condor? – What are Condor universes? • Running Jobs on Condor – C programs • YAP – Java programs • Final Remarks
Introduction
What is Condor? • Condor – is a distributed batch scheduling system • "The goal of Condor is to provide the highest feasible throughput by executing the most jobs over extended periods of time." [1] • What is a job? – There are several possibilities
What is Condor? • Condor – is composed of a collection of different daemons that provide various services, such as • a job queueing mechanism, • scheduling policies, • a priority scheme, • monitoring, • resource management, • job management, • matchmaking...
What is Condor? Architecture (figure from [1])
What is Condor? Architecture • Machine types – Central manager • Manager of a Condor pool (grid) • One per pool • Single point of failure – Submit machines • Users' machines • The user submits, monitors, and controls the execution of jobs – Execution machines (workers) • Execute jobs – One machine can play several roles
What is Condor? Architecture • Machine types (cont.) – Checkpoint server • Optional • Stores checkpoint files
What is Condor? Architecture • Condor has four main daemons • On execution and submit machines (the central manager may run them too) – startd (runs on every execution machine): • monitors the conditions of the resource where it runs • publishes a ClassAd describing the resource offer, and • enforces the resource owner's policy for starting, suspending, and evicting jobs – schedd (runs on every submit machine): • maintains a persistent job queue • publishes ClassAds describing resource requests, and • negotiates for available resources
What is Condor? Architecture • Only on the central manager: – collector: • is the central repository of information • startd and schedd send it periodic updates – negotiator: • periodically performs a negotiation cycle – the matchmaking process: the negotiator tries to find matches between the ClassAds of resource offers and requests; once a match is made, both parties are notified and are responsible for acting on that match
What is Condor? Architecture (figure from [1])
What is Condor? Architecture (figure from [1]: submitting machine and executing machine)
What is Condor? Architecture • Resource and job ClassAds are published by sending them to the collector – startd sends resource ClassAds – schedd sends job ClassAds • The collector passes everything to the negotiator, which performs the matchmaking
What is Condor? Architecture • Matchmaking algorithm – the negotiator finds resources on which a job can run – it tells the schedd on the submitting machine which machine it must contact to export the job – it tells the startd on the chosen machine (an idle resource that meets the requirements) that it will receive a job
What is Condor? Architecture • From this point on, the central manager no longer acts; the two machines run the job between themselves – the submit machine creates a shadow process • to send the job and receive the results – the execution machine • creates a starter process that receives the job and • a "user job" process that actually runs the task • at the end, the results are sent back to the submit machine
Why and when to use Condor? • Condor is useful when – there are many jobs to submit – there is one executable and many different input data sets
Why and when to use Condor? • Condor is useful because it – can use the different machines available • opportunistic scheduling – controls file transfers • the job must be able to access its data files from any machine on which it could run – sends email notifying that a job has completed • except for jobs submitted from a Linux machine
What are Condor Universes? • Types of universes – standard – vanilla – java – parallel • The Universe attribute is specified in the submit description file – the default is standard
What are Condor Universes? • standard – provides • checkpointing and • remote system calls – making jobs more reliable and giving them uniform access to resources from anywhere in the pool – to prepare a program as a standard universe job, it must be relinked with condor_compile
What are Condor Universes? • standard – there are a few restrictions – complete list in the manual: http://www.cs.wisc.edu/condor/manual/v6.4/2_4Road_map_running.html – examples • no multi-process jobs (no fork(), exec(), or system()) • no interprocess communication (including pipes, semaphores, and shared memory) • no sending or receiving SIGUSR2 or SIGTSTP • all files must be opened read-only or write-only
What are Condor Universes? • vanilla – used for programs that cannot be successfully relinked – useful for shell scripts – cannot checkpoint or use remote system calls – a job may have to restart from the beginning on another machine in the pool • because there is no checkpoint
What are Condor Universes? • java – can execute on any machine in the pool that can run the Java Virtual Machine – at the moment it does not work at Biostat • (the Biostatistics department in Wisconsin) – compiled Java programs can be submitted – creating a jar file is recommended for programs with several classes
What are Condor Universes? • parallel – MPI and PVM • for parallel programs that use message passing – Globus • requires Condor-G to be installed – I did not check whether they work at Biostat
Running Jobs on Condor
Running Jobs on Condor • You can submit your jobs from any Biostat machine, since all of them run schedd and startd • You must – set the PATH environment variable – prepare a submission file – compile your job with condor_compile if using the standard universe – submit your job(s) with the condor_submit command
Running Jobs on Condor • Submission file – the submit description file tells Condor • which executable to run • the directory where the output files will be placed • how many jobs will be instantiated, etc.
Running Jobs on Condor • Submission file – this file is turned into one ClassAd for each job to be instantiated • e.g., if the file contains the command 'queue 50', 50 jobs of that program must be run • so 50 ClassAds are published to the central manager
Running Jobs on Condor: Setting the PATH environment variable • Change PATH so the shell can find the Condor commands (depending on your shell)
bash:
  source /s/pkg/condor.sh
  PATH=$PATH:/s/pkg/`/s/share/ostoken`/condor/bin; export PATH
csh:
  source /s/pkg/condor.csh
  set path = ( $path /s/pkg/`/s/share/ostoken`/condor/bin )
  rehash
Running Jobs on Condor: Preparing a submission file • ClassAds (Classified Advertisements) – attribute/value pairs – syntax similar to C/Java • Commands are case-insensitive, i.e., these two lines are equivalent:
executable = fact
Executable = fact
Running Jobs on Condor: Preparing a submission file • At a minimum, the file must have the "executable" attribute (your program/binary):
Executable = fact
• Another useful attribute: input (your input data):
input = test.data
Running Jobs on Condor: Compiling your job with condor_compile • If using the standard universe: – use condor_compile • the program must be relinked with the Condor library:
condor_compile gcc fact.c -o fact
Running Jobs on Condor: Submitting your job(s) with condor_submit • In any Condor universe – jobs are submitted with the condor_submit command, with the submission file as its parameter:
condor_submit condor1.sub
– the -v option shows information about the submission (the full ClassAd generated) • it only prints the listing and exits (it is not interactive):
condor_submit -v condor1.sub
Example of a C Program
Running Jobs on Condor: C programs
bash-2.03$ condor_compile gcc fact.c -o fact
• Supported compilers include: – gcc (the GNU C compiler) – cc (the system C compiler) – acc (ANSI C compiler, on Sun systems) – CC (the system C++ compiler) – … (http://www.cs.wisc.edu/condor/manual/v6.4/condor_compile.html)
Running Jobs on Condor: C programs – example submission file
##########
# C Example: demonstrate use of multiple directories
# "Arguments = 5" passes the integer 5 as a parameter
##########
Executable = fact
Universe   = standard
output     = loop.out
error      = loop.error
Log        = loop.log
Arguments  = 5

Initialdir = run_1
Queue

Initialdir = run_2
Queue
Running Jobs on Condor: C programs • Log – contains information that is important for evaluating the execution and performance of the application – for an ordinary user it may not be very relevant – records each event that happens to the job, with date, time, and machine information • when the job was submitted, started executing, was suspended, was migrated, and finished (with an error or successfully)
Running Jobs on Condor: C programs • Arguments – parameters for the executable – in the example: • arguments = 5 • is equivalent to running 'fact 5' in a terminal • Initialdir – where the output/error/log files will be stored – initialdir = run_1 • the directory "run_1"
Running Jobs on Condor: C programs • Queue – runs a single job instance, using run_1 as its initialdir – the directory must be created before running condor_submit, otherwise an error occurs • "Initialdir = run_2" followed by "Queue" – one more instance of the job, now in another directory
Running Jobs on Condor: C programs – another example submission file
##########
# C Example:
# each job reads a different input file and
# stores results in different files
##########
Executable  = fact
notify_user = kayser@cos.ufrj.br
Input       = in.$(Process)
Output      = out.$(Process)
Error       = err.$(Process)
Log         = fact.log
Queue 2
Running Jobs on Condor: C programs • notify_user = kayser@cos.ufrj.br – sends a message when the job finishes • Input = in.$(Process) – $(Process): a Condor macro • replaced by a sequential integer for each job created (0, 1, 2, ...) • so job 0 reads in.0, job 1 reads in.1, and so on
Running Jobs on Condor: C programs • Log = fact.log – a single log file, even though there are several jobs – each event is tagged with its job number • Queue 2 – creates two jobs – any integer can be used – Queue 100 • creates 100 jobs
Running Jobs on Condor: C programs – YAP • To configure YAP with Condor:
configure --enable-depth-limit --enable-condor
make
Running Jobs on Condor: C programs – YAP • condor.sub
Universe     = standard
Executable   = /u/dutra/Yap-4.3.20/condor/yap.$$(Arch).$$(OpSys)
Initialdir   = /u/dutra/App/f1/train_best
Log          = /u/dutra/App/f1/train_best/log
Requirements = ((Arch == "INTEL" && OpSys == "LINUX") && (Mips >= 500) || (IsDedicated && UidDomain == "cs.wisc.edu"))
Arguments    = -b /u/dutra/Yap-4.3.20/condor/../pl/boot.yap
Input        = condor.in.$(Process)
Output       = /dev/null
Error        = …
Queue 300
Running Jobs on Condor: C programs – YAP • condor.in.0
['~/Yap-4.3.20/condor/../pl/init.yap'].
module(user).
['~/Aleph/aleph.pl'].
read_all('~/App/f1/train_best/train').
set(i,5).
set(minacc,0.7).
set(clauselength,5).
set(recordfile,'~/App/f1/train_best/trace-0.7-5.0').
set(test_pos,'~/App/f1/train_best/test.f').
set(test_neg,'~/App/f1/train_best/test.n').
set(evalfn,coverage).
induce.
write_rules('~/App/f1/train_best/theory-0.7-5.0').
halt.
Example of a Java Program
Running Jobs on Condor: Java programs • Use the Java universe • No need to compile with Condor • Use a jar file for programs with several classes: http://java.sun.com/docs/books/tutorial/jar/ • If using the Computer Science environment, you must grant access on AFS to the files to be used: http://www.cs.wisc.edu/condor/uwcs/
Running Jobs on Condor: Java programs
##########
# Example in Java Universe
# executable must be the .class file and
# arguments must have the main class as first argument
##########
universe    = java
executable  = Fact.class
arguments   = Fact
notify_user = kayser@cos.ufrj.br
output      = loop.out
error       = loop.error
log         = loop.log
Queue
Running Jobs on Condor: Java programs
##########
# Example in Java Universe using jar file
##########
universe       = java
executable     = jgfSection2.jar
arguments      = JGFAllSizeA 4
jar_files      = jgfSection2.jar
transfer_files = ALWAYS
output         = logAllSection2f.out
error          = logAllSection2f.error
log            = logAllSection2f.log
Queue
Running Jobs on Condor: Java programs • executable = jgfSection2.jar – it is a jar, not a .class as in the previous example • arguments = JGFAllSizeA 4 – two arguments (the main class, then 4) – example taken from the Java Grande benchmarks • jar_files = jgfSection2.jar – looks redundant – but without this line the file is not transferred
Running Jobs on Condor: Java programs • transfer_files = ALWAYS – likewise needed to transfer the .jar – possibly a bug that has since been fixed
Running Jobs on Condor: Inspecting Condor Jobs • Some useful commands: – condor_q • shows the queue of locally submitted jobs – condor_q -analyze • more information • helps you understand whether a job is not running because of a problem with its requirements or because no resource is available
Running Jobs on Condor: Inspecting Condor Jobs • condor_q -run – shows only the jobs that are running • condor_q -submitter <user> – shows information only about the jobs submitted by <user>
Running Jobs on Condor: Inspecting Condor Jobs • condor_status – shows each machine in the Condor pool – with information that is • static (e.g., which OS) • dynamic (e.g., whether the machine is idle or busy)
Running Jobs on Condor: Inspecting Condor Jobs • condor_rm – removes a job or a set of jobs from the queue – similar to kill – you must give the job number • condor_q -global – shows information about all queues – on every machine where jobs were submitted
Final Remarks
Final Remarks • So, Condor... – controls the execution of many jobs – can greatly reduce the time needed to complete them • Yap+Aleph: 53,000 CPU hours over three months (peak of 400 machines) • But Condor... – does not automatically parallelize your job
Final Remarks • Running Jobs on Condor: Observations – the input data file and the directories used for output/log/error must be created beforehand • otherwise an error is reported and no job is executed – on each execution • output is appended to the log files • results overwrite the out files – the error, log, and out files must have different names • to avoid race conditions
Final Remarks • Work on data management – though I am not sure how far it is integrated with Condor – Stork (Data Placement Scheduler): http://www.cs.wisc.edu/condor/stork – Kangaroo (apparently abandoned): http://www.cs.wisc.edu/condor/kangaroo – NeST (Network Storage): http://www.cs.wisc.edu/condor/nest/
Final Remarks • Work on monitoring – Hawkeye System Monitoring Tool: http://www.cs.wisc.edu/condor/hawkeye/
Final Remarks • More information about Condor: http://www.cs.wisc.edu/condor/ • Tutorials – http://www.cs.wisc.edu/condor/CondorWeek2006/ – http://www.cs.wisc.edu/condor/CondorWeek2005/presentations.html • More information about running Condor: http://www.cs.wisc.edu/condor/manual/v6.4/
Final Remarks • References: – [1] WRIGHT, Derek. Cheap cycles from the desktop to the dedicated cluster: combining opportunistic and dedicated scheduling with Condor. In: Conference on Linux Clusters: The HPC Revolution, June 2001, Champaign-Urbana, IL, USA. http://www.cs.wisc.edu/condor/doc/cheap-cycles.pdf
NMR-STAR file to ClassAd • Patrícia Kayser Vargas Mangan • kayser@cos.ufrj.br • September 2002
NMR-STAR to ClassAd • BioMagResBank (http://www.bmrb.wisc.edu) – an international repository for biological NMR (nuclear magnetic resonance) data – uses the NMR Self-defining Text Archival and Retrieval (NMR-STAR) format to store its data • NMR-STAR organizes information as a hierarchical tree – stored as plain text files – some files contain inconsistencies, which are checked manually
NMR-STAR to ClassAd • ClassAds – a simple representation language, first used in the Condor context • Steps: – conversion of NMR-STAR data to ClassAd format using starlibj (a Java package) – use of the resulting ClassAds to detect inconsistencies in NMR-STAR files
NMR-STAR to ClassAd • Future work: – matchmaking as a consistency checker – try to "learn" similarities among NMR data • Working with R. Kent Wenger from the Condor team at UW-Madison
TALK 1: Condor: Managing Resources in the Biostatistics Department Environment • TALK 2: Using ClassAds to Represent NMR Data
What is Condor? Architecture • After the schedd receives a match for a given job, it enters into a claiming protocol directly with the startd • Through this protocol, the schedd presents the job ClassAd to the startd and requests temporary control over the resource