The Development of Process Migration and Dynamic Load Balancing in Cluster Middleware

Developer: Thadpong Pongthawornkamol
Advisor: Asst. Prof. Dr. Putchong Uthayopas
Department of Computer Engineering, Faculty of Engineering, Kasetsart University
E-mail: b4205091@prg.cpe.ku.ac.th

Outline
• Goal
• Migration Library
• Load Balancing Module
• Performance Measurement
• Problems
• Future Work

Goal
• Develop a process migration feature in cluster middleware
• Use the process migration system to build a dynamic load balancing service in cluster middleware
[Figure: middleware migrating a process from a busy node to another node]

Migration Library (libmig)
• A C dynamic library developed for process migration
  – Based on the Linux operating system
  – Preemptive process migration
• Implemented in user space
  – No kernel modification
• Transparent to the user
  – No code recompiled or relinked
  – Operates at run time

Libmig: mechanism
[Figure: on a signal, libmig checkpoints the process on one machine into a .sck file; libmig on another machine restores the process from that file]

Libmig: example
[b4205091@compute2 b4205091]$ migsh ./endless 0
Hello # 0 from process 29842
Hello # 1 from process 29842
Hello # 2 from process 29842
Hello # 3 from process 29842
Hello # 4 from process 29842
[b4205091@compute2 b4205091]$ kill -s SIGUSR2 29842
[b4205091@compute2 b4205091]$ ls .migrate/
compute2.cpe.ku.ac.th-29842.sck
[b4205091@compute2 b4205091]$
[b4205091@compute2 b4205091]$ rsh compute3
Last login: Mon Feb 24 10:38:46 from compute10
[b4205091@compute3 b4205091]$ migsh ./endless 1 ./.migrate/compute2.cpe.ku.ac.th-29842.sck
Hello # 5 from process 3457
Hello # 6 from process 3457
Hello # 7 from process 3457
Hello # 8 from process 3457

Libmig: function interception
• Libmig intercepts the user's program by using the "LD_PRELOAD" environment variable.
• "LD_PRELOAD" specifies a dynamic library that is loaded before the program runs.
• Setting "LD_PRELOAD=libmig.so" automatically loads the migration library into the user's program.

Libmig: function interception (cont.)
[Figure: the user program asks the OS "Where's __libc_start_main()?"; libmig.so answers "__libc_start_main() is in me!!" ahead of libc.so.6 ("Ouch!!!")]
• The migration library injects itself into the user's program via __libc_start_main().

Libmig: manage stack
• Checkpoint stack
  – Find the top-of-stack address: use the address of the last local variable in the function at the top of the stack.
  – Read memory from the top-of-stack address up to address 0xC0000000.
  – Write the stack into the checkpoint file using fwrite().
• Restore stack
  – Read the stack from the checkpoint file using fread().
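
The stack checkpoint steps can be sketched roughly as follows. This is a minimal sketch, not libmig's actual code: snapshot_stack() and region_end_containing() are hypothetical names, the copy goes to a heap buffer instead of a checkpoint file, and the end of the stack is looked up in /proc/self/maps instead of being hard-coded as 0xC0000000 (a constant that matches the classic 32-bit x86 Linux layout). It also assumes a stack that grows toward lower addresses.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the end address of the mapping that
 * contains `addr`, or 0 if /proc/self/maps lists no such mapping. */
static uintptr_t region_end_containing(uintptr_t addr) {
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[1024];
    uintptr_t found = 0;

    if (!maps) return 0;
    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            addr >= start && addr < end) {
            found = (uintptr_t)end;
            break;
        }
    }
    fclose(maps);
    return found;
}

/* Copy the live stack, from the address of a local variable up to the
 * end of the stack mapping, into a malloc'ed buffer.
 * Returns the number of bytes copied (0 on failure); caller frees *out. */
size_t snapshot_stack(unsigned char **out) {
    volatile char marker = 0;             /* its address approximates the top of stack */
    uintptr_t top = (uintptr_t)&marker;
    uintptr_t bottom = region_end_containing(top);
    size_t len;

    if (bottom == 0) return 0;
    len = (size_t)(bottom - top);
    *out = malloc(len);
    if (!*out) return 0;
    memcpy(*out, (const void *)top, len); /* a real checkpoint would fwrite() this */
    return len;
}
```

Restoring is the harder half: the saved bytes have to be written back at the same addresses, which this sketch does not attempt.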

Libmig: manage registers
• Checkpoint registers
  – Save the stack context/environment, including registers, with the setjmp() function, which writes all register information into a sigjmp_buf variable.
  – Write the sigjmp_buf variable into the checkpoint file.
• Restore registers
  – Read the sigjmp_buf variable from the checkpoint file.
  – Restore the saved context using longjmp().
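
The save/restore pattern behind the register checkpoint can be illustrated inside a single process. The slide names setjmp()/longjmp() together with sigjmp_buf; this sketch uses the sigsetjmp()/siglongjmp() pair that matches sigjmp_buf, and the function name is hypothetical. It only shows execution resuming at the saved point and does not write the buffer to a file or move it to another process.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigsetjmp()/siglongjmp() */
#include <setjmp.h>

/* Save a context once, then jump back to it `jumps` times.
 * Returns how many times execution passed the saved point. */
int count_passes_through_checkpoint(int jumps) {
    sigjmp_buf context;
    volatile int passes = 0;      /* volatile so the value survives siglongjmp() */

    sigsetjmp(context, 1);        /* "checkpoint": save registers and signal mask */
    passes++;                     /* runs on the first pass and after every jump */
    if (passes <= jumps) {
        siglongjmp(context, 1);   /* "restore": resume at the sigsetjmp() call */
    }
    return passes;
}
```

Calling count_passes_through_checkpoint(3) passes the saved point four times: once directly and three times after a jump.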

Libmig: manage memory
• Checkpoint memory
  – Obtain memory information from the memory map file (/proc/{process ID}/maps).
  – Read all memory regions except the stack (the last line of the map file).
• Restore memory
  – Create a new memory region using mmap.
  – Map the new memory region to the "/dev/null" device.
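
As an illustration of reading the memory map file, the sketch below parses one line of /proc/{process ID}/maps into a small record. The struct and function names are hypothetical. One assumption differs from the slide: current Linux kernels label the main stack region "[stack]" and may list other regions after it, so the sketch identifies the stack by that label instead of by its position as the last line.

```c
#include <stdio.h>
#include <string.h>

/* One region from /proc/<pid>/maps (hypothetical record layout). */
struct mem_region {
    unsigned long start;   /* first address of the region */
    unsigned long end;     /* one past the last address */
    char perms[5];         /* e.g. "rw-p" */
    char path[256];        /* backing file or label; empty if anonymous */
};

/* Parse a line such as
 *   08048000-08056000 r-xp 00000000 03:0c 64593   /usr/sbin/gpm
 * Returns 0 on success, -1 if the line does not look like a maps entry. */
int parse_maps_line(const char *line, struct mem_region *r) {
    int fields;

    r->path[0] = '\0';
    /* offset, device and inode are skipped with %*s */
    fields = sscanf(line, "%lx-%lx %4s %*s %*s %*s %255[^\n]",
                    &r->start, &r->end, r->perms, r->path);
    return fields >= 3 ? 0 : -1;
}

/* A checkpoint loop would skip regions for which this returns nonzero. */
int is_stack_region(const struct mem_region *r) {
    return strcmp(r->path, "[stack]") == 0;
}
```

A checkpoint pass would call parse_maps_line() on each line read with fgets(), skip the stack region, and write the bytes between start and end of every remaining readable region.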

Libmig: manage file table
• Checkpoint file descriptor table
  – Get the file descriptor table from the /proc/{process ID}/fd directory.
  – Read the file name.
  – Read the file offset using fseek().
• Restore file descriptor table
  – Read the file name and offset from the checkpoint file.
  – Open the file and use fseek() to set the file descriptor to the appropriate offset.
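
A minimal sketch of the file descriptor round trip, under stated assumptions: the record layout and function names are hypothetical, the record stays in memory instead of going to a checkpoint file, and lseek() is used as the file-descriptor counterpart of the fseek() call named on the slide. The file name is recovered by reading the symbolic link /proc/self/fd/<n>.

```c
#define _POSIX_C_SOURCE 200809L   /* for readlink() */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* What is saved for one open file (hypothetical layout). */
struct fd_record {
    char path[4096];   /* file name taken from /proc/self/fd/<n> */
    int flags;         /* access mode and status flags from fcntl() */
    off_t offset;      /* current file position */
};

/* Capture name, flags and offset of an open descriptor. 0 on success. */
int checkpoint_fd(int fd, struct fd_record *rec) {
    char link[64];
    ssize_t n;

    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    n = readlink(link, rec->path, sizeof rec->path - 1);
    if (n < 0) return -1;
    rec->path[n] = '\0';                    /* readlink() does not terminate */
    rec->flags = fcntl(fd, F_GETFL);
    rec->offset = lseek(fd, 0, SEEK_CUR);   /* query position without moving */
    return (rec->flags < 0 || rec->offset < 0) ? -1 : 0;
}

/* Reopen the file and seek back to the saved offset.
 * Returns the new descriptor, or -1 on failure. */
int restore_fd(const struct fd_record *rec) {
    int fd = open(rec->path, rec->flags);

    if (fd < 0) return -1;
    if (lseek(fd, rec->offset, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The restored descriptor may get a different number than the original; a full restore would also need dup2() to put it back on the original number, which this sketch leaves out.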

Load Balancing module
• A dynamic load balancing component built into the KSIX cluster middleware
• Works with the migration library
• KSIX provides the process migration service via API (kx_rspawnIO(), kx_migrate()) or shell commands (ksixexec, ksixmigrate)

LB thread
• A thread created by KXD for load balancing
• Wakes up at a fixed interval to check the node's load and migrates a process to another node if necessary
[Figure: two nodes, each running a KSIX daemon (KXD) with an LB thread; a process moves from one node to the other]

LB thread: policy
• Wake up at a fixed interval.
• Compare the local load to the global load average.
• If the local load is more than the load average, the LB thread acts as a requester: it finds an appropriate node and sends a migration request to that node.
• If the local load is less than the load average, the LB thread acts as an acceptor: it chooses one of the requests it has received and migrates a process from the requesting node to its own node.
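
The decision made at each wake-up can be sketched as below. All names are hypothetical and this is not KSIX code. Two points are assumptions: the slides do not say how an "appropriate" node is chosen, so the sketch picks the least-loaded other node, and they do not cover a local load exactly equal to the average, which the sketch treats as idle.

```c
#include <stddef.h>

enum lb_role { LB_IDLE, LB_REQUESTER, LB_ACCEPTOR };

/* Global load average over all n nodes. */
double average_load(const double *loads, size_t n) {
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++) sum += loads[i];
    return n ? sum / (double)n : 0.0;
}

/* Requester above the average, acceptor below it.
 * The equal case (idle) is an assumption, not taken from the slides. */
enum lb_role decide_role(double local_load, double avg) {
    if (local_load > avg) return LB_REQUESTER;
    if (local_load < avg) return LB_ACCEPTOR;
    return LB_IDLE;
}

/* Assumed rule for an "appropriate" node: the least-loaded node other
 * than `self`. Returns its index, or -1 if there is no other node. */
int pick_target(const double *loads, size_t n, size_t self) {
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i == self) continue;
        if (best < 0 || loads[i] < loads[best]) best = (int)i;
    }
    return best;
}
```

With loads of 4.0, 1.0 and 1.0 the average is 2.0, so node 0 becomes a requester and would send its request to node 1, while nodes 1 and 2 act as acceptors.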

Performance measurement
• Submit povray processes to one node in the cluster
• Measure task completion time (min:sec)

#processes \ #nodes      1       2       3       4
5                    10:02    8:09    7:34    7:07
10                   15:00   10:17   10:07    8:55
20                   26:10   15:50   11:35   11:00

Problems
• Migration library
  – Accessing and controlling process information is difficult because of the user-space implementation
  – Requires several techniques to solve
• Load Balancing module
  – Merged into KSIX: stability and dependency problems
  – A future design should separate the module from KSIX

Future work
• Migration library
  – Migrate sockets and multithreaded processes
• LB module
  – Separate from KSIX: KXLB (another daemon)
  – Policy adjustment via API or config file
  – Use the KSIX service layer API to migrate processes
  – Develop advanced policies and measure performance

The End
Any questions?

LB module flowchart
• Wake up
• Compare local load to global load average
• Is local load more than global load average?
  – Yes: send request to appropriate node
  – No: choose request from busy nodes, then migrate remote process to local node
• Sleep for constant time