Dynamic Linker Performance - Introduction
Tomasz Andel


_ ELF (Executable and Linkable Format, originally Extensible Linking Format): .o, .bin, .elf, .so and more...

_ Where is my symbol?
  _ Statically linked: fixed load address, known by the kernel
  _ Dynamically linked: the kernel does not know the addresses
    _ Dynamic linker (ld-linux.so)
    _ Relocations or lazy relocations!
    _ Transfer control to the program
_ execve(): kernel program startup
  _ Starting up the dynamic linker
    _ Same as an executable!!! (Anyone see any security issues here?)
  _ Kernel gives control to the dynamic linker
_ Dynamic linker has to:
  _ Determine and load dependencies
  _ Relocate the application and all dependencies!
  _ Initialize the application and dependencies in the correct order

_ Relocation process
  _ Most expensive part of the dynamic linker's work
  _ O(R + n*r), where R = relative relocations, r = named relocations, n = number of DSOs + 1 (the main executable)
  _ O(R + n*r*log s), where s = number of symbols (in case of some ELF extensions)
_ .data (read/write) segment relocations vs. .text (read-only) segment relocations!
  _ The result of a text relocation is that the binary text is written to. The affected page can then no longer be physically shared with other processes on the system (and sharing is the goal of DSOs!). It also means the binary must have permission to make the memory page writable and then switch it back to executable!
_ PREVENT TEXT RELOCATIONS (.rela.text)! How? Position Independent Code!
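The two complexity bounds above can be turned into a rough estimator. This is only a sketch: the function name, its parameters, and the "abstract steps" unit are illustrative assumptions, not something taken from the slides or from any linker.

```python
import math


def relocation_cost(relative, named, objects, symbols=None):
    """Rough upper bound on relocation work, in abstract steps.

    relative -- R, number of relative relocations
    named    -- r, number of named (symbol) relocations
    objects  -- n, number of DSOs plus the main executable
    symbols  -- s, symbol count; when given, each lookup costs log2(s)
    """
    per_lookup = 1 if symbols is None else math.log2(symbols)
    return relative + objects * named * per_lookup


# Hypothetical numbers: 1000 relative and 200 named relocations, 5 objects.
plain = relocation_cost(1000, 200, 5)
with_log = relocation_cost(1000, 200, 5, symbols=1024)
```

The relative relocations contribute only once, while every named relocation is multiplied by the number of objects that may have to be searched, which is why the named ones dominate.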

_ Relative relocations
  _ Load address not known at link-time
  _ If an instruction at offset 0x80 in the code section wants to reference anything in the data section, the linker already knows the relative offset, which is... 0xEF80, and that is enough!

_ Symbol relocations
  _ Symbols used at run-time and not known at link-time
  _ Many references to the same symbol... Each computed!
  _ Fewer symbols = better chances for a suboptimal hashing function
_ ELF hash table
  _ First found, first served
  _ LD_PRELOAD: DSO introduced at run-time (my tcmalloc blog post for more info)
  _ Hash chain length
  _ Number of objects
  _ ...C-strings...
_ Hash table size can be optimized by passing -O to the linker. GNU favors small table sizes over short chain length; this may increase startup costs for large projects.
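To make the effect of chain length concrete, here is a minimal sketch of a per-object symbol lookup. `elf_hash` follows the classic System V ELF hash function (with 32-bit arithmetic); `build_table` and `lookup` are hypothetical helpers that only model buckets as Python lists, not the real on-disk layout.

```python
def elf_hash(name: str) -> int:
    """Classic System V ELF symbol hash, computed on 32-bit values."""
    h = 0
    for byte in name.encode():
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def build_table(symbols, nbuckets):
    """Hypothetical helper: distribute symbol names over hash buckets."""
    buckets = [[] for _ in range(nbuckets)]
    for sym in symbols:
        buckets[elf_hash(sym) % nbuckets].append(sym)
    return buckets


def lookup(buckets, name):
    """Return (found, string_comparisons) for a single object's table."""
    chain = buckets[elf_hash(name) % len(buckets)]
    for count, candidate in enumerate(chain, start=1):
        if candidate == name:  # one C-string comparison per chain entry
            return True, count
    return False, len(chain)


# One bucket forces every symbol onto the same chain (the worst case).
crowded = build_table(["a", "b", "c"], nbuckets=1)
spread = build_table(["a", "b"], nbuckets=2)
```

With a single bucket a miss costs one string comparison per symbol in the object, and that cost is paid again in every object searched before the defining one is found.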

_ Symbol relocations
_ OpenOffice.org
  _ 144 DSOs
  _ 20,000 relocations
  _ Most loaded using dlopen, thus no pre-linking allowed!
  _ Average for each symbol: ~85 string comparisons
  _ 20,000 * 85 = 1,700,000 comparisons
  _ Average length of an exported symbol: ~54
  _ If only 20% (in reality much more) of each string is searched before finding a mismatch, this means a total of... ~18,000,000 characters to be loaded from memory and compared
_ Using a hash table other than the default, e.g. --hash-style=gnu, for OpenOffice.org will improve startup time by a factor of 33.
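The arithmetic behind these figures, written out as a quick check. The variable names are made up for illustration; the numbers are the ones quoted on the slide.

```python
relocations = 20_000          # symbol relocations at startup
comparisons_per_symbol = 85   # average string comparisons per lookup
avg_symbol_length = 54        # average exported symbol length in characters
percent_scanned = 20          # share of each string read before a mismatch

total_comparisons = relocations * comparisons_per_symbol
chars_compared = total_comparisons * avg_symbol_length * percent_scanned // 100

print(total_comparisons)  # 1700000
print(chars_compared)     # 18360000
```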

_ Lookup scope
  _ Breadth-first search
  _ DT_NEEDED entries of the executable form the first (root) level
_ dlopen() parameters
  _ RTLD_GLOBAL: adds the loaded object and all its dependencies to the global search scope (BAD IDEA!)
  _ RTLD_DEEPBIND: the dynamic linker searches the local scope first before going to the global scope; beware of LD_PRELOAD (global scope)
  _ ...and more...
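The breadth-first ordering can be sketched as follows. This is a toy model under stated assumptions: `needed` stands in for each object's DT_NEEDED list, `exports` for its dynamic symbol table, and all object and symbol names are invented.

```python
from collections import deque


def lookup_scope(root, needed):
    """Breadth-first order of objects, starting from the executable."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for dep in needed.get(obj, []):
            if dep not in seen:  # each object appears in the scope once
                seen.add(dep)
                queue.append(dep)
    return order


def find_symbol(name, scope, exports):
    """First found, first served: return the first object defining name."""
    for obj in scope:
        if name in exports.get(obj, ()):
            return obj
    return None


# Hypothetical dependency graph and export lists.
needed = {
    "app": ["liba.so", "libb.so"],
    "liba.so": ["libc.so"],
    "libb.so": ["libc.so"],
}
exports = {
    "liba.so": {"foo"},
    "libb.so": {"foo", "bar"},
    "libc.so": {"malloc"},
}
scope = lookup_scope("app", needed)
```

Both liba.so and libb.so define `foo` here, and the one earlier in the scope wins; this is the same mechanism that lets an object placed early in the global scope override later definitions.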

_ Position-independent data addressing
_ GOT
  _ Refer to a variable not by its absolute address (relocation required) but by an address stored in the GOT
  _ Relocations in the code section are required per variable reference! Here only once per variable
  _ The data section is writable and not shared between processes, so it is easy to add new relocations
_ What about function calls in PIC?

_ Lazy function resolution
_ PLT
_ First call:
  _ PLT[n] is called and jumps to the address pointed to in GOT[n].
  _ This address points into PLT[n] itself, to the preparation of arguments for the resolver.
  _ The resolver is then called.
  _ The resolver performs resolution of the actual address of func, places its actual address into GOT[n] and calls func.
_ Second call:
  _ PLT[n] is called and jumps to the address pointed to in GOT[n].
  _ GOT[n] points to func, so this just transfers control to func.
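The first-call and second-call paths can be imitated with a small toy model. Nothing here is real linker code: `LazyBinder`, its dictionary-based "GOT", and the `func` symbol are assumptions made purely for illustration.

```python
class LazyBinder:
    """Toy model of PLT/GOT lazy binding."""

    def __init__(self, symbols):
        self.symbols = symbols   # name -> callable, stands in for loaded DSOs
        self.got = {}            # name -> resolved callable, filled on demand
        self.resolver_calls = 0

    def _resolve(self, name):
        # Stand-in for the resolver: find the symbol and patch the GOT slot
        # so that later calls skip this step.
        self.resolver_calls += 1
        self.got[name] = self.symbols[name]
        return self.got[name]

    def call(self, name, *args):
        # PLT[n]: jump through GOT[n]; an unpatched slot leads to the resolver.
        target = self.got.get(name)
        if target is None:
            target = self._resolve(name)
        return target(*args)


binder = LazyBinder({"func": lambda x: x + 1})
first = binder.call("func", 1)    # goes through the resolver
second = binder.call("func", 2)   # GOT slot already patched
```

The resolver runs once per function rather than once per call, so symbols that are never called are never looked up.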

_ Every called function that is not guaranteed to be defined in the calling object requires a PLT entry
_ PLT jumps are expensive
_ Avoiding the GOT when accessing a global variable saves memory and relocations
_ NOTE: the address of the location in the GOT relative to the PIC register is known at link-time.
_ Conclusion: the .text segment does not have to be changed, only the GOT (.data!)
_ Costs summary
  _ Code size: smaller ELF binaries need less memory at run-time
  _ Number of objects: fewer objects loaded at run-time
    _ Lookup scope grows
    _ More symbol tables, which in turn means more duplications
    _ Initializer/finalizer sorting becomes much more complicated (MDW tricky BUG case!)
    _ Every time a new DSO is requested, the list of already loaded DSOs must be searched
  _ Number of symbols: the number of exported and undefined symbols determines the size of the dynamic symbol table, the hash table, and the average hash table chain length
  _ Length of symbol strings: mangling scheme, string comparison (C-strings)
    _ Address space issue for 32-bit machines, as long strings have to be present at run-time
  _ Number of relocations: processing relocations is a major part of the work during startup
  _ Type of relocations: avoid text segment relocations; relative relocations are better than normal ones
  _ Placement of code and data: executable code should be placed in read-only memory. Const correctness.
    _ Zero-initialized variables do not have to be initialized from file content.

Thanks! Questions?