Intro to Reverse Engineering ~ intropy ~

Intro

Why do we reverse engineer?
• Closed source software
  – Vulnerability research
  – Product verification
• Proprietary formats
  – Interoperability
    • SMB on UNIX
    • Word compatible editors
• Virus research

Why should you give a fuck?
• Basis of computing
  – Reverse engineering teaches the inner workings of any processor
  – Learning how the processor handles data helps in understanding many other aspects of computer security
• All the cool kids are doing it (not really)

Real Time RCE (Debugging)
• Debuggers that disassemble
  – OllyDbg
  – WinDbg
  – SoftICE
• Code actually runs
  – The application executes all instructions as if it were run normally
• Uses interrupts to control execution of the program
  – Swaps out the instruction at the breakpoint address with an interrupt instruction (INT 3, opcode 0xCC, on x86)
  – Swaps the original instruction back when execution is continued
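The instruction swap described above can be sketched in a few lines of Python. This is a toy model over a bytearray, not how any particular debugger is implemented; the class name `SoftwareBreakpoints` and its methods are made up for illustration.

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode


class SoftwareBreakpoints:
    """Toy model of planting and removing INT 3 breakpoints (hypothetical helper)."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # address -> original byte that INT 3 replaced

    def plant(self, address: int) -> None:
        # Remember the original byte, then overwrite it with INT 3.
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3

    def remove(self, address: int) -> None:
        # Put the original byte back so execution can continue normally.
        self.memory[address] = self.saved.pop(address)


# push ebp; mov ebp, esp; pop ebp; ret
code = bytearray(b"\x55\x8b\xec\x5d\xc3")
bps = SoftwareBreakpoints(code)

bps.plant(1)           # break on "mov ebp, esp"
patched = bytes(code)  # first byte of the mov is now 0xCC

bps.remove(1)
restored = bytes(code)  # identical to the original code again
```

A real debugger does the same swap inside the target process and also has to back the instruction pointer up by one byte after the interrupt fires, which this sketch leaves out.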

Static Analysis (Dead Listing)
• Traditional disassemblers
  – IDA Pro
  – W32Dasm
  – objdump
• Code does not execute
  – The disassembler parses the file format and the related code sections
  – Good disassemblers do deep recursive analysis to ensure proper instruction disassembly
• Lets the user look at what code will do without actually running it
• Does not offer the conveniences of live disassembly/debugging
  – Viewing registers
  – Inspecting the contents of memory

File Formats

What are file formats?
• A file format is a defined layout that a file adheres to; executable formats are the ones an operating system can load and run
• Executable files are created from source code and libraries by a compiler
• Data files can be created by anything from a text editor to an MP3 encoder

Executable Contents
• Machine code
  – Instructions the program will run
  – Memory locations
    • Code addresses
    • Function addresses
• Program data
  – Static variables
  – Strings
• Loader data
  – Imports
  – Exports

Sections
• Allow the loader to find the various pieces of an executable
• The set of sections is not fixed; executables can have user defined sections

Executable Formats
• ELF – Executable and Linkable Format
  – History
    • Originally published by UNIX System Laboratories as a dynamic, linkable format to be used on various UNIX platforms
  – What uses ELF
    • Linux
    • Solaris
    • Most modern BSD based UNIXes
  – Dissection
    • Header
    • Sections

ELF Header
• The header contains various information the operating system loader needs
  e_ident – Contains various identification fields including endianness, ELF version, and operating system
  e_type – Identifies the object file type, including relocatable, executable, or core file
  e_machine – Contains the processor type, including Intel 80386, HPPA, PowerPC
  e_version – Contains the object file version
  e_entry – Contains the entry point for the executable
  e_phoff – Contains the program header table's file offset in bytes
  e_shoff – Contains the section header table's file offset in bytes
  e_flags – Contains the processor specific flags
  e_ehsize – Contains the ELF header size in bytes
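To make the header fields concrete, here is a minimal Python sketch that unpacks the 32-bit little-endian ELF header with the standard `struct` module. The function name `parse_elf32_header` and the sample bytes are assumptions made for illustration; the sample is a synthetic header, not one taken from a real binary.

```python
import struct

# Layout of the 32-bit ELF header: e_ident is 16 bytes, the remaining fields
# are 16-bit (H) and 32-bit (I) little-endian integers, 52 bytes in total.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf32_header(data: bytes) -> dict:
    """Return the ELF header fields as a dict (hypothetical helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1 or data[5] != 1:
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    return dict(zip(FIELDS, ELF32_HEADER.unpack_from(data)))


# Synthetic header: class 32-bit, little-endian, version 1, System V ABI.
ident = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8)
sample = ELF32_HEADER.pack(
    ident,
    2,           # e_type: executable
    3,           # e_machine: Intel 80386
    1,           # e_version
    0x08048000,  # e_entry (made-up entry point)
    52, 0, 0,    # e_phoff, e_shoff, e_flags
    52, 32, 0,   # e_ehsize, e_phentsize, e_phnum
    40, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
)
header = parse_elf32_header(sample)
```

To try it on a real file, read the first 52 bytes in binary mode and pass them to the function; 64-bit or big-endian files need a different format string.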

ELF Sections
• Each section of an ELF executable contains various information needed to execute
  .bss – This section holds uninitialized data that contributes to the program's memory image. By definition, the system initializes the data with zeros when the program begins to run.
  .comment – This section holds version control information.
  .ctors – This section holds initialized pointers to the C++ constructor functions.
  .data – This section holds initialized data that contribute to the program's memory image.
  .data1 – This section holds initialized data that contribute to the program's memory image.
  .debug – This section holds information for symbolic debugging. The contents are unspecified.
  .dtors – This section holds initialized pointers to the C++ destructor functions.
  .dynamic – This section holds dynamic linking information.

ELF Sections Cont…
  .dynstr – This section holds strings needed for dynamic linking, most commonly the strings that represent the names associated with symbol table entries.
  .dynsym – This section holds the dynamic linking symbol table.
  .fini – This section holds executable instructions that contribute to the process termination code. When a program exits normally the system arranges to execute the code in this section.
  .got – This section holds the global offset table.
  .hash – This section holds a symbol hash table.
  .init – This section holds executable instructions that contribute to the process initialization code. When a program starts to run the system arranges to execute the code in this section before calling the main program entry point.
  .interp – This section holds the pathname of a program interpreter. If the file has a loadable segment that includes the section, the section's attributes will include the SHF_ALLOC bit. Otherwise, that bit will be off.
  .line – This section holds line number information for symbolic debugging, which describes the correspondence between the program source and the machine code. The contents are unspecified.

ELF Sections Cont…
  .note – This section holds information in the ``Note Section'' format defined by the ELF specification.
  .plt – This section holds the procedure linkage table.
  .rel.NAME – This section holds relocation information. By convention, ``NAME'' is supplied by the section to which the relocations apply. Thus a relocation section for .text normally would have the name .rel.text.
  .rodata – This section holds read-only data that typically contributes to a nonwritable segment in the process image.
  .rodata1 – This section holds read-only data that typically contributes to a nonwritable segment in the process image.
  .shstrtab – This section holds section names.
  .strtab – This section holds strings, most commonly the strings that represent the names associated with symbol table entries.
  .symtab – This section holds a symbol table. If the file has a loadable segment that includes the symbol table, the section's attributes will include the SHF_ALLOC bit. Otherwise the bit will be off.
  .text – This section holds the ``text'', or executable instructions, of a program.
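The string table sections (.shstrtab, .strtab, .dynstr) all work the same way: names are stored back to back as NUL-terminated strings, and other structures refer to a name by its byte offset into the table. A minimal Python sketch follows; the helper name `string_at` and the sample table are made up for illustration.

```python
def string_at(table: bytes, offset: int) -> str:
    """Read the NUL-terminated string that starts at `offset` (hypothetical helper)."""
    end = table.index(b"\x00", offset)
    return table[offset:end].decode("ascii")


# A tiny hand-built section name table. Offset 0 is always the empty string.
shstrtab = b"\x00.text\x00.data\x00.bss\x00"

first = string_at(shstrtab, 1)    # name referenced by offset 1
second = string_at(shstrtab, 7)   # name referenced by offset 7
third = string_at(shstrtab, 13)   # name referenced by offset 13
```

Because a name is just "everything from the offset up to the next NUL", an offset may also point into the middle of a longer string, which is how linkers share suffixes between names.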

Executable Formats Cont…
• PE – Portable Executable
  – History
    • Microsoft migrated to the PE format with the introduction of the Windows NT 3.1 operating system. It is based on a modified form of the UNIX COFF format
  – What uses PE
    • Windows NT
    • Windows 2000
    • Windows XP
    • Windows 2003
    • Windows CE
  – Dissection
    • DOS stub – Contains a message that the executable will not run in DOS mode
    • Optional header (not optional)
    • RVA – Relative virtual address, an address expressed as an offset from the image base
    • Sections

Optional Header
• The optional header in a PE executable contains various information about the executable contents that the OS loader needs
  SizeOfCode – Size of the code (text) section, or the sum of all code sections if there are multiple sections
  AddressOfEntryPoint – RVA of the entry function to start execution from
  BaseOfCode – RVA of the start of the code relative to the base address
  BaseOfData – RVA of the start of the data relative to the base address
  SectionAlignment – Alignment of sections when loaded into memory
  FileAlignment – Alignment of sections on disk
  SizeOfImage – Size, in bytes, of the image, including all headers; must be a multiple of SectionAlignment
  SizeOfHeaders – Combined size of the MS-DOS stub, PE header, and section headers, rounded up to a multiple of FileAlignment
  NumberOfRvaAndSizes – Number of data-directory entries in the remainder of the optional header. Each describes a location and size
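The "rounded up to a multiple of" rule for SizeOfHeaders and SizeOfImage is plain integer arithmetic. A small Python sketch, with the helper name `align_up` and the sample sizes chosen only for illustration:

```python
def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment` (hypothetical helper)."""
    return (value + alignment - 1) // alignment * alignment


# Made-up numbers: 0x248 bytes of raw headers with FileAlignment 0x200,
# and an image whose last section ends at RVA 0x3120 with SectionAlignment 0x1000.
size_of_headers = align_up(0x248, 0x200)
size_of_image = align_up(0x3120, 0x1000)
```

A value that is already aligned is returned unchanged, so the helper can be applied unconditionally.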

Sections
• The sections in a PE file contain the various pieces of the executable needed to run, including various RVAs and offsets
  .text – Contains all executable code
  .idata – Contains imported data such as DLL addresses
  .edata – Contains any exported data
  .data – Contains initialized data like global variables and string literals
  .bss – Contains uninitialized data
  .rsrc – Contains all module resources
  .reloc – Contains relocation data for the OS loader
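Since PE structures refer to locations by RVA while a hex editor shows file offsets, a reverser frequently has to convert between the two using the section table. The Python sketch below shows the usual lookup under simplified assumptions; the `Section` type, the `rva_to_offset` helper, and the two sample sections are all hypothetical.

```python
from typing import NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA where the section is mapped in memory
    virtual_size: int     # size of the section in memory
    raw_offset: int       # where the section's bytes start in the file
    raw_size: int         # how many bytes the section has in the file


def rva_to_offset(rva: int, sections: list) -> Optional[int]:
    """Translate an RVA to a file offset, or None if no file bytes back it."""
    for section in sections:
        delta = rva - section.virtual_address
        if 0 <= delta < section.virtual_size:
            if delta >= section.raw_size:
                return None  # zero-filled at load time, nothing on disk
            return section.raw_offset + delta
    return None


# Made-up section table for illustration.
sections = [
    Section(".text", 0x1000, 0x800, 0x400, 0x800),
    Section(".data", 0x2000, 0x400, 0xC00, 0x200),
]

code_offset = rva_to_offset(0x1054, sections)
data_offset = rva_to_offset(0x2010, sections)
```

The virtual address of the same location is simply the image base plus the RVA.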

Data Formats
• Different from executable formats
  – Don't usually contain machine code
  – Have structure but not always defined sections
• A reverser often needs to reverse how a file format functions
  – Proprietary formats are not always published
  – Reversing allows compatibility (e.g. Microsoft .doc)
• Digital rights management
  – Often the only way to get what you pay for is to take action

Assembly Language

What is it
• Lowest level of programming (besides microcode)
• Direct processor register access utilizing architecture defined instructions
• Output of most compilers

How is it used
• Directly using an assembler
  – NASM
  – ml
  – as
• Output by a high level compiler
  – GCC
  – cl

What does it look like
• Depends on the instruction set
  – IA-32
    • mov eax, 0x1
  – PA-RISC
    • copy %r14, %r25
  – ARM
    • LDR r0, [r8]

Instruction Sets
• The mnemonics for the opcodes handled by the processor
• Minimal set of “commands” that achieve a programming goal

Different Instruction Set Architectures
• RISC – Reduced Instruction Set Computing
  – Fixed length 32 bit instructions
  – 32 general purpose registers
  – Vendors
    • IBM (PowerPC)
    • HP (PA-RISC)
    • Apple (PowerPC)
• CISC – Complex Instruction Set Computing
  – Multibyte instructions
  – Multiple synonymous opcodes
  – 16 registers
  – Vendors
    • Intel (IA-32)
    • DEC (PDP-11)
    • Motorola (m68k)

Registers and the Stack

Overview
• Purpose
  – Registers are used to store temporary data
    • Pointers
    • Computations
  – The stack is used to manage data
    • Variables
    • Data

Stack Layout
• The stack is dynamic: it grows and shrinks as the program runs
• It starts at a higher address and builds toward lower addresses
• On IA-32 the stack is generally allocated in 4 byte chunks
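A toy Python model makes the downward growth easier to see. The class name `ToyStack` and the starting address are invented for the example; the 4 byte slot size matches the IA-32 case described above.

```python
class ToyStack:
    """Toy model of a descending stack with 4 byte slots (hypothetical helper)."""

    SLOT = 4

    def __init__(self, base: int):
        self.esp = base   # stack pointer starts at the highest address
        self.memory = {}  # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= self.SLOT  # the stack grows toward lower addresses
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory.pop(self.esp)
        self.esp += self.SLOT  # the stack shrinks back toward higher addresses
        return value


stack = ToyStack(base=0x1000)
stack.push(0x11111111)
stack.push(0x22222222)
esp_after_pushes = stack.esp  # two slots below the base

top = stack.pop()             # last value pushed comes off first
esp_after_pop = stack.esp
```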

Register sizes
• Register sizes depend on the supported architecture
  – 32 bit
  – 64 bit
• IA-32
  – 16 basic program execution registers: 8 general purpose registers, 6 segment registers, EFLAGS, and EIP
  – The general purpose registers, EFLAGS, and EIP are 32 bits (4 bytes) each; the segment registers are 16 bits
• RISC
  – Typically 32 general purpose registers, 64 bits (8 bytes) each on 64 bit implementations

IA-32 Registers
• EBP – Stack frame base pointer
  – Points to the start of the function's stack frame
• ESP – Stack pointer
  – Points to the current (top) location on the stack
• EIP – Instruction pointer
  – Points to the next instruction to be executed

IA-32 Registers Cont…
• General purpose registers
  – Used in general computation and control flow
    • EAX – Accumulator register
    • EBX – General data register
    • ECX – Counter register
    • EDX – General data register
    • ESI – Source index register
    • EDI – Destination index register
• Segment registers
  – Used to segment memory and compute addresses
    • CS – Code segment register
    • SS – Stack segment register
    • DS – Data segment register
    • ES – Extra (more data) segment register
    • FS – Third data segment register
    • GS – Fourth data segment register
• EFLAGS
  – CF – Carry Flag
  – SF – Sign Flag
  – ZF – Zero Flag
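To show what the three flags listed above actually record, here is a rough Python sketch of how a 32-bit subtraction (which is what a compare performs internally) would set them. The function name `flags_after_sub` is made up, and the overflow flag and the other EFLAGS bits are deliberately left out.

```python
MASK32 = 0xFFFFFFFF


def flags_after_sub(a: int, b: int) -> dict:
    """Model CF, ZF and SF after computing a - b on 32-bit values (sketch only)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "CF": int(a < b),        # a borrow was needed when treated as unsigned
        "ZF": int(result == 0),  # the result is zero, so the operands were equal
        "SF": result >> 31,      # copy of the result's most significant bit
    }


equal = flags_after_sub(5, 5)
borrow = flags_after_sub(1, 2)
```

The processor does not know whether a value is signed or unsigned. It sets all the flags on every operation, and the code that follows picks the flags that match the interpretation it wants.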

Overview of IA-32 Instruction Set
• mov – Moves source to destination
• lea – Loads effective address
• jmp – Jump
  – jne – Jump if not equal
  – jg – Jump if greater than
• call – Unconditional function call
• ret – Returns from a function to the caller
• add – Adds two values
• sub – Subtracts two values
• xor – XORs two values
• cmp – Compares two operands
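The difference between mov and lea trips up many beginners: with a memory operand, mov reads the value stored at the computed address, while lea stores the computed address itself. A small Python sketch of the 32-bit address calculation, using a made-up helper `effective_address` and a dict standing in for memory:

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index*scale + disp, wrapped to 32 bits (sketch only)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


# Pretend register values and one word of memory, invented for the example.
ebx, ecx = 0x1000, 3
memory = {0x1014: 0xDEADBEEF}

# lea eax, [ebx+ecx*4+8]  -> eax receives the address
lea_result = effective_address(base=ebx, index=ecx, scale=4, disp=8)

# mov eax, [ebx+ecx*4+8]  -> eax receives what is stored at that address
mov_result = memory[effective_address(base=ebx, index=ecx, scale=4, disp=8)]
```

Compilers also use lea as a cheap way to do arithmetic, since it can add and multiply by a small constant without touching the flags.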

Calling conventions
• Calling conventions define how the caller's data is arranged on the stack
• cdecl
  – Most common calling convention
  – Supports a variable number of parameters
  – Caller unwinds the stack, so the function ends with a plain
    • pop ebp
    • ret
• fastcall
  – Higher performance
  – First two parameters are passed in registers
• stdcall
  – Common in Windows
  – Parameters are pushed in reverse order (right to left)
  – Function unwinds the stack
    • ret 0x10 (pops 16 bytes of parameters)
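Who unwinds the stack is the main practical difference between cdecl and stdcall. The Python sketch below traces ESP through one call under each convention; the `trace_call` helper and the starting address are assumptions for illustration, and the callee's own prologue and epilogue are omitted.

```python
def trace_call(convention: str, args: list, esp: int = 0x1000) -> list:
    """Return (who, instruction, esp_after) steps for one call (sketch only)."""
    steps = []
    for arg in reversed(args):  # parameters are pushed right to left
        esp -= 4
        steps.append(("caller", f"push {arg}", esp))

    esp -= 4                    # call pushes the return address
    steps.append(("caller", "call func", esp))

    arg_bytes = 4 * len(args)
    if convention == "cdecl":
        esp += 4                # ret pops only the return address
        steps.append(("callee", "ret", esp))
        esp += arg_bytes        # the caller removes the parameters
        steps.append(("caller", f"add esp, {arg_bytes}", esp))
    elif convention == "stdcall":
        esp += 4 + arg_bytes    # ret n pops return address and parameters
        steps.append(("callee", f"ret {arg_bytes}", esp))
    else:
        raise ValueError(f"unknown convention: {convention}")
    return steps


cdecl_steps = trace_call("cdecl", [1, 2])
stdcall_steps = trace_call("stdcall", [1, 2])
```

Both traces end with ESP back where it started. In a disassembly, an `add esp, n` right after a call suggests cdecl, while a `ret n` at the end of the function suggests stdcall.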

Example
  PUSH EBP                      ; Pushes the contents of EBP onto the stack
  MOV EBP, ESP                  ; Copies the value of ESP into EBP
  CMP DWORD PTR [EBP+C], 111    ; Compares what is at EBP+12 with 0x111 (subtracts and sets the flags)
  JNZ 00401054                  ; If the result of the compare is not zero, jump to 00401054
  MOV EAX, DWORD PTR [EBP+10]   ; Moves what is at EBP+16 into EAX
  CMP AX, 64                    ; Compares the low 16 bits of what we moved to EAX with 0x64
  JNZ 00401068                  ; If the result of the compare is not zero, jump to 00401068
  POP EBP                       ; Pops the value on top of the stack back into EBP
  RET                           ; Returns to the caller

OllyDbg

Overview
• Purpose
  – OllyDbg is a general purpose Win32 userland debugger. The great thing about it is the intuitive UI and powerful disassembler
• Licensing
  – OllyDbg is free (shareware), however it is not open source and the source code is not available
• Extensibility
  – OllyDbg has a defined plugin architecture, allowing extensibility via powerful plugins

Window Layouts
• Window layouts are the various parts of the UI that contain pertinent information
  – Code window – Displays the executable machine code
  – Register window – Allows the user to watch the contents of each register during execution
  – Memory window – Allows the user to view the contents of various memory locations
  – Stack window – Displays the stack, including memory addresses and values

Working in OllyDbg
• Navigation
  – Moving
  – Searching
• Commenting
  – Comments can be entered in the code window with the ; or : keys
• Listing Names
  – The names window displays all functions or imported functions used in the program
  – Listing them is easy via the shortcut Ctrl + N
• Showing Memory
  – Displaying memory can be useful when looking for strings or other important data
  – The memory map window can be displayed via Alt + M

Working in OllyDbg Cont…
• Breakpoints
  – Breakpoints allow the debugger to stop at a specified address or instruction
  – There are two types of breakpoints in general
    • Software breakpoints
      – Handled by the operating system
      – Set by navigating to the specified address and hitting F2
    • Hardware breakpoints
      – Handled by the processor (debug registers)
      – Set by finding the place in memory you want to break on access to, right clicking, and selecting the proper option
  – Olly also provides a way to view breakpoints and turn them on and off via the breakpoints window with Alt + B

Working in OllyDbg Cont…
• Controlling Execution
  – Starting the process
    • Once the target program is either loaded or attached in Olly you can start execution. This will actually set up an initial breakpoint at the application entry point
  – There are several ways you can proceed from the entry point
    • Single stepping
      – Executes one instruction at a time and can be achieved by hitting F7
      – Steps into every function
      – Tedious as fuck
    • Execute until return
      – Executes until the ret instruction is encountered, which can be achieved by hitting Ctrl + F9
      – Executes all instructions in the current function
      – Faster than single stepping but not as comprehensive

Working in OllyDbg Cont…
• Watching execution
  – Registers
    • Handled in the register window
    • Red highlighting indicates a register has changed
  – Stack
    • Handled in the stack window
    • Addresses can be displayed as absolute or relative to EBP
  – Call stack
    • Displays the chain of functions the current function has been called from
    • Can be displayed with the shortcut Alt + K

OllyDbg Case Study* (smarty word for demo)
• Example
  – Program displays a popup box
  – Goal is to make the proper box show and exit
• Patching
  – Allows us to modify the executable assembly code and save it to a new file with the changes
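The kind of patch used in a demo like this usually comes down to changing one or two bytes. The Python sketch below works on an in-memory copy of some code bytes and shows two common patches: inverting a short conditional jump and overwriting instructions with NOPs. The helper names and the sample bytes are invented for illustration and are not taken from the demo program.

```python
JNZ_SHORT = 0x75  # jnz rel8
JZ_SHORT = 0x74   # jz rel8
NOP = 0x90


def invert_short_jump(code: bytearray, offset: int) -> None:
    """Flip a short jz into jnz or the other way round (hypothetical helper)."""
    if code[offset] == JNZ_SHORT:
        code[offset] = JZ_SHORT
    elif code[offset] == JZ_SHORT:
        code[offset] = JNZ_SHORT
    else:
        raise ValueError("no short jz/jnz at this offset")


def nop_out(code: bytearray, offset: int, length: int) -> None:
    """Overwrite `length` bytes with NOPs so the instruction does nothing."""
    code[offset:offset + length] = bytes([NOP]) * length


# cmp eax, 1 ; jnz +5
original = b"\x83\xf8\x01\x75\x05"

inverted = bytearray(original)
invert_short_jump(inverted, 3)  # the branch is now taken in the opposite case

nopped = bytearray(original)
nop_out(nopped, 3, 2)           # the branch is never taken
```

Writing the patched bytes back to a copy of the file at the matching file offset gives the "new file with the changes" mentioned above.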

OllyDbg Plugins
• OllyDbg provides a downloadable PDK for plugin development
• Several plugins exist that provide extra usability
  – Heap Vis
  – Breakpoint manager
  – OllyScript

IDA Pro

Overview
• IDA Pro was originally designed as a powerful disassembler
• Supports 30+ processors
• It has since been broadened to include a built in debugger
• Designed for reverse engineers with quickness and robustness in mind
  – This sometimes makes the learning curve steep
• Extensible plugin architecture and scripting language

Window Layouts
• Customizing window layouts
  – Each saved session will store any customized layouts
  – A default layout can also be saved
  – Customized layouts help the user with workflow and can consist of any combination or number of windows

Navigation
• Shortcuts
  – Most actions have equivalent shortcuts associated with them
  – Some of the most used
    • [Enter] – Jumps into the function under the cursor
    • [Esc] – Returns to the previous cursor position
• Jumping
  – IDA allows the user to jump to various parts of a binary file easily
  – Some of the jumps
    • Entry point – Jumps to the entry point of the binary
    • By name – Allows the user to jump to a specific function or string in the binary
    • By address – Allows the user to jump to a specific address
• Markers
  – Markers can be used to tag locations in the binary for future reference
  – Markers are set by pressing Alt + M and giving the marker a name
  – Jumping to a marker is easily achieved with Ctrl + M

Editing
• Comments
  – Comments allow you to organize and document important parts of the binary
  – Comments can be entered using the shortcut keys ; or :
• Functions can be renamed to something more descriptive
  – Oftentimes symbols are not available for the binary, and naming each function allows you to understand and track your work
  – Functions can be renamed using the shortcut Alt + P

Windows
• IDA View – Displays the disassembled binary
• Hex View – Displays the hex view of the current cursor position
• Names – The names window displays textual names and addresses in the binary
• Strings – The strings window contains any ASCII strings present in the executable
• Imports – The imports window contains the functions imported from DLLs
• Functions – The functions window allows you to view all functions and their addresses
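What a strings window shows can be approximated with a short scan for runs of printable ASCII bytes. This Python sketch is an illustration of the idea, not how IDA finds strings; the function name `ascii_strings`, the minimum length of 4, and the sample blob are all assumptions.

```python
import re


def ascii_strings(data: bytes, min_length: int = 4) -> list:
    """Return runs of at least `min_length` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes: two readable strings surrounded by binary noise.
blob = b"\x00\x01MessageBoxA\x00\xff\xfeab\x00Wrong serial!\x00"
found = ascii_strings(blob)
```

Runs shorter than the minimum length, such as the two byte `ab` in the sample, are skipped to cut down on false hits from ordinary code bytes.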

Graphing
• IDA Pro has a powerful graphing engine that allows a user to visualize call graphs and xrefs
  – Flow chart graphs display the current function's machine code and any branches
  – Function call graphs display the call flow of all the functions in the executable (can be large)
  – Xref graphs display the to and from xrefs with machine code

SDK/Plugins
• The SDK allows the user to develop plugins for use in IDA Pro
• Plugins are generally written in C/C++ and compiled against the SDK libraries and headers
• Using the SDK you can write
  – Processor modules
  – Input processing modules
  – Plugin modules
• Some good plugins
  – x86emu – Allows IDA to do runtime emulation
  – IDAPython – Access the IDA API in Python
  – Process Stalker – Allows visualization and run time tracing

FLIRT
• Fast Library Identification and Recognition Technology
• FLIRT is a means for IDA Pro to identify library functions and compilers by matching against a database of known signatures
• This greatly speeds up analysis by automatically naming discovered functions
• Only works with C/C++ functions
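The idea of matching code against known signatures can be illustrated with a heavily simplified Python sketch. This is not the real FLIRT signature format or algorithm; the pattern layout (a list of byte values with `None` as a wildcard for bytes that vary, such as call targets), the helper names, and the signature itself are all invented for the example.

```python
def matches(signature: list, code: bytes) -> bool:
    """True if `code` starts with the pattern; None matches any byte."""
    if len(code) < len(signature):
        return False
    return all(want is None or want == got for want, got in zip(signature, code))


def identify(code: bytes, signatures: dict):
    """Return the name of the first matching signature, or None."""
    for name, signature in signatures.items():
        if matches(signature, code):
            return name
    return None


# Hypothetical signature: push ebp; mov ebp, esp; call <anything>; pop ebp; ret
SIGNATURES = {
    "hypothetical_wrapper": [0x55, 0x8B, 0xEC, 0xE8, None, None, None, None, 0x5D, 0xC3],
}

# Same function linked at two places, so only the call displacement differs.
build_a = b"\x55\x8b\xec\xe8\x10\x00\x00\x00\x5d\xc3"
build_b = b"\x55\x8b\xec\xe8\x78\x56\x34\x12\x5d\xc3"
other = b"\x55\x8b\xec\x33\xc0\x5d\xc3"

name_a = identify(build_a, SIGNATURES)
name_b = identify(build_b, SIGNATURES)
name_other = identify(other, SIGNATURES)
```

The wildcards are what let one signature recognise the same library function in many different executables.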

IDC Scripting
• The IDC scripting engine allows the user to automate small tasks
• IDC resembles C and has many helpful functions built in
  – PatchByte
  – Comment
  – FindCode

Decompiling

Overview
• Decompiling is different from disassembling in that it tries to reconstruct machine code into readable (and ultimately compilable) source code
  – Native compiled code is difficult to reconstruct because of the compiler's behavior when optimizing the produced code
  – Virtual machine code is much easier to turn back into readable code because of its nature. It is compiled into an intermediate language that carries all the information the target platform may need to run
    • .NET
    • Java

.NET
• .NET is compiled down into MSIL (Microsoft Intermediate Language) and is a good example for decompiling
• .NET must provide the runtime with a wealth of information, including symbol names and data structures
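Python's own bytecode is used here as a stand-in for MSIL or Java bytecode, purely as an analogy: it is a different virtual machine, but it shows the same property that makes this kind of code easy to decompile. The compiled form still carries the original names. The function `add_tax` is made up for the example; `dis` is part of the standard library.

```python
import dis


def add_tax(price, rate):
    total = price + price * rate
    return total


compiled = add_tax.__code__

# Parameter and local variable names survive compilation.
local_names = compiled.co_varnames

# The instruction stream is high level and easy to map back to source.
opnames = [instruction.opname for instruction in dis.get_instructions(add_tax)]
```

Natively compiled code usually has no equivalent of `co_varnames` once symbols are stripped, which is a large part of why native decompilation is so much harder.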

Native code
• Native code is a language that has been compiled down into machine language
• Oftentimes, because of optimization, a compiler inadvertently obfuscates the higher level source code
• Decompiling is not quite to the point of producing a good representation of the original source code

Decompilers
• .NET
  – ILDasm
  – Remotesoft Salamander
  – Reflector for .NET
• Java
  – JODE
  – JAD (Disappeared)
• Native
  – Boomerang

Decompilation Demo
Thanks fend3r!

Conclusion
• Reverse engineering is a vast and complex world
• With a lot of practice though it becomes much easier
• A good reverser knows their tools inside and out
• Workflow and organization are the keys to reversing

Shirt Quiz
• Name the IA-32 registers
• What does .NET assemble into?
• In OllyDbg, how do you list the Names?
• What is the IA-32 instruction to compare two integers?
• How does the IA-32 processor handle signedness?
• What does the IDC scripting language resemble?
• How many processors does IDA support (roughly)?
• In IDA, how do you quickly follow a CALL?

Fucking done. Questions?