Practical (Introduction to) Reverse Engineering
Julio Auto <julio.auto *a* gmail>

Agenda
• Part I - 101
  • Why this presentation? (I mean... WHY?!?!)
  • A few concepts (Mumble jumble++)
  • Demo (Show me the goods)
• Part II - 1337
  • Advancing RE (Do your own!)
  • Something extra (Finish pretty)
• Linkz, lulz, refz, and shoutz
• Q & (maybe) A

Why?
• Initially suggested by the H2HC crew
• Based on my article ‘Cracking CrackMes’, published earlier this year while working for my previous employer, Scanit ME
• RE is getting lots of attention, and many people seem interested in learning it
• Still, it remains largely a black art

Why? (2)
• It seems, then, that moving up from ground zero is the most problematic step
• This presentation tries to help fix that
  • It aims to present immediately useful knowledge
  • And pointers on where to go digging deeper
  • Basic _techniques_ and _processes_ instead of advanced research _results_
• Note: we’ll be targeting the Windows platform most of the time in this talk

Concepts
• Reverse Engineering is a largely self-explanatory term
  • You take something and, from there, try to learn how (some aspect of) it was engineered
• It’s also obviously broad
  • For example, it’s often used to describe the process through which you generate a higher-level, architectural view of a piece of software given its source code

My Own Concept
• Think of the times you asked yourself “why” and “how” and let it go without an answer...
• RE is not letting go

A Few Applications
• Malware Analysis
• Vulnerability Analysis
• Security Assessment of 3rd-party COTS
• Evaluation/Breaking of copy-protection schemes
• Assorted how’s and why’s

Why Still a Black Art?
• Perhaps because people think it’s only good for SW cracking
• Perhaps because DRM has become a nightmare no one is happy with, and related laws everywhere hit reversers too hard every now and then (does anybody remember Dmitry Sklyarov, the DMCA and all that madness?)
• Perhaps because many people still think it should be illegal (wtf?!)

How To Learn
• The Crack-Me approach
  • The one I illustrate in the paper I mentioned
  • Small and targeted challenges with different levels and obstacles to choose from
• The real life approach
  • Choose a real-world problem and attack it
  • Tough but rewarding
• We’ll demo a bit of both

Tools of The Trade
• There are probably millions of tools that can give you some useful piece of info about your target
  • So I’ll try to restrict myself to the most relevant/common ones
• Unfortunately, many of the best tools are commercial
  • On the other hand, many of them have free/student/evaluation versions
  • For the rest... Well, remember “the real life approach”? ;)

Debuggers
• Obvious importance
• Fairly good variety
• It’s nice to play with all of them and know your way around each
  • But mastering them all is quite hard, so you’ll most likely settle on a debugger of choice in little time
• Choose your debugger well!

Debuggers (2)
• WinDbg
  • My personal choice of debugger
  • Developed by MSFT
  • Comes for free in the “Debugging Tools for Windows” package
  • Amazingly rich in features
  • Extensible with some C++ programming
    • Not the easiest or simplest dev environment
    • Very rich API, though
  • Poor interface

Debuggers (3)
• Visual Studio Debugger
  • It’s crap, not suited for reversing
  • But it’s pretty and nice for developers :>
  • Seriously, don’t try to go very far reversing with it
    • It may use up the rest of your sanity

Debuggers (4)
• OllyDbg
  • Enjoys quite a lot of popularity in the reversing community
  • Nice interface
    • In particular, a nice disassembly view
  • Comes in a few “tuned” versions, one of the most popular being...

Debuggers (5)
• Immunity Debugger
  • Developed by Immunity Inc. (one of uCon’s proud sponsors)
  • Extends OllyDbg with a Python interpreter and exposes a couple of debugging modules for the user to interact with
    • Very neat plugin support
  • Embeds a command line with WinDbg-aliased commands
  • Maintains a forum to support developers/users of ImmDbg plugins

Debuggers (6)
• gdb
  • The standard debugger on *NIX systems
  • Quite a complete debugger
  • Not the best thing in the RE world, but overall a good debugger

Disassemblers
• Reading assembly is not the sweetest thing for most people
• The way the code is represented is extremely important, and it makes an ever greater difference as RCE tasks get bigger
• Therefore, being comfortable with your disassembler is essential

Disassemblers (2)
• Pretty much every debugger is capable of disassembling
• Apart from that, there are lots of other tools that can do it too
  • On Linux, objdump is pretty much a standard tool
• However, one particular tool is especially known for its disassembly features

Disassemblers (3)
• IDA Pro
  • Supports many binary formats and architectures
  • Displays the code in graphs, which greatly enhances visualization
    • Block-level CFGs
  • Many things can be customized/adjusted
    • Graph layout, data types, annotations...
  • Quite frankly, it’s in every reverser’s toolkit
  • IDA Pro is a commercial tool, currently in version 5.4
    • But version 4.9 is available in a free edition

System Monitoring Tools
• All of those from the SysInternals Suite
  • Process Explorer
  • RegMon
  • FileMon
  • TCPView
  • Etc...

Advanced Tools
• Binary Diff’ers
  • BinDiff
• Decompilers
  • Hex-Rays
• RE Frameworks
  • ERESI ;)
  • PaiMei and all the PyThings

Demo
• We’ll try and beat a crack-me challenge
• This crack-me was taken from a real competition
  • HITB Dubai 2007 CTF
• Perhaps it can serve as a tip for uCon’s CTF as well

RE – Advanced Topics
• Cutting to the chase, advancing RE basically means automating stuff
• Many of the RE tools are scriptable/programmable/extensible
• Developing smart ways to deal with repetitive tasks is the way to more effective analyses

RE – Advanced Topics (2)
• Less often, you might see opportunities to advance RE in ways not based on automation
  • Defeating a new anti-debug trick
  • Developing new environments for RE
    • Virtualization, Sandboxing...
  • Or even radically changing paradigms
    • E.g. the graph-based approach to binary navigation

RE – Advanced Topics (3)
• Perhaps the most important lesson here is not to reinvent the wheel
  • Re-use the tools you have!
  • You’ll be amazed at how much stuff you can do by “gluing” pieces together
• Having said that...
  • Perhaps the tools you have are not perfect
  • Or you might wanna re-do something just for learning
  • But be sure to have the right goals in mind!

Teaching By Example
• I will demonstrate how you can use advanced RE to solve real life problems
• The main idea behind the “re-use” thing I mentioned on the previous slide is to keep your solution simple, by focusing on the logic itself rather than on the engineering
• Unfortunately, what I’m about to show is actually a bad example in this respect (more on this later)

Problem
• Suppose you have ways to reproduce a high-profile, possibly exploitable bug – Yay!
• BUT...
  • The target is closed-source software
  • The target is as large and complex as an operating system – and way less documented
  • The input is huge and has a complex, possibly undisclosed format
  • The source of the bug can be anywhere in the input
  • From user input to the actual bug/crash, about 3 million instructions are executed

WHAT DO YOU DO??

Introducing LEP
• LEP tries to answer a big question in this problem: what exact part of this input is causing the bug?
• If you can answer this question and somehow correlate the answer with the input format, you may gain a great deal of understanding of the bug
• For this, I have invented a new technique: “Staged Partial Tracing-Based Backwards Taint Analysis”
  • Because not sounding like a Ph.D. is so 2001 :>
  • And also because we all just love new terms we can go media-cuckoo about

Introducing LEP (2)
• One-liner idea: if we know when our input is brought to memory and know where it’s mapped, we can trace the program from this point to the crash and then go backwards analyzing the dataflow to find out where the faulting data came from
• We do it in two stages, with a component for each: the tracer and the analyzer
• Simple, huh?
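
A minimal sketch of the one-liner idea, in Python. This is not LEP's code: the `Step` record, the `mem:<hex>` naming of memory cells, and the `backward_taint` function are all hypothetical, and every instruction is simplified to "one destination, some sources". It only shows how a linear trace can be walked from the crash backwards until the data is seen coming out of the input buffer.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    """One traced instruction (hypothetical record layout)."""
    dest: str                 # register name, or "mem:<hex address>"
    sources: Tuple[str, ...]  # everything read to compute dest; () for constants


def backward_taint(trace: List[Step], faulting: str,
                   input_range: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Walk the trace bottom-up and return (trace index, address) pairs
    where the faulting data was read from inside the input buffer."""
    lo, hi = input_range
    wanted = {faulting}       # locations whose definition we still look for
    hits = []
    for index in range(len(trace) - 1, -1, -1):
        if not wanted:
            break
        step = trace[index]
        if step.dest not in wanted:
            continue
        wanted.discard(step.dest)           # found its most recent definition
        for src in step.sources:
            if src.startswith("mem:"):
                address = int(src[4:], 16)
                if lo <= address < hi:      # data came straight from the input
                    hits.append((index, address))
                    continue
            wanted.add(src)                 # otherwise keep following this source
    return hits


# Hypothetical trace: mov ecx,[1000h] / mov eax,ecx / inc eax / mov ebx,5 / mov edx,[eax]
example = [
    Step("ecx", ("mem:1000",)),
    Step("eax", ("ecx",)),
    Step("eax", ("eax",)),   # inc eax reads and writes eax
    Step("ebx", ()),
    Step("edx", ("eax",)),   # faulting instruction: eax holds the bad pointer
]
print(backward_taint(example, "eax", (0x1000, 0x2000)))  # [(0, 4096)]
```

Memory cells are compared as strings here, so the trace has to spell a given address the same way every time.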

Fundamental Concepts
• When we trace the program, it becomes “linear”, i.e. control-flow is irrelevant
  • Dataflow becomes concretely deterministic
  • Aliasing is not an issue (no need to theorize on side-effects)
• All the info we need is available at runtime
  • In particular, effective addresses
• If the input is as big as the problem states, it should be no problem to find it in memory
• We get most of the info we need from the disassembly text (ASCII)!
  • It’s like hacking with grep again!

LEP Tracer
• A WinDbg extension
• Traces every instruction until the program raises an exception
• Dumps the following instruction info to a file:
  • Mnemonic
  • Destination operand
  • Source operand
  • Dependences of the source op – e.g. mov eax, [ecx+edx*2]
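
To make the four fields concrete, here is a small Python sketch that pulls them out of one line of disassembly text. The slides do not describe the tracer's actual file format, so the `parse_line` function and the dictionary it returns are assumptions for illustration. It also assumes lower-case, two-operand, 32-bit-style text and only knows the classic general-purpose registers.

```python
import re

# Classic general-purpose registers in their 16- and 32-bit spellings.
REGISTER = re.compile(r"\b(e?[abcd]x|e?[sd]i|e?[sb]p)\b")


def parse_line(text: str) -> dict:
    """Split 'mnemonic dest, src' and list the registers a memory source depends on."""
    mnemonic, _, operands = text.strip().partition(" ")
    dest, _, src = (part.strip() for part in operands.partition(","))
    # Only a memory operand such as [ecx+edx*2] has address dependences.
    deps = REGISTER.findall(src) if src.startswith("[") else []
    return {"mnemonic": mnemonic, "dest": dest, "src": src, "deps": deps}


print(parse_line("mov eax, [ecx+edx*2]"))
# {'mnemonic': 'mov', 'dest': 'eax', 'src': '[ecx+edx*2]', 'deps': ['ecx', 'edx']}
```

The point of recording the dependences is that a value loaded through `[ecx+edx*2]` can be wrong either because the memory cell holds bad data or because `ecx` or `edx` do.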

LEP Tracer (2)
• Discards control-flow changing instructions
• Discards in/out instructions (all relevant input should be in memory already?)
• Discards other groups of instructions that will be supported as we go
  • FPU, MMX, SSE{2,3}, etc...
• Tries to parse the right info even when the debugger is too stupid to work as expected
  • Why not compute effective addresses in rep’ed instructions?

LEP Analyzer
• Reads the file generated by the tracer and goes bottom-up investigating the dataflow
• You have to specify the piece of data that causes the last instruction to fail – usually (always?) a register
• And the memory range(s) where your input was mapped at the time the trace was taken
• Ignores register “slices” for simplicity
  • (al || ah) == ax == eax == rax
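
One way to ignore register slices is to map every alias to a single canonical name before comparing operands. The sketch below is an assumption about how that could be done, not LEP's implementation; the `canonical` helper is hypothetical and covers only the eight classic general-purpose registers, passing any other name through unchanged.

```python
def _build_alias_table() -> dict:
    """Map each slice of a general-purpose register to its 64-bit name."""
    table = {}
    for letter in "abcd":
        full = f"r{letter}x"
        for alias in (f"{letter}l", f"{letter}h", f"{letter}x", f"e{letter}x", full):
            table[alias] = full
    for base in ("si", "di", "sp", "bp"):
        full = f"r{base}"
        for alias in (f"{base}l", base, f"e{base}", full):
            table[alias] = full
    return table


_ALIASES = _build_alias_table()


def canonical(register: str) -> str:
    """Return the canonical name for a register, e.g. 'ah' -> 'rax'."""
    name = register.lower()
    return _ALIASES.get(name, name)


print(canonical("al"), canonical("ah"), canonical("ax"), canonical("eax"))
# rax rax rax rax
```

The cost of this simplification is precision: a write to `al` is treated as a definition of the whole register, even though the upper bits keep their old value.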

LEP Analyzer (2)
• When the source operand of a given instruction is an immediate/constant, LEP tries its best to evaluate whether it _transforms_ or _overwrites_ the destination
  • If it overwrites, we finish the analysis for this branch
    • mov eax, deadf0f0h
  • Else, if it transforms, we keep looking for another def of the same destination operand
    • inc eax
• This is what gives LEP its reason to exist
  • Otherwise, searching for occurrences of the faulting data inside the input could be just as effective
• LEP also tries to identify non-obvious constant overwrites
  • xor eax, eax
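
The overwrite/transform decision can be sketched as a small classifier over the parsed operands. The rules below are a deliberately tiny, assumed subset (a `mov` from a constant, plus the `xor`/`sub` same-register idioms); the slides do not list which cases LEP actually recognizes, and the `classify` function and its three labels are hypothetical.

```python
from typing import Optional

REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def is_constant(operand: str) -> bool:
    """Treat anything that is neither a register nor a memory operand as an immediate."""
    return not operand.startswith("[") and operand not in REGISTERS


def classify(mnemonic: str, dest: str, src: Optional[str] = None) -> str:
    """Return 'overwrite', 'transform' or 'dataflow' for one definition of dest."""
    # xor r, r and sub r, r always produce zero, whatever r held before.
    if src == dest and mnemonic in ("xor", "sub"):
        return "overwrite"
    if mnemonic == "mov" and src is not None and is_constant(src):
        return "overwrite"      # old value is gone: this branch of the search ends
    if src is None or is_constant(src):
        return "transform"      # e.g. inc eax, add eax, 4: keep looking for an earlier def
    return "dataflow"           # value comes from src: follow src instead


print(classify("mov", "eax", "deadf0f0h"))  # overwrite
print(classify("inc", "eax"))               # transform
print(classify("xor", "eax", "eax"))        # overwrite
```

A transform is why a plain byte search for the faulting value in the input can miss: after `inc eax`, the value in the register no longer appears in the file.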

Engineering Tech-Talk
• LEP was intended to be written entirely in Python
  • Didn’t work, for performance reasons
• LEP Tracer is written in C++, since it’s a WinDbg extension
  • It makes use of a reference of the x86 instruction set written in XML by MazeGen
  • The XML is mapped to C++ using CodeSynthesis’ XSD XML Data Binding
• LEP Analyzer was first written in Python
  • Then I also re-wrote it in C++
• LEP Analyzer’s search algorithm was initially a DFS
  • Then I implemented it as a BFS
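
For readers unfamiliar with the two search orders, the difference comes down to which pending item is examined next. The sketch below uses a made-up dependency map (`defs`) and a hypothetical `explore` function; it says nothing about why LEP switched, only what the switch changes.

```python
from collections import deque
from typing import Dict, List


def explore(defs: Dict[str, List[str]], start: str, breadth_first: bool) -> List[str]:
    """Visit every location reachable from start; return the visiting order."""
    pending = deque([start])
    seen = {start}
    order = []
    while pending:
        # BFS takes the oldest pending item, DFS the newest.
        node = pending.popleft() if breadth_first else pending.pop()
        order.append(node)
        for source in defs.get(node, ()):
            if source not in seen:
                seen.add(source)
                pending.append(source)
    return order


# Hypothetical dependences: eax was computed from ecx and edx, each loaded from memory.
defs = {"eax": ["ecx", "edx"], "ecx": ["mem:1000"], "edx": ["mem:2000"]}
print(explore(defs, "eax", breadth_first=True))
# ['eax', 'ecx', 'edx', 'mem:1000', 'mem:2000']
print(explore(defs, "eax", breadth_first=False))
# ['eax', 'edx', 'mem:2000', 'ecx', 'mem:1000']
```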

Demo II
• Placeholder slide :>

Linkz & Refz
• Cracking CrackMes
  • http://www.scanit.net/rd/wp/wp04
• X86 Opcode and Instruction Reference, by MazeGen
  • http://ref.x86asm.net/
• CodeSynthesis XSD – XML Data Binding for C++
  • http://www.codesynthesis.com/products/xsd/
• Thousands of elite RE projects
  • http://www.google.com
  • Seriously though, contact me if you can’t find anything

Greetz & Shoutz
• Filipe Balestra, for lending me the bug used in the 2nd demo
• H2HC crew, for inspiring me to do this work
• uCon Crew, for having the elitest con ever
• Everybody in the room, for coming
• The ERESI team, with whom I have most of my discussions about RE, program analysis, etc.
• All of the great people that I know from the security scene
  • It’s simply impossible to mention each and every one of you, but you know who you are!

Questions?
