atlas' potentially braindead tips to deadlisting

Introduction
------------
"Deadlisting" is a term used to describe the analysis of disassembly code in
somewhat raw form. For example, I recently ran a binary through disass-ng
(which spits out disassembly along with some other helper information). The
results are called "a deadlisting" and the act of my reviewing the ASM is
called "deadlisting". Got it? Good.

This article will address slightly more than deadlisting, but the focus is on
tips for analyzing disassembly code.

For those of you still confused, disassembly looks a lot like assembly, but
comes from an already-compiled binary. Often the binary has been compiled from
a high-level language like C or C++, and the result looks very different from
what you or I would actually *write* in assembly. Understanding how a compiler
writes assembly is a long and enjoyable learning curve, and is often best
served by simple exposure. Baptism by submersion: walking through disassembly,
"thinking" (perhaps on paper) through the inner processing of the computer,
and asking others who know more (sorry to you l33t dudes... it's true).

This article will discuss deadlisting tips which have proven useful to me.
They are a bit of mental control and direction. They may not be useful to
you... YMMV.

One thing which I have realized over the past year is that Reverse Engineering
(reversing) plays an important part in Vulnerability Discovery/hacking... but
they are different skillsets. Reversing is about understanding how an
application works. Hacking/bug-hunting is basically reversing with a
particular goal... which is an additional skillset.

The purpose of this article is *not* to teach you how Assembly works. There
are many great books to teach that. We will discuss some of the
idiosyncrasies of Intel x86 assembly, as compiled by common compilers (e.g.
the GNU C compiler).

For this article, we will be using AT&T syntax, the default syntax of Unix
asm :) Windows defaults to Intel syntax...

    AT&T looks like this:   0xfffffff0(%ebp)
        (consider 0xfffffff0 as a signed integer.... -0x10)
    Intel looks like this:  [EBP-0x10]

Meat
----
RevEng Lessons Learned:

* Trace jmps early in the process (jmp, je, jz, jl, jle, jg, jge, j...)
    At least get an idea which direction in the code you're jmping to and
    whether it jmps out of the current sub.

* Determine loops and constraints
    Where do loops begin and end?
    What conditions continue the loop?

* %ebp is the anchor of all local variables and parameters
    0x8(%ebp) represents the first parameter of the current sub
    0xfffffff0(%ebp) represents a local variable of size <= 0x10 bytes
    0x8049ac8 points to a .bss location in memory. Think of this as a global
    variable.
    Local variables often hold 4-byte pointers to heap variables, accessed
    like this:
        mov 0xfffffff8(%ebp), %eax  ; pointer loaded into %eax, eg. 0x80599aa
        movb $0x20, (%eax)          ; places the byte 0x20 (which is a space)
                                    ; into memory location 0x80599aa

* Focus NOT on the registers themselves... focus on what they represent
    It is important to understand how registers work and are used by the
    processor.
    It is more important to understand what is going on overall... registers
    are temporary.
    Registers are set by instructions of the form "source, %register"
        eg. mov 0x8(%ebp), %eax
            add $0x3, %eax
    The register "transaction" often completes with something like
    "%register, destination"
        eg. mov %eax, 0xffffefc8(%ebp)  ; 4 bytes in %eax get stored in a
                                        ; local variable
            incl (%eax)                 ; memory location pointed to by %eax
                                        ; gets incremented by 1

* Knowing where a sub is called from can help determine what it does
    Short sub called by 20 other subs.... hmmm.. a really repetitive sub.
    Maybe a parser of some sort?
    Called by each of 10 function forks based on byte three of some input...

* Knowing what a sub calls can definitely help determine what it does
    It calls accept()... hmmm.. Start the network protocol analysis here...
    It calls seteuid() and setegid()... Perhaps this is the
    "drop_privileges()" sub?
    It calls printf() 10 times... obviously some sort of display function.

* Sub calls are preceded by the parameters being pushed onto the stack in
  reverse order. For example:
        push %edi
        push %esi
        push %ebx
        call 804931c
    is like calling: main(%ebx, %esi, %edi)
    (*example taken from "echod" from ctf/dc13)

* (%ebp) holds the calling sub's saved %ebp (a 4-byte pointer)
    How's that for recursion!?

* 0x4(%ebp) holds the sub's return pointer
    Nice to keep in mind for buffer overflows

* GNU developer man pages are your friend!

Vuln Research (Hacking)
-----------------------
Bug-hunting, Vulnerability Research, "hacking" or whatever name you call it,
is the act of finding and exploiting programming flaws in existing software.
Although many common programming errors currently exist, vulnerabilities can
range from the unique to the infrequent to the common buffer overflow.

The act of bug-hunting can be engaged in by anyone who understands how a
target computer works at the machine level (ASM counts). The title of "Bug
Hunter", however, I reserve for the few, the proud, who have taken their
reverse engineering status to the next level. Doctors undergo a great deal of
training before they can be called "Dr." so-and-so. To become a specialist,
doctors must then engage in yet more years of training. Bug Hunters have done
this, by honing their RE Skillz for the express purpose of finding and
exploiting vulnerabilities.

Below are some tips I've found to be helpful specifically for binary
reversing:

* Start out looking for ways to affect an application
    * environment variables?
    * file name? (some programs change their behavior based on the name they
      are called with)
    * PATH? current working directory?
    * accept() / recv()?
    * fopen() / read()?

* Use a test system similar to your target, over which you have control
  (root-level access) - vmware is your friend!
    * OS versions do make a difference for things like stack arrangement,
      etc...

* Poke and prod at the application as things develop
    * disassemble and analyze the ASM
    * use a debugger to watch execution
    * use a test client for remote vulnerabilities

* Fuzz - poke at the edges
    * memory buffers
    * short/integer/long boundaries
    * Format strings?
      %x%n
    * etc

* Recommendations for analysis and playing around:
    * file
    * strings
    * does it run? -h/--help
    * network client?
    * network listener/server?
    * suid/sgid?
    * How can I affect it? (see above)
    * Start at or around common input points
        accept()
        recv()
        read()
    * GNU developer man pages! They are a must!
    * look at common memory "stuff"
        malloc() / calloc()
        local variables     0xfffffff0(%ebp)
        sub parameters      0x8(%ebp)

* When I'm hacking an ELF binary I have a 5-headed Konsole session going.
    * less disassembly.txt (for quick viewing and searching of the binary)
    * vi disassembly.txt (for commenting the binary, specifically vars and
      parms)
    * fuzzer/client/work
    * ssh into the target VM, running GDB
    * ssh into the target VM, doing miscellaneous stuff (like "netstat -na")
    (YMMV, perhaps you prefer a GUI editor. Perhaps a GUI debugger, etc...)

* While learning, I have enjoyed printing off the deadlisting to paper, which
  I can then write all over, learning how the system is doing different things
    I've been trying to lessen my use of this, for the sake of speed and
    convenience (who takes a printer to ctf?)
    But this has led to the best understanding in my experience
    Take it to the coffeeshop! (This is how I cracked stage7 last year)
    Take it camping! (My wife would have beat me had I brought the laptop...
    but 15 pages of "fucktcpd" she didn't seem to mind :)
    Take it to a wedding rehearsal! (like there's anything else to do there!)
    Take it to the beach! (perhaps you'll get in less trouble)

* Study up on vulnerability types!
    * Hacking: The Art of Exploitation
    * Reversing: Secrets of Reverse Engineering
    * The Shellcoder's Handbook (Litchfield et al.)
    * Building Secure Software (Viega)

Summary
-------
Reverse Engineering is fun and addictive. Add it to your nightly schedule!
Send me money :)
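
P.S. The %ebp offsets discussed under "Meat" (0x8(%ebp), 0xfffffff0(%ebp),
0x4(%ebp), and plain (%ebp)) are easy to misread while commenting a
deadlisting. Here is a minimal Python sketch of that bookkeeping. It is not
part of the article's toolset: the function names signed32 and
describe_ebp_offset are hypothetical, and it assumes a 32-bit frame with the
usual "push %ebp / mov %esp,%ebp" prologue and 4-byte parameters. Binaries
built without a frame pointer will not follow this layout.

```python
def signed32(value):
    """Interpret a 32-bit displacement as a signed two's complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def describe_ebp_offset(disp):
    """Guess what disp(%ebp) refers to (hypothetical helper).

    Assumes a standard 32-bit frame: saved %ebp at 0, return pointer at 4,
    4-byte parameters from 8 upward, locals at negative offsets.
    """
    off = signed32(disp)
    if off < 0:
        return f"local variable at %ebp-{-off:#x}"
    if off == 0:
        return "saved %ebp of the calling sub"
    if off == 4:
        return "return pointer"
    if off >= 8 and off % 4 == 0:
        return f"parameter {(off - 8) // 4 + 1}"
    return f"unaligned offset %ebp+{off:#x}"


if __name__ == "__main__":
    # Displacements as they appear in AT&T-syntax disassembly.
    for disp in (0xfffffff0, 0x0, 0x4, 0x8, 0xc):
        print(f"{disp:#x}(%ebp): {describe_ebp_offset(disp)}")
```

Treat the labels as a first guess only. A "local variable" may really be the
middle of a larger buffer, which is exactly the kind of thing worth noting
when hunting overflows.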