Disassembly is the process of converting machine code back into a human-readable representation of assembly instructions.
Tools for Disassembly
The most common tool for disassembly in Linux is objdump, which provides detailed information about a binary. To disassemble a binary, you can use the following options:
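- -D: Disassemble the contents of every section, not only those expected to contain code.
- -b: Specify the object format of the input (e.g., binary for raw binary files, or elf64-x86-64 for 64-bit x86 ELF files, which objdump normally detects on its own).
- -m: Define the target architecture (e.g., i386 for x86, or i386:x86-64 for 64-bit x86).
- -M: Provide additional options for the disassembler, such as decoding mode and output style. For example:
  - x86-64: Decode in 64-bit mode when the architecture is given as i386.
  - intel: Use Intel syntax for x86 assembly.
  - att: Use AT&T syntax for x86 assembly.
- -z: Disassemble blocks of zero bytes as well; by default objdump skips them.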
Example command to disassemble a raw binary:
objdump -D -b binary -mi386 -Mx86-64 -Mintel -z shellcode.bin
Interpreting the Output
Disassembly output typically contains three key columns:
- Offsets (left column): These are hexadecimal addresses relative to the start of the section or binary; for a raw binary they count bytes from the beginning of the file. They are useful for locating specific instructions in the code.
- Hexadecimal bytes (middle column): These are the raw machine instructions encoded as bytes. Each instruction is made up of a specific sequence of bytes based on the CPU architecture’s instruction set.
- Assembly instructions (right column): These are the human-readable assembly equivalents of the hexadecimal bytes.
For example:
00000000 <.data>:
0: 68 65 6c 6c 6f push 0x6f6c6c65
5: 20 77 6f and BYTE PTR [rdi+0x6f],dh
Here:
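- The offset 0 indicates the first instruction’s location in the .data section.
- The hexadecimal bytes 68 65 6c 6c 6f encode the push instruction.
- The assembly instruction push 0x6f6c6c65 pushes the value 0x6f6c6c65 onto the stack. The operand bytes 65 6c 6c 6f are stored in little-endian order and are ASCII for “ello”; together with the opcode byte 0x68 (“h”), the five bytes spell “hello”. This suggests the input is really text, which objdump still decodes as instructions because -D treats every byte as code.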
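The three-column layout can be checked mechanically. The following is a minimal sketch, assuming each instruction line has the shape "offset: hex bytes instruction" with whitespace between columns, as in the sample above; `parse_line` and `LINE_RE` are hypothetical names, and real objdump output (which separates columns with tabs) may need a more careful parser.

```python
import re

# Assumed line shape: "<hex offset>: <hex byte pairs> <instruction text>".
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{2}\s+)+)(.*)$")

def parse_line(line):
    """Split one disassembly line into (offset, raw bytes, instruction).

    Returns None for lines that do not match, such as the
    "00000000 <.data>:" header.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    offset = int(m.group(1), 16)                      # left column
    raw = bytes.fromhex("".join(m.group(2).split()))  # middle column
    return offset, raw, m.group(3).strip()            # right column

sample = [
    "00000000 <.data>:",
    "0: 68 65 6c 6c 6f push 0x6f6c6c65",
    "5: 20 77 6f and BYTE PTR [rdi+0x6f],dh",
]
for line in sample:
    parsed = parse_line(line)
    if parsed is not None:
        offset, raw, insn = parsed
        # Showing the raw bytes as text reveals when "code" is really data.
        print(hex(offset), raw, insn)
```

Printing the raw bytes next to each instruction makes it easy to spot printable text that has been decoded as code: here the two lines carry the bytes for "hello" and " wo".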
Sections and Their Role
Disassembly outputs are organized by sections, such as .text for code, .data for initialized variables, and .bss for uninitialized variables. Understanding sections helps identify which parts of the binary contain executable code versus static data. For example, instructions from the .text section are typically executed, while .data or .bss store variables.
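When working with raw binaries such as shellcode, sections are not present, because the headers that describe them are discarded when tools like objcopy extract the raw bytes (for example with -O binary). Stripping a binary, by contrast, removes symbol names but usually leaves its sections in place. For raw input, the disassembly starts directly from the first byte of the file, and objdump labels the whole input generically as .data.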
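One way to decide whether a file needs the raw-binary options is to look for the ELF magic bytes at its start. The sketch below is an illustration under assumptions: `is_elf` and `objdump_args` are hypothetical helper names, the target is assumed to be x86-64, and the raw-binary flags simply mirror the example command shown earlier. It only builds the argument list and does not run objdump.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file

def is_elf(path):
    """Return True if the file begins with the ELF magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC

def objdump_args(path):
    """Build an objdump argument list for the file (x86-64 assumed)."""
    if is_elf(path):
        # ELF headers already describe the format and architecture.
        return ["objdump", "-D", "-Mintel", path]
    # Raw bytes carry no headers, so format and architecture are explicit.
    return ["objdump", "-D", "-b", "binary", "-mi386",
            "-Mx86-64", "-Mintel", "-z", path]
```

The resulting list could be passed to `subprocess.run` on a system where objdump is installed.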
Assembly Encoding
Each CPU architecture has a defined instruction set architecture (ISA), which maps assembly instructions to specific byte sequences. For example:
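- x86-64: The push instruction for a 32-bit immediate value is encoded as the opcode byte 0x68, followed by the value in little-endian format. Instructions vary in length.
- ARM: Uses a different encoding based on its reduced instruction set, with fixed-width instructions (32 bits in the classic ARM and AArch64 instruction sets).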
Instruction encoding tables for each architecture define how these bytes are constructed. Tools like objdump decode these bytes according to the ISA to produce the assembly output.
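The x86-64 push encoding described above can be reproduced in a few lines. This is a minimal sketch covering only the opcode 0x68 form with a 32-bit immediate; the function names are hypothetical, and the immediate is treated as unsigned for simplicity.

```python
import struct

PUSH_IMM32 = 0x68  # opcode byte for push with a 32-bit immediate

def encode_push_imm32(value):
    """Encode push imm32: opcode, then the value as 4 little-endian bytes."""
    return bytes([PUSH_IMM32]) + struct.pack("<I", value)

def decode_push_imm32(code):
    """Return the immediate from a 5-byte push imm32 encoding."""
    if len(code) < 5 or code[0] != PUSH_IMM32:
        raise ValueError("not a push imm32 encoding")
    (value,) = struct.unpack_from("<I", code, 1)
    return value

encoded = encode_push_imm32(0x6F6C6C65)
print(encoded.hex(" "))  # 68 65 6c 6c 6f
```

Decoding works the same way in reverse: `decode_push_imm32(b"hello")` yields 0x6f6c6c65, matching the sample disassembly.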