Disassembly is the process of converting machine code back into a human-readable representation of assembly instructions.

Tools for Disassembly

A common tool for disassembly on Linux is objdump, part of GNU Binutils, which can display detailed information about a binary. The following options are the most relevant when disassembling:

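  • -D: Disassemble the contents of all sections, not only those expected to contain code.
  • -b: Specify the object format of the input (e.g., binary for raw binary files, elf64-x86-64 for 64-bit x86 ELF files). objdump normally detects ELF files on its own, so this option is mainly needed for raw input.
  • -m: Define the target architecture (e.g., i386 for 32-bit x86, i386:x86-64 for 64-bit).
  • -M: Pass additional options to the disassembler, such as decoding mode or output style. For example:
    • intel: Use Intel syntax for x86 assembly.
    • att: Use AT&T syntax for x86 assembly.
    • x86-64: Decode the bytes as 64-bit code.
  • -z: Disassemble blocks of zero bytes as well; by default, objdump skips them.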

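Example command to disassemble a raw binary as 64-bit x86 code in Intel syntax (-mi386 selects the x86 family and -Mx86-64 switches the decoder to 64-bit mode):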

objdump -D -b binary -mi386 -Mx86-64 -Mintel -z shellcode.bin
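When the same invocation is needed for several files or modes, it can help to assemble the argument list programmatically. The following is a minimal Python sketch under stated assumptions: the helper name build_objdump_cmd and its parameters are hypothetical, and it only covers the raw x86 case discussed here, using the same options as the command above.

```python
import shlex


def build_objdump_cmd(path, bits=64, syntax="intel"):
    """Return an objdump argument list for disassembling a raw x86 binary.

    Hypothetical helper: it mirrors the options used in the example command.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    if syntax not in ("intel", "att"):
        raise ValueError("syntax must be 'intel' or 'att'")

    # -D: disassemble everything, -b binary: raw input, -m i386: x86 family
    cmd = ["objdump", "-D", "-b", "binary", "-m", "i386"]
    if bits == 64:
        # Switch the x86 decoder to 64-bit mode
        cmd += ["-M", "x86-64"]
    # Output syntax, keep zero blocks, then the input file
    cmd += ["-M", syntax, "-z", path]
    return cmd


print(shlex.join(build_objdump_cmd("shellcode.bin")))
# objdump -D -b binary -m i386 -M x86-64 -M intel -z shellcode.bin
```

The list form can be passed directly to subprocess.run on a system where objdump is installed; the sketch itself only builds and prints the command.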

Interpreting the Output

Disassembly output typically contains three key columns:

  • Offsets (left column): These are hexadecimal addresses relative to the start of the section or binary. For a raw binary, which carries no load address, they are simply offsets into the file. They are useful for locating specific instructions in the code.
  • Hexadecimal bytes (middle column): These are the raw machine instructions encoded as bytes. Each instruction is made up of a specific sequence of bytes based on the CPU architecture’s instruction set.
  • Assembly instructions (right column): These are the human-readable assembly equivalents of the hexadecimal bytes.

For example:

00000000 <.data>:
   0:   68 65 6c 6c 6f          push   0x6f6c6c65
   5:   20 77 6f                and    BYTE PTR [rdi+0x6f],dh

Here:

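  • The offset 0 indicates that the first instruction starts at the beginning of the .data section.
  • The hexadecimal bytes 68 65 6c 6c 6f encode the push instruction: 68 is the opcode, and 65 6c 6c 6f is the 32-bit immediate stored in little-endian order.
  • The assembly instruction push 0x6f6c6c65 pushes the value 0x6f6c6c65 onto the stack. The immediate’s bytes are ASCII for “ello”; together with the opcode byte 0x68 (“h”), the five bytes spell “hello”. The next line decodes 20 77 6f (“ wo”) in the same way, which suggests that this input is text rather than code: with -D, objdump decodes whatever bytes it is given.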
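The three columns can also be separated programmatically, which makes it easy to check what the raw bytes look like as text. The following Python sketch assumes the line layout shown in the example above (offset, colon, space-separated hex bytes, then the instruction); the names LINE_RE and parse_line are hypothetical, and real objdump output may need a more tolerant pattern.

```python
import re

# offset ':' whitespace, one or more "hh " byte groups, then the instruction
LINE_RE = re.compile(r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2} )+)\s*(.*)$")

SAMPLE = """\
00000000 <.data>:
   0:   68 65 6c 6c 6f          push   0x6f6c6c65
   5:   20 77 6f                and    BYTE PTR [rdi+0x6f],dh
"""


def parse_line(line):
    """Split one disassembly line into (offset, raw bytes, instruction).

    Returns None for lines that are not instruction lines (e.g. headers).
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    offset = int(m.group(1), 16)
    raw = bytes.fromhex(m.group(2))
    return offset, raw, m.group(3).strip()


records = [r for r in map(parse_line, SAMPLE.splitlines()) if r is not None]
for offset, raw, text in records:
    print(f"{offset:#x} {raw!r} {text}")

# Joining the byte columns shows that the "code" is really ASCII text
print(b"".join(raw for _, raw, _ in records))  # b'hello wo'
```

Each offset equals the previous offset plus the length of the previous instruction's bytes (0 + 5 = 5 here), which is a quick sanity check when reading a listing by hand.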

Sections and Their Role

Disassembly outputs are organized by sections, such as .text for code, .data for initialized variables, and .bss for uninitialized variables. Understanding sections helps identify which parts of the binary contain executable code versus static data. For example, instructions from the .text section are typically executed, while .data or .bss store variables.

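When working with raw binaries or shellcode, section information is not present: a raw file, such as one produced by objcopy -O binary, contains only the bytes themselves and no headers. (Stripping an ELF binary removes its symbols, but the section headers usually remain.) In these cases, objdump treats the entire file as a single section, labeled generically as .data, and the disassembly starts directly at the first byte.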
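Whether a file carries section information can be estimated from its first bytes: ELF files begin with the four-byte magic 0x7f 'E' 'L' 'F', while raw shellcode has no header at all. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that any non-ELF input is raw 64-bit x86 code, which a real tool would need to confirm some other way.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def looks_like_elf(data):
    """Return True if the data starts with the ELF magic number."""
    return data[:4] == ELF_MAGIC


def objdump_format_args(data):
    """Choose extra objdump arguments for the given file contents.

    Assumption: anything that is not ELF is treated as raw x86-64 code.
    """
    if looks_like_elf(data):
        # objdump reads format and architecture from the ELF header
        return []
    # Raw input: format and architecture must be stated explicitly
    return ["-b", "binary", "-m", "i386", "-M", "x86-64"]


print(objdump_format_args(b"\x7fELF\x02\x01\x01\x00"))  # []
print(objdump_format_args(b"hello wo"))
```

For raw input, the returned arguments match the options used in the example command earlier in this section.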

Assembly Encoding

Each CPU architecture defines an instruction set architecture (ISA), which specifies how assembly instructions are encoded as byte sequences. For example:

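  • x86-64: Instructions vary in length. The push instruction for a 32-bit immediate value is encoded as the opcode byte 0x68, followed by the four bytes of the value in little-endian order.
  • ARM: Instructions have a fixed width (32 bits in the A32 and AArch64 instruction sets, 16 or 32 bits in Thumb), so the same operation is encoded with an entirely different bit layout.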

Instruction encoding tables for each architecture define how these bytes are constructed. Tools like objdump decode these bytes according to the ISA to produce the assembly output.
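The push encoding described above is simple enough to reproduce by hand. The following Python sketch covers only that single instruction form (opcode 0x68 followed by a little-endian 32-bit immediate); the function names are hypothetical, and it is not a general assembler or disassembler.

```python
import struct

PUSH_IMM32 = 0x68  # x86 opcode for push with a 32-bit immediate


def encode_push_imm32(value):
    """Encode 'push imm32' as the opcode plus a little-endian immediate."""
    return bytes([PUSH_IMM32]) + struct.pack("<I", value & 0xFFFFFFFF)


def decode_push_imm32(code):
    """Return the immediate of a 'push imm32' encoding, or raise ValueError."""
    if len(code) < 5 or code[0] != PUSH_IMM32:
        raise ValueError("not a push imm32 encoding")
    (imm,) = struct.unpack_from("<I", code, 1)
    return imm


encoded = encode_push_imm32(0x6F6C6C65)
print(encoded.hex(" "))                 # 68 65 6c 6c 6f
print(hex(decode_push_imm32(encoded)))  # 0x6f6c6c65
```

The encoded bytes match the first line of the example listing, where the immediate 0x6f6c6c65 appears in the byte column as 65 6c 6c 6f because x86 stores multi-byte values least significant byte first.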