A disk block (or sector) is the smallest unit of storage that a disk can read or write. Disks don’t access individual bytes directly but operate in fixed-size blocks, typically 512 bytes or 4 KB in size.
- Reading and Writing:
- Data on disk is organized into blocks, and every read or write involves transferring a full block between disk and memory.
- If you modify a single byte, the entire block containing that byte must be read into memory, updated, and written back.
- Alignment:
- Disk-based data structures, like trees, are organized so that nodes fit neatly into blocks, minimizing wasted space and unnecessary I/O.
Writing a Tree to Disk
When working with a tree (e.g., a binary tree or B-Tree), you’re dealing with nodes, keys, and pointers. These must be serialized into a format suitable for storage on disk.
Serialization of a Tree Node
To store a tree node on disk:
- Keys and Values:
- Serialize the keys and values in the node into a contiguous binary format.
- Pointers:
- Replace in-memory pointers (e.g.,
parent.leftorparent.right) with disk offsets (byte positions within the file).
- Replace in-memory pointers (e.g.,
Example
Consider a binary tree node with:
- Key: 42.
- Left Pointer: Points to a child node.
- Right Pointer: Points to another child node.
In memory:
Node = [42, left=0x7ffd1234, right=0x7ffd5678]On disk:
[Key: 42][Left Offset: 512][Right Offset: 1024]Here, 512 and 1024 are byte offsets within the file where the child nodes are stored.
Updating a Tree
When you add a new node:
- Write the new node to the next free block in the file.
- Update the parent node’s pointer to reference the new block (e.g., set
left=512). - Write the updated parent node back to disk.
Disk Writing and fseek
In low-level file operations (e.g., using fopen, fwrite, and fseek), you control where in a file data is written.
How Disk Writing Works
- Open the File:
- Use
fopento open the file where the tree is stored.
- Use
- Seek to the Position:
- Use
fseekto move the file cursor to the desired byte offset. - For example, to write to the block at offset
512:fseek(file, 512, SEEK_SET);
- Use
- Write the Data:
- Serialize the node into a binary format and write it:
fwrite(buffer, sizeof(buffer), 1, file);
- Serialize the node into a binary format and write it:
- Close the File:
- Use
fcloseto flush the data and release resources.
- Use
Example of Writing a Node
If you’re writing a node with:
- Key: 42.
- Left Offset: 512.
- Right Offset: 1024.
You might serialize it into a buffer like this:
unsigned char buffer[12]; // 4 bytes for key, 4 for left, 4 for right
memcpy(buffer, &key, 4);
memcpy(buffer + 4, &left_offset, 4);
memcpy(buffer + 8, &right_offset, 4);Then write the buffer:
fseek(file, 0, SEEK_SET); // Write to the root node position
fwrite(buffer, sizeof(buffer), 1, file);Challenges with Disk Pointers
-
Fixed Structure:
- Once a tree node is written, its disk offset is fixed. If you need to move the node (e.g., during a tree reorganization), you must update all pointers referencing it.
-
Fragmentation:
- Over time, free space on disk may become fragmented, making it harder to write contiguous blocks.
-
Crash Recovery:
- If the system crashes mid-update, you could end up with partially written nodes or invalid pointers. Systems use techniques like Write-Ahead Logs (WAL) to ensure durability.
Example: A Simple File-Based Tree
Here’s how a binary tree might look when stored in a file:
| Block Offset | Contents |
|---|---|
| 0 | Root Node: Key=42, Left=512, Right=1024 |
| 512 | Left Child: Key=17, Left=1536, Right=2048 |
| 1024 | Right Child: Key=84, Left=NULL, Right=NULL |
| 1536 | Left-Left Child: Key=9, Left=NULL, Right=NULL |
The tree is serialized into blocks, with each block representing a node. Pointers between nodes are stored as byte offsets within the file.
Summary
- Disk Blocks: The smallest unit of I/O, typically 4 KB.
- Serialization: Tree nodes are converted into a binary format, with pointers replaced by file offsets.
- fseek and fwrite: Control where data is written in the file, ensuring nodes are stored at predictable offsets.
- Efficiency: By aligning nodes with disk blocks, disk I/O is minimized, and the tree remains structured and accessible.
Let me know if you’d like to explore this further or if an example in a specific programming language would help!