Normal Reads (No Data Loss): You don’t need to read all fragments. You only need to read the minimum number of fragments required to reconstruct the data. For example:
With a (6+3) coding scheme (6 data fragments plus 3 parity fragments), any 6 of the 9 fragments, in any mix of data and parity, are enough to reconstruct the object.
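Reading 6 fragments instead of 9 reduces network traffic and improves read performance during normal operation. If the code is systematic (the data fragments hold the original bytes unchanged), reading the 6 data fragments skips decoding entirely.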
Recovery Reads (Data Loss): If some fragments are lost (e.g., due to a node failure), erasure coding uses the surviving fragments (data + parity) to mathematically reconstruct the missing ones, provided that no more fragments are lost than there are parity fragments.
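The two read paths can be sketched as a small planning step. This is a minimal illustration, assuming a systematic (6+3) layout in which fragment indices 0-5 hold D1-D6 unchanged and indices 6-8 hold P1-P3; `plan_read` and the availability map are hypothetical names, not the API of any particular storage system.

```python
from typing import Dict, List, Tuple

K, M = 6, 3  # (6+3) scheme: indices 0-5 hold D1-D6, indices 6-8 hold P1-P3


def plan_read(available: Dict[int, bool]) -> Tuple[List[int], bool]:
    """Pick which fragments to fetch and report whether decoding is needed.

    `available` maps a fragment index to whether its node is reachable.
    Returns (indices_to_fetch, needs_decode).
    """
    data = [i for i in range(K) if available.get(i, False)]
    if len(data) == K:
        # Normal read: every data fragment is reachable, so read them in order.
        return data, False
    parity = [i for i in range(K, K + M) if available.get(i, False)]
    if len(data) + len(parity) < K:
        raise RuntimeError("fewer than K fragments reachable; object unrecoverable")
    # Recovery read: keep the surviving data fragments, top up with parity.
    return data + parity[: K - len(data)], True


all_up = {i: True for i in range(K + M)}
print(plan_read(all_up))                # ([0, 1, 2, 3, 4, 5], False)
print(plan_read({**all_up, 1: False}))  # ([0, 2, 3, 4, 5, 6], True)
```

Either way only 6 fragments are fetched; the difference is whether the client can simply concatenate them or must run a decode.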
How Erasure Coding Recovers Data Loss
Fragment Encoding:
During storage, the original data is split into data fragments (e.g., D1, D2, D3, …), and parity fragments (P1, P2, …) are computed from them.
Parity fragments are created using mathematical algorithms like Reed-Solomon coding.
Example: For a (6+3) setup, you store 6 data fragments and 3 parity fragments, distributed across 9 nodes.
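The encoding step can be sketched in a few lines. This is a toy, assuming a systematic Reed-Solomon code in evaluation form over the prime field GF(257), rather than the GF(2^8) arithmetic that production libraries typically use; `split`, `encode`, and `lagrange_eval` are illustrative names, not a real library's API.

```python
P = 257  # prime modulus (assumption); production codes usually work in GF(2^8)
# Note: parity symbols fall in 0..256, one more value than a byte can hold.


def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def split(obj: bytes, k: int):
    """Split obj into k equal-length data fragments, zero-padding the tail."""
    size = -(-len(obj) // k)  # ceiling division
    padded = obj.ljust(size * k, b"\0")
    return [list(padded[i * size:(i + 1) * size]) for i in range(k)]


def encode(data, m):
    """Return data + m parity fragments (systematic Reed-Solomon, evaluation form).

    For each symbol position, data fragment i is the value at x = i + 1 of a
    polynomial of degree < k; parity fragment p is its value at x = k + 1 + p.
    """
    k = len(data)
    xs = list(range(1, k + 1))
    parity = []
    for p in range(m):
        x = k + 1 + p
        parity.append([lagrange_eval(xs, [frag[s] for frag in data], x)
                       for s in range(len(data[0]))])
    return data + parity


fragments = encode(split(b"erasure coding demo", 6), 3)
print(len(fragments))  # 9: D1-D6 followed by P1-P3
```

Because the data fragments are stored unchanged, a normal read can return them directly; the parity fragments are only needed when something is missing.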
Data Loss Detection:
When a fragment is missing (e.g., due to a node failure), the system detects the absence during a read request.
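Detection amounts to noticing which fragment fetches came back empty and comparing that count with the number of parity fragments. A minimal sketch, assuming a failed fetch is reported as `None`; `detect_missing` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def detect_missing(fetched: List[Optional[bytes]], k: int) -> Tuple[List[int], bool]:
    """Return (indices of missing fragments, whether the object is still recoverable).

    fetched[i] holds the payload returned for fragment i, or None when the
    node holding it did not respond (for example, because it has failed).
    """
    missing = [i for i, frag in enumerate(fetched) if frag is None]
    m = len(fetched) - k  # number of parity fragments
    # A Reed-Solomon (MDS) code can rebuild the object if at most m fragments are lost.
    return missing, len(missing) <= m


fetched = [b"d1", None, b"d3", b"d4", b"d5", b"d6", b"p1", b"p2", b"p3"]
print(detect_missing(fetched, k=6))  # ([1], True)
```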
Reconstruction:
The system retrieves enough surviving fragments (any 6 in a (6+3) scheme) and runs the erasure-decoding algorithm on them to recompute the missing data.
Example:
Missing: D2
Retrieve: D1, D3, D4, D5, D6, P1
Use D1, D3–D6, and P1 to calculate the value of D2.
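The D2 example can be reproduced end to end with a toy Reed-Solomon code. This is a minimal sketch, assuming a systematic code in evaluation form over the prime field GF(257) rather than the GF(2^8) arithmetic production libraries typically use; `encode` and `reconstruct` are illustrative names, not a library API. Any 6 points determine a polynomial of degree below 6, so interpolating through D1, D3–D6, and P1 and evaluating at D2's position recovers D2 exactly.

```python
P = 257  # toy prime field (assumption); production codes typically use GF(2^8)


def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # pow(..., P-2, P) = inverse
    return total


def encode(data, m):
    """Systematic encoding: data fragment i sits at x = i + 1, parity p at x = k + 1 + p."""
    k, xs = len(data), list(range(1, len(data) + 1))
    parity = [[lagrange_eval(xs, [f[s] for f in data], k + 1 + p)
               for s in range(len(data[0]))] for p in range(m)]
    return data + parity


def reconstruct(survivors, k, target):
    """Rebuild fragment `target` from any k surviving fragments.

    survivors maps a 0-based fragment index (D1 is 0, P1 is k) to its symbols.
    """
    if len(survivors) < k:
        raise ValueError("need at least k surviving fragments")
    idx = sorted(survivors)[:k]
    xs = [i + 1 for i in idx]
    length = len(survivors[idx[0]])
    return [lagrange_eval(xs, [survivors[i][s] for i in idx], target + 1)
            for s in range(length)]


data = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100], [110, 120]]  # D1-D6
fragments = encode(data, 3)                                # D1-D6, P1-P3
survivors = {i: fragments[i] for i in (0, 2, 3, 4, 5, 6)}  # D1, D3-D6, P1
print(reconstruct(survivors, k=6, target=1))               # [30, 40], i.e. D2
```

The same call works for any other choice of 6 survivors, which is what lets a (6+3) layout tolerate up to 3 lost fragments.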
Why Erasure Coding Is Efficient for Recovery
Mathematical Redundancy: Parity fragments allow reconstruction even when several fragments are missing. A (k+m) Reed-Solomon code tolerates the loss of any m fragments (any 3 in the (6+3) example).
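Lower Storage Overhead: Instead of storing full copies of the object, as replication does, you store only the parity fragments alongside the data. A (6+3) scheme uses 9/6 = 1.5 times the raw data size and survives 3 lost fragments, whereas replication needs 4 full copies (4 times the raw size) to survive 3 lost copies.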
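The storage comparison is simple arithmetic. A minimal sketch, assuming fault tolerance is measured as the number of fragments or copies that can be lost at the same time; `overhead_to_survive` is a hypothetical helper.

```python
def overhead_to_survive(failures: int, k: int) -> dict:
    """Raw bytes stored per byte of user data to survive `failures` simultaneous losses."""
    return {
        "replication": float(failures + 1),    # failures + 1 full copies
        "erasure_coding": (k + failures) / k,  # a (k + failures) code
    }


print(overhead_to_survive(3, k=6))  # {'replication': 4.0, 'erasure_coding': 1.5}
```

The saving is paid for at recovery time: rebuilding one lost fragment requires reading k surviving fragments, whereas replication rebuilds a lost copy from a single surviving copy.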