Apache Iceberg is a table metadata format that provides more flexibility than alternatives by decoupling the data layout on disk from its features. It works in the following way:

manifest files keep metadata for a single datafile
manifest lists keep track of lists of manifest files
metadata files keeps track of all the snapshots of the table, by having an array of manifests lists per each snapshot

When reading or writing, the query engine asks a catalog what is the current version of the table, and use the metadata file at the right version as an entrypoint.

Tip

For example in Hive, partitions match folders, making it impossible to change partitioning without rewriting entire tables

The following material provide deeper informations about Iceberg:

Apache Iceberg Key features
Apache Iceberg Catalogs

Additionally, you might want to deep dive in how to use Iceberg with popular platforms:

Apache Iceberg with Spark

Reference Material

Apache Iceberg The Definitive Guide

Edmondo's Vault

Explorer

Introduction to Apache Iceberg

Reference Material

Graph View

Backlinks