Fact Tables and Dimension Tables

Data warehouse modeling relies on two primary types of tables: fact tables and dimension tables.

Fact Tables

A fact table stores measurable business events (e.g., sales, transactions) and contains foreign keys to dimension tables.

Important

Fact tables are immutable: they represents facts and we don’t rewrite the history

Example fact table:

CREATE TABLE fact_sales (
    sale_id SERIAL PRIMARY KEY,
    sale_date_id INT NOT NULL REFERENCES dim_date(date_id),
    customer_id INT NOT NULL REFERENCES dim_customer(customer_id),
    product_id INT NOT NULL REFERENCES dim_product(product_id),
    store_id INT NOT NULL REFERENCES dim_store(store_id),
    sales_amount DECIMAL(10,2) NOT NULL
);

Dimension Tables

Dimension tables store descriptive attributes related to facts. They are typically denormalized for faster lookups.

Important

Dimension tables are mutable, i.e. a customer can relocate, see example below

Example dimension tables:

CREATE TABLE dim_date (
    date_id SERIAL PRIMARY KEY,
    full_date DATE NOT NULL,
    year INT NOT NULL,
    month INT NOT NULL,
    day INT NOT NULL,
    day_of_week TEXT NOT NULL
);
 
CREATE TABLE dim_customer (
    customer_id SERIAL PRIMARY KEY,
    customer_name TEXT NOT NULL,
    region TEXT NOT NULL
);
 
CREATE TABLE dim_product (
    product_id SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    category TEXT NOT NULL
);

Star Schema

A star schema consists of a single central fact table connected to denormalized dimension tables.

              +-----------------+
              |   dim_date      |
              +-----------------+
                     |
+-----------------+   +-----------------+   +-----------------+
| dim_customer    |-->|  fact_sales      |<--| dim_product    |
+-----------------+   +-----------------+   +-----------------+
                     |
              +-----------------+
              |   dim_store     |
              +-----------------+

Tip

A star schema try to strike a balance between data usage on disk and ease of usage and aggregation speed. Since dimensions are denormalized, the analysts only need to join fact tables with dimensions.

Snowflake Schema

A snowflake schema further normalizes dimension tables by breaking them into smaller tables.

Example: Instead of storing category in dim_product, it is moved to a separate dim_category table.

CREATE TABLE dim_category (
    category_id SERIAL PRIMARY KEY,
    category_name TEXT NOT NULL
);
 
CREATE TABLE dim_product (
    product_id SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id INT NOT NULL REFERENCES dim_category(category_id)
);

Important

The snowflake schema, being even further normalized compared to the Star schema, require additional joins but save additional spaces

Partial Denormalization

A hybrid approach that balances performance and storage efficiency. Some dimension tables remain normalized, while others are denormalized.

Example:

CREATE TABLE fact_sales (
    sale_id SERIAL PRIMARY KEY,
    sale_date DATE NOT NULL, -- Instead of using dim_date, store directly
    customer_id INT NOT NULL REFERENCES dim_customer(customer_id),
    product_name TEXT NOT NULL, -- Store product_name directly
    sales_amount DECIMAL(10,2) NOT NULL
);