Fact Tables and Dimension Tables
Data warehouse modeling relies on two primary types of tables: fact tables and dimension tables.
Fact Tables
A fact table stores measurable business events (e.g., sales, transactions) and contains foreign keys to dimension tables.
Important
Fact tables are immutable: they represents facts and we don’t rewrite the history
Example fact table:
CREATE TABLE fact_sales (
sale_id SERIAL PRIMARY KEY,
sale_date_id INT NOT NULL REFERENCES dim_date(date_id),
customer_id INT NOT NULL REFERENCES dim_customer(customer_id),
product_id INT NOT NULL REFERENCES dim_product(product_id),
store_id INT NOT NULL REFERENCES dim_store(store_id),
sales_amount DECIMAL(10,2) NOT NULL
);Dimension Tables
Dimension tables store descriptive attributes related to facts. They are typically denormalized for faster lookups.
Important
Dimension tables are mutable, i.e. a customer can relocate, see example below
Example dimension tables:
CREATE TABLE dim_date (
date_id SERIAL PRIMARY KEY,
full_date DATE NOT NULL,
year INT NOT NULL,
month INT NOT NULL,
day INT NOT NULL,
day_of_week TEXT NOT NULL
);
CREATE TABLE dim_customer (
customer_id SERIAL PRIMARY KEY,
customer_name TEXT NOT NULL,
region TEXT NOT NULL
);
CREATE TABLE dim_product (
product_id SERIAL PRIMARY KEY,
product_name TEXT NOT NULL,
category TEXT NOT NULL
);Star Schema
A star schema consists of a single central fact table connected to denormalized dimension tables.
+-----------------+
| dim_date |
+-----------------+
|
+-----------------+ +-----------------+ +-----------------+
| dim_customer |-->| fact_sales |<--| dim_product |
+-----------------+ +-----------------+ +-----------------+
|
+-----------------+
| dim_store |
+-----------------+
Tip
A star schema try to strike a balance between data usage on disk and ease of usage and aggregation speed. Since dimensions are denormalized, the analysts only need to join fact tables with dimensions.
Snowflake Schema
A snowflake schema further normalizes dimension tables by breaking them into smaller tables.
Example: Instead of storing category in dim_product, it is moved to a separate dim_category table.
CREATE TABLE dim_category (
category_id SERIAL PRIMARY KEY,
category_name TEXT NOT NULL
);
CREATE TABLE dim_product (
product_id SERIAL PRIMARY KEY,
product_name TEXT NOT NULL,
category_id INT NOT NULL REFERENCES dim_category(category_id)
);Important
The snowflake schema, being even further normalized compared to the Star schema, require additional joins but save additional spaces
Partial Denormalization
A hybrid approach that balances performance and storage efficiency. Some dimension tables remain normalized, while others are denormalized.
Example:
CREATE TABLE fact_sales (
sale_id SERIAL PRIMARY KEY,
sale_date DATE NOT NULL, -- Instead of using dim_date, store directly
customer_id INT NOT NULL REFERENCES dim_customer(customer_id),
product_name TEXT NOT NULL, -- Store product_name directly
sales_amount DECIMAL(10,2) NOT NULL
);