Below is a concise, corrected, and more polished set of notes on Slowly Changing Dimensions (SCD) for your Obsidian vault. It covers Kimball-style SCD types, clarifies the difference between mini-dimensions (Type 4 in classic Kimball) and the “separate historical table” variant, and provides example DDL.


Slowly Changing Dimensions (SCD)

Slowly Changing Dimensions handle and preserve changes in dimension (descriptive) data over time for historical analysis (OLAP). They ensure you can accurately report on both current and past states of key attributes (e.g., customer info, product descriptions, etc.).

Impact of Different Schemas

  1. Star Schema

    • Denormalized dimension tables simplify SCD implementation.
    • Common to implement row-versioning (Type 2) or extra columns (Type 3) directly in a single dimension table.
  2. Snowflake Schema

    • Normalized dimension tables can make SCD updates more complex.
    • Changes can cascade through multiple related dimension tables.
  3. Fully Denormalized Schema

    • Often overwrites data in place to keep things simple.
    • Not ideal if you need detailed historical tracking.

Types of Slowly Changing Dimensions

Below are the main SCD types as originally defined by Ralph Kimball, plus a note on the alternative “separate historical table” approach.


SCD Type 0: Fixed Dimension

  • No changes are made after the initial load.
  • The data is static and never updated, often used for strict regulatory or archival purposes.
CREATE TABLE FixedDimension (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(255),
    region VARCHAR(255)
    -- No updates permitted
);

SCD Type 1: Overwrite on Change

  • Overwrites old data with new.
  • No history is preserved.
  • Suitable if historical changes are irrelevant or misleading.
CREATE TABLE Customer_SCD1 (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(255),
    region VARCHAR(255)
    -- On change, simply UPDATE region = 'New Value'
);

SCD Type 2: Versioned Rows

  • Most common approach to retain full history of attribute changes.
  • Inserts a new row on every change, with start_date/end_date or is_current flags.
  • Allows exact historical reporting over time.
CREATE TABLE Customer_SCD2 (
    customer_id INT,
    customer_name VARCHAR(255),
    region VARCHAR(255),
    start_date DATE,
    end_date DATE,
    is_current BOOLEAN,
    PRIMARY KEY (customer_id, start_date)
);
 
/*
Example usage:
1) Mark the old record as closed:
   UPDATE Customer_SCD2
   SET end_date = '2025-01-01', is_current = FALSE
   WHERE customer_id = 1 AND is_current = TRUE;
 
2) Insert new record:
   INSERT INTO Customer_SCD2(customer_id, customer_name, region, start_date, end_date, is_current)
   VALUES (1, 'John Doe', 'California', '2025-01-02', NULL, TRUE);
*/

SCD Type 3: Add New Columns

  • Maintains current and one previous value in the same row.
  • Good for limited historical comparisons (e.g., “previous region” vs. “current region”).
  • Not suitable for tracking more than one or two changes historically.
CREATE TABLE Customer_SCD3 (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(255),
    current_region VARCHAR(255),
    previous_region VARCHAR(255)
);

SCD Type 4 (Kimball’s Classic Definition: Mini-Dimension)

  • Mini-dimension: Move frequently changing attributes into a separate, smaller dimension to reduce bloat in the main dimension.
  • For example, if “preferences” or “demographics” change often, store them in a separate mini-dimension; link via a foreign key in the main dimension or the fact table.
**Example (Mini-Dimension)**
-- Main dimension (stable attributes)
CREATE TABLE Customer_MainDim (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(255),
    birthdate DATE,
    -- link to mini-dimension
    mini_dim_id INT,
    FOREIGN KEY (mini_dim_id) REFERENCES Customer_MiniDim(mini_dim_id)
);
 
-- Mini-dimension (fast-changing attributes)
CREATE TABLE Customer_MiniDim (
    mini_dim_id INT PRIMARY KEY,
    income_bracket VARCHAR(50),
    marketing_segment VARCHAR(50),
    effective_date DATE,
    end_date DATE
);

Alternative “Type 4” Variant: Separate Historical Table

  • Sometimes called “Type 4” in other references:
  • Instead of storing all versions in one dimension, keep the main dimension table minimal (current data only) and put historical rows in a separate table if not needed for every query.
CREATE TABLE Customer_Current (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(255),
    region VARCHAR(255)
    -- Only stores the latest/current info
);
 
CREATE TABLE Customer_History (
    customer_id INT,
    customer_name VARCHAR(255),
    region VARCHAR(255),
    change_date DATE
    -- All historical records go here
);

SCD Type 5

  • Type 2 + Mini-Dimension.
  • Adds a “mini-dimension key” attribute to the Type 2 main dimension row.
  • Rarely used in simple scenarios; helpful when combining full history with a mini-dimension.

SCD Type 6: Hybrid (Types 1 + 2 + 3)

  • Combines Type 2 row versioning (full history) with a Type 3 column for “previous” (or sometimes “current”) value in the latest row.
  • You can see old vs. new in one row and still have a complete row history.
  • Typically has columns like current_<attribute>, previous_<attribute>, plus start_date, end_date, and is_current.
CREATE TABLE Customer_SCD6 (
    customer_id INT,
    customer_name VARCHAR(255),
    current_region VARCHAR(255),
    previous_region VARCHAR(255),
    start_date DATE,
    end_date DATE,
    is_current BOOLEAN,
    PRIMARY KEY (customer_id, start_date)
);
 
/*
Typical "change" flow:
1) Close the old row (Type 2 logic)
2) Overwrite the 'previous_region' in the new active row (Type 3 logic)
3) Always keep 'current_region' updated in the active row (Type 1 style)
*/

SCD Type 7

  • Less commonly cited, but in some references it’s Types 1, 2, and 3 combined on different sets of attributes.
  • For instance, some attributes might be purely overwritten (Type 1), while others require full row history (Type 2), and one or two require a “previous value” column (Type 3).

Example Fact Table and Queries

CREATE TABLE Sales (
    sale_id INT PRIMARY KEY,
    sale_date DATE,
    customer_id INT,
    amount DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES Customer_SCD2(customer_id)
    -- or Customer_SCD6, depending on which approach you use
);
  1. Current Sales by Region

    • Type 1 or Type 6 easily show the latest region without complicated joins.
    • Type 2 alone requires filtering for the current record (WHERE is_current = TRUE or end_date IS NULL).
  2. Transition Analysis (Old Region → New Region)

    • Type 3 or Type 6 can store previous_region on the same row for quick comparison.
    • Type 2 alone requires a self-join between consecutive versions.
  3. Full Historical Reporting

    • Type 2, Type 6 both have complete row-by-row versions.
    • Type 3 or Type 1 are insufficient if you need multiple past changes.

Quick Comparison

SCD TypePreserves Full History?Eases “Previous vs. Current”Schema ImpactUse Case
0NoN/AMinimalTruly static data (no changes allowed).
1NoOverwrites onlySimple (UPDATE in place)Attributes that aren’t needed historically.
2Yes (row versioning)Requires self-joinBigger dimension tableAny dimension needing complete historical snapshots.
3Limited (current + 1 prev)Yes, single row comparisonExtra columnsTrack only one or two historical states.
4Varies (mini-dim or separate table)VariesSplits out frequent changes or moves history to separate tableWhen you want to reduce dimension bloat or only occasionally need history.
5Yes, plus mini-dimComplexType 2 + mini-dimension keyAdvanced design for complex attributes.
6Yes (Type 2) + “previous”Yes, single row vs. historyAdditional columns + row versioningFull history + easy comparison. Ideal for many use cases.
7Yes, for selected attributesYes, for selected attributesVery flexible but complexDifferent SCD types for different columns.

Key Takeaways

  • SCD Type 2 is the go-to for full historical tracking.
  • SCD Type 6 (or 2 + 3) adds a “previous” attribute to the latest row, reducing the need for self-joins.
  • Mini-dimensions (Type 4 in Kimball) reduce bloat for frequently changing attributes.
  • Choose the approach that matches your business and analytical needs (simplicity vs. depth of history vs. frequent attribute changes).

Tip: In real-world data warehouses, you often mix and match. For instance, some columns in a dimension are Type 1 (e.g., “corrected” typos), others are Type 2 (full history), and maybe a single Type 3 column for a previous key attribute.