Understood. I’ll stop producing the deck.
Below is an extensively detailed, structured report capturing everything we discussed—facts, numbers, technical work, the narrative arc, and the “deep dive hooks” you can use verbally. I’m writing this as something you can hand to Claude Code (or use yourself) to generate a Marp deck.
I’ll also ground a couple details in the documents you shared: the Rippling interview guide/department screen expectations and Gang Wang’s background.
Project Deep Dive Report — Pinterest Cloud Cost Attribution Platform (Datamart)
0) Interview context (what Rippling is explicitly asking for)
Rippling’s Department Screen / Project Deep Dive is a 60-minute technical deep dive where you’re expected to (a) explain the system at a high level, then (b) go deep on components/decisions/tradeoffs, and (c) be able to discuss technical decisions in depth even if you didn’t personally make every single one. They recommend focusing on technical complexity over organizational complexity, choosing something recent, high-scale, and relevant, and having crisp boundaries on what you did vs others.
1) Audience: Gang Wang (why this story fits him)
Gang Wang is SVP, Finance Products at Rippling; his background includes building financial engines and data systems at Stripe (payment methods, GPTN chief architect / head of financial data), Intuit (core TurboTax engine for tens of millions), etc. He’s deeply technical and will care about data semantics, correctness, auditability-style thinking, modeling boundaries, and operational reliability.
This Pinterest story maps well because it’s effectively a financial infrastructure system: attributing large-scale spend across internal “customers,” consumed by executives, tied to budgeting, and judged on trust, timeliness, and operational integrity.
2) One-line framing (the thesis)
Pinterest’s cloud cost attribution platform—responsible for allocating ~$700M/year of cloud spend across multi-tenant infrastructure—had become business-critical, but its data engineering foundations were not strong enough to sustain trust, timeliness, or safe change. I introduced modern modeling discipline (dbt) company-wide, built the internal platform glue to adopt it inside Pinterest’s constraints, and drove an end-to-end reliability and velocity turnaround.
(You can soften/strengthen as needed, but that’s the core.)
3) Your role and scope (facts, no fluff)
- You joined Pinterest ~11 months ago as a Staff+ engineer in IDSE (Infrastructure Data Science & Engineering), which later split into a data engineering/webapp team and a separate data science team.
- The team owns the Datamart: a collection of pipelines that slice/dice AWS billing logs and join them with usage data from internal multi-tenant platforms:
  - Spark platform (MONARCH)
  - compute platform (Pincompute)
  - KV store (Xenon)
  - plus other “platforms,” including treating S3 as a multi-tenant platform
- Datamart supports 12 onboarded platforms and (your estimate) ~60 data pipelines.
- The system correlates budgets with actual money spent by team, and Superset reports are consumed at the C-level.
- Annual cloud computing costs around $700M.
4) System overview (how Datamart works)
4.1 Inputs and cadence
- AWS billing data: CUR ingested once or twice per day.
- Platform usage data: typically daily (one delivery/day from platform teams).
- Budgets: historically entered via Excel uploaded to S3, then converted/loaded (you described this as an ugly/unsafe workflow).
4.2 Multi-tenant cost model
- For each platform, costs are split:
  - some portion stays “in the belly” of the platform team
  - some portion is allocated to consuming teams
- This is driven by a chargeback model built by data science:
  - usage units + usage rates determine allocation.
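If you want something concrete to sketch on a whiteboard, here is a minimal illustration of the split described above. It assumes the simplest possible rule: each consuming team is charged usage units times a flat rate, and whatever is left stays with the platform team. The function name, team names, and numbers are all hypothetical; the real chargeback model built by data science is likely more involved.

```python
from decimal import Decimal

def allocate_platform_cost(platform_cost, usage_by_team, rate_per_unit):
    """Split a platform's cost into per-team charges plus a retained remainder.

    Assumption (not from the source): a single flat rate per usage unit.
    """
    charges = {team: units * rate_per_unit for team, units in usage_by_team.items()}
    allocated = sum(charges.values(), Decimal("0"))
    if allocated > platform_cost:
        # Charging out more than the platform actually cost signals a bad rate.
        raise ValueError("allocated cost exceeds platform cost; check the rate")
    return charges, platform_cost - allocated

# Hypothetical example: a $1,000 platform bill, two consuming teams.
charges, retained = allocate_platform_cost(
    Decimal("1000.00"),
    {"team-a": Decimal("300"), "team-b": Decimal("100")},
    Decimal("2.00"),
)
print(charges, retained)
```

The useful property to call out verbally is conservation: the per-team charges plus the retained portion always add back up to the platform's cost.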
4.3 Pipeline structure (the common shape across platforms)
You described a consistent 4–5 layer pipeline pattern:
- Staging: filtered AWS CUR / raw data subset per platform.
- Processing: join staging with usage data from platform teams.
- Fact: enrich with additional dimensions for grouping and analysis.
- Reporting: first-level aggregates for dashboards.
- Budgeting: join aggregates with budgets.
Implementation details:
- SQL-heavy approach: large DDL statements; giant SQL with many CTEs; INSERT OVERWRITE PARTITION patterns (Hive-style partition management).
- Dependencies built via sensors on Hive partitions and external data, with pipelines depending on each other to establish “platform health.”
4.4 Data lifecycle / “delivery ID”
- The system supports backfilling and re-instantiation (“automatically re-instatement date”).
- You created (or at least highlighted and leaned on) a synthetic data delivery ID each time new data arrived from any source—used to reason about versioning / reproducibility of derived outputs across multiple upstream deliveries.
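One way to make the delivery ID idea tangible is the sketch below. It is an assumption about how such an ID could be derived, not a description of the actual implementation: it hashes the set of upstream deliveries an output was built from, so the same inputs always produce the same ID and any new upstream delivery produces a new one. The source names and version strings are hypothetical.

```python
import hashlib

def delivery_id(upstream_deliveries):
    """Derive a deterministic ID from the upstream deliveries behind an output.

    upstream_deliveries maps a source name to the version/timestamp of the
    delivery that was used. Sorting makes the ID independent of dict order.
    """
    canonical = "|".join(
        f"{source}={version}" for source, version in sorted(upstream_deliveries.items())
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical sources: a second CUR delivery on the same day changes the ID.
morning = delivery_id({"aws_cur": "2024-05-01T06", "pincompute_usage": "2024-05-01"})
evening = delivery_id({"aws_cur": "2024-05-01T18", "pincompute_usage": "2024-05-01"})
print(morning, evening, morning != evening)
```

Stamping derived rows with an ID like this lets you answer "which upstream deliveries produced this number?" when a backfill changes a report.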
5) What you found on arrival (system health + why it mattered)
You described the system as fundamentally failing at its job:
5.1 Trust failure
- Stakeholders did not trust the data.
- At least one instance: spend reports were presented to the CFO and were wrong.
5.2 Operational failure
- ~200 PagerDuty pages/month (later corrected to “pages,” not “incidents”).
- Frequent SLA breaches: about one major incident per month where the 2-day SLA was missed by about 2× (multi-day delay).
5.3 Engineering failure modes (structural, not tactical)
- Transformations were primarily done in templated SQL, huge monolithic queries (30 CTEs / 500 lines, etc.).
- Testing was manual/integration-only, no modern best practices.
- A “custom validation framework” existed, but:
  - thresholds were poorly understood
  - it generated noise
  - it wasn’t easy to map failures to the right owner/team.
- Pinterest Airflow fork (Spinner) was used, but the team had:
  - “tens of custom airflow components” to submit queries
  - lots of copy/paste operators
  - duplicated logic across platforms.
- Alerting/paging:
  - based on callbacks that sent emails
  - relied on PagerDuty email integration
  - regex-based parsing to infer which platform/pipeline succeeded/failed.
- Dependency structure:
  - heavy reliance on sensors and partitions
  - fragile coupling made it hard to isolate root causes quickly.
Bottom line: you believed the problems were fundamentally data engineering (modeling/testing/contracts/operability), not something fixable with small tactical patches.
6) Your intervention: the actual work (what you did)
This is the meat. I’m listing it as a set of “workstreams” that together produced the turnaround.
6.1 Company-wide introduction of dbt (not just org-wide)
- Key point you emphasized: dbt was not used anywhere in Pinterest at the time—not “in the org,” but in the entire company.
- You recognized the core issue: transformations were all SQL, but without dbt’s modeling discipline and testability.
- You began introducing dbt to the team and rewriting parts of the Datamart as dbt models.
Why this mattered
- dbt gave the team a real, structured modeling workflow: modular transformations, explicit dependencies, metadata, tests, and a language for “model contracts.”
6.2 Spinner constraints and why off-the-shelf dbt orchestration didn’t work
- Pinterest runs into capacity constraints in AWS data centers and relies on a heavily optimized orchestration environment.
- Spinner (Pinterest’s Airflow fork) includes significant custom scheduling/queueing optimizations:
  - tiered workflows that can be delayed if critical workflows run long
  - queuing system / capacity management behavior
- Because of this, you could not simply use “standard” dbt orchestration tooling like Astronomer Cosmos.
6.3 Partnership with Data Platform team: dbt-on-Spinner pilot
- You did not do this as a solo side project; you worked with the Data Platform team, and at your urging they created a pilot program to run dbt pipelines on Spinner.
- They built (or agreed to build) a custom DAG renderer that:
- parses the dbt manifest
- produces an Airflow/Spinner DAG automatically.
Your direct technical contribution
- Because everything is in a monorepo and you had specific needs, you extended the DAG renderer to:
  - incorporate dbt model metadata for additional platform features your team needed
  - specifically mentioned: supporting Iceberg WAP.
- Important nuance: this renderer approach means:
  - dbt adapter is not the execution mechanism for running transformations inside Spinner
  - instead, the system uses Pinterest’s existing operators / job submission patterns
  - dbt tests are converted/mapped to Pinterest’s custom data quality operator.
This is a key “platform glue” contribution: bringing a standard OSS workflow into a proprietary execution environment.
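To show the core of the manifest-to-DAG idea, here is a minimal sketch. It is not the Data Platform team's renderer: it only reads the standard `nodes`, `resource_type`, and `depends_on.nodes` fields of a dbt manifest and works out a valid execution order for the models. The inline manifest, project name, and model names are hypothetical, and a real renderer would emit Spinner operators instead of a list.

```python
from graphlib import TopologicalSorter

def model_dependencies(manifest):
    """Map each dbt model's unique_id to the set of models it depends on."""
    nodes = manifest["nodes"]
    models = {uid for uid, node in nodes.items() if node["resource_type"] == "model"}
    # Drop non-model parents (sources, seeds); they are not tasks in this sketch.
    return {
        uid: {dep for dep in nodes[uid]["depends_on"]["nodes"] if dep in models}
        for uid in models
    }

def execution_order(manifest):
    """Return the models in an order that respects their dependencies."""
    return list(TopologicalSorter(model_dependencies(manifest)).static_order())

# Hypothetical, heavily trimmed manifest with the staging/processing/fact shape.
manifest = {
    "nodes": {
        "model.datamart.stg_cur": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.datamart.aws_cur"]},
        },
        "model.datamart.processing": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.datamart.stg_cur"]},
        },
        "model.datamart.fact_usage": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.datamart.processing"]},
        },
        "test.datamart.not_null_fact_usage_cost": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.datamart.fact_usage"]},
        },
    }
}
print(execution_order(manifest))
```

In a real run you would load the file with `json.load` from dbt's `target/manifest.json`; test nodes would then be mapped to the data quality operator rather than ignored.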
6.4 Developer velocity: Presto adapter wrapper for real-environment iteration
- Pinterest has Presto connected to the same Iceberg catalog as Moka (your Spark execution environment).
- Developers authenticate via an Envoy sidecar on their machines.
- You created a Pinterest Presto adapter that wraps the official dbt-presto adapter.
- This enabled:
- defining views
- certain one-off materializations
- particularly for workflows that happen rarely, e.g. annual budget planning.
6.5 Developer velocity: local DuckDB environment from sampled production data
You said this was a game changer.
- You wrote a script to create a local DuckDB database by sampling production data.
- With this environment, engineers could:
  - run dbt models locally
  - run dbt tests locally
  - filter by dbt tags to run the right subset
- Net effect: two orders of magnitude improvement in developer speed (≈ 100× faster feedback loops).
This is a high-signal engineering contribution: changed the “economics” of making changes safely.
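You didn't describe how the sampling script picks rows, so treat the following as one plausible technique rather than what you built. It is pure Python (no DuckDB) and illustrates key-based deterministic sampling: a row is kept when the hash of its join key falls into the sampled bucket range, so every table sampled on the same key keeps the same entities and joins still line up locally. The table contents and field names are hypothetical.

```python
import hashlib

def keep_key(key, sample_percent):
    """Decide deterministically whether a join key belongs to the sample."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < sample_percent

def sample_rows(rows, key_field, sample_percent):
    """Keep only rows whose key hashes into the sampled buckets."""
    return [row for row in rows if keep_key(row[key_field], sample_percent)]

# Hypothetical tables that share the same join key.
usage = [{"team": f"team-{i}", "units": i} for i in range(200)]
budgets = [{"team": f"team-{i}", "budget": 1000 + i} for i in range(200)]

usage_sample = sample_rows(usage, "team", 10)
budget_sample = sample_rows(budgets, "team", 10)
print(len(usage_sample), len(budget_sample))
```

This is a ready answer to "how do you sample representatively?": independent random sampling per table would break most joins, while sampling by key preserves them.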
6.6 Operational reliability: replacing email-based PagerDuty paging with API-driven routing
The old state:
- email-based callbacks
- PD email integration
- regex parsing headers to infer pipeline/platform status
- fragile and hard to use to route ownership.
Your change:
- You worked with a junior engineer to build a custom Airflow callback handler that uses the PagerDuty API directly.
- Constraints:
  - Pinterest runs a custom Airflow version < 2.6 (can’t use the modern native integration/operators)
  - needed to use the low-level PD client (PagerDuty hook)
- Implementation details you called out:
  - incident keys
  - PagerDuty service orchestration
- Outcome:
  - clear routing of failures to the right team:
    - if a failure is in upstream platform data, page/assign the owning platform team
    - if a failure is in your attribution logic or your tests, it routes to your team
  - better classification because dbt models/tests created boundaries and made it easier to identify where the break occurred.
- This is crucial for “trust + operability”: paging has to map to ownership.
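A minimal sketch of the routing idea, under stated assumptions: it only builds a PagerDuty Events API v2 payload (nothing is sent), it picks the routing key by failure class, and it uses a deterministic `dedup_key`, which is the v2 field that plays the role of the incident key. The failure class names, routing keys, and key format are hypothetical; the actual handler went through the PagerDuty hook inside Spinner.

```python
def build_pagerduty_event(routing_keys, platform, pipeline, run_date, failure_class, summary):
    """Build an Events API v2 trigger payload routed to the owning team."""
    # Upstream data problems page the platform team; everything else pages Datamart.
    owner = "platform" if failure_class == "upstream_data" else "datamart"
    routing_key = routing_keys[platform] if owner == "platform" else routing_keys["datamart"]
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Same platform/pipeline/date collapses into one incident instead of many pages.
        "dedup_key": f"{platform}:{pipeline}:{run_date}",
        "payload": {
            "summary": summary,
            "source": f"{platform}/{pipeline}",
            "severity": "error",
            "custom_details": {"failure_class": failure_class, "owner": owner},
        },
    }

# Hypothetical routing keys and failures.
keys = {"datamart": "DATAMART_KEY", "pincompute": "PINCOMPUTE_KEY"}
upstream = build_pagerduty_event(
    keys, "pincompute", "processing", "2024-05-01", "upstream_data", "usage delivery missing"
)
internal = build_pagerduty_event(
    keys, "pincompute", "fact", "2024-05-01", "dbt_test", "not_null test failed"
)
print(upstream["routing_key"], internal["routing_key"])
```

The deterministic key is also a good answer to the noise question: retries of the same failed run update one incident rather than paging again.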
6.7 Data model maturity: from “no logical model” to real logical ERD + principled modeling
You explicitly said:
- When you asked for the logical data model, they couldn’t provide one.
- They responded with autogenerated physical ERDs and “weird” physical workarounds (multiple date materializations purely to support partitioning/hive efficiency).
Your intervention:
- You coached the team on what a logical model is.
- You designed a logical ERD to guide the physical model.
The influential paper
You wrote a paper titled something like “A New Logical Model for the Datamart” (your phrasing). Key contributions of that paper:
- Introduced abstraction of cost centers because the system was misusing a service/catalog identifier as a proxy for cost center (many rows had a value that wasn’t actually a valid service/catalog entity).
- Introduced the idea of making the system independent from the aggregation strategy/level/hierarchy.
- Introduced a split into two systems:
- system that performs the first layers (staging + attribution joining + tagging)
- separate system for aggregation + budgets that is canonical and decoupled.
Canonical “usage records” fact
You described the canonical fact table:
- usage_amount
- usage_unit
- usage_cost
- link to either:
  - lower-level platform attribution
  - or AWS CUR for “root platforms”
- plus metadata/dimensions for grouping and analysis.
This is extremely relevant to a Finance Data Platform interview, because it’s basically “ledger-adjacent” thinking (even if not formal ledgering).
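If Gang asks about the grain, it can help to have the record shape in your head as a type. The sketch below is an assumption built from the fields listed above: `usage_amount`, `usage_unit`, and `usage_cost` come from your description, while `platform`, `cost_center`, `parent_attribution_id`, and `cur_line_item_id` are hypothetical names for the dimensions and the "either/or" link. The one rule it enforces is that a record links to exactly one origin.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

@dataclass(frozen=True)
class UsageRecord:
    platform: str
    cost_center: str
    usage_amount: Decimal
    usage_unit: str
    usage_cost: Decimal
    # Exactly one of the two links below must be set (hypothetical field names).
    parent_attribution_id: Optional[str] = None  # lower-level platform attribution
    cur_line_item_id: Optional[str] = None       # AWS CUR, for root platforms

    def __post_init__(self):
        has_parent = self.parent_attribution_id is not None
        has_cur = self.cur_line_item_id is not None
        if has_parent == has_cur:
            raise ValueError(
                "exactly one of parent_attribution_id or cur_line_item_id must be set"
            )

# Hypothetical root-platform record linked straight to a CUR line item.
root = UsageRecord(
    platform="s3",
    cost_center="cc-123",
    usage_amount=Decimal("42"),
    usage_unit="TB-month",
    usage_cost=Decimal("966.00"),
    cur_line_item_id="cur-line-1",
)
print(root)
```

Making the link exclusive is what lets you trace any dollar back through platform layers to a CUR line without counting it twice.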
6.8 Reporting layer changes (Superset dashboards)
- You replaced existing Superset dashboards with new dashboards built on the improved canonical model.
- You also noted that previously dashboards were curated by analysts via point-and-click; you believed that was risky for critical reporting and moved logic “into the governed layer.”
(For interview tone: avoid framing as “putting analysts out of job”; describe it as “moving business logic out of BI configuration into governed models.”)
6.9 Additional responsibilities (web app / budget tool)
You mentioned a parallel scope:
- Team also managed a web app called “infra budget tool” for budget transfers.
- You were asked to develop budget planning capability for 2026 planning.
- You were handed a contractor who couldn’t code in Python.
- Codebase issues: no unit tests, no pyright, etc. You suggested mentioning this briefly, but keeping the focus on data engineering.
7) Quantitative outcomes (the numbers you want in the story)
You gave a lot of key metrics; consolidating:
7.1 Reliability / trust
- PagerDuty pages: ~200/month → ~5/month
- SLA: previously frequent breaches; roughly one “large” incident/month where SLA was missed by ~2× (multi-day delay). After migration, no SLA breaches (you said “Never had an SLA breach” post-migration).
7.2 Delivery velocity
- Historical migration/onboarding: around 1 year per platform (in the old implementation).
- Your migration: 6 platforms in ~6 months with 2 people, including setup work.
- New onboarding: now onboarding a new platform takes ~1 month, and you’re adding a new platform not in the old implementation.
7.3 Scale / breadth
- 12 platforms total; 6 migrated (50%).
- Each platform touches ~10–12 teams (upstream/partner teams) in some way.
- Spinner scale: “thousands of jobs/day” and about 150 active engineers.
7.4 Developer productivity
- Local DuckDB workflow improved development speed by about two orders of magnitude (≈ 100×).
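For quick soundbites, the arithmetic behind these numbers can be worked out as below. Every input is one of your approximate figures from this section, so the outputs are only as precise as those estimates. The onboarding comparison is elapsed time only, since the headcount on the old onboardings wasn't stated.

```python
# Approximate pages/month before and after (section 7.1).
pages_before, pages_after = 200, 5
page_reduction_pct = (pages_before - pages_after) * 100 / pages_before
page_reduction_factor = pages_before / pages_after

# Elapsed onboarding time per platform, in months (section 7.2).
old_months_per_platform = 12   # "around 1 year per platform"
new_months_per_platform = 1    # "~1 month" for a new platform
onboarding_speedup = old_months_per_platform / new_months_per_platform

# Migration effort: 6 platforms in ~6 months with 2 people.
platforms_migrated, months_elapsed, engineers = 6, 6, 2
person_months_per_platform = months_elapsed * engineers / platforms_migrated

print(f"pages: {page_reduction_pct:.1f}% fewer ({page_reduction_factor:.0f}x reduction)")
print(f"onboarding: {onboarding_speedup:.0f}x faster in elapsed time")
print(f"migration: {person_months_per_platform:.0f} person-months per platform")
```

That gives you roughly 97.5% fewer pages (a 40× reduction), about 12× faster onboarding, and about 2 person-months per migrated platform, setup work included.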
8) Organizational scaling (what you want present, but in the right tone)
You said you didn’t want “organizational influence” as a cringy slide title—but you also explicitly stated these as facts:
- Infra governance space is migrating broader data pipeline development to dbt.
- You are the dbt champion in the org (and effectively at the company level given it didn’t exist before).
- Your dbt-on-Spinner approach is a reusable pattern beyond your team.
For SVP tone: treat this as “enablement + standardization” rather than “influence.”
9) Critical technical deep-dive hooks (what an SVP will probe)
These are the likely probing directions, phrased as the kinds of questions you should be ready to answer. This is where your details matter.
9.1 Data semantics / correctness
- What exactly is the canonical grain of your usage record?
- How do you avoid double attribution across platform layers?
- How do you handle partial allocation (platform team keeps some costs, consumers get some)?
- How do you handle late or missing upstream usage data?
- How do you ensure backfills don’t corrupt downstream aggregates?
- What’s the role of the “data delivery ID”? How is it used?
(You said reconciliation wasn’t useful; the safe framing is: you rely on deterministic recomputation from canonical facts + invariant checks/quality tests rather than a separate “ledger reconciliation subsystem.”)
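To make "invariant checks" concrete, here is a minimal sketch of one such check, assuming a cost-conservation rule: for every platform, the attributed costs (including the retained portion) should add back up to the platform's input cost within a small tolerance. The function name, tolerance, and sample numbers are hypothetical; in practice this would be expressed as a dbt test over the canonical fact.

```python
from collections import defaultdict
from decimal import Decimal

def conservation_violations(input_cost_by_platform, attributed_rows, tolerance=Decimal("0.01")):
    """Return platforms whose attributed total drifts from their input cost.

    attributed_rows is an iterable of (platform, usage_cost) pairs.
    The result maps platform -> (attributed total - input cost).
    """
    totals = defaultdict(lambda: Decimal("0"))
    for platform, cost in attributed_rows:
        totals[platform] += cost
    violations = {}
    for platform, expected in input_cost_by_platform.items():
        diff = totals[platform] - expected
        if abs(diff) > tolerance:
            violations[platform] = diff
    return violations

# Hypothetical data: xenon balances, pincompute is over-attributed by $50.
inputs = {"xenon": Decimal("1000.00"), "pincompute": Decimal("500.00")}
rows = [
    ("xenon", Decimal("600.00")),
    ("xenon", Decimal("400.00")),
    ("pincompute", Decimal("550.00")),
]
print(conservation_violations(inputs, rows))
```

A positive difference points at double attribution; a negative one points at dropped or late usage data, which covers two of the questions above with one check.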
9.2 Modeling workflow
- How did you structure dbt models (staging/processing/fact/marts)? What conventions?
- How did you pick tests? What’s the taxonomy (uniqueness, not null, referential, freshness, threshold tests)?
- How did you map dbt tests to Pinterest’s DQ operator?
- How do tags map to subsets of pipelines (for local + CI)?
9.3 Orchestration integration
- Why couldn’t you use Cosmos? What were the precise constraints?
- How does the dbt manifest translate into a Spinner DAG?
- How do you handle dependencies, retries, partial failures?
- How do tiered workflows affect dbt graph execution?
9.4 Iceberg / WAP
- What does WAP require in your environment?
- How did you integrate WAP semantics into the generated DAG?
- What does “publish” vs “stage” mean for your tables?
9.5 Developer velocity and safe change
- DuckDB sampling: how do you sample representatively?
- How do you ensure local tests reflect production pitfalls?
- What does the local loop look like end-to-end (dbt run/test subsets, tags, fixtures)?
9.6 Paging and ownership routing
- What’s the routing logic? How do you classify upstream vs internal?
- How do incident keys map to platform/pipeline?
- How do you avoid noisy paging? (test tuning, classification, ownership)
9.7 Forward path (StarRocks)
- Why StarRocks for gold aggregates?
- How do you plan to materialize aggregates while keeping Iceberg canonical?
- External tables strategy + dbt pipeline with StarRocks provider.
10) Clean “what you did vs what others did” (for credibility)
This is the crisp ownership split, as described:
You
- Identified dbt as the missing modeling discipline and introduced it company-wide.
- Trained/upskilled the team in dbt + modeling/testing practices.
- Extended the dbt→Spinner DAG renderer (monorepo integration, metadata features, Iceberg WAP support).
- Built the Presto adapter wrapper for dev and one-off workflows.
- Built the DuckDB local sampled environment + scripts + tagging workflows.
- Designed the logical ERD and authored the influential “new logical model” paper (cost center abstraction; attribution vs aggregation separation).
- Drove operational improvements: PagerDuty API callback handler (with junior engineer), incident routing, service orchestration.
- Replaced dashboards to align with governed models.
Data Platform team (partnership)
- Partnered on the dbt-on-Spinner pilot and the initial custom DAG renderer approach.
Others (existing team)
- Prior system existed; contractor + analysts performed ongoing operations/dashboard curation before you moved logic into governed layers.
11) Tone and phrasing recommendations (since you explicitly care)
Keep the story extremely technical and mature:
- “The system evolved organically and lacked modular boundaries.”
- “We lacked a repeatable modeling/testing workflow.”
- “Failures were difficult to localize; paging didn’t map to ownership.”
- “We moved logic into governed models and made changes safe.”
You can still be blunt, but blunt in a senior way:
- “Exec reports were unreliable. That’s unacceptable for a system allocating $700M/year.”