In September 2024, Ed's VP (Fraud Tech+) assigned him to lead Fraud Defense Authoring, the second-most critical project in the Fraud organization after Claims Modernization.
Capital One uses ML for fraud detection, but because models take time to adapt to new patterns, business analysts proactively identify emerging fraud patterns and deploy early defenses as SQL queries.
The project involved migrating execution from Amazon EMR to AWS Glue and moving off Capital One-managed Airflow to a new orchestration framework.
This is a multi-year project that began in summer 2024, involving 10+ business analysts and a dozen engineers across multiple teams.
Major Issues Discovered:
Business analysts wrote SQL queries interactively in Databricks with no formal testing, relying only on manual inspection.
Execution relied on YAML files of 5,000+ lines that defined sources, queries, transformations, and actions, making them effectively unmaintainable.
Engineers proposed using Lambda functions for orchestration. Separately, while testing Glue, they found that allocating more DPUs made jobs slower, which made no sense.
When Ed reviewed the Spark UI, he discovered that only one executor was running because of a hardcoded QA environment setting, so the added capacity was never actually used.
Engineers admitted they had never used the Spark UI before, revealing a severe knowledge gap in big data processing.
Task:
Ensure a successful migration from EMR to Glue while addressing deeper architectural and process deficiencies.
Upskill engineers who were responsible for a big data project but lacked big data expertise.
Provide a sustainable authoring framework to replace the unmaintainable YAML-based approach.
Action:
Upskilled engineers by designing and leading training sessions on Spark UI debugging, Glue optimization, and performance best practices.
Reviewed the authoring workflow and introduced an incremental migration to DBT to replace the monolithic YAML files with modular SQL models, teaching engineers the benefits of modular query authoring.
Recommended using Glue Workflows instead of building a custom orchestration layer with Step Functions and Lambda functions, simplifying execution.
Proposed observability dashboards in place of an unrealistic custom UI, giving fraud defense authors better insight into execution.
Influenced leadership to rethink the entire fraud defense lifecycle (authoring, testing, orchestration, and monitoring), preventing future inefficiencies and improving maintainability.
Result:
Engineers gained confidence and expertise in Spark and Glue, leading to better debugging and performance improvements.
The DBT migration plan was approved, improving modularity and maintainability of fraud defenses.
The Glue Workflow implementation reduced complexity, removing the need for custom orchestration.
Observability dashboards replaced the UI approach, improving visibility without unnecessary engineering overhead.
Fraud defense execution became more reliable, allowing business analysts to deploy defenses faster and more accurately.
Team Challenge:
The team inherited a complex big data project without the relevant experience, which led to confusion and ineffective decision-making.
Engineers were unsure how to optimize Spark jobs, misunderstood Glue performance scaling, and lacked confidence in debugging failures.
They relied on trial and error rather than structured debugging, making progress slow and unpredictable.
Transformation Through Mentorship:
Exposing the knowledge gap early (e.g., the Spark UI revelation) helped Ed frame the problem as a learning opportunity rather than a failure.
Rather than fixing issues himself, Ed empowered the team by guiding them through real debugging sessions, not just theoretical training.
He broke improvements down into incremental milestones so engineers experienced small wins along the way, which built their confidence.
Over time, engineers became proactive: they started checking the Spark UI themselves, optimizing their queries, and debugging failures without outside help.
The shift from reactive (guessing at solutions) to proactive (structured debugging and analysis) was the defining measure of success for his mentorship efforts.
One key turning point was when a junior engineer who initially relied on trial-and-error debugging successfully diagnosed a complex query optimization issue in Glue by leveraging the Spark UI. This moment boosted team confidence and marked a major step in their transformation.
Lessons Learned:
Technical knowledge gaps can cripple a project, but mentorship can transform a struggling team into a high-functioning one.
Inspiration comes from empowerment: giving engineers the tools and knowledge to solve problems themselves rather than dictating solutions.
Breaking down learning into incremental wins is key to keeping a team motivated and engaged in a difficult transition.