DWH Optimization with Dataform
98.5% cost reduction and 8x faster update speed on a legacy Data Warehouse.
Context
This project covers the complete restructuring of a BigQuery Data Warehouse, migrating a legacy and costly dbt pipeline to a native and efficient solution with Google Dataform.
The Problem
The previous architecture, based on dbt and Airflow, was unsustainable, with high and unnecessary processing costs.
Many tables were no longer in use but still consumed resources.
The 40-minute update time for the entire DWH impacted business agility.
The Solution & My Contribution
- I led the project to redesign the Data Warehouse from scratch.
- I implemented the Medallion Architecture (Bronze, Silver, Gold) to ensure data governance and quality.
- I used JavaScript within Dataform to modularize and reuse complex business rules.
- I ensured a secure and validated migration with no data loss.
Diagrama de alto nível da arquitetura da solução do DWH Optimization with Dataform
Results & Impact
- 8x reduction in DWH update time (from 40 to 5 minutes).
- Drastically reduced processing costs due to the elimination of obsolete tables and optimized processes.
- Significant improvement in data governance and maintainability.
Technologies Used
Technical Deep Dive (Illustrative Code)
Native Modeling in BigQuery
By adopting Dataform, we eliminated the dependency on Airflow to orchestrate dbt, leveraging BigQuery's native and free integration. This simplified the stack and drastically accelerated the update time from 40 to less than 5 minutes.
Modularization with JavaScript
Business rules that were repeated across multiple models were encapsulated in JavaScript functions, imported directly into the SQLX files. This increased maintainability and reduced code duplication.