🧰 1. Data Engineer Associate Certification (July 25, 2025 version)
Exam domains & weights are based on the updated guide published for exams taken on or after July 25, 2025.
Domain 1: Databricks Intelligence Platform (≈10%)
- Understand Databricks architecture (control plane vs compute plane, formerly called the data plane)
- Workspace components: notebooks, clusters, Repos (now called Git folders), magic commands
- Git integration via Repos & version control
- Compute types: serverless compute vs classic clusters (all-purpose/interactive and job clusters), and how to choose between them
- Platform UI: query optimization features and the performance advantages of each compute option
Hands-on: Create and manage Repos, launch clusters (including serverless), explore the UI features.
Domain 2: Development & Ingestion (≈30%)
- Data ingestion using Spark SQL and PySpark
- COPY INTO, Auto Loader, schema inference, and handling semi-structured and complex data (JSON strings, structs, arrays)
- SQL DML (INSERT, INSERT OVERWRITE, and MERGE for upserts), view creation
- User-defined functions (UDFs) in SQL and PySpark
- Databricks Connect to develop locally while executing on remote clusters
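Because MERGE is the usual way to upsert in Delta Lake, it helps to be precise about what an upsert does. The sketch below models only the semantics in plain Python. It is not Spark or Delta code, and `merge_upsert` and the sample rows are hypothetical. It also assumes unique keys per batch: real Delta MERGE fails when several source rows match the same target row, and this model is stricter, rejecting any duplicate source key.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_upsert(target: List[Row], updates: List[Row], key: str) -> List[Row]:
    """Plain-Python model of a Delta MERGE used as an upsert (hypothetical helper).

    Roughly mirrors:
        MERGE INTO target t USING updates u ON t.<key> = u.<key>
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """
    # Stricter than Delta: reject any duplicate key in the source batch.
    source_keys = [row[key] for row in updates]
    if len(source_keys) != len(set(source_keys)):
        raise ValueError("source contains duplicate keys")

    # Index a copy of the target by key; dicts keep insertion order,
    # so matched rows stay in their original position.
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        # WHEN MATCHED -> replace the row; WHEN NOT MATCHED -> append it.
        merged[row[key]] = dict(row)
    return list(merged.values())


customers = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]
changes = [{"id": 2, "city": "Quito"}, {"id": 3, "city": "Pune"}]
print(merge_upsert(customers, changes, key="id"))
# [{'id': 1, 'city': 'Oslo'}, {'id': 2, 'city': 'Quito'}, {'id': 3, 'city': 'Pune'}]
```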
Hands-on: Load JSON/XML and CSV into Delta using COPY INTO and Auto Loader; write UDFs; run local code via Databricks Connect.
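COPY INTO is documented as idempotent: re-running it skips files that were already loaded from the source location. The toy class below models only that bookkeeping in plain Python. `CopyIntoModel` is hypothetical, and file contents are given as pre-parsed records. The `force` flag loosely imitates `COPY_OPTIONS ('force' = 'true')`. Auto Loader gets a similar effect by tracking discovered files in its checkpoint location.

```python
from typing import Dict, List, Set


class CopyIntoModel:
    """Toy model of COPY INTO's file-level idempotency (not a Databricks API)."""

    def __init__(self) -> None:
        self.loaded_files: Set[str] = set()  # stands in for COPY INTO's record of loaded files
        self.rows: List[dict] = []

    def copy_into(self, source: Dict[str, List[dict]], force: bool = False) -> int:
        """Load files from `source` (file name -> records); return how many were loaded.

        Files loaded by an earlier call are skipped unless force=True.
        """
        to_load = sorted(name for name in source if force or name not in self.loaded_files)
        for name in to_load:
            self.rows.extend(source[name])
            self.loaded_files.add(name)
        return len(to_load)


landing = {"day1.json": [{"id": 1}], "day2.json": [{"id": 2}]}
table = CopyIntoModel()
print(table.copy_into(landing))  # 2: both files are new
print(table.copy_into(landing))  # 0: re-running is a no-op
landing["day3.json"] = [{"id": 3}]
print(table.copy_into(landing))  # 1: only the new file is picked up
```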
Domain 3: Data Processing & Transformations (≈31%)
- Multi-hop ETL architecture: Bronze → Silver → Gold layers
- Delta Lake internals: ACID transactions, schema evolution, time travel, versioning
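- Table maintenance and copying: VACUUM, OPTIMIZE, Z-ordering (ZORDER BY), deep and shallow clones
- Change data capture (CDC) processing (e.g., MERGE or DLT's APPLY CHANGES INTO) and incremental file loading with COPY INTO
- Declarative pipeline building via Delta Live Tables (DLT), now Lakeflow Declarative Pipelines: batch (LIVE) vs incremental (STREAM) datasets, error handling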
- Managed vs external tables; DDL & DML operations in Delta
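The following minimal plain-Python sketch makes the Bronze → Silver → Gold flow concrete for a hypothetical orders feed. The field names (`order_id`, `customer`, `amount`) and the cleaning rules are assumptions. In a DLT pipeline, each function would roughly correspond to a dataset defined with `@dlt.table` and would operate on DataFrames rather than lists.

```python
import json
from collections import defaultdict
from typing import Dict, List


def to_bronze(raw_lines: List[str]) -> List[dict]:
    """Bronze: keep every raw record untouched, plus ingestion metadata."""
    return [{"raw": line, "line_no": i} for i, line in enumerate(raw_lines)]


def to_silver(bronze: List[dict]) -> List[dict]:
    """Silver: parse, validate, and deduplicate on order_id."""
    silver, seen = [], set()
    for row in bronze:
        try:
            rec = json.loads(row["raw"])
            order_id, amount = rec["order_id"], float(rec["amount"])
        except (KeyError, TypeError, ValueError):  # json.JSONDecodeError is a ValueError
            continue  # a production pipeline would quarantine these rows instead
        if order_id in seen:
            continue  # drop duplicate orders
        seen.add(order_id)
        silver.append({"order_id": order_id, "customer": rec.get("customer", "unknown"), "amount": amount})
    return silver


def to_gold(silver: List[dict]) -> Dict[str, float]:
    """Gold: business-level aggregate, revenue per customer."""
    revenue: Dict[str, float] = defaultdict(float)
    for row in silver:
        revenue[row["customer"]] += row["amount"]
    return dict(revenue)


raw = [
    '{"order_id": 1, "customer": "acme", "amount": 10}',
    "not json",
    '{"order_id": 1, "customer": "acme", "amount": 10}',  # duplicate
    '{"order_id": 2, "customer": "acme", "amount": 5.5}',
    '{"order_id": 3, "customer": "zen"}',  # missing amount
]
print(to_gold(to_silver(to_bronze(raw))))  # {'acme': 15.5}
```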
Hands-on: Build a full DLT pipeline; practice MERGE, OPTIMIZE, time travel; partition and Z-order tables.
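Time travel and VACUUM interact in a way that is easy to miss. Once VACUUM deletes old data files, queries against the versions that needed those files fail. The class below is a simplified in-memory model, not the Delta API, and `VersionedTable` is hypothetical. Its `vacuum(keep_last=...)` counts versions. Real VACUUM instead uses a time-based retention threshold (7 days by default) and removes unreferenced data files rather than whole versions.

```python
from typing import List, Optional


class VersionedTable:
    """Toy model of Delta table versions, time travel, and VACUUM (not the Delta API)."""

    def __init__(self) -> None:
        self._versions: List[List[dict]] = [[]]  # version 0: the empty table after CREATE TABLE
        self._oldest_readable = 0

    @property
    def version(self) -> int:
        return len(self._versions) - 1

    def write(self, rows: List[dict], mode: str = "append") -> int:
        """Commit a new version (append or overwrite) and return its number."""
        if mode not in ("append", "overwrite"):
            raise ValueError(f"unsupported mode {mode!r}")
        current = self._versions[-1]
        snapshot = list(rows) if mode == "overwrite" else current + list(rows)
        self._versions.append(snapshot)
        return self.version

    def read(self, version_as_of: Optional[int] = None) -> List[dict]:
        """Like SELECT * FROM t VERSION AS OF n; defaults to the latest version."""
        v = self.version if version_as_of is None else version_as_of
        if not self._oldest_readable <= v <= self.version:
            raise ValueError(f"version {v} is not available")
        return list(self._versions[v])

    def vacuum(self, keep_last: int) -> None:
        """Simplified VACUUM: only the newest `keep_last` versions stay readable."""
        if keep_last < 1:
            raise ValueError("keep_last must be at least 1")
        self._oldest_readable = max(self._oldest_readable, self.version - keep_last + 1)


t = VersionedTable()
t.write([{"id": 1}])                    # version 1
t.write([{"id": 2}])                    # version 2
t.write([{"id": 9}], mode="overwrite")  # version 3
print(t.read())                  # [{'id': 9}]
print(t.read(version_as_of=2))   # [{'id': 1}, {'id': 2}]
t.vacuum(keep_last=1)            # reading version 2 now raises ValueError
```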
Domain 4: Productionizing Data Pipelines (≈18%)
- Databricks Workflows & Jobs: multi-task DAGs, task dependencies, parameterization
- Scheduling with cron expressions (Databricks uses Quartz cron syntax), retries, alerts, and notifications
- CI/CD integration via Repos and Databricks Asset Bundles (DABs) deployment workflows
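A multi-task job is a DAG. Each task runs after all of its upstream tasks succeed, and a failing task can be retried a set number of times. The runner below is a hypothetical plain-Python sketch of that behavior using the standard library's `graphlib`. Its parameter names echo the `depends_on` and `max_retries` task settings in Databricks Jobs, but nothing here calls Databricks. In a real job run, tasks downstream of a failure are not run while independent branches can continue. This sketch is simpler and stops at the first task that exhausts its retries.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_job(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
    max_retries: int = 0,
) -> List[str]:
    """Run tasks in dependency order, retrying a failing task up to max_retries times.

    Returns task names in completion order. Raises RuntimeError when a task still
    fails after its retries, and graphlib.CycleError if the dependencies form a cycle.
    """
    # Every task is a node, even if it has no upstream dependencies.
    graph = {name: depends_on.get(name, set()) for name in tasks}
    unknown = set().union(*graph.values()) - tasks.keys()
    if unknown:
        raise ValueError(f"unknown upstream tasks: {sorted(unknown)}")

    completed: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break  # success: stop retrying
            except Exception as exc:
                if attempt == max_retries:
                    raise RuntimeError(f"task {name!r} failed after {attempt + 1} attempt(s)") from exc
        completed.append(name)
    return completed


attempts = {"transform": 0}


def flaky_transform() -> None:
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise IOError("transient failure")  # first attempt fails, the retry succeeds


order = run_job(
    {"ingest": lambda: None, "transform": flaky_transform, "publish": lambda: None},
    {"transform": {"ingest"}, "publish": {"transform"}},
    max_retries=1,
)
print(order)  # ['ingest', 'transform', 'publish']
```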
Hands-on: Orchestrate a multi-step job, configure retries and alerts, deploy a pipeline via Asset Bundles.
Domain 5: Data Governance & Quality (≈11%)
- Unity Catalog components: catalogs, schemas, tables, privileges
- Role-based access control: grants, service principals, SCIM
- Secure cluster configurations (access modes), object-level access controls, metadata management
- Data quality concepts: expectations, constraints, validation rules
- Delta Sharing for external data collaboration across organizations
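Unity Catalog access follows two rules. First, reading a table requires USE CATALOG on its catalog, USE SCHEMA on its schema, and SELECT on the table. Second, privileges granted on a catalog or schema are inherited by the objects inside it. The functions below model just those two rules in plain Python. `has_privilege`, `can_select`, the `main.sales` names, and the `analysts` principal are hypothetical, and ownership, group membership, and admin roles are ignored.

```python
from typing import Dict, Set, Tuple

# (securable full name, principal) -> privileges granted directly on that securable
Grants = Dict[Tuple[str, str], Set[str]]


def has_privilege(grants: Grants, securable: str, principal: str, privilege: str) -> bool:
    """True if `privilege` is granted on `securable` or inherited from a parent object."""
    parts = securable.split(".")
    # Check catalog, then catalog.schema, then catalog.schema.table.
    for depth in range(1, len(parts) + 1):
        parent = ".".join(parts[:depth])
        if privilege in grants.get((parent, principal), set()):
            return True
    return False


def can_select(grants: Grants, table: str, principal: str) -> bool:
    """Reading catalog.schema.table needs USE CATALOG, USE SCHEMA, and SELECT."""
    catalog, schema, _ = table.split(".")
    return (
        has_privilege(grants, catalog, principal, "USE CATALOG")
        and has_privilege(grants, f"{catalog}.{schema}", principal, "USE SCHEMA")
        and has_privilege(grants, table, principal, "SELECT")
    )


grants: Grants = {
    # GRANT USE CATALOG ON CATALOG main TO `analysts`
    ("main", "analysts"): {"USE CATALOG"},
    # GRANT USE SCHEMA, SELECT ON SCHEMA main.sales TO `analysts`
    ("main.sales", "analysts"): {"USE SCHEMA", "SELECT"},
}
print(can_select(grants, "main.sales.orders", "analysts"))  # True: SELECT inherited from the schema
print(can_select(grants, "main.hr.salaries", "analysts"))   # False: no USE SCHEMA on main.hr
```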
Hands-on: Set up Unity Catalog hierarchy, assign permissions, enable Delta Sharing, create data quality constraints.
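DLT expectations attach a rule to a dataset and decide what happens to violating rows. The row can be kept with the violation recorded (`@dlt.expect`), dropped (`@dlt.expect_or_drop`), or cause the update to fail (`@dlt.expect_or_fail`). The sketch below models those three actions in plain Python. `apply_expectations` and the rules are hypothetical. Delta `CHECK` constraints behave like the fail action at the table level, because a write containing a violating row is rejected.

```python
from typing import Callable, Dict, List, Tuple

Expectation = Tuple[Callable[[dict], bool], str]  # (predicate, action)


class ExpectationViolation(Exception):
    """Raised when a row violates an expectation whose action is 'fail'."""


def apply_expectations(
    rows: List[dict], expectations: Dict[str, Expectation]
) -> Tuple[List[dict], Dict[str, int]]:
    """Apply warn/drop/fail rules; return kept rows and violation counts per rule."""
    for name, (_, action) in expectations.items():
        if action not in ("warn", "drop", "fail"):
            raise ValueError(f"{name}: unknown action {action!r}")

    kept: List[dict] = []
    violations = {name: 0 for name in expectations}
    for row in rows:
        keep = True
        for name, (predicate, action) in expectations.items():
            if predicate(row):
                continue
            violations[name] += 1  # count every violation, whatever the action
            if action == "fail":
                raise ExpectationViolation(f"{name} violated by {row!r}")
            if action == "drop":
                keep = False  # 'warn' keeps the row
        if keep:
            kept.append(row)
    return kept, violations


rows = [{"id": 1, "amount": 5}, {"id": None, "amount": 3}, {"id": 3, "amount": -1}]
rules = {
    "valid_id": (lambda r: r["id"] is not None, "drop"),            # like @dlt.expect_or_drop
    "non_negative_amount": (lambda r: r["amount"] >= 0, "warn"),    # like @dlt.expect
}
kept, violations = apply_expectations(rows, rules)
print(kept)        # [{'id': 1, 'amount': 5}, {'id': 3, 'amount': -1}]
print(violations)  # {'valid_id': 1, 'non_negative_amount': 1}
```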