Data Engineer Associate Certification (July 25, 2025 version)


Exam domains and weights follow the updated guide published for exams taken on or after July 25, 2025.

Domain 1: Databricks Intelligence Platform (≈10%)

  • Understand Databricks architecture (control plane vs compute plane, formerly called the data plane)
  • Workspace components: notebooks, clusters, Repos, magic commands
  • Git integration via Repos & version control
  • Compute types: serverless vs interactive clusters, selection strategies
  • Platform UI: query optimizers, performance/compute selection advantages
    Hands-on: Create and manage Repos, launch clusters (including serverless), explore the UI features.

Domain 2: Development & Ingestion (≈30%)

  • Data ingestion using Spark SQL and PySpark
  • COPY INTO, Auto Loader, schema inference, handling complex types: JSON, structs, arrays
  • SQL DML (INSERT, INSERT OVERWRITE, MERGE INTO for upserts), view creation
  • User-defined functions (UDFs) in SQL and PySpark
  • Databricks Connect to develop locally while executing on remote clusters
    Hands-on: Load JSON/XML and CSV into Delta using COPY INTO and Auto Loader; write UDFs; run local code via Databricks Connect.
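
A useful property to keep in mind for both COPY INTO and Auto Loader is that they ingest incrementally: files that were already loaded are skipped on later runs. The sketch below imitates that behavior in plain Python so it runs without a cluster. It is an analogy, not the Databricks API; `IngestState` and `ingest` are hypothetical names, and the file names and records are made up.

```python
from dataclasses import dataclass, field


@dataclass
class IngestState:
    """Hypothetical stand-in for the load history that COPY INTO or an
    Auto Loader checkpoint maintains."""
    loaded_files: set = field(default_factory=set)
    rows: list = field(default_factory=list)


def ingest(state, files):
    """Load only files not seen before; return the names of newly loaded files.

    `files` maps a file name to the list of records it contains.
    """
    newly_loaded = []
    for name in sorted(files):
        if name in state.loaded_files:
            continue  # already ingested on an earlier run, so skip it
        state.rows.extend(files[name])
        state.loaded_files.add(name)
        newly_loaded.append(name)
    return newly_loaded


state = IngestState()
landing = {"a.json": [{"id": 1}], "b.json": [{"id": 2}]}
first_run = ingest(state, landing)    # loads a.json and b.json

landing["c.json"] = [{"id": 3}]       # a new file arrives
second_run = ingest(state, landing)   # loads only c.json
```

Re-running `ingest` with no new files loads nothing, which is the behavior to verify when you practice with the real commands.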

Domain 3: Data Processing & Transformations (≈31%)

  • Multi-hop ETL architecture: Bronze → Silver → Gold layers
  • Delta Lake internals: ACID transactions, schema evolution, time travel, versioning
  • Table maintenance: VACUUM, OPTIMIZE, ZORDER BY, cloning (deep and shallow)
  • Change data capture (CDC) and COPY INTO
  • Declarative pipeline building via Delta Live Tables (DLT): LIVE vs STREAM, error handling
  • Managed vs external tables; DDL & DML operations in Delta
    Hands-on: Build a full DLT pipeline; practice MERGE, OPTIMIZE, time travel; partition and Z‑order tables.
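
To build intuition for MERGE and time travel before trying them on a real table, here is a toy model in plain Python. It assumes a deliberately simplified design: every commit stores a full snapshot, whereas Delta Lake records file-level changes in a transaction log. `ToyVersionedTable` and its methods are hypothetical names for illustration only.

```python
import copy


class ToyVersionedTable:
    """Hypothetical in-memory table that keeps one snapshot per commit."""

    def __init__(self):
        self._versions = [{}]  # version 0 is the empty table

    @property
    def latest_version(self):
        return len(self._versions) - 1

    def merge(self, updates, key="id"):
        """Upsert: replace rows whose key matches, insert the rest.

        Each call commits a new version and returns its number.
        """
        snapshot = copy.deepcopy(self._versions[-1])
        for row in updates:
            snapshot[row[key]] = dict(row)
        self._versions.append(snapshot)
        return self.latest_version

    def read(self, version=None):
        """Return rows sorted by key, optionally as of an earlier version."""
        if version is None:
            version = self.latest_version
        snapshot = self._versions[version]
        return [snapshot[k] for k in sorted(snapshot)]


table = ToyVersionedTable()
v1 = table.merge([{"id": 1, "status": "new"}, {"id": 2, "status": "new"}])
v2 = table.merge([{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}])

current = table.read()            # id 2 is updated, id 3 is inserted
as_of_v1 = table.read(version=1)  # the table as it looked after the first merge
```

Reading an older version returns the earlier rows unchanged, which is the idea behind querying a Delta table as of a previous version.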

Domain 4: Productionizing Data Pipelines (≈18%)

  • Databricks Workflows & Jobs: multi-task DAGs, task dependencies, parameterization
  • Scheduling with CRON, retries, alerts and notifications
  • CI/CD integration via Repos and Databricks Asset Bundles (DABs) deployment workflows
    Hands-on: Orchestrate a multi-step job, configure retries and alerts, deploy a pipeline via Asset Bundles.
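
The ideas of task dependencies and retries can be sketched without a workspace. The example below uses Python's standard `graphlib` module to order tasks and a simple loop to retry failures. It is only an analogy for how a multi-task job behaves, not the Jobs API; `run_job`, the task names, and the flaky task are hypothetical.

```python
from graphlib import TopologicalSorter


def run_job(tasks, depends_on, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    tasks: maps a task name to a zero-argument callable.
    depends_on: maps a task name to the set of tasks it waits for.
    Returns a list of (task_name, attempts_used) in execution order.
    """
    graph = {name: depends_on.get(name, set()) for name in tasks}
    history = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                history.append((name, attempt))
                break
            except Exception:
                if attempt == max_retries + 1:
                    raise  # retries exhausted: fail the whole run
    return history


calls = {"count": 0}


def flaky_transform():
    """Fails on the first call and succeeds on the second."""
    calls["count"] += 1
    if calls["count"] < 2:
        raise RuntimeError("transient failure")


history = run_job(
    tasks={
        "ingest": lambda: None,
        "transform": flaky_transform,
        "publish": lambda: None,
    },
    depends_on={"transform": {"ingest"}, "publish": {"transform"}},
)
```

Here `transform` succeeds on its second attempt and `publish` only runs afterwards, mirroring what you should observe when configuring retries on a real job.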

Domain 5: Data Governance & Quality (≈11%)

  • Unity Catalog components: catalogs, schemas, tables, privileges
  • Role-based access control: grants, service principals, SCIM
  • Secure clusters, object controls, metadata management
  • Data quality concepts: expectations, constraints, validation rules
  • Delta Sharing for external data collaboration across organizations
    Hands-on: Set up Unity Catalog hierarchy, assign permissions, enable Delta Sharing, create data quality constraints.
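
Expectations in DLT can either keep invalid records while counting them, drop them, or fail the update. The plain-Python sketch below imitates those three outcomes so the difference is easy to see. It is not the DLT API; `apply_expectation`, the action names "warn", "drop" and "fail", and the sample rows are assumptions made for illustration.

```python
def apply_expectation(rows, name, predicate, on_violation="warn"):
    """Apply one rule to a list of dict rows.

    on_violation: "warn" keeps bad rows, "drop" removes them,
    "fail" raises ValueError on the first bad row.
    Returns (rows_to_keep, number_of_violations).
    """
    if on_violation not in {"warn", "drop", "fail"}:
        raise ValueError(f"unknown action: {on_violation}")
    kept, violations = [], 0
    for row in rows:
        if predicate(row):
            kept.append(row)
            continue
        violations += 1
        if on_violation == "fail":
            raise ValueError(f"expectation {name!r} failed for row {row}")
        if on_violation == "warn":
            kept.append(row)  # keep the row, only count the violation
    return kept, violations


orders = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": -5.0},
    {"id": None, "amount": 7.5},
]

# Keep every row but count how many have a non-positive amount.
warned, n_warned = apply_expectation(
    orders, "positive_amount", lambda r: r["amount"] > 0
)

# Remove rows that have no id.
cleaned, n_dropped = apply_expectation(
    orders, "valid_id", lambda r: r["id"] is not None, on_violation="drop"
)
```

Choosing between the three actions is a trade-off between completeness and strictness: warning preserves all data, dropping protects downstream tables, and failing stops the pipeline until the source is fixed.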
