Databricks Components


Databricks Components Hierarchy

1. Account Level (Top Layer)

  • Account Console – central place to manage everything across workspaces.
  • Workspaces – logical environments where teams work.
  • Unity Catalog (Metastore) – unified governance across all workspaces.

2. Governance & Data Management

  • Unity Catalog
  • Catalogs → top-level container of data assets.
    • Schemas (Databases) → inside catalogs.
    • Tables → structured data (Managed / External).
    • Views → logical queries on tables.
    • Volumes → for non-tabular data (images, PDFs, etc.).
    • Models → registered ML models.
    • Functions → SQL or Python-defined functions.
    • Lineage → track where data comes from and how it’s used.
  • Access Control
    • Users → individual identities.
    • Groups → manage permissions collectively.
    • Service Principals → for apps/automation.
    • ACLs (Access Control Lists) → fine-grained permissions.
    • Personal Access Tokens (PATs) → authentication for APIs.
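Unity Catalog addresses every securable object with a three-level namespace (catalog.schema.object), and permissions are granted on those names. A minimal sketch, assuming illustrative names (`main`, `sales`, `orders`, group `analysts`) that are placeholders, not real assets; the GRANT is shown as a string only, since executing it requires a Unity Catalog-enabled workspace:

```python
# Unity Catalog uses a three-level namespace: catalog.schema.object.
# This helper just builds that fully qualified identifier; the names
# used below are illustrative placeholders.

def fq_name(catalog: str, schema: str, obj: str) -> str:
    """Return the fully qualified catalog.schema.object identifier."""
    return f"{catalog}.{schema}.{obj}"

# A typical GRANT statement as it would be issued through a SQL session
# (shown as a string only -- it is not executed here):
grant_stmt = (
    f"GRANT SELECT ON TABLE {fq_name('main', 'sales', 'orders')} TO `analysts`"
)

print(grant_stmt)
```

The same three-level name is what appears in Catalog Explorer and in lineage views, which is why governance stays consistent across workspaces sharing a metastore.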

3. Computation & Execution

  • Clusters
    • All-purpose clusters → interactive, shared by users.
    • Job clusters → spin up just for a job, then shut down.
    • Pools → pre-warmed instances to reduce cluster spin-up time.
    • Databricks Runtime (DBR) → core software stack (Spark + optimizations).
      • DBR for Machine Learning (ML/DL libraries pre-installed).
      • DBR for Genomics, SQL, etc.
  • Jobs & Pipelines
    • Jobs UI → scheduling & automation of notebooks, SQL, scripts.
    • Lakeflow Declarative Pipelines → declarative ETL pipelines that create and manage Delta tables.
    • Workflows → orchestration of multi-task jobs with dependencies and schedules.
  • Workloads
    • Data Engineering → ETL, batch jobs.
    • Data Analytics → interactive queries, dashboards.
    • Machine Learning → model training/inference.
    • Streaming → real-time with Structured Streaming.
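The job-cluster pattern above (spin up for a run, then shut down) is what a Jobs API job specification describes. A minimal sketch of such a payload as a Python dict, assuming the notebook path, node type, and DBR version shown are placeholders and the field layout follows the public Jobs API 2.1; a real deployment would POST this to the workspace:

```python
# Sketch of a Jobs API 2.1-style job spec: one notebook task running on
# an ephemeral job cluster (created for the run, terminated afterwards).
# Paths, node type, and runtime version below are placeholder values.

job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
            # "new_cluster" makes this a job cluster rather than an
            # all-purpose cluster attached by id.
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # daily at 02:00
        "timezone_id": "UTC",
    },
}
```

Using `new_cluster` per task is what keeps compute cost tied to the run; pointing the task at an existing all-purpose cluster instead would trade that isolation for faster interactive iteration.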

4. Developer Interfaces

  • Workspace UI → notebooks, data, clusters, jobs, dashboards.
  • Notebooks → code in Python, SQL, R, Scala.
  • Dashboards → visual insights.
  • Git Folders (Repos) → version control integration.
  • Libraries → attach external or custom libraries.
  • Catalog Explorer → browse data assets.
  • APIs & Tools
    • REST API → programmatic access.
    • SQL REST API → SQL automation.
    • CLI → Databricks command line tool.
    • dbutils → utility commands inside notebooks.
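The REST API and PAT pieces fit together as a bearer-token HTTP call. A minimal sketch using only the standard library, assuming the workspace URL and token are placeholders and the `/api/2.0/clusters/list` endpoint as documented; the request is built but never sent, so it runs without a workspace:

```python
# Sketch: constructing an authenticated Databricks REST API request
# with a personal access token (PAT). Host and token are placeholders;
# the request object is built but NOT sent.

import urllib.request

host = "https://example.cloud.databricks.com"  # placeholder workspace URL
token = "dapi-XXXX"                            # placeholder PAT

req = urllib.request.Request(
    url=f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)

print(req.full_url)
```

The Databricks CLI and SDKs wrap exactly this pattern, which is why a PAT (or a service principal's token) is the common denominator across all the programmatic interfaces listed above.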

5. Data & AI Layers

  • Delta Lake (Default Table Format)
    • Delta Tables
    • Delta Transaction Logs (ACID)
    • Time Travel, Schema Evolution
  • Lakehouse Storage Pattern
    • Bronze → Raw data
    • Silver → Clean/curated data
    • Gold → Business-ready data
  • AI & ML (Mosaic AI)
    • MLflow → experiment tracking, model registry.
    • Feature Store → reusable features for ML.
    • Generative AI (LLMs) → foundation models, fine-tuning.
    • AI Playground → test LLMs interactively.
    • Model Serving → REST API for deploying models.
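The Bronze → Silver → Gold layers can be sketched with plain Python in place of Delta tables, just to make concrete what each layer adds; the records and the dedup/cast/aggregate rules below are illustrative assumptions, and in practice each step would read and write Delta tables:

```python
# Medallion pattern sketch: Bronze (raw) -> Silver (clean) -> Gold
# (business-ready), using dicts in place of Delta tables.

bronze = [  # raw ingested records, as-landed (duplicates and nulls kept)
    {"order_id": 1, "amount": "10.5", "region": "eu"},
    {"order_id": 1, "amount": "10.5", "region": "eu"},  # duplicate
    {"order_id": 2, "amount": None, "region": "us"},    # bad record
]

# Silver: deduplicate on the key, drop invalid rows, cast types.
seen: set[int] = set()
silver = []
for row in bronze:
    if row["order_id"] in seen or row["amount"] is None:
        continue
    seen.add(row["order_id"])
    silver.append({**row, "amount": float(row["amount"])})

# Gold: business-ready aggregate (revenue per region).
gold: dict[str, float] = {}
for row in silver:
    gold[row["region"]] = gold.get(row["region"], 0.0) + row["amount"]

print(gold)  # {'eu': 10.5}
```

The point of the layering is that each table is independently queryable and governable in Unity Catalog: analysts hit Gold, data engineers debug against Bronze, and Silver is the shared cleaned source for both ML and BI.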

In one line:

  • Account Console (top) → Workspaces → Unity Catalog (Governance) → Data Assets (Tables, Schemas, Models, Volumes) → Compute (Clusters, Jobs, Pipelines) → Developer Interfaces (Notebooks, APIs, CLI) → AI/ML & Analytics Tools.
