Databricks Components


Databricks Components Hierarchy

1. Account Level (Top Layer)

  • Account Console – central place to manage everything across workspaces.
  • Workspaces – logical environments where teams work.
  • Unity Catalog (Metastore) – unified governance across all workspaces.

2. Governance & Data Management

  • Unity Catalog
    • Catalogs → top-level container of data assets.
    • Schemas (Databases) → inside catalogs.
    • Tables → structured data (Managed / External).
    • Views → logical queries on tables.
    • Volumes → for non-tabular data (images, PDFs, etc.).
    • Models → ML models registered in the catalog.
    • Functions → SQL or Python-defined functions.
    • Lineage → track where data comes from and how it’s used.
  • Access Control
    • Users → individual identities.
    • Groups → manage permissions collectively.
    • Service Principals → for apps/automation.
    • ACLs (Access Control Lists) → fine-grained permissions.
    • Personal Access Tokens (PATs) → authentication for APIs.
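Unity Catalog addresses every securable through a three-level namespace: `catalog.schema.table`. A minimal sketch in pure Python (all names and the principal are hypothetical examples) of composing fully-qualified names and the `GRANT` statement you would run to apply an ACL:

```python
# Sketch: Unity Catalog three-level names and a GRANT statement.
# The catalog/schema/table names and the principal are hypothetical.

def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return the fully-qualified Unity Catalog name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"

def grant_select(table_fqn: str, principal: str) -> str:
    """Build the SQL GRANT statement you would execute in a notebook or SQL editor."""
    return f"GRANT SELECT ON TABLE {table_fqn} TO `{principal}`"

fqn = qualified_name("main", "sales", "orders")
print(fqn)                            # main.sales.orders
print(grant_select(fqn, "analysts"))  # GRANT SELECT ON TABLE main.sales.orders TO `analysts`
```

Granting to a group (here, `analysts`) rather than individual users is the pattern the Groups bullet above describes: permissions are managed once, collectively.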

3. Computation & Execution

  • Clusters
    • All-purpose clusters → interactive, shared by users.
    • Job clusters → spin up just for a job, then shut down.
    • Pools → pre-warmed instances to reduce cluster spin-up time.
    • Databricks Runtime (DBR) → core software stack (Spark + optimizations).
      • DBR for Machine Learning (ML/DL libraries pre-installed).
      • DBR for Genomics, SQL, etc.
  • Jobs & Pipelines
    • Jobs UI → scheduling & automation of notebooks, SQL, scripts.
    • Lakeflow Declarative Pipelines → declaratively define and orchestrate Delta table pipelines.
    • Workflows → orchestrate multi-task jobs with dependencies.
  • Workloads
    • Data Engineering → ETL, batch jobs.
    • Data Analytics → interactive queries, dashboards.
    • Machine Learning → model training/inference.
    • Streaming → real-time with Structured Streaming.
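The all-purpose vs. job cluster distinction shows up concretely in how a job is defined: a job task can declare a `new_cluster` that is created for the run and terminated afterwards. A hedged sketch of a Jobs API 2.1-style payload as a Python dict; the notebook path, instance type, and runtime version are placeholder values, not recommendations:

```python
# Sketch of a Databricks Jobs API 2.1-style job definition.
# All concrete values (paths, node type, DBR version) are placeholders.
job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
            # A job cluster: spun up just for this run, then shut down.
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # a DBR version string
                "node_type_id": "i3.xlarge",          # cloud-specific instance type
                "num_workers": 2,
            },
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # run daily at 02:00
        "timezone_id": "UTC",
    },
}

print(job_spec["tasks"][0]["task_key"])  # ingest
```

Pointing the task at an existing all-purpose cluster instead would mean replacing `new_cluster` with an `existing_cluster_id` field; the trade-off is faster starts versus paying for an always-on cluster.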

4. Developer Interfaces

  • Workspace UI → notebooks, data, clusters, jobs, dashboards.
  • Notebooks → code in Python, SQL, R, Scala.
  • Dashboards → visual insights.
  • Git Folders (Repos) → version control integration.
  • Libraries → attach external or custom libraries.
  • Catalog Explorer → browse data assets.
  • APIs & Tools
    • REST API → programmatic access.
    • SQL REST API → run SQL statements programmatically.
    • CLI → Databricks command line tool.
    • dbutils → utility commands inside notebooks.
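The REST API and PAT bullets fit together: a personal access token is sent as a bearer token on every API call. A minimal standard-library sketch that builds (but does not send) an authenticated request; the workspace host, token, and endpoint path are placeholders:

```python
import urllib.request

# Sketch: how a PAT authenticates a Databricks REST API call.
# Host, token, and endpoint path are placeholders; nothing is sent here.
host = "https://example-workspace.cloud.databricks.com"
token = "dapiXXXXXXXXXXXX"  # a personal access token (placeholder)

req = urllib.request.Request(
    url=f"{host}/api/2.1/clusters/list",   # illustrative endpoint path
    headers={"Authorization": f"Bearer {token}"},
    method="GET",
)

print(req.get_header("Authorization"))  # Bearer dapiXXXXXXXXXXXX
print(req.full_url)
```

In practice the CLI and SDKs wrap exactly this pattern, reading the host and token from configuration so they never appear in code.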

5. Data & AI Layers

  • Delta Lake (Default Table Format)
    • Delta Tables
    • Delta Transaction Logs (ACID)
    • Time Travel, Schema Evolution
  • Lakehouse Storage Pattern
    • Bronze → Raw data
    • Silver → Clean/curated data
    • Gold → Business-ready data
  • AI & ML (Mosaic AI)
    • MLflow → experiment tracking, model registry.
    • Feature Store → reusable features for ML.
    • Generative AI (LLMs) → foundation models, fine-tuning.
    • AI Playground → test LLMs interactively.
    • Model Serving → REST API for deploying models.
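The Bronze → Silver → Gold layering above can be illustrated end to end. On Databricks each layer would be a Delta table transformed with Spark; the sketch below mimics the same flow in plain Python on toy records so the shape of each layer is visible:

```python
# Medallion-pattern sketch in plain Python. On Databricks these would be
# Delta tables and Spark DataFrames; the records here are toy data.

bronze = [  # Bronze: raw, as-ingested, possibly dirty
    {"order_id": "1", "amount": "100.0", "region": "EU"},
    {"order_id": "2", "amount": "bad",   "region": "US"},
    {"order_id": "3", "amount": "50.5",  "region": "EU"},
]

def to_silver(rows):
    """Silver: validated and typed; malformed rows are dropped."""
    out = []
    for r in rows:
        try:
            out.append({"order_id": int(r["order_id"]),
                        "amount": float(r["amount"]),
                        "region": r["region"]})
        except ValueError:
            continue  # discard rows that fail type checks
    return out

def to_gold(rows):
    """Gold: business-ready aggregate, revenue per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'EU': 150.5}
```

The key idea the pattern encodes: raw data is kept intact in Bronze (so bad rows can be reprocessed later), quality rules live in the Bronze-to-Silver step, and Gold holds only aggregates shaped for consumption.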

In one line:

  • Account Console (top) → Workspaces → Unity Catalog (Governance) → Data Assets (Tables, Schemas, Models, Volumes) → Compute (Clusters, Jobs, Pipelines) → Developer Interfaces (Notebooks, APIs, CLI) → AI/ML & Analytics Tools.
