Databricks Components


Databricks Components Hierarchy

1. Account Level (Top Layer)

  • Account Console – central place to manage everything across workspaces.
  • Workspaces – logical environments where teams work.
  • Unity Catalog (Metastore) – unified governance shared by every workspace attached to the metastore.

2. Governance & Data Management

  • Unity Catalog
    • Catalogs → top container of data assets.
    • Schemas (Databases) → second level, contained in catalogs; they hold tables, views, volumes, models, and functions.
    • Tables → structured data (Managed / External).
    • Views → logical queries on tables.
    • Volumes → for non-tabular data (images, PDFs, etc.).
    • Models → registered ML models (MLflow models governed by Unity Catalog).
    • Functions → user-defined functions written in SQL or Python.
    • Lineage → track where data comes from and how it’s used.
  • Access Control
    • Users → individual identities.
    • Groups → manage permissions collectively.
    • Service Principals → for apps/automation.
    • ACLs (Access Control Lists) → fine-grained permissions.
    • Personal Access Tokens (PATs) → authentication for APIs.
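
Unity Catalog addresses every table with a three-level name (catalog.schema.table), and reading a table requires privileges at each level of that hierarchy. The sketch below only builds the SQL strings; the catalog, schema, table, and group names ("main", "sales", "orders", "data-analysts") are hypothetical examples, and you would run the statements yourself in a notebook or the SQL editor.

```python
# Minimal sketch: build Unity Catalog three-level names and the GRANTs a group
# needs to read one table. All object and group names here are hypothetical.


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, escaping embedded backticks by doubling them."""
    return "`" + name.replace("`", "``") + "`"


def full_name(catalog: str, schema: str, table: str) -> str:
    """Return the catalog.schema.table name used by Unity Catalog."""
    return ".".join(quote_ident(part) for part in (catalog, schema, table))


def read_grants(catalog: str, schema: str, table: str, principal: str) -> list:
    """GRANT statements that let `principal` SELECT from one table.

    SELECT on the table is not enough on its own: the principal also needs
    USE CATALOG on the parent catalog and USE SCHEMA on the parent schema.
    """
    p = quote_ident(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {quote_ident(catalog)} TO {p}",
        f"GRANT USE SCHEMA ON SCHEMA {quote_ident(catalog)}.{quote_ident(schema)} TO {p}",
        f"GRANT SELECT ON TABLE {full_name(catalog, schema, table)} TO {p}",
    ]


for stmt in read_grants("main", "sales", "orders", "data-analysts"):
    print(stmt)
```

Granting to a group rather than to individual users keeps permissions manageable as team membership changes, which is the point of the Groups entry above.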

3. Computation & Execution

  • Clusters
    • All-purpose clusters → interactive clusters that multiple users can share.
    • Job clusters → created for a single job run and terminated when it finishes.
    • Pools → pre-warmed instances to reduce cluster spin-up time.
    • Databricks Runtime (DBR) → core software stack (Spark + optimizations).
      • DBR for Machine Learning (ML/DL libraries pre-installed).
      • Other specialized variants (e.g., the now-deprecated DBR for Genomics).
  • Jobs & Pipelines
    • Jobs UI → scheduling & automation of notebooks, SQL, scripts.
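    • Lakeflow Declarative Pipelines (formerly Delta Live Tables) → declaratively define ETL pipelines that build and keep Delta tables up to date.
    • Workflows → multi-task orchestration with dependencies between tasks (notebooks, pipelines, SQL).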
  • Workloads
    • Data Engineering → ETL, batch jobs.
    • Data Analytics → interactive queries, dashboards.
    • Machine Learning → model training/inference.
    • Streaming → near-real-time processing with Spark Structured Streaming.
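
To see how jobs, job clusters, and pools fit together, here is a sketch of a job definition in the shape the Jobs API 2.1 `jobs/create` endpoint accepts: a job cluster, three notebook tasks chained with `depends_on`, and a nightly Quartz cron schedule. The job name, notebook paths, runtime key, node type, and pool ID are hypothetical placeholders (valid runtime keys and node types depend on your workspace and cloud), and `run_order` is a local helper for illustration, not part of Databricks.

```python
# Sketch of a Jobs API 2.1 job spec; names, paths, and IDs are hypothetical.
import json
from graphlib import TopologicalSorter


def new_cluster(spark_version, num_workers, node_type_id=None, instance_pool_id=None):
    """Job-cluster spec; with a pool, workers come from the pool's pre-warmed instances."""
    if instance_pool_id is None and node_type_id is None:
        raise ValueError("give either node_type_id or instance_pool_id")
    spec = {"spark_version": spark_version, "num_workers": num_workers}
    if instance_pool_id is not None:
        spec["instance_pool_id"] = instance_pool_id
    else:
        spec["node_type_id"] = node_type_id
    return spec


def notebook_task(key, path, cluster_key, depends_on=()):
    """One notebook task that runs on a shared job cluster."""
    task = {
        "task_key": key,
        "job_cluster_key": cluster_key,
        "notebook_task": {"notebook_path": path},
    }
    if depends_on:
        task["depends_on"] = [{"task_key": d} for d in depends_on]
    return task


job = {
    "name": "nightly-sales-etl",
    "job_clusters": [{
        "job_cluster_key": "etl_cluster",
        "new_cluster": new_cluster("15.4.x-scala2.12", 2, node_type_id="i3.xlarge"),
    }],
    "tasks": [
        notebook_task("ingest", "/Shared/etl/ingest", "etl_cluster"),
        notebook_task("transform", "/Shared/etl/transform", "etl_cluster", ["ingest"]),
        notebook_task("publish", "/Shared/etl/publish", "etl_cluster", ["transform"]),
    ],
    # Quartz cron: second minute hour day-of-month month day-of-week -> 02:00 daily
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}


def run_order(job_spec):
    """Hypothetical helper: order tasks so each runs after the tasks it depends on."""
    graph = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in job_spec["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())


print(json.dumps(job, indent=2))
print(run_order(job))
```

Because the three tasks share one job cluster, the cluster is created once for the run and terminated afterwards, which is the cost advantage of job clusters over all-purpose clusters.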

4. Developer Interfaces

  • Workspace UI → notebooks, data, clusters, jobs, dashboards.
  • Notebooks → code in Python, SQL, R, Scala.
  • Dashboards → charts and visualizations built on query results.
  • Git Folders (Repos) → version control integration.
  • Libraries → attach external or custom libraries.
  • Catalog Explorer → browse data assets.
  • APIs & Tools
    • REST API → programmatic access.
    • SQL Statement Execution API → run SQL statements on SQL warehouses over REST.
    • CLI → Databricks command line tool.
    • dbutils → utility commands inside notebooks (file system, secrets, widgets, notebook workflows).
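
The REST API and a personal access token work together like this: the token goes in an `Authorization: Bearer` header on each request. The sketch below builds a request for the Jobs API list endpoint using only the standard library, and sends it only if `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are set (the environment variable names the Databricks CLI and SDKs also read). The host and token values in the example are placeholders.

```python
# Minimal sketch: call the Databricks REST API with a personal access token.
import json
import os
import urllib.request


def jobs_list_request(host, token, limit=25):
    """Build (but do not send) a GET request for /api/2.1/jobs/list."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/list?limit={limit}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},  # PAT authentication
        method="GET",
    )


def send(req):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    host = os.environ.get("DATABRICKS_HOST")    # e.g. https://<workspace>.cloud.databricks.com
    token = os.environ.get("DATABRICKS_TOKEN")
    if host and token:
        body = send(jobs_list_request(host, token))
        for job in body.get("jobs", []):
            print(job["job_id"], job.get("settings", {}).get("name"))
    else:
        print("Set DATABRICKS_HOST and DATABRICKS_TOKEN to call the API.")
```

Inside a notebook you would normally avoid pasting a token into code; `dbutils.secrets.get(scope, key)` can read it from a secret scope instead. The CLI wraps the same endpoints (for example, `databricks jobs list`).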

5. Data & AI Layers

  • Delta Lake (Default Table Format)
    • Delta Tables
    • Delta Transaction Log (_delta_log) → ACID transactions
    • Time Travel, Schema Evolution
  • Medallion Architecture (Lakehouse data-layering pattern)
    • Bronze → Raw data
    • Silver → Clean/curated data
    • Gold → Business-ready data
  • AI & ML (Mosaic AI)
    • MLflow → experiment tracking, model registry.
    • Feature Store → reusable features for ML.
    • Generative AI (LLMs) → foundation models, fine-tuning.
    • AI Playground → test LLMs interactively.
    • Model Serving → deploy models as REST API endpoints.
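
Time travel falls out of the transaction log: each commit in `_delta_log/` is a JSON file named with a zero-padded version number, and replaying the add/remove actions up to version N gives the data files that made up the table at version N. The sketch below is a deliberately simplified toy model of that idea, not the real Delta reader (the actual protocol also has checkpoint files, metadata and protocol actions, and file statistics); the file names and commit history are made up.

```python
# Toy model of Delta Lake's transaction log (simplified; not the real reader).
# Each commit is a list of actions; replaying commits 0..N reconstructs the
# set of data files that make up table version N.


def commit_file_name(version: int) -> str:
    """Commit files are named with a 20-digit zero-padded version number."""
    return f"{version:020d}.json"


def files_at_version(log, version):
    """Replay add/remove actions from commit 0 through `version` (inclusive)."""
    if not 0 <= version < len(log):
        raise ValueError(f"version {version} not in log")
    live = set()
    for commit in log[: version + 1]:
        for action in commit:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Hypothetical history: initial write, an append, then a rewrite of both files
log = [
    [{"add": {"path": "part-000.parquet"}}],
    [{"add": {"path": "part-001.parquet"}}],
    [
        {"remove": {"path": "part-000.parquet"}},
        {"remove": {"path": "part-001.parquet"}},
        {"add": {"path": "part-002.parquet"}},
    ],
]

for v in range(len(log)):
    print(commit_file_name(v), sorted(files_at_version(log, v)))
```

On a real Delta table the same idea is exposed through SQL, for example `DESCRIBE HISTORY <table>` to list versions and `SELECT * FROM <table> VERSION AS OF 1` (or `TIMESTAMP AS OF ...`) to query an older version. Removed files stay on storage until `VACUUM` deletes them, which is why old versions remain readable.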

In one line:

  • Account Console (top) → Workspaces → Unity Catalog (Governance) → Data Assets (Tables, Schemas, Models, Volumes) → Compute (Clusters, Jobs, Pipelines) → Developer Interfaces (Notebooks, APIs, CLI) → AI/ML & Analytics Tools.
