Databricks Components Hierarchy
1. Account Level (Top Layer)
- Account Console – central place to manage workspaces, users, and billing across the account.
- Workspaces – logical environments where teams work.
- Unity Catalog (Metastore) – unified governance across all workspaces.
2. Governance & Data Management
- Unity Catalog
- Catalogs → top-level container of data assets (see the namespace sketch after this list).
- Schemas (Databases) → inside catalogs.
- Tables → structured data (Managed / External).
- Views → logical queries on tables.
- Volumes → for non-tabular data (images, PDFs, etc.).
- Models → registered ML models.
- Functions → SQL or Python-defined functions.
- Lineage → track where data comes from and how it’s used.
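To make the hierarchy concrete, here is a minimal PySpark sketch (assuming a Databricks notebook, where `spark` is preconfigured; the `demo`, `sales`, and `orders` names are made-up placeholders):

```python
# Unity Catalog addresses every asset by a three-level name:
#   <catalog>.<schema>.<table>
spark.sql("CREATE CATALOG IF NOT EXISTS demo")
spark.sql("CREATE SCHEMA IF NOT EXISTS demo.sales")

# Managed table: Unity Catalog owns both the metadata and the data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.sales.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
""")

# View: a logical query over tables in the hierarchy.
spark.sql("""
    CREATE VIEW IF NOT EXISTS demo.sales.big_orders AS
    SELECT * FROM demo.sales.orders WHERE amount > 1000
""")
```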
- Access Control
- Users → individual identities.
- Groups → manage permissions collectively.
- Service Principals → for apps/automation.
- ACLs (Access Control Lists) → fine-grained permissions via GRANT/REVOKE (sketch after this list).
- Personal Access Tokens (PATs) → authentication for APIs.
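A sketch of how fine-grained permissions look in practice; Unity Catalog privileges are granted with SQL, and the `analysts` group here is a placeholder:

```python
# Privileges cascade down the hierarchy: a grant on a catalog or schema
# covers the objects inside it.
spark.sql("GRANT USE CATALOG ON CATALOG demo TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA demo.sales TO `analysts`")
spark.sql("GRANT SELECT ON TABLE demo.sales.orders TO `analysts`")

# Service principals are granted to in exactly the same way.
spark.sql("SHOW GRANTS ON TABLE demo.sales.orders").show()
```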
3. Computation & Execution
- Clusters (REST creation sketch after this list)
- All-purpose clusters → interactive, shared by users.
- Job clusters → spin up just for a job, then shut down.
- Pools → pre-warmed instances to reduce cluster spin-up time.
- Databricks Runtime (DBR) → core software stack (Spark + optimizations).
- DBR for Machine Learning (ML/DL libraries pre-installed).
- DBR for Genomics, SQL, etc.
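A hedged sketch of creating an all-purpose cluster through the Clusters REST API (`/api/2.0/clusters/create`); the workspace URL, token, node type, and runtime version string are placeholders to substitute for your environment:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder PAT

cluster_spec = {
    "cluster_name": "interactive-demo",
    # The DBR version string selects the runtime; ML runtimes use "-ml" variants.
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",          # cloud-specific instance type
    "autoscale": {"min_workers": 1, "max_workers": 4},
    # "instance_pool_id": "...",          # or draw nodes from a pre-warmed pool
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
print(resp.json())  # {"cluster_id": "..."} on success
```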
- Jobs & Pipelines
- Jobs UI → scheduling & automation of notebooks, SQL, scripts (API sketch after this list).
- Lakeflow Declarative Pipelines (formerly Delta Live Tables) → declaratively define and orchestrate pipelines of Delta tables.
- Workflows → orchestration of multi-task jobs with dependencies.
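For example, a minimal job with a single notebook task on an ephemeral job cluster, created through the Jobs REST API; the paths and names are placeholders:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder PAT

# A one-task job: the job cluster exists only for the run, then shuts down.
job_spec = {
    "name": "nightly-etl",
    "tasks": [{
        "task_key": "ingest",
        "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},  # placeholder
        "new_cluster": {
            "spark_version": "15.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
    }],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # 02:00 every day
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
print(resp.json())  # {"job_id": ...}
```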
- Workloads
- Data Engineering → ETL, batch jobs.
- Data Analytics → interactive queries, dashboards.
- Machine Learning → model training/inference.
- Streaming → real-time processing with Structured Streaming (sketch below).
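A minimal Structured Streaming sketch, assuming a notebook with `spark` available; it uses the built-in `rate` test source, and the checkpoint path and table name are placeholders:

```python
from pyspark.sql import functions as F

# Read a stream (the "rate" source emits timestamp/value rows),
# aggregate over event-time windows, and write the result to a Delta table.
events = (
    spark.readStream
    .format("rate")
    .option("rowsPerSecond", 10)
    .load()
)

counts = events.groupBy(F.window("timestamp", "1 minute")).count()

query = (
    counts.writeStream
    .outputMode("complete")
    .option("checkpointLocation", "/tmp/checkpoints/rate_counts")  # placeholder path
    .toTable("demo.sales.rate_counts")                             # placeholder table
)
```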
4. Developer Interfaces
- Workspace UI → notebooks, data, clusters, jobs, dashboards.
- Notebooks → code in Python, SQL, R, Scala.
- Dashboards → visual insights.
- Git Folders (Repos) → version control integration.
- Libraries → attach external or custom libraries.
- Catalog Explorer → browse data assets.
- APIs & Tools
- REST API → programmatic access.
- SQL REST API → execute SQL statements programmatically.
- CLI → Databricks command line tool.
- dbutils → utility commands inside notebooks.
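A few representative `dbutils` calls, assuming a Databricks notebook (the paths, scope, and key names are placeholders):

```python
# dbutils is injected into every Databricks notebook (no import needed).

# Filesystem helpers against DBFS or Unity Catalog Volumes:
for f in dbutils.fs.ls("/Volumes/demo/sales/raw"):
    print(f.path, f.size)

# Secrets: read credentials without hard-coding them:
api_key = dbutils.secrets.get(scope="demo-scope", key="api-key")

# Widgets: parameterize a notebook so Jobs can pass in arguments:
dbutils.widgets.text("run_date", "2024-01-01")
run_date = dbutils.widgets.get("run_date")
```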
5. Data & AI Layers
- Delta Lake (Default Table Format)
- Delta Tables
- Delta Transaction Logs (ACID)
- Time Travel, Schema Evolution (sketch after this list).
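A small sketch of time travel and schema evolution, assuming the placeholder table `demo.sales.numbers`:

```python
from pyspark.sql import functions as F

# Every write to a Delta table becomes a new version in the transaction log.
spark.range(5).write.format("delta").mode("overwrite").saveAsTable("demo.sales.numbers")
spark.range(10).write.format("delta").mode("overwrite").saveAsTable("demo.sales.numbers")

# Inspect the log, then read the table as of the first version.
spark.sql("DESCRIBE HISTORY demo.sales.numbers").show()
v0 = spark.read.option("versionAsOf", 0).table("demo.sales.numbers")
print(v0.count())  # 5 rows: the first write

# Schema evolution: mergeSchema lets an append add a new column.
spark.range(3).withColumn("flag", F.lit(True)).write \
    .format("delta").mode("append").option("mergeSchema", "true") \
    .saveAsTable("demo.sales.numbers")
```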
- Lakehouse Storage Pattern (Medallion Architecture; sketch after this list)
- Bronze → Raw data
- Silver → Clean/curated data
- Gold → Business-ready data
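A compact Bronze → Silver → Gold sketch in PySpark; the source path, table names, and columns (`order_id`, `customer_id`, `amount`) are assumed placeholders:

```python
from pyspark.sql import functions as F

# Bronze: ingest raw JSON as-is, keeping an ingestion timestamp.
bronze = (
    spark.read.json("/Volumes/demo/sales/raw/orders/")  # placeholder path
    .withColumn("_ingested_at", F.current_timestamp())
)
bronze.write.format("delta").mode("append").saveAsTable("demo.sales.orders_bronze")

# Silver: deduplicate and enforce basic quality rules.
silver = (
    spark.table("demo.sales.orders_bronze")
    .dropDuplicates(["order_id"])
    .where(F.col("amount") > 0)
)
silver.write.format("delta").mode("overwrite").saveAsTable("demo.sales.orders_silver")

# Gold: business-level aggregate ready for dashboards.
gold = spark.table("demo.sales.orders_silver").groupBy("customer_id").agg(
    F.sum("amount").alias("lifetime_value")
)
gold.write.format("delta").mode("overwrite").saveAsTable("demo.sales.orders_gold")
```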
- AI & ML (Mosaic AI)
- MLflow → experiment tracking, model registry (sketch after this list).
- Feature Store → reusable features for ML.
- Generative AI (LLMs) → foundation models, fine-tuning.
- AI Playground → test LLMs interactively.
- Model Serving → deploy models behind REST endpoints.
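A minimal MLflow tracking sketch (assuming an ML runtime where `mlflow` and scikit-learn are pre-installed; the registry name in the final comment is a placeholder):

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Each run records parameters, metrics, and the model artifact.
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")

# Registering into Unity Catalog uses a three-level name (placeholder):
# mlflow.register_model("runs:/<run_id>/model", "demo.sales.iris_classifier")
```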
✅ In one line:
- Account Console (top) → Workspaces → Unity Catalog (Governance) → Data Assets (Tables, Schemas, Models, Volumes) → Compute (Clusters, Jobs, Pipelines) → Developer Interfaces (Notebooks, APIs, CLI) → AI/ML & Analytics Tools.