Databricks: Unity Catalog vs Catalogs vs Workspace vs Metastore


🔑 Unity Catalog vs Catalogs vs Workspace vs Metastore


1. Unity Catalog (UC) ✅

  • Think of it as the master governance system.
  • It's defined at the account level, above all workspaces.
  • Manages:
    • Who can see what (permissions, ACLs).
    • Metadata (table names, schemas, lineage).
    • Secure data sharing across workspaces and with external recipients (Delta Sharing).

👉 Analogy: The national library system, which governs all libraries in a country.
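To make "who can see what" concrete, here is a minimal sketch of how access is typically granted in Unity Catalog. The catalog, schema, and table names are the ones used in the tutorial part of this post; the `analysts` group is a hypothetical name, so substitute a group that exists in your account.

```sql
-- Assumption: `analysts` is a hypothetical account-level group.
-- A user needs USE CATALOG and USE SCHEMA on the parents
-- before a SELECT grant on the table takes effect.
GRANT USE CATALOG ON CATALOG sales_catalog TO `analysts`;
GRANT USE SCHEMA ON SCHEMA sales_catalog.sales_schema TO `analysts`;
GRANT SELECT ON TABLE sales_catalog.sales_schema.customers_managed TO `analysts`;

-- Review what has been granted on the table
SHOW GRANTS ON TABLE sales_catalog.sales_schema.customers_managed;
```

Because the grants live in Unity Catalog rather than in a workspace, they should apply from any workspace attached to the same metastore.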


2. Catalogs 📚

  • A container for organizing data assets inside Unity Catalog.
  • A catalog contains Schemas (databases).
  • Within schemas, you have Tables, Views, Volumes, Functions, Models.

👉 Analogy: A library inside the national library system.
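In practice, the catalog is the first part of the three-level name `catalog.schema.object`. The sketch below assumes a catalog called sales_catalog with a schema sales_schema (the names used later in this post) and shows two ways of addressing the same table.

```sql
-- Option 1: set the current catalog and schema, then use short names
USE CATALOG sales_catalog;
USE SCHEMA sales_schema;
SELECT current_catalog(), current_schema();
SELECT * FROM customers_managed LIMIT 10;

-- Option 2: use the fully qualified three-level name from anywhere
SELECT * FROM sales_catalog.sales_schema.customers_managed LIMIT 10;
```

Fully qualified names are usually the safer choice in jobs, since they do not depend on the session's current catalog.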


3. Schemas (Databases)

  • Sub-containers within Catalogs.
  • Organize Tables and Views.

👉 Analogy: Sections in the library (History, Science, Fiction).


4. Tables & Views

  • Actual data objects stored in schemas.
  • Tables → structured datasets (Delta by default).
  • Views → saved queries on tables.

👉 Analogy: Books on the library shelves.
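A view is just a named query stored next to the tables it reads. As a small sketch, assuming the customers_managed table from the tutorial part of this post already exists (customers_named is a hypothetical view name):

```sql
-- Hypothetical view name: customers_named
CREATE VIEW IF NOT EXISTS sales_catalog.sales_schema.customers_named AS
SELECT id, name
FROM sales_catalog.sales_schema.customers_managed
WHERE name IS NOT NULL;

-- Query it like a table; the underlying query runs each time
SELECT * FROM sales_catalog.sales_schema.customers_named;
```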


5. Workspace 🖥️

  • A UI and compute environment where users collaborate (notebooks, jobs, clusters).
  • Workspaces don't "own" the data; they just connect to Unity Catalog for governed data access.

👉 Analogy: The reading room where you sit, study, and work with books.


6. Metastore 📒

  • The backend metadata database that stores information about catalogs, schemas, tables, and permissions.
  • In Unity Catalog:
    • The metastore is created at the account level (one per region) and shared by the workspaces assigned to it.
  • In legacy mode:
    • Each workspace had its own Hive Metastore (separate, siloed).

👉 Analogy: The card catalog / index system telling you where each book is and who can borrow it.


✅ Hierarchy

Unity Catalog (Account-level Governance)
   └── Metastore (Metadata storage)
         └── Catalogs (Top-level containers)
               └── Schemas (Databases)
                     └── Tables / Views / Volumes / Models

And Workspaces are where you interact with all of this (via notebooks, jobs, queries).
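You can walk this hierarchy from any notebook or SQL editor attached to a Unity Catalog workspace. The sketch below assumes a catalog named sales_catalog and a schema named sales_schema, as used in the tutorial part of this post; replace them with names from your own metastore.

```sql
-- Which metastore is this workspace attached to?
SELECT current_metastore();

-- Walk down one level at a time
SHOW CATALOGS;
SHOW SCHEMAS IN sales_catalog;
SHOW TABLES IN sales_catalog.sales_schema;
```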


Key Distinction

  • Unity Catalog = The system of rules + governance layer.
  • Catalog = Logical container for data inside UC.
  • Metastore = Metadata database that keeps track of it all.
  • Workspace = Your working environment (UI + compute) that connects to the above.

The rest of this post pulls the Metastore → Catalog → Schema → Table hierarchy and external locations together into a single, step-by-step tutorial that you can follow on Databricks.


📘 Tutorial: Understanding Databricks Data Hierarchy & External Locations


1. The Hierarchy

In Unity Catalog, Databricks enforces this hierarchy:

Metastore
   └── Catalog
         └── Schema
               └── Table / View / Volume / Model

  • Metastore → The root metadata container. An account typically has one Unity Catalog metastore per region, shared by the workspaces in that region.
  • Catalog → Top-level logical container for schemas and data assets.
  • Schema (Database) → Organizes objects within a catalog.
  • Table → Stores data (managed or external).

2. Managed vs External Tables

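  • Managed Table: Databricks manages both metadata + storage. Dropping the table also deletes the underlying data files (Databricks removes them after a retention period rather than instantly).
  • External Table: Databricks manages only metadata. The data files stay in your cloud storage when the table is dropped.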
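If you are unsure which kind of table you are looking at, the table metadata tells you. This is a small sketch that assumes the two tables created later in this tutorial (customers_managed and customers_external) already exist.

```sql
-- The detailed output should include a Type row (MANAGED or EXTERNAL)
-- and a Location row showing where the files live.
DESCRIBE TABLE EXTENDED sales_catalog.sales_schema.customers_managed;
DESCRIBE TABLE EXTENDED sales_catalog.sales_schema.customers_external;
```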

3. External Location (Key Concept)

  • An External Location is a Unity Catalog object that pairs a cloud storage path (S3, ADLS, GCS) with a storage credential that can access it.
  • Defined at the Metastore level, not inside a Catalog or Schema.
  • Used when creating external tables, and when pointing a catalog's or schema's managed storage at your own cloud path.

Example:

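-- Step 1: Create storage credential (cloud-specific)
-- Note: storage credentials are often created through Catalog Explorer, the CLI, or the API.
-- Treat the statement below as illustrative and check the exact syntax for your cloud.
CREATE STORAGE CREDENTIAL my_cred
WITH AZURE_MANAGED_IDENTITY 'my-managed-identity'
COMMENT 'Credential for ADLS';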

-- Step 2: Register external location
CREATE EXTERNAL LOCATION my_ext_loc
URL 'abfss://external-container@mydatalake.dfs.core.windows.net/data/'
WITH (STORAGE CREDENTIAL my_cred)
COMMENT 'External data location';
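Once the external location exists, it is worth checking how it was registered and deciding who may use it. The sketch below reuses my_ext_loc from the example above; `data_engineers` is a hypothetical group name.

```sql
-- List all external locations in the metastore, then inspect one
SHOW EXTERNAL LOCATIONS;
DESCRIBE EXTERNAL LOCATION my_ext_loc;

-- Assumption: `data_engineers` is a hypothetical group.
-- Allow it to create external tables under this location and read its files.
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION my_ext_loc TO `data_engineers`;
GRANT READ FILES ON EXTERNAL LOCATION my_ext_loc TO `data_engineers`;
```

Without a grant like this, a user who is not the owner will usually get a permission error when using a LOCATION under that path.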

4. Creating Catalog, Schema, and Tables

-- Create a catalog
CREATE CATALOG sales_catalog;

-- Create a schema (inside catalog)
CREATE SCHEMA sales_catalog.sales_schema;

-- Managed table (Databricks manages storage)
CREATE TABLE sales_catalog.sales_schema.customers_managed (
  id INT, name STRING
);

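-- External table (you provide LOCATION; the path must sit under a registered external location)
-- Omit the column list if a Delta table already exists at this path.
CREATE TABLE sales_catalog.sales_schema.customers_external (
  id INT, name STRING
)
USING DELTA
LOCATION 'abfss://external-container@mydatalake.dfs.core.windows.net/data/customers/';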

5. Insert, Query, and Drop Data

-- Insert data (INSERT works the same way on managed and external tables)
INSERT INTO sales_catalog.sales_schema.customers_managed VALUES (1, 'Alice'), (2, 'Bob');

-- Query data
SELECT * FROM sales_catalog.sales_schema.customers_managed;

-- Drop table
DROP TABLE sales_catalog.sales_schema.customers_managed;

-- For an external table, only the metadata is removed; the data files remain
DROP TABLE sales_catalog.sales_schema.customers_external;
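If a table was dropped by mistake, Databricks SQL provides a way to list recently dropped tables and restore them. This is a hedged sketch: it assumes your runtime supports UNDROP, that you have the required privileges, and that the retention window for the dropped table has not yet passed.

```sql
-- List tables that were dropped in this schema and can still be restored
SHOW TABLES DROPPED IN sales_catalog.sales_schema;

-- Restore the managed table dropped above
UNDROP TABLE sales_catalog.sales_schema.customers_managed;
```

For the external table, another way back is to run the original CREATE TABLE ... LOCATION statement again, since its data files were never deleted.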

6. Quick Rules Recap ✅

  • Metastore → Mandatory, stores metadata & external location definitions.
  • Catalog → Mandatory, logical top-level container.
  • Schema → Mandatory, organizes tables within catalogs.
  • Table → Optional, where the actual data lives.
  • External Location → Defined at Metastore, used at Table level.

🎯 Conclusion

  • External locations are metastore-level objects; you cannot create one inside a catalog or schema.
  • You define them at the metastore and then reference paths under them, for example when creating external tables.
  • Always think:
    • Metastore = registry (the card catalog)
    • Catalog = library
    • Schema = section
    • Table = book (data itself)

Catalogs themselves don't contain external locations, but a catalog (or schema) can keep its managed tables in your own cloud storage by pointing at a path that an external location covers. The sections below show how.


Catalog Storage in Databricks using External Location

1. Default Behavior

  • When you create a catalog without specifying storage, its managed tables go to the metastore's root storage, provided the metastore has one configured (if it doesn't, a MANAGED LOCATION is required):
    CREATE CATALOG sales_catalog;
    • All managed tables created inside sales_catalog go there by default.

2. Catalog With External Location

  • You can override this by giving the catalog its own managed location, which must be a path covered by a registered external location:
    CREATE CATALOG sales_catalog MANAGED LOCATION 'abfss://container@storageacct.dfs.core.windows.net/sales_data/';
    • Here, managed tables created in the catalog will live under that path by default.
    • This is sometimes called a "catalog-level managed location."

3. Schema Level

  • Similarly, you can also define a managed location for a schema:
    CREATE SCHEMA sales_catalog.retail MANAGED LOCATION 'abfss://container@storageacct.dfs.core.windows.net/retail_data/';
    • Now tables created in this schema (without explicit LOCATION) go here.

4. Table Level

  • For external tables, you still provide the LOCATION explicitly:
    CREATE TABLE sales_catalog.retail.customers USING DELTA LOCATION 'abfss://container@storageacct.dfs.core.windows.net/customers/';
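To see which storage path each level actually resolved to, you can inspect the objects after creating them. The sketch below assumes the sales_catalog catalog and retail schema from the statements above; the orders table is a hypothetical example, and the exact field names in the DESCRIBE output may differ between releases.

```sql
-- Look for the storage root / location fields in the output
DESCRIBE CATALOG EXTENDED sales_catalog;
DESCRIBE SCHEMA EXTENDED sales_catalog.retail;

-- Hypothetical managed table: no LOCATION clause, so it should land
-- under the schema's managed location.
CREATE TABLE IF NOT EXISTS sales_catalog.retail.orders (
  order_id INT, amount DOUBLE
);

-- The location column shows where the table's files were placed
DESCRIBE DETAIL sales_catalog.retail.orders;
```

A managed table picks up the most specific managed location available: the schema's if it has one, otherwise the catalog's, otherwise the metastore root.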

✅ Summary

  • Yes, you can create a catalog in Databricks that uses a path under an external location as its default managed storage.
  • But ⚠️ this doesn't mean all tables are external tables: it just changes the default storage path for managed tables created in that catalog (or schema).
  • External tables still need their own explicit LOCATION, and that path must fall under a registered external location.
