Databricks – Catalog, Schemas & Tables with External Location

this is exactly the core of Unity Catalog’s object model. The way Databricks resolves storage paths for managed tables depends on where you attach the external/managed location. Let’s break it down carefully by Catalog → Schema → Table levels.


🔑 Unity Catalog Storage Hierarchy

Unity Catalog object model:
Metastore → Catalog → Schema → Table

  • You can set managed storage locations (via external locations) at Metastore, Catalog, or Schema level.
  • Tables may also define a LOCATION clause (explicit path).
  • Databricks applies a fallback precedence:
    Schema location (if exists) → Catalog location (if exists) → Metastore location (if exists).

1️⃣ Catalog with External Location

When you create a catalog with an external location:

CREATE CATALOG dev_ext
MANAGED LOCATION 'abfss://data@<storage_account>.dfs.core.windows.net/ADB/catalog/';
  • Effect:
    • Any managed tables created inside this catalog (and its schemas) will, by default, write to this catalog’s external location.
    • The actual storage path will be: /ADB/catalog/<catalog-id>/tables/<table-id>/
    • If no schema-level location is set, the catalog’s location is used.
  • Example: CREATE SCHEMA dev_ext.bronze; CREATE TABLE dev_ext.bronze.sales (...) ; → Data files stored under the catalog-level path.

2️⃣ Schema with External Location

When you create a schema with an external location:

CREATE SCHEMA dev_ext.bronze_ext
MANAGED LOCATION 'abfss://data@<storage_account>.dfs.core.windows.net/ADB/schema/bronze_ext/';
  • Effect:
    • Any managed tables inside this schema are stored under the schema’s external location.
    • Schema location overrides the catalog location.
    • Actual storage path: /ADB/schema/bronze_ext/<schema-id>/tables/<table-id>/
  • Example: CREATE TABLE dev_ext.bronze_ext.sales (...) ; → Data files stored at the schema-level location.

3️⃣ Table with External Location

When you create a table with a LOCATION clause:

CREATE TABLE dev_ext.bronze.sales_external (
  id INT, product STRING, amount DOUBLE
)
LOCATION 'abfss://data@<storage_account>.dfs.core.windows.net/ADB/external-tables/sales_external';
  • Effect:
    • This is an external table, not a managed one.
    • The data files live exactly at the path you specify.
    • Databricks only manages metadata in Unity Catalog; it will never delete the underlying files if you drop the table.
  • Example:
    • Drop the table → metadata gone, files remain at /ADB/external-tables/sales_external/.
    • Undrop → metadata is restored (within 7 days).

🔄 Comparison Across Scenarios

ScenarioDefault Data StorageCleanup Behavior on DROP TABLE
Catalog with external locationTables stored under catalog’s managed path (/catalog/<catalog-id>/tables/<table-id>/)Dropping table: metadata removed, data files deleted after 7–30 days (grace period for undrop).
Schema with external locationTables stored under schema’s managed path (/schema/<schema-id>/tables/<table-id>/)Same as above: metadata gone immediately, data eventually deleted after retention period.
Table with LOCATION (external table)Data stored exactly at the specified path (/external-tables/<name>/)Metadata dropped immediately, data files always remain. UC never deletes them.

⚡ Key Takeaways

  • Metastore-level location = global default (used if neither catalog nor schema has a location).
  • Catalog-level location = default for all schemas under it (unless schema overrides).
  • Schema-level location = strongest override for managed tables.
  • Table with LOCATION = always external → UC does not manage lifecycle of data files.

👉 So, in short:

  • Catalog external location = whole catalog defaults there.
  • Schema external location = just that schema overrides catalog.
  • Table external location = explicit → always external, metadata-only management.

Related Posts

DataOps Project Learning Builds Awareness of Data Quality Automation Practices

Introduction Learning DataOps only through theory is not enough. Beginners must work on practical projects to understand how data pipelines are designed, tested, automated, monitored, and improved…

Read More

Ultimate Career Guide: Best Practices for Entry-Level DataOps Professionals

Introduction Data is now one of the most important assets for modern organizations. Companies depend on data pipelines, analytics dashboards, reporting systems, cloud platforms, and automated workflows…

Read More

Understanding Fundamental Analysis of Stocks for Long Term Equity Investing

Introduction Stepping into the financial world can feel overwhelming, but securing high-quality stock market education is the ultimate way to build long-term wealth. For individuals starting their…

Read More

A Complete Review of the Top Rank Tracking Tools for Local & Global Scale

To win in the modern digital landscape, visibility is everything. Growing brands and busy agencies frequently struggle to balance keyword tracking, technical audits, content creation, creator outreach,…

Read More

Modern DevOps Consulting for Cloud and Kubernetes Success

Introduction Digital‑first businesses are under intense pressure to ship faster, stay secure, and scale reliably across complex multi‑cloud environments. Traditional ways of building and operating software cannot…

Read More

Enterprise DevOps: A Beginner Guide to Scaling IT

Introduction Modern enterprises face the monumental challenge of delivering software at breakneck speeds without sacrificing infrastructure stability. Relying on isolated development and operations teams is no longer…

Read More