Databricks Unity Catalog Tutorial
Managed vs External Tables + UNDROP (with External Location setup)
Introduction (what we’ll build)
You’ll learn to:
- Create (or reuse) an external location for tables.
- Create one managed table and one external table in the same schema.
- Inspect their storage locations and types.
- DROP both tables and recover them using UNDROP (by name and by ID).
- Managed tables’ data isn’t deleted immediately; it’s eligible for cleanup after a grace period (7–30 days). You can recover within 7 days. (Microsoft Learn, Databricks Documentation)
Prereqs
- A Unity Catalog–enabled cluster.
- A catalog
Dev
and schemabronze
you can write to.- A storage credential already created (e.g.,
SC_catalog_storage
).- Proper permissions:
USE CATALOG
,USE SCHEMA
,CREATE TABLE
, etc.- You’re on Azure (paths use abfss://). Adjust for S3/GCS if needed.
What’s new in Databricks? (Updates & Releases)
In the workspace, click ? → What’s new to see the monthly release notes (features are rolled out in stages). For reference: August 2024 release notes pages: Azure and Databricks docs. (Microsoft Learn, Databricks Documentation)
Create External Location in Databricks
A) Make a folder in ADLS for external tables
In your ADLS Gen2 container (e.g., data
), create:
/ADB/external-tables/
You’ll store all external table data here (the video used externaltables
—name doesn’t matter).
B) (If needed) Reuse or create a storage credential
If you already have one (e.g., SC_catalog_storage
), reuse it. Otherwise, create it via Catalog → Storage credentials (Azure Managed Identity or SAS/Key as per your setup). Docs: Create external locations (Azure/AWS). (Microsoft Learn, Databricks Documentation)
C) Create the external location (UI or SQL)
UI path (as in the video):
Catalog → External data → External locations → Create
- Name:
ext_tables
- Storage credential:
SC_catalog_storage
- URL:
abfss://data@<storage_account>.dfs.core.windows.net/ADB/external-tables/
- Create → Test connection (should show all permissions OK).
SQL equivalent (does the same thing):
CREATE EXTERNAL LOCATION `ext_tables`
URL 'abfss://data@<storage_account>.dfs.core.windows.net/ADB/external-tables/'
WITH STORAGE CREDENTIAL `SC_catalog_storage`;
Reference: SQL syntax for CREATE EXTERNAL LOCATION. (Databricks Documentation)
03:37 — Managed Tables vs External Tables (same schema)
We’ll use catalog Dev
and schema bronze
.
Tip: attach your SQL notebook to a Unity Catalog–enabled cluster before running.
1) Create a managed table and insert a row
-- Managed table (no LOCATION clause)
CREATE TABLE Dev.bronze.sales_managed (
id INT,
product STRING,
amount DOUBLE
);
INSERT INTO Dev.bronze.sales_managed VALUES (1, 'Laptop', 1200.50);
2) Create an external table and insert a row
External table requires a LOCATION pointing into your external location path.
-- External table (explicit LOCATION under the external location root)
CREATE TABLE Dev.bronze.sales_external (
id INT,
product STRING,
amount DOUBLE
)
LOCATION 'abfss://data@<storage_account>.dfs.core.windows.net/ADB/external-tables/sales_external';
INSERT INTO Dev.bronze.sales_external VALUES (2, 'Phone', 750.00);
3) Verify each table’s type and storage path
-- Managed table details
DESCRIBE EXTENDED Dev.bronze.sales_managed;
-- External table details
DESCRIBE EXTENDED Dev.bronze.sales_external;
sales_managed
→ tableType = MANAGED, path resolves to your metastore managed storage (Unity Catalog-managed).sales_external
→ tableType = EXTERNAL, path is exactly the LOCATION you specified under/ADB/external-tables/
.
Background docs on table types. (Databricks Documentation)
07:07 — UNDROP in Databricks (recover dropped tables)
Now we’ll drop both tables and recover them.
1) Drop the tables
DROP TABLE Dev.bronze.sales_managed;
DROP TABLE Dev.bronze.sales_external;
- In Unity Catalog, dropping a managed table does not immediately delete data from cloud storage; it’s eligible for cleanup within ~7–30 days.
- You can recover (UNDROP) a dropped relation within 7 days. (Microsoft Learn, Databricks Documentation)
2) List dropped tables (get names/IDs)
USE CATALOG Dev;
-- See dropped tables for a schema
SHOW TABLES DROPPED IN Dev.bronze;
You’ll get rows including tablename
, tableid
(UUID), deletedat
, and more. (Microsoft Learn)
3) UNDROP by name
UNDROP TABLE Dev.bronze.sales_managed;
- Parent catalog and schema must still exist.
- Recovers the most recently dropped relation with that name in the schema. (Databricks Documentation)
4) UNDROP by ID (alternative)
Copy the tableid
for sales_external
from SHOW TABLES DROPPED
, then:
UNDROP TABLE WITH ID '<table_uuid_from_show_tables_dropped>';
The WITH ID
form is handy if a new table with the same name was recreated and blocks name-based recovery (in that case you can rename the new one or use the ID). (Databricks Documentation, kb.databricks.com)
Keep in mind
- UNDROP works for both managed and external relations in Unity Catalog.
- Managed: data is soft-retained for up to the 7-day recovery window, then scheduled for deletion (within ~7–30 days).
- External: data files always remain at your external path; dropping only removes UC metadata. You can still UNDROP within 7 days to re-register. (Databricks Documentation, Microsoft Learn)
(Optional) Quick sanity checks in storage
- Managed table (after drop, before cleanup): you’ll still see data under the Unity Catalog managed path (e.g., metastore-scoped “tables/
<table-id>
/”). Cleanup is deferred. (Microsoft Learn) - External table: data remains at
/ADB/external-tables/sales_external/
regardless; dropping only removes the table’s entry from the catalog. (Databricks Documentation)
Clean-up (optional)
-- Re-drop if you restored them
DROP TABLE IF EXISTS Dev.bronze.sales_managed;
DROP TABLE IF EXISTS Dev.bronze.sales_external;
References (docs used)
- UNDROP command (Azure & Databricks docs): retention 7 days to recover. (Microsoft Learn, Databricks Documentation)
- DROP TABLE behavior: deletion within 7–30 days for managed tables in UC. (Microsoft Learn, Databricks Documentation)
- SHOW TABLES DROPPED (to list droppped relations and IDs). (Microsoft Learn)
- External locations (how to configure + SQL syntax). (Microsoft Learn, Databricks Documentation)
- Table types overview (managed vs external). (Databricks Documentation)
- What’s New / August 2024 releases (staged rollouts). (Microsoft Learn, Databricks Documentation)