Good Read – https://dataopsschool.com/blog/databricks-catalog-schemas-tables-with-external-location/
1. Create Catalog without External Location
- Go to the Catalog tab in the Databricks workspace.
- Click “+ Add Catalog”.
- Provide a catalog name (e.g., Dev).
- Choose type = Standard.
- Do not specify any external location → data defaults to metastore location.
- Click Create → the catalog is created with the default schemas (default, information_schema).
- To check catalog details:
DESCRIBE CATALOG EXTENDED Dev;
(Shows no location specified → fallback to metastore storage.)
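As a quick extra check, you can also list the schemas the new catalog starts with; a minimal sketch, assuming the catalog was named Dev as above:
-- List schemas in the new catalog (expect: default, information_schema)
SHOW SCHEMAS IN Dev;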
2. Create Catalog with SQL
- Open a SQL notebook in Databricks.
- Ensure cluster is Unity Catalog enabled.
- Run SQL command:
CREATE CATALOG DevSQL COMMENT "This catalog is created using SQL";
- Verify with:
DESCRIBE CATALOG EXTENDED DevSQL;
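Optionally, confirm the new catalog appears in the catalog list (a generic check, not specific to this setup):
-- Lists all catalogs visible to you in the metastore
SHOW CATALOGS;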
3. Drop Catalog and Drop Catalog Recursively
- To drop a catalog:
DROP CATALOG DevSQL;
❌ Error → cannot drop because the catalog still contains schemas/tables.
- Use CASCADE to drop everything inside recursively:
DROP CATALOG DevSQL CASCADE;
✅ Removes catalog + schemas + tables.
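If you want to see what CASCADE would remove before running it, you can inspect the catalog's contents first; a minimal sketch, assuming the DevSQL catalog from step 2:
-- Inspect what the catalog contains before dropping it recursively
SHOW SCHEMAS IN DevSQL;
SHOW TABLES IN DevSQL.default;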
4. Create External Location in Databricks
- Go to Catalog > External Data > Create Location.
- Or via SQL:
CREATE EXTERNAL LOCATION `ext_catalogs`
URL 'abfss://data@<storage_account>.dfs.core.windows.net/ADB/catalog/'
WITH (STORAGE CREDENTIAL `SC_catalog_storage`);
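To verify the external location afterwards (assuming the name ext_catalogs used above):
-- List all external locations, then show details for this one
SHOW EXTERNAL LOCATIONS;
DESCRIBE EXTERNAL LOCATION `ext_catalogs`;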
5. Create Storage Credential in Databricks
- Navigate to Catalog > Storage Credentials > Create Credential.
- Choose Azure Managed Identity.
- Provide credential name (e.g.,
SC_catalog_storage
). - Supply UC connector Resource ID (from Azure portal).
- Click Create → Storage credential created.
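You can confirm the credential from SQL as well; a quick check, assuming the name SC_catalog_storage used above:
-- List storage credentials and show details for this one
SHOW STORAGE CREDENTIALS;
DESCRIBE STORAGE CREDENTIAL `SC_catalog_storage`;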
6. Create Catalog with External Location
- Run SQL to create a catalog with a managed location:
CREATE CATALOG DevEXT
MANAGED LOCATION 'abfss://data@<storage_account>.dfs.core.windows.net/ADB/catalog/'
COMMENT "Catalog with external location";
- Verify:
DESCRIBE CATALOG EXTENDED DevEXT;
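Once the catalog exists, you can make it the session default before creating objects inside it; a small usage example, assuming the DevEXT catalog created above:
-- Switch the session's default catalog, then list its schemas
USE CATALOG DevEXT;
SHOW SCHEMAS;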
Final Flow Summary
- Create Catalog without external location → defaults to metastore.
- Create Catalog using SQL.
- Drop catalog normally or recursively with CASCADE.
- Create External Location in Databricks.
- Create Storage Credential for external storage.
- Create Catalog with External Location bound to that credential.
Let’s build a practical example using your catalog dev_ext.
I’ll show you:
- Create a schema inside the catalog.
- Create a table in that schema.
- Insert sample data.
- Query the data back.
1️⃣ Create a Schema in dev_ext
-- Create a schema called bronze inside catalog dev_ext
CREATE SCHEMA IF NOT EXISTS dev_ext.bronze
COMMENT 'Bronze schema for raw ingested data in dev_ext catalog';
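To confirm where the schema's managed data will live (it should inherit the catalog's managed location unless overridden), a quick check:
-- Shows the schema's properties, including its managed location
DESCRIBE SCHEMA EXTENDED dev_ext.bronze;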
2️⃣ Create a Managed Table in the Schema
-- Create a managed table (no LOCATION clause → stored at catalog/schema location)
CREATE TABLE IF NOT EXISTS dev_ext.bronze.sales_data (
id INT,
product STRING,
amount DOUBLE,
sale_date DATE
)
COMMENT 'Sales transactions table';
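Optionally verify the table is managed and see its storage path (output details may vary by runtime):
-- Type should show MANAGED; Location points under the catalog's managed location
DESCRIBE TABLE EXTENDED dev_ext.bronze.sales_data;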
3️⃣ Insert Some Sample Data
INSERT INTO dev_ext.bronze.sales_data VALUES
(1, 'Laptop', 1200.50, DATE'2025-08-01'),
(2, 'Phone', 750.00, DATE'2025-08-05'),
(3, 'Tablet', 500.00, DATE'2025-08-10'),
(4, 'Headphones', 120.00, DATE'2025-08-12');
4️⃣ Query the Data
SELECT *
FROM dev_ext.bronze.sales_data;
✅ You’ll see the four rows returned.
- The data files will be stored in the catalog-level managed location (since you didn’t set a schema- or table-level LOCATION).
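If you want the exact storage path of the table's Delta files, DESCRIBE DETAIL reports it; a minimal sketch:
-- The 'location' column shows the abfss:// path where the managed table's files live
DESCRIBE DETAIL dev_ext.bronze.sales_data;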