{"id":766,"date":"2025-08-19T15:29:17","date_gmt":"2025-08-19T15:29:17","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=766"},"modified":"2025-08-19T15:29:17","modified_gmt":"2025-08-19T15:29:17","slug":"databricks-lab-working-with-schemas-and-external-locations","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-lab-working-with-schemas-and-external-locations\/","title":{"rendered":"Databricks Lab \u2013 Working with Schemas and External Locations"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">We will:<\/h1>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Create schemas at different levels (with and without external locations).<\/li>\n\n\n\n<li>Create managed tables inside these schemas.<\/li>\n\n\n\n<li>Validate where Databricks stores the underlying data.<\/li>\n\n\n\n<li>Explore how Unity Catalog organizes data using IDs in <strong>Catalog Explorer<\/strong>.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Unity Catalog has a 4-level hierarchy:<\/h2>\n\n\n\n<p><strong>Metastore \u2192 Catalog \u2192 Schema \u2192 Table<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Location fallback:\n<ul class=\"wp-block-list\">\n<li>If no schema-level location \u2192 fallback to catalog.<\/li>\n\n\n\n<li>If no catalog-level location \u2192 fallback to metastore.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 Today we\u2019ll create <strong>three schemas<\/strong> to see how Unity Catalog stores managed table data at different levels:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Schema in Dev catalog<\/strong> (no external location).<\/li>\n\n\n\n<li><strong>Schema in DevEXT catalog<\/strong> (catalog has external location).<\/li>\n\n\n\n<li><strong>Schema in DevEXT catalog but with schema-level external location<\/strong>.<\/li>\n<\/ol>\n\n\n\n<p>Finally, we\u2019ll create tables in each schema and validate where the actual data files are stored.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Create Schema in Unity Catalog<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1\ufe0f\u20e3 Create schema under a catalog without external location (Dev)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE SCHEMA Dev.bronze\nCOMMENT 'Bronze schema under Dev catalog without external location';\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Here, <code>Dev<\/code> catalog has no external location.<\/li>\n\n\n\n<li>By default, <strong>managed table data will go to the metastore location<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p>\u2705 Verify in Databricks UI: expand <strong>Dev \u2192 bronze schema<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">2\ufe0f\u20e3 Create schema under a catalog with external location (DevEXT)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE SCHEMA DevEXT.bronze\nCOMMENT 'Bronze schema in DevEXT catalog (catalog has external location)';\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>DevEXT<\/code> catalog already points to an external location.<\/li>\n\n\n\n<li>Managed tables in this schema will store data <strong>at the catalog-level external path<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p>\u2705 Verify in UI: expand <strong>DevEXT \u2192 bronze schema<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Create Schema with External Location in Unity Catalog<\/strong><\/h2>\n\n\n\n<p>Now, let\u2019s override the storage location at the <strong>schema level<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Create schema-level external location<\/h3>\n\n\n\n<p>We need a <strong>new folder<\/strong> in our ADLS storage:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/data\/ADB\/schema\/\n<\/code><\/pre>\n\n\n\n<p>Create a corresponding <strong>external location<\/strong> in Databricks:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE EXTERNAL LOCATION `ext_schema`\nURL 'abfss:\/\/data@&lt;storage_account&gt;.dfs.core.windows.net\/ADB\/schema\/'\nWITH STORAGE CREDENTIAL `SC_catalog_storage`;\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create schema with managed location<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE SCHEMA DevEXT.bronze_ext\nMANAGED LOCATION 'abfss:\/\/data@&lt;storage_account&gt;.dfs.core.windows.net\/ADB\/schema\/bronze_ext\/'\nCOMMENT 'Bronze schema with schema-level external location';\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This schema\u2019s managed tables will <strong>always<\/strong> store data in the schema external location.<\/li>\n<\/ul>\n\n\n\n<p>\u2705 Verify in UI: under <strong>DevEXT \u2192 bronze_ext schema<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Validate Location for Managed Tables<\/strong><\/h2>\n\n\n\n<p>Let\u2019s now create a test <strong>managed table<\/strong> in each schema and check where data lands.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example Table \u2013 Bronze Schema in Dev (no external location)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE Dev.bronze.raw_sales (id INT, product STRING, amount DOUBLE);\n\nINSERT INTO Dev.bronze.raw_sales VALUES (1, 'Laptop', 1200.50);\n<\/code><\/pre>\n\n\n\n<p>Check details:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>DESCRIBE EXTENDED Dev.bronze.raw_sales;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccd <strong>Location<\/strong> \u2192 Stored under <strong>Metastore location<\/strong>.<br>Path will look like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/&lt;metastore-id&gt;\/tables\/&lt;table-id&gt;\/\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Example Table \u2013 Bronze Schema in DevEXT (catalog external location)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE DevEXT.bronze.raw_sales (id INT, product STRING, amount DOUBLE);\n\nINSERT INTO DevEXT.bronze.raw_sales VALUES (2, 'Phone', 750.00);\n<\/code><\/pre>\n\n\n\n<p>Check details:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>DESCRIBE EXTENDED DevEXT.bronze.raw_sales;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccd <strong>Location<\/strong> \u2192 Stored under <strong>catalog external location<\/strong>.<br>Path will look like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/ADB\/catalogs\/&lt;catalog-id&gt;\/tables\/&lt;table-id&gt;\/\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Example Table \u2013 Bronze_EXT Schema in DevEXT (schema external location)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE DevEXT.bronze_ext.raw_sales (id INT, product STRING, amount DOUBLE);\n\nINSERT INTO DevEXT.bronze_ext.raw_sales VALUES (3, 'Tablet', 500.00);\n<\/code><\/pre>\n\n\n\n<p>Check details:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>DESCRIBE EXTENDED DevEXT.bronze_ext.raw_sales;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccd <strong>Location<\/strong> \u2192 Stored under <strong>schema external location<\/strong>.<br>Path will look like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/ADB\/schema\/bronze_ext\/&lt;schema-id&gt;\/tables\/&lt;table-id&gt;\/\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Databricks Catalog Explorer<\/strong><\/h2>\n\n\n\n<p>You can visually confirm all this in <strong>Catalog Explorer<\/strong> inside Databricks:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Catalog \u2192 Dev \u2192 bronze \u2192 raw_sales \u2192 Details<\/strong>\n<ul class=\"wp-block-list\">\n<li>Shows <strong>Metastore path<\/strong> and <strong>Table ID<\/strong>.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Go to <strong>Catalog \u2192 DevEXT \u2192 bronze \u2192 raw_sales \u2192 Details<\/strong>\n<ul class=\"wp-block-list\">\n<li>Shows <strong>Catalog-level path<\/strong>.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Go to <strong>Catalog \u2192 DevEXT \u2192 bronze_ext \u2192 raw_sales \u2192 Details<\/strong>\n<ul class=\"wp-block-list\">\n<li>Shows <strong>Schema-level path<\/strong>.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>\ud83d\udd0e Note:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Unity Catalog <strong>does not store tables by name<\/strong>, instead uses <strong>Table IDs<\/strong> in the storage path.<\/li>\n\n\n\n<li>You can find <strong>Schema ID, Catalog ID, and Table ID<\/strong> from the Catalog Explorer.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Dev.bronze<\/strong> \u2192 data stored at <strong>Metastore level<\/strong>.<\/li>\n\n\n\n<li><strong>DevEXT.bronze<\/strong> \u2192 data stored at <strong>Catalog external location<\/strong>.<\/li>\n\n\n\n<li><strong>DevEXT.bronze_ext<\/strong> \u2192 data stored at <strong>Schema external location<\/strong>.<\/li>\n<\/ol>\n\n\n\n<p>Unity Catalog provides <strong>flexible storage mapping<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Metastore-level (fallback).<\/li>\n\n\n\n<li>Catalog-level (optional).<\/li>\n\n\n\n<li>Schema-level (optional, overrides catalog).<\/li>\n\n\n\n<li>Table-level (for <strong>external tables<\/strong>).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>\ud83d\udc49 In the next part (as per video series), you\u2019ll explore <strong>External Tables<\/strong> and operations like <strong>DROP\/UNDROP<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Would you like me to also create a <strong>diagram of the fallback hierarchy (Metastore \u2192 Catalog \u2192 Schema \u2192 Table)<\/strong> so you can visualize how data location resolution works across these levels?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We will: Unity Catalog has a 4-level hierarchy: Metastore \u2192 Catalog \u2192 Schema \u2192 Table \ud83d\udc49 Today we\u2019ll create three schemas to see how Unity Catalog stores&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-766","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/766","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=766"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/766\/revisions"}],"predecessor-version":[{"id":767,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/766\/revisions\/767"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=766"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=766"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=766"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}