{"id":736,"date":"2025-08-19T12:53:38","date_gmt":"2025-08-19T12:53:38","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=736"},"modified":"2025-08-19T13:46:51","modified_gmt":"2025-08-19T13:46:51","slug":"databricks-unity-catalog-vs-catalogs-vs-workspace-vs-metastore","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-unity-catalog-vs-catalogs-vs-workspace-vs-metastore\/","title":{"rendered":"Databricks: Unity Catalog vs Catalogs vs Workspace vs Metastore"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"683\" height=\"1024\" src=\"https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/datadog-1-683x1024.png\" alt=\"\" class=\"wp-image-739\" srcset=\"https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/datadog-1-683x1024.png 683w, https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/datadog-1-200x300.png 200w, https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/datadog-1-768x1152.png 768w, https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/datadog-1.png 1024w\" sizes=\"auto, (max-width: 683px) 100vw, 683px\" \/><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83d\udd11 <strong>Unity Catalog vs Catalogs vs Workspace vs Metastore<\/strong><\/h1>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>1. 
Unity Catalog (UC)<\/strong> \u2705<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Think of it as the <strong>master governance system<\/strong>.<\/li>\n\n\n\n<li>It\u2019s <strong>account-level<\/strong> (above all workspaces).<\/li>\n\n\n\n<li>Manages:\n<ul class=\"wp-block-list\">\n<li>Who can see what (permissions, ACLs).<\/li>\n\n\n\n<li>Metadata (table names, schemas, lineage).<\/li>\n\n\n\n<li>Secure data sharing across workspaces (Delta Sharing).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 Analogy: <strong>National Library System<\/strong> \u2013 it governs all libraries in a country.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>2. Catalogs<\/strong> \ud83d\udcda<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>container for organizing data assets<\/strong> inside Unity Catalog.<\/li>\n\n\n\n<li>A catalog contains <strong>Schemas<\/strong> (databases).<\/li>\n\n\n\n<li>Within schemas, you have <strong>Tables, Views, Volumes, Functions, Models<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 Analogy: <strong>A library<\/strong> inside the national library system.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>3. Schemas (Databases)<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Sub-containers within <strong>Catalogs<\/strong>.<\/li>\n\n\n\n<li>Organize <strong>Tables<\/strong> and <strong>Views<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 Analogy: <strong>Sections in the library<\/strong> (History, Science, Fiction).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>4. 
Tables &amp; Views<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Actual <strong>data objects<\/strong> stored in schemas.<\/li>\n\n\n\n<li><strong>Tables<\/strong> \u2192 structured datasets (Delta by default).<\/li>\n\n\n\n<li><strong>Views<\/strong> \u2192 saved queries on tables.<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 Analogy: <strong>Books<\/strong> on the library shelves.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>5. Workspace<\/strong> \ud83d\udda5\ufe0f<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>UI and compute environment<\/strong> where users collaborate (notebooks, jobs, clusters).<\/li>\n\n\n\n<li>Workspaces don\u2019t \u201cown\u201d the data; they just <strong>connect to Unity Catalog<\/strong> for governed data access.<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 Analogy: <strong>The reading room<\/strong> where you sit, study, and work with books.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>6. 
Metastore<\/strong> \ud83d\udcd2<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The <strong>backend metadata database<\/strong> that stores info about catalogs, schemas, tables, permissions.<\/li>\n\n\n\n<li>In Unity Catalog:\n<ul class=\"wp-block-list\">\n<li><strong>Account-level Metastore<\/strong> is shared across workspaces.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>In legacy mode:\n<ul class=\"wp-block-list\">\n<li>Each workspace had its own <strong>Hive Metastore<\/strong> (separate, siloed).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 Analogy: <strong>The card catalog \/ index system<\/strong> telling you where each book is and who can borrow it.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2705 <strong>Hierarchy<\/strong><\/h1>\n\n\n\n<pre class=\"wp-block-code\"><code>Unity Catalog (Account-level Governance)\n   \u2514\u2500\u2500 Metastore (Metadata storage)\n         \u2514\u2500\u2500 Catalogs (Top-level containers)\n               \u2514\u2500\u2500 Schemas (Databases)\n                     \u2514\u2500\u2500 Tables \/ Views \/ Volumes \/ Models\n<\/code><\/pre>\n\n\n\n<p>And <strong>Workspaces<\/strong> are where you interact with all of this (via notebooks, jobs, queries).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\"><strong>Key Distinction<\/strong><\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Unity Catalog<\/strong> = The system of rules + governance layer.<\/li>\n\n\n\n<li><strong>Catalog<\/strong> = Logical container for data inside UC.<\/li>\n\n\n\n<li><strong>Metastore<\/strong> = Metadata database that keeps track of it all.<\/li>\n\n\n\n<li><strong>Workspace<\/strong> = Your working environment (UI + compute) that connects to the above.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" 
decoding=\"async\" width=\"683\" height=\"1024\" src=\"https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/image-7-683x1024.png\" alt=\"\" class=\"wp-image-742\" srcset=\"https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/image-7-683x1024.png 683w, https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/image-7-200x300.png 200w, https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/image-7-768x1152.png 768w, https:\/\/dataopsschool.com\/blog\/wp-content\/uploads\/2025\/08\/image-7.png 1024w\" sizes=\"auto, (max-width: 683px) 100vw, 683px\" \/><\/figure>\n\n\n\n<p> <\/p>\n\n\n\n<p>Let\u2019s wrap up everything we\u2019ve discussed about <strong>Metastore \u2192 Catalog \u2192 Schema \u2192 Table<\/strong> and <strong>external locations<\/strong> into a <strong>single, step-by-step tutorial<\/strong> that you can follow on Databricks.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83d\udcd8 Tutorial: Understanding Databricks Data Hierarchy &amp; External Locations<\/h1>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">1. The Hierarchy<\/h2>\n\n\n\n<p>In <strong>Unity Catalog<\/strong>, Databricks enforces this hierarchy:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Metastore \n   \u2514\u2500\u2500 Catalog \n         \u2514\u2500\u2500 Schema \n               \u2514\u2500\u2500 Table \/ View \/ Volume \/ Model\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Metastore<\/strong> \u2192 The <strong>root metadata container<\/strong>. 
An account can have one Unity Catalog metastore per cloud region.<\/li>\n\n\n\n<li><strong>Catalog<\/strong> \u2192 Top-level logical container for schemas and data assets.<\/li>\n\n\n\n<li><strong>Schema (Database)<\/strong> \u2192 Organizes objects within a catalog.<\/li>\n\n\n\n<li><strong>Table<\/strong> \u2192 Stores data (managed or external).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">2. Managed vs External Tables<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed Table<\/strong>: Databricks manages <strong>both metadata + storage<\/strong>. Dropping the table also deletes its data files.<\/li>\n\n\n\n<li><strong>External Table<\/strong>: Databricks manages <strong>only metadata<\/strong>. Dropping the table removes the metadata; the data files stay in your cloud storage.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. External Location (Key Concept)<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An <strong>External Location<\/strong> is a Unity Catalog object that pairs a <strong>cloud storage path<\/strong> (S3, ADLS, GCS) with a <strong>storage credential<\/strong>.<\/li>\n\n\n\n<li>Defined <strong>at the Metastore level<\/strong> \u2192 not at Catalog or Schema.<\/li>\n\n\n\n<li>Used when creating external tables and when assigning managed locations to catalogs or schemas.<\/li>\n<\/ul>\n\n\n\n<p>Example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-- Step 1: Create storage credential (cloud-specific)\nCREATE STORAGE CREDENTIAL my_cred\nWITH AZURE_MANAGED_IDENTITY 'my-managed-identity'\nCOMMENT 'Credential for ADLS';\n\n-- Step 2: Register external location\nCREATE EXTERNAL LOCATION my_ext_loc\nURL 'abfss:\/\/external-container@mydatalake.dfs.core.windows.net\/data\/'\nWITH (STORAGE CREDENTIAL my_cred)\nCOMMENT 'External data location';\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Creating Catalog, Schema, and Tables<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>-- Create a catalog\nCREATE CATALOG sales_catalog;\n\n-- Create a schema (inside catalog)\nCREATE SCHEMA sales_catalog.sales_schema;\n\n-- Managed table (Databricks manages storage)\nCREATE TABLE sales_catalog.sales_schema.customers_managed (\n  id INT, name STRING\n);\n\n-- External table (you provide LOCATION; the path must fall under a registered external location)\nCREATE TABLE sales_catalog.sales_schema.customers_external\nUSING DELTA\nLOCATION 'abfss:\/\/external-container@mydatalake.dfs.core.windows.net\/data\/customers\/';\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Insert, Query, and Drop Data<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>-- Insert data (works the same way on managed and external tables)\nINSERT INTO sales_catalog.sales_schema.customers_managed VALUES (1, 'Alice'), (2, 'Bob');\n\n-- Query data\nSELECT * FROM sales_catalog.sales_schema.customers_managed;\n\n-- For managed table \u2192 metadata and data files are removed\nDROP TABLE sales_catalog.sales_schema.customers_managed;\n\n-- For external table \u2192 only metadata removed, data files remain\nDROP TABLE sales_catalog.sales_schema.customers_external;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">6. 
Quick Rules Recap \u2705<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Metastore<\/strong> \u2192 Mandatory, stores metadata &amp; external location definitions.<\/li>\n\n\n\n<li><strong>Catalog<\/strong> \u2192 Mandatory, logical top-level container.<\/li>\n\n\n\n<li><strong>Schema<\/strong> \u2192 Mandatory, organizes tables within catalogs.<\/li>\n\n\n\n<li><strong>Table<\/strong> \u2192 Optional, where the actual data lives.<\/li>\n\n\n\n<li><strong>External Location<\/strong> \u2192 Defined at <strong>Metastore<\/strong>, used at <strong>Table<\/strong> level.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83c\udfaf Conclusion<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You <strong>cannot<\/strong> define external locations at catalog or schema level.<\/li>\n\n\n\n<li>You <strong>must<\/strong> define them at the <strong>metastore<\/strong> and use them when creating <strong>external tables<\/strong>.<\/li>\n\n\n\n<li>Always think:\n<ul class=\"wp-block-list\">\n<li><strong>Metastore = registry (card catalog)<\/strong><\/li>\n\n\n\n<li><strong>Catalog = library<\/strong><\/li>\n\n\n\n<li><strong>Schema = library section<\/strong><\/li>\n\n\n\n<li><strong>Table = book (data itself)<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>Catalogs themselves don\u2019t define external locations<\/strong>, but you can <strong>give a catalog a managed location<\/strong> in external storage in <strong>Unity Catalog<\/strong>. Here is how that works:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Catalog Storage in Databricks Using an External Location<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1. 
<strong>Default Behavior<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>When you create a catalog in Unity Catalog: <code>CREATE CATALOG sales_catalog;<\/code> \u2192 Databricks stores its managed data in the <strong>default storage location<\/strong> (the metastore\u2019s root storage), provided the metastore has one. If the metastore has no root storage, you must specify a <code>MANAGED LOCATION<\/code>.\n<ul class=\"wp-block-list\">\n<li>All managed tables created inside <code>sales_catalog<\/code> go there by default.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">2. <strong>Catalog With External Location<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You <em>can<\/em> override this by explicitly giving the <strong>catalog a managed location<\/strong> whose path falls under a registered external location: <code>CREATE CATALOG sales_catalog MANAGED LOCATION 'abfss:\/\/container@storageacct.dfs.core.windows.net\/sales_data\/';<\/code>\n<ul class=\"wp-block-list\">\n<li>Here, the catalog\u2019s <strong>managed tables<\/strong> will live under that path by default.<\/li>\n\n\n\n<li>This is sometimes called a <strong>\u201ccatalog-level managed location.\u201d<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">3. <strong>Schema Level<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Similarly, you can define a managed location for a <strong>schema<\/strong>: <code>CREATE SCHEMA sales_catalog.retail MANAGED LOCATION 'abfss:\/\/container@storageacct.dfs.core.windows.net\/retail_data\/';<\/code>\n<ul class=\"wp-block-list\">\n<li>Now managed tables created in this schema (without an explicit <code>LOCATION<\/code>) go here.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">4. 
<strong>Table Level<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For <strong>external tables<\/strong>, you still provide the <code>LOCATION<\/code> explicitly: <code>CREATE TABLE sales_catalog.retail.customers USING DELTA LOCATION 'abfss:\/\/container@storageacct.dfs.core.windows.net\/customers\/';<\/code><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\u2705 So to answer your question<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Yes, you <strong>can create a catalog in Databricks that uses an external location<\/strong> as its <strong>default managed storage<\/strong>.<\/li>\n\n\n\n<li>But \u26a0\ufe0f this doesn\u2019t mean <em>all tables are external tables<\/em> \u2014 it just changes the <strong>default storage path<\/strong> for managed tables created in that catalog (or schema).<\/li>\n\n\n\n<li>External tables still need their own explicit <code>LOCATION<\/code> or use a registered external location.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>\ud83d\udd11 Unity Catalog vs Catalogs vs Workspace vs Metastore 1. Unity Catalog (UC) \u2705 \ud83d\udc49 Analogy: National Library System \u2013 it governs all libraries in a country&#8230;. 
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-736","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/736","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=736"}],"version-history":[{"count":6,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/736\/revisions"}],"predecessor-version":[{"id":747,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/736\/revisions\/747"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=736"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=736"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=736"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}