{"id":781,"date":"2025-08-22T12:58:35","date_gmt":"2025-08-22T12:58:35","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=781"},"modified":"2025-08-22T12:58:36","modified_gmt":"2025-08-22T12:58:36","slug":"databricks-delta-tables-catalogs-views-and-clones","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-delta-tables-catalogs-views-and-clones\/","title":{"rendered":"Databricks: Delta Tables, Catalogs, Views, and Clones"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\"><\/h1>\n\n\n\n<p>This tutorial will walk you through <strong>core Delta Lake functionality in Databricks<\/strong>, including catalogs, schemas, tables, views, CTAS, deep clone, and shallow clone. Each section is backed with <strong>SQL and PySpark examples<\/strong> so you can directly try them inside a Databricks notebook.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">1. \ud83d\udd39 Introduction to Delta Tables<\/h2>\n\n\n\n<p>A <strong>Delta Table<\/strong> is a Databricks table format that provides:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>ACID transactions (Atomicity, Consistency, Isolation, Durability)<\/li>\n\n\n\n<li>Time travel (query old versions of data)<\/li>\n\n\n\n<li>Schema enforcement &amp; evolution<\/li>\n\n\n\n<li>Audit history<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 You can think of Delta tables as <strong>Parquet tables + transaction log<\/strong> that keeps track of all changes.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">2. \ud83d\udd39 Listing Catalogs<\/h2>\n\n\n\n<p>Catalogs are the <strong>top-level containers<\/strong> in Unity Catalog.<br>You can see which catalogs you have with:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-- List all catalogs\nSHOW CATALOGS;\n\n-- Search catalogs starting with 'dev'\nSHOW CATALOGS LIKE 'dev*';\n\n-- Search catalogs containing 'm'\nSHOW CATALOGS LIKE '*m*';\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccc Example output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dev\ndev_external\nhive_metastore\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. \ud83d\udd39 Listing Schemas<\/h2>\n\n\n\n<p>Schemas (databases) live inside a catalog.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-- Show schemas in a catalog\nSHOW SCHEMAS IN dev;\n\n-- Search schemas starting with 'info'\nSHOW SCHEMAS IN dev LIKE 'info*';\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccc Example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>information_schema\nbronze\nsilver\ngold\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">4. \ud83d\udd39 Listing Tables<\/h2>\n\n\n\n<p>Tables live inside schemas.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-- List tables in schema\nSHOW TABLES IN dev.bronze;\n\n-- Search tables starting with 'sales'\nSHOW TABLES IN dev.bronze LIKE 'sales*';\n<\/code><\/pre>\n\n\n\n<p>Output shows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Table name<\/li>\n\n\n\n<li>Schema name<\/li>\n\n\n\n<li>Whether it\u2019s temporary or permanent<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. \ud83d\udd39 Check if a Table Exists (PySpark)<\/h2>\n\n\n\n<p>Using Python:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Check if table exists\nspark.catalog.tableExists(\"dev.bronze.sales_external\")\n\n# Example: Non-existing table\nspark.catalog.tableExists(\"dev.bronze.sales_external_1\")\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udc49 Returns <code>True<\/code> or <code>False<\/code>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">6. \ud83d\udd39 Creating Tables with IF NOT EXISTS<\/h2>\n\n\n\n<p>When creating a table, schema, or catalog, you can avoid errors by using <code>IF NOT EXISTS<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-- Create a table safely\nCREATE TABLE IF NOT EXISTS dev.bronze.emp (\n  emp_id INT,\n  emp_name STRING,\n  dept_code STRING,\n  salary DOUBLE\n);\n\n-- Insert some data\nINSERT INTO dev.bronze.emp VALUES\n(1, 'Alice', 'D101', 5000),\n(2, 'Bob', 'D102', 6000),\n(3, 'Charlie', 'D101', 5500),\n(4, 'David', 'D103', 7000);\n\n-- Verify data\nSELECT * FROM dev.bronze.emp;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccc Re-running won\u2019t fail if the table already exists.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">7. \ud83d\udd39 Temporary Views<\/h2>\n\n\n\n<p>Temporary views:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Exist only during your session\/cluster runtime<\/li>\n\n\n\n<li>Do not persist data physically<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>-- Create a temporary view\nCREATE OR REPLACE TEMP VIEW emp_tmp_view AS\nSELECT * FROM dev.bronze.emp WHERE dept_code = 'D101';\n\n-- Query it\nSELECT * FROM emp_tmp_view;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udc49 Will disappear after cluster termination.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">8. \ud83d\udd39 Permanent Views<\/h2>\n\n\n\n<p>Permanent views are stored metadata and persist beyond sessions.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-- Create a permanent view\nCREATE VIEW dev.bronze.emp_view AS\nSELECT * FROM dev.bronze.emp WHERE dept_code = 'D102';\n\n-- Query it\nSELECT * FROM dev.bronze.emp_view;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udc49 Stored under schema (like tables, but no data copy).<\/p>\n\n\n\n<p>If you want to update it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE OR REPLACE VIEW dev.bronze.emp_view AS\nSELECT * FROM dev.bronze.emp WHERE dept_code = 'D101';\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">9. \ud83d\udd39 CTAS (Create Table As Select)<\/h2>\n\n\n\n<p><strong>CTAS physically copies data<\/strong> into a new Delta table.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE dev.bronze.emp_ctas AS\nSELECT * FROM dev.bronze.emp;\n\n-- Verify\nSELECT * FROM dev.bronze.emp_ctas;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udc49 Creates a new table with its own physical data files.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">10. \ud83d\udd39 Deep Clone of Delta Table<\/h2>\n\n\n\n<p><strong>Deep Clone<\/strong> copies <strong>both metadata and data<\/strong> into a new location.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE dev.bronze.emp_dc\nDEEP CLONE dev.bronze.emp;\n<\/code><\/pre>\n\n\n\n<p>\u2705 Metadata + data are fully duplicated at a new location.<br>\u2705 Safer than CTAS because it preserves <strong>all metadata<\/strong> (e.g., partitioning, nullability).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">11. \ud83d\udd39 Shallow Clone of Delta Table<\/h2>\n\n\n\n<p><strong>Shallow Clone<\/strong> copies only <strong>metadata<\/strong>. Data still points to the original table at the specific version.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE dev.bronze.emp_sc\nSHALLOW CLONE dev.bronze.emp;\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>No data copied.<\/li>\n\n\n\n<li>Faster, uses less storage.<\/li>\n\n\n\n<li>But tied to the original table\u2019s version.<\/li>\n<\/ul>\n\n\n\n<p>Example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you add rows to <code>emp<\/code>, shallow clone won\u2019t see them (it\u2019s tied to older version).<\/li>\n\n\n\n<li>If you insert directly into <code>emp_sc<\/code>, Databricks starts storing new files for the clone.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">12. \ud83d\udd39 Comparing CTAS, Deep Clone, Shallow Clone<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Feature<\/th><th>CTAS<\/th><th>Deep Clone<\/th><th>Shallow Clone<\/th><\/tr><\/thead><tbody><tr><td>Metadata copied<\/td><td>\u274c (some may be lost)<\/td><td>\u2705 Full metadata<\/td><td>\u2705 Full metadata<\/td><\/tr><tr><td>Data copied<\/td><td>\u2705 Yes<\/td><td>\u2705 Yes<\/td><td>\u274c No<\/td><\/tr><tr><td>Location<\/td><td>New folder<\/td><td>New folder<\/td><td>New folder (points to old)<\/td><\/tr><tr><td>Use case<\/td><td>Quick duplicate table<\/td><td>Reliable full replica<\/td><td>Lightweight dev\/test copy<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">13. \ud83d\udd39 History &amp; Time Travel<\/h2>\n\n\n\n<p>Delta tables support versioning.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-- See history\nDESCRIBE HISTORY dev.bronze.emp;\n\n-- Query old version\nSELECT * FROM dev.bronze.emp VERSION AS OF 0;\n\n-- Or use timestamp\nSELECT * FROM dev.bronze.emp TIMESTAMP AS OF '2023-01-01 00:00:00';\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2705 Conclusion<\/h1>\n\n\n\n<p>By now you understand:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>How to navigate <strong>catalogs, schemas, and tables<\/strong><\/li>\n\n\n\n<li>Use <code>IF NOT EXISTS<\/code> to avoid errors<\/li>\n\n\n\n<li>Create <strong>temporary vs permanent views<\/strong><\/li>\n\n\n\n<li>Duplicate data using <strong>CTAS, Deep Clone, Shallow Clone<\/strong><\/li>\n\n\n\n<li>How Delta tables maintain <strong>history and time travel<\/strong><\/li>\n<\/ul>\n\n\n\n<p>This covers the <strong>core building blocks of Delta Lake in Databricks<\/strong>. \ud83d\ude80<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial will walk you through core Delta Lake functionality in Databricks, including catalogs, schemas, tables, views, CTAS, deep clone, and shallow clone. Each section is backed&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-781","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/781","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=781"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/781\/revisions"}],"predecessor-version":[{"id":782,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/781\/revisions\/782"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=781"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=781"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=781"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}