Databricks: Delta Tables, Catalogs, Views, and Clones


This tutorial will walk you through core Delta Lake functionality in Databricks, including catalogs, schemas, tables, views, CTAS, deep clone, and shallow clone. Each section is backed with SQL and PySpark examples so you can directly try them inside a Databricks notebook.


1. πŸ”Ή Introduction to Delta Tables

A Delta Table is a Databricks table format that provides:

  • ACID transactions (Atomicity, Consistency, Isolation, Durability)
  • Time travel (query old versions of data)
  • Schema enforcement & evolution
  • Audit history

πŸ‘‰ You can think of Delta tables as Parquet tables + transaction log that keeps track of all changes.


2. πŸ”Ή Listing Catalogs

Catalogs are the top-level containers in Unity Catalog.
You can see which catalogs you have with:

-- List all catalogs
SHOW CATALOGS;

-- Search catalogs starting with 'dev'
SHOW CATALOGS LIKE 'dev*';

-- Search catalogs containing 'm'
SHOW CATALOGS LIKE '*m*';

πŸ“Œ Example output:

dev
dev_external
hive_metastore

3. πŸ”Ή Listing Schemas

Schemas (databases) live inside a catalog.

-- Show schemas in a catalog
SHOW SCHEMAS IN dev;

-- Search schemas starting with 'info'
SHOW SCHEMAS IN dev LIKE 'info*';

πŸ“Œ Example:

information_schema
bronze
silver
gold

4. πŸ”Ή Listing Tables

Tables live inside schemas.

-- List tables in schema
SHOW TABLES IN dev.bronze;

-- Search tables starting with 'sales'
SHOW TABLES IN dev.bronze LIKE 'sales*';

Output shows:

  • Table name
  • Schema name
  • Whether it’s temporary or permanent

5. πŸ”Ή Check if a Table Exists (PySpark)

Using Python:

# Check if table exists
spark.catalog.tableExists("dev.bronze.sales_external")

# Example: Non-existing table
spark.catalog.tableExists("dev.bronze.sales_external_1")

πŸ‘‰ Returns True or False.


6. πŸ”Ή Creating Tables with IF NOT EXISTS

When creating a table, schema, or catalog, you can avoid errors by using IF NOT EXISTS.

-- Create a table safely
CREATE TABLE IF NOT EXISTS dev.bronze.emp (
  emp_id INT,
  emp_name STRING,
  dept_code STRING,
  salary DOUBLE
);

-- Insert some data
INSERT INTO dev.bronze.emp VALUES
(1, 'Alice', 'D101', 5000),
(2, 'Bob', 'D102', 6000),
(3, 'Charlie', 'D101', 5500),
(4, 'David', 'D103', 7000);

-- Verify data
SELECT * FROM dev.bronze.emp;

πŸ“Œ Re-running won’t fail if the table already exists.


7. πŸ”Ή Temporary Views

Temporary views:

  • Exist only during your session/cluster runtime
  • Do not persist data physically
-- Create a temporary view
CREATE OR REPLACE TEMP VIEW emp_tmp_view AS
SELECT * FROM dev.bronze.emp WHERE dept_code = 'D101';

-- Query it
SELECT * FROM emp_tmp_view;

πŸ‘‰ Will disappear after cluster termination.


8. πŸ”Ή Permanent Views

Permanent views are stored metadata and persist beyond sessions.

-- Create a permanent view
CREATE VIEW dev.bronze.emp_view AS
SELECT * FROM dev.bronze.emp WHERE dept_code = 'D102';

-- Query it
SELECT * FROM dev.bronze.emp_view;

πŸ‘‰ Stored under schema (like tables, but no data copy).

If you want to update it:

CREATE OR REPLACE VIEW dev.bronze.emp_view AS
SELECT * FROM dev.bronze.emp WHERE dept_code = 'D101';

9. πŸ”Ή CTAS (Create Table As Select)

CTAS physically copies data into a new Delta table.

CREATE TABLE dev.bronze.emp_ctas AS
SELECT * FROM dev.bronze.emp;

-- Verify
SELECT * FROM dev.bronze.emp_ctas;

πŸ‘‰ Creates a new table with its own physical data files.


10. πŸ”Ή Deep Clone of Delta Table

Deep Clone copies both metadata and data into a new location.

CREATE TABLE dev.bronze.emp_dc
DEEP CLONE dev.bronze.emp;

βœ… Metadata + data are fully duplicated at a new location.
βœ… Safer than CTAS because it preserves all metadata (e.g., partitioning, nullability).


11. πŸ”Ή Shallow Clone of Delta Table

Shallow Clone copies only metadata. Data still points to the original table at the specific version.

CREATE TABLE dev.bronze.emp_sc
SHALLOW CLONE dev.bronze.emp;
  • No data copied.
  • Faster, uses less storage.
  • But tied to the original table’s version.

Example:

  • If you add rows to emp, shallow clone won’t see them (it’s tied to older version).
  • If you insert directly into emp_sc, Databricks starts storing new files for the clone.

12. πŸ”Ή Comparing CTAS, Deep Clone, Shallow Clone

FeatureCTASDeep CloneShallow Clone
Metadata copied❌ (some may be lost)βœ… Full metadataβœ… Full metadata
Data copiedβœ… Yesβœ… Yes❌ No
LocationNew folderNew folderNew folder (points to old)
Use caseQuick duplicate tableReliable full replicaLightweight dev/test copy

13. πŸ”Ή History & Time Travel

Delta tables support versioning.

-- See history
DESCRIBE HISTORY dev.bronze.emp;

-- Query old version
SELECT * FROM dev.bronze.emp VERSION AS OF 0;

-- Or use timestamp
SELECT * FROM dev.bronze.emp TIMESTAMP AS OF '2023-01-01 00:00:00';

βœ… Conclusion

By now you understand:

  • How to navigate catalogs, schemas, and tables
  • Use IF NOT EXISTS to avoid errors
  • Create temporary vs permanent views
  • Duplicate data using CTAS, Deep Clone, Shallow Clone
  • How Delta tables maintain history and time travel

This covers the core building blocks of Delta Lake in Databricks. πŸš€