This tutorial will walk you through core Delta Lake functionality in Databricks, including catalogs, schemas, tables, views, CTAS, deep clone, and shallow clone. Each section is backed with SQL and PySpark examples so you can directly try them inside a Databricks notebook.
1. πΉ Introduction to Delta Tables
A Delta Table is a Databricks table format that provides:
- ACID transactions (Atomicity, Consistency, Isolation, Durability)
- Time travel (query old versions of data)
- Schema enforcement & evolution
- Audit history
π You can think of Delta tables as Parquet tables + transaction log that keeps track of all changes.
2. πΉ Listing Catalogs
Catalogs are the top-level containers in Unity Catalog.
You can see which catalogs you have with:
-- List all catalogs
SHOW CATALOGS;
-- Search catalogs starting with 'dev'
SHOW CATALOGS LIKE 'dev*';
-- Search catalogs containing 'm'
SHOW CATALOGS LIKE '*m*';
π Example output:
dev
dev_external
hive_metastore
3. πΉ Listing Schemas
Schemas (databases) live inside a catalog.
-- Show schemas in a catalog
SHOW SCHEMAS IN dev;
-- Search schemas starting with 'info'
SHOW SCHEMAS IN dev LIKE 'info*';
π Example:
information_schema
bronze
silver
gold
4. πΉ Listing Tables
Tables live inside schemas.
-- List tables in schema
SHOW TABLES IN dev.bronze;
-- Search tables starting with 'sales'
SHOW TABLES IN dev.bronze LIKE 'sales*';
Output shows:
- Table name
- Schema name
- Whether itβs temporary or permanent
5. πΉ Check if a Table Exists (PySpark)
Using Python:
# Check if table exists
spark.catalog.tableExists("dev.bronze.sales_external")
# Example: Non-existing table
spark.catalog.tableExists("dev.bronze.sales_external_1")
π Returns True
or False
.
6. πΉ Creating Tables with IF NOT EXISTS
When creating a table, schema, or catalog, you can avoid errors by using IF NOT EXISTS
.
-- Create a table safely
CREATE TABLE IF NOT EXISTS dev.bronze.emp (
emp_id INT,
emp_name STRING,
dept_code STRING,
salary DOUBLE
);
-- Insert some data
INSERT INTO dev.bronze.emp VALUES
(1, 'Alice', 'D101', 5000),
(2, 'Bob', 'D102', 6000),
(3, 'Charlie', 'D101', 5500),
(4, 'David', 'D103', 7000);
-- Verify data
SELECT * FROM dev.bronze.emp;
π Re-running wonβt fail if the table already exists.
7. πΉ Temporary Views
Temporary views:
- Exist only during your session/cluster runtime
- Do not persist data physically
-- Create a temporary view
CREATE OR REPLACE TEMP VIEW emp_tmp_view AS
SELECT * FROM dev.bronze.emp WHERE dept_code = 'D101';
-- Query it
SELECT * FROM emp_tmp_view;
π Will disappear after cluster termination.
8. πΉ Permanent Views
Permanent views are stored metadata and persist beyond sessions.
-- Create a permanent view
CREATE VIEW dev.bronze.emp_view AS
SELECT * FROM dev.bronze.emp WHERE dept_code = 'D102';
-- Query it
SELECT * FROM dev.bronze.emp_view;
π Stored under schema (like tables, but no data copy).
If you want to update it:
CREATE OR REPLACE VIEW dev.bronze.emp_view AS
SELECT * FROM dev.bronze.emp WHERE dept_code = 'D101';
9. πΉ CTAS (Create Table As Select)
CTAS physically copies data into a new Delta table.
CREATE TABLE dev.bronze.emp_ctas AS
SELECT * FROM dev.bronze.emp;
-- Verify
SELECT * FROM dev.bronze.emp_ctas;
π Creates a new table with its own physical data files.
10. πΉ Deep Clone of Delta Table
Deep Clone copies both metadata and data into a new location.
CREATE TABLE dev.bronze.emp_dc
DEEP CLONE dev.bronze.emp;
β
Metadata + data are fully duplicated at a new location.
β
Safer than CTAS because it preserves all metadata (e.g., partitioning, nullability).
11. πΉ Shallow Clone of Delta Table
Shallow Clone copies only metadata. Data still points to the original table at the specific version.
CREATE TABLE dev.bronze.emp_sc
SHALLOW CLONE dev.bronze.emp;
- No data copied.
- Faster, uses less storage.
- But tied to the original tableβs version.
Example:
- If you add rows to
emp
, shallow clone wonβt see them (itβs tied to older version). - If you insert directly into
emp_sc
, Databricks starts storing new files for the clone.
12. πΉ Comparing CTAS, Deep Clone, Shallow Clone
Feature | CTAS | Deep Clone | Shallow Clone |
---|---|---|---|
Metadata copied | β (some may be lost) | β Full metadata | β Full metadata |
Data copied | β Yes | β Yes | β No |
Location | New folder | New folder | New folder (points to old) |
Use case | Quick duplicate table | Reliable full replica | Lightweight dev/test copy |
13. πΉ History & Time Travel
Delta tables support versioning.
-- See history
DESCRIBE HISTORY dev.bronze.emp;
-- Query old version
SELECT * FROM dev.bronze.emp VERSION AS OF 0;
-- Or use timestamp
SELECT * FROM dev.bronze.emp TIMESTAMP AS OF '2023-01-01 00:00:00';
β Conclusion
By now you understand:
- How to navigate catalogs, schemas, and tables
- Use
IF NOT EXISTS
to avoid errors - Create temporary vs permanent views
- Duplicate data using CTAS, Deep Clone, Shallow Clone
- How Delta tables maintain history and time travel
This covers the core building blocks of Delta Lake in Databricks. π