This tutorial covers how to perform upserts (MERGE) in Delta tables on Databricks, with both hard deletes and soft deletes (using SCD1 style).
1. πΉ Introduction
In Delta Lake, the MERGE operation (also called UPSERT) lets you:
- Update rows in the target table when keys match.
- Insert rows when keys donβt exist.
- Delete or mark inactive rows when they exist in target but not in source.
π This is the same concept as Slowly Changing Dimension Type 1 (SCD1) in data warehousing.
2. πΉ Setup: Create EMP (target) table
Weβll use an EMP table as the target.
CREATE TABLE IF NOT EXISTS dev.bronze.emp (
emp_id INT,
emp_name STRING,
dept_code STRING,
salary DOUBLE
);
-- Insert initial data
INSERT INTO dev.bronze.emp VALUES
(101, 'Alice', 'D101', 10000),
(102, 'Bob', 'D102', 12000),
(103, 'Charlie', 'D101', 11000),
(104, 'David', 'D103', 9000),
(105, 'Eve', 'D104', 9500);
-- Check data
SELECT * FROM dev.bronze.emp;
3. πΉ Create EMP_UPDATES (source) table
This table will hold updates and new rows.
CREATE TABLE dev.bronze.emp_updates (
emp_id INT,
emp_name STRING,
dept_code STRING,
salary DOUBLE
);
-- Insert updates (existing employees with new salary)
INSERT INTO dev.bronze.emp_updates VALUES
(101, 'Alice', 'D101', 15000),
(102, 'Bob', 'D102', 16000),
(104, 'David', 'D103', 17000);
-- Insert a new employee
INSERT INTO dev.bronze.emp_updates VALUES
(106, 'Rohan', 'D103', 10000);
-- Check source data
SELECT * FROM dev.bronze.emp_updates;
π Now we have 3 updates + 1 new record.
4. πΉ MERGE for Upserts
Update existing employees, insert new ones.
MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id
WHEN MATCHED THEN
UPDATE SET e.salary = u.salary
WHEN NOT MATCHED THEN
INSERT *;
π Results:
- Alice (101), Bob (102), David (104) β salary updated.
- Rohan (106) β inserted as new row.
Check:
SELECT * FROM dev.bronze.emp;
5. πΉ Hard Deletes (Remove missing records)
Suppose source doesnβt have emp_id 103 and 105.
We want to delete them from target.
MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
π emp_id 103, 105 β deleted.
Check:
SELECT * FROM dev.bronze.emp;
6. πΉ Soft Deletes (SCD1 style)
Instead of deleting, we mark records inactive.
Step 1: Add column
ALTER TABLE dev.bronze.emp ADD COLUMNS (is_active STRING);
π All existing rows get NULL
by default.
Step 2: Modify MERGE for soft delete
MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id
WHEN MATCHED THEN
UPDATE SET e.salary = u.salary, e.is_active = 'Y'
WHEN NOT MATCHED THEN
INSERT (emp_id, emp_name, dept_code, salary, is_active)
VALUES (u.emp_id, u.emp_name, u.dept_code, u.salary, 'Y')
WHEN NOT MATCHED BY SOURCE THEN
UPDATE SET e.is_active = 'N';
π Behavior:
- Updated rows β marked Y.
- New rows β inserted with Y.
- Missing rows (e.g., 103, 105) β marked as N instead of being deleted.
Check:
SELECT * FROM dev.bronze.emp;
7. πΉ Benefits of MERGE
- Handles incremental loads seamlessly.
- Supports real-time updates in Delta.
- Allows hard deletes (completely remove) or soft deletes (mark inactive).
- Enables auditable, consistent data pipelines.
β Final Recap
- MERGE (Upsert/SCD1) = Update existing + Insert new.
- Can also delete missing records (
WHEN NOT MATCHED BY SOURCE THEN DELETE
). - For soft deletes, add a column (e.g.,
is_active
) and mark inactive instead of removing.