This tutorial covers how to perform upserts (MERGE) in Delta tables on Databricks, with both hard deletes and soft deletes (using SCD1 style).
1. 🔹 Introduction
In Delta Lake, the MERGE operation (also called UPSERT) lets you:
- Update rows in the target table when keys match.
- Insert rows when keys don’t exist.
- Delete or mark inactive rows when they exist in target but not in source.
👉 This is the same concept as Slowly Changing Dimension Type 1 (SCD1) in data warehousing.
2. 🔹 Setup: Create EMP (target) table
We’ll use an EMP table as the target.
CREATE TABLE IF NOT EXISTS dev.bronze.emp (
emp_id INT,
emp_name STRING,
dept_code STRING,
salary DOUBLE
);
-- Insert initial data
INSERT INTO dev.bronze.emp VALUES
(101, 'Alice', 'D101', 10000),
(102, 'Bob', 'D102', 12000),
(103, 'Charlie', 'D101', 11000),
(104, 'David', 'D103', 9000),
(105, 'Eve', 'D104', 9500);
-- Check data
SELECT * FROM dev.bronze.emp;
3. 🔹 Create EMP_UPDATES (source) table
This table will hold updates and new rows.
CREATE TABLE dev.bronze.emp_updates (
emp_id INT,
emp_name STRING,
dept_code STRING,
salary DOUBLE
);
-- Insert updates (existing employees with new salary)
INSERT INTO dev.bronze.emp_updates VALUES
(101, 'Alice', 'D101', 15000),
(102, 'Bob', 'D102', 16000),
(104, 'David', 'D103', 17000);
-- Insert a new employee
INSERT INTO dev.bronze.emp_updates VALUES
(106, 'Rohan', 'D103', 10000);
-- Check source data
SELECT * FROM dev.bronze.emp_updates;
👉 Now we have 3 updates + 1 new record.
4. 🔹 MERGE for Upserts
Update existing employees, insert new ones.
MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id
WHEN MATCHED THEN
UPDATE SET e.salary = u.salary
WHEN NOT MATCHED THEN
INSERT *;
📌 Results:
- Alice (101), Bob (102), David (104) → salary updated.
- Rohan (106) → inserted as new row.
Check:
SELECT * FROM dev.bronze.emp;
5. 🔹 Hard Deletes (Remove missing records)
Suppose source doesn’t have emp_id 103 and 105.
We want to delete them from target.
MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
📌 emp_id 103, 105 → deleted.
Check:
SELECT * FROM dev.bronze.emp;
6. 🔹 Soft Deletes (SCD1 style)
Instead of deleting, we mark records inactive.
Step 1: Add column
ALTER TABLE dev.bronze.emp ADD COLUMNS (is_active STRING);
👉 All existing rows get NULL by default.
Step 2: Modify MERGE for soft delete
MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id
WHEN MATCHED THEN
UPDATE SET e.salary = u.salary, e.is_active = 'Y'
WHEN NOT MATCHED THEN
INSERT (emp_id, emp_name, dept_code, salary, is_active)
VALUES (u.emp_id, u.emp_name, u.dept_code, u.salary, 'Y')
WHEN NOT MATCHED BY SOURCE THEN
UPDATE SET e.is_active = 'N';
📌 Behavior:
- Updated rows → marked Y.
- New rows → inserted with Y.
- Missing rows (e.g., 103, 105) → marked as N instead of being deleted.
Check:
SELECT * FROM dev.bronze.emp;
7. 🔹 Benefits of MERGE
- Handles incremental loads seamlessly.
- Supports real-time updates in Delta.
- Allows hard deletes (completely remove) or soft deletes (mark inactive).
- Enables auditable, consistent data pipelines.
✅ Final Recap
- MERGE (Upsert/SCD1) = Update existing + Insert new.
- Can also delete missing records (
WHEN NOT MATCHED BY SOURCE THEN DELETE). - For soft deletes, add a column (e.g.,
is_active) and mark inactive instead of removing.