Databricks: Delta Tables MERGE & UPSERT (SCD1 + Soft Deletes)


This tutorial covers how to perform upserts (MERGE) in Delta tables on Databricks, with both hard deletes and soft deletes (using SCD1 style).


1. πŸ”Ή Introduction

In Delta Lake, the MERGE operation (also called UPSERT) lets you:

  • Update rows in the target table when keys match.
  • Insert rows when keys don’t exist.
  • Delete or mark inactive rows when they exist in target but not in source.

πŸ‘‰ This is the same concept as Slowly Changing Dimension Type 1 (SCD1) in data warehousing.


2. πŸ”Ή Setup: Create EMP (target) table

We’ll use an EMP table as the target.

CREATE TABLE IF NOT EXISTS dev.bronze.emp (
  emp_id INT,
  emp_name STRING,
  dept_code STRING,
  salary DOUBLE
);

-- Insert initial data
INSERT INTO dev.bronze.emp VALUES
(101, 'Alice', 'D101', 10000),
(102, 'Bob',   'D102', 12000),
(103, 'Charlie', 'D101', 11000),
(104, 'David', 'D103', 9000),
(105, 'Eve',   'D104', 9500);

-- Check data
SELECT * FROM dev.bronze.emp;

3. πŸ”Ή Create EMP_UPDATES (source) table

This table will hold updates and new rows.

CREATE TABLE dev.bronze.emp_updates (
  emp_id INT,
  emp_name STRING,
  dept_code STRING,
  salary DOUBLE
);

-- Insert updates (existing employees with new salary)
INSERT INTO dev.bronze.emp_updates VALUES
(101, 'Alice', 'D101', 15000),
(102, 'Bob',   'D102', 16000),
(104, 'David', 'D103', 17000);

-- Insert a new employee
INSERT INTO dev.bronze.emp_updates VALUES
(106, 'Rohan', 'D103', 10000);

-- Check source data
SELECT * FROM dev.bronze.emp_updates;

πŸ‘‰ Now we have 3 updates + 1 new record.


4. πŸ”Ή MERGE for Upserts

Update existing employees, insert new ones.

MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id

WHEN MATCHED THEN
  UPDATE SET e.salary = u.salary

WHEN NOT MATCHED THEN
  INSERT *;

πŸ“Œ Results:

  • Alice (101), Bob (102), David (104) β†’ salary updated.
  • Rohan (106) β†’ inserted as new row.

Check:

SELECT * FROM dev.bronze.emp;

5. πŸ”Ή Hard Deletes (Remove missing records)

Suppose source doesn’t have emp_id 103 and 105.
We want to delete them from target.

MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id

WHEN NOT MATCHED BY SOURCE THEN
  DELETE;

πŸ“Œ emp_id 103, 105 β†’ deleted.

Check:

SELECT * FROM dev.bronze.emp;

6. πŸ”Ή Soft Deletes (SCD1 style)

Instead of deleting, we mark records inactive.

Step 1: Add column

ALTER TABLE dev.bronze.emp ADD COLUMNS (is_active STRING);

πŸ‘‰ All existing rows get NULL by default.


Step 2: Modify MERGE for soft delete

MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id

WHEN MATCHED THEN
  UPDATE SET e.salary = u.salary, e.is_active = 'Y'

WHEN NOT MATCHED THEN
  INSERT (emp_id, emp_name, dept_code, salary, is_active)
  VALUES (u.emp_id, u.emp_name, u.dept_code, u.salary, 'Y')

WHEN NOT MATCHED BY SOURCE THEN
  UPDATE SET e.is_active = 'N';

πŸ“Œ Behavior:

  • Updated rows β†’ marked Y.
  • New rows β†’ inserted with Y.
  • Missing rows (e.g., 103, 105) β†’ marked as N instead of being deleted.

Check:

SELECT * FROM dev.bronze.emp;

7. πŸ”Ή Benefits of MERGE

  • Handles incremental loads seamlessly.
  • Supports real-time updates in Delta.
  • Allows hard deletes (completely remove) or soft deletes (mark inactive).
  • Enables auditable, consistent data pipelines.

βœ… Final Recap

  • MERGE (Upsert/SCD1) = Update existing + Insert new.
  • Can also delete missing records (WHEN NOT MATCHED BY SOURCE THEN DELETE).
  • For soft deletes, add a column (e.g., is_active) and mark inactive instead of removing.

Related Posts

Strategic Cloud Financial Management With Certified FinOps Professional Training

Introduction The Certified FinOps Professional program is a transformative milestone for any engineer or manager looking to master the intersection of finance, technology, and business operations. This…

Read More

Professional Certified FinOps Engineer improves financial performance visibility systems

Introduction In the modern landscape of cloud infrastructure, technical expertise alone is no longer sufficient to drive enterprise success. The Certified FinOps Engineer program has emerged as…

Read More

Complete Cloud Financial Management Guide for Certified FinOps Manager

Introduction The Certified FinOps Manager program is designed to bridge the widening gap between cloud engineering and financial accountability. As cloud environments become more complex, organizations require…

Read More

Industry Ready FinOps Knowledge Through Certified FinOps Architect Program

Introduction The Certified FinOps Architect certification is designed to help professionals bridge the gap between cloud financial management and operational efficiency. This guide is tailored for working…

Read More

Advance Your Data Management Career with CDOM – Certified DataOps Manager

The CDOM – Certified DataOps Manager is a breakthrough certification designed for professionals who want to master the intersection of data engineering and operational agility. This guide…

Read More

Future focused learning with CDOA – Certified DataOps Architect certification

Introduction The CDOA – Certified DataOps Architect is a professional designed to bridge the gap between data engineering and operational excellence. This guide is written for engineers…

Read More