Databricks: Delta Tables MERGE & UPSERT (SCD1 + Soft Deletes)


This tutorial covers how to perform upserts (MERGE) in Delta tables on Databricks, with both hard deletes and soft deletes (using SCD1 style).


1. 🔹 Introduction

In Delta Lake, the MERGE operation (also called UPSERT) lets you:

  • Update rows in the target table when keys match.
  • Insert rows when keys don’t exist.
  • Delete or mark inactive rows when they exist in target but not in source.

👉 This is the same concept as Slowly Changing Dimension Type 1 (SCD1) in data warehousing.


2. 🔹 Setup: Create EMP (target) table

We’ll use an EMP table as the target.

CREATE TABLE IF NOT EXISTS dev.bronze.emp (
  emp_id INT,
  emp_name STRING,
  dept_code STRING,
  salary DOUBLE
);

-- Insert initial data
INSERT INTO dev.bronze.emp VALUES
(101, 'Alice', 'D101', 10000),
(102, 'Bob',   'D102', 12000),
(103, 'Charlie', 'D101', 11000),
(104, 'David', 'D103', 9000),
(105, 'Eve',   'D104', 9500);

-- Check data
SELECT * FROM dev.bronze.emp;

3. 🔹 Create EMP_UPDATES (source) table

This table will hold updates and new rows.

CREATE TABLE dev.bronze.emp_updates (
  emp_id INT,
  emp_name STRING,
  dept_code STRING,
  salary DOUBLE
);

-- Insert updates (existing employees with new salary)
INSERT INTO dev.bronze.emp_updates VALUES
(101, 'Alice', 'D101', 15000),
(102, 'Bob',   'D102', 16000),
(104, 'David', 'D103', 17000);

-- Insert a new employee
INSERT INTO dev.bronze.emp_updates VALUES
(106, 'Rohan', 'D103', 10000);

-- Check source data
SELECT * FROM dev.bronze.emp_updates;

👉 Now we have 3 updates + 1 new record.


4. 🔹 MERGE for Upserts

Update existing employees, insert new ones.

MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id

WHEN MATCHED THEN
  UPDATE SET e.salary = u.salary

WHEN NOT MATCHED THEN
  INSERT *;

📌 Results:

  • Alice (101), Bob (102), David (104) → salary updated.
  • Rohan (106) → inserted as new row.

Check:

SELECT * FROM dev.bronze.emp;

5. 🔹 Hard Deletes (Remove missing records)

Suppose source doesn’t have emp_id 103 and 105.
We want to delete them from target.

MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id

WHEN NOT MATCHED BY SOURCE THEN
  DELETE;

📌 emp_id 103, 105 → deleted.

Check:

SELECT * FROM dev.bronze.emp;

6. 🔹 Soft Deletes (SCD1 style)

Instead of deleting, we mark records inactive.

Step 1: Add column

ALTER TABLE dev.bronze.emp ADD COLUMNS (is_active STRING);

👉 All existing rows get NULL by default.


Step 2: Modify MERGE for soft delete

MERGE INTO dev.bronze.emp AS e
USING dev.bronze.emp_updates AS u
ON e.emp_id = u.emp_id

WHEN MATCHED THEN
  UPDATE SET e.salary = u.salary, e.is_active = 'Y'

WHEN NOT MATCHED THEN
  INSERT (emp_id, emp_name, dept_code, salary, is_active)
  VALUES (u.emp_id, u.emp_name, u.dept_code, u.salary, 'Y')

WHEN NOT MATCHED BY SOURCE THEN
  UPDATE SET e.is_active = 'N';

📌 Behavior:

  • Updated rows → marked Y.
  • New rows → inserted with Y.
  • Missing rows (e.g., 103, 105) → marked as N instead of being deleted.

Check:

SELECT * FROM dev.bronze.emp;

7. 🔹 Benefits of MERGE

  • Handles incremental loads seamlessly.
  • Supports real-time updates in Delta.
  • Allows hard deletes (completely remove) or soft deletes (mark inactive).
  • Enables auditable, consistent data pipelines.

✅ Final Recap

  • MERGE (Upsert/SCD1) = Update existing + Insert new.
  • Can also delete missing records (WHEN NOT MATCHED BY SOURCE THEN DELETE).
  • For soft deletes, add a column (e.g., is_active) and mark inactive instead of removing.

Related Posts

DataOps Project Learning Builds Awareness of Data Quality Automation Practices

Introduction Learning DataOps only through theory is not enough. Beginners must work on practical projects to understand how data pipelines are designed, tested, automated, monitored, and improved…

Read More

Ultimate Career Guide: Best Practices for Entry-Level DataOps Professionals

Introduction Data is now one of the most important assets for modern organizations. Companies depend on data pipelines, analytics dashboards, reporting systems, cloud platforms, and automated workflows…

Read More

Understanding Fundamental Analysis of Stocks for Long Term Equity Investing

Introduction Stepping into the financial world can feel overwhelming, but securing high-quality stock market education is the ultimate way to build long-term wealth. For individuals starting their…

Read More

A Complete Review of the Top Rank Tracking Tools for Local & Global Scale

To win in the modern digital landscape, visibility is everything. Growing brands and busy agencies frequently struggle to balance keyword tracking, technical audits, content creation, creator outreach,…

Read More

Modern DevOps Consulting for Cloud and Kubernetes Success

Introduction Digital‑first businesses are under intense pressure to ship faster, stay secure, and scale reliably across complex multi‑cloud environments. Traditional ways of building and operating software cannot…

Read More

Enterprise DevOps: A Beginner Guide to Scaling IT

Introduction Modern enterprises face the monumental challenge of delivering software at breakneck speeds without sacrificing infrastructure stability. Relying on isolated development and operations teams is no longer…

Read More