{"id":783,"date":"2025-08-22T13:14:43","date_gmt":"2025-08-22T13:14:43","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=783"},"modified":"2025-08-22T13:14:44","modified_gmt":"2025-08-22T13:14:44","slug":"databricks-delta-tables-merge-upsert-scd1-soft-deletes","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-delta-tables-merge-upsert-scd1-soft-deletes\/","title":{"rendered":"Databricks: Delta Tables MERGE &amp; UPSERT (SCD1 + Soft Deletes)"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\"><\/h1>\n\n\n\n<p>This tutorial covers how to perform <strong>upserts (MERGE)<\/strong> in Delta tables on Databricks, with both <strong>hard deletes<\/strong> and <strong>soft deletes (using SCD1 style)<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">1. \ud83d\udd39 Introduction<\/h2>\n\n\n\n<p>In Delta Lake, the <strong>MERGE<\/strong> operation (also called <strong>UPSERT<\/strong>) lets you:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Update<\/strong> rows in the target table when keys match.<\/li>\n\n\n\n<li><strong>Insert<\/strong> rows when keys don\u2019t exist.<\/li>\n\n\n\n<li><strong>Delete or mark inactive<\/strong> rows when they exist in target but not in source.<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 This is the same concept as <strong>Slowly Changing Dimension Type 1 (SCD1)<\/strong> in data warehousing.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">2. \ud83d\udd39 Setup: Create EMP (target) table<\/h2>\n\n\n\n<p>We\u2019ll use an <strong>EMP<\/strong> table as the target.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE IF NOT EXISTS dev.bronze.emp (\n  emp_id INT,\n  emp_name STRING,\n  dept_code STRING,\n  salary DOUBLE\n);\n\n-- Insert initial data\nINSERT INTO dev.bronze.emp VALUES\n(101, 'Alice', 'D101', 10000),\n(102, 'Bob',   'D102', 12000),\n(103, 'Charlie', 'D101', 11000),\n(104, 'David', 'D103', 9000),\n(105, 'Eve',   'D104', 9500);\n\n-- Check data\nSELECT * FROM dev.bronze.emp;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. \ud83d\udd39 Create EMP_UPDATES (source) table<\/h2>\n\n\n\n<p>This table will hold <strong>updates and new rows<\/strong>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE dev.bronze.emp_updates (\n  emp_id INT,\n  emp_name STRING,\n  dept_code STRING,\n  salary DOUBLE\n);\n\n-- Insert updates (existing employees with new salary)\nINSERT INTO dev.bronze.emp_updates VALUES\n(101, 'Alice', 'D101', 15000),\n(102, 'Bob',   'D102', 16000),\n(104, 'David', 'D103', 17000);\n\n-- Insert a new employee\nINSERT INTO dev.bronze.emp_updates VALUES\n(106, 'Rohan', 'D103', 10000);\n\n-- Check source data\nSELECT * FROM dev.bronze.emp_updates;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udc49 Now we have 3 updates + 1 new record.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">4. \ud83d\udd39 MERGE for Upserts<\/h2>\n\n\n\n<p>Update existing employees, insert new ones.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>MERGE INTO dev.bronze.emp AS e\nUSING dev.bronze.emp_updates AS u\nON e.emp_id = u.emp_id\n\nWHEN MATCHED THEN\n  UPDATE SET e.salary = u.salary\n\nWHEN NOT MATCHED THEN\n  INSERT *;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccc Results:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Alice (101), Bob (102), David (104) \u2192 salary updated.<\/li>\n\n\n\n<li>Rohan (106) \u2192 inserted as new row.<\/li>\n<\/ul>\n\n\n\n<p>Check:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT * FROM dev.bronze.emp;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. \ud83d\udd39 Hard Deletes (Remove missing records)<\/h2>\n\n\n\n<p>Suppose source doesn\u2019t have emp_id 103 and 105.<br>We want to <strong>delete them from target<\/strong>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>MERGE INTO dev.bronze.emp AS e\nUSING dev.bronze.emp_updates AS u\nON e.emp_id = u.emp_id\n\nWHEN NOT MATCHED BY SOURCE THEN\n  DELETE;\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccc emp_id 103, 105 \u2192 deleted.<\/p>\n\n\n\n<p>Check:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT * FROM dev.bronze.emp;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">6. \ud83d\udd39 Soft Deletes (SCD1 style)<\/h2>\n\n\n\n<p>Instead of deleting, we <strong>mark records inactive<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Add column<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>ALTER TABLE dev.bronze.emp ADD COLUMNS (is_active STRING);\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udc49 All existing rows get <code>NULL<\/code> by default.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Modify MERGE for soft delete<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>MERGE INTO dev.bronze.emp AS e\nUSING dev.bronze.emp_updates AS u\nON e.emp_id = u.emp_id\n\nWHEN MATCHED THEN\n  UPDATE SET e.salary = u.salary, e.is_active = 'Y'\n\nWHEN NOT MATCHED THEN\n  INSERT (emp_id, emp_name, dept_code, salary, is_active)\n  VALUES (u.emp_id, u.emp_name, u.dept_code, u.salary, 'Y')\n\nWHEN NOT MATCHED BY SOURCE THEN\n  UPDATE SET e.is_active = 'N';\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccc Behavior:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Updated rows \u2192 marked <strong>Y<\/strong>.<\/li>\n\n\n\n<li>New rows \u2192 inserted with <strong>Y<\/strong>.<\/li>\n\n\n\n<li>Missing rows (e.g., 103, 105) \u2192 marked as <strong>N<\/strong> instead of being deleted.<\/li>\n<\/ul>\n\n\n\n<p>Check:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT * FROM dev.bronze.emp;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">7. \ud83d\udd39 Benefits of MERGE<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Handles <strong>incremental loads<\/strong> seamlessly.<\/li>\n\n\n\n<li>Supports <strong>real-time updates<\/strong> in Delta.<\/li>\n\n\n\n<li>Allows <strong>hard deletes<\/strong> (completely remove) or <strong>soft deletes<\/strong> (mark inactive).<\/li>\n\n\n\n<li>Enables <strong>auditable, consistent data pipelines<\/strong>.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\u2705 Final Recap<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>MERGE (Upsert\/SCD1)<\/strong> = Update existing + Insert new.<\/li>\n\n\n\n<li>Can also <strong>delete missing records<\/strong> (<code>WHEN NOT MATCHED BY SOURCE THEN DELETE<\/code>).<\/li>\n\n\n\n<li>For <strong>soft deletes<\/strong>, add a column (e.g., <code>is_active<\/code>) and mark inactive instead of removing.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial covers how to perform upserts (MERGE) in Delta tables on Databricks, with both hard deletes and soft deletes (using SCD1 style). 1. \ud83d\udd39 Introduction In&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-783","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/783","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=783"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/783\/revisions"}],"predecessor-version":[{"id":784,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/783\/revisions\/784"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=783"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=783"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=783"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}