Delta Lake keeps improving with features that optimize performance and storage. Two of the most important recent features are:
- Deletion Vectors (DV) β Efficient deletes without rewriting files.
 - Liquid Clustering β Automatic clustering for better query performance on high-cardinality data.
 
Letβs explore both in detail with examples you can run inside Databricks.
πΉ 1. Introduction
- Traditional Delta operations like partitioning and Z-ordering help optimize queries.
 - But:
- Partitioning struggles with high-cardinality columns.
 - Z-ordering requires rewriting the data each time.
 
 - Two new improvements solve this:
- Deletion Vectors for efficient deletes.
 - Liquid Clustering for adaptive clustering without heavy rewrites.
 
 
πΉ 2. What are Deletion Vectors?
Normally, when you delete or update a row inside a Delta table:
- The entire Parquet file containing that row must be rewritten.
 - If the file has millions of rows, this is slow and costly.
 
π Deletion Vectors (DV) fix this:
- Instead of rewriting the file, Delta marks deleted rows with a flag.
 - On subsequent reads, Delta skips flagged rows.
 - Later, when you run OPTIMIZE (maintenance), those flagged rows are permanently removed.
 
π This significantly reduces I/O, time, and costs.
πΉ 3. How Deletion Vectors Work (Step by Step with Example)
Step 1: Create a Delta table from CSV
-- Read a CSV into a table
CREATE TABLE dev.bronze.sales
AS SELECT * FROM read_files(
  'dbfs:/databricks-datasets/retail-org/sales_orders.csv',
  format => 'csv',
  header => 'true'
);
-- Check data
SELECT * FROM dev.bronze.sales LIMIT 5;
Step 2: Verify if Deletion Vectors are enabled
DESCRIBE EXTENDED dev.bronze.sales;
Look at Table Properties:
delta.enableDeletionVectors = trueβ DV is enabled.- If itβs 
false, Delta rewrites files on delete. 
Step 3: Disable DV and delete a row
ALTER TABLE dev.bronze.sales
SET TBLPROPERTIES ('delta.enableDeletionVectors' = false);
DELETE FROM dev.bronze.sales WHERE invoiceNo = '5464';
-- Check table history
DESCRIBE HISTORY dev.bronze.sales;
π Youβll see:
- Files removed + new files added β full file rewrite happened.
 numDeletionVectorsAdded = 0.
Step 4: Enable DV and delete again
ALTER TABLE dev.bronze.sales
SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
DELETE FROM dev.bronze.sales WHERE invoiceNo = '5468';
DESCRIBE HISTORY dev.bronze.sales;
π Youβll see:
- No files rewritten.
 numDeletionVectorsAdded > 0β rows flagged as deleted.
Step 5: Clean flagged rows (optional)
OPTIMIZE dev.bronze.sales;
π This physically removes flagged rows from Parquet files during compaction.
β Key Takeaways β Deletion Vectors
- Faster deletes and updates.
 - Reduced cost (no frequent rewrites).
 - Permanent cleanup via 
OPTIMIZE. 
πΉ 4. What is Liquid Clustering?
Partitioning and Z-ordering optimize queries but come with issues:
- Partitioning β inefficient with too many/too few partitions.
 - Z-ordering β requires data rewrite for new inserts.
 
π Liquid Clustering is the solution:
- Automatically clusters new data based on chosen columns.
 - No need for expensive rewrites.
 - Handles skew, high-cardinality, and fast-growing datasets.
 
π Scenarios where Liquid Clustering shines:
- Queries with filters on high-cardinality columns (e.g., userId, invoiceId).
 - Data with imbalanced distributions.
 - Fast incremental loads (streaming or batch).
 
πΉ 5. How to Enable Liquid Clustering (Examples)
Step 1: Enable on an existing table
ALTER TABLE dev.bronze.sales
CLUSTER BY (invoiceNo);
Check metadata:
DESCRIBE HISTORY dev.bronze.sales;
π Shows clusterBy: invoiceNo.
Step 2: Create a new clustered table
CREATE TABLE dev.bronze.sales_cd
CLUSTER BY (invoiceNo)
AS SELECT * FROM dev.bronze.sales;
Check:
DESCRIBE EXTENDED dev.bronze.sales_cd;
π Clustering Columns: invoiceNo.
Step 3: Run a query with clustering column
SELECT * 
FROM dev.bronze.sales
WHERE invoiceNo = '5464';
π Query runs much faster because Delta prunes files using clustering metadata.
Step 4: Disable clustering if not needed
ALTER TABLE dev.bronze.sales CLUSTER BY NONE;
β Key Takeaways β Liquid Clustering
- No need for manual 
ZORDERmaintenance. - Handles new incoming data automatically.
 - Works best on filter/join columns.
 - Limited: clustering columns must be within the first 32 columns.
 
πΉ 6. Final Recap
Deletion Vectors
- Efficient deletes/updates with row-level flags.
 - Avoid rewriting large files.
 - Permanent cleanup via 
OPTIMIZE. 
Liquid Clustering
- Adaptive clustering β incremental-friendly.
 - Solves partitioning & Z-order limitations.
 - Best for high-cardinality columns.
 
β Tutorial Summary
- Use Deletion Vectors when you need frequent deletes/updates.
 - Use Liquid Clustering when your queries filter/join on high-cardinality columns.
 - Together, they make Delta tables more scalable, efficient, and cost-effective.