Databricks: Delta Tables – Deletion Vectors & Liquid Clustering

Delta Lake keeps improving with features that optimize performance and storage. Two of the most important recent features are deletion vectors and liquid clustering. Let’s explore both in detail with examples you can run inside Databricks. 🔹 1. Introduction 🔹 2. What are Deletion Vectors? Normally, when you delete or update a row inside a Delta table: 👉 Deletion Vectors …

Databricks: Delta Tables MERGE & UPSERT (SCD1 + Soft Deletes)

This tutorial covers how to perform upserts (MERGE) in Delta tables on Databricks, with both hard deletes and soft deletes (using SCD1 style). 🔹 1. Introduction In Delta Lake, the MERGE operation (also called UPSERT) lets you: 👉 This is the same concept as Slowly Changing Dimension Type 1 (SCD1) in data warehousing. 🔹 2. …
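
To make the SCD1 upsert semantics above concrete, here is a minimal plain-Python sketch. It is not Delta Lake's MERGE implementation and does not need Spark. The function name `scd1_merge`, the `op` marker on source rows, the `is_deleted` flag, and the sample rows are all assumptions made for illustration.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def scd1_merge(target: Dict[int, Row], source: Dict[int, Row],
               soft_delete: bool = True) -> Dict[int, Row]:
    """Return a new table after applying source changes SCD1-style.

    Rows are keyed by a business key. A source row with op == "delete"
    (a hypothetical marker) removes the target row, or only flags it
    when soft_delete is True. Any other source row overwrites or inserts.
    """
    result = {key: dict(row) for key, row in target.items()}  # copy, do not mutate input
    for key, src in source.items():
        if src.get("op") == "delete":
            if key in result:
                if soft_delete:
                    result[key]["is_deleted"] = True  # soft delete: keep the row, flag it
                else:
                    del result[key]                   # hard delete: drop the row
            continue
        row = {col: val for col, val in src.items() if col != "op"}
        row["is_deleted"] = False
        result[key] = row  # SCD1: overwrite in place, no history kept
    return result


target = {1: {"name": "Ann", "is_deleted": False},
          2: {"name": "Bob", "is_deleted": False}}
source = {1: {"name": "Anna"}, 2: {"op": "delete"}, 3: {"name": "Cy"}}

soft = scd1_merge(target, source)                     # key 2 is flagged, not removed
hard = scd1_merge(target, source, soft_delete=False)  # key 2 is removed
```

In this sketch a matched key is updated, an unmatched key is inserted, and a delete either flags or drops the row, which mirrors the matched / not matched branches a MERGE statement expresses.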

Databricks: Delta Tables, Catalogs, Views, and Clones

This tutorial will walk you through core Delta Lake functionality in Databricks, including catalogs, schemas, tables, views, CTAS, deep clone, and shallow clone. Each section is backed with SQL and PySpark examples so you can directly try them inside a Databricks notebook. 🔹 1. Introduction to Delta Tables A Delta Table is a Databricks table …

Databricks – Catalog, Schemas & Tables with External Location

This is exactly the core of Unity Catalog’s object model. The way Databricks resolves storage paths for managed tables depends on where you attach the external/managed location. Let’s break it down carefully by Catalog → Schema → Table levels. 🔑 Unity Catalog Storage Hierarchy Unity Catalog object model: Metastore → Catalog → Schema → Table 1️⃣ …
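
The resolution order can be sketched in a few lines of plain Python. This is only an illustration of the precedence (a schema-level location wins over a catalog-level one, which wins over the metastore root), not Databricks code. The function name `resolve_managed_location` and the `abfss://` paths are hypothetical, and the real on-disk layout beneath the chosen location is managed by Unity Catalog and not modeled here.

```python
from typing import Optional


def resolve_managed_location(metastore_root: Optional[str],
                             catalog_location: Optional[str],
                             schema_location: Optional[str]) -> str:
    """Pick the base location for a managed table: most specific level first."""
    for location in (schema_location, catalog_location, metastore_root):
        if location:
            return location
    raise ValueError("no managed storage location defined at any level")


# Hypothetical storage paths, for illustration only.
METASTORE = "abfss://meta@example.dfs.core.windows.net/root"
CATALOG = "abfss://data@example.dfs.core.windows.net/dev_ext"
SCHEMA = "abfss://data@example.dfs.core.windows.net/dev_ext/sales"

only_metastore = resolve_managed_location(METASTORE, None, None)
catalog_level = resolve_managed_location(METASTORE, CATALOG, None)
schema_level = resolve_managed_location(METASTORE, CATALOG, SCHEMA)
```

Each call returns the most specific location that is set, so a table in a schema with no location of its own falls back to the catalog, and then to the metastore.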

Databricks Lab – Managed vs External Tables + UNDROP (with External Location setup)

Databricks Unity Catalog Tutorial Managed vs External Tables + UNDROP (with External Location setup) Introduction (what we’ll build) You’ll learn to: What’s new in Databricks? (Updates & Releases) In the workspace, click ? → What’s new to see the monthly release notes (features are rolled out in stages). For reference: August 2024 release notes pages: …

Databricks Lab – Working with Schemas and External Locations

We will: Unity Catalog has a 4-level hierarchy: Metastore → Catalog → Schema → Table 👉 Today we’ll create three schemas to see how Unity Catalog stores managed table data at different levels: Finally, we’ll create tables in each schema and validate where the actual data files are stored. Create Schema in Unity Catalog 1️⃣ …

Databricks Lab – Catalog with External Location, & Storage Credentials in Unity Catalog

Good Read – https://dataopsschool.com/blog/databricks-catalog-schemas-tables-with-external-location/ 1. Create Catalog without External Location 2. Create Catalog with SQL 3. Drop Catalog and Drop Catalog Recursively 4. Create External Location in Databricks 5. Create Storage Credential in Databricks 6. Create Catalog with External Location Final Flow Summary Let’s build a practical example using your catalog dev_ext. I’ll show you: 1️⃣ …

Databricks: Unity Catalog vs Catalogs vs Workspace vs Metastore

🔑 Unity Catalog vs Catalogs vs Workspace vs Metastore 1. Unity Catalog (UC) ✅ 👉 Analogy: National Library System – it governs all libraries in a country. 2. Catalogs 📚 👉 Analogy: A library inside the national library system. 3. Schemas (Databases) 👉 Analogy: Sections in the library (History, Science, Fiction). 4. Tables & Views …