🔹 1. Introduction
In Databricks, we usually store tabular data in Delta tables (structured data).
But what about:
- Unstructured (images, logs, videos)
- Semi-structured (JSON, CSV, XML)
- Other structured files (Parquet, ORC)
👉 For these, Databricks introduces Volumes, which provide a governed, secure storage layer managed by Unity Catalog.
Key Requirements
- Unity Catalog enabled in your Databricks workspace.
- Databricks Runtime 13.3 LTS or above.
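A quick way to check the runtime requirement from a notebook is sketched below. It assumes the cluster exposes its runtime version in the `DATABRICKS_RUNTIME_VERSION` environment variable as a string such as `13.3`; the helper names `parse_runtime_version` and `supports_volumes` are hypothetical, and the check says nothing about whether Unity Catalog is enabled.

```python
import os


def parse_runtime_version(raw):
    """Extract (major, minor) from a string such as '13.3'; return None if it does not parse."""
    parts = raw.strip().split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (ValueError, IndexError):
        return None


def supports_volumes(raw, minimum=(13, 3)):
    """Return True when the parsed version is at least `minimum` (13.3 by default)."""
    version = parse_runtime_version(raw or "")
    return version is not None and version >= minimum


# Assumption: this variable is set on Databricks compute; it is absent elsewhere,
# in which case the check simply reports False.
raw_version = os.environ.get("DATABRICKS_RUNTIME_VERSION")
print(supports_volumes(raw_version))
```

Outside Databricks the variable is normally unset, so the sketch prints `False` rather than failing.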
🔹 2. What are Volumes?
- Volumes are part of the Unity Catalog hierarchy: Metastore → Catalog → Schema → Volume
- Volumes sit alongside tables inside a schema, but they hold files rather than rows of tabular data.
- Volumes are governed by Unity Catalog policies (ACLs, permissions).
🔹 3. Types of Volumes
Just like tables, Volumes come in two flavors:
- Managed Volume
  - Data location managed by Unity Catalog.
  - Files are stored in the default managed storage.
  - If you drop the volume → both data + metadata are deleted.
- External Volume
  - Points to an external location (e.g., Azure Data Lake, S3, GCS).
  - Requires external location + storage credential.
  - If you drop the volume → only metadata is deleted, files remain.
🔹 4. Create External Location (for External Volume)
Before creating an External Volume, you must configure an External Location.
Step 1: Create a folder in Azure Storage
- Storage Account: adbwithdata01
- Container: data
- Folder: adb/ext_volume
Step 2: Create External Location in Databricks (UI or SQL)
Using UI:
- Go to Catalog Explorer > External Locations > Create
- Example:
  - Name → ext_volume
  - Credential → sc_catalog_storage
  - Path → abfss://data@adbwithdata01.dfs.core.windows.net/adb/ext_volume
  - Test connection → ✅ Success
Using SQL:
CREATE EXTERNAL LOCATION ext_volume
URL 'abfss://data@adbwithdata01.dfs.core.windows.net/adb/ext_volume'
WITH (STORAGE CREDENTIAL sc_catalog_storage)
COMMENT 'This is for external volume';
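If you script this setup from a notebook, the statement can be assembled in Python and handed to `spark.sql`. The sketch below is only an illustration: `external_location_sql` is a hypothetical helper, it assumes the storage credential already exists, and it refuses values containing quotes, backticks, backslashes, or semicolons instead of trying to escape them. Note the parentheses around the `STORAGE CREDENTIAL` clause, which the documented syntax expects.

```python
def external_location_sql(name, url, credential, comment=None):
    """Render a CREATE EXTERNAL LOCATION statement (hypothetical helper)."""
    if not (name and url and credential):
        raise ValueError("name, url and credential are required")
    # Conservative guard: reject characters that would need SQL escaping.
    for value in (name, url, credential, comment or ""):
        if any(ch in value for ch in "'`\\;"):
            raise ValueError(f"unsupported character in {value!r}")
    sql = (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}`\n"
        f"URL '{url}'\n"
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )
    if comment:
        sql += f"\nCOMMENT '{comment}'"
    return sql


statement = external_location_sql(
    "ext_volume",
    "abfss://data@adbwithdata01.dfs.core.windows.net/adb/ext_volume",
    "sc_catalog_storage",
    "This is for external volume",
)
print(statement)
# In a Databricks notebook you would then run: spark.sql(statement)
```

`IF NOT EXISTS` is added here so that re-running the setup does not fail when the location is already registered.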
🔹 5. Create a Managed Volume
Let’s create a managed volume in the dev.bronze schema.
CREATE VOLUME dev.bronze.managed_volume
COMMENT 'This is a managed volume';
📌 Key point:
- No LOCATION specified → Unity Catalog decides the storage path.
- Data is stored in the managed storage location of the containing schema, catalog, or metastore.
Check volume details:
DESCRIBE VOLUME dev.bronze.managed_volume;
Output shows:
- Location (metastore path)
- Type = MANAGED
🔹 6. Using Volumes with File Paths
When accessing volumes with dbutils.fs or %sh, you must use a special path format:
/Volumes/<catalog>/<schema>/<volume>/<subfolder>/<file>
Example:
/Volumes/dev/bronze/managed_volume/files/emp.csv
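When paths are built in code, a small helper keeps the format consistent. This is a minimal sketch; `volume_path` is a hypothetical name, and it only assembles the string without checking that the volume exists.

```python
import posixpath


def volume_path(catalog, schema, volume, *parts):
    """Build /Volumes/<catalog>/<schema>/<volume>/<subfolder>/<file>."""
    for segment in (catalog, schema, volume):
        if not segment or "/" in segment:
            raise ValueError(f"invalid name: {segment!r}")
    # Drop surrounding slashes and empty pieces so join() cannot reset the path.
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return posixpath.join("/Volumes", catalog, schema, volume, *cleaned)


print(volume_path("dev", "bronze", "managed_volume", "files", "emp.csv"))
# → /Volumes/dev/bronze/managed_volume/files/emp.csv
```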
🔹 7. Example: Copy Files into Managed Volume
Step 1: Download a CSV
%sh
wget https://raw.githubusercontent.com/databricks/Spar02Hero-Datasets/main/emp.csv
ls -ltr
pwd
Assume the file is saved at /databricks/driver/emp.csv.
Step 2: Create a folder inside Volume
dbutils.fs.mkdirs("/Volumes/dev/bronze/managed_volume/files")
Step 3: Copy file into Volume
dbutils.fs.cp("file:/databricks/driver/emp.csv",
"/Volumes/dev/bronze/managed_volume/files/emp.csv")
Step 4: Query file directly
SELECT *
FROM csv.`/Volumes/dev/bronze/managed_volume/files/emp.csv`;
✅ You can now query file data (CSV, JSON, Parquet) stored in your volume directly with SQL.
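The copy and read steps can also be done with ordinary Python file APIs, on the assumption that your compute exposes `/Volumes/...` like a local directory. The sketch below is not the only way to do it: `copy_into_volume` and `read_csv_rows` are hypothetical helpers, and the dry run at the bottom uses a temporary directory and made-up sample rows so it can run anywhere.

```python
import csv
import os
import shutil
import tempfile
from itertools import islice


def copy_into_volume(local_file, target_dir):
    """Copy local_file into target_dir (created if missing) and return the new path."""
    os.makedirs(target_dir, exist_ok=True)
    destination = os.path.join(target_dir, os.path.basename(local_file))
    shutil.copyfile(local_file, destination)
    return destination


def read_csv_rows(path, limit=5):
    """Return up to `limit` data rows as dicts keyed by the header line."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(islice(csv.DictReader(handle), limit))


# Local dry run: a temporary directory stands in for both the driver disk and
# the volume, and the two CSV rows are made-up sample data.
with tempfile.TemporaryDirectory() as scratch:
    source = os.path.join(scratch, "emp.csv")
    with open(source, "w", encoding="utf-8") as handle:
        handle.write("id,name\n1,Asha\n2,Ben\n")
    copied = copy_into_volume(source, os.path.join(scratch, "volume", "files"))
    rows = read_csv_rows(copied)

print(rows)
```

On Databricks the equivalent call would be `copy_into_volume("/databricks/driver/emp.csv", "/Volumes/dev/bronze/managed_volume/files")`, followed by `read_csv_rows` on the returned path.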
🔹 8. Create an External Volume
Now let’s create an external volume that points to the external location we created earlier.
CREATE EXTERNAL VOLUME dev.bronze.external_volume
LOCATION 'abfss://data@adbwithdata01.dfs.core.windows.net/adb/ext_volume'
COMMENT 'External volume for semi/unstructured data';
Check details:
DESCRIBE VOLUME dev.bronze.external_volume;
- Type = EXTERNAL
- Location = Azure path provided
Step 1: Create a folder inside external volume
dbutils.fs.mkdirs("/Volumes/dev/bronze/external_volume/files")
Step 2: Copy file into external volume
dbutils.fs.cp("file:/databricks/driver/emp.csv",
"/Volumes/dev/bronze/external_volume/files/emp.csv")
Step 3: Verify in Azure Portal
- Navigate to adb/ext_volume/files/emp.csv
- The file is now available outside Databricks too.
🔹 9. Drop a Volume
- Managed Volume → drops data + metadata.
- External Volume → drops only metadata; files remain in storage.
Example:
-- Drop external volume
DROP VOLUME dev.bronze.external_volume;
-- Files still exist in Azure
If you recreate the volume pointing to the same location:
CREATE EXTERNAL VOLUME dev.bronze.external_volume
LOCATION 'abfss://data@adbwithdata01.dfs.core.windows.net/adb/ext_volume';
👉 Files reappear inside Databricks.
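One way to confirm this is to list the volume's files before the drop and again after the re-create, then compare the two listings. A minimal sketch, assuming volume paths can be walked with `os.walk` on your compute; `list_files` is a hypothetical helper, and the dry run uses a temporary directory in place of a real volume.

```python
import os
import tempfile


def list_files(root):
    """Return sorted file paths under root, relative to root, using '/' separators."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            relative = os.path.relpath(os.path.join(dirpath, filename), root)
            found.append(relative.replace(os.sep, "/"))
    return sorted(found)


# Local dry run: a temporary directory stands in for the external volume.
with tempfile.TemporaryDirectory() as fake_volume:
    os.makedirs(os.path.join(fake_volume, "files"))
    with open(os.path.join(fake_volume, "files", "emp.csv"), "w") as handle:
        handle.write("placeholder\n")
    listing = list_files(fake_volume)

print(listing)
```

In a notebook you would capture `list_files("/Volumes/dev/bronze/external_volume")` before `DROP VOLUME`, run it again after `CREATE EXTERNAL VOLUME`, and check that the two results match.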
🔹 10. Summary
- Volumes allow Databricks to govern files (structured/unstructured) under Unity Catalog.
- Managed Volume → fully controlled by Databricks, data removed on drop.
- External Volume → points to external storage, dropping only removes metadata.
- File access always via /Volumes/<catalog>/<schema>/<volume>/...
- You can read, write, copy, and query files in volumes with SQL or dbutils.