{"id":787,"date":"2025-08-22T13:53:57","date_gmt":"2025-08-22T13:53:57","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=787"},"modified":"2025-08-22T13:53:58","modified_gmt":"2025-08-22T13:53:58","slug":"databricks-using-volumes-in-databricks-with-unity-catalog","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-using-volumes-in-databricks-with-unity-catalog\/","title":{"rendered":"Databricks: Using Volumes in Databricks with Unity Catalog"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">\ud83d\udd39 1. Introduction<\/h2>\n\n\n\n<p>In Databricks, <strong>tabular data<\/strong> (structured data) is usually stored in <strong>Delta tables<\/strong>.<br>But what about files such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Unstructured<\/strong> (images, logs, videos)<\/li>\n\n\n\n<li><strong>Semi-structured<\/strong> (JSON, CSV, XML)<\/li>\n\n\n\n<li><strong>Other structured files<\/strong> (Parquet, ORC)<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udc49 For these, Databricks provides <strong>Volumes<\/strong>: Unity Catalog objects that add a <strong>governed, secure storage layer<\/strong> for files, so access to them is controlled by <strong>Unity Catalog<\/strong> just like access to tables.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Key Requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Unity Catalog enabled<\/strong> in your Databricks workspace.<\/li>\n\n\n\n<li><strong>Databricks Runtime 13.3 LTS or above<\/strong>.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd39 2. 
What are Volumes?<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Volumes are part of the <strong>Unity Catalog hierarchy<\/strong>: <code>Metastore \u2192 Catalog \u2192 Schema \u2192 Volume<\/code><\/li>\n\n\n\n<li>Volumes sit at the same level as tables, but instead of rows and columns they <strong>store files<\/strong> in any format.<\/li>\n\n\n\n<li>Access to volumes is governed by Unity Catalog privileges (for example, <code>READ VOLUME<\/code> and <code>WRITE VOLUME<\/code>), just like access to tables.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd39 3. Types of Volumes<\/h2>\n\n\n\n<p>Just like tables, Volumes come in two flavors:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Managed Volume<\/strong>\n<ul class=\"wp-block-list\">\n<li>The storage location is chosen and managed by Unity Catalog.<\/li>\n\n\n\n<li>Files are stored in the managed storage location of the parent schema, catalog, or metastore.<\/li>\n\n\n\n<li>If you drop the volume \u2192 both <strong>data + metadata<\/strong> are deleted.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>External Volume<\/strong>\n<ul class=\"wp-block-list\">\n<li>Points to a path in your own cloud storage (e.g., Azure Data Lake Storage, S3, GCS) that is covered by an <strong>external location<\/strong>.<\/li>\n\n\n\n<li>Requires an <strong>external location<\/strong>, which in turn uses a <strong>storage credential<\/strong>.<\/li>\n\n\n\n<li>If you drop the volume \u2192 only metadata is deleted, <strong>files remain<\/strong>.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd39 4. 
Create External Location (for External Volume)<\/h2>\n\n\n\n<p>Before creating an <strong>External Volume<\/strong>, you must configure an <strong>External Location<\/strong> that covers the target storage path.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Create a folder in Azure Storage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Storage Account: <code>adbwithdata01<\/code><\/li>\n\n\n\n<li>Container: <code>data<\/code><\/li>\n\n\n\n<li>Folder: <code>adb\/ext_volume<\/code><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create External Location in Databricks (UI or SQL)<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Using UI:<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Go to <strong>Catalog Explorer > External Locations > Create<\/strong><\/li>\n\n\n\n<li>Example:\n<ul class=\"wp-block-list\">\n<li>Name \u2192 <code>ext_volume<\/code><\/li>\n\n\n\n<li>Credential \u2192 <code>sc_catalog_storage<\/code> (an existing storage credential)<\/li>\n\n\n\n<li>Path \u2192 <code>abfss:\/\/data@adbwithdata01.dfs.core.windows.net\/adb\/ext_volume<\/code><\/li>\n\n\n\n<li>Test connection \u2192 \u2705 Success<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Using SQL:<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE EXTERNAL LOCATION ext_volume\nURL 'abfss:\/\/data@adbwithdata01.dfs.core.windows.net\/adb\/ext_volume'\nWITH (STORAGE CREDENTIAL sc_catalog_storage)\nCOMMENT 'This is for external volume';\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd39 5. 
Create a Managed Volume<\/h2>\n\n\n\n<p>Let\u2019s create a <strong>managed volume<\/strong> in the <code>dev.bronze<\/code> schema.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE VOLUME dev.bronze.managed_volume\nCOMMENT 'This is a managed volume';\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udccc Key points:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>No <code>LOCATION<\/code> specified \u2192 Unity Catalog decides the storage path.<\/li>\n\n\n\n<li>Files are stored under the <strong>managed storage location<\/strong> of the schema, catalog, or metastore (whichever is the most specific one configured).<\/li>\n<\/ul>\n\n\n\n<p>Check volume details:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>DESCRIBE VOLUME dev.bronze.managed_volume;\n<\/code><\/pre>\n\n\n\n<p>Output shows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Location (a path under the managed storage location)<\/li>\n\n\n\n<li>Type = <strong>MANAGED<\/strong><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd39 6. Using Volumes with File Paths<\/h2>\n\n\n\n<p>Files in a volume are addressed with a path of the following form, whether you use <strong>dbutils.fs<\/strong>, <code>%sh<\/code>, or SQL and Spark file readers:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/Volumes\/&lt;catalog&gt;\/&lt;schema&gt;\/&lt;volume&gt;\/&lt;subfolder&gt;\/&lt;file&gt;\n<\/code><\/pre>\n\n\n\n<p>Example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/Volumes\/dev\/bronze\/managed_volume\/files\/emp.csv\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd39 7. 
Example: Copy Files into Managed Volume<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Download a CSV<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>%sh\n# Download the sample file to the driver's local disk\nwget https:\/\/raw.githubusercontent.com\/databricks\/Spar02Hero-Datasets\/main\/emp.csv\n# List the downloaded file and show the current directory\nls -ltr\npwd\n<\/code><\/pre>\n\n\n\n<p><code>wget<\/code> saves the file to the current working directory (shown by <code>pwd<\/code>). Assume it is saved at <code>\/databricks\/driver\/emp.csv<\/code>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create a folder inside the volume<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Create a \"files\" subfolder inside the managed volume\ndbutils.fs.mkdirs(\"\/Volumes\/dev\/bronze\/managed_volume\/files\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Copy the file into the volume<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># The file: prefix refers to the driver's local disk\ndbutils.fs.cp(\"file:\/databricks\/driver\/emp.csv\",\n              \"\/Volumes\/dev\/bronze\/managed_volume\/files\/emp.csv\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Query the file directly<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT * \nFROM csv.`\/Volumes\/dev\/bronze\/managed_volume\/files\/emp.csv`;\n<\/code><\/pre>\n\n\n\n<p>\u2705 Other file formats stored in the volume can be queried the same way, e.g. <code>json.`&lt;path&gt;`<\/code> or <code>parquet.`&lt;path&gt;`<\/code>. This shorthand uses default CSV options, so a header row, if present, is returned as an ordinary data row.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd39 8. 
Create an External Volume<\/h2>\n\n\n\n<p>Now let\u2019s create an <strong>external volume<\/strong> whose <code>LOCATION<\/code> is the path covered by the <strong>external location<\/strong> we created earlier.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE EXTERNAL VOLUME dev.bronze.external_volume\nLOCATION 'abfss:\/\/data@adbwithdata01.dfs.core.windows.net\/adb\/ext_volume'\nCOMMENT 'External volume for semi\/unstructured data';\n<\/code><\/pre>\n\n\n\n<p>Check the details:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>DESCRIBE VOLUME dev.bronze.external_volume;\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Type = <strong>EXTERNAL<\/strong><\/li>\n\n\n\n<li>Location = the <strong>Azure <code>abfss:\/\/<\/code> path<\/strong> given in <code>LOCATION<\/code><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Create a folder inside the external volume<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>dbutils.fs.mkdirs(\"\/Volumes\/dev\/bronze\/external_volume\/files\")\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Copy the file into the external volume<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>dbutils.fs.cp(\"file:\/databricks\/driver\/emp.csv\",\n              \"\/Volumes\/dev\/bronze\/external_volume\/files\/emp.csv\")\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Verify in Azure Portal<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>In the <code>data<\/code> container, navigate to <code>adb\/ext_volume\/files\/emp.csv<\/code>.<\/li>\n\n\n\n<li>The file is stored directly in your storage account, so it is also accessible outside Databricks.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd39 9. 
Drop a Volume<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed Volume<\/strong> \u2192 drops data + metadata.<\/li>\n\n\n\n<li><strong>External Volume<\/strong> \u2192 drops only metadata; files remain in storage.<\/li>\n<\/ul>\n\n\n\n<p>Example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-- Drop external volume\nDROP VOLUME dev.bronze.external_volume;\n\n-- Files still exist in Azure\n<\/code><\/pre>\n\n\n\n<p>If you recreate the volume pointing to the same location:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE EXTERNAL VOLUME dev.bronze.external_volume\nLOCATION 'abfss:\/\/data@adbwithdata01.dfs.core.windows.net\/adb\/ext_volume';\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udc49 The existing files (for example, <code>files\/emp.csv<\/code>) are visible again under <code>\/Volumes\/dev\/bronze\/external_volume<\/code>, because dropping the volume never deleted them.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd39 10. Summary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Volumes<\/strong> allow Databricks to govern <strong>files<\/strong> (structured, semi-structured, and unstructured) under Unity Catalog.<\/li>\n\n\n\n<li><strong>Managed Volume<\/strong> \u2192 storage fully managed by Unity Catalog; data is removed on drop.<\/li>\n\n\n\n<li><strong>External Volume<\/strong> \u2192 points to external storage; dropping it only removes metadata.<\/li>\n\n\n\n<li>Files are always accessed via <code>\/Volumes\/&lt;catalog&gt;\/&lt;schema&gt;\/&lt;volume&gt;\/...<\/code>.<\/li>\n\n\n\n<li>You can <strong>read, write, copy, and query files<\/strong> in volumes with SQL or dbutils.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n","protected":false},"excerpt":{"rendered":"<p>\ud83d\udd39 1. 
Introduction In Databricks, we usually store tabular data in Delta tables (structured data). But what about: \ud83d\udc49 For these, Databricks introduces Volumes, which provide a governed,&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-787","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/787","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=787"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/787\/revisions"}],"predecessor-version":[{"id":788,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/787\/revisions\/788"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=787"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=787"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=787"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}