File Storage Options on Databricks

The main file storage options in Databricks are:

  • Unity Catalog Volumes: Recommended for storing structured, semi-structured, and unstructured data, libraries, build artifacts, and configuration files. Offers robust governance, fine-grained access control, cross-workspace accessibility, and direct cloud storage integration (S3, Azure ADLS, GCS). Suitable for large files and supports audit logging.
  • Workspace Files: Intended for notebooks, SQL queries, source code files, and small project data files (usually <500MB). Access and permissions are limited to a single workspace. Useful for temporary or development artifacts; supports Git folder integration for version control.
  • Databricks File System (DBFS): Distributed file system abstraction layered over cloud object storage. Provides a unified, Unix-like interface for all clusters; holds files in directories such as /FileStore, /databricks-datasets, and /user/hive/warehouse. DBFS is not recommended for new workflows due to limited security controls (all workspace users have access) and governance features.
  • Direct Cloud Object Storage Access: Use native protocols (such as abfss:// for Azure, s3:// for AWS, gs:// for Google Cloud) to read/write files directly in object stores—usually governed via Unity Catalog external locations.
  • External Locations (via Unity Catalog): Securely register cloud storage locations for creating and governing external tables and file access. Best practice for production systems needing strong security and compliance.
  • Mount Points (/mnt, legacy): Old method of mounting external storage into the DBFS namespace (e.g., S3 buckets, ADLS containers). Deprecated in favor of Unity Catalog volumes and direct access.
| Option | Best Use Case | Security/Governance | Notes |
|---|---|---|---|
| Unity Catalog Volumes | Data, artifacts across workspaces | Strong | Recommended, scalable |
| Workspace Files | Notebooks, code, small files | Workspace ACLs | Limited to one workspace |
| DBFS Root & Folders | Legacy, temp, example datasets | Basic | Not recommended for prod |
| Direct Cloud Storage (abfss/s3/gs) | High-performance, large datasets | Governed by UC | Preferred for new workloads |
| External Locations | Tables/files on cloud storage | Strong (via UC) | Full audit, compliance |
| Mount Points | Legacy scripts, migration | Basic | Deprecated |

For new and production-grade workloads, prefer Unity Catalog volumes, external locations, or direct cloud storage access; use workspace files for development and temporary needs. Avoid DBFS root and mount points for sensitive or critical data.
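As a quick illustration of this guidance, a small helper can classify a path by storage family using the prefix conventions described above. This is a hypothetical, pure-Python sketch; the function name and category labels are illustrative, not part of any Databricks API:

```python
# Hypothetical helper: classify a Databricks path by storage family,
# using the path/URI prefixes covered in this article.
def classify_storage_path(path: str) -> str:
    if path.startswith("/Volumes/"):
        return "unity_catalog_volume"      # preferred for new workloads
    if path.startswith("/Workspace/"):
        return "workspace_files"           # dev/temporary artifacts
    if path.startswith(("dbfs:/", "/dbfs/", "/FileStore/")):
        return "dbfs"                      # legacy, avoid for prod
    if path.startswith("/mnt/"):
        return "mount_point_legacy"        # deprecated
    if path.startswith(("abfss://", "s3://", "gs://")):
        return "direct_cloud_storage"      # govern via UC external locations
    return "unknown"

print(classify_storage_path("/Volumes/my_catalog/my_schema/my_volume/example.txt"))
# → unity_catalog_volume
```

A check like this can be handy in migration scripts that flag legacy DBFS or mount-point paths for cleanup.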

Example of Each File Storage Option in Databricks

Here are practical examples for each main Databricks file storage option, demonstrating how you’d store, access, or manage files using these systems.


1. Unity Catalog Volumes

Create a volume with SQL, then write a file to it with Python:

```sql
-- Create a Unity Catalog volume (requires appropriate privileges)
CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.my_volume
COMMENT 'Example volume';
```

```python
# Write to the volume from a notebook via the /Volumes path
with open('/Volumes/my_catalog/my_schema/my_volume/example.txt', 'w') as f:
    f.write('Unity Catalog Volume Example')
```
  • Access File: /Volumes/my_catalog/my_schema/my_volume/example.txt

2. Workspace Files

Upload or create a small file in the workspace (notebook or UI):

  • In the Databricks UI, go to Workspace > Files and upload demo.txt.
  • Access File: Use in notebooks as /Workspace/Files/demo.txt
```python
# Read a workspace file from a notebook
with open('/Workspace/Files/demo.txt', 'r') as f:
    print(f.read())
```

3. Databricks File System (DBFS)

Store and read a file in DBFS:

```python
# Save a file to DBFS (e.g., /FileStore); the final argument overwrites any existing file
dbutils.fs.put("/FileStore/my_example.txt", "DBFS example data", True)

# Read the file back
display(dbutils.fs.head('/FileStore/my_example.txt'))
```
  • Access File: dbfs:/FileStore/my_example.txt
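On cluster drivers, DBFS is also exposed through a FUSE mount at /dbfs/, so the same file can be reached with ordinary local file APIs. A small sketch converting the URI form to the FUSE path (the helper name is illustrative):

```python
# Illustrative helper: map a dbfs:/ URI to the /dbfs FUSE path usable
# with local file APIs (open(), shutil, etc.) on the driver.
def dbfs_uri_to_fuse(uri: str) -> str:
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a DBFS URI: {uri}")
    return "/dbfs/" + uri[len(prefix):]

print(dbfs_uri_to_fuse("dbfs:/FileStore/my_example.txt"))
# → /dbfs/FileStore/my_example.txt
```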

4. Direct Cloud Object Storage Access (abfss, s3, gs)

Read a file directly from Azure Data Lake Storage Gen2 (example for abfss):

```python
# Load a CSV directly from ADLS Gen2
df = spark.read.csv("abfss://mycontainer@mystorageaccount.dfs.core.windows.net/mydata/myfile.csv")
df.show()
```
  • Access File: abfss://..., s3://..., or gs://...
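The abfss URI encodes the container, storage account, and object path in a fixed shape; a small sketch assembling one (the function name is illustrative, not a Databricks API):

```python
# Illustrative helper: assemble an ADLS Gen2 (abfss) URI from its parts.
# Format: abfss://<container>@<account>.dfs.core.windows.net/<path>
def abfss_uri(container: str, account: str, path: str) -> str:
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

print(abfss_uri("mycontainer", "mystorageaccount", "mydata/myfile.csv"))
# → abfss://mycontainer@mystorageaccount.dfs.core.windows.net/mydata/myfile.csv
```

Keeping URI construction in one place like this avoids typos in account and container names scattered across notebooks.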

5. External Locations (Unity Catalog)

Create an external location, then create a table from it:

```sql
-- Register the external location (an admin must first set up the storage credential)
CREATE EXTERNAL LOCATION my_ext_loc
  URL 'abfss://container@account.dfs.core.windows.net/folder/'
  WITH (STORAGE CREDENTIAL my_credential);

-- Create an external table over the registered location
-- (assumes data already exists at the path; otherwise specify a column schema)
CREATE TABLE my_catalog.my_schema.ext_table
LOCATION 'abfss://container@account.dfs.core.windows.net/folder/data/';
```
  • Access Table: Governed by Unity Catalog, referencing external cloud storage.

6. Mount Points (/mnt, legacy)

(Deprecated; not for new projects, but still seen in older scripts)

```python
# Mount external storage (legacy pattern; prefer Unity Catalog volumes)
dbutils.fs.mount(
  source = "wasbs://container@account.blob.core.windows.net/",
  mount_point = "/mnt/my_mount",
  extra_configs = {"fs.azure.account.key.account.blob.core.windows.net": "key"}
)

# Access files through the mount
dbutils.fs.ls("/mnt/my_mount/data/")

# Unmount when no longer needed
dbutils.fs.unmount("/mnt/my_mount")
```
  • Access File: /mnt/my_mount/data/

Summary Table: Examples

| Storage Option | Example Path/Usage | Code/SQL Example |
|---|---|---|
| Unity Catalog Volume | /Volumes/my_catalog/my_schema/my_volume/file | Create volume, Python |
| Workspace Files | /Workspace/Files/demo.txt | Python |
| DBFS | dbfs:/FileStore/my_example.txt | dbutils.fs API |
| Direct Cloud Storage | abfss://container@account/..., s3://... | spark.read, SQL |
| External Locations (UC) | Registered cloud path in Unity Catalog | CREATE EXTERNAL LOCATION |
| Mount Points (/mnt) | /mnt/my_mount/data/ | dbutils.fs.mount |

Each storage solution fits distinct needs for governance, sharing, scalability, and compatibility in Databricks workflows.
