Databricks: Working with Different Types of Tables

Databricks supports several types of tables, each designed for distinct storage, management, and integration scenarios. The main table types are:

  • Managed Tables: Databricks (Unity Catalog) manages both the metadata and the underlying data. Data files are stored in Databricks-managed storage locations. They support Delta Lake or Apache Iceberg formats. Managed tables are recommended for full lifecycle management and optimized performance.
  • External Tables: Metadata is managed by Databricks, but the data itself is stored in external cloud storage (such as ADLS, S3, or Google Cloud Storage). You specify the location explicitly, and responsibility for managing the data outside Databricks remains with you. External tables support more formats including Delta, Parquet, CSV, JSON, Avro, ORC, and text.
  • Foreign Tables: These reference and query data residing in an external system or catalog service, such as Snowflake, BigQuery, or Redshift. Foreign tables are read-only, with only metadata references managed within Databricks.
  • Streaming Tables: Used for continuous data pipelines and real-time streaming scenarios, typically managed by Databricks and implemented with technologies like Delta Live Tables (DLT).
  • Delta Tables: Tables stored in the Delta Lake format, which provides ACID transactions, schema evolution, and time travel. Both managed and external tables in Databricks are typically Delta tables.
  • Feature Tables: Specialized tables for storing ML features, supporting both managed and external storage.
  • Hive Tables (Legacy): Databricks’ legacy metastore tables, stored in a Hive metastore prior to Unity Catalog adoption. These support formats such as Parquet, ORC, Avro, and CSV.
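Streaming tables can be declared directly in Databricks SQL. The sketch below is illustrative, not a production recipe: the catalog, schema, volume path, and JSON format are assumptions, and the workspace must support streaming tables (DLT/serverless SQL warehouses).

```sql
-- Hedged sketch: declare a streaming table that incrementally ingests
-- newly arriving JSON files. The path and names are placeholders.
CREATE OR REFRESH STREAMING TABLE my_catalog.my_schema.events_stream
AS SELECT *
FROM STREAM read_files(
  '/Volumes/my_catalog/my_schema/landing/events/',
  format => 'json'
);
```

Each refresh processes only files that arrived since the previous run, which is what distinguishes a streaming table from re-running a plain CREATE TABLE AS SELECT.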

Summary Table

| Table Type | Storage/Location | Management | Formats Supported | Use Case |
|---|---|---|---|---|
| Managed | Databricks-managed storage (internal) | Unity Catalog | Delta, Iceberg | Full lifecycle, performance, security |
| External | External cloud storage (explicit path) | User | Delta, Parquet, CSV, etc. | Shared or BYO-storage scenarios |
| Foreign | External database/catalog service | Databricks/External | Snowflake, BigQuery, etc. (source systems) | Federation, read-only queries |
| Streaming | Databricks-managed storage | Unity Catalog | Delta | Real-time/continuous processing |
| Delta | Internal/external with Delta format | Unity Catalog/User | Delta Lake | ACID, time travel, governance |
| Feature | Managed or external | Unity Catalog/User | Delta Lake | ML feature engineering |
| Hive (Legacy) | Managed or external | Hive metastore | Parquet, ORC, Avro, CSV | Pre-Unity Catalog, legacy workflows |

Each table type is tailored for specific data governance, sharing, and analytical requirements in Databricks, with Delta Lake being the default and recommended format for new tables.
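Because Delta is the default format, features such as time travel are available on new tables out of the box. A brief sketch, assuming a table `my_catalog.my_schema.my_table` that already has multiple committed versions (the version number and timestamp below are placeholders):

```sql
-- Inspect the table's commit history (one row per version).
DESCRIBE HISTORY my_catalog.my_schema.my_table;

-- Query an earlier state by version number or by timestamp.
SELECT * FROM my_catalog.my_schema.my_table VERSION AS OF 3;
SELECT * FROM my_catalog.my_schema.my_table TIMESTAMP AS OF '2024-01-01';
```

This is useful for auditing changes or recovering from an accidental overwrite without restoring from backup.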
Example Queries for Each Table Type in Databricks

Below are example SQL queries for the main table types supported in Databricks: Managed Tables, External Tables, and Foreign Tables. Replace placeholder values (catalog, schema, table names, paths) with your actual environment details.


Managed Table

Managed tables store both metadata and data in Databricks-managed locations.

```sql
CREATE TABLE my_catalog.my_schema.my_managed_table (
    id INT,
    name STRING,
    created_date DATE
);
```
  • Data and metadata are stored and managed by Databricks; no explicit storage path is needed.

External Table

External tables reference data stored in external cloud storage such as ADLS or S3.

```sql
CREATE TABLE my_catalog.my_schema.my_external_table (
    id INT,
    value STRING
)
LOCATION 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/external_table/';
```
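A practical consequence of user-managed storage: dropping an external table removes only the catalog entry, not the underlying files, so the same data can be re-registered later. A sketch reusing the illustrative path above (this assumes the files at that location are in Delta format, so the schema can be inferred):

```sql
-- Removes the metadata only; the data files under LOCATION remain
-- in cloud storage and are not deleted.
DROP TABLE my_catalog.my_schema.my_external_table;

-- Re-register the existing data at the same path.
CREATE TABLE my_catalog.my_schema.my_external_table
LOCATION 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/external_table/';
```

By contrast, dropping a managed table deletes both the metadata and the data files.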

Foreign Table

Foreign tables allow querying data residing in an external database (e.g., SQL Server, PostgreSQL, Snowflake) by federating connections through Unity Catalog.

```sql
-- Querying a foreign table after registering a foreign catalog (read-only access):
SELECT * FROM my_foreign_catalog.my_external_db.my_foreign_table;
```
  • Foreign tables reside within “foreign catalogs” and provide read-only access to external systems via secure JDBC connections and catalog federation. You must first establish a connection and foreign catalog mapping in Unity Catalog before these queries will resolve.
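The setup steps can also be expressed in SQL via Lakehouse Federation. A hedged sketch, assuming a PostgreSQL source; the connection name, host, credentials, and database are all placeholders you would replace with your own (and credentials should normally come from a secret, not a literal):

```sql
-- 1. Create a Unity Catalog connection to the external system.
CREATE CONNECTION my_pg_connection TYPE postgresql
OPTIONS (
  host 'db.example.com',
  port '5432',
  user 'reader',
  password 'REDACTED'
);

-- 2. Mirror an external database as a read-only foreign catalog.
CREATE FOREIGN CATALOG my_foreign_catalog
USING CONNECTION my_pg_connection
OPTIONS (database 'sales');
```

After this, tables in the source database appear under `my_foreign_catalog` and can be queried with standard three-level names.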

Additional Reference: Querying Tables

To query any table type:

```sql
SELECT * FROM catalog_name.schema_name.table_name;
```
  • Databricks recommends using fully qualified (three-level) table names as a best practice.

Summary Table

| Table Type | Example Query | Comment |
|---|---|---|
| Managed | `CREATE TABLE my_catalog.my_schema.my_managed_table (id INT, name STRING, created_date DATE);` | No LOCATION needed |
| External | `CREATE TABLE my_catalog.my_schema.my_external_table (id INT, value STRING) LOCATION '<path>';` | LOCATION required |
| Foreign | `SELECT * FROM my_foreign_catalog.my_external_db.my_foreign_table;` | Read-only; query after setup |
