Databricks: Working with Different Types of Tables

Databricks supports several types of tables, each designed for distinct storage, management, and integration scenarios. The main table types are:

  • Managed Tables: Databricks (Unity Catalog) manages both the metadata and the underlying data. Data files are stored in Databricks-managed storage locations. They support Delta Lake or Apache Iceberg formats. Managed tables are recommended for full lifecycle management and optimized performance.
  • External Tables: Metadata is managed by Databricks, but the data itself is stored in external cloud storage (such as ADLS, S3, or Google Cloud Storage). You specify the storage location explicitly, and you remain responsible for managing the underlying data files outside Databricks. External tables support more formats, including Delta, Parquet, CSV, JSON, Avro, ORC, and text.
  • Foreign Tables: These reference and query data residing in an external system or catalog service, such as Snowflake, BigQuery, or Redshift. Foreign tables are read-only, with only metadata references managed within Databricks.
  • Streaming Tables: Used for continuous data pipelines and real-time streaming scenarios, typically managed by Databricks and implemented with technologies like Delta Live Tables (DLT); see the sketch after this list.
  • Delta Tables: Leveraging the Delta Lake format for ACID transactions, schema evolution, and time travel. Both managed and external tables in Databricks are typically Delta tables.
  • Feature Tables: Specialized tables for storing ML features, supporting both managed and external storage; a minimal example also follows this list.
  • Hive Tables (Legacy): Databricks’ legacy metastore tables, stored in a Hive metastore prior to Unity Catalog adoption. These support formats such as Parquet, ORC, Avro, and CSV.

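As a minimal sketch of a streaming table (the catalog, schema, landing path, and format are placeholder assumptions; streaming tables require a DLT pipeline or a SQL warehouse that supports them):

-- Continuously ingest new JSON files from cloud storage into a streaming table.
CREATE OR REFRESH STREAMING TABLE my_catalog.my_schema.raw_events
AS SELECT *
FROM STREAM read_files(
  'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/landing/events/',
  format => 'json'
);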
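
Feature tables in Unity Catalog are ordinary Delta tables with a primary key constraint; a hedged sketch, with table and column names chosen purely for illustration:

-- A Delta table with a primary key, usable as a Unity Catalog feature table.
CREATE TABLE my_catalog.my_schema.user_features (
    user_id INT NOT NULL,
    avg_purchase_amount DOUBLE,
    days_since_last_order INT,
    CONSTRAINT user_features_pk PRIMARY KEY (user_id)
);
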
Summary Table

| Table Type | Storage/Location | Management | Formats Supported | Use Case |
| --- | --- | --- | --- | --- |
| Managed | Databricks-managed storage (internal) | Unity Catalog | Delta, Iceberg | Full lifecycle, performance, security |
| External | External cloud storage (explicit path) | User | Delta, Parquet, CSV, etc. | Shared or BYO-storage scenarios |
| Foreign | External database/catalog service | Databricks/External | Snowflake, BigQuery, etc. | Federation, read-only queries |
| Streaming | Databricks-managed storage | Unity Catalog | Delta | Real-time/continuous processing |
| Delta | Internal/external with Delta format | Unity Catalog/User | Delta Lake | ACID, time travel, governance |
| Feature | Managed or external | Unity Catalog/User | Delta Lake | ML feature engineering |
| Hive (Legacy) | Managed or external | Hive metastore | Parquet, ORC, Avro, CSV | Pre-Unity Catalog, legacy workflows |

Each table type is tailored for specific data governance, sharing, and analytical requirements in Databricks, with Delta Lake being the default and recommended format for new tables.
Example Queries for Each Table Type in Databricks

Below are example SQL queries for the main table types supported in Databricks: Managed Tables, External Tables, and Foreign Tables. Replace placeholder values (catalog, schema, table names, paths) with your actual environment details.


Managed Table

Managed tables store both metadata and data in Databricks-managed locations.

CREATE TABLE my_catalog.my_schema.my_managed_table (
    id INT,
    name STRING,
    created_date DATE
);
  • Data files are managed and stored by Databricks; no explicit storage path is specified.
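
To confirm where a managed table's data lives, you can insert a row and inspect its metadata; a minimal sketch, reusing the hypothetical table above:

-- Insert a sample row, then inspect table details; for a managed table the
-- 'Location' field points to Unity Catalog-managed storage.
INSERT INTO my_catalog.my_schema.my_managed_table
VALUES (1, 'alpha', current_date());

DESCRIBE TABLE EXTENDED my_catalog.my_schema.my_managed_table;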

External Table

External tables reference data stored in external cloud storage such as ADLS or S3.

CREATE TABLE my_catalog.my_schema.my_external_table (
    id INT,
    value STRING
)
LOCATION 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/external_table/';
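
External tables are not limited to Delta. As a hedged sketch (the bucket path and table name are placeholders), an external table can also be registered over existing Parquet files:

-- Register existing Parquet files as an external table; dropping the table
-- later removes only the metadata, while the files stay in the bucket.
CREATE TABLE my_catalog.my_schema.my_parquet_table
USING PARQUET
LOCATION 's3://my-bucket/data/parquet_table/';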

Foreign Table

Foreign tables allow querying data residing in an external database (e.g., SQL Server, PostgreSQL, Snowflake) by federating connections through Unity Catalog.

-- Querying a foreign table after registering a foreign catalog (read-only access):
SELECT * FROM my_foreign_catalog.my_external_db.my_foreign_table;
  • Foreign tables reside within “foreign catalogs” and provide access to external systems through Lakehouse Federation over secure JDBC connections. You must first create a connection and a foreign catalog in Unity Catalog; queries are read-only. A minimal setup sketch follows.
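
The setup typically looks like the following sketch, assuming a PostgreSQL source; the connection name, host, database, and secret scope are placeholders:

-- Create a connection to the external database (credentials pulled from a secret scope).
CREATE CONNECTION my_postgres_conn TYPE postgresql
OPTIONS (
  host 'db.example.com',
  port '5432',
  user secret('my_scope', 'pg_user'),
  password secret('my_scope', 'pg_password')
);

-- Mirror one external database as a read-only foreign catalog.
CREATE FOREIGN CATALOG my_foreign_catalog
USING CONNECTION my_postgres_conn
OPTIONS (database 'my_external_db');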

Additional Reference: Querying Tables

To query any table type:

SELECT * FROM catalog_name.schema_name.table_name;
  • Databricks recommends using fully qualified three-level names (catalog.schema.table) as a best practice.
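
Because most managed and external tables use the Delta format, earlier versions of a table can also be queried with time travel; a minimal sketch (the table name, version, and timestamp are placeholders):

-- Query the table as of a specific version or point in time (Delta time travel).
SELECT * FROM my_catalog.my_schema.my_managed_table VERSION AS OF 3;
SELECT * FROM my_catalog.my_schema.my_managed_table TIMESTAMP AS OF '2024-01-01';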

Summary Table

| Table Type | Example Query | Comment |
| --- | --- | --- |
| Managed | CREATE TABLE my_catalog.my_schema.my_managed_table (id INT, name STRING, created_date DATE); | No LOCATION needed |
| External | CREATE TABLE my_catalog.my_schema.my_external_table (id INT, value STRING) LOCATION '<path>'; | LOCATION required |
| Foreign | SELECT * FROM my_foreign_catalog.my_external_db.my_foreign_table; | Read-only; query after setup |
