Databricks supports several types of tables, each designed for distinct storage, management, and integration scenarios. The main table types are:
- Managed Tables: Databricks (Unity Catalog) manages both the metadata and the underlying data. Data files are stored in Databricks-managed storage locations. They support Delta Lake or Apache Iceberg formats. Managed tables are recommended for full lifecycle management and optimized performance.
- External Tables: Metadata is managed by Databricks, but the data itself is stored in external cloud storage (such as ADLS, S3, or Google Cloud Storage). You specify the storage location explicitly and remain responsible for managing the underlying data files. External tables support additional formats, including Delta, Parquet, CSV, JSON, Avro, ORC, and text.
- Foreign Tables: These reference and query data residing in an external system or catalog service, such as Snowflake, BigQuery, or Redshift. Foreign tables are read-only, with only metadata references managed within Databricks.
- Streaming Tables: Used for continuous data pipelines and real-time streaming scenarios, typically managed by Databricks and implemented with technologies like Delta Live Tables (DLT); a sketch appears after this list.
- Delta Tables: Tables that use the Delta Lake format for ACID transactions, schema evolution, and time travel (a time travel query appears in the managed table example below). Both managed and external tables in Databricks are typically Delta tables.
- Feature Tables: Specialized tables for storing ML features, supporting both managed and external storage; a sketch also appears after this list.
- Hive Tables (Legacy): Databricks’ legacy metastore tables, stored in a Hive metastore prior to Unity Catalog adoption. These support formats such as Parquet, ORC, Avro, and CSV.
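A minimal sketch of a streaming table, assuming Databricks SQL streaming table syntax; the catalog, schema, table name, source path, and format are placeholders:
```sql
-- Continuously ingest new files from cloud storage into a streaming table.
-- All names and the source path below are placeholders.
CREATE OR REFRESH STREAMING TABLE my_catalog.my_schema.my_streaming_table
AS SELECT *
FROM STREAM read_files(
  'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/landing/',
  format => 'json'
);
```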
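A minimal sketch of a feature table, assuming the Unity Catalog convention that a Delta table with a primary key constraint can serve as a feature table; the table and column names are hypothetical:
```sql
-- A Delta table with a primary key constraint, usable as a feature table in
-- Unity Catalog. Table and column names are placeholders.
CREATE TABLE my_catalog.my_schema.customer_features (
  customer_id INT NOT NULL,
  lifetime_value DOUBLE,
  days_since_last_purchase INT,
  CONSTRAINT customer_features_pk PRIMARY KEY (customer_id)
);
```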
Summary Table
Table Type | Storage/Location | Management | Formats Supported | Use Case |
---|---|---|---|---|
Managed | Databricks-managed storage (internal) | Unity Catalog | Delta, Iceberg | Full lifecycle, performance, security |
External | External cloud storage (explicit path) | User | Delta, Parquet, CSV, etc. | Shared or BYO-storage scenarios |
Foreign | External database/catalog service | Databricks/External | Source-system formats (Snowflake, BigQuery, etc.) | Federation, read-only queries |
Streaming | Databricks-managed storage | Unity Catalog | Delta | Real-time/continuous processing |
Delta | Internal/External with Delta format | Unity Catalog/User | Delta Lake | ACID, time travel, governance |
Feature | Managed or external | Unity Catalog/User | Delta Lake | ML feature engineering |
Hive (Legacy) | Managed or external | Hive metastore | Parquet, ORC, Avro, CSV | Pre-Unity Catalog, legacy workflows |
Each table type is tailored for specific data governance, sharing, and analytical requirements in Databricks, with Delta Lake being the default and recommended format for new tables.
Example Queries for Each Table Type in Databricks
Below are example SQL queries for the main table types supported in Databricks: Managed Tables, External Tables, and Foreign Tables. Replace placeholder values (catalog, schema, table names, paths) with your actual environment details.
Managed Table
Managed tables store both metadata and data in Databricks-managed locations.
```sql
CREATE TABLE my_catalog.my_schema.my_managed_table (
  id INT,
  name STRING,
  created_date DATE
);
```
- Data is managed and stored by Databricks; no explicit storage path is needed.
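Because managed tables default to the Delta Lake format, they also support the time travel mentioned earlier. A minimal sketch against the table just created; the version number and timestamp are placeholders:
```sql
-- Query earlier snapshots of the Delta table created above (time travel).
SELECT * FROM my_catalog.my_schema.my_managed_table VERSION AS OF 3;
SELECT * FROM my_catalog.my_schema.my_managed_table TIMESTAMP AS OF '2024-06-01';
```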
External Table
External tables reference data stored in external cloud storage such as ADLS or S3.
```sql
CREATE TABLE my_catalog.my_schema.my_external_table (
  id INT,
  value STRING
)
LOCATION 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/external_table/';
```
- The `LOCATION` clause points to the external storage path. Without a `USING` clause, the table is created in Delta format; add `USING PARQUET`, `USING CSV`, and so on for other formats.
Foreign Table
Foreign tables allow querying data residing in an external database (e.g., SQL Server, PostgreSQL, Snowflake) by federating connections through Unity Catalog.
```sql
-- Querying a foreign table after registering a foreign catalog (read-only access):
SELECT * FROM my_foreign_catalog.my_external_db.my_foreign_table;
```
- Foreign tables reside within “foreign catalogs” and provide read-only access to external systems through secure connections and catalog federation. You must first create a connection and a foreign catalog in Unity Catalog; a sketch of that setup follows.
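A minimal sketch of that setup, assuming a PostgreSQL source; the connection name, host, credentials, secret scope, and database name are placeholders, and the OPTIONS differ by source type:
```sql
-- 1. Create a connection to the external database (placeholder values;
--    credentials are read from a secret scope rather than stored in plain text).
CREATE CONNECTION my_postgres_connection TYPE postgresql
OPTIONS (
  host 'db.example.com',
  port '5432',
  user 'reader',
  password secret('my_scope', 'postgres_password')
);

-- 2. Create a foreign catalog that exposes the external database in Unity Catalog.
CREATE FOREIGN CATALOG my_foreign_catalog
USING CONNECTION my_postgres_connection
OPTIONS (database 'sales');
```
After this setup, foreign tables can be queried with the fully qualified SELECT shown above.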
Additional Reference: Querying Tables
To query any table type:
```sql
SELECT * FROM catalog_name.schema_name.table_name;
```
- Databricks recommends using fully qualified table names as a best practice.
Summary Table
Table Type | Example Query | Comment |
---|---|---|
Managed | CREATE TABLE my_catalog.my_schema.my_managed_table (id INT, name STRING, created_date DATE); | No LOCATION needed |
External | CREATE TABLE my_catalog.my_schema.my_external_table (id INT, value STRING) LOCATION '<path>'; | LOCATION required |
Foreign | SELECT * FROM my_foreign_catalog.my_external_db.my_foreign_table; | Read-only; query after setup |