Below is a complete workflow, with working SQL and Python code, demonstrating how to create, insert into, read, and delete data for both managed and external tables in Databricks. After each step, dbutils.fs commands are used to inspect the underlying file storage, highlighting the distinction between the two table types.
1. Create a Managed Table
SQL:
CREATE TABLE my_catalog.my_schema.managed_example (
  id INT,
  name STRING
);
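The same DDL can be issued from Python. As a small illustration, here is a sketch of a helper that renders the CREATE TABLE statement from a table name and a column mapping; the helper (create_table_sql) is hypothetical, not a Databricks API, and running the result requires a notebook with a spark session:

```python
def create_table_sql(table: str, columns: dict) -> str:
    """Render a managed-table CREATE TABLE statement (no LOCATION clause)."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE {table} (\n  {cols}\n)"

ddl = create_table_sql("my_catalog.my_schema.managed_example",
                       {"id": "INT", "name": "STRING"})
# In a Databricks notebook you would then run: spark.sql(ddl)
```

Because no LOCATION is given, Databricks treats the table as managed and picks the storage path itself.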
2. Create an External Table
Suppose the external data lives at dbfs:/FileStore/external_table_data/ (the DBFS URI form; the same location appears as /dbfs/FileStore/external_table_data/ through the local file mount).
SQL:
CREATE TABLE my_catalog.my_schema.external_example (
  id INT,
  name STRING
)
USING PARQUET
LOCATION 'dbfs:/FileStore/external_table_data/';
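One frequent source of confusion is the two notations for the same DBFS location: SQL LOCATION clauses and dbutils.fs take the URI form (dbfs:/...), while local file APIs (e.g. Python's open) use the FUSE mount (/dbfs/...). A minimal sketch of a converter between the two forms, using the example path above (the helper name to_dbfs_uri is an assumption, not a Databricks API):

```python
def to_dbfs_uri(path: str) -> str:
    """Convert a /dbfs/... FUSE path to the dbfs:/... URI form.

    Paths already in URI form (or other paths) are returned unchanged.
    """
    prefix = "/dbfs/"
    if path.startswith(prefix):
        return "dbfs:/" + path[len(prefix):]
    return path

print(to_dbfs_uri("/dbfs/FileStore/external_table_data/"))
# dbfs:/FileStore/external_table_data/
```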
3. Insert Sample Data (Both Tables)
SQL:
INSERT INTO my_catalog.my_schema.managed_example VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO my_catalog.my_schema.external_example VALUES (3, 'Carol'), (4, 'Dave');
4. Check Table Data using SQL
SQL:
SELECT * FROM my_catalog.my_schema.managed_example;
SELECT * FROM my_catalog.my_schema.external_example;
5. Inspect Underlying Files with dbutils (Python)
# Managed table: Databricks chooses the path; find it with DESCRIBE DETAIL
display(spark.sql("DESCRIBE DETAIL my_catalog.my_schema.managed_example"))
# External table: the path is known, so list its files directly
dbutils.fs.ls("dbfs:/FileStore/external_table_data/")
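To inspect a managed table's files programmatically, you can pull the location field out of the DESCRIBE DETAIL result and pass it to dbutils.fs.ls. A sketch, where table_location is a hypothetical helper and the spark/dbutils lines assume a Databricks notebook:

```python
def table_location(detail_rows):
    """Extract the storage path from a DESCRIBE DETAIL result.

    detail_rows: list of dict-like rows (DESCRIBE DETAIL returns one row).
    """
    return detail_rows[0]["location"]

# In a Databricks notebook:
# detail = spark.sql(
#     "DESCRIBE DETAIL my_catalog.my_schema.managed_example").collect()
# path = table_location([row.asDict() for row in detail])
# display(dbutils.fs.ls(path))
```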
6. Delete Table Data
Delete all rows:
DELETE FROM my_catalog.my_schema.managed_example;
DELETE FROM my_catalog.my_schema.external_example;
Or drop the tables entirely:
DROP TABLE my_catalog.my_schema.managed_example;
DROP TABLE my_catalog.my_schema.external_example;
7. Check Data/Files Again
SQL:
SELECT * FROM my_catalog.my_schema.managed_example;
SELECT * FROM my_catalog.my_schema.external_example;
-- If only the rows were deleted, these return empty results.
-- If the tables were dropped, these fail with a table-not-found error.
Python:
# After DROP, a managed table's files are deleted by Databricks.
# (Before dropping, use DESCRIBE DETAIL to find and inspect its location.)
# An external table's files persist after DROP until you remove them yourself:
dbutils.fs.ls("dbfs:/FileStore/external_table_data/")
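The cleanup rule can be captured in a tiny decision helper: only external tables leave files behind that you must remove yourself. A sketch (needs_manual_cleanup is a hypothetical helper; the table-type strings MANAGED/EXTERNAL match what DESCRIBE TABLE EXTENDED reports, and the dbutils.fs.rm line assumes a Databricks notebook):

```python
def needs_manual_cleanup(table_type: str) -> bool:
    """EXTERNAL tables keep their files after DROP; MANAGED tables do not."""
    return table_type.strip().upper() == "EXTERNAL"

# In Databricks, after DROP TABLE, using the example location from above:
# if needs_manual_cleanup("EXTERNAL"):
#     dbutils.fs.rm("dbfs:/FileStore/external_table_data/", recurse=True)
```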
Difference Between Managed and External Table in Databricks
- Managed Table: Databricks manages both metadata and files. Dropping the table deletes both the metadata and the data files from storage automatically. Path is internal.
- External Table: You specify the storage path. Dropping the table deletes only the metadata; the files at the external location remain, and you manage their lifecycle yourself. The path is external (e.g., dbfs:/FileStore/external_table_data/).
dbutils.fs lets you inspect the file system. With managed tables, files are auto-managed and cleaned up. For external tables, you must manually manage/remove files as needed.
Summary:
- Use SQL to create, insert, query, and delete tables and data.
- Use Python (dbutils.fs) to inspect file storage differences.
- Managed tables are fully handled by Databricks; external tables give you full control over file storage and persistence.