Step-by-Step Commands: Managed vs. External Tables in Databricks

Below is a complete workflow, with working SQL and Python code, showing how to create, insert into, query, and delete data for both managed and external tables in Databricks. After each step, dbutils.fs commands inspect the underlying file storage, highlighting the distinction between the two table types.


1. Create a Managed Table

SQL:

CREATE TABLE my_catalog.my_schema.managed_example (
    id INT,
    name STRING
);

2. Create an External Table

Suppose the external data location is dbfs:/FileStore/external_table_data/.

SQL:

CREATE TABLE my_catalog.my_schema.external_example (
    id INT,
    name STRING
)
USING DELTA  -- Delta format so the INSERT and DELETE steps below work; Parquet external tables do not support row-level DELETE
LOCATION 'dbfs:/FileStore/external_table_data/';

3. Insert Sample Data (Both Tables)

SQL:

INSERT INTO my_catalog.my_schema.managed_example VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO my_catalog.my_schema.external_example VALUES (3, 'Carol'), (4, 'Dave');

4. Check Table Data using SQL

SQL:

SELECT * FROM my_catalog.my_schema.managed_example;
SELECT * FROM my_catalog.my_schema.external_example;

5. Inspect Underlying Files with dbutils (Python)

# Managed table: Databricks chooses and manages the storage path.
# Find it with DESCRIBE DETAIL, then list the files there.
detail = spark.sql("DESCRIBE DETAIL my_catalog.my_schema.managed_example").collect()[0]
display(dbutils.fs.ls(detail["location"]))

# External table: the path is known, so list its files directly.
display(dbutils.fs.ls("/FileStore/external_table_data/"))

6. Delete Table Data

Delete all rows:

DELETE FROM my_catalog.my_schema.managed_example;
DELETE FROM my_catalog.my_schema.external_example;

Or drop the tables entirely:

DROP TABLE my_catalog.my_schema.managed_example;
DROP TABLE my_catalog.my_schema.external_example;

7. Check Data/Files Again

SQL:

SELECT * FROM my_catalog.my_schema.managed_example;
SELECT * FROM my_catalog.my_schema.external_example;

If you dropped the tables in step 6, both queries now fail with a "table or view not found" error: dropping a table always removes its metadata, whether the table is managed or external.

Python:

# Managed table: the data files are deleted automatically after DROP TABLE.
# If the table still exists, use DESCRIBE DETAIL to find and inspect its location.
# External table: the files persist after DROP TABLE unless you delete them yourself:
dbutils.fs.ls("/FileStore/external_table_data/")

# To remove the leftover external files manually (irreversible):
# dbutils.fs.rm("/FileStore/external_table_data/", recurse=True)

Difference Between Managed and External Table in Databricks

  • Managed Table: Databricks manages both the metadata and the data files. Dropping the table automatically deletes both; the storage path is chosen and managed internally.
  • External Table: You specify the storage path (e.g., dbfs:/FileStore/external_table_data/). Dropping the table deletes only the metadata; the files at the external location remain, and you manage the storage lifecycle yourself.
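These drop semantics can be sketched with a small, self-contained simulation in plain Python on the local filesystem. No Databricks is required, and the `MiniCatalog` class and its method names are purely illustrative, not a Databricks API:

```python
import shutil
import tempfile
from pathlib import Path

class MiniCatalog:
    """Toy metastore illustrating managed vs. external drop semantics."""

    def __init__(self, warehouse: Path):
        self.warehouse = warehouse   # internal storage root for managed tables
        self.tables = {}             # table name -> (path, is_managed)

    def create_managed(self, name: str) -> Path:
        path = self.warehouse / name             # catalog picks the path
        path.mkdir(parents=True)
        self.tables[name] = (path, True)
        return path

    def create_external(self, name: str, location: Path) -> Path:
        location.mkdir(parents=True, exist_ok=True)  # user-supplied path
        self.tables[name] = (location, False)
        return location

    def drop(self, name: str) -> None:
        path, is_managed = self.tables.pop(name)     # metadata always removed
        if is_managed:
            shutil.rmtree(path)                      # managed: files deleted too

root = Path(tempfile.mkdtemp())
cat = MiniCatalog(root / "warehouse")

m = cat.create_managed("managed_example")
e = cat.create_external("external_example", root / "external_table_data")
(m / "part-0.parquet").write_text("1,Alice")
(e / "part-0.parquet").write_text("3,Carol")

cat.drop("managed_example")
cat.drop("external_example")

print(m.exists())   # False: managed files are gone with the table
print(e.exists())   # True: external files survive the DROP
```

Both drops remove the catalog entry, which is why the step-7 SELECTs fail for both tables, yet only the external directory is left on disk afterwards.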

dbutils.fs lets you inspect the file system. With managed tables, files are auto-managed and cleaned up. For external tables, you must manually manage/remove files as needed.


Summary:

  • Use SQL to create, insert, query, and delete tables and data.
  • Use Python (dbutils.fs) to inspect file storage differences.
  • Managed tables are fully handled by Databricks; external tables give you full control of file storage and persistence.
