{"id":425,"date":"2025-08-11T15:07:17","date_gmt":"2025-08-11T15:07:17","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=425"},"modified":"2025-08-11T15:07:18","modified_gmt":"2025-08-11T15:07:18","slug":"databricks-step-by-step-commands-managed-vs-external-table-in-databricks","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-step-by-step-commands-managed-vs-external-table-in-databricks\/","title":{"rendered":"Databricks: Step-by-Step Commands: Managed vs. External Table in Databricks"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<p>Below is a complete workflow\u2014with working SQL and Python code\u2014demonstrating how to create, manage, insert, read, and delete data for both <strong>Managed<\/strong> and <strong>External<\/strong> tables in Databricks. After each step, commands using <code>dbutils.fs<\/code> are used to check underlying file storage differences, highlighting the distinction between managed and external tables.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">1. <strong>Create a Managed Table<\/strong><\/h2>\n\n\n\n<p><strong>SQL:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sql<code>CREATE TABLE my_catalog.my_schema.managed_example (\n    id INT,\n    name STRING\n);\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">2. <strong>Create an External Table<\/strong><\/h2>\n\n\n\n<p>Suppose the external data location is <code>\/dbfs\/FileStore\/external_table_data\/<\/code>.<\/p>\n\n\n\n<p><strong>SQL:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sql<code>CREATE TABLE my_catalog.my_schema.external_example (\n    id INT,\n    name STRING\n)\nUSING PARQUET\nLOCATION '\/dbfs\/FileStore\/external_table_data\/';\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. <strong>Insert Sample Data (Both Tables)<\/strong><\/h2>\n\n\n\n<p><strong>SQL:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sql<code>INSERT INTO my_catalog.my_schema.managed_example VALUES (1, 'Alice'), (2, 'Bob');\nINSERT INTO my_catalog.my_schema.external_example VALUES (3, 'Carol'), (4, 'Dave');\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">4. <strong>Check Table Data using SQL<\/strong><\/h2>\n\n\n\n<p><strong>SQL:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sql<code>SELECT * FROM my_catalog.my_schema.managed_example;\nSELECT * FROM my_catalog.my_schema.external_example;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. <strong>Inspect Underlying Files with dbutils (Python)<\/strong><\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">python<code><em># Managed table storage location (internally managed, path can be found using SQL below)<\/em>\ndisplay(spark.sql(\"DESCRIBE DETAIL my_catalog.my_schema.managed_example\"))\n\n<em># External table (path known, list files directly)<\/em>\ndbutils.fs.ls(\"\/FileStore\/external_table_data\/\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">6. <strong>Delete Table Data<\/strong><\/h2>\n\n\n\n<p><strong>Delete all rows:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sql<code>DELETE FROM my_catalog.my_schema.managed_example;\nDELETE FROM my_catalog.my_schema.external_example;\n<\/code><\/pre>\n\n\n\n<p><strong>Or drop the tables entirely:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sql<code>DROP TABLE my_catalog.my_schema.managed_example;\nDROP TABLE my_catalog.my_schema.external_example;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">7. <strong>Check Data\/Files Again<\/strong><\/h2>\n\n\n\n<p><strong>SQL:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sql<code>SELECT * FROM my_catalog.my_schema.managed_example;\nSELECT * FROM my_catalog.my_schema.external_example;\n<\/code><\/pre>\n\n\n\n<p><strong>Python:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">python<code><em># Managed table storage usually deleted after DROP (internally managed).<\/em>\n<em># If not dropped, use DESCRIBE DETAIL to review location and check DBFS.<\/em>\n<em># External table storage persists after DROP unless deleted manually:<\/em>\ndbutils.fs.ls(\"\/FileStore\/external_table_data\/\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Difference Between Managed and External Table in Databricks<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed Table:<\/strong> Databricks manages both metadata and files. Dropping the table deletes both the metadata and the data files from storage automatically. Path is internal.<\/li>\n\n\n\n<li><strong>External Table:<\/strong> You specify a storage path. Dropping the table only deletes the metadata; files in the specified external location remain. You manage storage lifecycle yourself. Path is externalized (e.g., <code>\/dbfs\/FileStore\/external_table_data\/<\/code>).<\/li>\n<\/ul>\n\n\n\n<p><strong>dbutils.fs<\/strong> lets you inspect the file system. With managed tables, files are auto-managed and cleaned up. For external tables, you must manually manage\/remove files as needed.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>Summary:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use SQL to create, insert, query, and delete tables and data.<\/li>\n\n\n\n<li>Use Python (<code>dbutils.fs<\/code>) to inspect file storage differences.<\/li>\n\n\n\n<li>Managed tables are fully handled by Databricks; external tables give you full control of file storage and persistence.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Below is a complete workflow\u2014with working SQL and Python code\u2014demonstrating how to create, manage, insert, read, and delete data for both Managed and External tables in Databricks&#8230;. <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-425","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/425","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=425"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/425\/revisions"}],"predecessor-version":[{"id":426,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/425\/revisions\/426"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=425"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=425"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=425"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}