{"id":734,"date":"2025-08-19T12:49:35","date_gmt":"2025-08-19T12:49:35","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=734"},"modified":"2025-08-19T12:49:36","modified_gmt":"2025-08-19T12:49:36","slug":"databricks-components","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-components\/","title":{"rendered":"Databricks Components"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\"><strong>Databricks Components Hierarchy<\/strong><\/h1>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1. Account Level (Top Layer)<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Account Console<\/strong> \u2013 central place to manage everything across workspaces.<\/li>\n\n\n\n<li><strong>Workspaces<\/strong> \u2013 logical environments where teams work.<\/li>\n\n\n\n<li><strong>Unity Catalog (Metastore)<\/strong> \u2013 unified governance across all workspaces.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2. 
Governance &amp; Data Management<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Unity Catalog<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Catalogs<\/strong> \u2192 top-level container for data assets.<\/li>\n\n\n\n<li><strong>Schemas (Databases)<\/strong> \u2192 second level, inside catalogs; hold tables, views, volumes, models, and functions.<\/li>\n\n\n\n<li><strong>Tables<\/strong> \u2192 structured data (Managed \/ External).<\/li>\n\n\n\n<li><strong>Views<\/strong> \u2192 saved queries over one or more tables.<\/li>\n\n\n\n<li><strong>Volumes<\/strong> \u2192 for non-tabular data (images, PDFs, etc.).<\/li>\n\n\n\n<li><strong>Models<\/strong> \u2192 registered ML models.<\/li>\n\n\n\n<li><strong>Functions<\/strong> \u2192 user-defined functions written in SQL or Python.<\/li>\n\n\n\n<li><strong>Lineage<\/strong> \u2192 track where data comes from and how it\u2019s used.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Access Control<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Users<\/strong> \u2192 individual identities.<\/li>\n\n\n\n<li><strong>Groups<\/strong> \u2192 collections of identities, used to manage permissions collectively.<\/li>\n\n\n\n<li><strong>Service Principals<\/strong> \u2192 identities for apps and automation.<\/li>\n\n\n\n<li><strong>ACLs (Access Control Lists)<\/strong> \u2192 fine-grained permissions.<\/li>\n\n\n\n<li><strong>Personal Access Tokens (PATs)<\/strong> \u2192 token-based authentication for API calls.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3. 
Computation &amp; Execution<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Clusters<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>All-purpose clusters<\/strong> \u2192 interactive, shared by users.<\/li>\n\n\n\n<li><strong>Job clusters<\/strong> \u2192 spin up just for a job, then shut down.<\/li>\n\n\n\n<li><strong>Pools<\/strong> \u2192 pre-warmed instances to reduce cluster spin-up time.<\/li>\n\n\n\n<li><strong>Databricks Runtime (DBR)<\/strong> \u2192 core software stack (Spark + optimizations).\n<ul class=\"wp-block-list\">\n<li>DBR for Machine Learning (ML\/DL libraries pre-installed).<\/li>\n\n\n\n<li>Other specialized variants, such as DBR for Genomics (since deprecated).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Jobs &amp; Pipelines<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Jobs UI<\/strong> \u2192 scheduling &amp; automation of notebooks, SQL, and scripts.<\/li>\n\n\n\n<li><strong>Lakeflow Declarative Pipelines<\/strong> \u2192 declarative ETL pipelines (formerly Delta Live Tables) that build and maintain Delta tables.<\/li>\n\n\n\n<li><strong>Workflows<\/strong> \u2192 orchestration of multi-task jobs with dependencies between tasks.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Workloads<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Data Engineering<\/strong> \u2192 ETL, batch jobs.<\/li>\n\n\n\n<li><strong>Data Analytics<\/strong> \u2192 interactive queries, dashboards.<\/li>\n\n\n\n<li><strong>Machine Learning<\/strong> \u2192 model training\/inference.<\/li>\n\n\n\n<li><strong>Streaming<\/strong> \u2192 real-time with Structured Streaming.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>4. 
Developer Interfaces<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Workspace UI<\/strong> \u2192 web interface for notebooks, data, clusters, jobs, and dashboards.<\/li>\n\n\n\n<li><strong>Notebooks<\/strong> \u2192 code in Python, SQL, R, Scala.<\/li>\n\n\n\n<li><strong>Dashboards<\/strong> \u2192 visual insights.<\/li>\n\n\n\n<li><strong>Git Folders (Repos)<\/strong> \u2192 Git-based version control integration.<\/li>\n\n\n\n<li><strong>Libraries<\/strong> \u2192 external or custom packages attached to compute.<\/li>\n\n\n\n<li><strong>Catalog Explorer<\/strong> \u2192 browse data assets.<\/li>\n\n\n\n<li><strong>APIs &amp; Tools<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>REST API<\/strong> \u2192 programmatic access.<\/li>\n\n\n\n<li><strong>SQL REST API<\/strong> \u2192 run SQL statements programmatically over HTTP.<\/li>\n\n\n\n<li><strong>CLI<\/strong> \u2192 Databricks command-line tool.<\/li>\n\n\n\n<li><strong>dbutils<\/strong> \u2192 utility commands inside notebooks.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5. 
Data &amp; AI Layers<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Delta Lake (Default Table Format)<\/strong>\n<ul class=\"wp-block-list\">\n<li>Delta Tables<\/li>\n\n\n\n<li>Delta Transaction Logs (ACID)<\/li>\n\n\n\n<li>Time Travel, Schema Evolution<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Lakehouse Storage Pattern<\/strong>\n<ul class=\"wp-block-list\">\n<li>Bronze \u2192 Raw data<\/li>\n\n\n\n<li>Silver \u2192 Clean\/curated data<\/li>\n\n\n\n<li>Gold \u2192 Business-ready data<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>AI &amp; ML (Mosaic AI)<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>MLflow<\/strong> \u2192 experiment tracking, model registry.<\/li>\n\n\n\n<li><strong>Feature Store<\/strong> \u2192 reusable features for ML.<\/li>\n\n\n\n<li><strong>Generative AI (LLMs)<\/strong> \u2192 foundation models, fine-tuning.<\/li>\n\n\n\n<li><strong>AI Playground<\/strong> \u2192 test LLMs interactively.<\/li>\n\n\n\n<li><strong>Model Serving<\/strong> \u2192 REST API for deploying models.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>\u2705 <strong>In one line:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Account Console<\/strong> (top) \u2192 <strong>Workspaces<\/strong> \u2192 <strong>Unity Catalog (Governance)<\/strong> \u2192 <strong>Data Assets<\/strong> (Tables, Schemas, Models, Volumes) \u2192 <strong>Compute (Clusters, Jobs, Pipelines)<\/strong> \u2192 <strong>Developer Interfaces (Notebooks, APIs, CLI)<\/strong> \u2192 <strong>AI\/ML &amp; Analytics Tools<\/strong>.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Databricks Components Hierarchy 1. Account Level (Top Layer) 2. Governance &amp; Data Management 3. Computation &amp; Execution 4. Developer Interfaces 5. 
Data &amp; AI Layers \u2705 In&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-734","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/734","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=734"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/734\/revisions"}],"predecessor-version":[{"id":735,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/734\/revisions\/735"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=734"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=734"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=734"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}