{"id":793,"date":"2025-08-22T14:37:22","date_gmt":"2025-08-22T14:37:22","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=793"},"modified":"2025-08-22T14:37:23","modified_gmt":"2025-08-22T14:37:23","slug":"databricks-databricks-compute-clusters-access-modes-policies-and-permissions","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-databricks-compute-clusters-access-modes-policies-and-permissions\/","title":{"rendered":"Databricks: Databricks Compute (Clusters, Access Modes, Policies, and Permissions)"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">1. What is Compute in Databricks?<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Compute = processing power<\/strong> in Databricks.<\/li>\n\n\n\n<li>In practice, compute means <strong>clusters<\/strong> (a group of virtual machines).<\/li>\n\n\n\n<li>A cluster always has:\n<ul class=\"wp-block-list\">\n<li><strong>Driver node<\/strong> \u2192 coordinates the job.<\/li>\n\n\n\n<li><strong>Worker nodes<\/strong> \u2192 perform actual data processing.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
Types of Compute in Databricks<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd39 All-Purpose Compute<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Interactive clusters<\/strong> used for notebooks, SQL queries, or ad-hoc jobs.<\/li>\n\n\n\n<li>Stay running until manually terminated or auto-terminated.<\/li>\n\n\n\n<li>Good for:\n<ul class=\"wp-block-list\">\n<li>Exploratory data analysis<\/li>\n\n\n\n<li>Development<\/li>\n\n\n\n<li>Testing<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd39 Job Compute<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ephemeral clusters<\/strong> created automatically when you run a scheduled job\/workflow.<\/li>\n\n\n\n<li>Start when the job runs \u2192 terminate immediately after it finishes.<\/li>\n\n\n\n<li>Good for:\n<ul class=\"wp-block-list\">\n<li>Production workloads<\/li>\n\n\n\n<li>Automated pipelines<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Saves cost, since the cluster exists only while the job runs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd39 Serverless Compute (availability varies by region)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Fully managed; no need to configure cluster size or type.<\/li>\n\n\n\n<li>Databricks provisions and sizes the resources behind the scenes.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Access Modes in Compute<\/h2>\n\n\n\n<p>Access modes determine how users and Unity Catalog interact with clusters:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Single User<\/strong> \u2192 Cluster tied to one user; good for personal work.<\/li>\n\n\n\n<li><strong>Shared<\/strong> \u2192 Multiple users can attach notebooks, isolated from each other; works with Unity Catalog.<\/li>\n\n\n\n<li><strong>No Isolation Shared<\/strong> \u2192 Legacy option for the Hive metastore, <strong>not supported<\/strong> by Unity Catalog.<\/li>\n<\/ul>\n\n\n\n<p>\ud83d\udca1 Best practice:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>Shared clusters<\/strong> with Unity Catalog for team projects.<\/li>\n\n\n\n<li>Use <strong>Single User clusters<\/strong> for development.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">4. Cluster Permissions<\/h2>\n\n\n\n<p>You can assign access at the cluster level; each permission includes everything the levels below it allow:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Can Manage<\/strong> \u2192 Full rights (edit, delete, restart, set permissions).<\/li>\n\n\n\n<li><strong>Can Restart<\/strong> \u2192 Start, restart, and terminate the cluster, but not edit it.<\/li>\n\n\n\n<li><strong>Can Attach To<\/strong> \u2192 Attach notebooks or SQL queries, but cannot stop\/start or modify the cluster.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
Cluster Policies<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>policy = template + restrictions<\/strong> for cluster creation.<\/li>\n\n\n\n<li><strong>Unrestricted<\/strong> = full freedom (default).<\/li>\n\n\n\n<li><strong>Predefined Policies<\/strong>:\n<ul class=\"wp-block-list\">\n<li><strong>Personal Compute<\/strong> \u2192 single node, single user.<\/li>\n\n\n\n<li><strong>Shared Compute<\/strong> \u2192 multi-node, shared mode.<\/li>\n\n\n\n<li><strong>Power User Compute<\/strong> \u2192 allows scaling.<\/li>\n\n\n\n<li><strong>Legacy Shared<\/strong> \u2192 for non-Unity Catalog workloads.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>You can also create <strong>custom policies<\/strong> to enforce:\n<ul class=\"wp-block-list\">\n<li>Allowed VM types<\/li>\n\n\n\n<li>Auto-termination rules<\/li>\n\n\n\n<li>Worker\/driver size<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Important Cluster Settings<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Databricks Runtime (DBR)<\/strong> \u2192 Pre-packaged Spark + Scala + Python + libraries.\n<ul class=\"wp-block-list\">\n<li>For production, pick the latest <strong>LTS (Long-Term Support)<\/strong> version.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Photon<\/strong> \u2192 Native C++ vectorized engine; speeds up Spark SQL and DataFrame workloads at a slightly higher DBU cost.<\/li>\n\n\n\n<li><strong>Autoscaling<\/strong> \u2192 Define min\/max workers; the cluster grows\/shrinks automatically with load.<\/li>\n\n\n\n<li><strong>Auto-Termination<\/strong> \u2192 Saves cost by shutting the cluster down after a set number of minutes of inactivity.<\/li>\n\n\n\n<li><strong>VM Types<\/strong> \u2192 Choose compute-optimized vs. memory-optimized instances based on workload.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">7. 
Monitoring &amp; Debugging<\/h2>\n\n\n\n<p>Clusters provide:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Event Logs<\/strong> \u2192 track cluster lifecycle events, including autoscaling up\/down.<\/li>\n\n\n\n<li><strong>Spark UI<\/strong> \u2192 debug jobs and see DAG execution.<\/li>\n\n\n\n<li><strong>Metrics tab<\/strong> \u2192 monitor CPU\/memory usage.<\/li>\n\n\n\n<li><strong>Driver Logs<\/strong> \u2192 check stdout and stderr for errors.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">8. Key Differences: All-Purpose vs. Job Compute<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Feature<\/th><th>All-Purpose Compute<\/th><th>Job Compute<\/th><\/tr><\/thead><tbody><tr><td>Usage<\/td><td>Interactive (notebooks, SQL)<\/td><td>Scheduled jobs<\/td><\/tr><tr><td>Lifecycle<\/td><td>Manual start\/stop<\/td><td>Auto-created, auto-terminated<\/td><\/tr><tr><td>Cost Efficiency<\/td><td>Less efficient if left running<\/td><td>More efficient<\/td><\/tr><tr><td>Best for<\/td><td>Dev &amp; exploration<\/td><td>Production workloads<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>\u2705 <strong>Conclusion<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>All-Purpose Compute<\/strong> for dev\/test.<\/li>\n\n\n\n<li>Use <strong>Job Compute<\/strong> for scheduled production pipelines.<\/li>\n\n\n\n<li>Always enable <strong>auto-termination<\/strong> and <strong>policies<\/strong> to save cost.<\/li>\n\n\n\n<li>Prefer <strong>Unity Catalog-enabled clusters<\/strong> (Single User \/ Shared) for governance.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. What is Compute in Databricks? 2. 
Types of Compute in Databricks \ud83d\udd39 All-Purpose Compute \ud83d\udd39 Job Compute \ud83d\udd39 Serverless Compute (availability varies by region) 3&#8230;. <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-793","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=793"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/793\/revisions"}],"predecessor-version":[{"id":794,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/793\/revisions\/794"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=793"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}