{"id":795,"date":"2025-08-22T14:56:49","date_gmt":"2025-08-22T14:56:49","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=795"},"modified":"2025-08-22T14:56:50","modified_gmt":"2025-08-22T14:56:50","slug":"databricks-custom-cluster-policies-instance-pools-in-databricks","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-custom-cluster-policies-instance-pools-in-databricks\/","title":{"rendered":"Databricks: Custom Cluster Policies &amp; Instance Pools in Databricks"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">1. \ud83d\udd39 Why Policies and Pools?<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Policies<\/strong> \u2192 Standardize and enforce cluster configurations across your organization.<\/li>\n\n\n\n<li><strong>Instance Pools<\/strong> \u2192 Pre-create and reuse VMs to <strong>reduce startup time<\/strong> for clusters.<\/li>\n<\/ul>\n\n\n\n<p>These features are <strong>critical in enterprise Databricks deployments<\/strong> to enforce compliance, control costs, and improve performance.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
Custom Cluster Policies in Databricks<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udccc What is a Cluster Policy?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>JSON template<\/strong> that marks each cluster setting as fixed, forbidden, or limited to a set of allowed values (for example, an allowlist or a numeric range).<\/li>\n\n\n\n<li>Ensures <strong>users follow org standards<\/strong> (e.g., fixed runtime, mandatory auto-termination).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udee0 How to Create a Custom Policy<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Compute \u2192 Policies<\/strong> and click <strong>Create policy<\/strong>.<\/li>\n\n\n\n<li>Rather than writing the JSON from scratch, <strong>clone an existing policy<\/strong> or pick a policy family (e.g., <code>Shared Compute<\/code>) as the base.<\/li>\n\n\n\n<li>Edit the JSON to <strong>override settings<\/strong>. Example:<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>{\n  \"autotermination_minutes\": {\n    \"type\": \"fixed\",\n    \"value\": 10\n  },\n  \"num_workers\": {\n    \"type\": \"fixed\",\n    \"value\": 1\n  },\n  \"autoscale.min_workers\": {\n    \"type\": \"forbidden\"\n  },\n  \"autoscale.max_workers\": {\n    \"type\": \"forbidden\"\n  },\n  \"spark_version\": {\n    \"type\": \"fixed\",\n    \"value\": \"15.4.x-scala2.12\"\n  },\n  \"node_type_id\": {\n    \"type\": \"allowlist\",\n    \"values\": &#91;\"Standard_DS3_v2\", \"Standard_DS4_v2\"],\n    \"defaultValue\": \"Standard_DS4_v2\"\n  }\n}\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udd11 Explanation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Auto termination<\/strong> \u2192 Always 10 minutes.<\/li>\n\n\n\n<li><strong>Fixed workers<\/strong> \u2192 Exactly 1 worker; autoscaling is forbidden.<\/li>\n\n\n\n<li><strong>Fixed runtime<\/strong> \u2192 Databricks Runtime 15.4 LTS only.<\/li>\n\n\n\n<li><strong>Restricted VM types<\/strong> \u2192 Only Standard_DS3_v2 or Standard_DS4_v2 allowed, with Standard_DS4_v2 as the default.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 
class=\"wp-block-heading\">\ud83d\udccc How to Apply Policy to New Clusters<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>When creating a cluster, choose your custom policy from the <strong>Policy<\/strong> drop-down.<\/li>\n\n\n\n<li>The UI then <strong>greys out fields the policy fixes or forbids<\/strong> (e.g., runtime version, auto-termination, autoscaling) and limits the node type drop-down to the allowed VM types.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udccc Enforcing Policy on Existing Clusters<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If a policy changes (e.g., its runtime version is updated), existing clusters that use it are flagged as <strong>\u201cNon-compliant\u201d<\/strong>.<\/li>\n\n\n\n<li>Click <strong>Fix All<\/strong> to update those clusters to match the current policy; running clusters may be restarted so the new settings take effect.<\/li>\n\n\n\n<li>Example: after the policy\u2019s runtime changes from <code>14.3<\/code> \u2192 <code>15.4<\/code>, <strong>Fix All<\/strong> moves every cluster that uses the policy to 15.4.<\/li>\n<\/ul>\n\n\n\n<p>\u2705 This brings every cluster governed by the policy back into <strong>org-wide compliance<\/strong> in one step.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. Instance Pools in Databricks<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udccc What is an Instance Pool?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A <strong>set of idle, ready-to-use VMs<\/strong> that clusters can attach to.<\/li>\n\n\n\n<li>Benefit \u2192 <strong>Reduce startup and autoscaling time<\/strong> (when idle instances are available, clusters don\u2019t need to wait for VM provisioning).<\/li>\n\n\n\n<li>Clusters <strong>draw nodes from the pool<\/strong> instead of requesting fresh VMs, and return them to the pool when they terminate.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udee0 How to Create an Instance Pool<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Compute \u2192 Instance Pools \u2192 Create Pool<\/strong>.<\/li>\n\n\n\n<li>Configure:\n<ul class=\"wp-block-list\">\n<li><strong>Min Idle Instances<\/strong> \u2192 Number of idle VMs the pool always keeps running,
which keeps the pool \u201cwarm.\u201d\n<ul class=\"wp-block-list\">\n<li>Example: <code>2<\/code> = 2 idle VMs always ready.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Max Capacity<\/strong> \u2192 Upper limit of VMs in the pool, counting both idle instances and instances in use by clusters.\n<ul class=\"wp-block-list\">\n<li>Example: <code>10<\/code> = pool can grow to at most 10 nodes.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Idle Auto Termination<\/strong> \u2192 Minutes an instance above the Min Idle count can sit idle before the pool terminates it.<\/li>\n\n\n\n<li><strong>Node Type<\/strong> \u2192 VM type for every instance in the pool (e.g., Standard_DS4_v2); clusters attached to the pool use it.<\/li>\n\n\n\n<li><strong>Databricks Runtime (DBR)<\/strong> \u2192 Runtime version preloaded on idle instances so clusters using that version start faster.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udccc Warm vs Cold Pools<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Warm Pool<\/strong> \u2192 Min Idle > 0 (e.g., 2 VMs always running).\n<ul class=\"wp-block-list\">\n<li>\u2705 Faster startup (clusters skip VM provisioning).<\/li>\n\n\n\n<li>\u274c Higher cost (you pay the cloud provider for idle VMs, although Databricks does not bill DBUs while they sit idle).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Cold Pool<\/strong> \u2192 Min Idle = 0.\n<ul class=\"wp-block-list\">\n<li>\u2705 Cost-efficient (no VMs kept idle permanently).<\/li>\n\n\n\n<li>\u274c Slower startup whenever no released instances are waiting in the pool (new VMs must be provisioned).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Example: Warm Instance Pool<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>Name: demo-pool\nMin Idle Instances: 1\nMax Capacity: 10\nIdle Auto Termination: 10 mins\nNode Type: Standard_DS4_v2\nRuntime: 15.4 LTS\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>At least <strong>1 idle VM<\/strong> is always kept running.<\/li>\n\n\n\n<li>A new cluster can take that idle VM right away instead of waiting for a fresh one to be provisioned.<\/li>\n\n\n\n<li>Nodes released by a terminated cluster return to the pool; any above the Min Idle count are terminated after <strong>10 mins<\/strong> idle, so a job that starts within that window can reuse them.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 
class=\"wp-block-heading\">4. Best Practices<\/h2>\n\n\n\n<p>\u2705 Use <strong>Custom Policies<\/strong> to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enforce <strong>auto-termination<\/strong> (prevent zombie clusters).<\/li>\n\n\n\n<li>Fix runtime versions (e.g., always LTS).<\/li>\n\n\n\n<li>Restrict node types to control cost.<\/li>\n\n\n\n<li>Disable autoscaling if not needed.<\/li>\n<\/ul>\n\n\n\n<p>\u2705 Use <strong>Warm Pools<\/strong> for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Low-latency SLA jobs<\/strong> (e.g., real-time ETL, streaming, dashboards).<\/li>\n<\/ul>\n\n\n\n<p>\u2705 Use <strong>Cold Pools<\/strong> for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Batch jobs<\/strong> that can tolerate 2\u20135 min startup delay.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Key Differences: Policy vs Pool<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Feature<\/th><th>Cluster Policy \ud83d\udea6<\/th><th>Instance Pool \ud83c\udfca<\/th><\/tr><\/thead><tbody><tr><td>Purpose<\/td><td>Enforce rules<\/td><td>Reduce startup time<\/td><\/tr><tr><td>Controls<\/td><td>Runtime, nodes, auto-termination<\/td><td>VM availability<\/td><\/tr><tr><td>Cost Impact<\/td><td>Avoids misuse<\/td><td>May add idle VM costs<\/td><\/tr><tr><td>Governance<\/td><td>Compliance tool<\/td><td>Performance tool<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>\u2705 <strong>Conclusion<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>Policies<\/strong> for governance and cost control.<\/li>\n\n\n\n<li>Use <strong>Pools<\/strong> to cut startup latency and meet start-time SLAs.<\/li>\n\n\n\n<li>Combine both: <strong>Policy + Pool-backed clusters<\/strong> = controlled + fast compute.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>1. 
\ud83d\udd39 Why Policies and Pools? These features are critical in enterprise Databricks deployments to enforce compliance, control costs, and improve performance. 2. Custom Cluster Policies&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-795","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/795","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=795"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/795\/revisions"}],"predecessor-version":[{"id":796,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/795\/revisions\/796"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=795"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=795"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=795"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}