Databricks Compute (Clusters, Access Modes, Policies, and Permissions)


1. What is Compute in Databricks?

  • Compute = processing power in Databricks.
  • In practice, compute means clusters (groups of virtual machines working together).
  • A cluster always has:
    • Driver node → coordinates the job.
    • Worker nodes → perform actual data processing.

2. Types of Compute in Databricks

🔹 All-Purpose Compute

  • Interactive clusters used for notebooks, SQL queries, or ad-hoc jobs.
  • Stay running until manually terminated or auto-terminated; a minimal creation sketch follows this list.
  • Good for:
    • Exploratory data analysis
    • Development
    • Testing
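To make this concrete, here is a minimal sketch of creating an all-purpose cluster through the Databricks Clusters REST API. The workspace URL, token, and node type are placeholders; node types are cloud-specific, and the runtime string should be whatever LTS version your workspace offers.

```python
import requests

HOST = "https://<workspace-url>"   # placeholder: your workspace URL
TOKEN = "<personal-access-token>"  # placeholder: a valid API token

# All-purpose cluster: runs until terminated manually or by the
# auto-termination timer configured below.
cluster_spec = {
    "cluster_name": "dev-exploration",
    "spark_version": "15.4.x-scala2.12",  # example LTS runtime; check your workspace
    "node_type_id": "Standard_DS3_v2",    # cloud-specific; Azure example
    "num_workers": 2,
    "autotermination_minutes": 30,        # stop paying after 30 idle minutes
}

resp = requests.post(
    f"{HOST}/api/2.1/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("cluster_id:", resp.json()["cluster_id"])
```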

🔹 Job Compute

  • Ephemeral clusters created automatically when you run a scheduled job/workflow.
  • Start when the job runs → terminate immediately after.
  • Good for:
    • Production workloads
    • Automated pipelines
  • Saves cost, since the cluster exists only while the job runs (see the sketch below).
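Here is a sketch of how job compute is requested: in the Jobs API, the task carries a `new_cluster` spec instead of referencing an existing cluster, so Databricks creates the cluster for the run and terminates it afterwards. The job name, notebook path, and node type are placeholders.

```python
import requests

HOST = "https://<workspace-url>"   # placeholder
TOKEN = "<personal-access-token>"  # placeholder

job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/team/etl"},  # placeholder
            # new_cluster = job compute: created for this run, gone after it
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # example LTS runtime
                "node_type_id": "Standard_DS3_v2",    # cloud-specific example
                "num_workers": 4,
            },
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("job_id:", resp.json()["job_id"])
```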

🔹 Serverless Compute (availability varies by region)

  • Fully managed, no need to configure cluster size/type.
  • Databricks decides resources behind the scenes.

3. Access Modes in Compute

Access modes determine how users and Unity Catalog interact with clusters:

  • Single User → Cluster tied to one user; good for personal work.
  • Shared → Multiple users can attach notebooks; Unity Catalog enabled.
  • No Isolation Shared → Legacy option for Hive metastore, not supported by Unity Catalog.

💡 Best practice:

  • Use Shared clusters with Unity Catalog for team projects.
  • Use Single User clusters for development (both modes appear in the sketch below).
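In the Clusters API these access modes are selected through the `data_security_mode` field. A minimal sketch of a Single User spec (the runtime, node type, and user are placeholders):

```python
# Access mode -> Clusters API `data_security_mode` value:
#   Single User         -> "SINGLE_USER" (also set `single_user_name`)
#   Shared              -> "USER_ISOLATION"
#   No Isolation Shared -> "NONE" (legacy, no Unity Catalog)

single_user_cluster = {
    "cluster_name": "alice-dev",
    "spark_version": "15.4.x-scala2.12",      # example LTS runtime
    "node_type_id": "Standard_DS3_v2",        # cloud-specific example
    "num_workers": 1,
    "data_security_mode": "SINGLE_USER",
    "single_user_name": "alice@example.com",  # placeholder principal
}
```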

4. Cluster Permissions

You can assign access at the cluster level (an API sketch follows this list):

  • Can Manage → Full control: edit, delete, restart, and manage permissions.
  • Can Restart → Start, restart, and terminate the cluster.
  • Can Attach To → Attach notebooks or SQL queries, but not start/stop or modify the cluster.
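These levels can be granted through the UI or the Permissions REST API. A sketch (the cluster ID, user, and group are placeholders; `PATCH` adds or updates entries without replacing the whole access control list):

```python
import requests

HOST = "https://<workspace-url>"   # placeholder
TOKEN = "<personal-access-token>"  # placeholder
CLUSTER_ID = "<cluster-id>"        # placeholder

acl = {
    "access_control_list": [
        {"user_name": "alice@example.com", "permission_level": "CAN_MANAGE"},
        {"group_name": "analysts", "permission_level": "CAN_ATTACH_TO"},
    ]
}

resp = requests.patch(
    f"{HOST}/api/2.0/permissions/clusters/{CLUSTER_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=acl,
)
resp.raise_for_status()
```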

5. Cluster Policies

  • A policy = template + restrictions for cluster creation.
  • Unrestricted = full freedom (default).
  • Predefined Policies:
    • Personal Compute → single node, single user.
    • Shared Compute → multi-node, shared mode.
    • Power User Compute → allows scaling.
    • Legacy Shared → for non-Unity Catalog workloads.
  • You can also create custom policies (see the sketch after this list) to enforce:
    • Allowed VM types
    • Auto-termination rules
    • Worker/driver size
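A sketch of such a custom policy, created through the Cluster Policies API. Each key in the definition constrains one cluster attribute; the policy name and the allowed node types are placeholders.

```python
import json
import requests

HOST = "https://<workspace-url>"   # placeholder
TOKEN = "<personal-access-token>"  # placeholder

definition = {
    "node_type_id": {                # allowed VM types
        "type": "allowlist",
        "values": ["Standard_DS3_v2", "Standard_DS4_v2"],  # cloud-specific examples
    },
    "autotermination_minutes": {     # auto-termination rules
        "type": "range",
        "maxValue": 60,
        "defaultValue": 30,
    },
    "autoscale.max_workers": {       # cap on worker count
        "type": "range",
        "maxValue": 8,
    },
}

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"name": "team-cost-guardrails", "definition": json.dumps(definition)},
)
resp.raise_for_status()
```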

6. Important Cluster Settings

  • Databricks Runtime (DBR) → Pre-packaged Spark + Scala + Python + libraries.
    • Always pick the latest LTS (Long-Term Support) version.
  • Photon → Databricks' native C++ execution engine; accelerates SQL and DataFrame workloads at a higher DBU rate.
  • Autoscaling → Define min/max workers; the cluster grows and shrinks automatically with load.
  • Auto-Termination → Saves cost by shutting the cluster down after a set number of idle minutes.
  • VM Types → Choose compute-optimized vs. memory-optimized instances based on workload; a spec combining these settings is sketched below.
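A sketch of a cluster spec combining these settings (the node type and runtime string are placeholders; `runtime_engine` is the Clusters API field that enables Photon):

```python
# Cluster spec combining the settings above
cluster_spec = {
    "cluster_name": "etl-autoscaling",
    "spark_version": "15.4.x-scala2.12",   # example LTS runtime
    "runtime_engine": "PHOTON",            # enable the Photon engine
    "node_type_id": "Standard_DS4_v2",     # cloud-specific example
    "autoscale": {"min_workers": 2, "max_workers": 8},  # grow/shrink with load
    "autotermination_minutes": 45,         # shut down after 45 idle minutes
}
```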

7. Monitoring & Debugging

Clusters provide (an events API sketch follows this list):

  • Event Logs → track autoscaling up/down.
  • Spark UI → debug jobs and see DAG execution.
  • Metrics tab → monitor CPU/memory usage.
  • Driver Logs → check stdout, stderr for errors.
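For programmatic monitoring, the event log is also exposed through the REST API. A sketch of pulling recent events for one cluster (the cluster ID is a placeholder):

```python
import requests

HOST = "https://<workspace-url>"   # placeholder
TOKEN = "<personal-access-token>"  # placeholder

# Fetch recent events, e.g. to audit autoscaling activity
resp = requests.post(
    f"{HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": "<cluster-id>", "limit": 25},  # placeholder id
)
resp.raise_for_status()
for event in resp.json().get("events", []):
    print(event["timestamp"], event["type"])
```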

8. Key Differences: All Purpose vs Job Compute

| Feature         | All Purpose Compute            | Job Compute            |
|-----------------|--------------------------------|------------------------|
| Usage           | Interactive (notebooks, SQL)   | Scheduled jobs         |
| Lifecycle       | Manual start/stop              | Auto-create, auto-kill |
| Cost Efficiency | Less efficient if left running | More efficient         |
| Best for        | Dev & exploration              | Production workloads   |

Conclusion:

  • Use All Purpose Compute for dev/test.
  • Use Job Compute for scheduled production pipelines.
  • Always enable auto-termination and policies to save cost.
  • Prefer Unity Catalog enabled clusters (Single User / Shared) for governance.
