Databricks: Service Principals in Azure Databricks

What Is a Service Principal in Databricks?

A service principal is a non-human identity in Azure Databricks, intended for automation, integrations, and programmatic access. It is meant for tools, scripts, CI/CD pipelines, and external systems, never for individual users. Service principals provide API-only access to Databricks resources, which improves security and stability by decoupling permissions from user accounts.

Key Features

  • Security: No risk of workflow interruptions when users change roles or leave the organization.
  • Fine-grained Access: Can be granted specific entitlements (e.g., workspace access, SQL access) or admin roles.
  • API-Only: Cannot log into the Databricks UI directly.

Use Cases

At the Databricks Account Console Level

  • Global automation across multiple workspaces (e.g., creating workspaces, assigning users and groups, managing Unity Catalog, auditing, and configuring workspaces).
  • Central identity for CI/CD pipelines, Terraform/Pulumi scripts, or admin task automations that span all organizational Databricks resources.

At the Databricks Workspace Level

  • Manage and automate workspace resources (clusters, jobs, notebooks).
  • Programmatic data access and ingestion, including API access to tables, Delta Lake resources, and job runs.
  • Secure credential for data engineering pipelines or scheduled jobs that need persistent, stable permissions.
  • Running jobs “as service principal” so workflows don’t fail if a user account changes or is removed.
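
The last point can be sketched with the Jobs API, whose job settings accept a run_as field. This is a minimal sketch, assuming the job already exists and the caller is allowed to change its run-as identity. The function names, the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, and the sample IDs are illustrative assumptions, not part of the original walkthrough.

```shell
# Hypothetical helpers. DATABRICKS_HOST (workspace URL) and DATABRICKS_TOKEN
# (an admin or job-owner token) are assumed to be exported by the caller.

# Build the JSON body for POST /api/2.1/jobs/update that sets run_as.
# $1 = numeric job ID, $2 = service principal application (client) ID
build_run_as_payload() {
  printf '{"job_id": %s, "new_settings": {"run_as": {"service_principal_name": "%s"}}}' "$1" "$2"
}

# Send the update. Defined here but not called, so nothing hits the network
# until you invoke it yourself.
set_job_run_as() {
  curl --fail --silent --show-error -X POST \
    "${DATABRICKS_HOST}/api/2.1/jobs/update" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --header "Content-Type: application/json" \
    --data "$(build_run_as_payload "$1" "$2")"
}

# Example (illustrative IDs):
#   set_job_run_as 123 "00000000-0000-0000-0000-000000000000"
```

Keeping the payload builder separate from the curl call makes it easy to inspect the request body before anything is sent.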

How to Use a Service Principal: Step-by-Step with cURL

Prerequisites:

  • You must be an account or workspace admin.
  • You need a registered service principal with appropriate roles/entitlements.

1. Create/Assign Service Principal

Account Console

  • Log into the Databricks Account Console.
  • Go to “User management” > “Service principals” > “Add service principal”, enter details, and add.
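
The same account-level step can probably be scripted through the account SCIM API instead of the console. The sketch below assumes an Azure Databricks account, a service principal that already exists in Microsoft Entra ID, and an account admin token. The helper names and the ACCOUNT_TOKEN variable are hypothetical.

```shell
# Assumed account console host for Azure Databricks.
ACCOUNT_HOST="https://accounts.azuredatabricks.net"

# Build the account-level SCIM endpoint. $1 = Databricks account ID
account_sp_url() {
  printf '%s/api/2.0/accounts/%s/scim/v2/ServicePrincipals' "$ACCOUNT_HOST" "$1"
}

# Build the SCIM body. $1 = display name, $2 = application (client) ID
build_account_sp_payload() {
  printf '{"schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"], "displayName": "%s", "applicationId": "%s"}' "$1" "$2"
}

# Register the service principal in the account. Not called here.
# $1 = account ID, $2 = display name, $3 = application ID
# ACCOUNT_TOKEN is an assumed variable holding an account admin token.
add_account_sp() {
  curl --fail --silent --show-error -X POST "$(account_sp_url "$1")" \
    --header "Authorization: Bearer ${ACCOUNT_TOKEN}" \
    --header "Content-Type: application/scim+json" \
    --data "$(build_account_sp_payload "$2" "$3")"
}
```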

Workspace

  • In the workspace UI, go to Settings > Identity and access > Service principals > Manage > Add service principal.
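
At the workspace level, the equivalent call is the workspace SCIM API, which can also set entitlements in the same request. This is a sketch under assumptions: the helper names are hypothetical, DATABRICKS_HOST and DATABRICKS_TOKEN are assumed environment variables for a workspace admin, and the entitlement values shown (workspace-access, databricks-sql-access) should be checked against your workspace.

```shell
# Turn a list of entitlement names into a SCIM entitlements array.
# Each argument is one entitlement name; no arguments yields [].
build_entitlements() {
  sep=""
  printf '['
  for e in "$@"; do
    printf '%s{"value": "%s"}' "$sep" "$e"
    sep=", "
  done
  printf ']'
}

# Build the SCIM body.
# $1 = application (client) ID, $2 = display name, remaining args = entitlements
build_workspace_sp_payload() {
  app_id="$1"
  name="$2"
  shift 2
  printf '{"schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"], "applicationId": "%s", "displayName": "%s", "entitlements": %s}' \
    "$app_id" "$name" "$(build_entitlements "$@")"
}

# Add the service principal to the workspace. Not called here.
add_workspace_sp() {
  curl --fail --silent --show-error -X POST \
    "${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --header "Content-Type: application/scim+json" \
    --data "$(build_workspace_sp_payload "$@")"
}

# Example (illustrative values):
#   add_workspace_sp "<sp-client-id>" "ci-bot" workspace-access databricks-sql-access
```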

2. Grant Permissions and Generate Token/Secret

  • Assign service principal roles (Service Principal User or Service Principal Manager) and the required entitlements.
  • Generate OAuth secret or Personal Access Token (PAT) for API usage.
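
If you choose the OAuth secret route, the secret is exchanged for a short-lived access token using the client credentials grant. The sketch below assumes a Databricks OAuth secret generated for the service principal; SP_CLIENT_ID, SP_CLIENT_SECRET, and the helper names are illustrative assumptions.

```shell
# Build the workspace token endpoint. $1 = workspace URL (trailing slash allowed)
token_url() {
  printf '%s/oidc/v1/token' "${1%/}"
}

# Request an access token with the client credentials grant. Not called here.
# SP_CLIENT_ID and SP_CLIENT_SECRET are assumed environment variables holding
# the service principal's client ID and OAuth secret.
request_oauth_token() {
  curl --fail --silent --show-error -X POST "$(token_url "$DATABRICKS_HOST")" \
    --user "${SP_CLIENT_ID}:${SP_CLIENT_SECRET}" \
    --data 'grant_type=client_credentials&scope=all-apis'
}

# Example: the JSON response carries the token in its access_token field.
#   request_oauth_token
```

The returned access token is then used as the Bearer token in later API calls, in the same place a PAT would go.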

3. Authenticate with cURL for Databricks REST APIs

Example: Create a Personal Access Token for Service Principal

curl -X POST \
  https://<databricks-instance>/api/2.0/token-management/on-behalf-of/tokens \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <ADMIN_PERSONAL_ACCESS_TOKEN>" \
  --data '{
     "application_id": "<service-principal-application-id>",
     "lifetime_seconds": 3600,
     "comment": "Token for service principal automation"
   }'

The call itself must be authenticated with an admin's PAT or OAuth token. The token_value field of the response is the service principal's API credential. It is returned only at creation time, so store it securely.
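
To use the new token in a script, it has to be pulled out of the JSON response. The helper below is a rough sketch that uses sed and assumes the response arrives on a single line; a JSON-aware tool such as jq would be more robust if it is available. The function name and the sample response are made up for illustration.

```shell
# Read a token-creation response on stdin and print the token_value field.
# Assumes single-line JSON and a token with no embedded double quotes.
extract_token_value() {
  sed -n 's/.*"token_value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example with a fabricated response body (not real API output):
#   SP_PAT=$(printf '%s\n' '{"token_value": "dapi-example", "token_info": {}}' | extract_token_value)
```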

Example: Use Service Principal to List Databricks Jobs

(Assume <SP_PAT> is the token generated for the service principal)

curl -X GET \
  https://<databricks-instance>/api/2.1/jobs/list \
  --header "Authorization: Bearer <SP_PAT>"
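
The list endpoint returns results in pages, so a workspace with many jobs needs more than one call. The following sketch builds the paged URL; it assumes the limit and page_token query parameters of Jobs API 2.1 and that the token can be passed without URL-encoding. The helper names and the SP_PAT variable are hypothetical.

```shell
# Build the jobs/list URL for one page.
# $1 = workspace URL, $2 = page size, $3 = optional page token from the
# next_page_token field of the previous response
build_jobs_list_url() {
  url="${1%/}/api/2.1/jobs/list?limit=$2"
  if [ -n "${3:-}" ]; then
    url="${url}&page_token=$3"
  fi
  printf '%s' "$url"
}

# Fetch one page as the service principal. Not called here.
# SP_PAT is assumed to hold the service principal's token.
list_jobs_page() {
  curl --fail --silent --show-error -X GET \
    "$(build_jobs_list_url "$DATABRICKS_HOST" "$1" "${2:-}")" \
    --header "Authorization: Bearer ${SP_PAT}"
}

# Example: first page of 25 jobs, then pass next_page_token to get the next one.
#   list_jobs_page 25
#   list_jobs_page 25 "<next-page-token>"
```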

4. Create and Use Storage Credential (Advanced Example)

For Unity Catalog or storage integration, you may need to create a storage credential backed by a service principal so that data access is secured under that identity.

curl -X POST \
  https://<databricks-instance>/api/2.1/unity-catalog/storage-credentials \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <ADMIN_PERSONAL_ACCESS_TOKEN>" \
  --data '{
    "name": "sp-credential",
    "azure_service_principal": {
      "directory_id": "<tenant-id>",
      "application_id": "<sp-client-id>",
      "client_secret": "<sp-client-secret>"
    },
    "skip_validation": false
  }'

This sets up data access using the service principal identity.


Summary Table: Service Principal Use Cases

Level            | Use Case Examples
Account Console  | Workspace automation, global governance, CI/CD
Workspace        | Data access, job automation, scheduled pipelines

To use service principals in Databricks:

  1. Register and assign them at account or workspace level.
  2. Grant relevant permissions/entitlements.
  3. Generate a token for API authentication.
  4. Execute REST API calls securely with cURL—ideal for automation, integration, and stable orchestration of Databricks resources.
