Databricks: Using a Service Principal in Azure Databricks

What Is a Service Principal in Databricks?

A service principal is a specialized, non-human identity within Azure Databricks, designed exclusively for automation, integrations, and programmatic access. Service principals are intended for use by tools, scripts, CI/CD pipelines, or external systems—never by individual users. They provide API-only access to Databricks resources, which increases security and stability by decoupling permissions from user accounts.

Key Features

  • Security: No risk of workflow interruptions when users change roles or leave the organization.
  • Fine-grained Access: Can be granted specific entitlements (e.g., workspace access, SQL access) or admin roles.
  • API-Only: Cannot log into the Databricks UI directly.

Use Cases

At the Databricks Account Console Level

  • Global automation across multiple workspaces (e.g., create workspaces, assign users/groups, manage Unity Catalog, auditing, and workspace configurations).
  • Central identity for CI/CD pipelines, Terraform/Pulumi scripts, or admin task automations that span all organizational Databricks resources.
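
For example, with identity federation enabled, a single account-level call can grant a principal access to any workspace. A minimal sketch, assuming <account-id>, <workspace-id>, and <principal-id> are filled in for your environment:

```bash
# Sketch: assign a principal (user, group, or service principal) to a
# workspace from the account level via the Workspace Assignment API.
# Requires an account admin token and identity federation.
curl -X PUT \
  "https://accounts.azuredatabricks.net/api/2.0/accounts/<account-id>/workspaces/<workspace-id>/permissionassignments/principals/<principal-id>" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <ACCOUNT_ADMIN_TOKEN>" \
  --data '{ "permissions": ["USER"] }'
```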

At the Databricks Workspace Level

  • Manage and automate workspace resources (clusters, jobs, notebooks).
  • Programmatic data access and ingest, including API access to tables, Delta Lake resources, and job runs.
  • Secure credential for data engineering pipelines or scheduled jobs that need persistent, stable permissions.
  • Running jobs “as service principal” so workflows don’t fail if a user account changes or is removed.
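
To illustrate the last point, here is a minimal sketch of creating a job that runs as a service principal via the Jobs API 2.1; the job name, notebook path, and cluster ID are placeholder assumptions:

```bash
# Sketch: create a job whose runs execute as the service principal,
# so it keeps working even if the creating user leaves.
curl -X POST \
  https://<databricks-instance>/api/2.1/jobs/create \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <ADMIN_PERSONAL_ACCESS_TOKEN>" \
  --data '{
    "name": "nightly-ingest",
    "run_as": { "service_principal_name": "<sp-application-id>" },
    "tasks": [{
      "task_key": "ingest",
      "notebook_task": { "notebook_path": "/Jobs/ingest" },
      "existing_cluster_id": "<cluster-id>"
    }]
  }'
```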

How to Use a Service Principal: Step-by-Step with cURL

Prerequisites:

  • You must be an account or workspace admin.
  • You need a registered service principal with appropriate roles/entitlements.

1. Create/Assign a Service Principal

Account Console

  • Log into the Databricks Account Console.
  • Go to “User management” > “Service principals” > “Add service principal”, enter details, and add.
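
The same registration can be scripted. A sketch using the account-level SCIM API, assuming <account-id> and the Microsoft Entra ID application ID of your service principal:

```bash
# Sketch: register a service principal at the account level via SCIM
# instead of the console UI.
curl -X POST \
  "https://accounts.azuredatabricks.net/api/2.0/accounts/<account-id>/scim/v2/ServicePrincipals" \
  --header "Content-Type: application/scim+json" \
  --header "Authorization: Bearer <ACCOUNT_ADMIN_TOKEN>" \
  --data '{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
    "applicationId": "<entra-application-id>",
    "displayName": "ci-cd-automation"
  }'
```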

Workspace

  • Go to Workspace UI > Settings > Identity and Access > Manage > Add Service Principal.
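
Or, equivalently, add it through the workspace-level SCIM API. A sketch with a hypothetical display name and an optional entitlement:

```bash
# Sketch: add a service principal to a workspace via SCIM, granting
# the allow-cluster-create entitlement at the same time.
curl -X POST \
  https://<databricks-instance>/api/2.0/preview/scim/v2/ServicePrincipals \
  --header "Content-Type: application/scim+json" \
  --header "Authorization: Bearer <ADMIN_PERSONAL_ACCESS_TOKEN>" \
  --data '{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
    "applicationId": "<entra-application-id>",
    "displayName": "etl-sp",
    "entitlements": [{ "value": "allow-cluster-create" }]
  }'
```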

2. Grant Permissions and Generate Token/Secret

  • Assign roles (User/Manager) and the required entitlements.
  • Generate an OAuth secret or personal access token (PAT) for API usage.
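
Because an Azure Databricks service principal is backed by a Microsoft Entra ID application, you can also skip Databricks tokens entirely and authenticate with the client-credentials grant. A sketch; the scope uses the well-known Azure Databricks application ID:

```bash
# Sketch: obtain a Microsoft Entra ID access token for the service
# principal; the returned access_token works as a Bearer token
# against Databricks REST APIs.
curl -X POST \
  "https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token" \
  --data-urlencode "client_id=<sp-client-id>" \
  --data-urlencode "client_secret=<sp-client-secret>" \
  --data-urlencode "grant_type=client_credentials" \
  --data-urlencode "scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
```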

3. Authenticate with cURL for Databricks REST APIs

Example: Create a Personal Access Token for a Service Principal

```bash
curl -X POST \
  https://<databricks-instance>/api/2.0/token-management/on-behalf-of/tokens \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <ADMIN_PERSONAL_ACCESS_TOKEN>" \
  --data '{
    "application_id": "<service-principal-application-id>",
    "comment": "Token for service principal automation"
  }'
```

You need an admin token (PAT or OAuth) for this initial call. The returned token value is your service principal’s API credential.

Example: Use the Service Principal to List Databricks Jobs

(Assume <SP_PAT> is the token generated for the service principal)

```bash
curl -X GET \
  https://<databricks-instance>/api/2.1/jobs/list \
  --header "Authorization: Bearer <SP_PAT>"
```

4. Create and Use a Storage Credential (Advanced Example)

For Unity Catalog or storage integration, you may need to create a storage credential backed by the service principal for secure data access.

```bash
curl -X POST \
  https://<databricks-instance>/api/2.1/unity-catalog/storage-credentials \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <SP_PAT>" \
  --data '{
    "name": "sp-credential",
    "azure_service_principal": {
      "directory_id": "<tenant-id>",
      "application_id": "<sp-client-id>",
      "client_secret": "<sp-client-secret>"
    },
    "skip_validation": false
  }'
```

This sets up data access using the service principal identity.
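
As a follow-on sketch, the credential can then back a Unity Catalog external location; the container, storage account, and path below are placeholders:

```bash
# Sketch: create an external location that uses the storage
# credential defined above.
curl -X POST \
  https://<databricks-instance>/api/2.1/unity-catalog/external-locations \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <SP_PAT>" \
  --data '{
    "name": "sp-external-location",
    "url": "abfss://<container>@<storage-account>.dfs.core.windows.net/<path>",
    "credential_name": "sp-credential"
  }'
```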


Summary Table: Service Principal Use Cases

| Level | Use Case Examples |
| --- | --- |
| Account Console | Workspace automation, global governance, CI/CD |
| Workspace | Data access, job automation, scheduled pipelines |

To use service principals in Databricks:

  1. Register and assign them at account or workspace level.
  2. Grant relevant permissions/entitlements.
  3. Generate a token for API authentication.
  4. Execute REST API calls securely with cURL—ideal for automation, integration, and stable orchestration of Databricks resources.
