Databricks: What Is a Service Principal in Azure Databricks?

What Is a Service Principal in Databricks? A service principal is a specialized, non-human identity within Azure Databricks, designed exclusively for automation, integrations, and programmatic access. Service principals are intended for use by tools, scripts, CI/CD pipelines, or external systems—never by individual users. They provide API-only access to Databricks resources, which increases security and stability … Read more
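
As a concrete illustration of that API-only access, here is a minimal sketch of authenticating as a service principal with the Databricks SDK for Python using Azure client-secret credentials; the package (databricks-sdk), workspace URL, tenant ID, client ID, and secret are placeholders and assumptions, not details from the post.

```python
# Minimal sketch: connect to an Azure Databricks workspace as a service
# principal (all IDs, the secret, and the host URL are placeholders).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://adb-1234567890123456.7.azuredatabricks.net",
    azure_tenant_id="<tenant-id>",
    azure_client_id="<client-id>",        # the service principal's application ID
    azure_client_secret="<client-secret>",
)

# A simple API call to confirm the principal can reach the workspace
for c in w.clusters.list():
    print(c.cluster_name)
```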

Databricks: What Is a Databricks Workspace?

What Is a Databricks Workspace? A Databricks workspace is the core organizational environment in Databricks where teams perform all collaborative data engineering, data science, analytics, and machine learning tasks. It provides a unified web-based interface and compute management layer that allows users to develop code in notebooks, run jobs, manage clusters, share results, and access … Read more
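
One way to see the workspace as a programmable environment rather than just a web UI is to browse it through the Databricks SDK for Python; a minimal sketch, assuming databricks-sdk is installed and credentials are already configured (the path is a placeholder):

```python
# Minimal sketch: list notebooks, folders, and files under a workspace path.
# Assumes DATABRICKS_HOST / DATABRICKS_TOKEN (or another auth method) are set.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

for obj in w.workspace.list("/Users"):   # placeholder path
    print(obj.object_type, obj.path)
```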

Databricks: Set Up Metastore & Map Azure Storage Account with Access Connector, Enable Unity Catalog

This guide walks you through setting up a Unity Catalog metastore in Azure Databricks, connecting it securely to an Azure storage account using the Access Connector, validating the setup, and enabling Unity Catalog for your Databricks workspace.

Step 1: Create a Storage Account and Container for the Metastore
Step 2: Create an Access Connector (Managed Identity) for … Read more
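
For readers who script these steps instead of clicking through the UI, a minimal sketch of creating the metastore and assigning it to a workspace with the Databricks SDK for Python; the names, region, storage root, and workspace ID are placeholders, and it assumes the Access Connector has already been granted access to the storage account.

```python
# Minimal sketch: create a Unity Catalog metastore and assign it to a
# workspace. All names, paths, and IDs are placeholders; assumes the
# databricks-sdk package and metastore-admin credentials.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Storage root points at the container created for the metastore
metastore = w.metastores.create(
    name="primary-metastore",
    storage_root="abfss://metastore@mystorageacct.dfs.core.windows.net/",
    region="eastus",
)

# Assigning the metastore to a workspace enables Unity Catalog there
w.metastores.assign(
    workspace_id=1234567890123456,       # placeholder workspace ID
    metastore_id=metastore.metastore_id,
    default_catalog_name="main",
)
```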

Databricks: Step-by-Step Commands: Managed vs. External Tables in Databricks

Below is a complete workflow, with working SQL and Python code, demonstrating how to create, manage, insert, read, and delete data for both Managed and External tables in Databricks. After each step, commands using dbutils.fs are used to check underlying file storage differences, highlighting the distinction between managed and external tables.

1. Create a Managed Table (SQL): … Read more
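
In that spirit, a compressed sketch of the managed-vs-external contrast, runnable in a Databricks notebook where spark and dbutils are predefined (catalog, schema, and storage paths are placeholders):

```python
# Managed table: Databricks/Unity Catalog controls where the data lives
spark.sql("CREATE TABLE IF NOT EXISTS main.demo.managed_sales (id INT, amount DOUBLE)")

# External table: data lives at an explicit cloud path that you manage
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.demo.external_sales (id INT, amount DOUBLE)
    LOCATION 'abfss://data@mystorageacct.dfs.core.windows.net/sales'
""")

# Compare the underlying storage locations of the two tables
for t in ("managed_sales", "external_sales"):
    loc = spark.sql(f"DESCRIBE DETAIL main.demo.{t}").select("location").first()[0]
    print(t, "->", loc)

# DROP TABLE removes the managed table's files; the external table's files
# survive at the LOCATION and can still be listed with dbutils.fs.ls(loc).
```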

Databricks: File Storage Options on Databricks

The main file storage options in Databricks are:

| Option | Best Use Case | Security/Governance | Notes |
|---|---|---|---|
| Unity Catalog Volumes | Data, artifacts across workspaces | Strong | Recommended, scalable |
| Workspace Files | Notebooks, code, small files | Workspace ACLs | Limited to one workspace |
| DBFS Root & Folders | Legacy, temp, example datasets | Basic | Not recommended for prod |
| Direct Cloud Storage (abfss/s3/gs) | High-performance, large | | |

… Read more
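
To make the options concrete, a small sketch of how each one is addressed from a notebook with dbutils (account, container, e-mail, and volume names are all placeholders):

```python
# Unity Catalog volume: governed, usable across workspaces
dbutils.fs.ls("/Volumes/main/demo/raw_files")

# Workspace files: code and small files, scoped to one workspace
dbutils.fs.ls("file:/Workspace/Users/me@example.com/project")

# DBFS root: legacy area, fine for example datasets, not for production data
dbutils.fs.ls("dbfs:/databricks-datasets")

# Direct cloud storage via an abfss URI (needs credentials or an external location)
dbutils.fs.ls("abfss://data@mystorageacct.dfs.core.windows.net/")
```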

Databricks: Working with Different Types of Tables

Databricks supports several types of tables, each designed for distinct storage, management, and integration scenarios. The main table types are:

Summary Table

| Table Type | Storage/Location | Management | Formats Supported | Use Case |
|---|---|---|---|---|
| Managed | Databricks-managed storage (internal) | Unity Catalog | Delta, Iceberg | Full lifecycle, performance, security |
| External | External cloud storage (explicit path) | User | Delta, Parquet, CSV, etc. | Shared or |

… Read more
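
A quick way to check which type a given table is, sketched with placeholder names (run where spark is available):

```python
# Minimal sketch: report whether a table is MANAGED or EXTERNAL
rows = spark.sql("DESCRIBE EXTENDED main.demo.managed_sales").collect()
table_type = next(r.data_type for r in rows if r.col_name == "Type")
print(table_type)  # "MANAGED" or "EXTERNAL"
```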

Databricks: dbutils is a utility library

dbutils is a built-in utility module in Databricks notebooks (Python, Scala, R) that provides programmatic access to common workspace tasks, including interacting with the Databricks File System (DBFS), handling secrets, controlling notebook workflow, and creating parameter widgets.

Core Features of dbutils

Example Usage in Python:

```python
# List files in a DBFS directory
dbutils.fs.ls('/databricks-datasets')
# Get …
```

Read more
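
Beyond the file-system call above, a brief sketch of the other dbutils areas mentioned (the secret scope, key, and widget names are placeholders):

```python
# Read a secret without printing its value into the notebook
token = dbutils.secrets.get(scope="my-scope", key="api-token")

# Create a parameter widget and read its value (notebook parameterization)
dbutils.widgets.text("run_date", "2024-01-01")
run_date = dbutils.widgets.get("run_date")

# Control notebook workflow: end this notebook and return a result string
dbutils.notebook.exit(f"finished for {run_date}")
```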

Databricks: Unity Catalog

Here’s the simplified definition of Unity Catalog: in short, it’s the “library catalog” and “security guard” for all your Databricks data and AI. Unity Catalog can feel abstract until … Read more

Databricks Account Console

The Databricks Account Console is the central, account-level management portal for Databricks: it’s where you control everything that spans multiple workspaces. Think of it as the “control tower” for your Databricks environment.

Purpose

It sits above individual workspaces and lets you:

What You Do in the Account Console

| Feature | Description |
|---|---|
| User & Group Management | |

… Read more
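
As a rough sketch of that account-level scope in code, here is how the same territory looks through the Databricks SDK for Python's AccountClient; the account ID and credentials are placeholders, and the SDK itself is an assumption rather than something the post names:

```python
# Minimal sketch: account-level management spanning all workspaces.
# IDs and secrets are placeholders; assumes the databricks-sdk package.
from databricks.sdk import AccountClient

a = AccountClient(
    host="https://accounts.azuredatabricks.net",
    account_id="<account-id>",
    azure_tenant_id="<tenant-id>",
    azure_client_id="<client-id>",
    azure_client_secret="<client-secret>",
)

# Enumerate the things the console manages above individual workspaces
for ws in a.workspaces.list():
    print("workspace:", ws.workspace_name)
for g in a.groups.list():
    print("account group:", g.display_name)
```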