Databricks: Databricks Workflows (Jobs, Tasks, Passing Values, If/Else, Re-runs, and Loops)


1. 🔹 Introduction to Workflows

  • Databricks Workflow (Job) = a pipeline of tasks (notebooks, scripts, SQL, pipelines, etc.).
  • Use cases: ETL orchestration, data quality checks, ML pipelines, conditional branching.
  • Each job = multiple tasks with dependencies, parameters, retries, schedules, etc.
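The dependency idea can be illustrated outside Databricks with Python's standard graphlib module. This is only a sketch of how dependencies constrain run order, not how Databricks schedules tasks internally; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical task names).
dependencies = {
    "get_day": set(),
    "check_sunday": {"get_day"},
    "process_data": {"check_sunday"},
    "not_sunday": {"check_sunday"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In a real job only one of the two branches after the check would run; the ordering simply shows that neither branch can start before the check has finished.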

2. 🔹 Jobs UI Overview

When creating a job:

  • Task types: Notebook, Python script, Wheel, JAR, SQL, dbt, Spark submit, If/Else, For Each.
  • Cluster: Use job clusters (terminate after run) or all-purpose clusters.
  • Parameters: Pass values via widgets (dbutils.widgets.get() in notebooks).
  • Notifications: Configure success/failure emails or alerts.
  • Retries & Timeouts: Control job resiliency.
  • Schedule/Trigger: Run once, on schedule, or triggered by events.
  • Permissions: Control who can run/edit/manage jobs.
  • Advanced: Queueing, max concurrent runs.
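Most of these UI options correspond to fields in a job definition. The sketch below writes one as a Python dict, assuming the field names of the Jobs API 2.1; the job name, notebook path, email address, and schedule are hypothetical, and the cluster configuration is left out for brevity.

```python
import json

job_settings = {
    "name": "process_employee_data",             # hypothetical job name
    "max_concurrent_runs": 1,                     # Advanced: max concurrent runs
    "queue": {"enabled": True},                   # Advanced: queueing
    "schedule": {                                 # Schedule/Trigger
        "quartz_cron_expression": "0 0 6 * * ?",  # every day at 06:00
        "timezone_id": "UTC",
    },
    "email_notifications": {                      # Notifications
        "on_failure": ["data-team@example.com"],  # hypothetical address
    },
    "tasks": [
        {
            "task_key": "01_set_day",
            "notebook_task": {
                "notebook_path": "/Workspace/demo/01_set_day",  # hypothetical path
                "base_parameters": {"input_date": ""},          # Parameters
            },
            "max_retries": 2,                     # Retries
            "min_retry_interval_millis": 60_000,  # wait one minute between retries
            "timeout_seconds": 1800,              # Timeouts
        }
    ],
}

print(json.dumps(job_settings, indent=2))
```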

3. 🔹 Creating a Job (Example: Process Employee Data)

Workflow:

  1. Get Day (extract current day name from date).
  2. Check if Sunday (If/Else branch).
  3. If True → Process data by department.
  4. If False → Print the day name and skip processing.

Notebook Setup

  • Notebook 1: Get Day (run as task 01_set_day)

    dbutils.widgets.text("input_date", "")
    input_date = dbutils.widgets.get("input_date")

    # Get the full day name: 'EEEE' yields "Sunday", whereas 'E' yields "Sun"
    input_day = spark.sql(f"""
        SELECT date_format(to_timestamp('{input_date}', "yyyy-MM-dd'T'HH:mm:ss"), 'EEEE') AS day
    """).collect()[0].day

    # Publish the value so that downstream tasks can read it
    dbutils.jobs.taskValues.set(key="input_day", value=input_day)

  • Notebook 2: Process Data (department-based ETL).
  • Notebook 3: Else branch (just print day).

    # taskKey must match the name of the task that set the value
    input_day = dbutils.jobs.taskValues.get(taskKey="01_set_day", key="input_day")
    print(f"Today is {input_day}, skipping processing.")
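To check which branch a given input_date should take without starting a cluster, the day-of-week logic can be approximated in plain Python. This is a local sketch, not the notebook's Spark code; day_name and DAY_NAMES are hypothetical helpers. It returns the full day name because the condition in this example compares against "Sunday".

```python
from datetime import datetime

# Fixed English names so the result does not depend on the machine's locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def day_name(input_date: str) -> str:
    """Return the full day name for a timestamp such as '2024-10-27T13:00:00'."""
    parsed = datetime.strptime(input_date, "%Y-%m-%dT%H:%M:%S")
    return DAY_NAMES[parsed.weekday()]  # weekday(): Monday is 0, Sunday is 6


print(day_name("2024-10-27T13:00:00"))  # Sunday
print(day_name("2024-10-28T09:30:00"))  # Monday
```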

4. 🔹 Passing Values Between Tasks

  • Use dbutils.jobs.taskValues.set() in producer task.
  • Retrieve with dbutils.jobs.taskValues.get(taskKey, key) in consumer task.

✅ Example: Pass “Sunday” check from Notebook 1 → If/Else task.
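dbutils only exists inside Databricks, so the set/get contract is easiest to see with a small in-memory stand-in. FakeJobRun and FakeTaskValues below are hypothetical test helpers, not part of any Databricks library. They only mimic the idea that a value is stored under the task that produced it and that a consumer must name that task. Raising an error for a missing key without a default is an assumption about how the real API behaves.

```python
class FakeTaskValues:
    """Mimics the set/get shape of dbutils.jobs.taskValues for a single task."""

    def __init__(self, values, task_key):
        self._values = values      # shared by every task in the same run
        self._task_key = task_key  # name of the task this handle belongs to

    def set(self, key, value):
        # A value is stored under the task that produced it.
        self._values[(self._task_key, key)] = value

    def get(self, taskKey, key, default=None):
        # A consumer names the producing task as well as the key.
        if (taskKey, key) in self._values:
            return self._values[(taskKey, key)]
        if default is not None:
            return default
        raise ValueError(f"No task value {key!r} set by task {taskKey!r}")


class FakeJobRun:
    """Hypothetical in-memory stand-in for the task values of one job run."""

    def __init__(self):
        self._values = {}  # (producer task key, value key) -> value

    def task(self, task_key):
        return FakeTaskValues(self._values, task_key)


run = FakeJobRun()
run.task("01_set_day").set(key="input_day", value="Sunday")

consumer = run.task("03_else_branch")  # hypothetical consumer task name
print(consumer.get(taskKey="01_set_day", key="input_day"))               # Sunday
print(consumer.get(taskKey="01_set_day", key="missing", default="n/a"))  # n/a
```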


5. 🔹 Conditional (If/Else) Tasks

  • Add If/Else task in Workflow.
  • Condition:
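    • Left operand = the task value set by Notebook 1, referenced as {{tasks.01_set_day.values.input_day}}.
    • Operator = equals (==).
    • Right operand = “Sunday”.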
  • True branch → Run process notebook.
  • False branch → Run else notebook.
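The same branching can be written as a task list in a job definition. The sketch below assumes the Jobs API 2.1 shape (a condition_task with op, left, and right, plus an outcome on each downstream dependency). Every task key except 01_set_day and every notebook path is hypothetical. tasks_for_outcome is a local helper for reading the structure, not a Databricks function.

```python
tasks = [
    {
        "task_key": "01_set_day",
        "notebook_task": {"notebook_path": "/Workspace/demo/01_set_day"},
    },
    {
        "task_key": "02_check_sunday",
        "depends_on": [{"task_key": "01_set_day"}],
        "condition_task": {
            "op": "EQUAL_TO",
            "left": "{{tasks.01_set_day.values.input_day}}",  # task value from Notebook 1
            "right": "Sunday",
        },
    },
    {
        "task_key": "03_process_data",  # True branch
        "depends_on": [{"task_key": "02_check_sunday", "outcome": "true"}],
        "notebook_task": {"notebook_path": "/Workspace/demo/02_process_data"},
    },
    {
        "task_key": "04_not_sunday",    # False branch
        "depends_on": [{"task_key": "02_check_sunday", "outcome": "false"}],
        "notebook_task": {"notebook_path": "/Workspace/demo/03_else_branch"},
    },
]


def tasks_for_outcome(task_list, condition_key, outcome):
    """List the task keys that run when the condition task ends with `outcome`."""
    return [
        task["task_key"]
        for task in task_list
        for dep in task.get("depends_on", [])
        if dep["task_key"] == condition_key and dep.get("outcome") == outcome
    ]


print(tasks_for_outcome(tasks, "02_check_sunday", "true"))   # ['03_process_data']
print(tasks_for_outcome(tasks, "02_check_sunday", "false"))  # ['04_not_sunday']
```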

6. 🔹 Re-run Failed Jobs

  • Go to Run history → Repair run.
  • Select failed tasks → re-execute only those.
  • Saves time (no need to rerun entire pipeline).
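A repair can also be requested programmatically. The sketch below only builds the request body, assuming the Jobs API 2.1 endpoint POST /api/2.1/jobs/runs/repair and its run_id, rerun_tasks, and rerun_all_failed_tasks fields. The run ID and task key are hypothetical, and build_repair_payload is an illustrative helper, not a Databricks function.

```python
import json


def build_repair_payload(run_id, failed_task_keys=None):
    """Build a repair request for specific tasks, or for all failed tasks if none are named."""
    payload = {"run_id": run_id}
    if failed_task_keys:
        payload["rerun_tasks"] = list(failed_task_keys)
    else:
        payload["rerun_all_failed_tasks"] = True
    return payload


# Re-run only the task that failed (hypothetical run ID and task key).
print(json.dumps(build_repair_payload(12345, ["03_process_data"])))

# Or let the service pick every failed task in the run.
print(json.dumps(build_repair_payload(12345)))
```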

7. 🔹 Override Parameters at Runtime

  • Use Run with different parameters in UI.
  • Example: Override input_date to "2024-10-27T13:00:00" → Forces workflow to evaluate as Sunday.
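The same override can be sent through the API. This sketch builds a body for POST /api/2.1/jobs/run-now, assuming input_date is defined as a job-level parameter (job_parameters). A job that only uses task-level notebook parameters would pass notebook_params instead. The job ID and the build_run_now_payload helper are hypothetical.

```python
from datetime import datetime


def build_run_now_payload(job_id, input_date):
    """Build a run-now request that overrides input_date for a single run."""
    # Fail fast if the override does not match the format the notebook parses.
    datetime.strptime(input_date, "%Y-%m-%dT%H:%M:%S")
    return {"job_id": job_id, "job_parameters": {"input_date": input_date}}


payload = build_run_now_payload(987, "2024-10-27T13:00:00")  # hypothetical job ID
print(payload)
```

Validating the timestamp before triggering the run avoids starting a cluster only to fail when the first notebook parses the date.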

8. 🔹 For Each Loop in Workflows

  • Wrap a task (e.g., Process Department Data) inside a For Each loop.
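  • Provide an array of values, either static, e.g. ["sales", "office"], or dynamic (for example, a task value set by an upstream task).
  • Each iteration passes one value to the nested task; reference it as {{input}} in the task's parameter (here, department).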

Example notebook parameter setup:

dbutils.widgets.text("department", "")
department = dbutils.widgets.get("department")

print(f"Processing department: {department}")

💡 This runs the same task once per department, in parallel or sequentially depending on the loop's concurrency setting.
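In a job definition the loop wraps a nested task. The sketch below assumes the Jobs API 2.1 for_each_task shape (inputs, concurrency, task) and the {{input}} reference for the current item; the task keys and notebook path are hypothetical. expand_iterations is a local illustration of the substitution, not how Databricks implements it.

```python
import json

departments = ["sales", "office"]

for_each = {
    "task_key": "process_departments",  # hypothetical task key
    "for_each_task": {
        "inputs": json.dumps(departments),  # the array is passed as a JSON string
        "concurrency": 2,                   # 1 would run the iterations sequentially
        "task": {
            "task_key": "process_department_iteration",
            "notebook_task": {
                "notebook_path": "/Workspace/demo/02_process_data",  # hypothetical path
                "base_parameters": {"department": "{{input}}"},
            },
        },
    },
}


def expand_iterations(task):
    """Show the notebook parameters each iteration would receive."""
    loop = task["for_each_task"]
    params = loop["task"]["notebook_task"]["base_parameters"]
    return [
        {name: str(value).replace("{{input}}", str(item)) for name, value in params.items()}
        for item in json.loads(loop["inputs"])
    ]


print(expand_iterations(for_each))
# [{'department': 'sales'}, {'department': 'office'}]
```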


9. 🔹 Best Practices

  • ✅ Prefer job clusters (auto-terminate) for scheduled runs → cost saving.
  • ✅ Centralize parameters at job level, override at task level when needed.
  • ✅ Use taskValues for cross-task communication.
  • ✅ Use If/Else for conditional ETL or SLA workflows.
  • ✅ Use For Each for department-wise ETL, multi-source ingestion, or model training per dataset.
  • ✅ Leverage repair runs instead of restarting full pipelines.

10. 🔹 Summary

  • Jobs orchestrate pipelines.
  • Tasks define execution units (Notebook, Python, SQL, etc.).
  • Parameters & TaskValues allow passing dynamic values.
  • If/Else = branch logic.
  • For Each = loop logic.
  • Repair Runs = selective reruns.
  • Override Params = test/debug flexibility.

This makes Databricks Workflows a lightweight orchestrator (similar to Airflow but native inside Databricks).

