Orchestrating and Scheduling Notebooks in Databricks

1. Introduction

Databricks notebooks can be parameterized and orchestrated like workflows.
You can:

  • Pass parameters to notebooks using widgets.
  • Trigger one notebook from another using dbutils.notebook.run().
  • Capture results (exit status, record counts, etc.).
  • Schedule notebooks as jobs.

2. Setup: Parent vs Child Notebook

  • Child Notebook: Does the main processing (e.g., read employees, filter, write Delta tables).
  • Parent Notebook: Calls child notebooks with parameters and orchestrates runs.

3. Step 1: Parameterizing a Child Notebook

Inside the child notebook (write_emp_data):

# Step 1: Create a widget for parameter input
dbutils.widgets.text("DPT", "", "Department Name")

# Step 2: Fetch the widget value
dept = dbutils.widgets.get("DPT")
print(f"Department passed: {dept}")

✅ This creates a text box at the top of the notebook where you can pass a department value (such as Sales or Office).
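
If the set of departments is fixed, a dropdown widget can replace the free-text box so callers can only pick valid values. A minimal sketch (the choice list is an assumption based on the departments used later in this post):

# Alternative: constrain input to known departments with a dropdown widget
dbutils.widgets.dropdown("DPT", "Sales", ["Sales", "Office", "Production"], "Department Name")
dept = dbutils.widgets.get("DPT")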

4. Step 2: Process Data in the Child Notebook

# Read employee dataset
df = spark.read.csv("dbfs:/mnt/emp/employee.csv", header=True)

# Filter by department and active records
df_filtered = df.where(
    (df["Department"] == dept.upper()) & (df["Active_Record"] == "1")
)

# Count once and reuse the result -- each count() call triggers a Spark job
record_count = df_filtered.count()

# Write data dynamically into a Delta table
if record_count > 0:
    table_name = f"dev.bronze.d_{dept.lower()}"
    df_filtered.write.mode("overwrite").saveAsTable(table_name)
    print(f"{record_count} rows written for {dept}")
else:
    print(f"No data for {dept}")

5. Step 3: Return Results to Parent Notebook

The child notebook can return a single string result to its caller (a JSON variant for multiple values follows the snippet):

# Pass the record count (computed above) back to the parent;
# dbutils.notebook.exit() accepts only a string
dbutils.notebook.exit(str(record_count))
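
Because exit() returns only a string, a common pattern for passing back more than one value is to serialize a small dictionary as JSON; the key names below are illustrative:

import json

# Bundle several values into one JSON string for the parent to parse
dbutils.notebook.exit(json.dumps({"department": dept, "records": record_count}))

The parent can then recover the values with json.loads(result).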

6. Step 4: Orchestrating from the Parent Notebook

In parent notebook (run_emp_data):

# Run the child notebook with parameters.
# Documented signature: dbutils.notebook.run(path, timeout_seconds, arguments)
result = dbutils.notebook.run(
    "write_emp_data",    # notebook path (relative to this notebook, or absolute)
    600,                 # timeout in seconds
    {"DPT": "Sales"}     # widget values passed to the child
)

print(f"Child notebook returned {result} records")

You can loop through multiple departments; each child run is synchronous, and a failure-tolerant variant is shown after the loop:

departments = ["Sales", "Office", "Production"]

for dept in departments:
    result = dbutils.notebook.run("write_emp_data", 600, {"DPT": dept})
    print(f"{dept}: {result} records processed")
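
Note that dbutils.notebook.run() raises an exception if the child run fails or times out, which would abort the loop on the first bad department. A minimal failure-tolerant sketch:

results = {}

for dept in departments:
    try:
        results[dept] = dbutils.notebook.run("write_emp_data", 600, {"DPT": dept})
    except Exception as e:
        # A failed or timed-out child run raises here; record it and move on
        results[dept] = f"FAILED: {e}"

print(results)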

7. Step 5: Scheduling the Notebook

The Databricks UI lets you schedule notebooks without external tools (a programmatic alternative is sketched after this list):

  • In the parent notebook, click Schedule (top-right).
  • Give the job a name.
  • Choose cluster (existing or new).
  • Set frequency:
    • Daily, weekly, or Quartz cron syntax (e.g., 0 0 8 * * ? for 8 AM daily).
  • (Optional) Configure alerts: email/Slack on success/failure.
  • Add parameters (widgets like DPT).
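
The same schedule can also be created programmatically. Below is a minimal sketch against the Jobs API 2.1; the workspace URL, token, notebook path, cluster ID, and email address are placeholders, and the payload shows only a subset of the available fields:

import requests

host = "https://<workspace-url>"      # placeholder
token = "<personal-access-token>"     # placeholder

job_spec = {
    "name": "run_emp_data_daily",
    "schedule": {
        "quartz_cron_expression": "0 0 8 * * ?",  # 8 AM daily (Quartz syntax)
        "timezone_id": "UTC",
    },
    "email_notifications": {"on_failure": ["<you@example.com>"]},
    "tasks": [{
        "task_key": "run_emp_data",
        "notebook_task": {
            "notebook_path": "/Workspace/Users/<you>/run_emp_data",
            "base_parameters": {"DPT": "Sales"},
        },
        "existing_cluster_id": "<cluster-id>",
    }],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
print(resp.json())  # returns the new job_id on success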

8. Key Commands Recap

  • Create widget: dbutils.widgets.text("DPT", "", "Department Name")
  • Get widget value: dept = dbutils.widgets.get("DPT")
  • Run notebook: dbutils.notebook.run("child_nb", 600, {"param": "value"})
  • Return value from notebook: dbutils.notebook.exit("some_value")

Outcome: You now have a reusable system where a parent notebook can call child notebooks with parameters, orchestrate workflows, and schedule them automatically in Databricks.

