Orchestrating and Scheduling Notebooks in Databricks

1. Introduction

Databricks notebooks can be parameterized and orchestrated into multi-step workflows.
You can:

  • Pass parameters to notebooks using widgets.
  • Trigger one notebook from another using dbutils.notebook.run().
  • Capture results (exit status, record counts, etc.).
  • Schedule notebooks as jobs.

2. Setup: Parent vs Child Notebook

  • Child Notebook: Does the main processing (e.g., read employees, filter, write Delta tables).
  • Parent Notebook: Calls child notebooks with parameters and orchestrates runs.

3. Step 1: Parameterizing a Child Notebook

Inside the child notebook (write_emp_data):

# Step 1: Create a widget for parameter input
dbutils.widgets.text("DPT", "", "Department Name")

# Step 2: Fetch the widget value
dept = dbutils.widgets.get("DPT")
print(f"Department passed: {dept}")

✅ This creates a text box at the top of the notebook where you can pass department values (like Sales, Office).
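
If the set of departments is fixed, a dropdown widget constrains input to valid values. A minimal sketch using the department names that appear later in this walkthrough:

# Dropdown alternative: same widget name ("DPT"), but only listed choices are allowed.
dbutils.widgets.dropdown(
    "DPT",                              # widget name
    "Sales",                            # default value
    ["Sales", "Office", "Production"],  # allowed choices
    "Department Name"                   # label shown above the widget
)
dept = dbutils.widgets.get("DPT")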


4. Step 2: Process Data in the Child Notebook

# Read employee dataset
df = spark.read.csv("dbfs:/mnt/emp/employee.csv", header=True)

# Filter by department and active records
df_filtered = df.where(
    (df["Department"] == dept.upper()) & (df["Active_Record"] == "1")
)

# Write data dynamically into a Delta table
if df_filtered.count() > 0:
    table_name = f"dev.bronze.d_{dept.lower()}"
    df_filtered.write.mode("overwrite").saveAsTable(table_name)
    print(f"Data written for {dept}")
else:
    print(f"No data for {dept}")

5. Step 3: Return Results to Parent Notebook

The child notebook can return a result to its caller:

# Pass the record count back to parent
count = df_filtered.count()
dbutils.notebook.exit(str(count))
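
Because dbutils.notebook.exit() accepts only a string, a common pattern for returning more than one value is to serialize a small JSON payload. A sketch (the dept and count keys are illustrative, not from the original):

import json

# Return several values at once; the parent can recover them with json.loads().
dbutils.notebook.exit(json.dumps({"dept": dept, "count": count}))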

6. Step 4: Orchestrating from the Parent Notebook

In the parent notebook (run_emp_data):

# Run child notebook with parameters
result = dbutils.notebook.run(
    "write_emp_data",   # notebook path or name
    600,                # timeout in seconds
    {"DPT": "Sales"}    # parameters mapped to the child's widgets
)

print(f"Child notebook returned {result} records")

You can loop through multiple departments:

departments = ["Sales", "Office", "Production"]

for dept in departments:
    result = dbutils.notebook.run(
        "write_emp_data",
        600,               # timeout in seconds
        {"DPT": dept}      # parameters mapped to the child's widgets
    )
    print(f"{dept}: {result} records processed")

7. Step 5: Scheduling the Notebook

Databricks lets you schedule notebooks as jobs directly from the UI, with no external scheduler (a scripted alternative via the Jobs API is sketched after this list):

  • In the parent notebook, click Schedule (top-right).
  • Give the job a name.
  • Choose cluster (existing or new).
  • Set frequency:
    • Daily, weekly, or a custom cron schedule. Databricks uses Quartz cron syntax, which includes a seconds field (e.g., 0 0 8 * * ? for 8 AM daily).
  • (Optional) Configure notifications: email alerts on success or failure (webhook destinations such as Slack can also be configured).
  • Add parameters (widgets like DPT).
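
If you prefer to create the schedule programmatically, the same job can be defined via the Jobs API. A minimal sketch against the Jobs API 2.1; the environment variables, notebook path, and cluster ID are placeholders:

import os
import requests

# Hypothetical environment variables holding the workspace URL and a
# personal access token, e.g. https://<workspace>.cloud.databricks.com
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

job_config = {
    "name": "run_emp_data_daily",
    "tasks": [
        {
            "task_key": "run_emp_data",
            "notebook_task": {
                "notebook_path": "/Workspace/Users/me/run_emp_data",  # placeholder path
                "base_parameters": {"DPT": "Sales"},
            },
            "existing_cluster_id": "1234-567890-abcde123",  # placeholder cluster ID
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 8 * * ?",  # 8 AM daily
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_config,
)
resp.raise_for_status()
print(resp.json())  # response contains the new job_id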

8. Key Commands Recap

  • Create widget: dbutils.widgets.text("DPT", "", "Department")
  • Get widget value: dept = dbutils.widgets.get("DPT")
  • Run notebook: dbutils.notebook.run("child_nb", 600, {"param": "value"})
  • Return value from notebook: dbutils.notebook.exit("some_value")

Outcome: You now have a reusable system where a parent notebook can call child notebooks with parameters, orchestrate workflows, and schedule them automatically in Databricks.