Databricks Notebook Orchestration: Parameterizing, Running, and Scheduling Notebooks
1. Introduction
Databricks notebooks can be parameterized and orchestrated like workflows.
You can:
- Pass parameters to notebooks using widgets.
- Trigger one notebook from another using dbutils.notebook.run().
- Capture results (exit status, record counts, etc.).
- Schedule notebooks as jobs.
2. Setup: Parent vs Child Notebook
- Child Notebook: Does the main processing (e.g., read employees, filter, write Delta tables).
- Parent Notebook: Calls child notebooks with parameters and orchestrates runs.
3. Step 1: Parameterizing a Child Notebook
Inside the child notebook (write_emp_data):
# Step 1: Create a widget for parameter input
dbutils.widgets.text("DPT", "", "Department Name")
# Step 2: Fetch the widget value
dept = dbutils.widgets.get("DPT")
print(f"Department passed: {dept}")
✅ This creates a text box at the top of the notebook where you can pass department values (like Sales, Office).
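If you want to constrain input to a fixed set of values, a dropdown widget is a drop-in alternative to the text widget (a minimal sketch; the choice list below is illustrative):
# Dropdown widget: same name ("DPT"), but input is restricted to known choices
# (signature: name, default value, list of choices, label)
dbutils.widgets.dropdown("DPT", "Sales", ["Sales", "Office", "Production"], "Department Name")
dept = dbutils.widgets.get("DPT")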
4. Step 2: Process Data in the Child Notebook
# Read employee dataset
df = spark.read.csv("dbfs:/mnt/emp/employee.csv", header=True)
# Filter by department and active records
df_filtered = df.where(
    (df["Department"] == dept.upper()) & (df["Active_Record"] == "1")
)
# Write data dynamically into a Delta table
if df_filtered.count() > 0:
    table_name = f"dev.bronze.d_{dept.lower()}"
    df_filtered.write.mode("overwrite").saveAsTable(table_name)
    print(f"Data written for {dept}")
else:
    print(f"No data for {dept}")
5. Step 3: Return Results to Parent Notebook
The child notebook can return a result to its caller:
# Pass the record count back to parent
count = df_filtered.count()
dbutils.notebook.exit(str(count))
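Because dbutils.notebook.exit() only accepts a string, a common pattern (shown here as a sketch, not from the transcript) is to serialize a richer payload as JSON so the parent receives more than a bare count:
import json

# Child side: serialize a structured result into the exit string
count = df_filtered.count()
dbutils.notebook.exit(json.dumps({
    "department": dept,
    "records_written": count,
    "status": "ok",
}))

# Parent side: decode the string returned by dbutils.notebook.run()
# payload = json.loads(result)
# print(payload["records_written"])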
6. Step 4: Orchestrating from the Parent Notebook
In the parent notebook (run_emp_data):
# Run child notebook with parameters
result = dbutils.notebook.run(
    "write_emp_data",            # notebook path or name
    timeout_seconds=600,
    arguments={"DPT": "Sales"}
)
print(f"Child notebook returned {result} records")
You can loop through multiple departments:
departments = ["Sales", "Office", "Production"]
for dept in departments:
    result = dbutils.notebook.run(
        "write_emp_data",
        timeout_seconds=600,
        arguments={"DPT": dept}
    )
    print(f"{dept}: {result} records processed")
7. Step 5: Scheduling the Notebook
The Databricks UI lets you schedule notebooks without external tools:
- In the parent notebook, click Schedule (top-right).
- Give the job a name.
- Choose cluster (existing or new).
- Set frequency: daily, weekly, or a custom Quartz cron expression (e.g., 0 0 8 * * ? for 8 AM daily).
- (Optional) Configure alerts: email/Slack on success/failure.
- Add parameters (widgets like DPT).
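The same schedule can also be created programmatically through the Databricks Jobs API (2.1). The sketch below is illustrative only: the workspace URL, token, cluster ID, and notebook path are placeholders, not values from this walkthrough:
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                       # placeholder PAT

job_spec = {
    "name": "run_emp_data_daily",
    "schedule": {
        "quartz_cron_expression": "0 0 8 * * ?",  # 8 AM daily (Quartz syntax)
        "timezone_id": "UTC",
    },
    "tasks": [
        {
            "task_key": "run_emp_data",
            "existing_cluster_id": "<cluster-id>",  # placeholder existing cluster
            "notebook_task": {
                "notebook_path": "/Workspace/Users/you/run_emp_data",  # placeholder path
                "base_parameters": {"DPT": "Sales"},
            },
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(f"Created job {resp.json()['job_id']}")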
8. Key Commands Recap
- Create widget: dbutils.widgets.text("DPT", "", "Department")
- Get widget value: dept = dbutils.widgets.get("DPT")
- Run notebook: dbutils.notebook.run("child_nb", 600, {"param": "value"})
- Return value from notebook: dbutils.notebook.exit("some_value")
✅ Outcome: You now have a reusable system where a parent notebook can call child notebooks with parameters, orchestrate workflows, and schedule them automatically in Databricks.