Orchestrating and Scheduling Notebooks in Databricks


1. Introduction

Databricks notebooks can be parameterized and orchestrated like workflows.
You can:

  • Pass parameters to notebooks using widgets.
  • Trigger one notebook from another using dbutils.notebook.run().
  • Capture results (exit status, record counts, etc.).
  • Schedule notebooks as jobs.

2. Setup: Parent vs Child Notebook

  • Child Notebook: Does the main processing (e.g., read employees, filter, write Delta tables).
  • Parent Notebook: Calls child notebooks with parameters and orchestrates runs.

3. Step 1: Parameterizing a Child Notebook

Inside the child notebook (write_emp_data):

# Step 1: Create a widget for parameter input
dbutils.widgets.text("DPT", "", "Department Name")

# Step 2: Fetch the widget value
dept = dbutils.widgets.get("DPT")
print(f"Department passed: {dept}")

✅ This creates a text box at the top of the notebook where you can pass department values (like Sales, Office).
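
A widget always returns a string, and with the empty default above it returns "" when nothing is passed, so it can help to validate the value before using it. The following is a minimal sketch of that idea; `clean_department` and `ALLOWED_DEPARTMENTS` are hypothetical names introduced for illustration, not part of the original notebook or of dbutils.

```python
# Hypothetical helper: validate the raw widget string before using it.
ALLOWED_DEPARTMENTS = {"SALES", "OFFICE", "PRODUCTION"}  # assumed values


def clean_department(raw: str) -> str:
    """Strip whitespace, upper-case, and reject empty or unknown values."""
    dept = raw.strip().upper()
    if not dept:
        raise ValueError("DPT widget is empty; pass a department name")
    if dept not in ALLOWED_DEPARTMENTS:
        raise ValueError(f"Unknown department: {raw!r}")
    return dept


# In the notebook this would be: clean_department(dbutils.widgets.get("DPT"))
print(clean_department("  sales "))
```

Failing early with a clear message is usually easier to debug than a child run that silently filters down to zero rows.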


4. Step 2: Process Data in the Child Notebook

# Read employee dataset
df = spark.read.csv("dbfs:/mnt/emp/employee.csv", header=True)

# Filter by department and active records
df_filtered = df.where(
    (df["Department"] == dept.upper()) & (df["Active_Record"] == "1")
)

# Write data dynamically into a Delta table
if df_filtered.count() > 0:
    table_name = f"dev.bronze.d_{dept.lower()}"
    df_filtered.write.mode("overwrite").saveAsTable(table_name)
    print(f"Data written for {dept}")
else:
    print(f"No data for {dept}")
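
Because the table name is built directly from the parameter, a value containing spaces or punctuation would produce an invalid identifier. One possible way to guard against that is sketched below; `bronze_table_name` is a hypothetical helper, and the `dev.bronze.d_` prefix is taken from the snippet above.

```python
import re


def bronze_table_name(dept: str) -> str:
    """Build dev.bronze.d_<dept> using only lowercase letters, digits, and underscores."""
    # Collapse every run of other characters into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", dept.strip().lower()).strip("_")
    if not slug:
        raise ValueError(f"Cannot derive a table name from {dept!r}")
    return f"dev.bronze.d_{slug}"


print(bronze_table_name("Sales"))
print(bronze_table_name("Human Resources"))
```

In the child notebook, `table_name = bronze_table_name(dept)` could then replace the f-string.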

5. Step 3: Return Results to Parent Notebook

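The child notebook can return a single string to its caller with dbutils.notebook.exit(), which also ends the notebook run at that point: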

# Pass the record count back to parent
count = df_filtered.count()
dbutils.notebook.exit(str(count))
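
Since dbutils.notebook.exit() passes back only one string, a common way to return several values is to serialize them as JSON in the child and parse them in the parent. The sketch below assumes that approach; the function names and the `department`, `count`, and `status` fields are hypothetical.

```python
import json


def build_exit_payload(dept: str, count: int) -> str:
    """Serialize the child's result as a JSON string."""
    status = "written" if count > 0 else "empty"
    return json.dumps({"department": dept, "count": count, "status": status})


def parse_exit_payload(payload: str) -> dict:
    """Turn the string returned by the child back into a dict."""
    return json.loads(payload)


# Child:  dbutils.notebook.exit(build_exit_payload(dept, count))
# Parent: result = parse_exit_payload(dbutils.notebook.run(...))
result = parse_exit_payload(build_exit_payload("Sales", 42))
print(result["status"], result["count"])
```

The count comes back as an integer rather than a string, so the parent can sum or compare it directly.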

6. Step 4: Orchestrating from the Parent Notebook

In the parent notebook (run_emp_data):

# Run child notebook with parameters
result = dbutils.notebook.run(
    "write_emp_data",  # Relative or absolute path to the child notebook
    timeout_seconds=600,
    arguments={"DPT": "Sales"}
)

print(f"Child notebook returned {result} records")

You can loop through multiple departments:

departments = ["Sales", "Office", "Production"]

for dept in departments:
    result = dbutils.notebook.run(
        "write_emp_data",
        timeout_seconds=600,
        arguments={"DPT": dept}
    )
    print(f"{dept}: {result} records processed")
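
dbutils.notebook.run() raises an exception when the child fails or exceeds its timeout, so in the loop above one bad department stops all the remaining ones. A possible variation that records failures and keeps going is sketched here. `run_departments` is a hypothetical helper, and the child call is injected as a function so the logic can be tried outside Databricks.

```python
from typing import Callable, Dict, List, Tuple


def run_departments(
    departments: List[str],
    run_child: Callable[[str], str],
) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Run the child once per department; collect counts and error messages."""
    counts: Dict[str, int] = {}
    failures: Dict[str, str] = {}
    for dept in departments:
        try:
            # The child returns its record count as a string.
            counts[dept] = int(run_child(dept))
        except Exception as exc:  # failure or timeout of the child run
            failures[dept] = str(exc)
    return counts, failures


# In Databricks, run_child would be something like:
#   lambda d: dbutils.notebook.run("write_emp_data", 600, {"DPT": d})
def fake_child(dept: str) -> str:
    """Stand-in for the real child notebook, used only for this demo."""
    if dept == "Office":
        raise RuntimeError("child notebook failed")
    return "10"


counts, failures = run_departments(["Sales", "Office", "Production"], fake_child)
print(counts)
print(failures)
```

If a failed department should still fail the scheduled job, the parent can raise an error at the end when `failures` is non-empty.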

7. Step 5: Scheduling the Notebook

The Databricks UI lets you schedule notebooks without external tools:

  • In the parent notebook, click Schedule (top-right).
  • Give the job a name.
  • Choose a cluster (existing or new).
  • Set the frequency:
    • Daily, weekly, or a custom Quartz cron expression (e.g., 0 0 8 * * ? for 8 AM daily).
  • (Optional) Configure alerts: email/Slack on success/failure.
  • Add parameters whose names match the notebook's widgets (like DPT).
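
The same schedule can also be described as data rather than clicked together in the UI. The sketch below builds a job definition as a Python dict, with field names following the Databricks Jobs API 2.1 as best understood here, so verify them against the current API reference before use. Databricks schedules use Quartz cron syntax, which has a leading seconds field. The job name, notebook path, cluster ID, and email address are placeholders, and `build_job_settings` is a hypothetical helper.

```python
import json


def build_job_settings(notebook_path: str, cluster_id: str, dept: str) -> dict:
    """Assemble a job definition that runs one notebook daily at 8 AM."""
    return {
        "name": "run_emp_data_daily",  # placeholder job name
        "tasks": [
            {
                "task_key": "run_emp_data",
                "existing_cluster_id": cluster_id,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    # Keys must match the widget names in the notebook.
                    "base_parameters": {"DPT": dept},
                },
            }
        ],
        "schedule": {
            # Quartz fields: seconds minutes hours day-of-month month day-of-week
            "quartz_cron_expression": "0 0 8 * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        "email_notifications": {"on_failure": ["data-team@example.com"]},
    }


settings = build_job_settings(
    "/Workspace/Users/someone@example.com/run_emp_data",  # placeholder path
    "1234-567890-abcde123",  # placeholder cluster ID
    "Sales",
)
print(json.dumps(settings, indent=2))
```

Keeping the definition in code makes the schedule reviewable and repeatable across workspaces.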

8. Key Commands Recap

  • Create widget: dbutils.widgets.text("DPT", "", "Department")
  • Get widget value: dept = dbutils.widgets.get("DPT")
  • Run notebook: dbutils.notebook.run("child_nb", 600, {"param": "value"})
  • Return value from notebook: dbutils.notebook.exit("some_value")

Outcome: You now have a reusable system where a parent notebook can call child notebooks with parameters, orchestrate workflows, and schedule them automatically in Databricks.

