{"id":791,"date":"2025-08-22T14:18:28","date_gmt":"2025-08-22T14:18:28","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=791"},"modified":"2025-08-22T14:18:33","modified_gmt":"2025-08-22T14:18:33","slug":"orchestrating-and-scheduling-notebooks-in-databricks","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/orchestrating-and-scheduling-notebooks-in-databricks\/","title":{"rendered":"Orchestrating and Scheduling Notebooks in Databricks"},"content":{"rendered":"\n<p>Perfect \u2014 this transcript is about <strong>Databricks Notebook Orchestration<\/strong> and how to parameterize\/run <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p>Databricks notebooks can be parameterized and orchestrated like workflows.<br>You can:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pass parameters to notebooks using <strong>widgets<\/strong>.<\/li>\n\n\n\n<li>Trigger one notebook from another using <strong><code>dbutils.notebook.run()<\/code><\/strong>.<\/li>\n\n\n\n<li>Capture results (exit status, record counts, etc.).<\/li>\n\n\n\n<li>Schedule notebooks as jobs.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">2. Setup: Parent vs Child Notebook<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Child Notebook<\/strong>: Does the main processing (e.g., read employees, filter, write Delta tables).<\/li>\n\n\n\n<li><strong>Parent Notebook<\/strong>: Calls child notebooks with parameters and orchestrates runs.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Step 1: Parameterizing a Child Notebook<\/h2>\n\n\n\n<p>Inside the child notebook (<code>write_emp_data<\/code>):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Step 1: Create a widget for parameter input\ndbutils.widgets.text(\"DPT\", \"\", \"Department Name\")\n\n# Step 2: Fetch the widget value\ndept = dbutils.widgets.get(\"DPT\")\nprint(f\"Department passed: {dept}\")\n<\/code><\/pre>\n\n\n\n<p>\u2705 This creates a text box at the top of the notebook where you can pass department values (like <em>Sales<\/em>, <em>Office<\/em>).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">4. Step 2: Process Data in the Child Notebook<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code># Read employee dataset\ndf = spark.read.csv(\"dbfs:\/mnt\/emp\/employee.csv\", header=True)\n\n# Filter by department and active records\ndf_filtered = df.where(\n    (df&#91;\"Department\"] == dept.upper()) &amp; (df&#91;\"Active_Record\"] == \"1\")\n)\n\n# Write data dynamically into a Delta table\nif df_filtered.count() &gt; 0:\n    table_name = f\"dev.bronze.d_{dept.lower()}\"\n    df_filtered.write.mode(\"overwrite\").saveAsTable(table_name)\n    print(f\"Data written for {dept}\")\nelse:\n    print(f\"No data for {dept}\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Step 3: Return Results to Parent Notebook<\/h2>\n\n\n\n<p>Child notebook can return a result:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Pass the record count back to parent\ncount = df_filtered.count()\ndbutils.notebook.exit(str(count))\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">6. 
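Optional: Returning JSON Results<\/h2>\n\n\n\n<p><code>dbutils.notebook.exit()<\/code> accepts only a single string, so when the child needs to hand back more than one value, a common pattern is to serialize a small dictionary as JSON and parse it in the parent. The sketch below is illustrative: the <code>dept<\/code>, <code>count<\/code>, and <code>status<\/code> keys are made up for this example, not part of any Databricks API.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import json\n\n# Example values; in the child notebook these come from the widget and the DataFrame\ndept, count = \"Sales\", 42\n\n# Child notebook: bundle several values into one JSON string\npayload = json.dumps({\"dept\": dept, \"count\": count, \"status\": \"ok\"})\n# dbutils.notebook.exit(payload)\n\n# Parent notebook: parse the string that dbutils.notebook.run() returns\n# result = dbutils.notebook.run(\"write_emp_data\", 600, {\"DPT\": \"Sales\"})\nresult = payload  # stand-in for the returned string\ninfo = json.loads(result)\nprint(f\"{info&#91;'dept']}: {info&#91;'count']} records ({info&#91;'status']})\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">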
Step 4: Orchestrating from the Parent Notebook<\/h2>\n\n\n\n<p>In parent notebook (<code>run_emp_data<\/code>):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Run child notebook with parameters\nresult = dbutils.notebook.run(\n    \"write_emp_data\",  # Notebook path or name\n    timeout_seconds=600,\n    arguments={\"DPT\": \"Sales\"}\n)\n\nprint(f\"Child notebook returned {result} records\")\n<\/code><\/pre>\n\n\n\n<p>You can loop through multiple departments:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>departments = &#91;\"Sales\", \"Office\", \"Production\"]\n\nfor dept in departments:\n    result = dbutils.notebook.run(\n        \"write_emp_data\",\n        timeout_seconds=600,\n        arguments={\"DPT\": dept}\n    )\n    print(f\"{dept}: {result} records processed\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Step 5: Scheduling the Notebook<\/h2>\n\n\n\n<p>Databricks UI lets you schedule notebooks without external tools:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>In the parent notebook, click <strong>Schedule<\/strong> (top-right).<\/li>\n\n\n\n<li>Give the job a <strong>name<\/strong>.<\/li>\n\n\n\n<li>Choose <strong>cluster<\/strong> (existing or new).<\/li>\n\n\n\n<li>Set <strong>frequency<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Daily, weekly, or a <strong>custom cron expression<\/strong>; Databricks uses Quartz cron syntax, which includes a leading seconds field (e.g., <code>0 0 8 * * ?<\/code> for 8:00 AM daily).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>(Optional) Configure <strong>alerts<\/strong>: email\/Slack on success\/failure.<\/li>\n\n\n\n<li>Add <strong>parameters<\/strong> (widgets like <code>DPT<\/code>).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
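Optional: Retrying Failed Child Runs<\/h2>\n\n\n\n<p><code>dbutils.notebook.run()<\/code> raises an exception when the child notebook fails or exceeds its timeout, which would stop the department loop above at the first transient error. A small retry wrapper is a common workaround; note that <code>run_with_retry<\/code> is a hand-rolled helper sketched here for illustration, not a built-in Databricks function.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import time\n\ndef run_with_retry(run_child, max_retries=3, delay_seconds=30):\n    \"\"\"Call run_child(); retry on failure, re-raising after the last attempt.\"\"\"\n    for attempt in range(1, max_retries + 1):\n        try:\n            return run_child()\n        except Exception as e:\n            if attempt == max_retries:\n                raise\n            print(f\"Attempt {attempt} failed ({e}); retrying in {delay_seconds}s\")\n            time.sleep(delay_seconds)\n\n# In Databricks you would wrap the real call in a lambda, e.g.:\n# result = run_with_retry(\n#     lambda: dbutils.notebook.run(\"write_emp_data\", 600, {\"DPT\": \"Sales\"})\n# )\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">9. 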
Key Commands Recap<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Create widget:<\/strong> <code>dbutils.widgets.text(\"DPT\", \"\", \"Department\")<\/code><\/li>\n\n\n\n<li><strong>Get widget value:<\/strong> <code>dept = dbutils.widgets.get(\"DPT\")<\/code><\/li>\n\n\n\n<li><strong>Run notebook:<\/strong> <code>dbutils.notebook.run(\"child_nb\", 600, {\"param\": \"value\"})<\/code><\/li>\n\n\n\n<li><strong>Return value from notebook:<\/strong> <code>dbutils.notebook.exit(\"some_value\")<\/code><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>\u2705 <strong>Outcome<\/strong>: You now have a reusable system where a parent notebook can call child notebooks with parameters, orchestrate workflows, and schedule them automatically in Databricks.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post covers Databricks notebook orchestration: parameterizing notebooks with widgets, running them from a parent notebook, and scheduling them as jobs. 1. Introduction Databricks notebooks can be parameterized and orchestrated like workflows. You can: 2. 
Setup:&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-791","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/791","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=791"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/791\/revisions"}],"predecessor-version":[{"id":792,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/791\/revisions\/792"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=791"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=791"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=791"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}