{"id":797,"date":"2025-08-22T15:23:59","date_gmt":"2025-08-22T15:23:59","guid":{"rendered":"https:\/\/dataopsschool.com\/blog\/?p=797"},"modified":"2025-08-22T15:23:59","modified_gmt":"2025-08-22T15:23:59","slug":"databricks-databricks-workflows-jobs-tasks-passing-values-if-else-re-runs-and-loops","status":"publish","type":"post","link":"https:\/\/dataopsschool.com\/blog\/databricks-databricks-workflows-jobs-tasks-passing-values-if-else-re-runs-and-loops\/","title":{"rendered":"Databricks: Databricks Workflows (Jobs, Tasks, Passing Values, If\/Else, Re-runs, and Loops)"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">1. \ud83d\udd39 Introduction to Workflows<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Databricks Workflow (Job)<\/strong> = a pipeline of tasks (notebooks, scripts, SQL, pipelines, etc.).<\/li>\n\n\n\n<li><strong>Use cases<\/strong>: ETL orchestration, data quality checks, ML pipelines, conditional branching.<\/li>\n\n\n\n<li>Each <strong>job<\/strong> = multiple <strong>tasks<\/strong> with dependencies, parameters, retries, schedules, etc.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
\ud83d\udd39 Jobs UI Overview<\/h2>\n\n\n\n<p>When creating a job:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Task types<\/strong>: Notebook, Python script, Wheel, JAR, SQL, dbt, Spark submit, If\/Else, For Each.<\/li>\n\n\n\n<li><strong>Cluster<\/strong>: Use job clusters (terminate after run) or all-purpose clusters.<\/li>\n\n\n\n<li><strong>Parameters<\/strong>: Pass values via widgets (<code>dbutils.widgets.get()<\/code> in notebooks).<\/li>\n\n\n\n<li><strong>Notifications<\/strong>: Configure success\/failure emails or alerts.<\/li>\n\n\n\n<li><strong>Retries &amp; Timeouts<\/strong>: Control job resiliency.<\/li>\n\n\n\n<li><strong>Schedule\/Trigger<\/strong>: Run once, on schedule, or triggered by events.<\/li>\n\n\n\n<li><strong>Permissions<\/strong>: Control who can run\/edit\/manage jobs.<\/li>\n\n\n\n<li><strong>Advanced<\/strong>: Queueing, max concurrent runs.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
\ud83d\udd39 Creating a Job (Example: Process Employee Data)<\/h2>\n\n\n\n<p>Workflow:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Get Day<\/strong> (extract current day name from date).<\/li>\n\n\n\n<li><strong>Check if Sunday<\/strong> (If\/Else branch).<\/li>\n\n\n\n<li><strong>If True<\/strong> \u2192 Process data by department.<\/li>\n\n\n\n<li><strong>If False<\/strong> \u2192 Print the day and skip processing.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Notebook Setup<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Notebook 1<\/strong>: Get Day<pre class=\"wp-block-code\"><code>dbutils.widgets.text(\"input_date\", \"\")\ninput_date = dbutils.widgets.get(\"input_date\")\n\n# Get the full day-of-week name: 'EEEE' yields \"Sunday\" ('E' would return \"Sun\", which the If\/Else check would never match)\ninput_day = spark.sql(f\"\"\"\nSELECT date_format(to_timestamp('{input_date}', \"yyyy-MM-dd'T'HH:mm:ss\"), 'EEEE') as day\n\"\"\").collect()[0].day\n\n# Publish the value for downstream tasks\ndbutils.jobs.taskValues.set(key=\"input_day\", value=input_day)\n<\/code><\/pre><\/li>\n\n\n\n<li><strong>Notebook 2<\/strong>: Process Data (department-based ETL).<\/li>\n\n\n\n<li><strong>Notebook 3<\/strong>: Else branch (just print the day).<pre class=\"wp-block-code\"><code>input_day = dbutils.jobs.taskValues.get(taskKey=\"01_set_day\", key=\"input_day\")\nprint(f\"Today is {input_day}, skipping processing.\")\n<\/code><\/pre><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">4. \ud83d\udd39 Passing Values Between Tasks<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <code>dbutils.jobs.taskValues.set()<\/code> in the producer task.<\/li>\n\n\n\n<li>Retrieve with <code>dbutils.jobs.taskValues.get(taskKey, key)<\/code> in the consumer task.<\/li>\n<\/ul>\n\n\n\n<p>\u2705 Example: Pass the &#8220;Sunday&#8221; check from Notebook 1 \u2192 If\/Else task.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
\ud83d\udd39 Conditional (If\/Else) Tasks<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Add <strong>If\/Else task<\/strong> in Workflow.<\/li>\n\n\n\n<li>Condition:\n<ul class=\"wp-block-list\">\n<li>Value from task output = &#8220;Sunday&#8221;.<\/li>\n\n\n\n<li>Operator = equals.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>True branch \u2192 Run process notebook.<\/li>\n\n\n\n<li>False branch \u2192 Run else notebook.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">6. \ud83d\udd39 Re-run Failed Jobs<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Go to <strong>Run history \u2192 Repair run<\/strong>.<\/li>\n\n\n\n<li>Select failed tasks \u2192 re-execute only those.<\/li>\n\n\n\n<li>Saves time (no need to rerun entire pipeline).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">7. \ud83d\udd39 Override Parameters at Runtime<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>Run with different parameters<\/strong> in UI.<\/li>\n\n\n\n<li>Example: Override <code>input_date<\/code> to <code>\"2024-10-27T13:00:00\"<\/code> \u2192 Forces workflow to evaluate as Sunday.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">8. \ud83d\udd39 For Each Loop in Workflows<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Wrap a task (e.g., Process Department Data) inside a <strong>For Each loop<\/strong>.<\/li>\n\n\n\n<li>Provide array of values (static or dynamic). 
<code>[\"sales\", \"office\"]<\/code><\/li>\n\n\n\n<li>Each loop iteration passes one value \u2192 notebook parameter.<\/li>\n<\/ul>\n\n\n\n<p>Example notebook parameter setup:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dbutils.widgets.text(\"department\", \"\")\ndepartment = dbutils.widgets.get(\"department\")\n\nprint(f\"Processing department: {department}\")\n<\/code><\/pre>\n\n\n\n<p>\ud83d\udca1 The For Each task runs the inner task once per array value, either in parallel (up to a configured concurrency) or sequentially.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">9. \ud83d\udd39 Best Practices<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u2705 Always use <strong>job clusters<\/strong> (auto-terminate) \u2192 cost saving.<\/li>\n\n\n\n<li>\u2705 Centralize parameters at <strong>job level<\/strong>, override at task level when needed.<\/li>\n\n\n\n<li>\u2705 Use <strong>taskValues<\/strong> for cross-task communication.<\/li>\n\n\n\n<li>\u2705 Use <strong>If\/Else<\/strong> for conditional ETL or SLA workflows.<\/li>\n\n\n\n<li>\u2705 Use <strong>For Each<\/strong> for department-wise ETL, multi-source ingestion, or model training per dataset.<\/li>\n\n\n\n<li>\u2705 Leverage <strong>repair runs<\/strong> instead of restarting full pipelines.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">10. 
\ud83d\udd39 Summary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Jobs<\/strong> orchestrate pipelines.<\/li>\n\n\n\n<li><strong>Tasks<\/strong> define execution units (Notebook, Python, SQL, etc.).<\/li>\n\n\n\n<li><strong>Parameters &amp; TaskValues<\/strong> allow passing dynamic values.<\/li>\n\n\n\n<li><strong>If\/Else<\/strong> = branch logic.<\/li>\n\n\n\n<li><strong>For Each<\/strong> = loop logic.<\/li>\n\n\n\n<li><strong>Repair Runs<\/strong> = selective reruns.<\/li>\n\n\n\n<li><strong>Override Params<\/strong> = test\/debug flexibility.<\/li>\n<\/ul>\n\n\n\n<p>This makes Databricks Workflows a <strong>lightweight orchestrator<\/strong> (similar to Airflow but native inside Databricks).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. \ud83d\udd39 Introduction to Workflows 2. \ud83d\udd39 Jobs UI Overview When creating a job: 3. \ud83d\udd39 Creating a Job (Example: Process Employee Data) Workflow: Notebook Setup 4&#8230;. 
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-797","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/797","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=797"}],"version-history":[{"count":1,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/797\/revisions"}],"predecessor-version":[{"id":798,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/797\/revisions\/798"}],"wp:attachment":[{"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=797"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=797"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=797"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}