Inside Dagu v2.7.0: Reproducible Workflows

Dagu v2.7.0 makes workflow runs easier to reproduce by moving action versions, files, tools, inputs, outputs, and run evidence into the workflow contract.

Yota Hamada

The more I think about Dagu v2.7.0, the less I want to describe it as a release about reusable actions.

Reuse matters. It is useful when ten workflows can call the same action instead of copying the same YAML around.

But reuse is not the story I would lead with.

Reproducibility is.

A workflow is not reproducible if it only works because the right script happens to be on one machine, the right CLI was installed by someone last year, or the worker happens to share the same repository checkout.

That kind of workflow can run. It can even run for months. Then it moves to another worker, another team member runs it, or a production incident happens at 2 a.m., and suddenly nobody knows which assumption broke.

v2.7.0 is about pulling those assumptions into the workflow contract.

Not perfect determinism. External APIs can change. Databases can change. Time still moves. But the execution context of the workflow should be visible and movable: the action version, the files it needs, the tools it uses, the shape of its inputs, the shape of its outputs, and the artifacts it leaves behind.

That is the story I would lead with.

The problem is hidden state

Before v2.7.0, Dagu could already run serious workflows. The problem was not that workflows were impossible.

The problem was how much production behavior lived outside the DAG.

For example:

  • this step expects jq to exist on the worker
  • this shell script must live next to the DAG
  • this action depends on files the worker may not have
  • this sub-workflow returns JSON, but the caller has to know where to read it from
  • this distributed worker can run the DAG only if it has the same checkout as the parent worker

None of those assumptions is unusual. Every workflow system has some version of this problem.

But if the assumptions are outside the workflow, the run is harder to reproduce. You can look at the YAML and still not know what actually made the run work.

v2.7.0 does not remove every possible source of drift. It does something more practical: it moves more of the run's required context into Dagu itself.

Dagu Actions are workflow packages

Dagu Actions and third-party actions package a workflow so another DAG can call it by reference.

The package is still simple:

my-action/
  dagu-action.yaml
  workflow.yaml
  scripts/
    helper.sh

The manifest tells Dagu which DAG to run and what the boundary looks like:

apiVersion: v1alpha1
name: classify-release
dag: workflow.yaml
inputs:
  type: object
  additionalProperties: false
  required: [version]
  properties:
    version:
      type: string
outputs:
  type: object
  additionalProperties: false
  required: [channel]
  properties:
    channel:
      type: string

The caller sees one action:

steps:
  - id: classify
    action: acme/classify-release@v1.2.3
    with:
      version: ${VERSION}

  - id: route
    run: echo "send to ${classify.outputs.channel}"
    depends: [classify]

Here is what happens when Dagu runs that step.

First, Dagu resolves the action reference. For production, that should be a tag or commit SHA, not a floating branch. Dagu also rejects unsafe ref syntax because action resolution is part of the security boundary.
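To make the "tag or commit SHA, not a floating branch" advice concrete, here is a minimal Python sketch of a reference check. It is not Dagu's resolver. The owner/name@ref grammar, the allowed characters, and the names parse_action_ref and is_pinned are all assumptions made for illustration.

```python
import re

# Hypothetical grammar: owner/name@ref with a conservative character set.
# Dagu's real reference rules may differ; this only illustrates the idea.
ACTION_REF = re.compile(
    r"(?P<owner>[A-Za-z0-9][A-Za-z0-9._-]*)/"
    r"(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)@"
    r"(?P<ref>[A-Za-z0-9][A-Za-z0-9._/-]*)"
)
FULL_SHA = re.compile(r"[0-9a-f]{40}")
VERSION_TAG = re.compile(r"v\d+\.\d+\.\d+")


def parse_action_ref(text):
    """Split 'owner/name@ref' and reject refs that look unsafe."""
    match = ACTION_REF.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed action reference: {text!r}")
    ref = match["ref"]
    # Refuse path traversal and empty path segments inside the ref.
    if ".." in ref or "//" in ref or ref.endswith("/"):
        raise ValueError(f"unsafe ref: {ref!r}")
    return match["owner"], match["name"], ref


def is_pinned(ref):
    """True for a full commit SHA or a vMAJOR.MINOR.PATCH tag (assumed policy)."""
    return bool(FULL_SHA.fullmatch(ref) or VERSION_TAG.fullmatch(ref))
```

With these assumed rules, acme/classify-release@v1.2.3 parses and counts as pinned, while a ref such as main parses but does not.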

Then Dagu reads dagu-action.yaml. That file says which workflow file belongs to the action and what input and output shapes are allowed.

Next, Dagu validates the values under the caller's with key against the action input schema. If the caller passes the wrong shape, the action does not start.

After that, Dagu loads the action's workflow file and runs it as a sub-DAG. It is still Dagu running a DAG. There is no second plugin runtime hidden inside the feature.

When the action finishes, Dagu validates the action output object against the manifest's output schema. If the output is valid, the parent DAG can read it as ${classify.outputs.channel}.

That is the whole model:

  1. resolve a versioned action reference
  2. read the manifest
  3. validate the input
  4. run the action DAG
  5. validate the output
  6. expose the output to the parent DAG

You can reuse the action because it is packaged. The deeper win is that the package has a declared execution boundary.
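The validate, run, validate part of that model can be sketched in a few lines of Python. This is an illustration of the boundary, not Dagu's code: the validator handles only the schema subset used in the manifest above, and call_action, fake_classify, and the channel rule are hypothetical stand-ins.

```python
def validate(schema, value):
    """Check a value against the small schema subset used in the manifest."""
    if schema.get("type") != "object":
        raise ValueError("this sketch only supports object schemas")
    if not isinstance(value, dict):
        raise ValueError("expected an object")
    props = schema.get("properties", {})
    missing = [key for key in schema.get("required", []) if key not in value]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    if schema.get("additionalProperties") is False:
        extra = [key for key in value if key not in props]
        if extra:
            raise ValueError(f"unexpected keys: {extra}")
    for key, spec in props.items():
        if key in value and spec.get("type") == "string":
            if not isinstance(value[key], str):
                raise ValueError(f"{key} must be a string")


def call_action(manifest, run_dag, with_values):
    """Validate input, run the action, validate output, then expose it."""
    validate(manifest["inputs"], with_values)   # the action never starts on bad input
    outputs = run_dag(with_values)
    validate(manifest["outputs"], outputs)      # bad output never reaches the parent
    return outputs


# The manifest from the example above, written as a Python dict.
MANIFEST = {
    "inputs": {
        "type": "object",
        "additionalProperties": False,
        "required": ["version"],
        "properties": {"version": {"type": "string"}},
    },
    "outputs": {
        "type": "object",
        "additionalProperties": False,
        "required": ["channel"],
        "properties": {"channel": {"type": "string"}},
    },
}


def fake_classify(values):
    # Stand-in for the action DAG; the channel rule is invented for the example.
    return {"channel": "beta" if "-" in values["version"] else "stable"}
```

Calling call_action(MANIFEST, fake_classify, {"version": "1.2.3"}) returns the output object, while a non-string version or an extra key fails before fake_classify ever runs.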

The package travels with the run

This is the part I care about most.

Dagu can run in a distributed setup. A parent DAG may start on one worker, while an action sub-DAG runs somewhere else. That means the child worker needs the action files before it can run the action.

Without bundling, the feature would be fragile. It would work only when every worker already had the same repository checkout and helper scripts in the same place.

v2.7.0 avoids that assumption.

When the parent worker resolves an action, Dagu materializes the action workspace. Then it packs that workspace into a bounded bundle. The bundle includes the action workflow, manifest, and support files. If another worker needs to run the action DAG, the coordinator moves the bundle with the run. The child worker extracts it before execution starts.

That means the worker does not need to know your repository layout ahead of time. It receives the action package it needs for this run.

There are limits on purpose:

  • 64 MiB compressed by default
  • 256 MiB uncompressed by default
  • 8192 files by default
  • .git directories are excluded
  • symlinks and special files are rejected
  • paths are normalized so files cannot escape the workspace

These limits are not glamorous, but they are exactly the kind of thing that makes the feature production-shaped.
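Here is a rough Python sketch of what checks like these can look like. It is not Dagu's bundler. The function names are hypothetical, only the file-count and uncompressed-size defaults from the list are modeled, and the compressed-size limit is left out because this sketch never compresses anything.

```python
import os
from pathlib import Path

MAX_FILES = 8192
MAX_UNCOMPRESSED = 256 * 1024 * 1024  # 256 MiB


def collect_bundle_files(workspace, max_files=MAX_FILES, max_bytes=MAX_UNCOMPRESSED):
    """Return sorted relative paths to bundle, enforcing the limits."""
    root = Path(workspace).resolve()
    selected, total = [], 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip .git entirely; os.walk honors in-place edits to dirnames.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in dirnames:
            if (Path(dirpath) / name).is_symlink():
                raise ValueError(f"symlink rejected: {name}")
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.is_symlink() or not path.is_file():
                raise ValueError(f"symlink or special file rejected: {path}")
            total += path.stat().st_size
            selected.append(path.relative_to(root).as_posix())
            if len(selected) > max_files or total > max_bytes:
                raise ValueError("bundle exceeds configured limits")
    return sorted(selected)


def safe_extract_path(dest, member):
    """Resolve a bundle member under dest, refusing paths that escape it."""
    dest = Path(dest).resolve()
    target = (dest / member).resolve()
    if target != dest and dest not in target.parents:
        raise ValueError(f"path escapes workspace: {member}")
    return target
```

The extract-side check matters as much as the pack-side one: a member named ../evil or an absolute path resolves outside the destination and is refused before anything is written.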

The point is simple: if an action needs files, those files should travel with the action run. Do not hope the right files already exist on the right worker.

Tools are part of the contract

Scripts often depend on small command-line tools.

jq. yq. Release helpers. Diagnostic tools. Converters.

In many systems, those tools live in a worker image, a README, or somebody's memory. That is hidden state again.

v2.7.0 lets the DAG say what it needs:

tools:
  - jqlang/jq@jq-1.7.1

steps:
  - id: transform
    run: jq ".items[] | .name" data.json

Dagu's managed tools build on aquaproj/aqua. Aqua already solved a hard and boring problem well: install pinned CLI tools from a registry in a repeatable way. Dagu uses that model internally, while keeping the user experience inside a single Dagu binary. You do not need to install the aqua CLI separately to use managed tools in a Dagu workflow.

At runtime, Dagu generates an aqua config from the DAG's tools, computes a toolset hash for the worker platform, installs the tools into a worker-local cache, writes a manifest of resolved commands, and prepends the toolset bin directory to PATH.
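The hash-and-PATH part of that sequence is easy to picture with a small Python sketch. To be clear about what is assumed: I am not describing the exact inputs Dagu hashes or its cache layout here. The JSON payload, the SHA-256 choice, the cache_root/digest/bin directory shape, and both function names are hypothetical.

```python
import hashlib
import json
import os


def toolset_hash(tools, os_name, arch):
    """Hash the tool list plus the platform into a stable cache key."""
    payload = json.dumps(
        # Sorting and de-duplicating makes the key independent of list order.
        {"tools": sorted(set(tools)), "os": os_name, "arch": arch},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def path_with_toolset(cache_root, digest, current_path):
    """Prepend the toolset's bin directory so pinned tools win PATH lookup."""
    bin_dir = os.path.join(cache_root, digest, "bin")
    return bin_dir + os.pathsep + current_path if current_path else bin_dir
```

Including the platform in the key is the important design point: the same tools list on linux/amd64 and darwin/arm64 needs different binaries, so it should land in different cache entries.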

The practical result is that a workflow no longer says, "I hope jq is installed."

It says:

tools:
  - jqlang/jq@jq-1.7.1

That is much easier to reproduce.

There is one important boundary. Tools belong to the DAG that declares them. Sub-DAGs and action packages do not inherit the caller's tools. If an action package needs jq, the action DAG should declare jq.

That is intentional. A package should carry its own requirements.

Managed tools in v2.7.0 apply to host-executed command steps. They are not injected into Docker, Kubernetes, SSH, or DAG-level container execution yet. In those cases, the tool still belongs in the image or remote host.

Outputs are not stdout folklore anymore

Workflow data flow often starts out casual.

A script prints JSON. Another step scrapes stdout. Someone remembers which line matters. Later, nobody wants to touch it because changing the output might break a caller they forgot about.

v2.7.0 gives action boundaries a clearer return path.

Inside an action DAG, a step can publish values from stdout:

steps:
  - id: classify
    run: ./classify.sh
    stdout:
      outputs:
        decode: json

Or the workflow can write outputs directly:

steps:
  - id: publish
    action: outputs.write
    with:
      values:
        channel: ${classify.output.channel}
        priority: high
    depends: [classify]

The parent reads the action result through the action boundary:

steps:
  - id: classify
    action: acme/classify-release@v1.2.3
    with:
      version: ${VERSION}

  - id: notify
    run: echo "channel=${classify.outputs.channel}"
    depends: [classify]

Object-form output still exists for step-scoped data inside a DAG. But if a value crosses a DAG or action boundary, use stdout.outputs or outputs.write.

That gives the action something close to a function signature: schema-checked input, workflow execution, schema-checked output.
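The difference from scraping stdout shows up in a small Python sketch of JSON decoding at the boundary. This is an illustration, not Dagu's decoder: the function name is hypothetical, and requiring the decoded value to be an object is my assumption about what named outputs need.

```python
import json


def decode_stdout_outputs(stdout_text):
    """Decode a step's stdout as a JSON object of named outputs."""
    try:
        value = json.loads(stdout_text)
    except json.JSONDecodeError as err:
        raise ValueError(f"stdout is not valid JSON: {err}") from err
    if not isinstance(value, dict):
        # Assumed rule: named outputs need keys, so arrays and scalars fail.
        raise ValueError("stdout JSON must be an object to publish named outputs")
    return value
```

A script that prints {"channel": "stable"} yields a usable output object, while one that prints channel=stable fails loudly at the step instead of silently handing the caller a string to parse.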

Artifacts make the run inspectable

Reproducibility is not only about starting the same run. It is also about understanding what happened after the run finishes.

v2.7.0 adds artifact actions and stdout/stderr artifact capture. That gives workflows a standard place to put files that matter to the run: reports, transformed data, logs, query results, and other evidence.

The new artifact features include:

  • artifact.write
  • artifact.read
  • artifact.list
  • artifact capture from stdout
  • artifact capture from stderr

This is useful when a run fails, but it is also useful when a run succeeds. A successful production workflow should still leave behind enough context for someone else to inspect it later.

Built-in actions remove little shell conventions

Shell is still the foundation. I do not want Dagu to turn every script into framework code.

But a lot of workflow scripts are not business logic. They are small wrappers around common operations:

  • check whether a file exists
  • copy or move a file
  • wait for an HTTP endpoint
  • write an artifact
  • pick a value out of data
  • run a DuckDB query
  • enqueue another DAG

v2.7.0 adds built-in actions for those kinds of operations:

  • file.stat, file.read, file.write, file.copy, file.move, file.delete, file.mkdir, file.list
  • artifact.write, artifact.read, artifact.list
  • data.convert, data.pick
  • wait.duration, wait.until, wait.file, wait.http
  • git.checkout
  • duckdb.query, duckdb.import
  • outputs.write
  • dag.enqueue

The reason this matters for reproducibility is boring but real. A named built-in action has a clearer contract than a one-off shell snippet that every workflow writes slightly differently.

Use shell when shell is the right tool. Use a named action when the workflow operation itself is the thing you mean.

Production workflows need production surfaces

v2.7.0 also adds features around the action runtime.

Secrets can be managed with global and workspace scopes. Notification channels and routing rules can send workflow events through email, webhooks, Slack, and Telegram. Incident routing adds provider connections and policies for systems such as PagerDuty and SolarWinds Incident Response. The MCP server gives AI tools and other clients a structured way to read, change, and execute Dagu workflows.

I would not treat these as side quests.

Once workflows become more explicit units of execution, they need operational surfaces around them. Secrets should not live in YAML. Failures should route to the right place. Incidents should be visible. External tools should have an API-level way to inspect and operate workflows.

Compatibility matters

Workflow languages cannot pretend old workflows do not exist.

v2.7.0 keeps older fields working. DAGs using command, script, type, call, and legacy custom step_types still load. The new schema can warn about deprecated execution syntax, but there is no flag day.

That compatibility constraint shaped the implementation.

New syntax is normalized into the existing runtime model. Custom actions expand during DAG loading. Built-in action names map to executor types where that makes sense. Action packages get dedicated runtime work because reference resolution, manifest validation, workspace bundling, and output validation are genuinely new responsibilities.

The authoring model gets cleaner, but the existing runtime is not thrown away.

What v2.7.0 is really about

Dagu started with a plain idea: a workflow can be a YAML file, a set of commands, and a single binary that runs them.

v2.7.0 keeps that.

But it makes the layer around those commands much more explicit:

  • action references are versioned
  • action packages declare inputs and outputs
  • action files travel with distributed runs
  • tools can be pinned in the DAG
  • outputs cross boundaries in a structured way
  • artifacts make run evidence easier to keep
  • secrets, notifications, incidents, and MCP make the operational layer less ad hoc

That is why I would describe v2.7.0 as a reproducibility release.

Reusability is a nice side effect.

Reproducible workflow runs are the reason this matters.

· · ·

Written by

Yota Hamada

Working on Dagu, a self-hosted command workflow engine for reliable, portable automation.
