The Gap Between Cron and Airflow

Cron is too small. Airflow is too big. Most teams live in the gap between them, duct-taping bash scripts or fighting a Python-shaped orchestrator. Here is why that gap exists, and why Dagu fits in it.

Yota Hamada

Every team with more than one scheduled job eventually hits the same wall.

You start with cron. Cron is perfect, until it isn't. Then someone says "we need a real workflow engine," and a week later you are standing in front of an Airflow deployment asking yourself what exactly you signed up for.

There is a gap between those two worlds. Most teams live inside it, unhappy in both directions. This essay is about that gap, and what the right tool in it should look like.

Cron Is the Best Worst Thing

Cron is beautiful. One line. One file. One binary that has been running on every Unix box since the 1970s. You write 0 2 * * * /usr/local/bin/run-report.sh and go to sleep. It runs.

And then, slowly, the cracks appear.

  • A step fails at 3 AM. You find out at 10 AM from a Slack message asking where the report is.
  • Two jobs need to run in sequence. You chain them with && and hope neither hangs.
  • A script needs to be retried. You SSH in, scroll through /var/log/syslog, paste a command, pray.
  • The rerun logic lives in someone's head. The scheduling lives in crontab -l. The failure handling lives in a file called wrapper.sh that nobody is allowed to touch.
  • Visibility is a word that does not exist in cron's vocabulary. There is no dashboard. There is no history. There is the log file you remembered to tee into.
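Those bullets usually congeal into something like this. A hypothetical wrapper.sh sketch, with echo stand-ins where the real step scripts would be:

```shell
#!/bin/sh
# wrapper.sh -- the fragile glue described above. Hypothetical sketch;
# the crontab entry would be:  0 2 * * * /usr/local/bin/wrapper.sh
# The step commands here are echo stand-ins for real scripts.
LOG="${LOG:-/tmp/nightly.log}"
{
  echo "start: $(date)"
  # Chain with && and hope neither hangs; on failure, nobody is paged.
  echo "extract ok" && echo "transform ok" || echo "FAILED (silently)"
  echo "end: $(date)"
} | tee -a "$LOG"   # the log file you remembered to tee into
```

Every team has a file like this. The retry logic, the alerting, and the history are all "left as an exercise," which is to say they do not exist.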

Cron does not scale with your operational maturity. It scales with your tolerance for 3 AM pages.

At some point, every team crosses the line where the cost of "just cron, plus some scripts" exceeds the cost of a real orchestrator. That line is where the trouble starts.

Airflow Is the Obvious Answer, and It Is Wrong for Most Teams

So you go shopping. You read some blog posts. Airflow comes up. Airflow is what the cool data teams use. Airflow has a UI. Airflow has retries and backfills and SLAs.

You install Airflow.

Now you have a Postgres database to babysit. A scheduler process. A webserver. A Celery executor with its Redis or RabbitMQ broker, or a Kubernetes executor, depending on which flavor of complexity you chose. A DAG parser that chews CPU. A metadata DB that needs migrations every upgrade. A DAG file format that is, technically, Python, which means every workflow is a program, which means every workflow can break in every way a program can break.

And the worst part: all of your workflows now have to be rewritten in Python.

Think about what that means if your stack is Java. Or PHP. Or Rails. Or Go. Or a pile of bash scripts that actually work fine and just need to be scheduled and observed. You are being asked to learn an entirely new language and runtime, not to solve a new problem, but to schedule the problems you have already solved.

That is not a workflow engine. That is a religion you joined because cron hurt.

Most of what Airflow gives you is a bazooka for a mosquito. Dynamic DAG generation. Pluggable executors across cloud-native backends. XCom. Task groups. SubDAGs. Sensors. The feature list reads like the spec sheet of an aircraft carrier when most teams just want a sailboat.

Other orchestrators (Prefect, Dagster, Temporal, and a dozen more) are thoughtful software built by smart people. They also require you to adopt their framework, their data model, their deployment shape, and usually their language runtime. The gap narrows slightly. It does not close.

What the Gap Actually Looks Like

Strip the question down. What does a team in the middle actually need?

  • Run a sequence of commands on a schedule.
  • Show me what ran, what failed, what the output was.
  • Let me retry a failed step without SSH.
  • Let me define the workflow in a file I can put in git.
  • Do not make me learn a new programming language.
  • Do not make me run a database and a message broker to schedule a nightly ETL.

That list is small. It is also what 80 percent of workflow use cases look like, across every industry, at every company size.

Nothing in that list requires Python. Nothing in that list requires Postgres. Nothing in that list requires Kubernetes. Nothing in that list requires a distributed executor unless you choose to scale.

The tooling grew in the wrong direction because the loud users (massive data platforms, ML pipelines at hyperscalers) needed the bazooka. Everyone else got dragged along. The simple middle was never served, so people kept writing bash wrappers around cron and calling it good.

Why Not Just the OS and Files

Here is the radical idea: a workflow is a description of which commands run in which order, with what inputs, on what schedule. That description does not need a database. It needs a file.

  • A file, because files are the universal substrate of the operating system. You can cat them, grep them, diff them, commit them to git, review them in a PR, copy them to another machine.
  • Commands, because commands are the universal substrate of software. Every language ships an executable. Every legacy system has a CLI or an HTTP endpoint. Every script on disk is already a command.
  • The OS, because the OS already knows how to run commands, manage processes, capture stdout, route stderr, and kill things that hang. You do not need to reinvent any of that.

If the engine is just a well-behaved process that reads a YAML file, runs commands in order, writes logs to disk, and exposes a UI on localhost, you have eliminated 90 percent of the operational surface of a traditional orchestrator.

No database to migrate. No broker to monitor. No framework to learn. Just a process, a file, and the filesystem underneath.
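To see how little is actually required, here is a toy runner in that shape. This is an illustration of the idea, not Dagu's implementation: the workflow is a plain file of commands, the state is a log on disk, and the engine is a loop.

```shell
#!/bin/sh
# A toy "process + file" runner, illustrating the shape only (not Dagu).
# Workflow: a plain file, one command per line. State: a log on disk.
WF=/tmp/workflow.txt
LOG=/tmp/run.log
cat > "$WF" <<'EOF'
echo extract ok
echo transform ok
EOF
: > "$LOG"                           # fresh log for this run
while IFS= read -r cmd; do
  echo ">> $cmd" >> "$LOG"           # record what ran
  if ! sh -c "$cmd" >> "$LOG" 2>&1; then
    echo "!! failed: $cmd" >> "$LOG" # record what failed, stop there
    exit 1
  fi
done < "$WF"
echo ">> all steps ok" >> "$LOG"
```

A real engine adds scheduling, dependency resolution, retries, and a UI on top, but the substrate stays the same: files in, commands run, logs out.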

This is the shape that Dagu took.

Dagu, Stated Plainly

Dagu is a single Go binary. You drop it on a machine and it runs. Workflows are YAML files. State lives in files on disk. The UI is served from the same binary on localhost.

name: daily-report
schedule: "0 2 * * *"
steps:
  - name: extract
    command: python extract.py
  - name: transform
    command: ./transform.sh
    depends: extract
  - name: load
    command: java -jar loader.jar
    depends: transform

Notice what is not in that file. No framework imports. No decorator. No task registry. No connection string. The steps are not "operators," they are commands. Python, bash, Java, whatever you already have. Dagu does not care. It just runs them and tracks the result.

That last point is the one that matters for legacy estates. If your nightly job is a PHP script, Dagu runs the PHP script. If it is a Perl cron job that has been working for 12 years, Dagu runs it. You do not rewrite anything. You do not port anything. You schedule it and move on.

Why This Shape Fits the AI Era

The CLI-native shape was useful before AI. It is essential now.

AI agents operate on commands. They write bash, they call CLIs, they invoke HTTP endpoints. The unit of work that agents naturally produce is exactly the unit of work that Dagu naturally runs. When an agent generates a workflow, it does not need to emit Python with the correct imports and the correct operator class. It emits a list of commands. That is the grain at which language models are already fluent.

If your orchestrator forces agents to produce framework-specific code, you have created an impedance mismatch between the model and the runtime. If your orchestrator accepts commands, the agent can drive it, a human can read it, and neither side is translating between worldviews.

The same property that makes Dagu friendly to legacy code is what makes it friendly to AI. The lowest common denominator of software is the command. Meet software where it is.

Orchestration Is Not Business Logic

There is a deeper principle under all of this.

Workflow orchestration is a separate concern from your application. Scheduling, retries, observability, dependency resolution: these are infrastructure problems. They are real problems, but they are not your problem. Your problem is the business logic inside the steps.

When an orchestrator forces you to write your business logic inside its framework, it has collapsed two concerns into one. Your ETL is now tangled with Airflow's task abstraction. Your machine learning pipeline is now tangled with your orchestrator's class hierarchy. If you want to move off that orchestrator, you are not replacing the scheduling layer, you are rewriting the business logic that grew into it.

A good orchestrator sits outside your code. It calls your code, watches your code, restarts your code, reports on your code. It does not infect your code. The interface between the two is the same interface the OS already offers: a command, its arguments, its stdin, its stdout, its exit code.
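Concretely, a retry built on that OS-level contract needs nothing from the code it supervises except an exit code. A minimal sketch; run_with_retries is a name invented here, not a Dagu API:

```shell
#!/bin/sh
# Orchestration via the OS contract only: run a command, inspect its
# exit code, rerun on failure. The supervised code needs no hooks.
# run_with_retries is a hypothetical helper, not a Dagu API.
run_with_retries() {
  max=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $max attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    echo "attempt $n of $max: $*" >&2
  done
}
run_with_retries 3 sh -c 'echo step ran'
```

The step could be Python, Java, or a 12-year-old Perl script; the retry wrapper neither knows nor cares, because the only interface is the exit code.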

Keep orchestration a thin layer. Keep your business logic portable. The day you want to swap the orchestrator for a better one, or move from one machine to a hundred, the cost should be configuration, not a rewrite.

The Point

Cron is too small. Airflow is too big. The gap between them is where most teams actually work, and for a long time nobody was building for that gap honestly.

Dagu is what fits there. Not because it is clever, but because it refuses to be clever. Files on disk. Commands in YAML. A single binary. A UI you do not have to host. No database. No broker. No framework. No language lock-in.

If that sounds boring, good. Workflow orchestration should be boring. Your software already has enough hard problems. Scheduling a job and watching it run should not be one of them.

Try it on a laptop. Point it at your existing scripts. If it fits your gap, you will know within an hour.

· · ·

Author

Yota Hamada

Building Dagu: a reliable, portable, self-hosted workflow orchestration engine.