Makefiles Are Still the Best Research Automation Tool
2025-10-02
Every few years someone writes a post explaining why you should replace Make with something newer — Snakemake, Nextflow, DVC, Prefect, whatever the current recommendation is. And then I watch research teams reach for Make anyway. I've stopped being surprised by this.
Make is a dependency tracker. You describe what depends on what, and it figures out the minimum set of things that need to run. For software builds: if this source file changed, recompile this object, relink this binary. For research: if this raw data file changed, rerun this cleaning script, regenerate this figure, rebuild this paper. Same logic. Stuart Feldman wrote the first version in 1976, which puts Make solidly in the era before personal computers, and it's been continuously useful for most of the time since.
The Turing Way documents the research Makefile pattern in detail. A typical setup has targets for each pipeline stage: downloading data, cleaning it, running the model, generating figures, compiling the document. Each stage declares its inputs and its outputs. Each stage depends on the outputs of what came before. Run `make` and you get only the steps that are out of date. Change one input file, and only the downstream steps re-execute. Nothing else runs.
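A minimal version of that pattern might look like the sketch below. Every file name and script name here is invented for illustration; the point is the shape: each rule names one output, its inputs, and the command that connects them.

```make
# Hypothetical pipeline: raw data -> cleaned data -> model -> figure -> paper.
# All paths and scripts are placeholders, not from any particular project.

all: paper.pdf

data/raw.csv:
	python scripts/download.py --out data/raw.csv

data/clean.csv: data/raw.csv scripts/clean.py
	python scripts/clean.py data/raw.csv data/clean.csv

results/model.json: data/clean.csv scripts/fit_model.py
	python scripts/fit_model.py data/clean.csv results/model.json

figures/fit.png: results/model.json scripts/plot.py
	python scripts/plot.py results/model.json figures/fit.png

paper.pdf: paper.tex figures/fit.png
	latexmk -pdf paper.tex

.PHONY: all
```

Note that scripts appear as prerequisites alongside data files: edit `scripts/clean.py` and cleaning re-runs along with everything downstream, while a tweak to `paper.tex` rebuilds only the PDF.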
That incremental execution is the main practical reason people keep coming back to it. A full analysis might take six hours. You don't want to rerun everything because you adjusted a label on a figure. The alternative — a shell script — forces you to either run everything or manually comment out the steps you want to skip, which is exactly how pipelines stop being reproducible. Someone comments out a preprocessing step to save time, forgets to uncomment it, and six months later nobody can figure out why the results look slightly different from the paper. Make closes off that failure mode: what runs is computed from file timestamps and the dependency graph, not from which lines you remembered to uncomment.
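You can watch the incremental behavior in a throwaway example. This is a scratch demo with invented paths; it assumes GNU Make and a POSIX shell are available.

```shell
# Build a one-rule Makefile in a scratch directory.
# printf is used so the required tab before the recipe survives copy-paste.
mkdir -p /tmp/make-demo && cd /tmp/make-demo
printf 'out.txt: in.txt\n\tcp in.txt out.txt\n' > Makefile
echo "v1" > in.txt

make          # first run: executes the cp recipe
make          # second run: nothing to do, out.txt is up to date

sleep 1       # ensure the new mtime is strictly newer on coarse filesystems
echo "v2" > in.txt
make          # the input changed, so only now does the rule re-run
```

The second `make` is a no-op because `out.txt` is newer than `in.txt`; touching the input makes it stale again.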
The reproducibility angle is almost a side effect of the dependency tracking. When your analysis lives in a Makefile, a collaborator can clone the repository and run `make` and reproduce your results without relying on a README that may or may not be current. The Makefile is the record of how outputs are produced. You can't accidentally leave a step undocumented if that step is a Make rule. This matters in practice — research code accumulates undocumented manual steps over months of iteration, and those steps are exactly what breaks when someone tries to rerun the analysis two years later.
Make also forces a useful discipline: to write a rule, you have to name the output file the rule produces. That sounds trivial, but a lot of analysis scripts just write files wherever seemed convenient at the time, with no consistent naming scheme. Makefiles make that harder to get away with. It's a small constraint that tends to push you toward a cleaner directory structure.
The limits are real. Make works cleanly when each rule produces exactly one output file. A lot of research scripts don't behave that way — one plotting script might generate a dozen figures, and expressing that in Make gets clunky. The syntax has genuine oddities; the tab-versus-space indentation rule catches almost everyone at least once. Variables and functions exist but feel like afterthoughts. Make has no native concept of environments or package versions, so you end up wrapping those concerns in shell commands that grow over time.
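The usual workaround for the many-outputs case is a stamp (sentinel) file: the rule touches a single marker after the script succeeds, and downstream rules depend on the marker. GNU Make 4.3 and later can also express this natively with grouped targets (`&:`). A sketch of both alternatives, with invented file names:

```make
# Alternative 1 (any Make): a dozen figures stand behind one stamp file.
figures/.stamp: results/model.json scripts/plot_all.py
	python scripts/plot_all.py results/model.json figures/
	touch figures/.stamp

# Alternative 2 (GNU Make 4.3+): grouped targets declare that one
# recipe invocation produces all the listed files together.
figures/a.png figures/b.png &: results/model.json scripts/plot_all.py
	python scripts/plot_all.py results/model.json figures/
```

The stamp approach is portable but means downstream rules depend on `figures/.stamp` rather than the figures themselves; grouped targets avoid that indirection at the cost of requiring a recent GNU Make.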
For very large distributed pipelines — anything running across a cluster, anything involving tens of thousands of files — you'll probably want something more specialized eventually; Snakemake is essentially Make with Python-defined rules and better cluster support. But the ceiling is higher than people assume. A 2016 paper in Frontiers in Neuroinformatics demonstrated Make handling full neuroimaging pipelines with parallel execution that held up well against specialized tools. That's not a toy use case, and it gives you a sense of how much headroom Make has before you actually hit its limits.
The pattern I keep seeing: a research group decides Make is too old or too limited, writes a Python script to do the same thing — check timestamps, decide what's stale, call subprocesses in the right order. Six months later the script has grown to 300 lines, has its own bugs, and handles edge cases inconsistently. They've written a worse Make. The case for Make isn't that it's the best possible tool. It's that it already exists, it's already installed, and writing something to replace it is harder than it looks.