What Is a Research Software Engineer?
2025-11-10
There's a job I've been doing for most of my career that didn't have a name until relatively recently. You write software, but for scientists. You're not a researcher — no publication record to protect, no grants to write, no realistic shot at tenure. You're not a software engineer in the commercial sense either, because the thing you're building doesn't need to handle Black Friday traffic. It needs to produce a correct result for one very specific dataset, reliably enough that someone can put their name on the paper that depends on it.
The term Research Software Engineer only solidified around 2012, when a group of people at a workshop in Oxford realized they all had the same job and none of them knew what to call it. The Software Sustainability Institute organized the event. Neil Chue Hong and James Hetherington were among the people who pushed the label forward. The UK Research Software Engineers Association formed in 2013. The first RSE conference, in 2016, drew about 200 people from 14 countries. Most of them had been doing the job for years without knowing there was a word for it.
The structural problem the role sits in is real and not going away. Academic incentives reward publications, grants, citations. Software is infrastructure — it's what makes the publications possible, but it doesn't get credited the same way. A paper might describe a method that took three years and 40,000 lines of code to implement, and the software gets a footnote acknowledging 'computational support.' So RSEs end up carrying a lot of weight for research output while barely showing up in the accounting systems that determine whose career goes where. Some universities are trying to fix this. Princeton built out a formal RSE career ladder with defined levels and a management track. Most institutions haven't thought about it seriously.
What the work actually looks like varies a lot more than the job title suggests. Sometimes it's numerical computing — writing solvers, tuning simulations, tracking down why a climate model produces slightly different results on two compiler versions, which matters because the paper's figures depend on it. Sometimes it's data pipelines: taking raw genomic sequences and turning them into something a biologist can actually query. I spent years building the kind of tooling that lets a research lab function at all — build systems, test harnesses, version control infrastructure for code that will never be a product but still has to work when a grad student runs it at midnight.
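That compiler-to-compiler drift usually comes down to floating-point arithmetic not being associative: an optimizer that regroups a reduction changes the rounding, and the differences accumulate. A toy illustration (not the climate-model case itself, just the underlying mechanism) fits in a few lines:

```python
# Floating-point addition is not associative. A compiler that regroups
# a sum (e.g. while vectorizing) can legally produce a different result
# from the same source code. Same three numbers, two groupings:
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)                    # 0.6000000000000001
print(right_to_left)                    # 0.6
print(left_to_right == right_to_left)   # False
```

Scale that one-ulp discrepancy up to millions of operations per timestep and the "slightly different results" in a simulation stop being mysterious, even if finding the exact reduction responsible still takes days.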
The Software Sustainability Institute has done a lot of work documenting what RSEs actually do and what keeps them in the job. One thing that shows up repeatedly: people in these roles often discovered the profession sideways. They were the grad student who kept getting asked to fix the lab's code because they were the one who knew Python. Or the postdoc who liked the computing side more than the writing side and never quite left it. There's no standard on-ramp. No undergraduate degree in research software engineering. You end up here through a series of small decisions that felt like detours at the time.
That's part of why the career path is so murky. If you don't know the role exists, you can't aim for it. You end up mid-career with skills — parallel computing, scientific Python, some R, maybe Fortran for reasons you'd rather not examine — and no clear sense of what level you're at or where you'd go next. Industry will hire you, usually for more money than a university pays, but the problems shift and the feedback loops are different. Commercial software has users who complain when something breaks. Research software has a smaller audience and a much longer time horizon; code you wrote in 2015 might still be running in someone's pipeline a decade later.
The skill combination the role actually needs is unusual. You need enough domain knowledge to understand what a researcher actually needs, not just what they said they needed in a message sent at 11pm. Someone asking for 'a script to compare two datasets' might actually need a full validation pipeline with fixed random seeds, logging, and a report they can include in a supplementary materials section. You won't know that unless you understand what they're trying to prove. And you need engineering discipline to build something that doesn't only work the day you demo it — that handles missing data, documents its assumptions, runs on a machine that isn't yours.
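To make that concrete, here is a minimal sketch of what the "script to compare two datasets" can turn into once you pin it down: a seeded bootstrap comparison with logging, so a rerun produces the identical report. The function name, the bootstrap approach, and the numbers are all hypothetical, chosen only to illustrate the shape of the thing.

```python
import logging
import random

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("compare")

def compare_datasets(a, b, n_boot=1000, seed=1234):
    """Mean difference between two samples plus a bootstrap 95% interval.

    The seed is fixed by default so the report in the supplementary
    materials can be regenerated exactly.
    """
    rng = random.Random(seed)  # local RNG: no global-state surprises
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    diffs = []
    for _ in range(n_boot):
        # Resample each dataset with replacement and record the
        # difference of the resampled means.
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo, hi = diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]
    log.info("n_a=%d n_b=%d n_boot=%d seed=%d", len(a), len(b), n_boot, seed)
    return {"mean_diff": mean_a - mean_b, "ci95": (lo, hi)}

report = compare_datasets([5.1, 4.9, 5.3, 5.0], [4.7, 4.8, 4.6, 5.0])
print(report)
```

Even this stripped-down version makes the point: the "script" has acquired a reproducibility guarantee, an audit trail, and an uncertainty estimate, none of which were in the original request, all of which the paper will need.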
Those two skill sets rarely come packaged in the same person. Most people with deep engineering expertise have limited insight into scientific domains. Most researchers with deep domain knowledge have never had to think about what happens when someone else runs their code. RSEs live in the gap between those two things. It's an interesting gap. It's also an exhausting one, because you're rarely the most expert person in the room on either side, which means you spend a lot of time being the person who has to figure out what the two sides are actually saying to each other.
I'm not sure there's a clean answer to any of the structural problems. The role is inherently in-between things, and some of what makes it frustrating is also what makes it worth doing. You're the one who actually knows how the analysis works, end to end. That's not nothing.