
Relax, It’s Just a Race Condition (Therac-25 Says Hi)
What’s the worst a bug can do? The Therac-25 killed people with massive radiation overdoses, thanks in part to a race condition. A programmer’s nightmare in real life.
Therac-25: The Worst Software Bug You’ve Never Heard Of
As I was sitting there, bored and annoyed by monotonous work and the lack of anything to spark my curiosity, I started wondering about the worst bug in programming history. Even though I was tired of the screen, I’m almost never too tired to ask Google, or these days mostly ChatGPT, stupid questions.
From what I found, I decided to write today about what’s known as... the Therac-25.
In the early 80s, when software hadn’t yet replaced everything, especially not human stupidity, an anonymous programmer wrote the software for a radiation machine that would replace the hardware-based safety checks of the existing models.
Language? PDP-11 assembly, of course. Machine? A DEC PDP-11 (no idea what the hell that is, I’ll just refer to it as a minicomputer). OS? A custom, minimal operating environment.
In those times, Assembly was used in embedded systems because it offered tight control over hardware. Speed and memory efficiency were critical and the systems had very limited resources. Higher-level languages like C were still gaining traction in embedded medical systems.
Now imagine creating, testing and maintaining software that automates all of those processes, and testing it as rigorously as medical equipment demands. (Probably that’s why the programmer, even though an amateur, chose to stay anonymous.)
So what happened exactly? Let’s talk a bit about the existing system.
The existing system, the Therac-20, was a radiation therapy machine used to treat cancer patients. Its primary function was to deliver high doses of radiation to tumors, in a process called radiotherapy.
This therapy was supposed to help treat cancer patients by directing focused radiation beams at cancerous tumors to destroy them or shrink them. It was used specifically in external beam radiotherapy, meaning the radiation was directed from outside the body, not internally. (still worried about how to center a div, huh?)
The Therac-25 was an evolution of the previous models. It was meant to provide precise control over the radiation dose, allow higher-energy doses than earlier models, and offer different radiation modes. It was also intended to treat a variety of cancers in different body locations.
Now enough with the background; let’s get back to tech. The earlier models had hardware interlocks that prevented accidents from happening (at least most of the time).
So, what happened?
The machine used magnets, which needed to be in the right position before the operation could begin. A variable tracked whether everything was ready, and the convention was 0 = ready and anything else = not ready.
These magnets took about 8 seconds to move into position, and any change entered during that window would be ignored... except at one precise moment.
What happened was what we call an arithmetic overflow (or byte overflow): an 8-bit variable can only hold values from 0 to 255, so when the software incremented it past 255 it rolled back to 0 (which, ironically, meant “ready”). If the operator pressed the set command at exactly that moment, the check would see 0, conclude that everything was ready, and go down a code path that fired a concentrated radiation beam directly at the patient.
There were no interlocks.
No last-minute checks.
The software thought everything was ready.
It. Was. Wrong.
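To make the rollover concrete, here is a minimal Python sketch of the idea, not the actual Therac-25 code (which was PDP-11 assembly). All the names here (increment_byte, setup_is_ready, simulate_passes, saturating_increment) are made up for illustration, and the assumption is simply that a one-byte flag gets bumped by 1 on every pass while the machine is not ready.

```python
# Hypothetical sketch: illustrative names only, not the real Therac-25 code.

def increment_byte(value):
    """Add 1 to a value stored in a single byte (0..255), wrapping like 8-bit hardware."""
    return (value + 1) & 0xFF


def setup_is_ready(flag):
    """Convention from the article: 0 means 'ready', anything else means 'not ready'."""
    return flag == 0


def simulate_passes(passes):
    """Return the pass numbers at which a machine that is never ready still looks ready.

    The flag is incremented on every pass to say 'not ready yet'.
    """
    flag = 0
    false_ready = []
    for n in range(1, passes + 1):
        flag = increment_byte(flag)
        if setup_is_ready(flag):
            # The byte wrapped around to 0, so the check is fooled.
            false_ready.append(n)
    return false_ready


def saturating_increment(value):
    """One possible safer variant: stop at 255 instead of wrapping back to 0."""
    return min(value + 1, 0xFF)


if __name__ == "__main__":
    print(simulate_passes(600))  # [256, 512]
```

In this sketch the flag lies on every 256th pass. Setting the flag to a fixed nonzero value instead of incrementing it, or saturating at 255 as in saturating_increment, would be two simple ways to keep 0 from ever showing up by accident.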
Whenever something like this happened, or whenever the values entered differed from the actual position of the machine, a MALFUNCTION 54 error would appear on the screen.
No documentation.
No idea what’s happening.
Do you want to proceed?
Machines can never be wrong.
never.
Hit ENTER.
How many people were harmed back then? How many lives were taken? And the company didn’t admit to anything, and the programmer’s name was never made public...
So the next time you are worried about your code, think... It’s just a bug, what’s the worst that can happen?