Inside the Machine That Must Never Fail: How NASA Engineered Artemis II's Triple-Redundant Flight Computer

Four astronauts will strap themselves atop 8.8 million pounds of thrust sometime in 2025, riding the most powerful rocket NASA has ever flown on a loop around the Moon and back. Their lives will depend on a computer system that NASA designed to survive its own failures: a machine architecture so paranoid in its redundancy that it can lose an entire processor mid-flight and never skip a beat.

The Orion spacecraft’s flight computer isn’t one computer. It’s three, running in lockstep, constantly voting on every calculation, every sensor reading, every command. If one disagrees with the other two, it gets outvoted and sidelined. Instantly. No human intervention required.

This is triple modular redundancy, or TMR — a concept that dates back decades in aerospace engineering but has been implemented in Orion with a level of sophistication that reflects both the ambitions and the anxieties of sending humans beyond low Earth orbit for the first time since 1972. As Communications of the ACM reported in a detailed examination of the system, NASA’s Honeywell-built flight computer represents a fundamentally different engineering philosophy from the computers aboard the International Space Station or even the Space Shuttle — one where fault tolerance isn’t bolted on as an afterthought but woven into the system’s DNA from the silicon up.

The stakes explain the paranoia. Artemis II will carry NASA astronauts Reid Wiseman, Victor Glover, Christina Koch, and Canadian Space Agency astronaut Jeremy Hansen on a roughly 10-day mission around the Moon. For much of that flight, the crew will be too far from Earth for ground controllers to intervene in real time. The flight computer must handle navigation, life support management, propulsion control, and communications autonomously when it matters most. A computer failure at the wrong moment — during a trans-lunar injection burn, say, or during reentry into Earth’s atmosphere at 25,000 miles per hour — could be catastrophic.

So NASA and Honeywell built a system where failure is expected, planned for, and absorbed.

The architecture works like this: three identical computer modules, each containing a POWER750 processor derived from IBM’s commercial line, execute the same software simultaneously. A voter circuit compares their outputs on every cycle. Two-out-of-three agreement wins. The dissenting module gets flagged, and the system continues operating on the two remaining units with zero interruption to flight operations. If a second module fails, the single remaining computer can still fly the spacecraft, though with reduced fault coverage. The crew can also manually intervene at that point.
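The two-out-of-three rule is simple enough to sketch in a few lines of Python. This is an illustration only, not Orion's voter, which the article describes as a hardware circuit; the names `vote` and `RedundantSet` and the integer outputs are assumptions invented for the example.

```python
from collections import Counter

def vote(outputs):
    """Return (majority_value, dissenting_indices) for one voting cycle.

    majority_value is None when no value is held by a strict majority.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, list(range(len(outputs)))
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters

class RedundantSet:
    """Hypothetical model of a module set that sidelines outvoted members."""

    def __init__(self, n_modules=3):
        self.active = list(range(n_modules))

    def step(self, outputs):
        """outputs maps module id -> that module's result for this cycle."""
        ids = list(self.active)
        value, dissenters = vote([outputs[i] for i in ids])
        if value is None:
            # Assumption: with no majority the vote alone cannot pick a winner.
            raise RuntimeError("no majority among active modules")
        for idx in dissenters:
            self.active.remove(ids[idx])  # flag and sideline the dissenter
        return value

if __name__ == "__main__":
    computers = RedundantSet()
    print(computers.step({0: 42, 1: 42, 2: 41}), computers.active)
```

In this sketch, module 2 is outvoted and dropped while the agreed value passes through unchanged. Once only two modules remain, a disagreement leaves no majority, so the sketch raises an error; how the real system isolates a second failure is not described in the source.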

What makes this more than a textbook TMR implementation is the depth of the redundancy. According to Communications of the ACM, the voting doesn’t just happen at the processor level. It extends through the memory, the I/O buses, and the interfaces with Orion’s sensors and actuators. The entire data path is triplicated. Even the power supplies are independent, so an electrical fault can’t cascade across modules.

This is not how most spacecraft computers work. The ISS uses a distributed architecture with multiple computers handling different functions, but they don’t operate in lockstep voting mode. SpaceX’s Dragon capsule uses a dual-redundant system with a different philosophy — if the primary computer fails, the backup takes over, but there’s no continuous voting. NASA’s approach for Orion is more conservative, more belt-and-suspenders, which makes sense given that Artemis II pushes farther from Earth than any crewed mission in half a century.

The choice of the POWER750 processor itself tells a story. It’s not the fastest chip available. Not even close. But speed wasn’t the primary design criterion — determinism was. NASA needed a processor whose behavior could be exhaustively characterized and predicted, one where every possible execution path could be tested and verified. Commercial processors optimized for raw performance use speculative execution, branch prediction, and other techniques that make their behavior harder to analyze formally. The POWER750 trades some performance for predictability, which in aerospace engineering is worth more than clock speed.

Honeywell has been building flight computers for NASA since the Apollo era, and the institutional knowledge embedded in Orion’s system reflects lessons learned across decades. The Apollo Guidance Computer, famously, restarted itself several times during the Apollo 11 landing when it became overloaded and threw the 1202 and 1201 program alarms that nearly aborted humanity’s first Moon landing; each restart shed low-priority work and kept guidance running. The Shuttle’s five general-purpose computers used a four-plus-one redundancy scheme, with four computers running identical software and a fifth running independently developed backup software as a hedge against systematic software bugs.

Orion’s approach borrows from both traditions. The TMR hardware architecture handles random hardware faults — a cosmic ray flipping a bit, a capacitor failing, a solder joint cracking. But hardware redundancy can’t protect against software bugs, because if all three modules run the same flawed code, they’ll all produce the same wrong answer and the voter will happily approve it. This is the Achilles’ heel of any TMR system.

NASA addresses this through an extraordinarily rigorous software verification process. The flight software undergoes formal methods analysis, extensive simulation testing, and independent verification and validation by teams separate from the developers. Every requirement is traced from high-level mission objectives down to individual lines of code. The testing regime, as described by Communications of the ACM, involves millions of test cases executed across hardware-in-the-loop simulators that replicate the exact electrical and thermal environment the computers will face in space.

Still, no amount of testing can guarantee the absence of bugs. It can only increase confidence. NASA engineers know this, which is why the system also includes watchdog timers, memory scrubbing routines that detect and correct bit-flip errors before they propagate, and safe-mode protocols that can put the spacecraft into a stable configuration even if the flight software encounters an unrecoverable error.
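Memory scrubbing generally works by storing each word with a few extra check bits, then periodically re-reading it and repairing any single flipped bit. The source does not say which error-correcting code Orion uses, so the sketch below assumes a textbook Hamming(7,4) code; the function names `encode`, `scrub`, and `decode` are invented for the example.

```python
def encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    code = [0] * 8  # index 0 unused so indices match bit positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4,5,6,7
    return code[1:]

def scrub(word):
    """Return (corrected_word, syndrome). A syndrome of 0 means no error."""
    code = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid word
    if syndrome:
        code[syndrome] ^= 1  # the syndrome is the position of the flipped bit
    return code[1:], syndrome

def decode(word):
    """Extract the 4 data bits from a (corrected) codeword."""
    code = [0] + list(word)
    return code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)

if __name__ == "__main__":
    stored = encode(0b1011)
    stored[4] ^= 1  # simulate a single-event upset at bit position 5
    repaired, position = scrub(stored)
    print(decode(repaired) == 0b1011, position)  # prints: True 5
```

A code like this corrects one flipped bit per word, which is why scrubbing has to run often enough that a second upset rarely lands in the same word before the first is repaired.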

The radiation environment adds another layer of complexity. Beyond the Van Allen belts, Orion will be exposed to galactic cosmic rays and potential solar particle events that can wreak havoc on electronics. The POWER750 processors in Orion are radiation-hardened versions, manufactured with processes that make them resistant to single-event upsets — the technical term for when a charged particle strikes a transistor and flips its state. But radiation hardening isn’t absolute protection. It reduces the probability of upsets; it doesn’t eliminate them. The TMR voting architecture serves as the second line of defense, catching and correcting any errors that slip through the radiation hardening.
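A little arithmetic shows why voting is such an effective second line of defense. The sketch below assumes that upsets strike the three modules independently and that any two simultaneous wrong answers defeat the voter, which is a conservative simplification; the per-module probabilities are hypothetical placeholders, not measured Orion figures.

```python
def voted_failure_probability(p):
    """Chance that at least two of three modules are wrong in the same cycle.

    p is the assumed per-module, per-cycle error probability.
    Exactly two wrong: 3 * p^2 * (1 - p). All three wrong: p^3.
    """
    return 3 * p**2 * (1 - p) + p**3

if __name__ == "__main__":
    for p in (1e-3, 1e-6):  # hypothetical values for illustration only
        print(f"per-module {p:.0e} -> voted {voted_failure_probability(p):.3e}")
```

Because the result scales with the square of the per-module rate, every improvement from radiation hardening is amplified by the voter. The independence assumption is the weak point: a fault that hits all three modules the same way is not covered by this calculation.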

Artemis II’s flight computer also has to manage something the Apollo computer never did: a glass cockpit. Orion’s crew interface consists of modern digital displays rather than the switches-and-dials panels of earlier spacecraft. The flight computer drives these displays and processes crew inputs through them, adding another critical function to its workload. A display failure is manageable — the crew has backup procedures. But a computer failure that corrupts the display data could be dangerously misleading, which is why the display outputs are also subject to the voting architecture.

The broader context for this engineering effort is NASA’s Artemis program, which aims to return humans to the lunar surface and eventually establish a sustained presence there. Artemis I, an uncrewed test flight, successfully sent an Orion capsule around the Moon and back in late 2022, validating the spacecraft’s heat shield, propulsion, and — critically — its flight computer in the actual space environment. The computer performed nominally throughout the 25-day mission, but uncrewed success doesn’t guarantee crewed success. The flight software for Artemis II includes additional modules for life support management and crew interaction that weren’t active on Artemis I.

NASA has faced repeated delays with the Artemis program, with Artemis II’s launch date slipping multiple times due to issues with the heat shield, the spacecraft’s life support system, and various component qualifications. The flight computer itself has not been identified as a source of delays, which in aerospace development is about the highest compliment a subsystem can receive. It means the engineering was mature when it needed to be.

There’s an interesting tension in NASA’s approach. The agency chose a heritage processor architecture and a well-understood redundancy scheme rather than pushing for the latest technology. Some in the aerospace community have questioned whether this conservatism limits Orion’s capabilities. Modern commercial processors offer orders of magnitude more computing power, which could enable more sophisticated autonomous operations, better real-time data processing, and more capable onboard decision-making. But NASA’s calculus is different from a commercial technology company’s. In human spaceflight, the cost of a wrong bet on unproven technology is measured in lives, not quarterly earnings.

And the POWER750, while not bleeding-edge, is substantially more capable than what came before. The Apollo Guidance Computer had 74 kilobytes of memory and ran at 0.043 MHz. The Shuttle’s computers each had about one megabyte of memory. Orion’s flight computer has processing power and memory capacity that would have seemed absurd to the engineers of those earlier programs, even if it looks modest next to a modern laptop.

The software architecture deserves attention too. Orion’s flight software is written primarily in C, a language that gives engineers precise control over memory and timing — essential for real-time systems where a missed deadline can mean a missed maneuver. The software is structured as a set of partitioned applications running on a real-time operating system that enforces strict temporal and spatial isolation between functions. Navigation code can’t accidentally corrupt life support code. A timing overrun in one partition can’t starve another of processing time. This partitioning is itself a form of redundancy — not against hardware faults, but against software faults propagating across functional boundaries.

The real-time operating system underlying all of this is based on the ARINC 653 standard, widely used in commercial aviation for flight-critical systems. It’s a proven foundation. Boeing 787s and Airbus A350s rely on the same standard for their flight control computers. Applying it to a spacecraft bound for lunar orbit required adaptation — the radiation environment, the communication latencies, the mission timeline are all different — but the core principles of deterministic scheduling and resource partitioning translated directly.
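The temporal half of that isolation can be illustrated with a fixed schedule in which every partition owns a time window inside a repeating major frame. The sketch below is a toy model in the spirit of ARINC 653 scheduling, not the standard's API or Orion's configuration; the partition names, the window lengths, and the function `run_major_frame` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    partition: str
    budget_ms: int

# Hypothetical 40 ms major frame, repeated forever by the scheduler.
MAJOR_FRAME = [
    Window("navigation", 20),
    Window("life_support", 10),
    Window("displays", 10),
]

def run_major_frame(schedule, demand_ms):
    """Simulate one major frame.

    demand_ms maps partition name -> CPU time it would like this frame.
    Returns a list of (partition, start_ms, used_ms, overran) tuples.
    """
    timeline = []
    clock = 0
    for window in schedule:
        wanted = demand_ms.get(window.partition, 0)
        used = min(wanted, window.budget_ms)  # preempted when the window ends
        timeline.append((window.partition, clock, used, wanted > window.budget_ms))
        clock += window.budget_ms  # the next window starts on time regardless
    return timeline

if __name__ == "__main__":
    runaway = {"navigation": 500, "life_support": 8, "displays": 9}
    for entry in run_major_frame(MAJOR_FRAME, runaway):
        print(entry)
```

Even when the navigation partition asks for far more time than its window allows, the other partitions start at the same offsets and receive their full budgets. Spatial isolation, meaning memory protection between partitions, is not modeled here.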

One aspect that often gets overlooked in discussions of spacecraft computers is the ground segment. Orion’s flight computer doesn’t operate in isolation. It communicates continuously with Mission Control in Houston through NASA’s Deep Space Network, and ground controllers can upload software patches, adjust parameters, and even modify the flight plan in near-real-time. This ground-in-the-loop capability provides yet another layer of redundancy, with human intelligence backing up machine autonomy. But it’s a layer that degrades with distance. Light-speed delays to the Moon are about 1.3 seconds each way. Manageable. For future crewed missions to Mars, the one-way delay ranges from about 3 minutes to more than 20, depending on where the two planets are in their orbits, which means the flight computer will need to handle increasingly complex decisions without waiting for Houston’s input.
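The delay figures follow directly from dividing distance by the speed of light. The sketch below uses commonly quoted approximate distances (the Moon's mean distance and Mars at its closest and farthest from Earth); treat them as round numbers rather than mission-specific values.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre

# Approximate Earth-to-target distances in kilometres.
MOON_MEAN_KM = 384_400
MARS_CLOSEST_KM = 54.6e6
MARS_FARTHEST_KM = 401e6

def one_way_delay_s(distance_km):
    """One-way signal travel time in seconds, ignoring relay and processing time."""
    return distance_km / SPEED_OF_LIGHT_KM_S

if __name__ == "__main__":
    print(f"Moon:          {one_way_delay_s(MOON_MEAN_KM):.2f} s")
    print(f"Mars, closest: {one_way_delay_s(MARS_CLOSEST_KM) / 60:.1f} min")
    print(f"Mars, farthest:{one_way_delay_s(MARS_FARTHEST_KM) / 60:.1f} min")
```

A command-and-confirm exchange takes twice the one-way figure, so a round trip to the Moon costs roughly 2.6 seconds, while at Mars's farthest it costs well over 40 minutes.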

That future is already influencing Orion’s design. The flight software architecture is built to be extensible, with well-defined interfaces that allow new capabilities to be added for later missions without rewriting the core system. Artemis III, which aims to land astronauts on the lunar surface using SpaceX’s Starship as a lander, will require Orion’s computer to manage rendezvous and docking operations in lunar orbit — a significantly more demanding computational task than the free-return trajectory of Artemis II.

So what happens if everything goes wrong? If all three computer modules fail? Orion has a minimal set of hardwired controls that allow the crew to maintain basic spacecraft attitude and initiate emergency procedures without any computer assistance. It’s the ultimate backup — analog switches that bypass the digital system entirely. NASA calls this the “lifeboat mode.” It won’t fly a precision lunar trajectory, but it can keep the crew alive long enough for ground controllers to work the problem or for the crew to execute a manual abort.

The existence of lifeboat mode underscores a fundamental truth about fault-tolerant computing in human spaceflight: no system is infallible. The goal isn’t to build a computer that can’t fail. It’s to build a system where failure doesn’t mean death. Triple redundancy, radiation hardening, software partitioning, watchdog timers, memory scrubbing, ground segment backup, and hardwired manual controls — each layer catches what the previous layer misses. The probability of all layers failing simultaneously is vanishingly small. Not zero. But small enough that four human beings are willing to sit on top of that rocket and trust it with their lives.

That trust is the ultimate test of engineering. Not benchmarks or specifications or test reports, but whether people will stake their existence on what you built. By that measure, the Orion flight computer will face its final exam when Artemis II launches. Every voting cycle, every bit-flip correction, every partitioned software task will either validate decades of careful engineering — or expose a flaw that billions of dollars and millions of engineering hours failed to catch.

No pressure.
