The article is a bit light on technical details, but the following is noteworthy:
> Flight software for rockets at SpaceX is structured around the concept of a control cycle. “You read all of your inputs: sensors that we read in through an ADC, packets from the network, data from an IMU, updates from a star tracker or guidance sensor, commands from the ground,” explains Gerding. “You do some processing of those to determine your state, like where you are in the world or the status of the life support system. That determines your outputs – you write those, wait until the next tick of the clock, and then do the whole thing over again.”
Wow, sounds an awful lot like a typical event loop from a game engine.
Of course the main difference from a game engine would be the reliability requirements. The level of verification effort that goes into flight control software isn't remotely comparable to the effort that goes into a game engine (assuming the latter is greater than zero :))
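The quoted cycle is the classic read-compute-write loop. A minimal, self-contained sketch of it (every type and helper name here is a hypothetical stand-in, not SpaceX's actual code):

```c
/* Minimal sketch of the quoted control cycle; all names are hypothetical. */
#include <stdbool.h>

typedef struct { double accel; double gps_alt; int ground_cmd; } inputs_t;
typedef struct { double altitude; double velocity; } state_t;
typedef struct { double valve_cmd; } outputs_t;

static void read_inputs(inputs_t *in) { in->accel = 0.0; in->gps_alt = 0.0; in->ground_cmd = 0; }
static void estimate_state(const inputs_t *in, state_t *st) { st->altitude = in->gps_alt; st->velocity = 0.0; }
static void compute_outputs(const state_t *st, outputs_t *out) { out->valve_cmd = (st->altitude < 1000.0) ? 1.0 : 0.0; }
static void write_outputs(const outputs_t *out) { (void)out; /* drive valves/actuators, send telemetry */ }
static void wait_for_next_tick(void) { /* block until the next fixed tick, e.g. 20 ms */ }

int main(void)
{
    while (true) {
        inputs_t in; state_t st; outputs_t out;
        read_inputs(&in);            /* ADC samples, network packets, IMU data, ground commands */
        estimate_state(&in, &st);    /* "where you are in the world", system status */
        compute_outputs(&st, &out);  /* control law turns state into actuator commands */
        write_outputs(&out);         /* write the outputs */
        wait_for_next_tick();        /* wait for the next tick of the clock, then repeat */
    }
}
```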
> Wow, sounds an awful lot like a typical event loop from a game engine.
I coulda sworn I recall John Carmack making a similar comparison when he was working on Armadillo Aerospace, but implying that rockets were actually a bit simpler.
Embedded high-integrity systems have traditionally used this kind of cycle.
It's simple to reason about and analyse. You can make assertions about worst-case execution time and memory usage, and you know for a fact what the order of execution is. The concurrency on large systems like this often comes in the form of a lot of small computers talking to each other over some network/bus rather than having many systems all running on the same hardware.
Building concurrent systems with predictable real-time characteristics is hard. When you have a bunch of things that really need to happen every Nth of a second in order to fly straight, a simple approach like this is definitely preferable. In situations like this, predictability tends to be more important than raw performance (assuming you can reliably hit minimums).
That doesn't mean people don't use multi-threading in domains like this, but it's one of those things you want to keep as simple as you can and avoid where possible.
Seems like a really clean and testable system, too. Can make a harness with a set of inputs, run the cycle and test outputs consistently. Performance is also easily checked: each cycle needs to run within 100ms for the 10 Hz systems, I guess, including garbage collection.
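A sketch of that kind of harness, assuming the cycle body is factored into a pure step function (the names, types and trivial control law are made up for illustration):

```c
/* Hypothetical harness: drive one tick of the cycle with canned inputs
 * and check the outputs deterministically. Illustrative only. */
#include <assert.h>

typedef struct { double sensor; } inputs_t;
typedef struct { double actuator; } outputs_t;

/* The function under test: one tick of the control cycle. */
static void control_step(const inputs_t *in, outputs_t *out)
{
    out->actuator = (in->sensor > 0.5) ? 1.0 : 0.0;
}

int main(void)
{
    static const inputs_t  test_in[]  = { {0.2}, {0.7}, {0.9} };  /* test vector */
    static const outputs_t expected[] = { {0.0}, {1.0}, {1.0} };  /* response vector */

    for (int i = 0; i < 3; i++) {
        outputs_t out;
        control_step(&test_in[i], &out);
        assert(out.actuator == expected[i].actuator);
    }
    return 0;
}
```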
Not only does flight control software typically never use garbage collection, it is also preferable to never allocate memory at runtime -- all memory ever needed is allocated at program startup, and memory is not acquired or released until landing or some other major end event. That's because memory issues are often the most significant causes of crashes, which you don't want to translate into, well... a different kind of crash.
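A minimal sketch of that allocate-everything-up-front pattern (sizes and names are hypothetical):

```c
/* Illustrative only: all buffers are sized statically, so the flight
 * loop never calls malloc/free after startup. */
#include <stdint.h>

#define MAX_TELEMETRY_RECORDS 4096   /* hypothetical worst-case sizing */

typedef struct { uint32_t timestamp_ms; float value; } telemetry_record_t;

/* Statically allocated pool; filled during flight, never freed or resized. */
static telemetry_record_t telemetry_pool[MAX_TELEMETRY_RECORDS];
static uint32_t telemetry_count = 0;

static int log_telemetry(uint32_t t_ms, float value)
{
    if (telemetry_count >= MAX_TELEMETRY_RECORDS)
        return -1;                   /* pool exhausted: handled explicitly, not a heap failure */
    telemetry_pool[telemetry_count].timestamp_ms = t_ms;
    telemetry_pool[telemetry_count].value = value;
    telemetry_count++;
    return 0;
}

int main(void) { return log_telemetry(0, 1.0f); }
```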
The creator of C++, Bjarne, wrote a nice paper on flight control software if you are interested in more.
> This sparked an interesting memory for me. I was once working with a customer who was producing on-board software for a missile. In my analysis of the code, I pointed out that they had a number of problems with storage leaks. Imagine my surprise when the customer's chief software engineer said "Of course it leaks". He went on to point out that they had calculated the amount of memory the application would leak in the total possible flight time for the missile and then doubled that number. They added this much additional memory to the hardware to "support" the leaks. Since the missile will explode when it hits its target or at the end of its flight, the ultimate in garbage collection is performed without programmer intervention.
Very funny. But leaking memory until you have to restart the process seems to be a very common strategy in practice. The programs explode even if the computer doesn't.
Many moons ago I remember talking with someone who works in HFT software and they said they'd just load the system with tons of RAM because allocation/deallocation and even manual memory management was too slow.
Allocating/deallocating memory has a cost; if you can afford to just add more memory, that's faster than any clever memory management solution you have.
Ah, thanks for the link; yes this is the one I was referring to. I had thought it was by Bjarne since it was on his site, but either way, it's an interesting read.
Interesting. I wonder how compiler-dependent rule 10 is. Like, if I'm writing for a compiler that gives really bad and usually unhelpful warnings that make my code worse... but I suppose these are very strict guidelines rather than hard rules.
That section explicitly addresses that question and says you should rewrite it in a way that avoids the warning, since "usually unhelpful" means "sometimes critical". It's certainly an uncompromising view but that's what you get when failure is disastrous.
Pretty sure they use a language without a garbage collector for the control software. Probably C or Ada. Look at the FAA specifications for more guidelines. For airplanes it's DO-178C I think, not sure about rockets.
EDIT: One of the coolest projects I've seen recently is CoPilot, a Haskell dialect that compiles down to C control loops which are statically guaranteed to run in constant time and memory. There is also arduino-copilot if you want to play with ultra-hard real-time software but can't afford an entire rocket.
But it is worth noting that these control loops can often have some sort of memory, so you normally need to test over multiple cycles - you usually have a test vector to load in, and watch for the response vector.
Trivial example would be if you had a PID controller implemented into your main loop. Your main loop would be storing the integral and "previous" error term.
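A small sketch of such a PID step, with the integral and previous-error terms as the state carried across cycles (the gains, dt, and the toy plant in main are made-up values):

```c
/* Sketch of a PID update embedded in the main loop: the integral and
 * previous-error terms are the "memory" carried from tick to tick. */
typedef struct {
    double kp, ki, kd;     /* gains (placeholder values below) */
    double integral;       /* accumulated error: state kept between ticks */
    double prev_error;     /* error from the previous tick: state */
} pid_state_t;

static double pid_step(pid_state_t *pid, double setpoint, double measured, double dt)
{
    double error = setpoint - measured;
    pid->integral += error * dt;
    double derivative = (error - pid->prev_error) / dt;
    pid->prev_error = error;
    return pid->kp * error + pid->ki * pid->integral + pid->kd * derivative;
}

int main(void)
{
    pid_state_t pid = { .kp = 1.0, .ki = 0.1, .kd = 0.05, .integral = 0.0, .prev_error = 0.0 };
    double measured = 0.0;

    for (int tick = 0; tick < 100; tick++) {          /* 100 cycles at dt = 0.02 s (50 Hz) */
        double cmd = pid_step(&pid, 1.0, measured, 0.02);
        measured += cmd * 0.02;                       /* toy plant: output follows the command */
    }
    return 0;
}
```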
Yes, you could in principle convert any memory within a given loop into a separate unit. You just gotta weigh the tradeoff between having more I/O and more software/hardware units, versus having more memory within specific software/hardware units.
You just gotta find a balance that works for your requirements and constraints. In my application (it was medical devices), we found that stateful loops fit nicely.
That's the irony of closed loop control systems. They appear to be simple but the emergent behavior, particularly when there is a hierarchy of control loops, is incredibly complex.
I had once been thinking about this type of system, and later found that SNMP (1988) as well as the NIST/Army 4D/RCS project (1980s) had this train of thought before I was even born. Now I'm wondering why this type of distributed, synchronized, hard real-time decision-making framework doesn't seem to exist as some Apache Foundation project or something. It just sounds right and massively useful, but I can't find an implementation, at least on the public Internet -- why?
If by distributed you mean "over multiple computers", then yeah... this is a really specialized field. Your ability to synchronize and maintain hard real-timeness between computers is completely dependent on I/O, hardware features, and topology. This makes it much harder to make a generalized framework.
If we ignore the distributed portion, then you're describing a real-time operating system - in which case something like https://www.freertos.org/ might fit the bill?
I mean real-time inter-node message passing in a tree topology network, with message frequency associated with edges, like shown in [1]. Each node might run on its own CPU clock, but message exchanges occur in real time, progressively faster at each depth, and the system as a whole (which is probably what "Autonomous System" is supposed to mean) runs as a slow but coherent real-time cluster.
Perhaps the idea that message passing has to happen in real time is something obvious or elementary to military/defense/aerospace people from technical requirements or experience fighting the fog of war etc., but to me it was dumbfoundingly new, and I guess it might also be for the general public?
See BEAM/OTP, part of Erlang and now Elixir, Gleam, etc.
Erlang came out of Ericsson when they were building giant telephone switches.
Nowadays things like RabbitMQ and WhatsApp run on the BEAM VM.
The BEAM handles both local and remote nodes, mostly transparently.
Then there are things like Nomad or k8s, which try to tackle the problem more like *nix (Linux/BSD/etc.) does, but across multiple nodes. These are obviously not meant for hard real-time systems like launching rockets, not even for soft real-time.
You can achieve pretty good latency with Erlang, though, because of the lack of shared global memory (most if not all allocation is local to an Erlang process).
The article's indeed light on details, but what it describes sounds very similar to autonomous driving / driver assistance software I have experience with. That's not really surprising, as the overall system for an autonomous car and a spaceship is very similar - keep processing inputs from sensors, calculate some values and manage state machines, use those results to adjust outputs, some of which may operate actuators, and repeat.
The similarity with game event loops is IMO superficial. A game typically processes events as fast as it can, and the time between any two iterations of the loop is completely unpredictable. One frame could render in 20ms and the next in 25ms, which is completely fine. For a car or spaceship though, there are hard real-time requirements. Like the article describes, some Dragon tasks run every 100ms, others run every 20ms. A lot of effort has definitely gone into making sure the tasks can complete quickly enough, with some margin, and there are definitely systems that monitor these deadlines and treat even one timing miss as a major failure.
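A sketch of what per-cycle deadline monitoring might look like, using POSIX timing calls as stand-ins for whatever primitives the real RTOS provides (the fault policy and task body are placeholders):

```c
/* Sketch: a 100 ms (10 Hz) task that checks its own deadline every cycle.
 * POSIX clock_gettime/clock_nanosleep are stand-ins for RTOS primitives. */
#include <time.h>
#include <stdlib.h>

#define PERIOD_NS 100000000L             /* 100 ms = 10 Hz task */

static void do_cycle_work(void) { /* read inputs, compute, write outputs */ }
static void declare_timing_fault(void) { abort(); }   /* placeholder policy */

int main(void)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (;;) {
        do_cycle_work();

        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);

        /* advance the absolute deadline by one period */
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000L) { next.tv_nsec -= 1000000000L; next.tv_sec++; }

        /* even a single overrun is treated as a fault */
        if (now.tv_sec > next.tv_sec ||
            (now.tv_sec == next.tv_sec && now.tv_nsec > next.tv_nsec))
            declare_timing_fault();

        /* sleep until the absolute deadline (not a relative delay),
         * so jitter does not accumulate across cycles */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}
```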
"One frame could render in 20ms and the next in 25ms, which is completely fine."
No, it is not fine if you want really smooth gameplay. So in game loops, too, you can adjust for that by checking the time difference.
But sure, the difference is that if you miss a frame in a game loop, no real rockets crash, so there is probably (and hopefully) not as much effort put into that as there is at SpaceX.
Framerates are averaged over much longer periods than two frames. Rendering 45 frames in 20ms each and then having a couple take 25ms just doesn't matter mostly, and addressing that is the kind of extra "nice to have" (and even then not for every kind of game). Yeah it's nice to have consistent frame times but less important than avoiding individual frames getting very slow, and a framerate drop certainly won't be treated by the engine as a critical failure that causes e.g. a crash or restart.
On a hard real-time system like the Dragon, failing the deadline once is a critical error just like a game engine being unable to talk to the GPU.
Makes sense from a financial standpoint too since game devs are known to be passionate enough to accept lower salaries if they can work on cool stuff. Not necessarily knocking it (not my preference), but it matches exactly what I hear from people that work there. The recruiters even say the pay is low, but you'll have "SpaceX" on your resume so it's worth it.
The part that's way harder though is that at least in a game engine, you know the absolute world state and when you make a change to your world state, it happens exactly as you wanted.
In dynamic systems, such as a rocket, robot, etc., reality is fuzzy. Your loop is mostly about "what is my best guess at my present state" and "what is my desired next state". You make changes to try and get from A to B but you may not exactly achieve B. You may have not even been in state A. The error propagates back into your next guess of state, and this repeats forever.
Sensors like GPS give you an absolute position but they're imprecise. Inertial navigation is extremely accurate in telling you the change from a sample to the next, but as your position is the second integral of your acceleration, any error compounds quickly (admittedly the quality of the INS available to people willing to spend 10s of millions of dollars on a vehicle FAR exceeds what you'd be able to put in a car). Some rockets have even used ground based radar measurements uplinked to them to further improve position estimates. They probably still do this, I just don't have any hard data on current launchers.
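A toy 1D illustration of that tradeoff: double-integrating a biased accelerometer drifts without bound, while slow corrections toward a noisy absolute fix keep the error bounded. A real vehicle would use a proper Kalman filter; every number here is invented:

```c
/* Toy dead-reckoning + absolute-fix blend. Purely illustrative. */
#include <stdio.h>

int main(void)
{
    const double dt    = 0.1;    /* 10 Hz cycle */
    const double alpha = 0.02;   /* position correction gain toward the fix */
    const double beta  = 0.002;  /* velocity correction gain toward the fix */

    double pos_est = 0.0, vel_est = 0.0;

    for (int k = 0; k < 6000; k++) {                    /* ten simulated minutes */
        double accel_meas = 0.0 + 0.01;                 /* truth is 0; constant bias */
        double gps_meas   = 0.0 + ((k % 7) - 3) * 0.5;  /* truth is 0; noisy fix */
        double residual;

        /* dead-reckoning predict: integrating the bias twice compounds quickly */
        vel_est += accel_meas * dt;
        pos_est += vel_est * dt;

        /* correct: nudge the estimate toward the absolute (noisy) position */
        residual = gps_meas - pos_est;
        pos_est += alpha * residual;
        vel_est += beta  * residual;
    }
    /* stays small; drop the two correction lines and it drifts off without bound */
    printf("position estimate after 10 min: %.2f m\n", pos_est);
    return 0;
}
```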
You could abstractly structure the computation as a graph (there are many concrete approaches to that) and only recompute the parts that change due to changed input or changed intermediate results.
If you have multiple outputs you can also have a scheduling/prioritization system for the subtasks.
And yes, use interrupts or multiple timers to detect only changed parts without having to compare current input to previous input.
It's basically the same problem as updating a browser DOM in response to application state changes.
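In its simplest form that's just dirty-flagging. A bare-bones sketch (purely illustrative) where a derived value is recomputed only when one of its inputs actually changed:

```c
/* "Recompute only what changed": each derived value carries a dirty flag
 * set when any of its inputs change. Everything here is illustrative. */
#include <stdbool.h>
#include <stdio.h>

static double accel_input;           /* sensor node */
static bool   accel_dirty = false;

static double velocity_est;          /* derived node, depends on accel */
static double attitude_est;          /* derived node, does NOT depend on accel */

static void set_accel(double v)
{
    if (v != accel_input) {          /* only mark dirty on a real change */
        accel_input = v;
        accel_dirty = true;
    }
}

static void update_graph(double dt)
{
    if (accel_dirty) {               /* recompute only the affected subgraph */
        velocity_est += accel_input * dt;
        accel_dirty = false;
    }
    /* attitude_est untouched: none of its inputs changed this cycle */
}

int main(void)
{
    set_accel(9.81);
    update_graph(0.02);
    printf("v = %.3f, att = %.3f\n", velocity_est, attitude_est);
    return 0;
}
```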
Right, but since you typically want timing guarantees (for example, "respond to change in measured acceleration within 20ms"), you'd often end up structuring and resourcing everything for the worst case (everything needs to be computed) anyways... which means that your graph optimization didn't end up saving any resources.
Also, there sometimes ARE interrupts meshed in with these main-loop based systems (for whatever reason). They just aren't predominant. If you're willing to throw more hardware at things, you can typically always transform anything into a buffer.
That pretty well describes the board in front of me ATM. Most of the system runs on a 20ms (I think) tick to manage the sensor read/evaluate/control loops, but there's a 10kHz timer interrupt always going that handles commutation of a stepper motor if it needs to run. 99.9% of the time, the interrupt fires, sees there's nothing to do and just returns.
For battery-electric use cases it saves energy: you do just the necessary computations and you sleep the CPU until the next cycle. If little has changed you can sleep a lot. If a lot of rumpus is happening, you use the full cycle.
Remind me not to drive a vehicle where you implemented the control systems.
This is a great example of a little knowledge being dangerous. There's a reason control systems are not implemented the same way as a browser DOM. There's a reason control systems are designed to avoid subtasks, conditional execution, interrupts (as much as possible) and complex data models. The reasons are pretty damn solid, IMO.
Conceptually, you can talk about the two domains as being somehow equivalent. That says nothing about the appropriate implementations.
No, as in the graph starts from nodes corresponding to sensor inputs, with edges going to nodes that represent computations based on those, to further computations, to the outputs that drive actuators.
If only one sensor input changes then intermediate computations that don't depend on it don't need to be updated.
Right. I can see how that could be useful in some situations but presumably it wouldn't work very well for a very dynamic situation like a rocket in flight (changing height, orientation, fuel, thrust every millisecond)
And you still need a control loop to look for sensor changes? It's just a way of caching most of the computation?
It can work well for dynamic event-driven problems, with no need for a central busy loop[1]. Any sensor changes are responsible for triggering an update in any nodes that depend on it, who are responsible for triggering nodes that depend on those, and so on. One way to do it: abstract class Node, which has a list of child nodes that it loops through and alerts whenever there's been a state change in the parent node.
[1] There is a busy loop but it's only there to trigger tasks on a timer.
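One possible C rendering of that Node idea, where a changed node notifies its dependents and the update propagates down the graph (entirely illustrative):

```c
/* A node alerts its children whenever its own value changes;
 * children update themselves and may propagate further. */
#include <stdio.h>

#define MAX_CHILDREN 4

typedef struct node {
    double value;
    void (*on_parent_changed)(struct node *self, double parent_value);
    struct node *children[MAX_CHILDREN];
    int n_children;
} node_t;

static void node_set(node_t *n, double v)
{
    if (v == n->value)
        return;                              /* no change, nothing propagates */
    n->value = v;
    for (int i = 0; i < n->n_children; i++) {
        node_t *c = n->children[i];
        c->on_parent_changed(c, v);          /* child updates and may propagate */
    }
}

static void double_it(node_t *self, double parent_value)
{
    node_set(self, parent_value * 2.0);      /* derived node: twice its input */
}

int main(void)
{
    node_t derived = { .value = 0.0, .on_parent_changed = double_it };
    node_t sensor  = { .value = 0.0, .children = { &derived }, .n_children = 1 };

    node_set(&sensor, 3.0);                  /* change propagates to 'derived' */
    printf("derived = %.1f\n", derived.value);
    return 0;
}
```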
I’ve never seen this implemented well in a real-time setup. I’ve come to associate it with everything falling apart and otherwise being taped together in weird ways to function.
It sounds elegant, but I’m not convinced there’s a benefit to a model of abstractions of abstractions like this unless it’s done at the language level. In practice, I’ve only seen this produce a web of race conditions, performance issues, and system complexity. My experience is anecdotal.
I've seen it work well once, but others struggled to contribute to it because it was so complicated and it added little benefit to the usual busy loop approach. So I'm in the same boat as you.
One reason why this is dangerous is that you can see extremely large changes in the CPU load. For a bunch of cycles, there's apparently nothing to do, then there's a whole bunch of stuff to do, then there's almost nothing etc.
This makes it hard to reason about the overall capacity of the system (in terms of numbers of sensors and actuators). You will always end up uncertain whether you've ever really evaluated the "full load" condition, even if you formally test it in ways that say you have.
The flight software is actually real-time (a.k.a. deterministic) embedded systems software. And the control loop is the typical control loop found in embedded systems: 1 - read data from sensors through ADC (Analog-to-Digital Converters), I2C, SPI, CAN (Controller Area Network) and so on; 2 - compute the output to actuators, such as motors, hydraulic cylinders, valves, lights and so on, using some control law and the current state; 3 - repeat the cycle. The algorithm that drives the output may be based on algorithms from control theory, namely a state-space model, PID control or a Kalman filter. The computers they may be using might be single-board computers based on ARM-core, MIPS-core, PowerPC or even x86 variants for embedded systems, and/or lots of microcontrollers. Before the advent of digital computers, control theory algorithms were implemented using "analog computers", which are specially designed analog circuits for computing differential equations.
Control theory algorithms can be developed using tools such as Matlab, Matlab-Simulink, Modelica or Scilab-Scicos (open-source, from Inria). Besides C or C++, another language used in embedded systems like this is Ada, which is much safer and more reliable than both C and C++.
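For reference, the standard discrete-time state-space form mentioned above, evaluated once per control tick, is

$$
x_{k+1} = A\,x_k + B\,u_k, \qquad y_k = C\,x_k + D\,u_k
$$

where $x_k$ is the state, $u_k$ the input (sensor readings/setpoints) and $y_k$ the output driving the actuators.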
This is an extremely common pattern for robotics processing... so I wonder if they just consider the rocket a robot where you're just using a finite state machine based on inputs.
It had manually encoded ROM in the form of "core rope memory", which is pretty wacky, but it was a digital computer. In fact, it was the first IC computer.
Unlikely; they talk about 50 Hz and 10 Hz tasks. Real-time OSes often allow you to create threads that run at a given rate and must complete in that time. Like the 50 Hz task would have 20ms to complete.
`wait_for_next_tick();` in any microcontroller-based embedded system will put the CPU to sleep, often lowering your overall power consumption from the mA range it uses while actively running, to the uA range.
Some systems, like those based on coin cells, will completely shut down the chip, leaving only a single timer or input active, and when triggered will turn back on and initialise the entire CPU again to handle the input/timer. Some microcontrollers give you a small allocation of memory you can preserve under these conditions. That's how you get year+ runtimes on tiny batteries.
If they are rolling their own OS or are using microcontrollers, sure, but if they are using a safety-critical RTOS like VxWorks or INTEGRITY then it has rate-monotonic scheduling built in using OS threads.
No, he is correct. A lot of embedded software uses a more complex form of the exact technique that he has shown in the code.
You technically can have a dedicated timer interrupt for every task, but there are usually a lot more tasks than HW timers, so instead they use a dedicated HW timer for timekeeping and use that as a reference for all other tasks.
Exactly. If an operating system is used (it's not always needed), programmers implement separate OS tasks. For simpler systems (like just reading some sensors) a real-time OS is not needed; then such loops are implemented as a template (with much more error checking and restarting). Typically that "wait for tick" function just waits until a hardware timer sets its overflow flag (not even a full interrupt), and this is done to have a simple, easy-to-reason-about system.
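A bare-metal sketch of such a wait-for-tick, polling a hypothetical timer overflow flag (the register address and bit name below are invented, not a real device's):

```c
/* Spin (or sleep) until the hardware timer's overflow flag is set,
 * then clear it and run the next cycle. Register names are hypothetical. */
#include <stdint.h>

#define TIMER0_FLAGS (*(volatile uint32_t *)0x40001010u)  /* hypothetical address */
#define TIMER0_OVF   (1u << 0)                            /* hypothetical overflow bit */

static void wait_for_next_tick(void)
{
    while ((TIMER0_FLAGS & TIMER0_OVF) == 0) {
        /* optionally enter a low-power sleep here instead of spinning */
    }
    TIMER0_FLAGS = TIMER0_OVF;     /* write-1-to-clear, common on many MCUs */
}

int main(void)
{
    /* timer assumed configured elsewhere to overflow every 20 ms (50 Hz) */
    for (;;) {
        /* read sensors, compute, write outputs ... */
        wait_for_next_tick();
    }
}
```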