Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> I'd love to see data on the average on-call incidents

Don't have any hard data to compare but having been involved in debugging running Erlang systems. It's very nice having the ability to restart separate supervisors while the rest of the processes handle requests. Being able to do hot code loading to say fix bugs or add extra logging. And my all time favorite -- live tracing after connecting to a VM's remote shell. You can just pick any function, args, and process and say "trace these for a few seconds if a specific condition happens". None of those individually are earth shattering but taken together they are just so pleasant to use. I wouldn't enjoy going back to anything didn't those capabilities.

And yes, that restarting of sub-systems (supervision trees) happens automatically as well. There were a number of cases were it turned a potential "wake up 4am and fix this now, cause everything crashed" into a "meh, it's fine until I get to it next week" kind of a problem.



Is there a good write up of how to do that somewhere?


Which part or just in general ops with Erlang?

Overall I would say this book is a good start https://www.erlang-in-anger.com/

Supervisors are just a general pattern in Erlang. Any book will have something about it. I like this one: https://learnyousomeerlang.com/supervisors

Restarting frequency and limits are just one of the parameters you specify. So don't need to do anything fancy or special there.

Hot code loading might not be as obvious: http://erlang.org/doc/reference_manual/code_loading.html but is essentially just compiling the module on the same VM version (or close by, no more than 2 version away), copying it to the server in the same path as the original. The original could be save to a backup file. The do `l(modulename)` to load it.

For tracing I recommend http://ferd.github.io/recon/. Erlang in Anger book will also have example of tracing. http://erlang.org/doc/man/dbg.html has some nice shortcuts too, but be careful using it in production is it doesn't have any overload protection. So if you accidentally trace all the messages on all the processes, you might crash your service :-)


Tracing is mainly what I was going for. I'm very familiar with the various patterns and the run time, but I haven't seen the tracing aspects referenced in as much detail.

Thanks!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: