Re: taxonomy of dependencies

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Mon, 08 Jun 2015 19:55:08 +0200

On 08/06/2015 19:29, Jonathan de Boyne Pollard wrote:
> You're assuming that all softwares work like daemontools, forgetting
> that not even yours does. (-: As I pointed out, nosh makes what
> happens in the event of termination user-configurable, including the
> decision to even restart at all.

  Eh, you can do that with s6 and runit too. That's what finish scripts
are for - they know how the daemon died, and it's easy enough to
perform a conditional "s6-svc -d ." or "sv down .". nosh provides
syntactic sugar for sure, but the functionality is the same. (Be
careful with sugar - too much of it makes software fat.)
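
  For illustration, a minimal sketch of such a finish script for s6, in
POSIX sh. It assumes the supervisor passes the run script's exit code as
$1 (256 if it was killed by a signal) and the signal number as $2. The
helper name finish_action and the "stay down after a clean exit 0" policy
are hypothetical, chosen only to show the conditional.

```sh
#!/bin/sh
# Sketch only. Assumption: called as ./finish <exitcode> <signal>, where
# <exitcode> is 256 when ./run was killed by a signal.

# finish_action (hypothetical helper) prints the decision for one death:
#   "down"    -> tell the supervisor not to restart the service
#   "restart" -> do nothing and let the supervisor respawn it
finish_action() {
    exit_code=${1:-256}       # no argument: treat as an abnormal death
    case "$exit_code" in
        0) echo down ;;       # clean exit: assumed to be intentional
        *) echo restart ;;    # crash or signal: let it come back
    esac
}

# In a real ./finish the last line could be the following (left commented
# out here so the sketch runs without s6 installed):
#   if [ "$(finish_action "$@")" = down ]; then s6-svc -d .; fi
```

  Under runit the same idea would end in "sv down ." instead, with the
test adapted to whatever arguments runsv hands to ./finish.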

  What you cannot do with s6 and runit is bypass the one-second
delay between two startups of the same service, and I maintain that
it's a good thing. I've Seen Things (tm), and I guarantee you that
at *some* point in the life of your datacenter, the stars will
align in a sufficiently evil way and your service will die repeatedly
from what will appear to be graceful termination, but will just be
your automation screwing up. At that point, the admin will hate
herself (gender pronoun chosen by flipping a coin) for having disabled
the waiting time between respawns on graceful termination.


> So I repeat: Sometimes, one does _not_ want these things. If it's
> doing a graceful restart, I want dnscache back up *right now*, not 1
> second from now. There's no "price" to this. Take careful note of
> the words "If it's doing a graceful restart".

  I understand the intention, but especially when you're dealing with
large numbers of machines, shit happens and *will* happen, including
mistaking a series of random SIGTERMs (which will only be sent by
some root process with pid 1729 during a leap second in a leap year
when the moon is full) for graceful restarts. Why take the risk, why
even have to wonder about the risk, when you can have two dnscache
processes so it does not matter if one of them is down for a second?
I insist that redundancy of mission-critical processes is the right
approach here, and you are attempting to solve a problem that nobody
should ever have.

-- 
  Laurent
Received on Mon Jun 08 2015 - 17:55:08 UTC
