Re: s6: something like runit's ./check script

From: Jan Bramkamp <crest_at_rlwinm.de>
Date: Tue, 8 Sep 2015 15:04:39 +0200

On 08/09/15 14:50, Laurent Bercot wrote:
> On 08/09/2015 14:10, Jan Bramkamp wrote:
>
>> How would the ./run script or more likely the daemon it exec()ed into
>> die from a failed child process?
>
> The child process could s6-svc -t if it fails to find readiness, for
> instance. There should be an option in the polling tool to kill the
> daemon if the polling does not succeed.
> I went too far in saying "the run script will die": there needs to
> be support for that, indeed. But "the service is stuck" problem is
> easy to fix.

Not if something kills the polling script, e.g. a stray kill -9 $WRONG_PID.
Such things shouldn't happen, but that's why I want a supervision tree
rooted in init. If anything happens to a subtree, the supervisor for that
subtree restarts the subtree, and if something happens to the root of the
supervision tree (init), the kernel panics and a hardware watchdog
triggers within a few seconds. To let services fail and restart, the
infrastructure has to notice errors. Maybe adding an optional timeout
between forking the ./run script and the readiness notification to
s6-supervise would solve the problem without depending on other daemons.
Since such errors are expected to be very rare, a higher recovery time
(whatever the admin guessed as a worst-case startup time) would be
an appropriate trade-off if it avoids complexity. It would make sense to
signal this condition to the ./finish script and at least log it from there.
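
To make the timeout idea concrete, here is a rough sketch in Python. It is
not how s6-supervise works and not a proposed patch, only an illustration
of the control flow. Assumptions: the function name
start_with_readiness_timeout is made up, the child reports readiness by
writing a newline to its stdout (a stand-in for a notification fd), and the
platform is POSIX (select() on a pipe).

```python
import os
import select
import subprocess
import sys
import time


def start_with_readiness_timeout(argv, timeout):
    """Spawn argv and wait up to `timeout` seconds for a readiness newline.

    Hypothetical helper: returns (ready, child). If the child does not
    report readiness in time (or closes the pipe first), it is killed
    and reaped, and the condition is logged to stderr.
    """
    read_fd, write_fd = os.pipe()
    child = subprocess.Popen(argv, stdout=write_fd)
    os.close(write_fd)  # keep only the read end so EOF becomes visible

    deadline = time.monotonic() + timeout
    ready = False
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # admin-chosen worst-case startup time exceeded
            readable, _, _ = select.select([read_fd], [], [], remaining)
            if not readable:
                break  # select() timed out
            data = os.read(read_fd, 4096)
            if not data:
                break  # EOF: child exited or closed the pipe unnotified
            if b"\n" in data:
                ready = True
                break
    finally:
        # Sketch only: a real supervisor would keep the descriptor open
        # or redirect later output instead of closing it here.
        os.close(read_fd)

    if not ready:
        # Log the condition, then kill and reap the stuck child.
        print("readiness timeout, killing pid %d" % child.pid,
              file=sys.stderr)
        child.kill()
        child.wait()
    return ready, child
```

The supervisor itself owns the deadline here, so a stray kill of some
separate polling process cannot leave the service stuck; the cost is that
recovery takes as long as the configured timeout.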
Received on Tue Sep 08 2015 - 13:04:39 UTC