The "sv check" paradigm is a bit wrong-headed in its approach to
dependency handling. It forces a service to block and wait for its
dependency. To the rest of the world, that service then appears to be
up and running normally, when in fact it may only be waiting on an
unmet dependency.

The better paradigm for dependency checking is this: for a service
with any unmet dependency, fail immediately.

The supervisor itself will then automatically take care of trying to
restart the service at periodic intervals, until the dependency check
for that service succeeds.

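To make that fail-and-retry cycle concrete, here is a toy simulation in
plain sh. It is only a sketch and not how any real supervisor is
implemented: dep_ready, run_service and the marker file are hypothetical
stand-ins for a real dependency check, a runscript and a dependency
coming up.

```shell
#!/bin/sh
# Toy simulation of "fail fast, let the supervisor retry".
# All names here are hypothetical: dep_ready stands in for a real
# dependency check, and the until-loop stands in for the supervisor.

# The dependency counts as "up" once this marker file exists.
marker="${TMPDIR:-/tmp}/dep-ready.$$"
rm -f "$marker"

# Dependency check: succeeds only when the dependency is up.
dep_ready() {
    [ -e "$marker" ]
}

# Runscript body: fail immediately on an unmet dependency.
# The ( ) body runs in a subshell, so "exit" ends only this attempt.
run_service() (
    if ! dep_ready; then
        echo "sorry: dependency check failure"
        exit 1
    fi
    echo "dependency check ok, starting service"
    exit 0
)

# Stand-in for the supervisor: restart the service until it starts.
attempts=0
until run_service; do
    attempts=$((attempts + 1))
    # Pretend the dependency comes up after the third failed attempt.
    if [ "$attempts" -ge 3 ]; then
        touch "$marker"
    fi
    # A real supervisor would pause between restarts here.
done

rm -f "$marker"
echo "service started after $attempts failed attempts"
```

The point of the sketch is that the service itself never waits: each
attempt either starts or exits at once, and all of the retry logic
lives in the supervising loop.
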
The perpok(8) utility for dependency checking is included in the perp
distribution and described here:

  http://b0llix.net/perp/site.cgi?page=perpok.8

A complete perp runscript for the scenario you describe might look like
this:

#!/bin/sh
exec 2>&1

## perpd runs rc.main with the target ("start" or "reset") as $1:
TARGET=${1}

start() {
  echo "starting lighttpd..."
  ## postgresql dependency check (must have been up at least 3 seconds):
  if ! perpok -u 3 postgresql ; then
    echo "sorry: dependency check failure postgresql"
    exit 1
  fi
  ## dependency check ok, start lighttpd:
  exec lighttpd -f /etc/lighttpd/lighttpd.conf -D
}

reset() {
  echo "resetting lighttpd..."
  exit 0
}

eval ${TARGET} "$@"

### eof (/etc/perp/lighttpd/rc.main)

We generally resist the idea of failure being okay. But in the case of
dependency checking within a service management framework, failing --
and failing quickly -- is actually the best thing to do.

Wayne
http://b0llix.net/perp/
On Wed, 14 Jan 2015 16:24:19 +0000
James Byrne <james.byrne_at_origamienergy.com> wrote:
> Hi,
>
> I am working on an embedded Linux system where I want to use the
> 'runit' tools to start various system services, and I have an issue
> where "sv check" doesn't seem to behave in a useful way.
>
> I have seen it suggested (specifically in the article at
> http://rubyists.github.io/2011/05/02/runit-for-ruby-and-everything-else.html)
> that "sv check" can be used to implement dependencies in the run
> file. The example given in the article is:
>
> /service/lighttpd/run:
> #!/bin/sh -e
> sv -w7 check postgresql
> exec 2>&1 lighttpd -f /etc/lighttpd/lighttpd.conf -D
>
> It goes on to say "This would wait 7 seconds for the postgresql
> service to be running, exiting with an error if that timeout is
> reached. runsv will then run this script again. Lighttpd will never
> be executed unless sv check exits without an error (postgresql is
> up)."
>
> However in practice this will not work, because "sv check" will
> return exit code 0 if the "postgresql" service is down, or if it
> failed to run at all (i.e. if postgresql/run exited with a non-zero
> exit code).
>
> Having looked at the code and done various tests (using runit 2.1.2),
> "sv check" doesn't appear to be very useful with its current
> behaviour. The documentation is ambiguous about what it does, saying
> that it will:
>
> "Check for the service to be in the state that’s been requested. Wait
> up to 7 seconds for the service to reach the requested state, then
> report the status or timeout."
>
> This doesn't really make sense, because there isn't any such thing as
> the "requested state".
>
> My solution is to make the following change to sv.c:
>
> --- old/sv.c 2014-08-10 19:22:34.000000000 +0100
> +++ new/sv.c 2015-01-14 14:29:31.384556297 +0000
> @@ -227,7 +227,7 @@
> if (!checkscript()) return(0);
> break;
> case 'd': if (pid || svstatus[19] != 0) return(0); break;
> - case 'C': if (pid) if (!checkscript()) return(0); break;
> + case 'C': if (!pid || !checkscript()) return(0); break;
> case 't':
> case 'k':
> if (!pid && svstatus[17] == 'd') break;
>
> With this change, "sv check" works in a much more useful way. If all
> the services specified are up it will exit with exit code 0, and if
> not it will wait until the timeout for them to come up, and return a
> non-zero exit code if any are still down.
>
> Is there any reason why I should not make this change? Have I
> misunderstood what "sv check" is supposed to do? If this change is
> OK, could it be included in future releases of "runit"?
>
> Regards,
>
> James Byrne
>
Received on Fri Jan 16 2015 - 15:48:26 UTC