Re: patch: sv check should wait when svrun is not ready

From: Buck Evan <buck_at_yelp.com>
Date: Tue, 17 Feb 2015 16:49:44 -0800

On Tue, Feb 17, 2015 at 4:20 PM, Avery Payne <avery.p.payne_at_gmail.com> wrote:
>
> On 2/17/2015 11:02 AM, Buck Evan wrote:
>>
>> I think there are only three cases here:
>>
>> 1. Users that would have gotten immediate failure, and no amount of
>> spinning would help. These users will see their error delayed by $SVWAIT
>> seconds, but no other difference.
>> 2. Users that would have gotten immediate failure, but could have gotten
>> a success within $SVWAIT seconds. All of these users will of course be glad
>> of the change.
>> 3. Users that would not have gotten immediate failure. None of these
>> users will see the slightest change in behavior.
>>
>> Do you have a particular scenario in mind when you mention "breaking lots
>> of existing installations elsewhere due to a default behavior change"? I
>> don't see that there is any case this change would break.
<snip>

Thanks for the thoughtful reply, Avery. My background is also
"maintaining business software", although putting it in those terms
gives me horrific visions of java servlets and soap protocols.

> I have to look at it from a viewpoint of "what is everything else in the system expecting when this code is called". This means thinking in terms of code-as-API, so that calls elsewhere don't break.

As a matter of API, sv-check already sometimes takes up to $SVWAIT seconds to fail.
Any caller of sv-check should therefore already expect this (strictly
limited) delay in the exceptional case.
My patch just extends this existing, documented behavior to the
special case of "unable to open supervise/ok".
The API is unchanged; only the time it takes to return the result changes.
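To make the timing argument concrete, here is a minimal sketch of a bounded wait of the kind described above. It is written in shell purely for illustration: runit's sv is a C program and this is not its code. `wait_until` is a hypothetical helper; it retries a readiness test once per second (an arbitrary interval chosen for the sketch) until the test succeeds or roughly $SVWAIT seconds (default 7) have passed.

```sh
#!/bin/sh
# Illustrative sketch only, not runit's implementation.
# wait_until CMD [ARGS...]: run CMD until it succeeds or about $SVWAIT
# seconds (default 7) have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    remaining=${SVWAIT:-7}
    while :; do
        # A test that passes returns at once, so callers that already
        # succeed see no change in behavior.
        if "$@"; then
            return 0
        fi
        # Out of time: the same failure, just reported later.
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

When the test passes on the first attempt, `wait_until` returns immediately; only callers whose test keeps failing see the extra delay, and that delay is capped by $SVWAIT.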

> This happens because the use of "sv check (child)" follows the convention of "check, and either succeed fast or fail fast", ...

Either you're confused about what sv-check does, or I'm confused about
what you're saying.
sv-check generally doesn't fail fast. The one exception is the case my
patch targets: svrun is not started yet, so supervise/ok can't be opened.
Otherwise it will spin for $SVWAIT seconds before failing.

> Without that fast-fail, the logged hint never occurs; the sysadmin now has to figure out which of three possible services in a dependency chain are causing the hang.

Even if I put the above issue aside, you wouldn't get a hang;
you'd get the failure message you're familiar with, just several
seconds (default: 7) later. The sysadmin wouldn't have to search any more
than before. He would, however, find that the system fails less often,
since it now has those 7 seconds of tolerance. This is already how sv-check
behaves when a ./check script exits nonzero.
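The three cases from the quoted list can be laid out as a toy model. This is a hypothetical sketch, not runit code: `check_outcome` takes either the old or the patched behavior plus the number of seconds until svrun has started (assuming, for simplicity, that the service passes its check as soon as svrun is up), and prints when and how the check would return.

```sh
#!/bin/sh
# Toy model for illustration only; not runit code.
# $1: "old" (fail at once if supervise/ok cannot be opened) or "new" (patched)
# $2: seconds until svrun has started, or "never"
# The timeout is $SVWAIT, defaulting to 7 seconds.
check_outcome() {
    mode=$1 ready_after=$2 limit=${SVWAIT:-7}
    if [ "$ready_after" = 0 ]; then
        echo "ok after 0s"                 # case 3: identical in both modes
    elif [ "$mode" = old ]; then
        echo "fail after 0s"               # old behavior: immediate failure
    elif [ "$ready_after" != never ] && [ "$ready_after" -le "$limit" ]; then
        echo "ok after ${ready_after}s"    # case 2: now succeeds
    else
        echo "fail after ${limit}s"        # case 1: same failure, later
    fi
}
```

For example, `check_outcome old 3` prints `fail after 0s`, while `check_outcome new 3` prints `ok after 3s`; a service that never comes up fails in both modes, just 7 seconds later under the patch.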


> While this is
> implemented differently from other installations, there are known cases
> similar to what I am doing, where people have ./run scripts like this:
>
> #!/bin/sh
> sv check child-service || exit 1
> exec parent-service

This would still work just fine; it would just succeed strictly more often.
Received on Wed Feb 18 2015 - 00:49:44 UTC
