> On 31 May 2025, at 17:33, Laurent Bercot <ska-skaware_at_skarnet.org> wrote:
>
>> may very well be a significant overhead compared to the operation itself.
>
> In the wise words of someone: profile, don't speculate.
>
> If you have a situation where you fear that establishing a new HTTP (or
> worse, HTTPS) connection will negate the benefits of sendfile, the thing
> you need to do is benchmark it.
> Run your script as a CGI. Time it. Run it as an NPH. Time it. Repeat
> until you have a sufficient sample. See how both cases compare. Then we
> have a basis to identify bottlenecks and have a discussion on what the
> best way to improve performance is.
>
> Without that, all we have is more or less educated guesses, and even the
> most educated guesses can be totally wrong.
I'm currently debugging my implementation, so it's too soon for profiling. What I
did measure earlier is that on my server, establishing the HTTPS session takes
about 1.5 seconds.
(Keep in mind that what I call a server is a small netbook with an Atom processor
and 2 GiB of RAM.)
Also, I like doing dumb things in my free time; I spend plenty of time profiling
at my day job.
> My hunch is that it really doesn't matter. If you're on Linux, tipideed
> (when it sticks around for non-NPH) streams CGI data with splice(), which
> is zero-copy just like sendfile(), so apart from a few context switches,
> there should be no difference with the NPH case.
Is that already in Alpine, or does it require building from source? I seem to
recall streaming is part of the latest release. Good to know it uses splice();
I might switch back to regular CGI then.
> And if you're serving
> over HTTPS, the data will need to be copied to userspace anyway to be
> encrypted, and that's where the bottleneck will likely be: if anything,
> kernel-mode TLS would probably be the best bang for your buck when it
> comes to performance improvement, but that's not on the s6-networking
> roadmap at this time.
A kTLS version of s6-tls-io is in my backlog precisely because of that. It's
already a bottleneck for KOReader, which is dumb enough to open a new connection
per request when syncing with Wallabag. My plan was to fix that in the KOReader
plugin, since it knows it's about to send a burst of requests and should keep
the connection alive.
>
> If you have benchmarks, measurements, concrete data, I'm very interested:
> I would really like to know how tipidee fares performance-wise when
> heavily loaded. But if you're brainstorming in advance, trying to predict
> where the bottlenecks will be... don't. You're wasting your time without
> hard data.
As mentioned, I measured session negotiation on the same server, for a
different use case. For small static files, negotiation tends to take much
longer than the transfer itself, since those transfers complete in the tens of
milliseconds. In the current case it will depend mostly on how big the blobs
are, I believe.
I haven't measured this project yet.
> An anecdote I have told multiple times, but maybe not on this list:
> right before releasing tipidee, I feared that there would be a bottleneck
> in s6-tcpserver - the lookups done when a new client connected and when
> a child died were in a simple array, so, linear in the number of
> connections - so under heavy load the overhead of a connection would be
> quadratic. So I rewrote s6-tcpserverd to use binary search trees, so
> the overhead would be in O(n log n) instead; and I did a comparative
> performance test of both versions.
>
> The result was that it didn't matter. The binary search tree version
> was slightly less efficient than the linear version for light loads
> (under 1k concurrent connections), and slightly more efficient than
> the linear version for heavy loads (over 4k), but nothing noticeable.
> However, what the performance test showed me is that both versions
> actually spent a lot of time in fork().
>
> So I wrote a version of s6-tcpserverd that used posix_spawn() instead
> of fork(), and benchmarked it. And there, the performance gains were
> *incredible*. They made the difference between O(n^2) and O(n log n)
> completely irrelevant. I could actually flirt with c10k, whereas trying
> c10k with fork() would just asphyxiate the server.
>
> So I made deep changes to skalibs, adding the cspawn interface in order
> to automatically use posix_spawn() in place of fork() on systems that
> support it, and removed every explicit call to fork() everywhere I could
> replace it with cspawn. It was some work, but I'm very happy with the
> result.
>
> Moral: the performance bottleneck was not at all where I expected
> it to be, and only hard testing data showed it.
Been there.
>
> (I kept the O(n log n) algorithm, because I'm a CS nerd.)
Can relate.
>
> --
> Laurent
>
Received on Sat May 31 2025 - 23:44:24 CEST