Re: tipidee + nph + long-lived connection

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Sat, 31 May 2025 20:33:51 +0000

>may very well be a significant overhead compared to the operation itself.

  In the wise words of someone: profile, don't speculate.

  If you have a situation where you fear that establishing a new HTTP (or
worse, HTTPS) connection will negate the benefits of sendfile, the thing
you need to do is benchmark it.
  Run your script as a CGI. Time it. Run it as an NPH. Time it. Repeat
until you have a sufficient sample. See how the two cases compare. Then
we have a basis for identifying bottlenecks and discussing the best way
to improve performance.
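  To make the comparison concrete, the measurement loop could be as
simple as this sketch (the request count, URLs and curl invocation are
placeholders for whatever client you use; %N nanosecond timestamps are
a GNU date / Linux assumption):

```shell
# bench N CMD... : run CMD N times, print the mean wall-clock time in ms.
bench () {
  n=$1 ; shift
  start=$(date +%s%N)
  i=0
  while [ "$i" -lt "$n" ] ; do
    "$@" > /dev/null 2>&1
    i=$((i + 1))
  done
  end=$(date +%s%N)
  echo $(( (end - start) / n / 1000000 ))
}

# Hypothetical endpoints, one plain CGI and one NPH:
#   bench 100 curl -s http://localhost/cgi-bin/script
#   bench 100 curl -s http://localhost/cgi-bin/nph-script
```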

  Without that, all we have is more or less educated guesses, and even
the most educated guesses can be totally wrong.

  My hunch is that it really doesn't matter. If you're on Linux, tipideed
(when it sticks around for non-NPH) streams CGI data with splice(),
which is zero-copy just like sendfile(), so apart from a few context
switches, there should be no difference from the NPH case. And if you're
serving over HTTPS, the data will need to be copied to userspace anyway
to be encrypted, and that's where the bottleneck will likely be: if
anything, kernel-mode TLS would probably be the best bang for your buck
when it comes to performance improvement, but that's not on the
s6-networking roadmap at this time.

  If you have benchmarks, measurements, concrete data, I'm very
interested: I would really like to know how tipidee fares
performance-wise when heavily loaded. But if you're brainstorming in
advance, trying to predict where the bottlenecks will be... don't.
You're wasting your time without hard data.

  An anecdote I have told multiple times, but maybe not on this list:
right before releasing tipidee, I feared that there would be a
bottleneck in s6-tcpserver - the lookups done when a new client
connected and when a child died were in a simple array, so, linear in
the number of connections - so under heavy load the total connection
overhead would be quadratic. So I rewrote s6-tcpserverd to use binary
search trees, making the overhead O(n log n) instead; and I did a
comparative performance test of both versions.

  The result was that it didn't matter. The binary search tree version
was slightly less efficient than the linear version for light loads
(under 1k concurrent connections), and slightly more efficient than
the linear version for heavy loads (over 4k), but nothing noticeable.
However, what the performance test showed me is that both versions
actually spent a lot of time in fork().

  So I wrote a version of s6-tcpserverd that used posix_spawn() instead
of fork(), and benchmarked it. And there, the performance gains were
*incredible*. They made the difference between O(n^2) and O(n log n)
completely irrelevant. I could actually flirt with c10k, whereas trying
c10k with fork() would just asphyxiate the server.

  So I made deep changes to skalibs, adding the cspawn interface in order
to automatically use posix_spawn() in place of fork() on systems that
support it, and removed every explicit call to fork() everywhere I could
replace it with cspawn. It was some work, but I'm very happy with the
result.

  Moral of the story: the performance bottleneck was not at all where I
expected it to be, and only hard testing data revealed it.

  (I kept the O(n log n) algorithm, because I'm a CS nerd.)

--
  Laurent
Received on Sat May 31 2025 - 22:33:51 CEST
