Re: s6-rc-init hangs during stage2

From: Vasil Yordanov <vasil.yordanov_at_gmail.com>
Date: Mon, 6 Jun 2016 12:43:33 +0300

Sorry, ignore the last question.
PS. do we actually need to run s6-rc-init on each boot as a first line in
the stage2 script?


On Mon, Jun 6, 2016 at 12:35 PM, Vasil Yordanov <vasil.yordanov_at_gmail.com>
wrote:

> Hello Laurent,
>
> I have an issue with s6-rc, using the latest master code of the s6, s6-rc,
> s6-linux-init.
>
> When I run a docker container with s6 installed on it, from time to time
> it hangs on s6-rc-init.
>
> This happens from four times, ones or even more rear, but unfortunately
> happens :(
>
> I have investigated a bit more the issue with strace, but it seems more
> complex to understand the core reason of the problem.
>
> Here is my setup.
>
> Docker, with ENTRYPOINT "/etc/s6-linux-init/init"
> root_at_25b2cb1a6998 / # cat /etc/s6-linux-init/init
> #!/bin/execlineb -P
>
> /bin/export PATH "/usr/bin:/usr/sbin:/bin:/sbin"
> /bin/cd /
> s6-setsid -qb --
> umask 022
> if { s6-echo -- "\n init created by s6-linux-init-maker\n seehttp://
> skarnet.org/software/s6-linux-init/\n\n
> <http://skarnet.org/software/s6-linux-init/%5Cn%5Cn>" }
> #if { s6-mount -nwt tmpfs -o mode=0755 tmpfs "/run" }
> if { s6-hiercopy "/etc/s6-linux-init"/run-image "/run" }
> emptyenv -p
> s6-envdir -I -- "/etc/s6-linux-init"/env
> redirfd -r 0 /dev/null
> redirfd -wnb 1 "/run"/service/s6-svscan-log/fifo
> background
> {
> s6-setsid --
> redirfd -w 1 "/run"/service/s6-svscan-log/fifo
> fdmove -c 1 2
> "/etc/rc.init"
> }
> unexport !
> cd "/run"/service
> fdmove -c 2 1
> s6-svscan -st0
>
> and the rc.init is
> root_at_25b2cb1a6998 / # cat /etc/rc.init
> #!/bin/sh
> strace -o /tmp/str_s6-rc-init s6-rc-init -c /tmp/s6-rc-compiled-dir
> /run/service
> s6-rc change
>
> Currently I have put strace to trace s6-rc-init.
>
> Here is how my processes looks like when it hangs:
> root 1 0 0 Jun04 ? 00:00:00 s6-svscan -st0
> root 7 1 0 Jun04 ? 00:00:00 /bin/sh /etc/rc.init
> root 8 1 0 Jun04 ? 00:00:00 s6-supervise s6-svscan-log
> nobody 9 8 0 Jun04 ? 00:00:00 s6-log -bp -- t
> /run/uncaught-logs
> root 10 7 0 Jun04 ? 00:00:00 strace -o
> /tmp/str_s6-rc-init s6-rc-init -c /tmp/s6-rc-compiled-dir /run/service
> root 13 10 0 Jun04 ? 00:00:00 s6-rc-init -c
> /tmp/s6-rc-compiled-dir /run/service
> root 14 13 0 Jun04 ? 00:00:00 s6-ftrigrd
> ....
>
> You can see there is one additional process s6-ftrigrd.
>
> In the strace log you can see:
>
> ....
> open("/run/s6-rc/scandir/.s6-svscan/control", O_WRONLY|O_NONBLOCK) = 6
> fcntl(6, F_GETFL) = 0x8801 (flags
> O_WRONLY|O_NONBLOCK|O_LARGEFILE)
> fcntl(6, F_SETFL, O_WRONLY|O_LARGEFILE) = 0
> write(6, "a", 1) = 1
> close(6) = 0
> ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975,
> 950409000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left
> {1152921504606846975, 909368723})
> recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0\f!s\0\0\0\4\0\0\0
> !s\0\0\0\4\0\0\0\"!s", 2047}, {NULL, 0}], msg_controllen=0,
> msg_flags=MSG_CMSG_C
> LOEXEC}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 30
> recvmsg(5, 0x7ffd3eaca250, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) =
> -1 EAGAIN (Resource temporarily unavailable)
> ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975,
> 909072000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left
> {1152921504606846975, 907791669})
> recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0\
> 0!s\0\0\0\4\0\0\0#!s\0\0\0\4\0\0\0!!s\0\0"..., 2018}, {"", 29}],
> msg_controllen=0, msg_flags=MSG_C
> MSG_CLOEXEC}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 350
> ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975,
> 907695000}, NULL, 8
>
> it seems s6-rc-init is hanging and waiting for some event.
>
> The normal strace without hangs is
> ....
> open("/run/s6-rc/scandir/.s6-svscan/control", O_WRONLY|O_NONBLOCK) = 6
> fcntl(6, F_GETFL) = 0x8801 (flags
> O_WRONLY|O_NONBLOCK|O_LARGEFILE)
> fcntl(6, F_SETFL, O_WRONLY|O_LARGEFILE) = 0
> write(6, "a", 1) = 1
> close(6) = 0
> ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975,
> 950277000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left
> {1152921504606846975, 911900084})
> recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0\0!s", 2047},
> {NULL, 0}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC},
> MSG_DONTWAIT|MSG_WAITALL|M
> SG_CMSG_CLOEXEC) = 10
> recvmsg(5, 0x7fff43f02640, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) =
> -1 EAGAIN (Resource temporarily unavailable)
> ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975,
> 911598000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left
> {1152921504606846975, 910204953})
> recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0%!
> s\0\0\0\4\0\0\0$!s\0\0\0\4\0\0\0\1!s\0\0"..., 2038}, {"", 9}],
> msg_controllen=0, msg_flags=MSG_CM
> SG_CLOEXEC}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 290
> recvmsg(5, 0x7fff43f02640, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) =
> -1 EAGAIN (Resource temporarily unavailable)
> ppoll([{fd=5, events=POLLIN|POLLHUP}], 1, {1152921504606846975,
> 910117000}, NULL, 8) = 1 ([{fd=5, revents=POLLIN}], left
> {1152921504606846975, 909970613})
> recvmsg(5, {msg_name(0)=NULL, msg_iov(2)=[{"\0\0\0\4\0\0\0#!
> s\0\0\0\4\0\0\0\t!s\0\0\0\4\0\0\0\6!s\0\0"..., 1748}, {"", 299}],
> msg_controllen=0, msg_flags=MSG
> _CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) = 80
> recvmsg(5, 0x7fff43f02640, MSG_DONTWAIT|MSG_WAITALL|MSG_CMSG_CLOEXEC) =
> -1 EAGAIN (Resource temporarily unavailable)
> close(4) = 0
> close(5) = 0
> wait4(14, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14, si_status=0,
> si_utime=0, si_stime=1} ---
> exit_group(0) = ?
> +++ exited with 0 +++
>
> And the processes are:
> root 1 0 0 06:11 ? 00:00:00 s6-svscan -st0
> root 8 1 0 06:11 ? 00:00:00 s6-supervise s6-svscan-log
> nobody 9 8 0 06:11 ? 00:00:00 s6-log -bp -- t
> /run/uncaught-logs
> ...
>
>
> I have observed that the hangs of s6-rc-init happens more often when I
> have more services. Currently I have 38 services and 2 bundles that group
> these services
>
> Please help me to resolve this issue.
>
> Regards,
> Vasil
>
> PS. do we actually need to run s6-rc-init on each boot as a first line in
> the stage2 script?
>
Received on Mon Jun 06 2016 - 09:43:33 UTC

This archive was generated by hypermail 2.3.0 : Sun May 09 2021 - 19:44:19 UTC