skarnet.org downtime
Past outages
- 2024-05-19: Storage incident at Gandi.
The outage lasts for more than four days, until the morning of 2024-05-23;
it appears they had to copy entire disk bays' worth of data to new storage.
That tanked Gandi's service level for 2024 to under two nines,
which is worse than the average of French public ISPs (which are terrible). Nice.
Currently waiting for Gandi's post-mortem analysis of the crash; I will
update this entry as soon as they publish it.
- 2018-04-08: VM crash at Gandi.
It can happen, no big deal. Except that alyss does not boot back up. It
was a Sunday, so no help from tech support. The next day, tech support
takes a couple of hours to answer but points me in the right direction:
the kernel boots, but can't find the rootfs. Investigation shows Gandi
changed the way their Xen PV installation presents hard disks to the
guests, so my grub configuration was obsolete. Problem solved after
about 24 hours of downtime. My main gripe here is that I cannot make
sure that it doesn't happen again: if disk configuration changes again,
I have to modify the grub.cfg by hand, unless I install the whole
gandi-vm-config machinery that is a Python monster and a way for Gandi
to backdoor your machine as they please - which I obviously won't do.
- 2016-11-05: scheduled downtime for maintenance: alyss was switched
from Gandi's Xen hypervisor infrastructure to their new KVM hypervisor
infrastructure. I could
not make the "boot on a raw disk and have your custom kernel" feature
work, so it's still using a stock Gandi kernel for now.
- 2013-09-02: switch from antah to alyss, a virtual server at Gandi.
A few hiccups while fixing the
last bugs, but no major downtime. Complete switch to a homemade
distribution. No more hardware failures, no more distribution failures,
no more OpenSSH failures. The future is bright!
- 2007-07-20: Antah hardware failure. For several reasons, there's
one month of downtime. My apologies. Read the story here.
- 2006-02-28: Power failure at RedBus. antah doesn't boot when the
power comes back. Analysis shows that the last Debian upgrade messed up the
lilo configuration, and the kernel can't be found. Sigh. And they ask why
I don't trust Linux distributions.
Lilo installed manually, problem fixed.
Kernel upgraded to 2.6.15.4.
- 2005-03-04: antah's main disk has been having major problems
for a few days. I go to RedBus, take the disk home, and dump it onto
another one before it's too late. The machine is back up on 2005-03-06,
9h50 (GMT+1). Kernel upgraded to 2.6.11.
- 2004-08-16, 17h (GMT+2): unable to log in, so I immediately go
to RedBus and reboot. I can then log in and analyze. Problem:
sshd didn't like /dev/pts/100. Linux developers claim
it's a userland problem and OpenSSH developers claim it's a Linux
kernel problem. Great. Kernel upgraded to 2.6.8.1; I'll try to write
a workaround for the /dev/pts/100 problem before it arises again.
- 2004-05-04, 9h50 (GMT+2) to 2004-05-07, 16h00 (GMT+2):
scheduled ISP change. The whole story can be read here.
- 2004-02-25: 13h30 - 18h30 (Paris time, GMT+1):
scheduled kernel and boot system upgrade. No problems.
- 2003-08-27: the whole skarnet.org site was down, to support the
online demonstration of the FFII against software patents.
Downtime from 2003-08-27 at 03:00 GMT+2 to 2003-08-28 at 15:00 GMT+2.
Kernel upgraded to 2.4.22, init system upgraded.
- 2003-02-19: 6h - 8h (Paris time, GMT+1): scheduled electrical
upgrade. ClaraNet warns their users only a day in advance, claiming
that the previous upgrade was incomplete and must be fixed
immediately. The outage actually starts at 6:10 and ends at 10:35.
- 2002-12-12: 6h - 8h (Paris time, GMT+1): scheduled electrical
upgrade. Kernel upgraded to 2.4.20. No boot problems.
- 2002-10-08: Power outage at ClaraNet.
Antah doesn't boot properly when the power comes back. Cause: modutils failure -
hardcoded /sbin paths in binaries and in the kernel. Sigh. Upgraded to
2.4.19, without module support; modutils thrown out.
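The "under two nines" remark in the 2024-05-19 entry can be checked with a rough calculation. The sketch below is only an illustration under stated assumptions: availability is measured over the calendar year, only that single outage is counted, and the outage is taken as exactly four days (the entry says "more than four", so the real figure is lower). How Gandi actually accounts for its service level is not stated here, and the function names are made up for this example.

```python
from datetime import datetime

def hours_in_year(year: int) -> float:
    """Number of hours in the given calendar year (handles leap years)."""
    span = datetime(year + 1, 1, 1) - datetime(year, 1, 1)
    return span.total_seconds() / 3600

def availability(downtime_hours: float, year: int) -> float:
    """Fraction of the year the service was up, given total downtime in hours."""
    return 1.0 - downtime_hours / hours_in_year(year)

# Assumption: exactly four days of downtime, a lower bound on the real outage.
downtime = 4 * 24
uptime = availability(downtime, 2024)

# Two nines (99%) leaves a downtime budget of 1% of the year.
budget = 0.01 * hours_in_year(2024)

print(f"availability: {uptime:.2%}")
print(f"two-nines budget: {budget:.1f} hours, outage: {downtime} hours")
print("under two nines" if uptime < 0.99 else "two nines or better")
```

With these assumptions the result comes out at roughly 98.9%, since 2024 has 8784 hours and the 1% budget is already exhausted after about 88 hours of downtime.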
Scheduled outages
None planned for the moment.