Date: Sat, 18 Jan 2014 05:02:56 +0000
From: "Robin H. Johnson"
To: gentoo-dev@lists.gentoo.org
Cc: gentoo-dev-announce@lists.gentoo.org
Subject: [gentoo-dev] overlays.gentoo.org restoration & post-mortem
Message-ID: <20140118050256.GF3378@orbis-terrarum.net>
List-Id: Gentoo Linux mail
Reply-to: gentoo-dev@lists.gentoo.org
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="orO6xySwJI16pVnm"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)

--orO6xySwJI16pVnm
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

overlays.gentoo.org service has been restored on a new system.
Some statistics and a post-mortem follow.

Special thanks to antarus and a3li for all their interactions with our
sponsor, and for managing most of the details. I just did the final data
recovery and this writeup.

Please resume using the service, and if you see something weird that you
think is different from before, please file a bug for Infrastructure.

In the process, the service moved to a new machine. The SSH keys have
changed as follows:
DSA:   d6:71:99:1f:46:c9:42:95:e1:9d:be:8e:f7:76:51:b5
RSA:   92:b5:40:16:63:a3:61:9f:d7:63:64:ba:d5:51:41:b9
ECDSA: 96:f0:29:e6:d4:85:58:46:31:ba:0e:17:0b:8c:fa:d8

At this time, we will NOT be restoring Trac due to low demand. If you still
require a web-based SVN browser for old SVN repos, please contact us at
infra@gentoo.org.

If you have a dev/ repo under the list 'IMPORTANT' below, you MUST push to
the server again.

IMPORTANT: The following repos were damaged beyond repair, and were not
available in backups. You'll need to push again; I have reset the repos to
empty:
dev/anarchy.git
dev/dberkholz.git
dev/dev-zero.git
dev/dilfridge.git
dev/fordfrog.git
dev/graaff.git
dev/maekke.git
dev/mschiff.git
dev/quantumsummers.git
dev/zorry.git

FYI: The following repos appeared to be empty:
dev/b33fc0d3.git
dev/moult.git
dev/tomwij.git
user/blueicefield.git
user/disinbox.git
user/palatis.git
user/paragon.git
user/vmalov.git
user/xray.git

FYI: The following repos contained dangling commits/tags/blobs, and this
should not be considered new breakage; if you have a newer copy, you are
encouraged to push again:
dev/blueness.git
dev/maksbotan.git
dev/mgorny.git
dev/qiaomuf.git
dev/xmw.git
proj/betagarden.git
proj/catalyst.git (+tags)
proj/devmanual.git
proj/dotnet.git
proj/elfix.git (+tags)
proj/emacs-tools.git
proj/gamerlay.git
proj/hardened-dev.git
proj/hardened-patchset.git
proj/kde.git
proj/lisp.git
proj/openrc.git (+tags)
proj/portage.git
proj/ruby-overlay.git
proj/sci.git
proj/sunrise.git
proj/webapp-config.git
proj/x11.git
user/gmt.git
user/mv.git (+blobs)
user/palmer.git

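For anyone deciding whether a local copy is worth pushing again, the
dangling objects in a repository can be counted from the output of
`git fsck`. The following is only a minimal Python sketch, assuming the
usual "dangling <type> <sha>" lines that git fsck prints; the helper names
are hypothetical and this is not the procedure Infra used during recovery.

```python
import subprocess
from collections import Counter


def count_dangling(fsck_output: str) -> Counter:
    """Count dangling objects by type from the text printed by `git fsck`."""
    counts = Counter()
    for line in fsck_output.splitlines():
        fields = line.split()
        # Lines of interest look like: "dangling commit <sha>"
        if len(fields) == 3 and fields[0] == "dangling":
            counts[fields[1]] += 1
    return counts


def fsck_repo(git_dir: str) -> Counter:
    """Run `git fsck` on one repository directory (hypothetical helper)."""
    result = subprocess.run(
        ["git", f"--git-dir={git_dir}", "fsck", "--dangling"],
        capture_output=True,
        text=True,
        check=False,  # fsck exits non-zero when it finds real errors
    )
    return count_dangling(result.stdout)
```

Calling fsck_repo() on the .git directory of a clone (or on a bare repo)
returns a Counter keyed by object type, e.g. "commit", "tag" or "blob".
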
Statistics:
-----------
354 repos total
- 10 repos unrecoverable (all in dev/)
= 344 repos recovered/available

9 repos that seem to be empty
26 repos with dangling commits/tags/blobs
2 repos recovered from external sources.

Breakdown by path:
------------------
193 proj/ repos
69 dev/ repos
91 user/ repos
1 other repo

Post-mortem
-----------
Hornbill went offline around:             2014-01-10 13:13 UTC
Hornbill last started a backup of VCS:    2014-01-10 07:59:04 UTC
Hornbill last completed a backup of VCS:  2014-01-10 08:20:54 UTC

Between the backup starting and the server going offline, we were able to
confirm writes to the following Git repos:
dev/fordfrog.git
proj/kde.git
gitolite-admin.git

We believe that there were no writes to user/ repos, but are not 100%
certain, as the logging was insufficient for this purpose.

Hornbill went offline just over a week ago: mid-afternoon on a Friday for
the timezone where it's located. Due to staff turnover and business changes
at the previous sponsor, we were not able to contact anybody until regular
office hours on Monday, January 13th.

The server in question, while previously functioning, was not recoverable
after a remote hands reboot on Monday afternoon (UTC). On Tuesday, the
sponsor was able to examine it in more depth, and it was still not
recoverable. More concerningly, it turned out to be one of the few remaining
Gentoo infrastructure systems with IDE drives. The data was recovered, but
it seemed to have a lot of corruption.

It was noted that our backups were missing all of the dev/ repos, due to a
system-wide rule to exclude /dev/ from backups (the rule should only match
the real /dev, not any directory simply named "dev"). For this reason, we
decided to try and get the data from the old server.

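The exclusion problem described above can be illustrated with a short
sketch. This is a hypothetical Python model of the two behaviours, not the
actual backup configuration: the function names and the /var/gitroot path
are assumptions made up for the example.

```python
from pathlib import PurePosixPath


def excluded_unanchored(path: str, name: str = "dev") -> bool:
    """Problematic behaviour: skip any path with a component called `name`."""
    return name in PurePosixPath(path).parts


def excluded_anchored(path: str, root: str = "/dev") -> bool:
    """Intended behaviour: skip only `root` itself and what lies beneath it."""
    p = PurePosixPath(path)
    r = PurePosixPath(root)
    return p == r or r in p.parents


# Hypothetical repository location, used only for illustration.
repo = "/var/gitroot/dev/fordfrog.git"

print(excluded_unanchored(repo))       # the repo is silently skipped
print(excluded_anchored(repo))         # the repo is backed up
print(excluded_anchored("/dev/null"))  # device nodes are still skipped
```

With the anchored form, a directory that merely happens to be named "dev"
further down the tree is no longer dropped from the backup.
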
Verification/recovery of the remaining data was also hampered by having to
confirm that some of the Git repos in the backup were not entirely clean to
begin with, containing legacy errors from their CVS/SVN conversions (which
turned out to be false positives) or dangling commits/blobs/tags.

What could we do better next time:
----------------------------------
- Have backups of all repos!
- Check the age of the backup immediately, and consider going live with the
  backup. Only 5 hours of work would have been lost, and even then possibly
  only temporarily, due to the distributed nature of Git.
- More people need to use the infra-status page to learn about the state of
  Gentoo services.

Actions for Infra
-----------------
- Include the dev/ repos in the backup (they were not being backed up)
- Set up Gitolite mirroring
- Review Gitolite logging (it needs to be easier to confirm when writes
  took place)

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85

--orO6xySwJI16pVnm
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
Comment: Robbat2 @ Orbis-Terrarum Networks - The text below is a digital signature. If it doesn't make any sense to you, ignore it.

iKUEARECAGYFAlLaCwBfFIAAAAAALgAoaXNzdWVyLWZwckBub3RhdGlvbnMub3Bl
bnBncC5maWZ0aGhvcnNlbWFuLm5ldDc1OTQwNEJFQkQ0MUY3MTIzODIzODZFRjNF
OTIyQzIyMzIzM0MyMkMACgkQPpIsIjIzwiy3RQCfeVfyYnSMfrtnsfMy58QQKD4h
JXwAmLLltJFEmHj0AkYDWQl0qXFTCJs=
=IjHg
-----END PGP SIGNATURE-----

--orO6xySwJI16pVnm--