From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gentoo-dev+bounces-64435-garchives=archives.gentoo.org@lists.gentoo.org>
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
	by finch.gentoo.org (Postfix) with ESMTP id B099B138247
	for <garchives@archives.gentoo.org>; Sat, 18 Jan 2014 05:58:26 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 24585E0A69;
	Sat, 18 Jan 2014 05:58:13 +0000 (UTC)
Received: from mail-we0-f169.google.com (mail-we0-f169.google.com [74.125.82.169])
	(using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
	(No client certificate requested)
	by pigeon.gentoo.org (Postfix) with ESMTPS id F001FE0A5D
	for <gentoo-dev@lists.gentoo.org>; Sat, 18 Jan 2014 05:58:11 +0000 (UTC)
Received: by mail-we0-f169.google.com with SMTP id u57so5380612wes.28
        for <gentoo-dev@lists.gentoo.org>; Fri, 17 Jan 2014 21:58:10 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20130820;
        h=x-gm-message-state:mime-version:sender:in-reply-to:references:date
         :message-id:subject:from:to:content-type;
        bh=uP0KMx6KXcFdqy9JecJccfLYCO6KTjRV643eqLqrnCw=;
        b=jziPWUFSIfB+fBBKS3aN9qW+1Dxb/DQpuv7WawVOxzH6W2nwkoGzNY1ECADkgl70dO
         z5YixBgnG+ezprGDQ/zshJMrywW7PX1UZRRUgqJGjtvfEMgepOapDNUugOjUvX4Dg/WF
         L+4soPNASJkvrijkb/LwSoI+6Hfgg3jd/30jL6GEMuvV5BSpM1AIn7ZA8XvbcqHll9fj
         h023/XrN/OkSv6dCpVJZy7DZzP45rWYJtiiJgumOSxpHvMHBnGQTlrMc4j5LQxzsr0DX
         J9QfL2r9bPNTvA4awScSU0AX3wgnbkn3D1AJCjLxNcYbmjJyWq2nxcAfUgiMpSEQQEGZ
         hSfA==
X-Gm-Message-State: ALoCoQmlmIQmx42m/BJMr982Ohy98gYdvTw+8Lv0Bvfj6g7V6uM5BXvjZcbMw13GY/vKbhrDvktb
Precedence: bulk
List-Post: <mailto:gentoo-dev@lists.gentoo.org>
List-Help: <mailto:gentoo-dev+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-dev+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-dev+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-dev.gentoo.org>
X-BeenThere: gentoo-dev@lists.gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
MIME-Version: 1.0
X-Received: by 10.180.19.165 with SMTP id g5mr1462811wie.31.1390024690444;
 Fri, 17 Jan 2014 21:58:10 -0800 (PST)
Sender: antarus@scriptkitty.com
Received: by 10.216.170.129 with HTTP; Fri, 17 Jan 2014 21:58:10 -0800 (PST)
X-Originating-IP: [173.8.165.226]
In-Reply-To: <CAATnKFBskCUbR5iB895Vr26Ysu0fizJFF7r1R8XrWeenXjuQSQ@mail.gmail.com>
References: <20140118050256.GF3378@orbis-terrarum.net>
	<CAATnKFBskCUbR5iB895Vr26Ysu0fizJFF7r1R8XrWeenXjuQSQ@mail.gmail.com>
Date: Fri, 17 Jan 2014 21:58:10 -0800
X-Google-Sender-Auth: MBC87u2cCmv4O8s3-vLmnioOREE
Message-ID: <CAAr7Pr9GGy3KsDQJ8bzcJ95NpyUM-q-DMHd2a4FVZ=+UcoDXgw@mail.gmail.com>
Subject: Re: [gentoo-dev] overlays.gentoo.org restoration & post-mortem
From: Alec Warner <antarus@gentoo.org>
To: Gentoo Dev <gentoo-dev@lists.gentoo.org>
Content-Type: multipart/alternative; boundary=bcaec53d5281e741ec04f0385729
X-Archives-Salt: f6170b3e-b289-4d10-a87c-3309ad46cacc
X-Archives-Hash: 14494ee9dd74c142dca334fef280882e

--bcaec53d5281e741ec04f0385729
Content-Type: text/plain; charset=UTF-8

On Fri, Jan 17, 2014 at 9:23 PM, Kent Fredric <kentfredric@gmail.com> wrote:

>
> On 18 January 2014 18:02, Robin H. Johnson <robbat2@gentoo.org> wrote:
>
>> - More people need to use the infra-status page to learn about the state
>>   of Gentoo services.
>>
>
>
> A service middle layer like Fastly or Cloudflare, which could link to the
> infra page, would perhaps be good here, so that when an outage occurs (at
> least on the web side) appropriate links to infra can be given.
>

Cloud stuff aside (most of infra is not very experienced with, or trusting
of, cloud services), I think there was a lot of indecision during the outage.
Do we wait for the sponsor or restore from backup?
How good are the backups? (Turns out, they were decent.)
How much work is it to rebuild from them? (Turns out, one evening of Robin's
time plus incidentals.)

Once we got the data back on the new machine, why did we post the all
clear? We then learned there was corruption (some repos were missing, some
were corrupt, etc.), but it took a long time to disable git and HTTP
access.

We don't have procedures for these sorts of things. I think we were
conservative in the changes we made. How do you disable a service like
gitolite? We deployed two fixes: one was to disable ssh for the 'git' user,
the other was to move the authorized_keys files out of the way. We pursued
these avenues independently, and we did not check them into configuration
management, which I wish had happened. Later, when we disabled the HTTP part
(to make overlays return 503s), that was checked in, which was nice.
Certainly I was afraid of breaking stuff for Robin, so I really tried to
avoid doing anything unless I was confident it would not impact him.
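
As a rough illustration of the second fix, here is a minimal sketch. It
assumes the 'git' user's keys live in a single authorized_keys file and
that sshd reads it from its default location. The function names and the
".disabled" suffix are hypothetical and are not taken from what infra
actually ran. Note that renaming the file only blocks new logins; it does
not end sessions that are already open.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical naming convention for a parked keys file.
DISABLED_SUFFIX = ".disabled"


def disable_keys(keys_file: Path) -> Path:
    """Rename the keys file out of the way and return the parked path."""
    parked = keys_file.with_name(keys_file.name + DISABLED_SUFFIX)
    if parked.exists():
        # Refuse to clobber an earlier parked copy.
        raise FileExistsError(f"{parked} already exists")
    os.rename(keys_file, parked)  # atomic when both paths share a filesystem
    return parked


def enable_keys(keys_file: Path) -> Path:
    """Undo disable_keys by moving the parked file back into place."""
    parked = keys_file.with_name(keys_file.name + DISABLED_SUFFIX)
    if keys_file.exists():
        # Refuse to overwrite a keys file that reappeared in the meantime.
        raise FileExistsError(f"{keys_file} already exists")
    os.rename(parked, keys_file)
    return keys_file


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real ~/.ssh.
    with tempfile.TemporaryDirectory() as tmp:
        keys = Path(tmp) / "authorized_keys"
        keys.write_text("ssh-ed25519 AAAA placeholder-key\n")
        parked = disable_keys(keys)
        print("parked at", parked.name)
        enable_keys(keys)
        print("restored:", keys.exists())
```

Having the disable and the undo as a pair, kept in configuration
management rather than run by hand, is the part I wish we had done.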


> And the infra status page is not exactly obvious. It's not listed on the
> "gentoo sites" list on the top right, and perhaps it ought to be.
>

I consider the page a great success in this story. I'm really happy about
it, and while you can always say 'hey, we could have done better here', I
think we did pretty well.

-A



--bcaec53d5281e741ec04f0385729--