Subject: Re: [gentoo-dev] sources.gentoo.org instability
From: Alec Warner
To: gentoo-dev@lists.gentoo.org
Date: Thu, 8 Dec 2011 21:30:51 -0800

2011/12/5 Chí-Thanh Christopher Nguyễn:
> Alec Warner wrote:
>>> Seriously, what do we gain from crawlers accessing sources.gentoo.org? I can't
>>> really remember seeing it once in a Google query result...
>>
>> We want the site searchable.
>
>>>> The majority of the expensive requests are related to package.mask and
>>>> use.local.desc queries by crawlers.
>>>> Like crawling the entire 13000 rev
>>>> history for package.mask (or similar.)
>
> Would it be feasible to use mod_rewrite to direct the most expensive
> requests to a static copy, which is re-generated every
> ${REASONABLE_TIMEFRAME}?

For now, user agents that look like a bot get sent to sources2.gentoo.org
(via an HTTP 302, not a permanent redirect) and humans stay on
sources.gentoo.org. Assuming the crawlers and indexing systems follow the
spec, hopefully none of our search results get rewritten to
sources2.gentoo.org (that would surprise me greatly... wait, no it
wouldn't ;p)

Robin added a caching layer for some segments of the application; I am
looking at cProfile dumps and discussing pain points with upstream.

-A

>
>
> Best regards,
> Chí-Thanh Christopher Nguyễn
>
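For illustration, the bot/human split described above could look roughly like the following, assuming a Python WSGI application sits behind sources.gentoo.org. This is only a sketch: `BOT_PATTERN`, `is_bot` and `BotRedirectMiddleware` are hypothetical names, and matching substrings such as "bot" or "spider" is a crude heuristic, not necessarily the rule that was actually deployed.

```python
import re
from urllib.parse import quote

# Hypothetical heuristic: substrings that commonly appear in crawler User-Agent headers.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

# Host that crawlers are temporarily redirected to (taken from the message above).
BOT_HOST = "sources2.gentoo.org"


def is_bot(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like a crawler."""
    return bool(BOT_PATTERN.search(user_agent))


class BotRedirectMiddleware:
    """Hypothetical WSGI middleware: bot-like clients get a 302 to BOT_HOST."""

    def __init__(self, app, bot_host: str = BOT_HOST):
        self.app = app
        self.bot_host = bot_host

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        host = environ.get("HTTP_HOST", "")
        # Skip the redirect if we are already serving the bot host, to avoid a loop.
        if is_bot(user_agent) and host != self.bot_host:
            # Rebuild path and query so the crawler lands on the equivalent page.
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""))
            query = environ.get("QUERY_STRING", "")
            scheme = environ.get("wsgi.url_scheme", "http")
            location = f"{scheme}://{self.bot_host}{path}" + (f"?{query}" if query else "")
            # 302 Found is a temporary redirect, so spec-following indexers
            # should keep the original URL in their results.
            start_response("302 Found", [("Location", location), ("Content-Length", "0")])
            return [b""]
        # Humans are passed straight through to the real application.
        return self.app(environ, start_response)
```

Using a 302 rather than a 301 is what should keep search results pointing at sources.gentoo.org, provided the indexers honour the distinction between temporary and permanent redirects.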