From: Michael Mol
To: gentoo-user@lists.gentoo.org
Date: Wed, 8 Feb 2012 10:53:43 -0500
Subject: Re: [gentoo-user] Google privacy changes

On Wed, Feb 8, 2012 at 10:46 AM, Paul Hartman wrote:
> On Wed, Feb 8, 2012 at 2:55 AM, Pandu Poluan wrote:
>>
>> On Jan 27, 2012 11:18 PM, "Paul Hartman"
>> wrote:
>>>
>>
>> ---- >8 snippage
>>
>>>
>>> BTW, the Baidu spider hits my site more than all of the others combined...
>>>
>>
>> Somewhat anecdotal, and definitely veering way off-topic, but Baidu was the
>> reason why my company decided to change web hosting companies: its
>> spidering brought our previous web host to its knees...
>>
>> Rgds,
>
> I wonder if the Baidu crawler honors the Crawl-delay directive in robots.txt?
>
> Or I wonder if Baidu crawler IPs need to be covered by firewall tarpit rules. ;)

I don't remember if it respects Crawl-Delay, but it respects forbidden
paths, etc. I've never been DDoS'd by Baidu crawlers, but I did get DDoS'd
by Yahoo a number of times. It turned out the solution was to disallow
access to expensive-to-render pages. If you're using MediaWiki with
prettified URLs, this works great:

User-agent: *
Allow: /mw/images/
Allow: /mw/skins/
Allow: /mw/title.png
Disallow: /w/
Disallow: /mw/
Disallow: /wiki/Special:

-- 
:wq
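If you want to see what rules like the MediaWiki ones above actually block before deploying them, here is a minimal sketch using Python's standard-library urllib.robotparser. The sample paths are hypothetical MediaWiki URLs chosen for illustration (a real wiki's layout depends on how its short URLs are configured), and "Baiduspider" is used only as an example agent name. Keep in mind this only tells you what a *compliant* crawler should do; it enforces nothing, which is why the firewall tarpit idea comes up for crawlers that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the message above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Allow: /mw/images/
Allow: /mw/skins/
Allow: /mw/title.png
Disallow: /w/
Disallow: /mw/
Disallow: /wiki/Special:
"""

# Hypothetical MediaWiki paths, for illustration only.
SAMPLE_PATHS = [
    "/wiki/Main_Page",
    "/wiki/Special:RecentChanges",
    "/w/index.php?title=Main_Page&action=history",
    "/mw/index.php",
    "/mw/images/logo.png",
    "/mw/skins/common/shared.css",
    "/mw/title.png",
]


def make_parser(robots_txt):
    """Build a stdlib robots.txt parser from an in-memory string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check_paths(robots_txt, paths, agent="Baiduspider"):
    """Return {path: allowed} as judged by the stdlib parser for `agent`."""
    parser = make_parser(robots_txt)
    return {path: parser.can_fetch(agent, path) for path in paths}


if __name__ == "__main__":
    for path, allowed in check_paths(ROBOTS_TXT, SAMPLE_PATHS).items():
        print(f"{'allow' if allowed else 'deny':5}  {path}")
    # No Crawl-delay line is present, so this prints None.
    print("crawl delay:", make_parser(ROBOTS_TXT).crawl_delay("Baiduspider"))
```

One design note: Python's parser applies the first rule that matches, in file order, while Google documents "most specific (longest) match wins." Because the Allow lines come before the broader "Disallow: /mw/", both interpretations give the same answers for these rules, so /mw/images/ stays crawlable either way.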