public inbox for gentoo-user@lists.gentoo.org
From: Michael Mol <mikemol@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Google privacy changes
Date: Wed, 8 Feb 2012 10:53:43 -0500
Message-ID: <CA+czFiCYUS6R-9NR6WJyjRK5Yx+nnHC3ZVbn1oTpABYPUh8YAQ@mail.gmail.com>
In-Reply-To: <CAEH5T2MCSWhNMYQaFQ0xLoj3UL-3sQDbm9FCC50f2VVXv_h0Rg@mail.gmail.com>

On Wed, Feb 8, 2012 at 10:46 AM, Paul Hartman
<paul.hartman+gentoo@gmail.com> wrote:
> On Wed, Feb 8, 2012 at 2:55 AM, Pandu Poluan <pandu@poluan.info> wrote:
>>
>> On Jan 27, 2012 11:18 PM, "Paul Hartman" <paul.hartman+gentoo@gmail.com>
>> wrote:
>>>
>>
>> ---- >8 snippage
>>
>>>
>>> BTW, the Baidu spider hits my site more than all of the others combined...
>>>
>>
>> Somewhat anecdotal, and definitely veering way off-topic, but Baidu was the
>> reason why my company decided to change our webhosting company: Its
>> spidering brought our previous webhosting to its knees...
>>
>> Rgds,
>
> I wonder if Baidu crawler honors the Crawl-delay directive in robots.txt?
>
> Or I wonder if Baidu crawler IPs need to be covered by firewall tarpit rules. ;)

I don't remember whether it respects Crawl-delay, but it does respect
forbidden paths, etc. I've never been DDoS'd by Baidu's crawlers, but I
did get DDoS'd by Yahoo's a number of times. The solution turned out to
be disallowing access to expensive-to-render pages. If you're using
MediaWiki with prettified URLs, this works great:

User-agent: *
Allow: /mw/images/
Allow: /mw/skins/
Allow: /mw/title.png
Disallow: /w/
Disallow: /mw/
Disallow: /wiki/Special:
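
One way to sanity-check rules like these before deploying them is a small script built on Python's standard urllib.robotparser. This is only a sketch: the host name wiki.example.org, the is_allowed helper, the "Baiduspider" user-agent string and the sample paths are made up for illustration; only the rules themselves come from the message above.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Allow: /mw/images/
Allow: /mw/skins/
Allow: /mw/title.png
Disallow: /w/
Disallow: /mw/
Disallow: /wiki/Special:
"""

# Hypothetical site root, used only to build full URLs for the check.
SITE = "http://wiki.example.org"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, user_agent="Baiduspider"):
    """Return True if the rules above let user_agent fetch path."""
    return parser.can_fetch(user_agent, SITE + path)


if __name__ == "__main__":
    # Illustrative paths: article pages and static assets versus the
    # expensive dynamic pages the rules are meant to keep crawlers out of.
    for path in (
        "/wiki/Main_Page",
        "/mw/skins/common/shared.css",
        "/mw/title.png",
        "/w/index.php?title=Main_Page&action=history",
        "/mw/load.php",
        "/wiki/Special:RecentChanges",
    ):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Python's parser applies the first matching rule in file order, while some crawlers pick the most specific (longest) match instead. In this file the Allow lines come first and are also longer than "/mw/", so both interpretations should agree.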

-- 
:wq




