From: Michael Mol <mikemol@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Google privacy changes
Date: Wed, 8 Feb 2012 10:53:43 -0500
Message-ID: <CA+czFiCYUS6R-9NR6WJyjRK5Yx+nnHC3ZVbn1oTpABYPUh8YAQ@mail.gmail.com>
In-Reply-To: <CAEH5T2MCSWhNMYQaFQ0xLoj3UL-3sQDbm9FCC50f2VVXv_h0Rg@mail.gmail.com>
On Wed, Feb 8, 2012 at 10:46 AM, Paul Hartman
<paul.hartman+gentoo@gmail.com> wrote:
> On Wed, Feb 8, 2012 at 2:55 AM, Pandu Poluan <pandu@poluan.info> wrote:
>>
>> On Jan 27, 2012 11:18 PM, "Paul Hartman" <paul.hartman+gentoo@gmail.com>
>> wrote:
>>>
>>
>> ---- >8 snippage
>>
>>>
>>> BTW, the Baidu spider hits my site more than all of the others combined...
>>>
>>
>> Somewhat anecdotal, and definitely veering way off-topic, but Baidu was the
>> reason why my company decided to change our webhosting company: Its
>> spidering brought our previous webhosting to its knees...
>>
>> Rgds,
>
> I wonder if Baidu crawler honors the Crawl-delay directive in robots.txt?
>
> Or I wonder if Baidu crawler IPs need to be covered by firewall tarpit rules. ;)
I don't remember whether it respects Crawl-delay, but it does respect
disallowed paths, etc. I've never been DDoS'd by Baidu's crawlers, but I
did get DDoS'd by Yahoo a number of times. The solution turned out to be
disallowing access to expensive-to-render pages. If you're using
MediaWiki with prettified URLs, this robots.txt works great:
User-agent: *
Allow: /mw/images/
Allow: /mw/skins/
Allow: /mw/title.png
Disallow: /w/
Disallow: /mw/
Disallow: /wiki/Special:
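
If you want to sanity-check rules like these before deploying them,
here is a minimal sketch using Python's standard urllib.robotparser.
The sample paths and the "ExampleBot" user agent are made up for
illustration, and I'm assuming /wiki/ holds the pretty article URLs
while /w/ and /mw/ are the script paths. Python's parser applies rules
in file order (first match wins), so the Allow lines have to come
before "Disallow: /mw/"; real crawlers may resolve overlaps
differently, so treat this as a rough check, not a guarantee of how
any particular spider behaves.

```python
from urllib.robotparser import RobotFileParser

# Rules copied verbatim from the message above.
ROBOTS_TXT = """\
User-agent: *
Allow: /mw/images/
Allow: /mw/skins/
Allow: /mw/title.png
Disallow: /w/
Disallow: /mw/
Disallow: /wiki/Special:
"""


def build_parser(text=ROBOTS_TXT):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the rules above.

    "ExampleBot" is a hypothetical user agent; it falls under the
    "User-agent: *" group like any other unnamed crawler.
    """
    return build_parser().can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths on a MediaWiki install with prettified URLs.
    samples = [
        "/wiki/Main_Page",                  # ordinary article
        "/wiki/Special:RecentChanges",      # special page
        "/w/index.php?title=Main_Page&action=history",
        "/mw/skins/common/shared.css",      # static asset
        "/mw/index.php",                    # script entry point
    ]
    for path in samples:
        verdict = "allowed" if is_allowed(path) else "blocked"
        print(f"{verdict:8} {path}")
```

The idea is that plain article pages and static assets stay crawlable,
while history views, diffs, and Special: pages (the expensive ones to
render) are kept away from spiders.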
--
:wq