From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <49bf44f10808090948l435276aap348bc17f56f46572@mail.gmail.com>
Date: Sat, 9 Aug 2008 09:48:49 -0700
From: Grant
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Re: Dealing with scrapers - Help!
In-Reply-To: <200808091726.17258.michaelkintzios@gmail.com>
References: <49bf44f10808090908radd58bcl2283b7589dcd10e9@mail.gmail.com> <49bf44f10808090915j326072d0ldc64d898bac93869@mail.gmail.com> <200808091726.17258.michaelkintzios@gmail.com>
Reply-to: gentoo-user@lists.gentoo.org

>> > My apache web server has been very slow lately and webalizer charts
>> > show page accesses at 5x normal with other stats normal. I'm thinking
>> > scrapers? How do you guys deal with this? Do you identify the IP
>> > (how?) and ban it (how?)?
>> >
>> > - Grant
>>
>> I used netstat to identify the IP and I see that I can use it with
>> "deny from" in httpd.conf. It seems to be over now, but this type of
>> thing happens periodically. How can I be alerted to this type of
>> situation when it starts so I can block the IP right away?
>
> You will need to configure quotas probably using something like:
>
> http://www.howtoforge.com/mod_cband_apache2_bandwidth_quota_throttling
>
> Not sure if it is possible to differentiate between rogue and legit clients,
> other than by checking your logs to see what was blocked.

Turns out it was a "legit" bot. Watch out for this one:

Mozilla/5.0 (compatible; discobot/1.0; +http://discoveryengine.com/discobot.html)

It's bad that a single IP can bring down my http isn't it?

- Grant
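
For the "how do I spot it when it starts" question, here is a rough sketch (not something from the thread) that counts requests per IP and per User-Agent in an Apache access log and prints candidate "Deny from" lines for the busiest addresses. It assumes the log uses the Combined Log Format; the function names, the log path, and the threshold are all hypothetical and would need adjusting. "Deny from" is the Apache 2.2 syntax mentioned above; Apache 2.4 uses "Require not ip" instead.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
#   IP ident user [time] "request" status bytes "referer" "user-agent"
# Lines that do not fit this shape (e.g. requests containing escaped
# quotes) are simply skipped.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_clients(lines):
    """Return (requests per IP, requests per User-Agent) for parseable lines."""
    by_ip, by_agent = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            by_ip[m.group("ip")] += 1
            by_agent[m.group("agent")] += 1
    return by_ip, by_agent


def heavy_hitters(by_ip, threshold):
    """IPs whose request count is at least `threshold`, busiest first."""
    return [(ip, n) for ip, n in by_ip.most_common() if n >= threshold]


def deny_lines(hitters):
    """Format Apache 2.2 style 'Deny from' lines for review (not applied)."""
    return [f"Deny from {ip}  # {n} requests" for ip, n in hitters]


def report(path, threshold=1000, top_agents=5):
    """Print candidate deny lines and the most common User-Agent strings."""
    with open(path, errors="replace") as fh:
        by_ip, by_agent = count_clients(fh)
    for line in deny_lines(heavy_hitters(by_ip, threshold)):
        print(line)
    for agent, n in by_agent.most_common(top_agents):
        print(f"# {n:>8}  {agent}")
```

Calling something like report("/var/log/apache2/access_log", threshold=1000) from a cron job would get the output mailed to you on systems where cron mail is set up. The User-Agent tally is there because a bot such as discobot may show up more clearly by agent string than by IP. The printed lines are only suggestions to review before pasting into httpd.conf.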