From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
        by finch.gentoo.org (Postfix) with ESMTP id 8386E1381F3
        for ; Mon, 15 Apr 2013 15:07:44 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
        by pigeon.gentoo.org (Postfix) with SMTP id B574DE0983;
        Mon, 15 Apr 2013 15:07:34 +0000 (UTC)
Received: from mail-qa0-f49.google.com (mail-qa0-f49.google.com [209.85.216.49])
        (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
        (No client certificate requested)
        by pigeon.gentoo.org (Postfix) with ESMTPS id B8ADBE097C
        for ; Mon, 15 Apr 2013 15:07:33 +0000 (UTC)
Received: by mail-qa0-f49.google.com with SMTP id bs12so750024qab.15
        for ; Mon, 15 Apr 2013 08:07:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:x-received:sender:date:x-google-sender-auth:message-id
         :subject:from:to:content-type;
        bh=GlpxrotDpAOfETOWY1J6U97maXtITo/uyHDO9Sk3CAQ=;
        b=oO0GYHsbadzyqgjRJfvtYSkUuiLFI8RVyfqkeN1jVEYY3A/BxPxCI532gUI7vkcpUx
         Aw25mxP2UVk4bX7jETRVESxCdNGr0AV4a5Em/4s3OiM8+X3ny9DIOCsDSiqfttjlmICD
         fpiSNi1ciCW71oI+HUqR3Qdstruy8DlfkSaIKxvsgN3TvE425oWHQ1opCHzqq6GwcV1t
         ee9lLh2k9+jzqCANTyNHxly1nS8a7bpv39EnPWjCjZsAzx53/FWPc9qULvqBw5rGRR+G
         b64sMEmN1Pv7w4U7vN2tcF65awlYnOmeDp3/xSK8DDZ9jby4WYcYrGaSKIjq+CaMWEUA
         81Wg==
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-server@lists.gentoo.org
Reply-to: gentoo-server@lists.gentoo.org
MIME-Version: 1.0
X-Received: by 10.224.25.210 with SMTP id a18mr22470158qac.71.1366038452714;
        Mon, 15 Apr 2013 08:07:32 -0700 (PDT)
Sender: trapni@gmail.com
Received: by 10.49.127.169 with HTTP; Mon, 15 Apr 2013 08:07:32 -0700 (PDT)
Date: Mon, 15 Apr 2013 17:07:32 +0200
X-Google-Sender-Auth: OL84ze412_RbH-2NyCjjQlhd2Q8
Message-ID:
Subject: [gentoo-server] DoS Analysis and Prevemption
From: Christian Parpart
To: gentoo-server@lists.gentoo.org
Content-Type:
 multipart/alternative; boundary=047d7bf0e140b943c104da679c84
X-Archives-Salt: 1f3c859c-49d2-462d-8bc6-3660a6d54097
X-Archives-Hash: 9abc36afbb3b894967bb646b9d637791

--047d7bf0e140b943c104da679c84
Content-Type: text/plain; charset=ISO-8859-1

Hey all,

we hit some nice traffic last night that took our main gateway down. Pacemaker was configured to fail over to our second one, but that one died as well.

In a little post-analysis, I found the following in the logs:

Apr 14 21:42:11 cesar1 kernel: [27613652.439846] BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]
Apr 14 21:42:11 cesar1 kernel: [27613652.440319] Stack:
Apr 14 21:42:11 cesar1 kernel: [27613652.440446] Call Trace:
Apr 14 21:42:11 cesar1 kernel: [27613652.440595]  <IRQ>
Apr 14 21:42:12 cesar1 kernel: [27613652.440828]  <EOI>
Apr 14 21:42:12 cesar1 kernel: [27613652.440979] Code: c1 51 da 03 81 48 c7 c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 <89> c2
Apr 14 21:42:12 cesar1 CRON[13599]: nss_ldap: could not connect to any LDAP server as cn=admin,dc=rz,dc=dawanda,dc=com - Can't contact LDAP server
Apr 14 21:42:12 cesar1 CRON[13599]: nss_ldap: could not search LDAP server - Server is unavailable
Apr 14 21:42:24 cesar1 crmd: [7287]: ERROR: process_lrm_event: LRM operation management-gateway-ip1_stop_0 (917) Timed Out (timeout=20000ms)
Apr 14 21:42:48 cesar1 kernel: [27613688.611501] BUG: soft lockup - CPU#7 stuck for 22s! [named:32166]
Apr 14 21:42:48 cesar1 kernel: [27613688.611914] Stack:
Apr 14 21:42:48 cesar1 kernel: [27613688.612036] Call Trace:
Apr 14 21:42:48 cesar1 kernel: [27613688.612200]  <IRQ>
Apr 14 21:42:48 cesar1 kernel: [27613688.612408]  <EOI>
Apr 14 21:42:48 cesar1 kernel: [27613688.612626] Code: c1 51 da 03 81 48 c7 c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 <89> c2
Apr 14 21:42:55 cesar1 kernel: [27613695.946295] BUG: soft lockup - CPU#0 stuck for 21s! [ksoftirqd/0:3]
Apr 14 21:42:55 cesar1 kernel: [27613695.946785] Stack:
Apr 14 21:42:55 cesar1 kernel: [27613695.946917] Call Trace:
Apr 14 21:42:55 cesar1 kernel: [27613695.947137] Code: c4 00 00 81 a8 44 e0 ff ff ff 01 00 00 48 63 80 44 e0 ff ff a9 00 ff ff 07 74 36 65 48 8b 04 25 c8 c4 00 00 83 a8 44 e0 ff ff 01 <5d> c3

We're using irqbalance so that the Ethernet card's hardware interrupts don't all hit the first CPU when traffic comes in (a lesson learned from the last, much more intensive DDoS).
However, since this did not help, I'd like to find out what else we can do. Our gateway has to do NAT and has a few other iptables rules it needs in order to run OpenStack behind it,
so I can't just drop it.

Regarding the logs, I can see that something caused the CPU cores to get stuck for a number of different processes.
Has anyone encountered error messages like the ones I quoted above, or does anyone know of other things one can do to prevent huge amounts of unsolicited incoming traffic from bringing a Linux node down?

Best regards,
Christian.

--047d7bf0e140b943c104da679c84
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hey all,

we hit some nice traffic last night that took our main gateway down. Pacemaker was configured to fail over to our second one, but that one died as well.

In a little post-analysis, I found the following in the logs:

Apr 14 21:42:11 cesar1 kernel: [27613652.439846] BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]
Apr 14 21:42:11 cesar1 kernel: [27613652.440319] Stack:
Apr 14 21:42:11 cesar1 kernel: [27613652.440446] Call Trace:
Apr 14 21:42:11 cesar1 kernel: [27613652.440595]  <IRQ>
Apr 14 21:42:12 cesar1 kernel: [27613652.440828]  <EOI>
Apr 14 21:42:12 cesar1 kernel: [27613652.440979] Code: c1 51 da 03 81 48 c7 c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 <89> c2
Apr 14 21:42:12 cesar1 CRON[13599]: nss_ldap: could not connect to any LDAP server as cn=admin,dc=rz,dc=dawanda,dc=com - Can't contact LDAP server
Apr 14 21:42:12 cesar1 CRON[13599]: nss_ldap: could not search LDAP server - Server is unavailable
Apr 14 21:42:24 cesar1 crmd: [7287]: ERROR: process_lrm_event: LRM operation management-gateway-ip1_stop_0 (917) Timed Out (timeout=20000ms)
Apr 14 21:42:48 cesar1 kernel: [27613688.611501] BUG: soft lockup - CPU#7 stuck for 22s! [named:32166]
Apr 14 21:42:48 cesar1 kernel: [27613688.611914] Stack:
Apr 14 21:42:48 cesar1 kernel: [27613688.612036] Call Trace:
Apr 14 21:42:48 cesar1 kernel: [27613688.612200]  <IRQ>
Apr 14 21:42:48 cesar1 kernel: [27613688.612408]  <EOI>
Apr 14 21:42:48 cesar1 kernel: [27613688.612626] Code: c1 51 da 03 81 48 c7 c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 <89> c2
Apr 14 21:42:55 cesar1 kernel: [27613695.946295] BUG: soft lockup - CPU#0 stuck for 21s! [ksoftirqd/0:3]
Apr 14 21:42:55 cesar1 kernel: [27613695.946785] Stack:
Apr 14 21:42:55 cesar1 kernel: [27613695.946917] Call Trace:
Apr 14 21:42:55 cesar1 kernel: [27613695.947137] Code: c4 00 00 81 a8 44 e0 ff ff ff 01 00 00 48 63 80 44 e0 ff ff a9 00 ff ff 07 74 36 65 48 8b 04 25 c8 c4 00 00 83 a8 44 e0 ff ff 01 <5d> c3
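
For a quick overview of which CPUs and tasks were affected, log lines like the ones above could be summarized with a short script. This is only a minimal sketch in Python, assuming the syslog format quoted here; the helper name `parse_soft_lockups` and the regular expression are illustrative and not part of any existing tool.

```python
import re

# Matches kernel messages of the form quoted above, e.g.
#   "BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]"
# The pattern is an assumption derived only from those sample lines.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]:]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return one dict per soft-lockup line (hypothetical helper)."""
    events = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match is None:
            continue  # not a soft-lockup report (Stack:, Code:, CRON, ...)
        events.append({
            "cpu": int(match.group("cpu")),
            "seconds": int(match.group("secs")),
            "task": match.group("task"),
            "pid": int(match.group("pid")),
        })
    return events
```

Feeding it the lines of the syslog file yields one entry per lockup report, which makes it easy to see whether the stalls are spread over all cores or concentrated on a few.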

We're using irqbalance so that the Ethernet card's hardware interrupts don't all hit the first CPU when traffic comes in (a lesson learned from the last, much more intensive DDoS).
However, since this did not help, I'd like to find out what else we can do. Our gateway has to do NAT and has a few other iptables rules it needs in order to run OpenStack behind it,
so I can't just drop it.
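
One way to check whether the NIC interrupts really are spread across the cores is to sum the per-CPU counters in `/proc/interrupts`. The following is a minimal sketch in Python, not something taken from our setup: the function name `per_cpu_irq_counts` and the default device substring `"eth"` are assumptions, and the parsing relies on the usual layout of a `CPU0 CPU1 ...` header followed by one row per interrupt.

```python
def per_cpu_irq_counts(text, device="eth"):
    """Sum interrupt counts per CPU for rows whose description contains
    `device` (hypothetical helper; `text` is the content of /proc/interrupts).
    """
    lines = text.strip("\n").splitlines()
    if not lines:
        return []
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpus
    for line in lines[1:]:
        fields = line.split()
        counts = fields[1:1 + ncpus]           # fields[0] is the IRQ label
        description = " ".join(fields[1 + ncpus:])
        # Skip summary rows such as ERR/MIS that lack per-CPU columns.
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue
        if device in description:
            for cpu, count in enumerate(counts):
                totals[cpu] += int(count)
    return totals


# Example call on a live system (the device name is an assumption):
#   with open("/proc/interrupts") as f:
#       print(per_cpu_irq_counts(f.read(), "eth0"))
```

If nearly all of the counts end up on one CPU, the interrupts are not being balanced the way we expect.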

Regarding the logs, I can see that something caused the CPU cores to get stuck for a number of different processes.
Has anyone encountered error messages like the ones I quoted above, or does anyone know of other things one can do to prevent huge amounts of unsolicited incoming traffic from bringing a Linux node down?

Best regards,
Christian.
--047d7bf0e140b943c104da679c84--