public inbox for gentoo-server@lists.gentoo.org
* [gentoo-server] DoS Analysis and Prevemption
@ 2013-04-15 15:07 Christian Parpart
  2013-07-28 14:01 ` Kerin Millar
From: Christian Parpart @ 2013-04-15 15:07 UTC
  To: gentoo-server

Hey all,

we got hit by some nice traffic last night that took our main gateway down.
Pacemaker was configured to fail over to our second gateway, but that one
died as well.

In a little post-analysis, I found the following in the logs:

Apr 14 21:42:11 cesar1 kernel: [27613652.439846] BUG: soft lockup - CPU#4
stuck for 22s! [swapper/4:0]
Apr 14 21:42:11 cesar1 kernel: [27613652.440319] Stack:
Apr 14 21:42:11 cesar1 kernel: [27613652.440446] Call Trace:
Apr 14 21:42:11 cesar1 kernel: [27613652.440595]  <IRQ>
Apr 14 21:42:12 cesar1 kernel: [27613652.440828]  <EOI>
Apr 14 21:42:12 cesar1 kernel: [27613652.440979] Code: c1 51 da 03 81 48 c7
c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 b8
00 00 01 00 48 89 e5 f0 0f c1 07 <89> c2
Apr 14 21:42:12 cesar1 CRON[13599]: nss_ldap: could not connect to any LDAP
server as cn=admin,dc=rz,dc=dawanda,dc=com - Can't contact LDAP server
Apr 14 21:42:12 cesar1 CRON[13599]: nss_ldap: could not search LDAP server
- Server is unavailable
Apr 14 21:42:24 cesar1 crmd: [7287]: ERROR: process_lrm_event: LRM
operation management-gateway-ip1_stop_0 (917) Timed Out (timeout=20000ms)
Apr 14 21:42:48 cesar1 kernel: [27613688.611501] BUG: soft lockup - CPU#7
stuck for 22s! [named:32166]
Apr 14 21:42:48 cesar1 kernel: [27613688.611914] Stack:
Apr 14 21:42:48 cesar1 kernel: [27613688.612036] Call Trace:
Apr 14 21:42:48 cesar1 kernel: [27613688.612200]  <IRQ>
Apr 14 21:42:48 cesar1 kernel: [27613688.612408]  <EOI>
Apr 14 21:42:48 cesar1 kernel: [27613688.612626] Code: c1 51 da 03 81 48 c7
c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 b8
00 00 01 00 48 89 e5 f0 0f c1 07 <89> c2
Apr 14 21:42:55 cesar1 kernel: [27613695.946295] BUG: soft lockup - CPU#0
stuck for 21s! [ksoftirqd/0:3]

Apr 14 21:42:55 cesar1 kernel: [27613695.946785] Stack:
Apr 14 21:42:55 cesar1 kernel: [27613695.946917] Call Trace:
Apr 14 21:42:55 cesar1 kernel: [27613695.947137] Code: c4 00 00 81 a8 44 e0
ff ff ff 01 00 00 48 63 80 44 e0 ff ff a9 00 ff ff 07 74 36 65 48 8b 04 25
c8 c4 00 00 83 a8 44 e0 ff ff 01 <5d> c3
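
For anyone doing a similar post-analysis, the soft-lockup lines above can be
tallied with a few lines of Python. This is only a rough sketch: the function
name and the embedded sample are illustrative, the regular expression is
based solely on the message format quoted above, and whitespace is collapsed
first because the log lines were wrapped in this mail.

```python
import re
from collections import Counter

# Matches e.g. "BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+)\s+stuck for (\d+)s! \[([^\]:]+):(\d+)\]"
)


def summarize_lockups(log_text):
    """Return (cpu, seconds, task, pid) tuples for each soft lockup found."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        (int(cpu), int(secs), task, int(pid))
        for cpu, secs, task, pid in LOCKUP_RE.findall(flat)
    ]


# Sample taken from the wrapped log excerpt quoted in this mail.
SAMPLE = """\
Apr 14 21:42:11 cesar1 kernel: [27613652.439846] BUG: soft lockup - CPU#4
stuck for 22s! [swapper/4:0]
Apr 14 21:42:48 cesar1 kernel: [27613688.611501] BUG: soft lockup - CPU#7
stuck for 22s! [named:32166]
Apr 14 21:42:55 cesar1 kernel: [27613695.946295] BUG: soft lockup - CPU#0
stuck for 21s! [ksoftirqd/0:3]
"""

if __name__ == "__main__":
    events = summarize_lockups(SAMPLE)
    for cpu, secs, task, pid in events:
        print(f"CPU {cpu}: stuck {secs}s in {task} (pid {pid})")
    # How many lockups were reported per CPU.
    print(Counter(cpu for cpu, _, _, _ in events))
```

Run against a full syslog file, this shows at a glance whether the lockups
cluster on particular CPUs or tasks.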

We're using irqbalance so that hardware interrupts from the Ethernet cards
don't all land on the first CPU when traffic comes in (a lesson learned from
the last, much more intensive DDoS).
However, since this did not help, I'd like to find out what else we can do.
Our gateway has to do NAT and needs a few other iptables rules in order to
run OpenStack behind it, so I can't just drop them.
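
One way to check whether the NIC interrupts really are being spread across
cores is to compare the per-CPU counters in /proc/interrupts. A minimal
sketch, assuming the interface's IRQ lines carry its name (e.g. "eth0") in
the description column; the function name and the sample numbers are made up
for illustration and are not taken from the gateway discussed here.

```python
def per_cpu_counts(interrupts_text, name_fragment):
    """Sum interrupt counts per CPU for IRQ lines whose description
    contains name_fragment (e.g. "eth0")."""
    lines = interrupts_text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        fields = line.split()
        # Skip short rows such as "ERR: 0" that lack per-CPU columns.
        if len(fields) < ncpu + 1:
            continue
        counts = fields[1:1 + ncpu]
        description = " ".join(fields[1 + ncpu:])
        if name_fragment not in description:
            continue
        if not all(c.isdigit() for c in counts):
            continue
        for i, c in enumerate(counts):
            totals[i] += int(c)
    return totals


# Hypothetical two-CPU sample in the /proc/interrupts layout.
SAMPLE_INTERRUPTS = """\
           CPU0       CPU1
 24:       1000          0   PCI-MSI-edge      eth0-rx-0
 25:          0        500   PCI-MSI-edge      eth0-rx-1
 26:         10         20   PCI-MSI-edge      eth1
NMI:          5          5   Non-maskable interrupts
ERR:          0
"""

if __name__ == "__main__":
    # On a live system one would read the real file instead:
    #   text = open("/proc/interrupts").read()
    print(per_cpu_counts(SAMPLE_INTERRUPTS, "eth0"))
```

If nearly all counts for the interface end up in a single column, the
interrupts are still being handled by one core.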

From the logs, I can see that something caused the CPU cores to get stuck
in a number of different processes.
Has anyone encountered error messages like the ones quoted above, or does
anyone know what else one might do to prevent huge amounts of unsolicited
incoming traffic from bringing a Linux node down?

Best regards,
Christian.
