From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [gentoo-user] strange TCP timeout errors
To: gentoo-user@lists.gentoo.org
References: <5613003F.5020303@iinet.net.au> <5613073B.5090600@gmail.com> <56152AAD.5030003@gmail.com>
From: Alan McKinnon
Message-ID: <561566EE.9000507@gmail.com>
Date: Wed, 7 Oct 2015 20:39:42 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
Precedence: bulk
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

On 07/10/2015 17:55, Grant wrote:
>>>>>> I've attached a PNG from Munin showing the TCP timeout errors on my
>>>>>> Gentoo server over the past month. The data is expressed in timeouts
>>>>>> per second and that rate is shown to be steadily increasing over the
>>>>>> past month. That seems strange to me. Munin doesn't show any other
>>>>>> data point increasing like this over the time period. Any ideas?
>>>>>>
>>>>>> - Grant
>>>>>>
>>>>>
>>>>> weird - does it reset on an interface restart or reboot?
>>>>
>>>> this would be my test #1
>>>
>>>
>>> I rebooted and the rate of errors has dropped off to almost nothing.
>>>
>>>
>>>>> Can you verify it's not an artefact within munin (how?)
>>>>
>>>> In theory, a misconfigured graph can do this. Munin can draw many
>>>> different types of graph, including cumulative values. Even for a data
>>>> type like this which is X events per unit time, if you tell munin to add
>>>> them all up, it will do so and graph it.
>>>>
>>>> Quick test is to look at the graph config.
>>>
>>>
>>> This graph lives in the "network" section of the munin web interface.
>>> There is no matching section in /etc/munin/plugin-conf.d/munin-node so
>>> it should be using the default config.
>>>
>>> Any ideas based on this new info?
>>
>> A few :-)
>>
>>
>> I can't find the plugin that delivers that graph though. Maybe I just
>> don't have it, maybe it comes from contrib/
>>
>> What's your USE for munin?
>
>
> USE="apache cgi http mysql ssl syslog -asterisk -dhcpd -doc -ipmi
> -ipv6 -irc -java -memcached -minimal -postgres (-selinux) {-test}"
>
>
>> What do you have in "ls -al /etc/munin/plugins/" ?

It's as I thought - your data is accurate but rrd has been given a
completely wrong method to derive the graphs.

Munin graphs for section "Network" do not have to be in a file called
"network" - it's just a category, and the plugin defines which web-page
section its graphs appear in.

In your case, the relevant plugin is netstat_multi, which doesn't often
get installed. Its data source is "netstat -s", so grep that output for
"timeout" to see the raw values.

Timeouts are cumulative counters; they never decrease until they wrap
around. So to scale them, the plugin gets the rrd file to subtract the
previous reading from the current reading and divide by the time interval
to get the timeouts/sec. This is all done inside rrd when the data files
are updated (it's quite a lot of magic).

That plugin sets the graph type to DERIVE
(/etc/munin/plugins/netstat_multi around line 190). I feel it should be
GAUGE or COUNTER.

The proper reference on rrd is
http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html

and the munin docs are
https://munin.readthedocs.org/en/latest/index.html

You must edit the plugin file and, IIRC, recreate the rrd. You will lose
all past info (can't be helped).

[snip ls output]

> P.S. Any other good plugins you'd recommend?

http://gallery.munin-monitoring.org/

Monitoring is highly site-specific so recommendations aren't usually
worth much, but that gallery has LOTS of contributed plugins.

--
Alan McKinnon
alan.mckinnon@gmail.com
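
P.S. For anyone who wants to see the "subtract previous reading from
current reading and divide by the time interval" step spelled out, here is
a minimal sketch in Python. It is not rrdtool's or munin's actual code. The
function name, the sample readings and the 300 second interval are all
made up for illustration, and the wrap handling assumes the counter
wrapped exactly once.

```python
from typing import Optional


def rate_per_second(prev: int, curr: int, interval_s: float,
                    wrap_at: Optional[int] = None) -> Optional[float]:
    """Turn two readings of a cumulative counter into events per second.

    Hypothetical helper for illustration only, not an rrdtool API.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    if delta < 0:
        if wrap_at is None:
            # Counter went backwards (e.g. after a reboot): no valid rate.
            return None
        # Assumption: the counter wrapped around exactly once.
        delta += wrap_at
    return delta / interval_s


# Made-up readings of a cumulative timeout counter, taken 300 s apart.
samples = [1000, 1030, 1060, 1090]
rates = [rate_per_second(a, b, 300) for a, b in zip(samples, samples[1:])]
print(rates)
```

With these readings every interval adds the same 30 timeouts, so the
computed rate is flat. A rate that keeps climbing would mean each interval
adds more timeouts than the one before it.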