* [gentoo-user] strange TCP timeout errors
From: Grant @ 2015-10-05 17:35 UTC
To: Gentoo mailing list

I've attached a PNG from Munin showing the TCP timeout errors on my
Gentoo server over the past month. The data is expressed in timeouts
per second and that rate is shown to be steadily increasing over the
past month. That seems strange to me. Munin doesn't show any other
data point increasing like this over the time period. Any ideas?

- Grant

[-- Attachment: tcp-timeouts.png (image/png, 61019 bytes) --]
* Re: [gentoo-user] strange TCP timeout errors
From: Bill Kenworthy @ 2015-10-05 22:57 UTC
To: gentoo-user

On 06/10/15 01:35, Grant wrote:
> I've attached a PNG from Munin showing the TCP timeout errors on my
> Gentoo server over the past month. The data is expressed in timeouts
> per second and that rate is shown to be steadily increasing over the
> past month. That seems strange to me. Munin doesn't show any other
> data point increasing like this over the time period. Any ideas?
>
> - Grant
>

weird - does it reset on an interface restart or reboot?

Can you verify it's not an artefact within munin (how?)

BillK
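One way to answer the "(how?)" without involving Munin at all is to read the kernel's own counter and compare it with the graph. A minimal Python sketch, assuming a Linux host where `/proc/net/netstat` alternates a header line and a value line per section (e.g. `TcpExt:`), which is one of the files `netstat -s` summarises. The sample text and its numbers are made up for illustration:

```python
from typing import Dict


def parse_proc_netstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/netstat-style text into {section: {field: value}}.

    Assumes each section is a header line ("TcpExt: A B C") followed by a
    value line ("TcpExt: 1 2 3") with the same number of fields.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    result: Dict[str, Dict[str, int]] = {}
    # Walk the lines in pairs: header line, then value line.
    for header, values in zip(lines[0::2], lines[1::2]):
        section = header[0].rstrip(":")
        result[section] = {
            name: int(val) for name, val in zip(header[1:], values[1:])
        }
    return result


# Hypothetical, heavily trimmed sample; a real file has many more fields.
SAMPLE = """\
TcpExt: SyncookiesSent SyncookiesRecv TCPTimeouts
TcpExt: 0 0 4127
IpExt: InNoRoutes InTruncatedPkts
IpExt: 3 0
"""

if __name__ == "__main__":
    counters = parse_proc_netstat(SAMPLE)
    print(counters["TcpExt"]["TCPTimeouts"])  # 4127
    # On a live system:
    # with open("/proc/net/netstat") as f:
    #     print(parse_proc_netstat(f.read())["TcpExt"]["TCPTimeouts"])
```

Taking two readings a few minutes apart and dividing the difference by the elapsed seconds gives an independent timeouts/sec figure; a reading just after a reboot should also show the counter starting again from a small value.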
* Re: [gentoo-user] strange TCP timeout errors
From: Alan McKinnon @ 2015-10-05 23:26 UTC
To: gentoo-user

On 06/10/2015 00:57, Bill Kenworthy wrote:
> On 06/10/15 01:35, Grant wrote:
>> [snip]
>
> weird - does it reset on an interface restart or reboot?

this would be my test #1

> Can you verify it's not an artefact within munin (how?)

In theory, a misconfigured graph can do this. Munin can draw many
different types of graph, including cumulative values. Even for a data
type like this which is X events per unit time, if you tell munin to add
them all up, it will do so and graph it.

Quick test is to look at the graph config.

--
Alan McKinnon
alan.mckinnon@gmail.com
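To illustrate the point about cumulative graphs: a rate that is perfectly steady turns into a steadily rising line once every sample is added up. A small Python sketch with made-up numbers (this is not Munin code, just the arithmetic):

```python
from itertools import accumulate
from typing import List


def running_total(rates: List[float]) -> List[float]:
    """Cumulative sum of per-interval rates: what a graph shows if told to add samples up."""
    return list(accumulate(rates))


# Hypothetical per-interval readings: a steady 2 timeouts/sec, nothing getting worse.
rates = [2.0] * 6

print(rates)                 # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]  flat line
print(running_total(rates))  # [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  steady climb
```

So a graph that climbs month over month can come either from a real rise in the rate or from the graph summing a flat rate, which is why checking the graph config first is a cheap test.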
* Re: [gentoo-user] strange TCP timeout errors
From: Grant @ 2015-10-07 12:58 UTC
To: Gentoo mailing list

>>> [snip]
>>
>> weird - does it reset on an interface restart or reboot?
>
> this would be my test #1

I rebooted and the rate of errors has dropped off to almost nothing.

>> Can you verify it's not an artefact within munin (how?)
>
> In theory, a misconfigured graph can do this. Munin can draw many
> different types of graph, including cumulative values. Even for a data
> type like this which is X events per unit time, if you tell munin to add
> them all up, it will do so and graph it.
>
> Quick test is to look at the graph config.

This graph lives in the "network" section of the munin web interface.
There is no matching section in /etc/munin/plugin-conf.d/munin-node so
it should be using the default config.

Any ideas based on this new info?

- Grant
* Re: [gentoo-user] strange TCP timeout errors
From: Alan McKinnon @ 2015-10-07 14:22 UTC
To: gentoo-user

On 07/10/2015 14:58, Grant wrote:
> [snip]
> I rebooted and the rate of errors has dropped off to almost nothing.
>
> [snip]
> This graph lives in the "network" section of the munin web interface.
> There is no matching section in /etc/munin/plugin-conf.d/munin-node so
> it should be using the default config.
>
> Any ideas based on this new info?

A few :-)

I can't find the plugin that delivers that graph though. Maybe I just
don't have it, maybe it comes from contrib/

What's your USE for munin?

What do you have in "ls -al /etc/munin/plugins/" ?

--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] strange TCP timeout errors
From: Grant @ 2015-10-07 15:55 UTC
To: Gentoo mailing list

> [snip]
> A few :-)
>
>
> I can't find the plugin that delivers that graph though. Maybe I just
> don't have it, maybe it comes from contrib/
>
> What's your USE for munin?

USE="apache cgi http mysql ssl syslog -asterisk -dhcpd -doc -ipmi
-ipv6 -irc -java -memcached -minimal -postgres (-selinux) {-test}"

> What do you have in "ls -al /etc/munin/plugins/" ?

# ls -al /etc/munin/plugins/
total 8
drwxr-xr-x 2 munin munin 4096 Aug 26 13:22 .
drwxr-xr-x 7 root root 4096 Aug 27 08:42 ..
-rw-r--r-- 1 root root 0 Aug 23 18:10 .keep_net-analyzer_munin-0
lrwxrwxrwx 1 root root 42 Jun 16 2013 apache_accesses -> /usr/libexec/munin/plugins/apache_accesses
lrwxrwxrwx 1 root root 43 Jun 16 2013 apache_processes -> /usr/libexec/munin/plugins/apache_processes
lrwxrwxrwx 1 root root 40 Jun 16 2013 apache_volume -> /usr/libexec/munin/plugins/apache_volume
lrwxrwxrwx 1 root root 30 Jun 16 2013 cpu -> /usr/libexec/munin/plugins/cpu
lrwxrwxrwx 1 root root 29 Jun 16 2013 df -> /usr/libexec/munin/plugins/df
lrwxrwxrwx 1 root root 35 Jun 16 2013 df_inode -> /usr/libexec/munin/plugins/df_inode
lrwxrwxrwx 1 root root 36 Jun 21 2013 diskstat_ -> /usr/libexec/munin/plugins/diskstat_
lrwxrwxrwx 1 root root 36 Jun 16 2013 diskstats -> /usr/libexec/munin/plugins/diskstats
lrwxrwxrwx 1 root root 34 Jun 16 2013 entropy -> /usr/libexec/munin/plugins/entropy
lrwxrwxrwx 1 root root 32 Jun 16 2013 forks -> /usr/libexec/munin/plugins/forks
lrwxrwxrwx 1 root root 34 Jun 18 2013 hddtemp -> /usr/libexec/munin/plugins/hddtemp
lrwxrwxrwx 1 root root 35 Jun 18 2013 hddtemp2 -> /usr/libexec/munin/plugins/hddtemp2
lrwxrwxrwx 1 root root 43 Jun 18 2013 hddtemp_smartctl -> /usr/libexec/munin/plugins/hddtemp_smartctl
lrwxrwxrwx 1 root root 35 Jun 18 2013 hddtempd -> /usr/libexec/munin/plugins/hddtempd
lrwxrwxrwx 1 root root 30 Jun 21 2013 if_enp2s2f0 -> /usr/libexec/munin/plugins/if_
lrwxrwxrwx 1 root root 34 Jun 21 2013 if_err_enp2s2f0 -> /usr/libexec/munin/plugins/if_err_
lrwxrwxrwx 1 root root 37 Jun 16 2013 interrupts -> /usr/libexec/munin/plugins/interrupts
lrwxrwxrwx 1 root root 35 Jun 16 2013 irqstats -> /usr/libexec/munin/plugins/irqstats
lrwxrwxrwx 1 root root 31 Jun 16 2013 load -> /usr/libexec/munin/plugins/load
lrwxrwxrwx 1 root root 33 Jun 16 2013 lpstat -> /usr/libexec/munin/plugins/lpstat
lrwxrwxrwx 1 root root 34 Jun 18 2013 meminfo -> /usr/libexec/munin/plugins/meminfo
lrwxrwxrwx 1 root root 33 Jun 16 2013 memory -> /usr/libexec/munin/plugins/memory
lrwxrwxrwx 1 root root 38 Jun 16 2013 munin_stats -> /usr/libexec/munin/plugins/munin_stats
lrwxrwxrwx 1 root root 39 Jun 18 2013 munin_update -> /usr/libexec/munin/plugins/munin_update
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_bin_relay_log -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_commands -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_connections -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_files_tables -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_innodb_bpool -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_innodb_bpool_act -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_innodb_insert_buf -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_innodb_io -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_innodb_io_pend -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_innodb_log -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_innodb_rows -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_innodb_semaphores -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_innodb_tnx -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_myisam_indexes -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_network_traffic -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_qcache -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_qcache_mem -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_replication -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_select_types -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_slow -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_sorts -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_table_locks -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 33 Jun 21 2013 mysql_tmp_tables -> /usr/libexec/munin/plugins/mysql_
lrwxrwxrwx 1 root root 34 Jun 16 2013 netstat -> /usr/libexec/munin/plugins/netstat
lrwxrwxrwx 1 root root 40 Jun 18 2013 netstat_multi -> /usr/libexec/munin/plugins/netstat_multi
lrwxrwxrwx 1 root root 40 Jun 16 2013 nginx_request -> /usr/libexec/munin/plugins/nginx_request
lrwxrwxrwx 1 root root 39 Jun 16 2013 nginx_status -> /usr/libexec/munin/plugins/nginx_status
lrwxrwxrwx 1 root root 37 Jun 16 2013 open_files -> /usr/libexec/munin/plugins/open_files
lrwxrwxrwx 1 root root 38 Jun 16 2013 open_inodes -> /usr/libexec/munin/plugins/open_inodes
lrwxrwxrwx 1 root root 44 Jun 16 2013 postfix_mailqueue -> /usr/libexec/munin/plugins/postfix_mailqueue
lrwxrwxrwx 1 root root 44 Jun 16 2013 postfix_mailstats -> /usr/libexec/munin/plugins/postfix_mailstats
lrwxrwxrwx 1 root root 45 Jun 16 2013 postfix_mailvolume -> /usr/libexec/munin/plugins/postfix_mailvolume
lrwxrwxrwx 1 root root 31 Jun 18 2013 proc -> /usr/libexec/munin/plugins/proc
lrwxrwxrwx 1 root root 35 Jun 16 2013 proc_pri -> /usr/libexec/munin/plugins/proc_pri
lrwxrwxrwx 1 root root 36 Jun 16 2013 processes -> /usr/libexec/munin/plugins/processes
lrwxrwxrwx 1 root root 35 Jun 18 2013 sensors_ -> /usr/libexec/munin/plugins/sensors_
lrwxrwxrwx 1 root root 31 Jun 16 2013 swap -> /usr/libexec/munin/plugins/swap
lrwxrwxrwx 1 root root 34 Jun 16 2013 threads -> /usr/libexec/munin/plugins/threads
lrwxrwxrwx 1 root root 33 Jun 16 2013 uptime -> /usr/libexec/munin/plugins/uptime
lrwxrwxrwx 1 root root 32 Jun 16 2013 users -> /usr/libexec/munin/plugins/users

So I don't have a "network" plugin either but I do have a "network"
section under Categories in the munin web interface.

- Grant

P.S. Any other good plugins you'd recommend?
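The "network category but no network plugin" puzzle comes down to each plugin naming its own category in the output it prints when run with the `config` argument. A minimal Python sketch, assuming you have captured `munin-run <plugin> config` for each plugin (the sample outputs and graph titles below are hypothetical and trimmed, and treating a missing `graph_category` as "other" is an assumption about Munin's default):

```python
from typing import Dict, List


def graph_category(config_output: str) -> str:
    """Return the graph_category a plugin declares in its `config` output."""
    for line in config_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "graph_category":
            return parts[1].strip()
    return "other"  # assumption: uncategorised graphs land under "other"


def plugins_in_category(configs: Dict[str, str], category: str) -> List[str]:
    """Names of plugins whose config output declares the given category."""
    return sorted(
        name for name, out in configs.items() if graph_category(out) == category
    )


# Hypothetical captured outputs of `munin-run <plugin> config`, trimmed.
CONFIGS = {
    "netstat_multi": "graph_title TCP errors\ngraph_category network\n",
    "cpu": "graph_title CPU usage\ngraph_category system\n",
    "if_enp2s2f0": "graph_title enp2s2f0 traffic\ngraph_category network\n",
}

print(plugins_in_category(CONFIGS, "network"))  # ['if_enp2s2f0', 'netstat_multi']
```

Grepping the real `config` output of each linked plugin for `graph_category` should point to whichever one feeds the timeout graph.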
* Re: [gentoo-user] strange TCP timeout errors
From: Alan McKinnon @ 2015-10-07 18:39 UTC
To: gentoo-user

On 07/10/2015 17:55, Grant wrote:
> [snip]
>> What's your USE for munin?
>
>
> USE="apache cgi http mysql ssl syslog -asterisk -dhcpd -doc -ipmi
> -ipv6 -irc -java -memcached -minimal -postgres (-selinux) {-test}"
>
>
>> What do you have in "ls -al /etc/munin/plugins/" ?
It's as I thought - your data is accurate but rrd has been given a
completely wrong method to derive the graphs.

Munin graphs for section "Network" do not have to be in a file called
"network" - it's just a category and the plugin defines what web-page
section it must be in. In your case, the relevant plugin is
netstat_multi which doesn't often get installed. Its data source is
"netstat -s" so grep that output for "timeout" to see it.

Timeouts are cumulative counters; they do not decrease until they wrap
around. So to scale them, the plugin gets the rrd file to subtract the
previous reading from the current reading and divide by the time interval
to get the timeouts/sec. This is all done inside rrd when the data files
are updated (it's quite a lot of magic).

That plugin sets the graph type to DERIVE
(/etc/munin/plugins/netstat_multi, around line 190). I feel it should be
GAUGE or COUNTER.

The proper reference on rrd is
http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
and the munin docs are
https://munin.readthedocs.org/en/latest/index.html

You must edit the plugin file and IIRC recreate the rrd; you will lose
all past info (can't be helped).


[snip ls output]


> P.S. Any other good plugins you'd recommend?

http://gallery.munin-monitoring.org/

Monitoring is highly site-specific so recommendations aren't usually
worth much, but that gallery has LOTS of contributed plugins.

--
Alan McKinnon
alan.mckinnon@gmail.com
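The three data-source types mentioned above behave quite differently on a cumulative counter. A minimal Python model based on the rrdcreate documentation linked above: GAUGE stores the value as given, while COUNTER and DERIVE both store (current - previous) / interval, and COUNTER additionally treats a decrease as a counter wrap. This is not rrdtool's code. The 32-bit-only wrap handling and the `minimum` parameter (standing in for a data source's minimum, which Munin DERIVE fields often set to 0) are simplifying assumptions, and the readings are made up:

```python
from typing import Optional

WRAP_32 = 2 ** 32


def gauge(prev: int, cur: int, dt: float) -> float:
    """GAUGE: store the reading as-is, no differencing."""
    return float(cur)


def derive(prev: int, cur: int, dt: float,
           minimum: Optional[float] = None) -> Optional[float]:
    """DERIVE: (cur - prev) / dt with no overflow check, so it can go negative."""
    rate = (cur - prev) / dt
    if minimum is not None and rate < minimum:
        return None  # below the DS minimum: stored as unknown
    return rate


def counter(prev: int, cur: int, dt: float) -> float:
    """COUNTER: like DERIVE, but a decrease is assumed to be a 32-bit wrap.

    Simplification: rrdtool also considers a 64-bit wrap; omitted here.
    """
    delta = cur - prev
    if delta < 0:
        delta += WRAP_32
    return delta / dt


# Hypothetical cumulative TCP-timeout readings taken 300 s apart.
print(gauge(10_000, 10_600, 300))    # 10600.0 (the raw running total)
print(derive(10_000, 10_600, 300))   # 2.0 timeouts/sec
print(counter(10_000, 10_600, 300))  # 2.0 timeouts/sec

# After a reboot zeroes the kernel counter:
print(derive(10_600, 50, 300))             # negative: -10550 / 300
print(derive(10_600, 50, 300, minimum=0))  # None (dropped as unknown)
print(counter(10_600, 50, 300))            # huge bogus rate from the assumed wrap
```

The reboot case matters given Grant's observation: the kernel counters restart from zero, so the step across the reboot is negative, and each type handles that step differently.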
* Re: [gentoo-user] strange TCP timeout errors
From: brettrsears @ 2015-10-07 19:42 UTC
To: gentoo-user

YyyyYYuIIIIIU
Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: Alan McKinnon <alan.mckinnon@gmail.com>
Date: Wed, 7 Oct 2015 20:39:42
To: <gentoo-user@lists.gentoo.org>
Reply-to: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] strange TCP timeout errors

[snip quoted message]
* Re: [gentoo-user] strange TCP timeout errors
From: Alan McKinnon @ 2015-10-07 22:25 UTC
To: gentoo-user

On 07/10/2015 21:42, brettrsears@gmail.com wrote:
> YyyyYYuIIIIIU
> Sent from my Verizon Wireless BlackBerry

Hmmmmmmmmmmmmmm, interesting reply. I'm wondering if it has something
to do with:

1. verizon
2. dodgy 3g
3. crapberry. oops, sorry: blackberry

Or maybe it's because y, u and i are in a row on the keyboard, shift
and enter are adjacent, and you have an over-friendly cat? :-)

> [snip quoted message]

--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] strange TCP timeout errors
From: Grant @ 2015-10-09 14:15 UTC
To: Gentoo mailing list

> It's as I thought - your data is accurate but rrd has been given a
> completely wrong method to derive the graphs.
>
> [snip]
>
> Monitoring is highly site-specific so recommendations aren't usually
> worth much, but that gallery has LOTS of contributed plugins

Many thanks Alan!

- Grant