* [gentoo-user] Networking trouble
@ 2015-10-15 13:30 hw
  2015-10-15 13:54 ` J. Roeleveld
  0 siblings, 1 reply; 8+ messages in thread
From: hw @ 2015-10-15 13:30 UTC
To: gentoo-user

Hi,

I have a xen host with some HVM guests which becomes unreachable via
the network after an apparently random amount of time. I have already
switched the network card to see if that would make a difference,
and with the card currently installed, it worked fine for over 20 days
until it became unreachable again. Before switching the network card,
it would run a week or two before becoming unreachable. The previous
card was the on-board BCM5764M which uses the tg3 driver.

There are messages like this in the log file:

Oct 14 20:58:02 moonflo kernel: ------------[ cut here ]------------
Oct 14 20:58:02 moonflo kernel: WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x259/0x270()
Oct 14 20:58:02 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out
Oct 14 20:58:02 moonflo kernel: Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs fscache xt_physdev br_netfilter iptable_filter ip_tables xen_pciback xen_gntalloc xen_gntdev bridge stp llc zfs(PO) nouveau snd_hda_codec_realtek snd_hda_codec_generic zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate video backlight drm_kms_helper ttm snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm snd_timer snd soundcore r8169 mii xts aesni_intel glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 sha256_generic hid_generic usbhid uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common
Oct 14 20:58:02 moonflo kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: P O 4.0.5-gentoo #3
Oct 14 20:58:02 moonflo kernel: Hardware name: Hewlett-Packard HP Z800 Workstation/0AECh, BIOS 786G5 v03.57 07/15/2013
Oct 14 20:58:02 moonflo kernel: ffffffff8175a77d ffff880124d43d98 ffffffff814da8d8 0000000000000001
Oct 14 20:58:02 moonflo kernel: ffff880124d43de8 ffff880124d43dd8 ffffffff81088850 ffff880124d43dd8
Oct 14 20:58:02 moonflo kernel: 0000000000000000 ffff8800d45f2000 0000000000000001 ffff8800d5294880
Oct 14 20:58:02 moonflo kernel: Call Trace:
Oct 14 20:58:02 moonflo kernel: <IRQ> [<ffffffff814da8d8>] dump_stack+0x45/0x57
Oct 14 20:58:02 moonflo kernel: [<ffffffff81088850>] warn_slowpath_common+0x80/0xc0
Oct 14 20:58:02 moonflo kernel: [<ffffffff810888d1>] warn_slowpath_fmt+0x41/0x50
Oct 14 20:58:02 moonflo kernel: [<ffffffff812b31c5>] ? add_interrupt_randomness+0x35/0x1e0
Oct 14 20:58:02 moonflo kernel: [<ffffffff8145b819>] dev_watchdog+0x259/0x270
Oct 14 20:58:02 moonflo kernel: [<ffffffff8145b5c0>] ? dev_graft_qdisc+0x80/0x80
Oct 14 20:58:02 moonflo kernel: [<ffffffff8145b5c0>] ? dev_graft_qdisc+0x80/0x80
Oct 14 20:58:02 moonflo kernel: [<ffffffff810d4047>] call_timer_fn.isra.30+0x17/0x70
Oct 14 20:58:02 moonflo kernel: [<ffffffff810d42a6>] run_timer_softirq+0x176/0x2b0
Oct 14 20:58:02 moonflo kernel: [<ffffffff8108bd0a>] __do_softirq+0xda/0x1f0
Oct 14 20:58:02 moonflo kernel: [<ffffffff8108c04e>] irq_exit+0x7e/0xa0
Oct 14 20:58:02 moonflo kernel: [<ffffffff8130e075>] xen_evtchn_do_upcall+0x35/0x50
Oct 14 20:58:02 moonflo kernel: [<ffffffff814e1e8e>] xen_do_hypervisor_callback+0x1e/0x40
Oct 14 20:58:02 moonflo kernel: <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Oct 14 20:58:02 moonflo kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Oct 14 20:58:02 moonflo kernel: [<ffffffff810459e0>] ? xen_safe_halt+0x10/0x20
Oct 14 20:58:02 moonflo kernel: [<ffffffff81053979>] ? default_idle+0x9/0x10
Oct 14 20:58:02 moonflo kernel: [<ffffffff810542da>] ? arch_cpu_idle+0xa/0x10
Oct 14 20:58:02 moonflo kernel: [<ffffffff810bd170>] ? cpu_startup_entry+0x190/0x2f0
Oct 14 20:58:02 moonflo kernel: [<ffffffff81047cd5>] ? cpu_bringup_and_idle+0x25/0x40
Oct 14 20:58:02 moonflo kernel: ---[ end trace 98d961bae351244d ]---
Oct 14 20:58:02 moonflo kernel: r8169 0000:37:04.0 enp55s4: link up

After that, there are lots of messages about the link being up, one
message every 12 seconds. When you unplug the network cable, you get a
message that the link is down, and no message when you plug it in again.

I was hoping that switching the network card (to one that uses a
different driver) might solve the problem, but it did not. Now I can
only guess that the network card goes to sleep and sometimes cannot be
woken up again.

I tried to reduce the connection speed to 100Mbit and found that
accessing the VMs (via RDP) becomes too slow to use them. So I disabled
the power management of the network card (through sysfs) and will have
to see if the problem persists.

We'll be getting decent network cards in a couple of days, but since the
problem doesn't seem to be related to a particular
card/model/manufacturer, that might not fix it, either.

This problem seems to only occur on machines that operate as a xen
server. Other machines, identical Z800s, not running xen, run just fine.

What would you suggest?
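The message above mentions disabling the card's power management through sysfs without saying which knob was used. A minimal sketch of one plausible way follows, assuming the runtime power-management "control" file under the device's power directory was meant (the thread does not confirm this). The helper names are hypothetical, and the interface name enp55s4 is taken from the log.

```python
from pathlib import Path


def runtime_pm_control(iface, sysfs_root="/sys"):
    """Return the path of the runtime-PM control file for a network interface.

    Assumption: the interface is backed by a device that exposes
    <sysfs_root>/class/net/<iface>/device/power/control.
    """
    return Path(sysfs_root) / "class" / "net" / iface / "device" / "power" / "control"


def disable_runtime_pm(iface, sysfs_root="/sys"):
    """Keep the device powered by writing "on" to its control file.

    Returns the value that was in the file before ("auto" or "on").
    Writing to the real /sys tree needs root privileges.
    """
    control = runtime_pm_control(iface, sysfs_root)
    previous = control.read_text().strip()
    if previous != "on":
        # "on" disables runtime power management; "auto" lets the kernel
        # suspend the device when it is idle.
        control.write_text("on\n")
    return previous
```

Called as disable_runtime_pm("enp55s4") on the host, this would report the old setting and leave the device set to "on". The setting does not survive a reboot, so it would have to be reapplied at boot if it turns out to help.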
* Re: [gentoo-user] Networking trouble
  2015-10-15 13:30 [gentoo-user] Networking trouble hw
@ 2015-10-15 13:54 ` J. Roeleveld
  2015-10-15 15:46   ` hw
  0 siblings, 1 reply; 8+ messages in thread
From: J. Roeleveld @ 2015-10-15 13:54 UTC
To: gentoo-user

On Thursday, October 15, 2015 03:30:01 PM hw wrote:
> [...]
> This problem seems to only occur on machines that operate as a xen server.
> Other machines, identical Z800s, not running xen, run just fine.
>
> What would you suggest?

More info required:

- Which version of Xen
- Does this only occur with HVM guests?
- Which network-driver are you using inside the guest
- Can you connect to the "local" console of the guest?
- If yes, does it still have no connectivity?

I saw the same on my lab machine, which was related to:
- Not using correct drivers inside HVM guests
- Switch hardware not keeping the MAC/IP/Port lists long enough

--
Joost
* Re: [gentoo-user] Networking trouble
  2015-10-15 13:54 ` J. Roeleveld
@ 2015-10-15 15:46   ` hw
  2015-10-16  5:32     ` J. Roeleveld
  0 siblings, 1 reply; 8+ messages in thread
From: hw @ 2015-10-15 15:46 UTC
To: gentoo-user

J. Roeleveld wrote:
> On Thursday, October 15, 2015 03:30:01 PM hw wrote:
>> [...]
>> What would you suggest?
>
> More info required:
>
> - Which version of Xen

4.5.1

Installed versions: 4.5.1^t(02:44:35 PM 07/14/2015)(-custom-cflags -debug -efi -flask -xsm)

> - Does this only occur with HVM guests?

The host has been running only HVM guests every time it happened.
It was running a PV guest in between (which I had to shut down
because other VMs were migrated, requiring the RAM).

> - Which network-driver are you using inside the guest

r8169, compiled as a module

Same happened with the tg3 driver when the on-board cards were used.
The tg3 driver is completely disabled in the kernel config, i.e.
not even compiled as a module.

> - Can you connect to the "local" console of the guest?

Yes, the host seems to be running fine except for having no network
connectivity. There's a keyboard and monitor physically connected to
it with which you can log in and do stuff.

You get no answer when you ping the host while it is unreachable.

> - If yes, does it still have no connectivity?

It was restarted this morning when it was found to be unreachable.

> I saw the same on my lab machine, which was related to:
> - Not using correct drivers inside HVM guests

There are Windoze 7 guests running that have PV drivers installed.
One of those was formerly running on a VMware host and was
migrated on Tuesday. I uninstalled the VMware tools from it.

Since Monday, an HVM Linux system (a modified 32-bit Debian) has also
been migrated from the VMware host to this one. I don't know if it
has VMware tools installed (I guess it does because it could be shut
down via VMware) and how those might react now. It's working, and I
don't want to touch it.

However, the problem already occurred before this migration, when the
on-board cards were still used.

> - Switch hardware not keeping the MAC/IP/Port lists long enough

What might be the reason for the lists becoming too short? Too many
devices connected to the network?

The host has been connected to two different switches and showed the
problem on both. Previously, that was an 8-port 1Gb switch, now it's a
24-port 1Gb switch. However, the 8-port switch is also connected to the
24-port switch the host is now connected to. (The 24-port switch
connects it "directly" to the rest of the network.)
* Re: [gentoo-user] Networking trouble
  2015-10-15 15:46   ` hw
@ 2015-10-16  5:32     ` J. Roeleveld
  2015-10-29 10:29       ` hw
  0 siblings, 1 reply; 8+ messages in thread
From: J. Roeleveld @ 2015-10-16 5:32 UTC
To: gentoo-user

On Thursday, October 15, 2015 05:46:07 PM hw wrote:
> J. Roeleveld wrote:
> > On Thursday, October 15, 2015 03:30:01 PM hw wrote:
> >> [...]
> >> What would you suggest?
> >
> > More info required:
> >
> > - Which version of Xen
>
> 4.5.1
>
> Installed versions: 4.5.1^t(02:44:35 PM 07/14/2015)(-custom-cflags -debug
> -efi -flask -xsm)

Ok, recent one.

> > - Does this only occur with HVM guests?
>
> The host has been running only HVM guests every time it happened.
> It was running a PV guest in between (which I had to shut down
> because other VMs were migrated, requiring the RAM).

The PV didn't have any issues?

> > - Which network-driver are you using inside the guest
>
> r8169, compiled as a module
>
> Same happened with the tg3 driver when the on-board cards were used.
> The tg3 driver is completely disabled in the kernel config, i.e.
> not even compiled as a module.

You have network cards assigned to the guests?

> > - Can you connect to the "local" console of the guest?
>
> Yes, the host seems to be running fine except for having no network
> connectivity. There's a keyboard and monitor physically connected to
> it with which you can log in and do stuff.

The HOST loses network connectivity?

> You get no answer when you ping the host while it is unreachable.
>
> > - If yes, does it still have no connectivity?
>
> It has been restarted this morning when it was found to be unreachable.
>
> > I saw the same on my lab machine, which was related to:
> > - Not using correct drivers inside HVM guests
>
> There are Windoze 7 guests running that have PV drivers installed.
> One of those has formerly been running on a VMware host and was
> migrated on Tuesday. I deinstalled the VMware tools from it.

Which PV drivers?
And did you ensure all VMware-related drivers were removed?
I am not convinced uninstalling the VMware tools is sufficient.

> Since Monday, a HVM Linux system (a modified 32-bit Debian) has also
> been migrated from the VMware host to this one. I don't know if it
> has VMware tools installed (I guess it does because it could be shut
> down via VMware) and how those might react now. It's working, and I
> don't want to touch it.
>
> However, the problem already occurred before this migration, when the
> on-board cards were still used.
>
> > - Switch hardware not keeping the MAC/IP/Port lists long enough
>
> What might be the reason for the lists becoming too short? Too many
> devices connected to the network?

No network activity for a while. (clean installs, nothing running)
Switch forgetting the MAC-address assigned to the VM.
Connecting to the VM-console, I could ping www.google.com and then the
connectivity re-appeared.

> The host has been connected to two different switches and showed the
> problem. Previously, that was an 8-port 1Gb switch, now it's a 24-port
> 1Gb switch. However, the 8-port switch is also connected to the 24-port
> switch the host is now connected to. (The 24-port switch connects it
> "directly" to the rest of the network.)

Assuming it's a managed switch, you could test this.
Alternatively, check if you can access the VMs from the host.

--
Joost
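The suggestion to check whether the VMs can be reached from the host could be sketched roughly as follows. This is only an illustration, assuming a Linux host with the iputils ping (its -c and -W options); the function names are hypothetical and nothing here comes from the thread itself.

```python
import subprocess


def ping_command(address, timeout_s=2):
    """Build a ping command that sends one probe and waits timeout_s seconds."""
    return ["ping", "-c", "1", "-W", str(timeout_s), address]


def is_reachable(address, timeout_s=2, run=subprocess.run):
    """Return True if a single ICMP echo request to address is answered."""
    result = run(
        ping_command(address, timeout_s),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable_guests(addresses, check=is_reachable):
    """Return the addresses for which the reachability check fails."""
    return [address for address in addresses if not check(address)]
```

Run on the host with the guests' real addresses, for example unreachable_guests(["192.0.2.10", "192.0.2.11"]) (placeholder addresses), an empty result while the guests are unreachable from elsewhere on the network would point towards the switch or the physical link rather than the bridge or the guests.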
* Re: [gentoo-user] Networking trouble 2015-10-16 5:32 ` J. Roeleveld @ 2015-10-29 10:29 ` hw 2015-10-29 17:25 ` J. Roeleveld 0 siblings, 1 reply; 8+ messages in thread From: hw @ 2015-10-29 10:29 UTC (permalink / raw To: gentoo-user J. Roeleveld wrote: > On Thursday, October 15, 2015 05:46:07 PM hw wrote: >> J. Roeleveld wrote: >>> On Thursday, October 15, 2015 03:30:01 PM hw wrote: >>>> Hi, >>>> >>>> I have a xen host with some HV guests which becomes unreachable via >>>> the network after apparently random amount of times. I have already >>>> switched the network card to see if that would make a difference, >>>> and with the card currently installed, it worked fine for over 20 days >>>> until it become unreachable again. Before switching the network card, >>>> it would run a week or two before becoming unreachable. The previous >>>> card was the on-board BCM5764M which uses the tg3 driver. >>>> >>>> There are messages like this in the log file: >>>> >>>> >>>> Oct 14 20:58:02 moonflo kernel: ------------[ cut here ]------------ >>>> Oct 14 20:58:02 moonflo kernel: WARNING: CPU: 10 PID: 0 at >>>> net/sched/sch_generic.c:303 dev_watchdog+0x259/0x270() Oct 14 20:58:02 >>>> moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed >>>> out Oct 14 20:58:02 moonflo kernel: Modules linked in: arc4 ecb md4 hmac >>>> nls_utf8 cifs fscache xt_physdev br_netfilter iptable_filter ip_tables >>>> xen_pciback xen_gntalloc xen_gntdev bridge stp llc zfs(PO) nouveau >>>> snd_hda_codec_realtek snd_hda_codec_generic zunicode(PO) zavl(PO) >>>> zcommon(PO) znvpair(PO) spl(O) zlib_deflate video backlight >>>> drm_kms_helper >>>> ttm snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm snd_timer snd >>>> soundcore r8169 mii xts aesni_intel glue_helper lrw gf128mul ablk_helper >>>> cryptd aes_x86_64 sha256_generic hid_generic usbhid uhci_hcd usb_storage >>>> ehci_pci ehci_hcd usbcore usb_common Oct 14 20:58:02 moonflo kernel: CPU: >>>> 10 PID: 0 Comm: swapper/10 Tainted: 
P O 4.0.5-gentoo #3 Oct >>>> 14 >>>> 20:58:02 moonflo kernel: Hardware name: Hewlett-Packard HP Z800 >>>> Workstation/0AECh, BIOS 786G5 v03.57 07/15/2013 Oct 14 20:58:02 moonflo >>>> kernel: ffffffff8175a77d ffff880124d43d98 ffffffff814da8d8 >>>> 0000000000000001 Oct 14 20:58:02 moonflo kernel: ffff880124d43de8 >>>> ffff880124d43dd8 ffffffff81088850 ffff880124d43dd8 Oct 14 20:58:02 >>>> moonflo >>>> kernel: 0000000000000000 ffff8800d45f2000 0000000000000001 >>>> ffff8800d5294880 Oct 14 20:58:02 moonflo kernel: Call Trace: >>>> Oct 14 20:58:02 moonflo kernel: <IRQ> [<ffffffff814da8d8>] >>>> dump_stack+0x45/0x57 Oct 14 20:58:02 moonflo kernel: >>>> [<ffffffff81088850>] >>>> warn_slowpath_common+0x80/0xc0 Oct 14 20:58:02 moonflo kernel: >>>> [<ffffffff810888d1>] warn_slowpath_fmt+0x41/0x50 Oct 14 20:58:02 moonflo >>>> kernel: [<ffffffff812b31c5>] ? add_interrupt_randomness+0x35/0x1e0 Oct >>>> 14 >>>> 20:58:02 moonflo kernel: [<ffffffff8145b819>] dev_watchdog+0x259/0x270 >>>> Oct >>>> 14 20:58:02 moonflo kernel: [<ffffffff8145b5c0>] ? >>>> dev_graft_qdisc+0x80/0x80 Oct 14 20:58:02 moonflo kernel: >>>> [<ffffffff8145b5c0>] ? dev_graft_qdisc+0x80/0x80 Oct 14 20:58:02 moonflo >>>> kernel: [<ffffffff810d4047>] call_timer_fn.isra.30+0x17/0x70 Oct 14 >>>> 20:58:02 moonflo kernel: [<ffffffff810d42a6>] >>>> run_timer_softirq+0x176/0x2b0 Oct 14 20:58:02 moonflo kernel: >>>> [<ffffffff8108bd0a>] __do_softirq+0xda/0x1f0 Oct 14 20:58:02 moonflo >>>> kernel: [<ffffffff8108c04e>] irq_exit+0x7e/0xa0 Oct 14 20:58:02 moonflo >>>> kernel: [<ffffffff8130e075>] xen_evtchn_do_upcall+0x35/0x50 Oct 14 >>>> 20:58:02 moonflo kernel: [<ffffffff814e1e8e>] >>>> xen_do_hypervisor_callback+0x1e/0x40 Oct 14 20:58:02 moonflo kernel: >>>> <EOI> >>>> >>>> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 Oct 14 20:58:02 >>>> >>>> moonflo kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 >>>> Oct >>>> 14 20:58:02 moonflo kernel: [<ffffffff810459e0>] ? 
>>>> xen_safe_halt+0x10/0x20 >>>> Oct 14 20:58:02 moonflo kernel: [<ffffffff81053979>] ? >>>> default_idle+0x9/0x10 Oct 14 20:58:02 moonflo kernel: >>>> [<ffffffff810542da>] >>>> ? arch_cpu_idle+0xa/0x10 Oct 14 20:58:02 moonflo kernel: >>>> [<ffffffff810bd170>] ? cpu_startup_entry+0x190/0x2f0 Oct 14 20:58:02 >>>> moonflo kernel: [<ffffffff81047cd5>] ? cpu_bringup_and_idle+0x25/0x40 >>>> Oct >>>> 14 20:58:02 moonflo kernel: ---[ end trace 98d961bae351244d ]--- Oct 14 >>>> 20:58:02 moonflo kernel: r8169 0000:37:04.0 enp55s4: link up >>>> >>>> >>>> After that, there are lots of messages about the link being up, one >>>> message >>>> every 12 seconds. When you unplug the network cable, you get a message >>>> that the link is down, and no message when you plug it in again. >>>> >>>> I was hoping that switching the network card (to one that uses a >>>> different >>>> driver) might solve the problem, and it did not. Now I can only guess >>>> that >>>> the network card goes to sleep and sometimes cannot be woken up again. >>>> >>>> I tried to reduce the connection speed to 100Mbit and found that >>>> accessing >>>> the VMs (via RDP) becomes too slow to use them. So I disabled the power >>>> management of the network card (through sysfs) and will have to see if >>>> the >>>> problem persists. >>>> >>>> We'll be getting decent network cards in a couple days, but since the >>>> problem doesn't seem to be related to a particular >>>> card/model/manufacturer, >>>> that might not fix it, either. >>>> >>>> This problem seems to only occur on machines that operate as a xen >>>> server. >>>> Other machines, identical Z800s, not running xen, run just fine. >>>> >>>> What would you suggest? >>> >>> More info required: >>> >>> - Which version of Xen >> >> 4.5.1 >> >> Installed versions: 4.5.1^t(02:44:35 PM 07/14/2015)(-custom-cflags -debug >> -efi -flask -xsm) > > Ok, recent one. > >>> - Does this only occur with HVM guests? 
>>
>> The host has been running only HVM guests every time it happened.
>> It was running a PV guest in between (which I had to shut down
>> because other VMs were migrated, requiring the RAM).
>
> The PV didn't have any issues?

The whole server has the issue, not a particular VM. While the PV guest
was running, the server didn't freeze.

>>> - Which network-driver are you using inside the guest
>>
>> r8169, compiled as a module
>>
>> Same happened with the tg3 driver when the on-board cards were used.
>> The tg3 driver is completely disabled in the kernel config, i.e.
>> not even compiled as a module.
>
> You have network cards assigned to the guests?

No, they are all connected via a bridge.

I enabled STP on the bridge and the server was ok for a week, then had
to be restarted. I'm seeing lots of messages in the log:


Oct 28 11:14:05 moonflo kernel: brloc: topology change detected, propagating
Oct 28 11:14:05 moonflo kernel: brloc: port 1(enp55s4) received tcn bpdu
Oct 28 11:14:05 moonflo kernel: brloc: topology change detected, propagating
Oct 28 11:14:05 moonflo kernel: brloc: port 1(enp55s4) received tcn bpdu
Oct 28 11:14:05 moonflo kernel: brloc: topology change detected, propagating
Oct 28 11:14:05 moonflo kernel: brloc: port 1(enp55s4) received tcn bpdu
Oct 28 11:14:05 moonflo kernel: brloc: topology change detected, propagating
Oct 28 11:14:05 moonflo kernel: brloc: port 1(enp55s4) received tcn bpdu
Oct 28 11:14:05 moonflo kernel: brloc: topology change detected, propagating


and sometimes:

Oct 28 10:47:04 moonflo kernel: brloc: port 1(enp55s4) neighbor 8000.00:00:10:11:12:00 lost


Any idea what this means?

(Google has gone on strike, and another search engine didn't give any
useful findings ...)

>>> - Can you connect to the "local" console of the guest?
>>
>> Yes, the host seems to be running fine except for having no network
>> connectivity. There's a keyboard and monitor physically connected to
>> it with which you can log in and do stuff.
>
> The HOST loses network connectivity?

Yes.

Apparently when it became unresponsive yesterday, it was not possible
to log in at the console, either. I wasn't there yesterday, though I've
seen that happen before. We tried to shut it down via acpid by pressing
the power button. It didn't turn off, so it was switched off by holding
the power button. What I can see in the log is:


Oct 28 14:12:33 moonflo logger[20322]: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/768
Oct 28 14:12:33 moonflo logger[20323]: /etc/xen/scripts/vif-bridge: offline type_if=vif XENBUS_PATH=backend/vif/2/0
Oct 28 14:12:33 moonflo logger[20347]: /etc/xen/scripts/vif-bridge: brctl delif brloc vif2.0 failed
Oct 28 14:12:33 moonflo logger[20353]: /etc/xen/scripts/vif-bridge: ifconfig vif2.0 down failed
Oct 28 14:12:33 moonflo logger[20361]: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for vif2.0, bridge brloc.
Oct 28 14:12:33 moonflo logger[20372]: /etc/xen/scripts/vif-bridge: remove type_if=tap XENBUS_PATH=backend/vif/2/0
Oct 28 14:12:33 moonflo logger[20391]: /etc/xen/scripts/vif-bridge: Successful vif-bridge remove for vif2.0-emu, bridge brloc.
Oct 28 14:15:33 moonflo shutdown[20476]: shutting down for system halt
^@^@^@^@[...]^@^@^@^@Oct 28 14:17:34 moonflo syslog-ng[4611]: syslog-ng starting up; version='3.6.2'


And:


Oct 24 11:47:42 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out
Oct 24 11:47:42 moonflo kernel: Modules linked in: xt_physdev br_netfilter iptable_filter ip_tables xen_pciback xen_gntalloc xen_gntdev bridge stp llc zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) nouveau snd_hda_codec_realtek snd_hda_codec_generic video spl(O) backlight zlib_deflate drm_kms_helper snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm snd_timer r8169 snd ttm soundcore mii xts aesni_intel glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 sha256_generic hid_generic usbhid uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common
Oct 24 11:47:42 moonflo kernel: CPU: 12 PID: 0 Comm: swapper/12 Tainted: P O 4.0.5-gentoo #3
Oct 24 11:47:42 moonflo kernel: Hardware name: Hewlett-Packard HP Z800 Workstation/0AECh, BIOS 786G5 v03.57 07/15/2013
Oct 24 11:47:42 moonflo kernel: ffffffff8175a77d ffff880124d83d98 ffffffff814da8d8 0000000000000001
Oct 24 11:47:42 moonflo kernel: ffff880124d83de8 ffff880124d83dd8 ffffffff81088850 ffff880124d83e68
Oct 24 11:47:42 moonflo kernel: 0000000000000000 ffff88011efd8000 0000000000000001 ffff8800d4eb5e80
Oct 24 11:47:42 moonflo kernel: Call Trace:
Oct 24 11:47:42 moonflo kernel: <IRQ> [<ffffffff814da8d8>] dump_stack+0x45/0x57
Oct 24 11:47:42 moonflo kernel: [<ffffffff81088850>] warn_slowpath_common+0x80/0xc0
Oct 24 11:47:42 moonflo kernel: [<ffffffff810888d1>] warn_slowpath_fmt+0x41/0x50
Oct 24 11:47:42 moonflo kernel: [<ffffffff812b31c5>] ? add_interrupt_randomness+0x35/0x1e0
Oct 24 11:47:42 moonflo kernel: [<ffffffff8145b819>] dev_watchdog+0x259/0x270
Oct 24 11:47:42 moonflo kernel: [<ffffffff8145b5c0>] ? dev_graft_qdisc+0x80/0x80
Oct 24 11:47:42 moonflo kernel: [<ffffffff8145b5c0>] ? dev_graft_qdisc+0x80/0x80
Oct 24 11:47:42 moonflo kernel: [<ffffffff810d4047>] call_timer_fn.isra.30+0x17/0x70
Oct 24 11:47:42 moonflo kernel: [<ffffffff810d42a6>] run_timer_softirq+0x176/0x2b0
Oct 24 11:47:42 moonflo kernel: [<ffffffff8108bd0a>] __do_softirq+0xda/0x1f0
Oct 24 11:47:42 moonflo kernel: [<ffffffff8108c04e>] irq_exit+0x7e/0xa0
Oct 24 11:47:42 moonflo kernel: [<ffffffff8130e075>] xen_evtchn_do_upcall+0x35/0x50
Oct 24 11:47:42 moonflo kernel: [<ffffffff814e1e8e>] xen_do_hypervisor_callback+0x1e/0x40
Oct 24 11:47:42 moonflo kernel: <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Oct 24 11:47:42 moonflo kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Oct 24 11:47:42 moonflo kernel: [<ffffffff810459e0>] ? xen_safe_halt+0x10/0x20
Oct 24 11:47:42 moonflo kernel: [<ffffffff81053979>] ? default_idle+0x9/0x10
Oct 24 11:47:42 moonflo kernel: [<ffffffff810542da>] ? arch_cpu_idle+0xa/0x10
Oct 24 11:47:42 moonflo kernel: [<ffffffff810bd170>] ? cpu_startup_entry+0x190/0x2f0
Oct 24 11:47:42 moonflo kernel: [<ffffffff81047cd5>] ? cpu_bringup_and_idle+0x25/0x40
Oct 24 11:47:42 moonflo kernel: ---[ end trace 320b6f98f8fc070f ]---
Oct 24 11:47:42 moonflo kernel: r8169 0000:37:04.0 enp55s4: link up


That was two days before it went down. After that, messages about
topology changes started to appear.

I'm not sure if I should call this "progress" ;)

>
>> You get no answer when you ping the host while it is unreachable.
>>
>>> - If yes, does it still have no connectivity?
>>
>> It has been restarted this morning when it was found to be unreachable.
>>
>>> I saw the same on my lab machine, which was related to:
>>> - Not using correct drivers inside HVM guests
>>
>> There are Windoze 7 guests running that have PV drivers installed.
>> One of those has formerly been running on a VMware host and was
>> migrated on Tuesday. I deinstalled the VMware tools from it.
>
> Which PV drivers?

Xen GPL PV Driver Developers
17.09.2014
0.11.0.373
Univention GmbH

> And did you ensure all VMWare related drivers were removed?
> I am not convinced uninstalling the VMWare tools is sufficient.

What would I need to look at to make sure they are removed?

The problem was already there before the VM that had VMWare drivers
installed was migrated to this server. So I don't think they are
causing this problem.

>> Since Monday, a HVM Linux system (a modified 32-bit Debian) has also
>> been migrated from the VMware host to this one. I don't know if it
>> has VMware tools installed (I guess it does because it could be shut
>> down via VMware) and how those might react now. It's working, and I
>> don't want to touch it.
>>
>> However, the problem already occurred before this migration, when the
>> on-board cards were still used.
>>
>>> - Switch hardware not keeping the MAC/IP/Port lists long enough
>>
>> What might be the reason for the lists becoming too short? Too many
>> devices connected to the network?
>
> No network activity for a while. (clean installs, nothing running)
> Switch forgetting the MAC-address assigned to the VM.
>
> Connecting to the VM-console, I could ping www.google.com and then the
> connectivity re-appeared.

Half of the switches have been replaced last week in order to track down
what appears to be a weird network problem. The problem is that the RDP
clients are being randomly stalled. If it was only that, I'd suspect this
server some more, but the internet connection goes through the same
switches and is apparently also slowed down when the RDP clients are
stalled. They also got randomly stalled when the RDP clients were
accessing a totally different server (the VMWare server), so this might
be entirely unrelated.

Replacing the switches didn't fix the problem, so I'll probably put them
back into service and replace the other half.

>> The host has been connected to two different switches and showed the
>> problem. Previously, that was an 8-port 1Gb switch, now it's a 24-port
>> 1Gb switch. However, the 8-port switch is also connected to the 24-port
>> switch the host is now connected to. (The 24-port switch connects it
>> "directly" to the rest of the network.)
>
> Assuming it's a managed switch, you could test this.
> Alternatively, check if you can access the VMs from the host.

Good idea, I'll try that the next time it happens while I'm here.

The network cards have arrived, Intel PRO 1000 dual port, made for IBM.
I hope I get to swap the card today. Those *really* should work.

Hm, I could plug in two of them and give each VM and the host its own
physical card. Do you think that might help?
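The watchdog timeouts and STP messages quoted in the message above could be tallied per day to see whether they cluster before an outage. The following is only a rough sketch: the regular expressions are copied from the log lines shown in this thread, while the function name `count_events` and the event labels are made up for illustration, and other drivers or kernel versions may word their messages differently.

```python
import re
from collections import Counter

# Patterns taken from log lines quoted in this thread. The labels on the
# left are hypothetical names chosen for this sketch.
PATTERNS = {
    "tx_timeout": re.compile(
        r"NETDEV WATCHDOG: \S+ \(\S+\): transmit queue \d+ timed out"
    ),
    "stp_topology_change": re.compile(r"topology change detected"),
    "stp_neighbor_lost": re.compile(r"neighbor \S+ lost"),
}


def count_events(lines):
    """Count matching events per (day, label) in classic syslog lines.

    The day is taken from the first two whitespace-separated fields,
    e.g. "Oct 28", which assumes the traditional syslog timestamp format.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # too short to carry a timestamp and a message
        day = " ".join(fields[:2])
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(day, label)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Oct 24 11:47:42 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out",
        "Oct 28 11:14:05 moonflo kernel: brloc: topology change detected, propagating",
        "Oct 28 10:47:04 moonflo kernel: brloc: port 1(enp55s4) neighbor 8000.00:00:10:11:12:00 lost",
    ]
    for (day, label), n in sorted(count_events(sample).items()):
        print(day, label, n)
```

Feeding the function the lines of the real log file (its location depends on the syslog configuration) would give one count per day and event type, which may make it easier to see whether the topology changes only start after a transmit timeout.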
* Re: [gentoo-user] Networking trouble
  2015-10-29 10:29 ` hw
@ 2015-10-29 17:25 ` J. Roeleveld
  2015-10-30 9:34 ` hw
  0 siblings, 1 reply; 8+ messages in thread
From: J. Roeleveld @ 2015-10-29 17:25 UTC
To: gentoo-user

On 29 October 2015 11:29:18 CET, hw <hw@gc-24.de> wrote:
>[...]
>Hm, I could plug in two of them and give each VM and the host its own
>physical card. Do you think that might help?

Quick reply from mobile.
Will give a more detailed one later.

Noticed you are using ZFS. Where is your swap partition located?

On ZFS or?

--
Joost
* Re: [gentoo-user] Networking trouble
  2015-10-29 17:25 ` J. Roeleveld
@ 2015-10-30 9:34 ` hw
  2015-11-05 12:51 ` [gentoo-user] Re: update xen networking trouble hw
  0 siblings, 1 reply; 8+ messages in thread
From: hw @ 2015-10-30 9:34 UTC
To: gentoo-user

J. Roeleveld wrote:
> Quick reply from mobile.
> Will give a more detailed one later.
>
> Noticed you are using ZFS. Where is your swap partition located?
>
> On ZFS or?

Swap for dom0 is on a mdraid partition. Dom0 has 4GB RAM because it's
supposed to be used for making backups once I get to set that up, and it
is not swapping.
* [gentoo-user] Re: update xen networking trouble
  2015-10-30 9:34 ` hw
@ 2015-11-05 12:51 ` hw
  0 siblings, 0 replies; 8+ messages in thread
From: hw @ 2015-11-05 12:51 UTC
To: gentoo-user

hw wrote:
> J. Roeleveld wrote:
>> Quick reply from mobile.
>> Will give a more detailed one later.
>>
>> Noticed you are using ZFS. Where is your swap partition located?
>>
>> On ZFS or?
>
> Swap for dom0 is on a mdraid partition. Dom0 has 4GB RAM because it's
> supposed to be used for making backups once I get to set that up and is
> not swapping.
>
>

Update:

I updated the software, and the next morning the server was down again.
Yesterday, I pulled the disks and put them into another, identical
HP Z800, which so far has been running without problems. It hasn't gone
down yet, and there is one peculiarity:


Nov 5 06:30:01 moonflo CROND[28291]: (root) CMD ([ ! -x /etc/cron.hourly/0anacron ] && { test -x /usr/sbin/run-crons && /usr/sbin/run-crons ; })
Nov 5 06:45:23 moonflo syslog-ng[4142]: syslog-ng starting up; version='3.7.1'
Nov 5 06:45:24 moonflo acpid[4167]: starting up with netlink and the input layer
Nov 5 06:45:24 moonflo acpid[4167]: 1 rule loaded
Nov 5 06:45:24 moonflo acpid[4167]: waiting for events: event logging is off
Nov 5 06:45:24 moonflo crond[4188]: (CRON) STARTUP (1.5.0)
Nov 5 06:45:24 moonflo crond[4188]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 23% if used.)
Nov 5 06:45:24 moonflo crond[4188]: (CRON) INFO (running with inotify support)
Nov 5 06:45:32 moonflo kernel: bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this.
Nov 5 06:45:32 moonflo kernel: device enp3s0f0 entered promiscuous mode
Nov 5 06:45:32 moonflo kernel: device brloc entered promiscuous mode
Nov 5 06:45:32 moonflo kernel: xen_pciback: backend is vpci
Nov 5 06:45:26 moonflo xenstored[4706]: Checking store ...
Nov 5 06:45:26 moonflo xenstored[4706]: Checking store complete.
Nov 5 06:45:26 moonflo xenstored[4706]: Checking store ...
Nov 5 06:45:26 moonflo xenstored[4706]: Checking store complete.
Nov 5 06:45:32 moonflo kernel: e1000e: enp3s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Nov 5 06:45:32 moonflo kernel: brloc: port 1(enp3s0f0) entered forwarding state
Nov 5 06:45:32 moonflo kernel: brloc: port 1(enp3s0f0) entered forwarding state
Nov 5 06:45:27 moonflo ntpd[4732]: ntpd 4.2.8p3@1.3265-o Tue Jul 14 16:33:57 UTC 2015 (1): Starting
Nov 5 06:45:27 moonflo ntpd[4732]: Command line: /usr/sbin/ntpd -p /var/run/ntpd.pid -g
Nov 5 06:45:27 moonflo ntpd[4733]: proto: precision = 0.107 usec (-23)
Nov 5 06:45:27 moonflo ntpd[4733]: Listen and drop on 0 v4wildcard 0.0.0.0:123
Nov 5 06:45:27 moonflo ntpd[4733]: Listen normally on 1 lo 127.0.0.1:123
Nov 5 06:45:27 moonflo ntpd[4733]: Listen normally on 2 brloc 192.168.220.193:123
Nov 5 06:45:27 moonflo ntpd[4733]: Listening on routing socket on fd #19 for interface updates
Nov 5 06:45:27 moonflo sshd[4762]: Server listening on 0.0.0.0 port 22.
Nov 5 06:45:28 moonflo root[4830]: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/1/768
Nov 5 06:45:28 moonflo root[4882]: /etc/xen/scripts/vif-bridge: online type_if=vif XENBUS_PATH=backend/vif/1/0
Nov 5 06:45:34 moonflo kernel: device vif1.0 entered promiscuous mode
Nov 5 06:45:34 moonflo kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov 5 06:45:34 moonflo kernel: Bridge firewalling registered
Nov 5 06:45:28 moonflo root[4916]: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif1.0, bridge brloc.
Nov 5 06:45:28 moonflo root[4917]: /etc/xen/scripts/vif-bridge: Writing backend/vif/1/0/hotplug-status connected to xenstore.
Nov 5 06:45:28 moonflo root[4929]: /etc/xen/scripts/vif-bridge: add type_if=tap XENBUS_PATH=backend/vif/1/0
Nov 5 06:45:34 moonflo kernel: device vif1.0-emu entered promiscuous mode
[...]


Why do I get a message that the link is up? There is no message that
the link went down. I checked another server which is connected to
the same switch, and the link on the other server didn't go down and
didn't come up, i.e. it was persistent.

6:45 is a suspicious time, but I haven't found any cron job which might
do something with the network card.

What might have happened here? Could this be some event which made the
other HP Z800 go down?


moonflo ~ # brctl show
bridge name     bridge id               STP enabled     interfaces
brloc           8000.001517ebbdb4       no              enp3s0f0
                                                        vif1.0
                                                        vif1.0-emu
                                                        vif2.0
                                                        vif2.0-emu
                                                        vif3.0
                                                        vif3.0-emu
                                                        vif4.0
                                                        vif4.0-emu
                                                        vif5.0
                                                        vif5.0-emu
moonflo ~ #
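One way to approach the "link up without link down" question is to check what the log recorded immediately before each link-up line. The sketch below is an illustration, not a diagnosis: the "starting up" and "up" patterns are taken from lines quoted in this thread, the "down" patterns are assumed to mirror them, and the names `classify` and `explain_link_ups` are hypothetical.

```python
import re

# Event patterns. "boot" and the two "up" wordings appear in this thread;
# the "down" wordings are an assumption based on the same drivers.
EVENTS = [
    ("boot", re.compile(r"syslog-ng\[\d+\]: syslog-ng starting up")),
    ("link_down", re.compile(r"kernel: .*(: link down|NIC Link is Down)")),
    ("link_up", re.compile(r"kernel: .*(: link up|NIC Link is Up)")),
]


def classify(line):
    """Return the event name for a syslog line, or None if unrelated."""
    for name, pattern in EVENTS:
        if pattern.search(line):
            return name
    return None


def explain_link_ups(lines):
    """For every link-up line, report which tracked event preceded it.

    Returns a list of (timestamp, previous_event) tuples, where
    previous_event is "boot", "link_down", "link_up" or None.
    """
    previous = None
    report = []
    for line in lines:
        event = classify(line)
        if event is None:
            continue
        if event == "link_up":
            timestamp = " ".join(line.split()[:3])
            report.append((timestamp, previous))
        previous = event
    return report


if __name__ == "__main__":
    sample = [
        "Nov 5 06:45:23 moonflo syslog-ng[4142]: syslog-ng starting up; version='3.7.1'",
        "Nov 5 06:45:32 moonflo kernel: device enp3s0f0 entered promiscuous mode",
        "Nov 5 06:45:32 moonflo kernel: e1000e: enp3s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx",
    ]
    print(explain_link_ups(sample))  # prints [('Nov 5 06:45:32', 'boot')]
```

In the excerpt quoted above, the "Link is Up" line follows a "syslog-ng starting up" line, so a link-up with no preceding link-down might simply belong to a start-up sequence rather than to a link flap. That is an inference from the ordering of the log lines, not something the thread confirms.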