Re: [gentoo-user] Networking trouble


From: "J. Roeleveld" <joost@antarean.org>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Networking trouble
Date: Fri, 16 Oct 2015 07:32:02 +0200
Message-ID: <2569546.9sPlulUjpb@andromeda>
In-Reply-To: <561FCA3F.8090803@gc-24.de>

On Thursday, October 15, 2015 05:46:07 PM hw wrote:
> J. Roeleveld wrote:
> > On Thursday, October 15, 2015 03:30:01 PM hw wrote:
> >> Hi,
> >> 
> >> I have a xen host with some HVM guests which becomes unreachable via
> >> the network after apparently random amounts of time.  I have already
> >> switched the network card to see if that would make a difference,
> >> and with the card currently installed, it worked fine for over 20 days
> >> until it became unreachable again.  Before switching the network card,
> >> it would run a week or two before becoming unreachable.  The previous
> >> card was the on-board BCM5764M which uses the tg3 driver.
> >> 
> >> There are messages like this in the log file:
> >> 
> >> 
> >> Oct 14 20:58:02 moonflo kernel: ------------[ cut here ]------------
> >> Oct 14 20:58:02 moonflo kernel: WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x259/0x270()
> >> Oct 14 20:58:02 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out
> >> Oct 14 20:58:02 moonflo kernel: Modules linked in: arc4 ecb md4 hmac
> >>   nls_utf8 cifs fscache xt_physdev br_netfilter iptable_filter ip_tables
> >>   xen_pciback xen_gntalloc xen_gntdev bridge stp llc zfs(PO) nouveau
> >>   snd_hda_codec_realtek snd_hda_codec_generic zunicode(PO) zavl(PO)
> >>   zcommon(PO) znvpair(PO) spl(O) zlib_deflate video backlight
> >>   drm_kms_helper ttm snd_hda_intel snd_hda_controller snd_hda_codec
> >>   snd_pcm snd_timer snd soundcore r8169 mii xts aesni_intel glue_helper
> >>   lrw gf128mul ablk_helper cryptd aes_x86_64 sha256_generic hid_generic
> >>   usbhid uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common
> >> Oct 14 20:58:02 moonflo kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: P           O    4.0.5-gentoo #3
> >> Oct 14 20:58:02 moonflo kernel: Hardware name: Hewlett-Packard HP Z800 Workstation/0AECh, BIOS 786G5 v03.57 07/15/2013
> >> Oct 14 20:58:02 moonflo kernel:  ffffffff8175a77d ffff880124d43d98 ffffffff814da8d8 0000000000000001
> >> Oct 14 20:58:02 moonflo kernel:  ffff880124d43de8 ffff880124d43dd8 ffffffff81088850 ffff880124d43dd8
> >> Oct 14 20:58:02 moonflo kernel:  0000000000000000 ffff8800d45f2000 0000000000000001 ffff8800d5294880
> >> Oct 14 20:58:02 moonflo kernel: Call Trace:
> >> Oct 14 20:58:02 moonflo kernel:  <IRQ>  [<ffffffff814da8d8>] dump_stack+0x45/0x57
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff81088850>] warn_slowpath_common+0x80/0xc0
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff810888d1>] warn_slowpath_fmt+0x41/0x50
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff812b31c5>] ? add_interrupt_randomness+0x35/0x1e0
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff8145b819>] dev_watchdog+0x259/0x270
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff8145b5c0>] ? dev_graft_qdisc+0x80/0x80
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff8145b5c0>] ? dev_graft_qdisc+0x80/0x80
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff810d4047>] call_timer_fn.isra.30+0x17/0x70
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff810d42a6>] run_timer_softirq+0x176/0x2b0
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff8108bd0a>] __do_softirq+0xda/0x1f0
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff8108c04e>] irq_exit+0x7e/0xa0
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff8130e075>] xen_evtchn_do_upcall+0x35/0x50
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff814e1e8e>] xen_do_hypervisor_callback+0x1e/0x40
> >> Oct 14 20:58:02 moonflo kernel:  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff810459e0>] ? xen_safe_halt+0x10/0x20
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff81053979>] ? default_idle+0x9/0x10
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff810542da>] ? arch_cpu_idle+0xa/0x10
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff810bd170>] ? cpu_startup_entry+0x190/0x2f0
> >> Oct 14 20:58:02 moonflo kernel:  [<ffffffff81047cd5>] ? cpu_bringup_and_idle+0x25/0x40
> >> Oct 14 20:58:02 moonflo kernel: ---[ end trace 98d961bae351244d ]---
> >> Oct 14 20:58:02 moonflo kernel: r8169 0000:37:04.0 enp55s4: link up
> >> 
> >> 
> >> After that, there are lots of messages about the link being up, one
> >> message every 12 seconds.  When you unplug the network cable, you get
> >> a message that the link is down, and no message when you plug it in
> >> again.
> >> 
> >> I was hoping that switching the network card (to one that uses a
> >> different driver) might solve the problem, and it did not.  Now I can
> >> only guess that the network card goes to sleep and sometimes cannot be
> >> woken up again.
> >> 
> >> I tried to reduce the connection speed to 100Mbit and found that
> >> accessing the VMs (via RDP) becomes too slow to use them.  So I
> >> disabled the power management of the network card (through sysfs) and
> >> will have to see if the problem persists.
> >> 
> >> We'll be getting decent network cards in a couple days, but since the
> >> problem doesn't seem to be related to a particular
> >> card/model/manufacturer, that might not fix it, either.
> >> 
> >> This problem seems to only occur on machines that operate as a xen
> >> server.  Other machines, identical Z800s, not running xen, run just
> >> fine.
> >> 
> >> What would you suggest?
> > 
> > More info required:
> > 
> > - Which version of Xen
> 
> 4.5.1
> 
> Installed versions:  4.5.1^t(02:44:35 PM 07/14/2015)(-custom-cflags -debug
> -efi -flask -xsm)

Ok, recent one.

> > - Does this only occur with HVM guests?
> 
> The host has been running only HVM guests every time it happened.
> It was running a PV guest in between (which I had to shut down
> because other VMs were migrated, requiring the RAM).

The PV didn't have any issues?

> > - Which network-driver are you using inside the guest
> 
> r8169, compiled as a module
>
> Same happened with the tg3 driver when the on-board cards were used.
> The tg3 driver is completely disabled in the kernel config, i. e.
> not even compiled as a module.

You have network cards assigned to the guests?

> > - Can you connect to the "local" console of the guest?
> 
> Yes, the host seems to be running fine except for having no network
> connectivity.  There's a keyboard and monitor physically connected to
> it with which you can log in and do stuff.

The HOST loses network connectivity?

> You get no answer when you ping the host while it is unreachable.
> 
> > - If yes, does it still have no connectivity?
> 
> It has been restarted this morning when it was found to be unreachable.
>
> > I saw the same on my lab machine, which was related to:
> > - Not using correct drivers inside HVM guests
> 
> There are Windoze 7 guests running that have PV drivers installed.
> One of those has formerly been running on a VMware host and was
> migrated on Tuesday.  I deinstalled the VMware tools from it.

Which PV drivers?
And did you ensure all VMware-related drivers were removed?
I am not convinced uninstalling the VMware tools is sufficient.

> Since Monday, a HVM Linux system (a modified 32-bit Debian) has also
> been migrated from the VMware host to this one.  I don't know if it
> has VMware tools installed (I guess it does because it could be shut
> down via VMware) and how those might react now.  It's working, and I
> don't want to touch it.
> 
> However, the problem already occurred before this migration, when the
> on-board cards were still used.
>
> > - Switch hardware not keeping the MAC/IP/Port lists long enough
> 
> What might be the reason for the lists becoming too short?  Too many
> devices connected to the network?

In my case: no network activity for a while (clean installs, nothing
running), so the switch forgot the MAC address assigned to the VM.

When I connected to the VM console and pinged www.google.com from there,
connectivity reappeared.

> The host has been connected to two different switches and showed the
> problem.  Previously, that was an 8-port 1Gb switch, now it's a 24-port
> 1Gb switch.  However, the 8-port switch is also connected to the 24-port
> switch the host is now connected to.  (The 24-port switch connects it
> "directly" to the rest of the network.)

Assuming it's a managed switch, you could test this.
Alternatively, check if you can access the VMs from the host.
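
A rough sketch of that last check, to be run on the host. It assumes the
iputils "ping" (where -c is the packet count and -W the reply timeout in
seconds); the function names and the guest addresses in the usage note are
made up for illustration, so substitute your own.

```python
import subprocess


def ping_cmd(host, count=1, timeout_s=2):
    """Build an iputils-style ping command line (assumed: -c count, -W timeout)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def is_reachable(host, run=subprocess.run):
    """Return True if a single ping to `host` exits with status 0."""
    result = run(
        ping_cmd(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_guests(hosts, run=subprocess.run):
    """Map each guest address to True (answered) or False (no answer)."""
    return {host: is_reachable(host, run) for host in hosts}
```

For example, check_guests(["192.0.2.10", "192.0.2.11"]) (placeholder
addresses) returns a dict telling you which guests answered.  If the guests
answer from the host while the rest of the network cannot reach them, that
would point towards the switch rather than the guests.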

--
Joost
