From: "J. Roeleveld"
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Networking trouble
Date: Thu, 15 Oct 2015 15:54:34 +0200
Message-ID: <1637330.AMsFmt32R0@andromeda>
In-Reply-To: <561FAA59.5080707@gc-24.de>

On Thursday, October 15, 2015 03:30:01 PM hw wrote:
> Hi,
>
> I have a xen host with some HV guests which becomes unreachable via
> the network after apparently random amounts of time. I have already
> switched the network card to see if that would make a difference,
> and with the card currently installed, it worked fine for over 20 days
> until it became unreachable again. Before switching the network card,
> it would run a week or two before becoming unreachable. The previous
> card was the on-board BCM5764M which uses the tg3 driver.
>
> There are messages like this in the log file:
>
>
> Oct 14 20:58:02 moonflo kernel: ------------[ cut here ]------------
> Oct 14 20:58:02 moonflo kernel: WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x259/0x270()
> Oct 14 20:58:02 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out
> Oct 14 20:58:02 moonflo kernel: Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs fscache xt_physdev br_netfilter iptable_filter ip_tables xen_pciback xen_gntalloc xen_gntdev bridge stp llc zfs(PO) nouveau snd_hda_codec_realtek snd_hda_codec_generic zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate video backlight drm_kms_helper ttm snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm snd_timer snd soundcore r8169 mii xts aesni_intel glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 sha256_generic hid_generic usbhid uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common
> Oct 14 20:58:02 moonflo kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: P O 4.0.5-gentoo #3
> Oct 14 20:58:02 moonflo kernel: Hardware name: Hewlett-Packard HP Z800 Workstation/0AECh, BIOS 786G5 v03.57 07/15/2013
> Oct 14 20:58:02 moonflo kernel: ffffffff8175a77d ffff880124d43d98 ffffffff814da8d8 0000000000000001
> Oct 14 20:58:02 moonflo kernel: ffff880124d43de8 ffff880124d43dd8 ffffffff81088850 ffff880124d43dd8
> Oct 14 20:58:02 moonflo kernel: 0000000000000000 ffff8800d45f2000 0000000000000001 ffff8800d5294880
> Oct 14 20:58:02 moonflo kernel: Call Trace:
> Oct 14 20:58:02 moonflo kernel: [] dump_stack+0x45/0x57
> Oct 14 20:58:02 moonflo kernel: [] warn_slowpath_common+0x80/0xc0
> Oct 14 20:58:02 moonflo kernel: [] warn_slowpath_fmt+0x41/0x50
> Oct 14 20:58:02 moonflo kernel: [] ? add_interrupt_randomness+0x35/0x1e0
> Oct 14 20:58:02 moonflo kernel: [] dev_watchdog+0x259/0x270
> Oct 14 20:58:02 moonflo kernel: [] ? dev_graft_qdisc+0x80/0x80
> Oct 14 20:58:02 moonflo kernel: [] ? dev_graft_qdisc+0x80/0x80
> Oct 14 20:58:02 moonflo kernel: [] call_timer_fn.isra.30+0x17/0x70
> Oct 14 20:58:02 moonflo kernel: [] run_timer_softirq+0x176/0x2b0
> Oct 14 20:58:02 moonflo kernel: [] __do_softirq+0xda/0x1f0
> Oct 14 20:58:02 moonflo kernel: [] irq_exit+0x7e/0xa0
> Oct 14 20:58:02 moonflo kernel: [] xen_evtchn_do_upcall+0x35/0x50
> Oct 14 20:58:02 moonflo kernel: [] xen_do_hypervisor_callback+0x1e/0x40
> Oct 14 20:58:02 moonflo kernel: [] ? xen_hypercall_sched_op+0xa/0x20
> Oct 14 20:58:02 moonflo kernel: [] ? xen_hypercall_sched_op+0xa/0x20
> Oct 14 20:58:02 moonflo kernel: [] ? xen_safe_halt+0x10/0x20
> Oct 14 20:58:02 moonflo kernel: [] ? default_idle+0x9/0x10
> Oct 14 20:58:02 moonflo kernel: [] ? arch_cpu_idle+0xa/0x10
> Oct 14 20:58:02 moonflo kernel: [] ? cpu_startup_entry+0x190/0x2f0
> Oct 14 20:58:02 moonflo kernel: [] ? cpu_bringup_and_idle+0x25/0x40
> Oct 14 20:58:02 moonflo kernel: ---[ end trace 98d961bae351244d ]---
> Oct 14 20:58:02 moonflo kernel: r8169 0000:37:04.0 enp55s4: link up
>
>
> After that, there are lots of messages about the link being up, one message
> every 12 seconds. When you unplug the network cable, you get a message that
> the link is down, and no message when you plug it in again.
>
> I was hoping that switching the network card (to one that uses a different
> driver) might solve the problem, and it did not. Now I can only guess that
> the network card goes to sleep and sometimes cannot be woken up again.
>
> I tried to reduce the connection speed to 100Mbit and found that accessing
> the VMs (via RDP) becomes too slow to use them. So I disabled the power
> management of the network card (through sysfs) and will have to see if the
> problem persists.
>
> We'll be getting decent network cards in a couple days, but since the
> problem doesn't seem to be related to a particular card/model/manufacturer,
> that might not fix it, either.
>
> This problem seems to only occur on machines that operate as a xen server.
> Other machines, identical Z800s, not running xen, run just fine.
>
> What would you suggest?

More info required:

- Which version of Xen are you running?
- Does this only occur with HVM guests?
- Which network driver are you using inside the guest?
- Can you connect to the "local" console of the guest?
- If yes, does the guest still have no connectivity?

I saw the same on my lab machine, which was related to:

- Not using the correct drivers inside HVM guests
- Switch hardware not keeping the MAC/IP/port lists long enough

--
Joost
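
One way to check whether each outage lines up with a watchdog warning like the one quoted above is to pull those lines out of the log and list their timestamps. The following is only a rough Python sketch. It assumes syslog-style lines in the same format as the quoted log. The names `find_watchdog_timeouts` and `count_by_interface` are made up for illustration and do not come from this thread.

```python
import re
from collections import Counter

# Pattern for the "NETDEV WATCHDOG" line as it appears in the quoted log, e.g.
# "Oct 14 20:58:02 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out"
# Assumption: classic syslog timestamps; other log formats need a different prefix.
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"NETDEV WATCHDOG:\s+(?P<iface>\S+)\s+\((?P<driver>[^)]+)\):\s+"
    r"transmit queue\s+(?P<queue>\d+)\s+timed out"
)


def find_watchdog_timeouts(lines):
    """Return one dict per watchdog timeout found in an iterable of log lines."""
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line.strip())
        if match:
            event = match.groupdict()
            event["queue"] = int(event["queue"])
            events.append(event)
    return events


def count_by_interface(events):
    """Count timeouts per (interface, driver) pair."""
    return Counter((event["iface"], event["driver"]) for event in events)


if __name__ == "__main__":
    # Two lines copied from the quoted log; only the first is a watchdog timeout.
    sample = [
        "Oct 14 20:58:02 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out",
        "Oct 14 20:58:02 moonflo kernel: r8169 0000:37:04.0 enp55s4: link up",
    ]
    for event in find_watchdog_timeouts(sample):
        print(event["stamp"], event["iface"], event["driver"], event["queue"])
```

To use it on a real log, pass an open file object instead of the sample list. If the timestamps of the timeouts match the moments the host dropped off the network, that points at the host NIC/driver path rather than at the guests or the switch.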