From: thegeezer <thegeezer@thegeezer.net>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Strange network behaviour: NIC goes down, DHCP lease renewal fails
Date: Sat, 07 Mar 2015 10:04:58 +0000
Message-ID: <54FACD4A.2080302@thegeezer.net>
In-Reply-To: <20150305104625.2d88242a@marcec.fritz.box>
On 05/03/15 09:46, Marc Joliet wrote:
> Hi all,
>
> at work I'm (well, *we* are) facing an interesting problem. Since we are sort
> of stabbing in the dark here, I thought I'd ask here. Also, since this is from
> work, I will not be able to divulge very many details (not to mention that as a
> student worker I simply don't *know* many details). However, I do have
> permission from my boss to ask about this in an anonymised fashion.
>
> The symptom we're seeing is that the NIC goes down and DHCP packets stop getting
> through after a certain amount of time. What happens is:
>
> 1.) The NIC is brought up (some built-in Intel model).
>
> 2.) A DHCP client configures it.
>
> 3.) The network connection is lost at some point (the amount of time this takes
> varies, but it can be as little as 20 minutes).
>
> 4.) Eventually the lease runs out and the DHCP client tries to renew it, but
> gets no response. Sometimes, after many hours (at least 6), it will get a
> DHCPACK, but that's it. One of our sysadmins says that not only does
> the DHCP server never see the packets, but the managed switch that the PC
> is directly attached to *also* never does (again, except for when the
> occasional DHCPACK comes).
>
> 5.) Restart the network device. A reboot is not required, but it is necessary
> to terminate the DHCP client. After that everything works again.
>
> 6.) GOTO 3.
>
> (Note that I have observed that steps 3 and 4 do not necessarily occur in
> order.)
>
> This has been rather baffling, since this problem is limited to 3 computers.
>
> One of them (the longest running) runs Gentoo, courtesy of me. This is the
> first one we saw the problem with. Since we couldn't figure it out (switching
> from dhcpcd to dhclient, turning off the firewall, monitoring with tcpdump,
> etc., all with help from one of our sysadmins; Google, too, of course), Gentoo
> was "blamed", so we got a replacement PC with Fedora 20 on it, which *also*
> showed this behaviour. Both PCs run some special software (some of it mine).
> Thus, at some point this software was "blamed".
>
> So we started experimenting: we configured the Fedora PC to *not* start the
> special software, and have not seen any problems all week. Yesterday afternoon
> I then started *one* of the programs, and had not seen any problems yet by the
> time I went home.
>
> So that would speak *for* that theory, right? Well, for comparison, my boss
> recently started running a separate PC, also with a bog-standard Fedora 20.
> Guess what: it *also* shows the *exact* same behaviour as the other two PCs
> ("journalctl -u NetworkManager" shows pages upon pages of unanswered
> DHCPREQUESTs, with the occasional response thrown in). Note here that this PC
> is on a different switch and in a different VLAN.
>
> The choice of Fedora comes from the fact that we use a Fedora based distro
> internally, so it is "known". PCs running it have *not* shown the behaviour
> above (AFAIK not even *once*). Thus, one of the few things I can think of is
> finding out what is different about them relative to the standard Fedora.
>
> Right now my main ideas on what the culprit could be are:
>
> - The computers' kernel/network device is improperly configured. That is,
> maybe special configuration is needed for the computers to work properly as
> clients in the network. I'm thinking of support for some (from my
> perspective) obscure protocol(s).
>
> - It's a network problem. The three computers are in two different VLANs,
> while the workplace computers running the internal Fedora based distro are in
> a third (the main network that all the normal Windows and Linux workstations
> are connected to). However, they are on the same switch as the two computers
> running my software. One argument against this is that the Windows PC that
> runs on the same VLAN does *not* have any problems like this.
>
> One of the other ideas I had was faulty power management, and I did read of
> problems of the sort regarding the exact same network card that is in the old
> Gentoo machine on an HP support forum (from around 2008). However, the local
> sysadmin said that they have had nothing but good experience with those network
> cards. Also: *three* computers with NIC power management problems? That sounds
> a bit far-fetched to me. Nevertheless, I am not fully discounting the
> possibility.
>
> You can imagine how confusing and frustrating this is.
>
> So, has anybody here ever experienced something like this? Any ideas on what
> could be the cause?
>
> Greetings
Howdy
i've seen this before but not with the nic down event
the problem was old managed alcatel switches combined with questionable
wiring
in my case it was reversed, the gentoo box was providing the dhcp but
then suddenly nothing got dhcp responses
power cycling the switch was a temporary fix
updating the switch firmware helped a lot - went from a daily occurrence
to a weekly occurrence
i'd have a word with the network team and have them verify through port
mirroring
1. the dhcp server is sending packets out and they are being received on
the switchport it is connected to
2. the packet is also being sent out on the correct port
what they will probably discover is an issue with the mac tables /
switching and have to bounce the ports / the switch
forcing the up/down on the dhcp server also seemed to help on occasion
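
to make the port mirroring check concrete, here is a rough python sketch
of the comparison the network team would be doing. it assumes the dhcp
messages seen at each mirrored port have already been pulled out of the
captures as (transaction id, message type) pairs; the DhcpEvent record
and both function names are made up for illustration, they are not part
of any capture tool.

```python
from collections import namedtuple

# One DHCP message as seen at a mirrored port.
# Hypothetical record layout: xid is the DHCP transaction ID,
# msg_type is a string such as "REQUEST" or "ACK".
DhcpEvent = namedtuple("DhcpEvent", ["xid", "msg_type"])


def find_lost_messages(sending_side, receiving_side):
    """Return events seen on the sending side but missing on the receiving side."""
    seen = {(e.xid, e.msg_type) for e in receiving_side}
    return [e for e in sending_side if (e.xid, e.msg_type) not in seen]


def diagnose(client_port, server_port):
    """Report which DHCP messages vanished between the two mirrored ports."""
    # Client-originated messages should also show up on the server's port.
    requests_lost = find_lost_messages(
        [e for e in client_port if e.msg_type in ("DISCOVER", "REQUEST")],
        server_port,
    )
    # Server-originated messages should also show up on the client's port.
    replies_lost = find_lost_messages(
        [e for e in server_port if e.msg_type in ("OFFER", "ACK")],
        client_port,
    )
    return {"requests_lost": requests_lost, "replies_lost": replies_lost}
```

if requests_lost is non-empty the switch is dropping traffic on the way
to the dhcp server; if only replies_lost is non-empty the server is
answering but the replies are not being forwarded out of the client's
port, which is what i would expect from a mac table problem.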
good luck - if you find the resolution is something else please do let
me know as i'd love to find out what the issue might have been if not
the switch!