* [gentoo-user] Serious stability problems, including freezes
@ 2009-06-02 20:40 Alexander Puchmayr
2009-06-03 6:17 ` Florian Philipp
0 siblings, 1 reply; 9+ messages in thread
From: Alexander Puchmayr @ 2009-06-02 20:40 UTC (permalink / raw
To: gentoo-user
Hi there!
My freshly setup homeserver has serious stability problem, which make the
the machine absolutely unuseable as home server.
The system is an amd64 (5050e@2600MHz, no overclocking) with 4GB Ram and
three 1TB SATA disks bundled to a raid5 (kernel md driver).
The kernels I've tried are gentoo-2.6.28-r, gentoo-2.6.29-r5 and
vanilla-2.6.29.4.
The problem is, when I copy large amount of data via nfs to the server, then
the logs get filled with entries like this:
--- Start extraction of /var/log/everything/current
[13010.203643] kswapd0: page allocation failure. order:1, mode:0x4020
Jun 2 21:40:43 [kernel] [13010.203653] Pid: 346, comm: kswapd0 Tainted: G
W 2.6.29.4 #2
Jun 2 21:40:43 [kernel] [13010.203657] Call Trace:
Jun 2 21:40:43 [kernel] [13010.203662] <IRQ> [<ffffffff810c887e>]
__alloc_pages_internal+0x3ce/0x4f0
Jun 2 21:40:43 [kernel] [13010.203690] [<ffffffff810f2d62>] ?
unfreeze_slab+0x92/0xf0
Jun 2 21:40:43 [kernel] [13010.203697] [<ffffffff810f3bc6>]
__slab_alloc+0x226/0x570
Jun 2 21:40:43 [kernel] [13010.203706] [<ffffffff814018df>] ?
__netdev_alloc_skb+0x1f/0x40
Jun 2 21:40:43 [kernel] [13010.203714] [<ffffffff810f4f4f>]
__kmalloc_track_caller+0x10f/0x150
Jun 2 21:40:43 [kernel] [13010.203720] [<ffffffff814018df>] ?
__netdev_alloc_skb+0x1f/0x40
Jun 2 21:40:43 [kernel] [13010.203726] [<ffffffff8140158e>]
__alloc_skb+0x6e/0x140
Jun 2 21:40:43 [kernel] [13010.203732] [<ffffffff814018df>]
__netdev_alloc_skb+0x1f/0x40
Jun 2 21:40:43 [kernel] [13010.203743] [<ffffffff812f6c32>]
rtl8169_rx_fill+0xc2/0x1f0
Jun 2 21:40:43 [kernel] [13010.203750] [<ffffffff812f6f80>]
rtl8169_rx_interrupt+0x220/0x4e0
Jun 2 21:40:43 [kernel] [13010.203757] [<ffffffff812f73c0>]
rtl8169_poll+0x40/0x220
Jun 2 21:40:43 [kernel] [13010.203765] [<ffffffff81407c99>]
net_rx_action+0xa9/0x150
Jun 2 21:40:43 [kernel] [13010.203775] [<ffffffff8106d95c>]
__do_softirq+0x9c/0x160
Jun 2 21:40:43 [kernel] [13010.203783] [<ffffffff8102b87c>]
call_softirq+0x1c/0x30
Jun 2 21:40:43 [kernel] [13010.203789] [<ffffffff8102cbb5>]
do_softirq+0x65/0xb0
Jun 2 21:40:43 [kernel] [13010.203796] [<ffffffff8106d595>]
irq_exit+0x85/0xb0
Jun 2 21:40:43 [kernel] [13010.203801] [<ffffffff8102ce6a>]
do_IRQ+0xba/0x1b0
Jun 2 21:40:43 [kernel] [13010.203808] [<ffffffff8102b0d3>]
ret_from_intr+0x0/0xf
Jun 2 21:40:43 [kernel] [13010.203811] <EOI> [<ffffffff815144db>] ?
_spin_unlock_irq+0x2b/0x40
Jun 2 21:40:43 [kernel] [13010.203825] [<ffffffff815144e0>] ?
_spin_unlock_irq+0x30/0x40
Jun 2 21:40:43 [kernel] [13010.203834] [<ffffffff810ceba0>] ?
__remove_mapping+0xd0/0x100
Jun 2 21:40:43 [kernel] [13010.203841] [<ffffffff810cf560>] ?
shrink_page_list+0x380/0x7f0
Jun 2 21:40:43 [kernel] [13010.203848] [<ffffffff810cfc30>] ?
shrink_list+0x260/0x650
Jun 2 21:40:43 [kernel] [13010.203856] [<ffffffff810d0293>] ?
shrink_zone+0x273/0x380
Jun 2 21:40:43 [kernel] [13010.203863] [<ffffffff810d04e7>] ?
shrink_slab+0x147/0x180
Jun 2 21:40:43 [kernel] [13010.203870] [<ffffffff810d0c71>] ?
kswapd+0x751/0x7c0
Jun 2 21:40:43 [kernel] [13010.203877] [<ffffffff810ce080>] ?
isolate_pages_global+0x0/0x280
Jun 2 21:40:43 [kernel] [13010.203885] [<ffffffff8107f970>] ?
autoremove_wake_function+0x0/0x40
Jun 2 21:40:43 [kernel] [13010.203896] [<ffffffff810919bd>] ?
trace_hardirqs_on+0xd/0x10
Jun 2 21:40:43 [kernel] [13010.203903] [<ffffffff810d0520>] ?
kswapd+0x0/0x7c0
Jun 2 21:40:43 [kernel] [13010.203909] [<ffffffff8107f559>] ?
kthread+0x49/0x80
Jun 2 21:40:43 [kernel] [13010.203915] [<ffffffff8102b77a>] ?
child_rip+0xa/0x20
Jun 2 21:40:43 [kernel] [13010.203920] [<ffffffff8102b17c>] ?
restore_args+0x0/0x30
Jun 2 21:40:43 [kernel] [13010.203926] [<ffffffff8107f510>] ?
kthread+0x0/0x80
Jun 2 21:40:43 [kernel] [13010.203932] [<ffffffff8102b770>] ?
child_rip+0x0/0x20
Jun 2 21:40:43 [kernel] [13010.203936] Mem-Info:
Jun 2 21:40:43 [kernel] [13010.203939] DMA per-cpu:
Jun 2 21:40:43 [kernel] [13010.203943] CPU 0: hi: 0, btch: 1 usd:
0
Jun 2 21:40:43 [kernel] [13010.203947] CPU 1: hi: 0, btch: 1 usd:
0
Jun 2 21:40:43 [kernel] [13010.203950] DMA32 per-cpu:
Jun 2 21:40:43 [kernel] [13010.203954] CPU 0: hi: 186, btch: 31 usd:
180
Jun 2 21:40:43 [kernel] [13010.203958] CPU 1: hi: 186, btch: 31 usd:
182
Jun 2 21:40:43 [kernel] [13010.203961] Normal per-cpu:
Jun 2 21:40:43 [kernel] [13010.203964] CPU 0: hi: 186, btch: 31 usd:
155
Jun 2 21:40:43 [kernel] [13010.203968] CPU 1: hi: 186, btch: 31 usd:
167
Jun 2 21:40:43 [kernel] [13010.203974] Active_anon:270 active_file:11782
inactive_anon:2024
Jun 2 21:40:43 [kernel] [13010.203976] inactive_file:905625 unevictable:0
dirty:28068 writeback:0 unstable:0
Jun 2 21:40:43 [kernel] [13010.203978] free:15792 slab:50324 mapped:697
pagetables:413 bounce:0
Jun 2 21:40:43 [kernel] [13010.203986] DMA free:1960kB min:28kB low:32kB
high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB present:15152kB pages_scanned:0
all_unreclaimable? yes
Jun 2 21:40:43 [kernel] [13010.203992] lowmem_reserve[]: 0 3480 3949 3949
Jun 2 21:40:43 [kernel] [13010.204004] DMA32 free:60844kB min:7068kB
low:8832kB high:10600kB active_anon:580kB inactive_anon:384kB
active_file:30744kB inactive_file:3213088kB unevictable:0kB
present:3563552kB pages_scanned:32 all_unreclaimable? no
Jun 2 21:40:43 [kernel] [13010.204011] lowmem_reserve[]: 0 0 469 469
Jun 2 21:40:43 [kernel] [13010.204013] Normal free:364kB min:952kB
low:1188kB high:1428kB active_anon:500kB inactive_anon:7712kB
active_file:16384kB inactive_file:409412kB unevictable:0kB present:480960kB
pages_scanned:0 all_unreclaimable? no
Jun 2 21:40:43 [kernel] [13010.204013] lowmem_reserve[]: 0 0 0 0
Jun 2 21:40:43 [kernel] [13010.204013] 917432 total pagecache pages
Jun 2 21:40:43 [kernel] [13010.204013] 0 pages in swap cache
Jun 2 21:40:43 [kernel] [13010.204013] Swap cache stats: add 0, delete 0,
find 0/0
Jun 2 21:40:43 [kernel] [13010.204013] Free swap = 5879760kB
Jun 2 21:40:43 [kernel] [13010.204013] Total swap = 5879760kB
Jun 2 21:40:43 [kernel] [13010.204013] 1171440 pages RAM
Jun 2 21:40:43 [kernel] [13010.204013] 178617 pages reserved
Jun 2 21:40:43 [kernel] [13010.204013] 905214 pages shared
Jun 2 21:40:43 [kernel] [13010.204013] 73436 pages non-shared
--- End extraction of /var/log/everything/current
Sometimes its kswap0, nfsd, swapper, and a lot of other progs causing it.
However, the systems runs without any problem until it has enough physical
memory. If I copy larger amounts of data (e.g. typical dvd-iso images of
~4GB), then the log gets filled. Today's output was >65MB :-(
So far, I think these messages are just annoying but do no further harm.
The second problem is that the network controler (rtl8169) seems to timeout
on TX, which is very bad if it happens during renewing your dhcp-lease :-(
the message looks like
Jun 2 18:25:23 [kernel] [ 1290.701314] ------------[ cut here ]------------
Jun 2 18:25:23 [kernel] [ 1290.701321] WARNING: at
net/sched/sch_generic.c:226 dev_watchdog+0x245/0x260()
Jun 2 18:25:23 [kernel] [ 1290.701326] Hardware name: System Product Name
Jun 2 18:25:23 [kernel] [ 1290.701330] NETDEV WATCHDOG: eth0 (r8169):
transmit timed out
Jun 2 18:25:23 [kernel] [ 1290.701334] Modules linked in: psmouse k8temp
pcspkr hwmon
Jun 2 18:25:23 [kernel] [ 1290.701348] Pid: 0, comm: swapper Not tainted
2.6.29.4 #2
Jun 2 18:25:23 [kernel] [ 1290.701352] Call Trace:
Jun 2 18:25:23 [kernel] [ 1290.701356] <IRQ> [<ffffffff810676c0>]
warn_slowpath+0xd0/0x130
Jun 2 18:25:23 [kernel] [ 1290.701375] [<ffffffff8108559b>] ?
sched_clock_cpu+0x13b/0x180
Jun 2 18:25:23 [kernel] [ 1290.701385] [<ffffffff8126cf19>] ?
strlcpy+0x49/0x60
Jun 2 18:25:23 [kernel] [ 1290.701391] [<ffffffff8141ca25>]
dev_watchdog+0x245/0x260
Jun 2 18:25:23 [kernel] [ 1290.701400] [<ffffffff815144db>] ?
_spin_unlock_irq+0x2b/0x40
Jun 2 18:25:23 [kernel] [ 1290.701410] [<ffffffff81091876>] ?
trace_hardirqs_on_caller+0x66/0x1a0
Jun 2 18:25:23 [kernel] [ 1290.701415] [<ffffffff8141c7e0>] ?
dev_watchdog+0x0/0x260
Jun 2 18:25:23 [kernel] [ 1290.701423] [<ffffffff810726c0>]
run_timer_softirq+0x170/0x250
Jun 2 18:25:23 [kernel] [ 1290.701431] [<ffffffff8106d95c>]
__do_softirq+0x9c/0x160
Jun 2 18:25:23 [kernel] [ 1290.701439] [<ffffffff8102b87c>]
call_softirq+0x1c/0x30
Jun 2 18:25:23 [kernel] [ 1290.701445] [<ffffffff8102cbb5>]
do_softirq+0x65/0xb0
Jun 2 18:25:23 [kernel] [ 1290.701451] [<ffffffff8106d595>]
irq_exit+0x85/0xb0
Jun 2 18:25:23 [kernel] [ 1290.701457] [<ffffffff8102ce6a>]
do_IRQ+0xba/0x1b0
Jun 2 18:25:23 [kernel] [ 1290.701463] [<ffffffff8102b0d3>]
ret_from_intr+0x0/0xf
Jun 2 18:25:23 [kernel] [ 1290.701467] <EOI> [<ffffffff81031acf>] ?
default_idle+0x4f/0x60
Jun 2 18:25:23 [kernel] [ 1290.701480] [<ffffffff81031acd>] ?
default_idle+0x4d/0x60
Jun 2 18:25:23 [kernel] [ 1290.701487] [<ffffffff81031b96>] ?
c1e_idle+0xa6/0x110
Jun 2 18:25:23 [kernel] [ 1290.701495] [<ffffffff81517b01>] ?
atomic_notifier_call_chain+0x11/0x20
Jun 2 18:25:23 [kernel] [ 1290.701504] [<ffffffff81028f27>] ?
cpu_idle+0x67/0xc0
Jun 2 18:25:23 [kernel] [ 1290.701513] [<ffffffff8150d068>] ?
start_secondary+0x163/0x1bb
Jun 2 18:25:23 [kernel] [ 1290.701518] ---[ end trace 301c1d6a9ee969de ]---
I don't know whether the first issue has anything todo with the second one.
The worst thing however is an occasional freeze of the whole system, which
happens very likely when there is high network load over nfs. If that
happens, I have a blank console, no network, no keyboard, nothing. Not even
sysrq seems to work, which makes it pretty hard to tell what has happened.
After resetting the machine, there is nothing suspicious in the logs (it
just ends).
Does anyone have suggestions?
Thanks in advance
Alex
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [gentoo-user] Serious stability problems, including freezes
2009-06-02 20:40 [gentoo-user] Serious stability problems, including freezes Alexander Puchmayr
@ 2009-06-03 6:17 ` Florian Philipp
2009-06-03 6:40 ` Stefan G. Weichinger
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Florian Philipp @ 2009-06-03 6:17 UTC (permalink / raw
To: gentoo-user
Alexander Puchmayr schrieb:
> Hi there!
>
> My freshly setup homeserver has serious stability problem, which make the
> the machine absolutely unuseable as home server.
>
[...]
>
> The problem is, when I copy large amount of data via nfs to the server, then
> the logs get filled with entries like this:
>
[...]
>
> Sometimes its kswap0, nfsd, swapper, and a lot of other progs causing it.
>
> However, the systems runs without any problem until it has enough physical
> memory. If I copy larger amounts of data (e.g. typical dvd-iso images of
> ~4GB), then the log gets filled. Today's output was >65MB :-(
> So far, I think these messages are just annoying but do no further harm.
>
> The second problem is that the network controler (rtl8169) seems to timeout
> on TX, which is very bad if it happens during renewing your dhcp-lease :-(
> the message looks like
>
[...]
>
> I don't know whether the first issue has anything todo with the second one.
>
> The worst thing however is an occasional freeze of the whole system, which
> happens very likely when there is high network load over nfs. If that
> happens, I have a blank console, no network, no keyboard, nothing. Not even
> sysrq seems to work, which makes it pretty hard to tell what has happened.
> After resetting the machine, there is nothing suspicious in the logs (it
> just ends).
>
> Does anyone have suggestions?
>
Do you have a spare network adapter, maybe an older 100MBit PCI card?
Maybe we should rule out a hardware fault on your ethernet chipset first.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [gentoo-user] Serious stability problems, including freezes
2009-06-03 6:17 ` Florian Philipp
@ 2009-06-03 6:40 ` Stefan G. Weichinger
2009-06-03 6:44 ` Alexander Puchmayr
2009-06-03 7:03 ` [gentoo-user] Serious stability problems, including freezes Roy Wright
2 siblings, 0 replies; 9+ messages in thread
From: Stefan G. Weichinger @ 2009-06-03 6:40 UTC (permalink / raw
To: gentoo-user
Florian Philipp schrieb:
>> Does anyone have suggestions?
>
> Do you have a spare network adapter, maybe an older 100MBit PCI card?
> Maybe we should rule out a hardware fault on your ethernet chipset first.
I agree on this.
question: what filesystem do you use for the nfs-exported directory?
I had issues with XFS exported via NFS, although that was with 2.6.29
only, 2.6.28 works fine.
Seems very likely that your NIC is the problem ...
Stefan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [gentoo-user] Serious stability problems, including freezes
2009-06-03 6:17 ` Florian Philipp
2009-06-03 6:40 ` Stefan G. Weichinger
@ 2009-06-03 6:44 ` Alexander Puchmayr
2009-06-03 7:08 ` Stefan G. Weichinger
2009-06-03 7:03 ` [gentoo-user] Serious stability problems, including freezes Roy Wright
2 siblings, 1 reply; 9+ messages in thread
From: Alexander Puchmayr @ 2009-06-03 6:44 UTC (permalink / raw
To: gentoo-user
Am Mittwoch 03 Juni 2009 schrieb Florian Philipp:
>
> Do you have a spare network adapter, maybe an older 100MBit PCI card?
> Maybe we should rule out a hardware fault on your ethernet chipset first.
>
I already thought on this, but the results of my tests dont indicate a
hardware fault on the ethernet chipset, because:
* I can run a ping -f to the machine, it runs for hours without the
slightest problem
* As long as files transfered are small enough (i.e. they fit in the cache
buffer on the server) and the server has enough time to write back it to
the disk, there is no problem
* If I explicitly force the ethernet link to be 100FD instead of gigabit,
the is also no problem. So I don't expect any error using another 100MBit
card.
For me it looks like as if the following is happening:
* Memory gets filled up with cached files, no problem so far
* If no more physical ram is available, the system tries to free some memory
internally, e.g. by flushing the caches.
* If releasing cache entries and writing back data to their respective
files does not perform fast enough, an internal memory allocation may not
succeed, and I see the "page allocation failure" messages, with different
processes/kernel threads in the first line.
* I assume that most of the internal kernel threads don't get a problem in
this situation, but there may be some critical parts where we do. Hence, it
might just be a matter of probability whether it encounters such a critical
part, and the probabilty increases with the MB/s the data is put to the NFS
server.
Greetings
Alex
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [gentoo-user] Serious stability problems, including freezes
2009-06-03 6:44 ` Alexander Puchmayr
@ 2009-06-03 7:08 ` Stefan G. Weichinger
2009-06-03 16:47 ` [gentoo-user] Serious stability problems [SOLVED] Alexander Puchmayr
0 siblings, 1 reply; 9+ messages in thread
From: Stefan G. Weichinger @ 2009-06-03 7:08 UTC (permalink / raw
To: gentoo-user
Alexander Puchmayr schrieb:
> Am Mittwoch 03 Juni 2009 schrieb Florian Philipp:
>> Do you have a spare network adapter, maybe an older 100MBit PCI card?
>> Maybe we should rule out a hardware fault on your ethernet chipset first.
>>
> I already thought on this, but the results of my tests dont indicate a
> hardware fault on the ethernet chipset, because:
>
> * I can run a ping -f to the machine, it runs for hours without the
> slightest problem
> * As long as files transfered are small enough (i.e. they fit in the cache
> buffer on the server) and the server has enough time to write back it to
> the disk, there is no problem
> * If I explicitly force the ethernet link to be 100FD instead of gigabit,
> the is also no problem. So I don't expect any error using another 100MBit
> card.
I would cross-check that anyway just to be sure. Other nic, other
kernel-module ... etc
> For me it looks like as if the following is happening:
>
> * Memory gets filled up with cached files, no problem so far
> * If no more physical ram is available, the system tries to free some memory
> internally, e.g. by flushing the caches.
> * If releasing cache entries and writing back data to their respective
> files does not perform fast enough, an internal memory allocation may not
> succeed, and I see the "page allocation failure" messages, with different
> processes/kernel threads in the first line.
> * I assume that most of the internal kernel threads don't get a problem in
> this situation, but there may be some critical parts where we do. Hence, it
> might just be a matter of probability whether it encounters such a critical
> part, and the probabilty increases with the MB/s the data is put to the NFS
> server.
errm, I dunno ... but how would then smaller and slower nfs-servers run
fine? Sounds unlikely to me.
Any special network-settings used? buffer-sizes, MTU, jumbo frames?
switch problems (you seem to have tried auto-negotiation off already).
Stefan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [gentoo-user] Serious stability problems [SOLVED]
2009-06-03 7:08 ` Stefan G. Weichinger
@ 2009-06-03 16:47 ` Alexander Puchmayr
2009-06-03 17:02 ` Volker Armin Hemmann
2009-06-06 7:48 ` [gentoo-user] Serious stability problems [AGAIN] Alexander Puchmayr
0 siblings, 2 replies; 9+ messages in thread
From: Alexander Puchmayr @ 2009-06-03 16:47 UTC (permalink / raw
To: gentoo-user
After a long discussion with a friend I changed some kernel-configs:
* Large Receive Offload (INET_LRO) disabled.
* Changed allocator from SLUB to SLAB
These two made the system much more stable, much less allocation error
messages, but not yet stable enough (still some freezes)
* Increased /proc/sys/vm/min_free_kbytes. was at 8065, changed it to 16000
which made the page allocations nearly disappear at all, with 20000 I
didn't find any message after having copied more than 500GB. But still one
freeze :-(
* Tried kernel boot parameter "noapic". Copied 500GB without any problem. No
freeze, no page alloc error, no transmit timeout in rtl8169 driver.
With this configuration I consider the problem as solved.
Greetings and thanks to all who answered me
Alex
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [gentoo-user] Serious stability problems [SOLVED]
2009-06-03 16:47 ` [gentoo-user] Serious stability problems [SOLVED] Alexander Puchmayr
@ 2009-06-03 17:02 ` Volker Armin Hemmann
2009-06-06 7:48 ` [gentoo-user] Serious stability problems [AGAIN] Alexander Puchmayr
1 sibling, 0 replies; 9+ messages in thread
From: Volker Armin Hemmann @ 2009-06-03 17:02 UTC (permalink / raw
To: gentoo-user
On Mittwoch 03 Juni 2009, Alexander Puchmayr wrote:
> After a long discussion with a friend I changed some kernel-configs:
>
> * Large Receive Offload (INET_LRO) disabled.
> * Changed allocator from SLUB to SLAB
>
> These two made the system much more stable, much less allocation error
> messages, but not yet stable enough (still some freezes)
>
> * Increased /proc/sys/vm/min_free_kbytes. was at 8065, changed it to 16000
> which made the page allocations nearly disappear at all, with 20000 I
> didn't find any message after having copied more than 500GB. But still one
> freeze :-(
>
> * Tried kernel boot parameter "noapic". Copied 500GB without any problem.
> No freeze, no page alloc error, no transmit timeout in rtl8169 driver.
>
> With this configuration I consider the problem as solved.
>
> Greetings and thanks to all who answered me
> Alex
and now you go to lkml and report your finding there. It is either a hardware
bug which needs a workaround in the driver or a driver bug that must be fixed.
Either way, be a good citizen, report your problem.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [gentoo-user] Serious stability problems [AGAIN]
2009-06-03 16:47 ` [gentoo-user] Serious stability problems [SOLVED] Alexander Puchmayr
2009-06-03 17:02 ` Volker Armin Hemmann
@ 2009-06-06 7:48 ` Alexander Puchmayr
1 sibling, 0 replies; 9+ messages in thread
From: Alexander Puchmayr @ 2009-06-06 7:48 UTC (permalink / raw
To: gentoo-user
>
> With this configuration I consider the problem as solved.
>
I think this was too early ... still had some freezes in the last days :-(
Started new thread on this topic with other details
Greetings
Alex
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [gentoo-user] Serious stability problems, including freezes
2009-06-03 6:17 ` Florian Philipp
2009-06-03 6:40 ` Stefan G. Weichinger
2009-06-03 6:44 ` Alexander Puchmayr
@ 2009-06-03 7:03 ` Roy Wright
2 siblings, 0 replies; 9+ messages in thread
From: Roy Wright @ 2009-06-03 7:03 UTC (permalink / raw
To: gentoo-user
On Jun 3, 2009, at 1:17 AM, Florian Philipp wrote:
> Alexander Puchmayr schrieb:
>> Hi there!
>>
>> My freshly setup homeserver has serious stability problem, which
>> make the
>> the machine absolutely unuseable as home server.
>>
> [...]
>>
>> The problem is, when I copy large amount of data via nfs to the
>> server, then
>> the logs get filled with entries like this:
>>
>> The second problem is that the network controler (rtl8169) seems to
>> timeout
>> on TX, which is very bad if it happens during renewing your dhcp-
>> lease :-(
>> the message looks like
>>
>> The worst thing however is an occasional freeze of the whole
>> system, which
>> happens very likely when there is high network load over nfs. If that
>> happens, I have a blank console, no network, no keyboard, nothing.
>> Not even
>> sysrq seems to work, which makes it pretty hard to tell what has
>> happened.
>> After resetting the machine, there is nothing suspicious in the
>> logs (it
>> just ends).
>>
>> Does anyone have suggestions?
>>
>
> Do you have a spare network adapter, maybe an older 100MBit PCI card?
> Maybe we should rule out a hardware fault on your ethernet chipset
> first.
>
I had a similar issue over a year ago. My problem turned out to be
the switch I
was connecting to was bad. I didn't discover this until I physically
ran a 50ft
cable between the two computers just to rule out my infrastructure.
Surprise. :)
Just for grins, here's my nfs mount options:
-rsize=8192,wsize=8192,soft,timeo=30,rw
My main gentoo box is down at the moment (power supply being RMA'd), but
I recall I had to set the unsecure option so my macbook could connect.
HTH,
Roy
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-06-06 7:49 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-06-02 20:40 [gentoo-user] Serious stability problems, including freezes Alexander Puchmayr
2009-06-03 6:17 ` Florian Philipp
2009-06-03 6:40 ` Stefan G. Weichinger
2009-06-03 6:44 ` Alexander Puchmayr
2009-06-03 7:08 ` Stefan G. Weichinger
2009-06-03 16:47 ` [gentoo-user] Serious stability problems [SOLVED] Alexander Puchmayr
2009-06-03 17:02 ` Volker Armin Hemmann
2009-06-06 7:48 ` [gentoo-user] Serious stability problems [AGAIN] Alexander Puchmayr
2009-06-03 7:03 ` [gentoo-user] Serious stability problems, including freezes Roy Wright
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox