* [gentoo-user] Kernel freezes
@ 2009-06-06 8:33 Alexander Puchmayr
2009-06-06 9:12 ` Volker Armin Hemmann
0 siblings, 1 reply; 5+ messages in thread
From: Alexander Puchmayr @ 2009-06-06 8:33 UTC
To: gentoo-user
Hi there!
This week I've tried to set up a home server, but the system is highly
unstable. The first symptoms were lots of page allocation errors, which
disappeared after switching the kernel's memory allocator from SLUB to SLAB
and increasing min_free_kbytes in /proc/sys/vm from 8MB to 20MB.
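For reference, min_free_kbytes is expressed in kilobytes, so the 8MB and
20MB figures correspond to 8192 and 20480. A minimal Python sketch of that
conversion follows; the helper names are made up for illustration, writing
the value needs root, and the change does not survive a reboot.

```python
from pathlib import Path

# Path named in the message; the value it holds is in kilobytes.
MIN_FREE_KBYTES = Path("/proc/sys/vm/min_free_kbytes")


def mb_to_kb(megabytes: int) -> int:
    """Convert megabytes to the kilobyte unit used by min_free_kbytes."""
    return megabytes * 1024


def read_min_free_kbytes(path: Path = MIN_FREE_KBYTES) -> int:
    """Return the value currently stored at path (Linux only by default)."""
    return int(path.read_text().strip())


def write_min_free_kbytes(kbytes: int, path: Path = MIN_FREE_KBYTES) -> None:
    """Store a new value; on the real /proc file this requires root."""
    path.write_text(f"{kbytes}\n")


if __name__ == "__main__":
    print("8 MB  =", mb_to_kb(8), "kB")
    print("20 MB =", mb_to_kb(20), "kB")
    if MIN_FREE_KBYTES.exists():
        print("current setting:", read_min_free_kbytes(), "kB")
```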
The machine is an AMD Athlon64 X2 5050e on an ASUS M3A78-Pro board with
2x2GB RAM. I'm using kernel 2.6.29.4 (vanilla, but the result is the same
with 2.6.29-gentoo-r5), and I also upgraded the board's BIOS to the latest
version (which is 0902).
But the system still freezes after some hours. It just freezes. The console
is dead, there are no entries in the logs, no network connectivity, and even
sysrq doesn't seem to do anything. The worst thing is that I don't even have
an idea what the error could be. In the rare cases when it crashed and the
console was not blanked, I only saw the end of a stack trace; the interesting
parts had scrolled off (and I can't scroll back as the console is absolutely
dead :-( ). The only button that still works is the reset button, and after
rebooting the log doesn't tell me anything (it just ends without any
message).
I looked more closely at my dmesg output right after booting, and I've
found some strange entries which could indicate a problem. What do you
think about them?
[ 0.000000] ACPI Warning (tbfadt-0568): 32/64X length mismatch in
Gpe0Block: 64/32 [20081204]
[ 0.000000] FADT: X_PM1a_EVT_BLK.bit_width (16) does not match
PM1_EVT_LEN (4)
...
[ 0.000000] 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
...
[ 0.000999] Aperture pointing to e820 RAM. Ignoring.
[ 0.000999] Your BIOS doesn't leave a aperture memory hole
[ 0.000999] Please enable the IOMMU option in the BIOS setup
[ 0.000999] This costs you 64 MB of RAM
[ 0.000999] Mapping aperture over 65536 KB of RAM @ 20000000
[ 0.000999] PM: Registered nosave memory: 0000000020000000 -
0000000024000000
...
[ 0.099055] mtrr: your CPUs had inconsistent fixed MTRR settings
[ 0.099059] mtrr: probably your BIOS does not setup all CPUs.
[ 0.099116] mtrr: corrected configuration.
...
[ 0.151260] PCI-DMA: Disabling AGP.
[ 0.151260] PCI-DMA: aperture base @ 20000000 size 65536 KB
[ 0.151260] PCI-DMA: using GART IOMMU.
[ 0.151260] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
...
[ 0.163241] system 00:09: iomem range 0xfec00000-0xfec00fff has been
reserved
[ 0.163305] system 00:09: iomem range 0xfee00000-0xfee00fff has been
reserved
[ 0.163365] system 00:0a: ioport range 0x4d0-0x4d1 has been reserved
[ 0.163422] system 00:0a: ioport range 0x40b-0x40b has been reserved
[ 0.163480] system 00:0a: ioport range 0x4d6-0x4d6 has been reserved
[ 0.163537] system 00:0a: ioport range 0xc00-0xc01 has been reserved
[ 0.163595] system 00:0a: ioport range 0xc14-0xc14 has been reserved
[ 0.163653] system 00:0a: ioport range 0xc50-0xc51 has been reserved
[ 0.163711] system 00:0a: ioport range 0xc52-0xc52 has been reserved
[ 0.163769] system 00:0a: ioport range 0xc6c-0xc6c has been reserved
[ 0.163827] system 00:0a: ioport range 0xc6f-0xc6f has been reserved
[ 0.163885] system 00:0a: ioport range 0xcd0-0xcd1 has been reserved
[ 0.163942] system 00:0a: ioport range 0xcd2-0xcd3 has been reserved
[ 0.163999] system 00:0a: ioport range 0xcd4-0xcd5 has been reserved
[ 0.164070] system 00:0a: ioport range 0xcd6-0xcd7 has been reserved
[ 0.164127] system 00:0a: ioport range 0xcd8-0xcdf has been reserved
[ 0.164184] system 00:0a: ioport range 0x800-0x89f has been reserved
[ 0.164241] system 00:0a: ioport range 0xb00-0xb3f has been reserved
[ 0.164305] system 00:0a: ioport range 0x900-0x90f has been reserved
[ 0.164363] system 00:0a: ioport range 0x910-0x91f has been reserved
[ 0.164421] system 00:0a: ioport range 0xfe00-0xfefe has been reserved
[ 0.164480] system 00:0a: iomem range 0xffb80000-0xffbfffff has been
reserved
[ 0.164538] system 00:0a: iomem range 0xfec10000-0xfec1001f has been
reserved
[ 0.164598] system 00:0c: ioport range 0xe00-0xe0f has been reserved
[ 0.164656] system 00:0c: ioport range 0xe80-0xe8f has been reserved
[ 0.164713] system 00:0c: ioport range 0xf40-0xf4f has been reserved
[ 0.164771] system 00:0c: ioport range 0xa30-0xa3f has been reserved
[ 0.164830] system 00:0d: iomem range 0xe0000000-0xefffffff has been
reserved
[ 0.164890] system 00:0e: iomem range 0x0-0x9ffff could not be reserved
[ 0.164947] system 00:0e: iomem range 0xc0000-0xcffff has been reserved
[ 0.165018] system 00:0e: iomem range 0xe0000-0xfffff could not be
reserved
[ 0.165076] system 00:0e: iomem range 0x100000-0xdfffffff could not be
reserved
[ 0.165158] system 00:0e: iomem range 0xfec00000-0xffffffff could not be
reserved
...
[ 21.298450] ACPI: I/O resource piix4_smbus [0xb00-0xb07] conflicts with
ACPI region SOR1 [0xb00-0xb0f]
[ 21.298454] ACPI: Device needs an ACPI driver
[ 21.298461] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00,
revision 0
...
[ 73.861479] ACPI: I/O resource it87 [0xe85-0xe86] conflicts with ACPI
region HWRE [0xe85-0xe86]
[ 73.861483] ACPI: Device needs an ACPI driver
What does the message "4 Processors exceeds NR_CPUS" mean? The system is a
dual-core AMD Athlon64 5050e; AFAIK it has two cores and nothing more. The
MTRR message later also suggests that there could be more than 2 CPUs
available. Wondering...
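As a small aid for reading that line: it compares a processor count seen at
boot with the NR_CPUS value the kernel was built with. The Python sketch
below only extracts the two numbers. Where the 4 comes from on this board is
not established in this thread, and the function name is hypothetical.

```python
import re

# Matches the dmesg line quoted in the message, e.g.
# "[ 0.000000] 4 Processors exceeds NR_CPUS limit of 2"
NR_CPUS_RE = re.compile(r"(\d+) Processors exceeds NR_CPUS limit of (\d+)")


def parse_nr_cpus_warning(line: str):
    """Return (reported, limit) for the NR_CPUS warning, or None otherwise."""
    match = NR_CPUS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = "[ 0.000000] 4 Processors exceeds NR_CPUS limit of 2"
    reported, limit = parse_nr_cpus_warning(sample)
    print(f"reported={reported} limit={limit}")
```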
The next thing which seems somewhat strange to me is the AGP aperture and
the IOMMU. The mainboard does not have an AGP port, nor does the BIOS have
any such option to enable. The only thing I can set is the size of the
memory reserved for the onboard video card, which I set to the smallest
value of 32MB, as the machine will usually not even have a display.
The iomem-range reservation errors at the end? Harmful or not?
The last messages come after loading the hw-sensors modules it87.ko and
i2c_piix4.
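To make the two "conflicts with ACPI region" lines concrete: each one
reports that a driver's I/O port range intersects a range claimed by an ACPI
region. A minimal Python sketch of that overlap test, using the ranges
copied from the dmesg output (the function name is made up for
illustration):

```python
def ranges_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the inclusive ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


# (driver, driver range, ACPI region, region range), taken from the log lines.
CONFLICTS = [
    ("piix4_smbus", (0xB00, 0xB07), "SOR1", (0xB00, 0xB0F)),
    ("it87", (0xE85, 0xE86), "HWRE", (0xE85, 0xE86)),
]

if __name__ == "__main__":
    for driver, drv_range, region, acpi_range in CONFLICTS:
        hit = ranges_overlap(*drv_range, *acpi_range)
        print(f"{driver} vs {region}: overlap={hit}")
```

The sketch says nothing about whether such a conflict is related to the
freezes; it only shows what the kernel is comparing.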
Thanks in advance for suggestions
Alex
* Re: [gentoo-user] Kernel freezes
From: Volker Armin Hemmann @ 2009-06-06 9:12 UTC
To: gentoo-user
On Saturday 06 June 2009, Alexander Puchmayr wrote:
> Hi there!
>
> This week I've tried to set up a home-server, but the system is highly
> unstable. The first symptoms were lots of page allocation errors, which
> disappeared after setting the internal memory allocator from SLUB to SLAB
> and increasing the min_free_kbytes in /proc/sys/vm from 8MB to 20MB.
>
> The machine is an AMD Athlon64X2 5050e on an ASUS M3A78-Pro board with 2x2GB
> RAM. I'm using kernel 2.6.29.4 (vanilla, but the result is the same as
> using 2.6.29-gentoo-r5), and I also upgraded the board's BIOS to the latest
> version (which is 0902)
>
> But still the system freezes after some hours. It just freezes. Console is
> dead, no entry in the logs, no network connectivity, even sysrq doesn't
> seem to do anything. The worst thing is I don't even have an idea what the
> error could be, and in the rare situations when it crashed and the console
> was not blanked, I only see the end of a stack trace, and the interesting
> parts are scrolled out (and I can't scroll back as the console is
> absolutely dead :-( ) The only button that is still working is the reset
> button, and after rebooting the log doesn't tell anything (just ends without
> any message)
>
> I inspected my dmesg-output right after booting more precisely, and I've
> found some strange entries which could indicate a problem. What do you
> think about them?
>
> [ 0.000000] ACPI Warning (tbfadt-0568): 32/64X length mismatch in
> Gpe0Block: 64/32 [20081204]
> [ 0.000000] FADT: X_PM1a_EVT_BLK.bit_width (16) does not match
> PM1_EVT_LEN (4)
> ...
> [ 0.000000] 4 Processors exceeds NR_CPUS limit of 2
> [ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
> ...
> [ 0.000999] Aperture pointing to e820 RAM. Ignoring.
> [ 0.000999] Your BIOS doesn't leave a aperture memory hole
> [ 0.000999] Please enable the IOMMU option in the BIOS setup
> [ 0.000999] This costs you 64 MB of RAM
> [ 0.000999] Mapping aperture over 65536 KB of RAM @ 20000000
> [ 0.000999] PM: Registered nosave memory: 0000000020000000 -
> 0000000024000000
> ...
> [ 0.099055] mtrr: your CPUs had inconsistent fixed MTRR settings
> [ 0.099059] mtrr: probably your BIOS does not setup all CPUs.
> [ 0.099116] mtrr: corrected configuration.
> ...
> [ 0.151260] PCI-DMA: Disabling AGP.
> [ 0.151260] PCI-DMA: aperture base @ 20000000 size 65536 KB
> [ 0.151260] PCI-DMA: using GART IOMMU.
> [ 0.151260] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
> ...
> [ 0.163241] system 00:09: iomem range 0xfec00000-0xfec00fff has been
> reserved
> [ 0.163305] system 00:09: iomem range 0xfee00000-0xfee00fff has been
> reserved
> [ 0.163365] system 00:0a: ioport range 0x4d0-0x4d1 has been reserved
> [ 0.163422] system 00:0a: ioport range 0x40b-0x40b has been reserved
> [ 0.163480] system 00:0a: ioport range 0x4d6-0x4d6 has been reserved
> [ 0.163537] system 00:0a: ioport range 0xc00-0xc01 has been reserved
> [ 0.163595] system 00:0a: ioport range 0xc14-0xc14 has been reserved
> [ 0.163653] system 00:0a: ioport range 0xc50-0xc51 has been reserved
> [ 0.163711] system 00:0a: ioport range 0xc52-0xc52 has been reserved
> [ 0.163769] system 00:0a: ioport range 0xc6c-0xc6c has been reserved
> [ 0.163827] system 00:0a: ioport range 0xc6f-0xc6f has been reserved
> [ 0.163885] system 00:0a: ioport range 0xcd0-0xcd1 has been reserved
> [ 0.163942] system 00:0a: ioport range 0xcd2-0xcd3 has been reserved
> [ 0.163999] system 00:0a: ioport range 0xcd4-0xcd5 has been reserved
> [ 0.164070] system 00:0a: ioport range 0xcd6-0xcd7 has been reserved
> [ 0.164127] system 00:0a: ioport range 0xcd8-0xcdf has been reserved
> [ 0.164184] system 00:0a: ioport range 0x800-0x89f has been reserved
> [ 0.164241] system 00:0a: ioport range 0xb00-0xb3f has been reserved
> [ 0.164305] system 00:0a: ioport range 0x900-0x90f has been reserved
> [ 0.164363] system 00:0a: ioport range 0x910-0x91f has been reserved
> [ 0.164421] system 00:0a: ioport range 0xfe00-0xfefe has been reserved
> [ 0.164480] system 00:0a: iomem range 0xffb80000-0xffbfffff has been
> reserved
> [ 0.164538] system 00:0a: iomem range 0xfec10000-0xfec1001f has been
> reserved
> [ 0.164598] system 00:0c: ioport range 0xe00-0xe0f has been reserved
> [ 0.164656] system 00:0c: ioport range 0xe80-0xe8f has been reserved
> [ 0.164713] system 00:0c: ioport range 0xf40-0xf4f has been reserved
> [ 0.164771] system 00:0c: ioport range 0xa30-0xa3f has been reserved
> [ 0.164830] system 00:0d: iomem range 0xe0000000-0xefffffff has been
> reserved
> [ 0.164890] system 00:0e: iomem range 0x0-0x9ffff could not be reserved
> [ 0.164947] system 00:0e: iomem range 0xc0000-0xcffff has been reserved
> [ 0.165018] system 00:0e: iomem range 0xe0000-0xfffff could not be
> reserved
> [ 0.165076] system 00:0e: iomem range 0x100000-0xdfffffff could not be
> reserved
> [ 0.165158] system 00:0e: iomem range 0xfec00000-0xffffffff could not be
> reserved
> ...
> [ 21.298450] ACPI: I/O resource piix4_smbus [0xb00-0xb07] conflicts with
> ACPI region SOR1 [0xb00-0xb0f]
> [ 21.298454] ACPI: Device needs an ACPI driver
> [ 21.298461] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00,
> revision 0
> ...
> [ 73.861479] ACPI: I/O resource it87 [0xe85-0xe86] conflicts with ACPI
> region HWRE [0xe85-0xe86]
> [ 73.861483] ACPI: Device needs an ACPI driver
>
> What does this message "4 Processors exceeds NR_CPUS" say? The system is a
> Dual-Core AMD Athlon64 5050e, AFAIK it has two cores and nothing more. The
> MTRR message later also indicates that there could be more than 2 CPUs
> available. Wondering...
>
> The next thing which seems somewhat strange to me is the AGP aperture and
> the IOMMU. The mainboard does not have an AGP port, nor does the BIOS have
> any option to enable. The only thing I can set is the size of the memory
> reserved for the onboard video card, which I set to the smallest value of
> 32MB as the machine will usually not even have a display.
>
> The iomem-range reservation errors at the end? Harmful or not?
>
> The last messages come after loading the hw-sensors modules it87.ko and
> i2c_piix4.
>
> Thanks in advance for suggestions
> Alex
*sigh* Ok, just for starters - all AMD CPUs of the Athlon64 architecture have
a built-in agpgart. This agpgart also functions as an IOMMU. This is a great
hack to get a hardware IOMMU. Intel does not have this, so they rely on
software. The solution came up while AMD devs and Linux kernel devs worked
together.
Please read the following links:
http://en.wikipedia.org/wiki/Iommu
http://marc.info/?l=linux-kernel&m=107759901509280&w=2
http://marc.info/?l=linux-kernel&m=107764033904042&w=2
The IOMMU is needed so that 32-bit PCI devices can live with their PCI
address space beyond 4 GB, among other sweet things.
Sadly the IOMMU needs a minimum amount of memory for itself, and it uses the
AGP aperture for that. This is fine, but mobo vendors suck and make it too
small or not available at all. In that case the kernel is forced to use real
memory for the IOMMU.
In short, that message has nothing to do with your problem.
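The "costs you 64 MB of RAM" figure can be checked against the "Registered
nosave memory" line from the log. A minimal Python sketch of that
arithmetic, assuming the second address is an exclusive end (which matches
the 65536 KB size printed by the kernel):

```python
def aperture_size_bytes(start: int, end: int) -> int:
    """Size of the half-open range [start, end) in bytes."""
    return end - start


if __name__ == "__main__":
    # Values copied from the "Registered nosave memory" line in the dmesg output.
    start = 0x0000000020000000
    end = 0x0000000024000000  # treated as exclusive (assumption)

    size = aperture_size_bytes(start, end)
    print(f"{size} bytes = {size // 1024} KB = {size // (1024 * 1024)} MB")
```

This prints 67108864 bytes = 65536 KB = 64 MB, i.e. the RAM that was set
aside because no aperture hole was available.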
The NR_CPUS message is confusing - I strongly suspect that your kernel config
is really fucked up.
The iomem-range messages are harmless.
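If you want to list just those entries from a saved dmesg file, a small
Python filter along these lines would do. It is only a sketch: the function
name is made up, and it expects each log entry on a single line (the mail
client wrapped some of the entries quoted above).

```python
import re

# Matches entries such as
# "[ 0.164890] system 00:0e: iomem range 0x0-0x9ffff could not be reserved"
UNRESERVED_RE = re.compile(
    r"system (\S+): (iomem|ioport) range (0x[0-9a-f]+)-(0x[0-9a-f]+) "
    r"could not be reserved"
)


def unreserved_ranges(lines):
    """Return (device, kind, start, end) for every 'could not be reserved' entry."""
    found = []
    for line in lines:
        match = UNRESERVED_RE.search(line)
        if match:
            device, kind, start, end = match.groups()
            found.append((device, kind, int(start, 16), int(end, 16)))
    return found


if __name__ == "__main__":
    sample = [
        "[ 0.164890] system 00:0e: iomem range 0x0-0x9ffff could not be reserved",
        "[ 0.164947] system 00:0e: iomem range 0xc0000-0xcffff has been reserved",
    ]
    for device, kind, start, end in unreserved_ranges(sample):
        print(f"{device} {kind} {start:#x}-{end:#x}")
```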
Please enable:
[] Check for low memory corruption
[] Reserve low 64K of RAM on AMI/Phoenix BIOSen
in the kernel config. Also clean it up and remove stuff like 'hyperthreading
scheduler'.
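A quick way to double-check these settings is to scan the kernel's .config
file. The Python sketch below assumes the symbol names behind the menu
entries are CONFIG_X86_CHECK_BIOS_CORRUPTION, CONFIG_X86_RESERVE_LOW_64K and
CONFIG_SCHED_SMT; the thread itself only gives the menu labels, so verify
the names against your own kernel tree. The function names are made up.

```python
def parse_kconfig(text: str) -> dict:
    """Parse .config text into {symbol: value}; unset symbols map to 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# Assumed symbol names for the options mentioned in the advice (see lead-in).
WANTED = {
    "CONFIG_X86_CHECK_BIOS_CORRUPTION": "y",
    "CONFIG_X86_RESERVE_LOW_64K": "y",
    "CONFIG_SCHED_SMT": "n",
}


def mismatches(options: dict) -> list:
    """Return the symbols whose value differs from WANTED (absent counts as 'n')."""
    return [name for name, want in WANTED.items() if options.get(name, "n") != want]


if __name__ == "__main__":
    sample = "CONFIG_NR_CPUS=2\n# CONFIG_SCHED_SMT is not set\nCONFIG_SCHED_MC=y\n"
    print("to review:", mismatches(parse_kconfig(sample)))
```

To run it against a real configuration, read the file (for example
/usr/src/linux/.config on a typical Gentoo setup) and pass its text to
parse_kconfig.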
If the problem persists, start testing your hardware.
I would suspect the PSU.
* Re: [gentoo-user] Kernel freezes
From: Alexander Puchmayr @ 2009-06-06 10:42 UTC
To: gentoo-user
On Saturday 06 June 2009, Volker Armin Hemmann wrote:
> *sigh* Ok, just for starters - all AMD cpus of the Athlon64 architecture
> have a builtin agpgart. This agpgart functions also as an iommu. This is
> a great hack to have a hardware iommu . Intel does not have this, so they
> rely on software. The solution came up while AMD devs and linux kernel
> devs worked together.
> Please read the following links:
>
> http://en.wikipedia.org/wiki/Iommu
>
> http://marc.info/?l=linux-kernel&m=107759901509280&w=2
>
> http://marc.info/?l=linux-kernel&m=107764033904042&w=2
>
> the iommu is needed so 32bit pci devices can live with their pci address
> space behind 4gb and other sweet things.
>
> Sadly the iommu needs a minimum on memory for itself - and uses the agp-
> aperture. This is fine, but mobo vendors suck and make it too small/or
> not available. In that case the kernel is forced to use real memory for
> the iommu.
>
> In short, that message has nothing to do with your problem.
>
Thanks for these informative links
> The NR_CPU message is confusing - I strongly suspect that your kernel
> config is really fucked up.
?? As I have a dual-core CPU, I set NR_CPUS to 2; is something wrong with
that? What else can be fucked up? I enabled the multi-core scheduler
(hyperthreading is disabled).
>
> Please enable:
>
> [] Check for low memory corruption
> [] Reserve low 64K of RAM on AMI/Phoenix BIOSen
>
> in the kernel config. Also clean it up and remove stuff like
> 'hyperthreading scheduler'.
>
Already done.
> If the problem persists, start testing your hardware.
>
How? I don't have access to special test equipment to test hardware. This is
the only AM2(+) board and the only AM2 CPU I have. The RAM is also unique
to this machine.
> I would suspect the PSU.
The power supply? What makes you think that the PSU could be the cause of a
system crash?
Greetings
Alex
* Re: [gentoo-user] Kernel freezes
From: Volker Armin Hemmann @ 2009-06-06 11:02 UTC
To: gentoo-user
On Saturday 06 June 2009, Alexander Puchmayr wrote:
> On Saturday 06 June 2009, Volker Armin Hemmann wrote:
> > *sigh* Ok, just for starters - all AMD cpus of the Athlon64 architecture
> > have a builtin agpgart. This agpgart functions also as an iommu. This is
> > a great hack to have a hardware iommu . Intel does not have this, so they
> > rely on software. The solution came up while AMD devs and linux kernel
> > devs worked together.
> > Please read the following links:
> >
> > http://en.wikipedia.org/wiki/Iommu
> >
> > http://marc.info/?l=linux-kernel&m=107759901509280&w=2
> >
> > http://marc.info/?l=linux-kernel&m=107764033904042&w=2
> >
> > the iommu is needed so 32bit pci devices can live with their pci address
> > space behind 4gb and other sweet things.
> >
> > Sadly the iommu needs a minimum on memory for itself - and uses the agp-
> > aperture. This is fine, but mobo vendors suck and make it too small/or
> > not available. In that case the kernel is forced to use real memory for
> > the iommu.
> >
> > In short, that message has nothing to do with your problem.
>
> Thanks for these informative links
>
> > The NR_CPU message is confusing - I strongly suspect that your kernel
> > config is really fucked up.
>
> ?? As I have a DualCore-Cpu, I changed NR_CPU to 2, something wrong with
> that? What else can be fucked up? I Enabled Multi-core scheduler
> (hyperthread is disabled)
>
No, it should be fine. Hmm. The multi-core scheduler should not make any
difference at all.
> > Please enable:
> >
> > [] Check for low memory corruption
> > [] Reserve low 64K of RAM on AMI/Phoenix BIOSen
> >
> > in the kernel config. Also clean it up and remove stuff like
> > 'hyperthreading scheduler'.
>
> Already done.
ok
>
> > If the problem persists, start testing your hardware.
>
> How? I don't have access to special test equipment to test hardware. This
> is the only AM2(+) board and the only AM2 CPU I have. The RAM is also
> unique to this machine.
>
> > I would suspect the PSU.
>
> The Powersupply? What makes you think that the PSU can be the cause of a
> system crash?
Experience. A PSU gone bad can cause all kinds of bad behaviour: crashes
under load, crashes when temps rise, sudden reboots, hard disk damage, a dead
mobo, a dead CPU. Seriously, I have seen so many problems caused by PSUs over
the years that the PSU is always the first thing I check....
RAM is the easiest: get systemrescuecd, boot from CD, and run memtest for a
couple of hours. If it finds errors, bingo, you found the probable culprit.
In that case raising the memory voltage a tiny bit (0.05 or 0.1 V) could be
all that is needed (I have such a memory stick myself: at 1.80 V I have
crashes, at 1.85 V everything works).
But memory errors can also be caused by a bad PSU. So ask a friend for a PSU,
or ask your local hardware dealer if they can lend you one. If your problems
go away with a different PSU, you have found the real source.
Also, cleaning all parts (even the inside of the PSU) might solve the
problem; maybe it is heat related.
While your box is open, check all capacitors for bulges (especially on the
top), discoloration, and any residue around their base. Also remove all cards
and add-ons and put them back, making sure that they are really seated
correctly.
You are running the latest BIOS, of course?
* Re: [gentoo-user] Kernel freezes
From: Alexander Puchmayr @ 2009-06-06 13:49 UTC
To: gentoo-user
On Saturday 06 June 2009, Volker Armin Hemmann wrote:
>
> > > If the problem persists, start testing your hardware.
> >
> > How? I don't have access to special test equipment to test hardware.
> > This is the only AM2(+) board and the only AM2 CPU I have. The RAM is
> > also unique to this machine.
> >
> > > I would suspect the PSU.
> >
> > The Powersupply? What makes you think that the PSU can be the cause of
> > a system crash?
>
> experience. A PSU gone bad can cause all kinds of bad behaviour. Crash
> under load. Crash when temps rise. Sudden reboots, harddisk damage. Mobo
> dead. CPU dead. Seriously I have seen so many problems caused by PSUs
> over the years that the PSU is always the first thing I check....
>
> Ram is the easiest: get systemrescuecd, boot from cd, run memtest for a
> couple of hours. If it finds errors, bingo, you found the probable
> culprit. In that case raising the memory voltage a tiny bit (0.05 or
> 0.1Volt) could be all that is needed (I have such a memory stick myself.
> 1.80V and I have crashes. 1.85V and everything works).
> But memory errors also can be caused by bad PSU. So.. ask a friend for a
> psu or your local hardware dealer if they can lend you one - if your
> problems go away with a different PSU you found the real source.
> Also, cleaning all parts (even the psu's inside) might solve the problem,
> maybe it is still heat related.
>
> While your box is open, check all capacitors for bulges, especially on
> the top, discoloration and stuff around their base. Also remove all cards
> and addons and put them back, make sure that they really sit correctly.
>
> You are running the latest bios, of course?
>
Luckily the PSU of my desktop was new enough to fit that board -- I tried
it, and after roughly half an hour the system crashed again. Same symptoms.
Nothing changed :-(
Memtest86+ already ran for a whole night recently -- no errors found.
The caps are all OK and the board is clean, as I bought it just one week
ago.
Maybe it's best to talk to my hardware dealer about it, as it is still under
warranty.
Thanks for your help
Alex