public inbox for gentoo-amd64@lists.gentoo.org
* [gentoo-amd64] Problem with emerge on a dual-processor machine
@ 2006-10-31 18:46 Vesna Petrovic
  2006-10-31 18:54 ` Bob Sanders
  2006-10-31 23:39 ` [gentoo-amd64] " Duncan
  0 siblings, 2 replies; 3+ messages in thread
From: Vesna Petrovic @ 2006-10-31 18:46 UTC
  To: gentoo-amd64

 Hello,

 I've just installed Gentoo on a dual-processor machine and now I'm running
into the following problem - when I start emerge, it randomly stops and one
of the following things happens:
  - the machine freezes completely so that I cannot switch to another
console or do anything
  - if I already have multiple ssh sessions open, sometimes one of the
sessions remains alive, but invoking any command freezes that session. Any
attempt to kill a process has no effect.
  - soft lockup detected on at least one cpu.

I'm running out of ideas what to try next, so I thought I would ask for
help.
Here is what I checked and tried so far:
  - configured and built the kernel with SMP and NUMA support.
Triple-checked this.
  - both processors are detected and initialized at boot. ACPI is used for
SMP configuration information
  - processor temperatures: CPU0 32 C, CPU1 31 C, system 39 C. System is
located in a room with steady 20 C
  - disabled CPU#1 using
    echo 0 > /sys/devices/system/cpu/cpu1/online
    In this case, everything seems to work fine. This is the only way to
compile or emerge anything.
  - using MAKEOPTS="-j3". Tried with "-j2", but the same problem occurs.
  - checked if there are SMP specific USE flags, and the only one I could
find was for gimp.
  - experimented with different preemption models. The problem occurs with
all of them.
  - disabled APM and enabled ACPI 2.0 support. After I did this, I got
"kernel panic - killing interrupt handler ...."
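
For anyone retracing the CPU#1 step above, here is a minimal sketch that reports which CPUs are currently online, so the state can be confirmed before and after writing to the sysfs `online` file. The function name `cpu_online_report` and its optional directory argument are made up for illustration; only the `/sys/devices/system/cpu/cpuN/online` files come from the message itself.

```sh
# Print the online state of every CPU found under a sysfs CPU directory.
# The optional directory argument is an assumption added so the function
# can be pointed at a test tree; it defaults to the real sysfs path.
cpu_online_report() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for dir in "$cpu_root"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        name="${dir##*/}"
        if [ -r "$dir/online" ]; then
            state="$(cat "$dir/online")"
        else
            # The boot CPU often has no "online" file because it cannot
            # be taken offline; treat it as online.
            state=1
        fi
        echo "$name online=$state"
    done
}
```

Called with no argument, `cpu_online_report` reads the real sysfs tree. After the `echo 0` above, cpu1 should be reported with `online=0`.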

 The system has 2 AMD Opteron 252 processors and 5 disks (1 IDE Maxtor
6B200R0 and 4 SCSI Maxtor 6L300S0), plus a probably irrelevant ATAPI 48X
DVD-ROM DVD-R CD-R/RW drive. Other devices:
  - Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit
Ethernet (rev 02)
  - RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid]
Serial ATA Controller (rev 02)
  - FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000
Controller (PHY/Link)
  - VGA compatible controller: ATI Technologies Inc RV350 AP [Radeon 9600]
  - Display controller (05:00.1): ATI Technologies Inc RV350 AP [Radeon 9600]
(Secondary)
  Kernel version is 2.6.17 built using gentoo-sources.

  Any idea what might be causing this problem? Bad kernel configuration? Bad
system configuration? Kernel bug? Portage bug? Defective processor? Problem
with disk access?
  I'm including a snapshot of the info I could retrieve from the system when
the system remained somewhat responsive after the problem occurred.

  Kind regards,
    Vesna


odin ~ # uname -a
Linux odin 2.6.17-gentoo-r8 #7 SMP PREEMPT Tue Oct 31 12:10:14 EST 2006 x86_64 AMD Opteron(tm) Processor 252 GNU/Linux

top - 22:42:38 up  7:57,  3 users,  load average: 7.99, 7.71, 5.39
Tasks:  64 total,   8 running,  56 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0% us, 100.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   6929848k total,   140192k used,  6789656k free,    15272k buffers
Swap:  5004236k total,        0k used,  5004236k free,    65328k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9811 root      16   0     0    0    0 R  100  0.0  18:27.43 emerge
    1 root      16   0  2608  572  488 S    0  0.0   0:00.46 init
    2 root      RT   0     0    0    0 R    0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    4 root      RT   0     0    0    0 R    0  0.0   0:00.00 watchdog/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
    7 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    8 root      10  -5     0    0    0 R    0  0.0   0:00.00 events/0
    9 root      10  -5     0    0    0 S    0  0.0   0:00.00 events/1
   10 root      19  -5     0    0    0 S    0  0.0   0:00.00 khelper
   11 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
   16 root      10  -5     0    0    0 R    0  0.0   0:00.00 kblockd/0
   17 root      10  -5     0    0    0 S    0  0.0   0:00.00 kblockd/1
   18 root      14  -5     0    0    0 S    0  0.0   0:00.00 kacpid
  103 root      10  -5     0    0    0 S    0  0.0   0:00.02 kseriod
  166 root      20   0     0    0    0 S    0  0.0   0:00.00 pdflush
  167 root      15   0     0    0    0 S    0  0.0   0:00.00 pdflush
  168 root      18   0     0    0    0 S    0  0.0   0:00.00 kswapd0
  169 root      15   0     0    0    0 S    0  0.0   0:00.00 kswapd1
  170 root      14  -5     0    0    0 S    0  0.0   0:00.00 aio/0
  171 root      10  -5     0    0    0 S    0  0.0   0:00.00 aio/1


top - 23:17:53 up  8:32,  3 users,  load average: 11.99, 11.92, 10.81
Tasks:  65 total,   8 running,  57 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0% us, 100.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  :  0.0% us,  0.0% sy,  0.0% ni,  0.0% id, 100.0% wa,  0.0% hi,  0.0% si
Mem:   6929848k total,   142444k used,  6787404k free,    15300k buffers
Swap:  5004236k total,        0k used,  5004236k free,    65560k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9811 root      16   0     0    0    0 R  100  0.0  53:43.13 emerge
    1 root      16   0  2608  572  488 S    0  0.0   0:00.46 init
    2 root      RT   0     0    0    0 R    0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    4 root      RT   0     0    0    0 R    0  0.0   0:00.00 watchdog/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
    7 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    8 root      10  -5     0    0    0 R    0  0.0   0:00.00 events/0
    9 root      10  -5     0    0    0 S    0  0.0   0:00.00 events/1
   10 root      19  -5     0    0    0 S    0  0.0   0:00.00 khelper
   11 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
   16 root      10  -5     0    0    0 R    0  0.0   0:00.00 kblockd/0
   17 root      10  -5     0    0    0 S    0  0.0   0:00.00 kblockd/1
   18 root      14  -5     0    0    0 S    0  0.0   0:00.00 kacpid
  103 root      10  -5     0    0    0 S    0  0.0   0:00.02 kseriod
  166 root      20   0     0    0    0 S    0  0.0   0:00.00 pdflush
  167 root      15   0     0    0    0 D    0  0.0   0:00.00 pdflush
  168 root      18   0     0    0    0 S    0  0.0   0:00.00 kswapd0
  169 root      15   0     0    0    0 S    0  0.0   0:00.00 kswapd1
  170 root      14  -5     0    0    0 S    0  0.0   0:00.00 aio/0
  171 root      10  -5     0    0    0 S    0  0.0   0:00.00 aio/1


F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S     0     1     0  0  76   0 -   652 -      ?        00:00:00 init
1 R     0     2     1  0 -40   - -     0 -      ?        00:00:00 migration/0
1 S     0     3     1  0  94  19 -     0 ksofti ?        00:00:00 ksoftirqd/0
5 R     0     4     1  0 -40   - -     0 -      ?        00:00:00 watchdog/0
1 S     0     5     1  0 -40   - -     0 migrat ?        00:00:00 migration/1
1 S     0     6     1  0  94  19 -     0 ksofti ?        00:00:00 ksoftirqd/1
5 S     0     7     1  0 -40   - -     0 watchd ?        00:00:00 watchdog/1
5 R     0     8     1  0  70  -5 -     0 -      ?        00:00:00 events/0
1 S     0     9     1  0  70  -5 -     0 worker ?        00:00:00 events/1
1 S     0    10     1  0  79  -5 -     0 worker ?        00:00:00 khelper
1 S     0    11     1  0  70  -5 -     0 worker ?        00:00:00 kthread
1 R     0    16    11  0  70  -5 -     0 -      ?        00:00:00 kblockd/0
1 S     0    17    11  0  70  -5 -     0 worker ?        00:00:00 kblockd/1
1 S     0    18    11  0  74  -5 -     0 worker ?        00:00:00 kacpid
1 S     0   103    11  0  70  -5 -     0 serio_ ?        00:00:00 kseriod
1 S     0   166    11  0  80   0 -     0 pdflus ?        00:00:00 pdflush
1 S     0   167    11  0  75   0 -     0 pdflus ?        00:00:00 pdflush
1 S     0   168     1  0  78   0 -     0 kswapd ?        00:00:00 kswapd0
1 S     0   169     1  0  75   0 -     0 kswapd ?        00:00:00 kswapd1
1 S     0   170    11  0  74  -5 -     0 worker ?        00:00:00 aio/0
1 S     0   171    11  0  70  -5 -     0 worker ?        00:00:00 aio/1
1 S     0   770    11  0  70  -5 -     0 worker ?        00:00:00 kpsmoused
1 S     0   818    11  0  70  -5 -     0 worker ?        00:00:00 ata/0
1 S     0   819    11  0  71  -5 -     0 worker ?        00:00:00 ata/1
1 S     0   821    11  0  71  -5 -     0 scsi_e ?        00:00:00 scsi_eh_0
1 S     0   822    11  0  71  -5 -     0 scsi_e ?        00:00:00 scsi_eh_1
1 S     0   823    11  0  71  -5 -     0 scsi_e ?        00:00:00 scsi_eh_2
1 S     0   824    11  0  70  -5 -     0 scsi_e ?        00:00:00 scsi_eh_3
1 S     0   850     1  0  75   0 -     0 -      ?        00:00:00 khpsbpkt
1 S     0   854     1  0  76   0 -     0 -      ?        00:00:00 knodemgrd_0
1 S     0   862    11  0  70  -5 -     0 kjourn ?        00:00:00 kjournald
5 S     0   973     1  0  78  -4 -  1764 -      ?        00:00:00 udevd
1 S     0  2119    11  0  71  -5 -     0 kjourn ?        00:00:00 kjournald
1 S     0  2123    11  0  71  -5 -     0 kjourn ?        00:00:00 kjournald
1 S     0  2129    11  0  71  -5 -     0 kjourn ?        00:00:00 kjournald
1 S     0  2134    11  0  71  -5 -     0 kjourn ?        00:00:00 kjournald
1 S     0  2139    11  0  71  -5 -     0 kjourn ?        00:00:00 kjournald
1 S     0  2144    11  0  70  -5 -     0 kjourn ?        00:00:00 kjournald
1 S     0  2149    11  0  71  -5 -     0 kjourn ?        00:00:00 kjournald
1 S     0  2157    11  0  72  -5 -     0 hub_th ?        00:00:00 khubd
5 S   111  4246     1  0  76   0 -  2248 -      ?        00:00:00 portmap
5 S     0  4314     1  0  84   0 -  5305 -      ?        00:00:00 ypbind
5 S 65534  4384     1  0  84   0 -  1463 -      ?        00:00:00 rpc.statd
1 S     0  4391    11  0  71  -5 -     0 worker ?        00:00:00 rpciod/0
1 S     0  4392    11  0  71  -5 -     0 worker ?        00:00:00 rpciod/1
1 S     0  4393     1  0  85   0 -     0 -      ?        00:00:00 lockd
1 S     0  4394     1  0  76   0 -  1462 -      ?        00:00:00 mount
1 S     0  4453     1  0  76   0 -  1461 -      ?        00:00:00 mount
5 S     0  4514     1  0  76   0 -  4294 -      ?        00:00:00 sshd
0 S     0  4585     1  0  77   0 -   917 -      tty1     00:00:00 agetty
0 S     0  4586     1  0  76   0 -   917 -      tty2     00:00:00 agetty
0 S     0  4587     1  0  76   0 -   917 -      tty3     00:00:00 agetty
0 S     0  4588     1  0  76   0 -   917 -      tty4     00:00:00 agetty
0 S     0  4589     1  0  76   0 -   916 -      tty5     00:00:00 agetty
0 S     0  4590     1  0  76   0 -   916 -      tty6     00:00:00 agetty
4 S     0 14875  4514  0  75   0 -  7073 -      ?        00:00:00 sshd
4 S     0 14878 14875  0  75   0 -  2548 wait   pts/0    00:00:00 bash
0 D     0 26815     1  0  77   0 -     0 exit   pts/0    00:00:00 cc1
4 R     0  9811 14878 86  76   0 -     0 -      pts/0    00:02:59 emerge
4 S     0 17651  4514  0  75   0 -  7036 -      ?        00:00:00 sshd
4 S     0 17654 17651  0  75   0 -  2547 wait   pts/1    00:00:00 bash
0 R     0 17661 17654  0  77   0 -  1019 -      pts/1    00:00:00 ps


odin ~ # cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 252
stepping        : 1
cpu MHz         : 2592.234
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm
3dnowext 3dnow pni lahf_lm
bogomips        : 5189.92
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 252
stepping        : 1
cpu MHz         : 2592.234
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm
3dnowext 3dnow pni lahf_lm
bogomips        : 5184.39
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

* Re: [gentoo-amd64] Problem with emerge on a dual-processor machine
  2006-10-31 18:46 [gentoo-amd64] Problem with emerge on a dual-processor machine Vesna Petrovic
@ 2006-10-31 18:54 ` Bob Sanders
  2006-10-31 23:39 ` [gentoo-amd64] " Duncan
  1 sibling, 0 replies; 3+ messages in thread
From: Bob Sanders @ 2006-10-31 18:54 UTC
  To: gentoo-amd64

Vesna Petrovic, mused, then expounded:
> Hello,
> 
> I've just installed Gentoo on a dual-processor machine and now I'm running
> into the following problem - when I start emerge, it randomly stops and one
> of the following things happens:
>  - the machine freezes completely so that I cannot switch to another
> console or do anything
>  - if I already have multiple ssh sessions open, sometimes one of the
> sessions remains alive, but invoking any command freezes that session. Any
> attempt to kill a process has no effect.
>  - soft lockup detected on at least one cpu.
>

First, try swapping the memory modules around.  If the system
exhibits exactly the same pattern, then swap the CPUs.

My guess is that one of the low memory modules (at or below 1 GB) is flaky,
or that the memory controller on one of the CPUs, probably CPU 0, has
an issue.

Another thing to try: if you're running with 4 DIMMs per CPU, drop
to 2 DIMMs per CPU.
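
Before pulling modules, it can help to see how many DIMM slots are actually populated. A rough sketch, assuming `dmidecode` is installed and that its `-t memory` output lists one `Size:` line per slot, with `No Module Installed` for empty ones. The helper name `count_populated_dimms` is hypothetical, and the count is for the whole board: it does not say which CPU a given DIMM hangs off.

```sh
# Summarise DIMM population from `dmidecode -t memory` output on stdin.
# Only lines that begin with "Size:" (after indentation) are counted, so
# fields such as "Volatile Size:" in newer dmidecode versions are ignored.
count_populated_dimms() {
    awk '
        /^[ \t]*Size:/ {
            if ($0 ~ /No Module Installed/) empty++
            else populated++
        }
        END { printf "populated=%d empty=%d\n", populated, empty }
    '
}
```

Typical use, as root, would be `dmidecode -t memory | count_populated_dimms`.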


Bob 
-  
See http://www.gnu.org/philosophy/no-word-attachments.html
-  
-- 
gentoo-amd64@gentoo.org mailing list


* [gentoo-amd64]  Re: Problem with emerge on a dual-processor machine
  2006-10-31 18:46 [gentoo-amd64] Problem with emerge on a dual-processor machine Vesna Petrovic
  2006-10-31 18:54 ` Bob Sanders
@ 2006-10-31 23:39 ` Duncan
  1 sibling, 0 replies; 3+ messages in thread
From: Duncan @ 2006-10-31 23:39 UTC
  To: gentoo-amd64

"Vesna Petrovic" <vesna.petrovic@gmail.com> posted
60bedadc0610311046q23d77c5fu671c7330a2f09a14@mail.gmail.com, excerpted
below, on  Tue, 31 Oct 2006 13:46:29 -0500:

>  The system has 2 AMD Opeteron Processors 252,  5 disks - 1IDE Maxtor
> 6B200R0 and 4 SCSI Maxtor 6L300S0, and probably irrelevant ATAPI 48X DVD-ROM
> DVD-R CD-R/RW drive, Ethernet controller: Broadcom Corporation NetXtreme
> BCM5703X Gigabit Ethernet (rev 02), RAID bus controller: Silicon Image, Inc.
> SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02), FireWire (IEEE
> 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link),
> VGA compatible controller: ATI Technologies Inc RV350 AP [Radeon 9600]
> 05:00.1, Display controller: ATI Technologies Inc RV350 AP [Radeon 9600]
> (Secondary).
>   Kernel version is 2.6.17 built using gentoo-sources.

I have but a few possibilities to suggest, but I can say I've been using a
similar system since late 2003, upgrading it over time, and know it works.
FWIW, Tyan s2885 mobo here, dual Opteron 242s, four-way SATA RAID on SiI
3114 SATA controller, 8 gig memory, Firewire, Radeon 9200 (so the ATI r200
series not the r300 series you have, mine is AGP), a DVDRW, CDRW, and older
PATA hard drive on PATA, etc.

Possibilities:

I had bum memory for a while.  Twas a mess, until I got a BIOS upgrade that
allowed me to declock it a notch (to pc3000 level from the pc3200 it was
rated at).  A later memory upgrade cured the problem for good.

Then there are the usual hardware suspects: the power supply train
(from the one in the computer itself, to what's coming in on the line, to
the UPS if any), and possibly overclocked CPUs.  With all those hard
drives, it's possible you are simply underpowered.  I think I'm running a
550 watt rated Vantec that's rated to 650-ish spike.

At one point there was an issue with the firewire driver and x86_64 SMP.
Since I'm not using Firewire for anything here, IIRC I disabled it in BIOS
and disabled the kernel driver for it as well.  I've not followed up but
I'd /guess/ the problem is fixed by now.  Still, it's worth trying that,
and disabling anything else you don't actually use (possibly USB, since
you seem to be in a server environment).
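
To see whether a FireWire driver is loaded as a module before disabling it, something like the sketch below may help. It assumes the module names of the old ieee1394 stack used by 2.6-era kernels (`ieee1394`, `ohci1394`, `sbp2`, `raw1394`, `eth1394`, `video1394`, `dv1394`); those names are not given anywhere in this thread, and the function name `loaded_firewire_modules` is made up. If the driver is built into the kernel rather than modular, nothing will be listed and the kernel config has to be checked instead. The `khpsbpkt` and `knodemgrd_0` threads in the original `ps` listing appear to belong to that stack, which suggests it is active on the reported system.

```sh
# List loaded modules that belong to the old ieee1394 FireWire stack.
# The optional file argument is an assumption added for testing; it
# defaults to /proc/modules, where the first field is the module name.
loaded_firewire_modules() {
    modules_file="${1:-/proc/modules}"
    awk '$1 ~ /^(ieee1394|ohci1394|sbp2|raw1394|eth1394|video1394|dv1394)$/ { print $1 }' "$modules_file"
}
```

An empty result from `loaded_firewire_modules` means none of these modules is currently loaded.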

Those are all shots in the dark, but this is mainly to say that what I
have here seems to work very well, at least since I fixed the memory
issue, which was simply generic memory not worth its rating and an
inability at the time to declock it in BIOS.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



