From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gentoo-dev-help@gentoo.org; run by ezmlm
Precedence: bulk
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-dev@gentoo.org
Date: Tue, 11 May 2004 22:42:26 -0400
From: Josh Glover
To: Gentoo Dev
Message-ID: <20040512024226.GB16857%jmglov@jmglov.net>
References: <200405111407.58909.gentoo-dev@gnosys.biz>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="yEPQxsgoJgBvi8ip"
Content-Disposition: inline
In-Reply-To: <200405111407.58909.gentoo-dev@gnosys.biz>
User-Agent: Mutt/1.5.4i-ja.1
Subject: Re: [gentoo-dev] Major MCE problem with SMP on Gentoo kernels

--yEPQxsgoJgBvi8ip
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Quoth Kevin (Tue 2004-05-11 02:07:58PM -0400):

> In summary, my problem is this: of those that I've tried, I can't get any
> Gentoo kernel to handle SMP operation during major CPU activity (like
> emerging packages) for more than about 5 or 10 minutes.
> Invariably, during such activity, I get a kernel panic---most often with
> words on the console about Machine Check Exception 000000...004 (this
> number from memory so it may be off).

[...]

> The only way that I can get reliable, stable operation with a Gentoo
> kernel and distribution is if I build a kernel without support for SMP.

[...]

> The machine is a Dell PowerEdge1600SC with a PERC-3/SC SCSI RAID
> controller (using AMI megaraid2 driver) and a LSI Logic Corp controller
> (using Fusion MPT base driver) for the SCSI DAT and with dual 2.4GHz Xeon
> processors, each having a 512KB L2 Cache.

Running Gentoo with a 2.6.5 SMP kernel on a Dell PowerEdge 400SC:

: jmglov@jglover; uname -a
Linux jglover 2.6.5-gentoo-r1 #1 SMP Fri Apr 30 17:37:18 EDT 2004 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux

> Output from /proc/cpuinfo is:

: jmglov@jglover; cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping        : 9
cpu MHz         : 2395.027
cache size      : 512 KB
physical id     : 0
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips        : 4718.59

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping        : 9
cpu MHz         : 2395.027
cache size      : 512 KB
physical id     : 0
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips        : 4767.74

> I'm a recent Gentoo convert.
> I think it's an excellent improvement on the
> traditional Linux distros, and I'd really like to use it on my server,
> but as long as this problem with SMP is present, I just can't.

I really do not think it is a Gentoo issue. I have run Gentoo on quite a
few SMP boxen over the past several years, and never had problems like you
describe. Sounds like a hardware issue to me, unless you are using (or were
using) some really bogus CFLAGS.

> PS. FWIW, I'll add that I have a very vague memory while watching text fly
> up the screen during bootstrap.sh or emerge system (this was a stage 1
> install before I installed SuSE over it) of seeing some warning about
> something being unsafe with SMP. Do I need to have some setting or other
> turned off for some parts of a stage 1 install with a dual CPU system?

Nope.

Quoth Kevin (Tue, 11 May 2004 15:38:35 -0400):

> Ok. Thanks for the suggestion. But what about this: Dell has a utility
> partition and some programs for doing exhaustive testing of all the
> hardware in the server. If I run the most thorough set of tests
> available in this utility partition and I get a clean bill of health,
> is that a reliable indication that there are no hardware problems?

Nope. Tragically, it usually works the other way around: hardware test
suites are unlikely to give you a false positive, but if your hardware
passes, that does not mean you are safe. Your issue might be heat-related,
and your CPUs have to heat up for quite some time before they choke.
Combine this with some other issue (maybe you optimised a bit aggressively
when building your kernel?), and you have a tricky issue for a hardware
tester to catch.
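
The kind of sustained load that might be needed to warm the CPUs can be
approximated with a short script. What follows is only a sketch, assuming
Python 3 is installed on the box; `load_all_cpus` and `WORKER` are
illustrative names, not part of any Gentoo or Dell tool. It starts one
busy-loop process per logical CPU and waits for them all to finish. It
generates load only: it does not read temperatures or detect machine checks,
so the console (or `dmesg`) still has to be watched while it runs.

```python
import os
import subprocess
import sys

# Code run by each child process: spin on integer arithmetic until the
# deadline (given in seconds as the first argument) has passed.
WORKER = (
    "import sys, time\n"
    "deadline = time.monotonic() + float(sys.argv[1])\n"
    "x = 1\n"
    "while time.monotonic() < deadline:\n"
    "    x = (x * 1103515245 + 12345) % 2147483648\n"
)


def load_all_cpus(seconds, workers=None):
    """Start one busy-loop process per logical CPU and return their exit codes.

    `workers` defaults to the number of logical CPUs the OS reports, which
    counts Hyper-Threading siblings as separate CPUs.
    """
    if workers is None:
        workers = os.cpu_count() or 1
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER, str(seconds)])
        for _ in range(workers)
    ]
    # Block until every worker has finished its loop.
    return [p.wait() for p in procs]
```

Calling `load_all_cpus(600)` would keep every logical CPU busy for roughly
ten minutes, about the window in which the panics were reported. A pure
arithmetic loop is an assumption about what triggers the fault; it does not
exercise memory or disk the way a real emerge does.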

Quoth Kevin (Tue, 11 May 2004 17:31:32 -0400):

> Honestly, I'm thinking that I may have somehow built some software
> (during the stage 1 installation process) that is causing these
> problems, but I followed the Gentoo Handbook for doing a stage 1
> installation pretty rigidly, so I'm not sure what I might have done to
> cause that.

Why did you do a Stage 1, just out of curiosity? I recommend doing at
least one Stage 1 install for newcomers to Gentoo, just for educational
purposes, but after that, go Stage 3 and use as many binary packages as
you can! There exists a Stage 3 tarball for your architecture--the
Pentium4 one, so why not use that, just to make sure your base system is
solid?

> When I did the bootstrap.sh and emerge system, I was
> running the kernel that I booted from the boot CD (2004.0 I think, and
> probably even the smp kernel that was on that CD---IIRC, the 2004.1
> boot CD has some problems that prevent the use of the smp kernel on
> that CD).

I don't remember that, but I cannot say for certain that I have tried
the 2004.1 universal x86 CD with the SMP kernel.

> Are there some compiler flags or other configurable settings that, if
> set to certain values during the bootstrap.sh or emerge system steps,
> could end up generating software (perhaps when I built my own gcc?)
> that would cause these MCEs to be thrown?

I dunno, why don't you post your CFLAGS and MAKEOPTS from your make.conf
here?

Quoth Kevin (Tue, 11 May 2004 17:37:39 -0400):

> On Tuesday 11 May 2004 15:38, Paul de Vrieze wrote:
>
>> What if you take the kernel from SUSE,
>
> I haven't tried installing Gentoo with my SuSE kernel running. Huh...
> what a concept. With all the modularity of those default distro
> kernels, would that even work? Maybe I'd need the kernel, the
> System.map, and the /lib/modules/`uname -r` directory?

Yes, you can install Gentoo while running *any* kernel. As long as you can
chroot, you can install Gentoo.
See my Faketoo for an example:

http://forums.gentoo.org/viewtopic.php?p=1082580

Note that I do not actually build a kernel and set up the bootloader and so
forth, since I do not need to boot my jailed Gentoo installation--it is just
for ebuild development. However, nothing is stopping *you* from doing it. :)

-- 
Josh Glover

GPG keyID 0xDE8A3103 (C3E4 FA9E 1E07 BBDB 6D8B 07AB 2BF1 67A1 DE8A 3103)
gpg --keyserver pgp.mit.edu --recv-keys DE8A3103

--yEPQxsgoJgBvi8ip
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQFAoY8SK/Fnod6KMQMRAggkAJ9g+Hx2bCshuqwMPX7a7ixvXCQ3twCfUyUR
BpRYEKYsStNzdIFyHtDY/zY=
=cDKf
-----END PGP SIGNATURE-----

--yEPQxsgoJgBvi8ip--