public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] MCE in kernel
@ 2007-09-01  1:08 Alan E. Davis
  2007-09-01  2:56 ` [gentoo-user] " Alan E. Davis
  2007-09-03 19:13 ` [gentoo-user] " Dan Farrell
  0 siblings, 2 replies; 10+ messages in thread
From: Alan E. Davis @ 2007-09-01  1:08 UTC
  To: gentoo-user

I have been unable to boot into my Gentoo system due to a Machine Check
Exception.  This is an AMD64 system.  MCE for AMD is enabled in the kernel
(2.6.21 gentoo-sources).

I cannot boot normally to turn off MCE checking, although I was able to log
in using single-user mode.  The MCE happens at the end of the loading of the
"default" scripts; at least, that is what I see on the screen just after xdm
has been loaded.

The puzzling part is that I have been installing Ubuntu on another partition,
and it boots fine.

If I have it right, I can download a Gentoo live install disk and compile a
new kernel.  Is there a howto on this specific problem?

Thank you,

Alan Davis

-- 
Alan Davis, Kagman High School, Saipan  lngndvs@gmail.com

"An inviscid theory of flow renders the screw useless, but the need for one
non-existent."
         ---Lord Rayleigh (aka John William Strutt), or else his son,


* [gentoo-user] Re: MCE in kernel
  2007-09-01  1:08 [gentoo-user] MCE in kernel Alan E. Davis
@ 2007-09-01  2:56 ` Alan E. Davis
  2007-09-01  3:39   ` Tim
  2007-09-03 19:13 ` [gentoo-user] " Dan Farrell
  1 sibling, 1 reply; 10+ messages in thread
From: Alan E. Davis @ 2007-09-01  2:56 UTC
  To: gentoo-user

Following up: I removed a troublesome partition that was being checked on
every boot, and I was able to boot OK.  Does this make sense?

Alan


* Re: [gentoo-user] Re: MCE in kernel
  2007-09-01  2:56 ` [gentoo-user] " Alan E. Davis
@ 2007-09-01  3:39   ` Tim
  2007-09-01  4:08     ` Alan E. Davis
  0 siblings, 1 reply; 10+ messages in thread
From: Tim @ 2007-09-01  3:39 UTC
  To: gentoo-user

Alan E. Davis wrote:
> Following up, I removed a troublesome partition that every time was
> being checked on boot, and I was able to boot ok.  Does this make sense?

This makes little sense without knowing what partition you removed and
what you mean by "removing" it - did you take it out of /etc/fstab? Did
you actually repartition your disk? What partition was it, what kind was
it (primary, logical, extended) and what was on it? Hopefully we can be
of more assistance with this info.

-Tim
-- 
gentoo-user@gentoo.org mailing list




* Re: [gentoo-user] Re: MCE in kernel
  2007-09-01  3:39   ` Tim
@ 2007-09-01  4:08     ` Alan E. Davis
  0 siblings, 0 replies; 10+ messages in thread
From: Alan E. Davis @ 2007-09-01  4:08 UTC
  To: gentoo-user

Thank you for the response, Tim.

On 9/1/07, Tim <root@pneumaticsystem.com> wrote:
> This makes little sense without knowing what partition you removed and
> what you mean by "removing" it - did you take it out of /etc/fstab? Did
> you actually repartition your disk? What partition was it, what kind was
> it (primary, logical, extended) and what was on it? Hopefully we can be
> of more assistance with this info.


I removed the partition's entry from /etc/fstab.  The partition is /dev/sda1,
on a SATA drive, with about 20% fragmentation.  I moved everything off the
drive and will reformat it, making sure it uses ext3 or another journaling
filesystem.  Something was triggering a check on every boot (a message saying
the partition was not properly mounted; I don't have access to the exact
message now).

I wonder whether this kind of hardware issue might trigger the Machine Check
Exception.
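
For readers following along: the check on every boot described above is governed in part by the sixth field of each /etc/fstab entry (fs_passno), where 0 means the filesystem is never checked at boot. Below is a minimal sketch, assuming a standard six-field fstab; the function name fstab_checked is made up for illustration and is not a standard tool.

```shell
#!/bin/sh
# fstab_checked is a hypothetical helper name, not a standard tool.
# It lists fstab entries that are eligible for a boot-time fsck.
# Usage: fstab_checked [FILE]   (defaults to /etc/fstab)
fstab_checked() {
    # Skip comment lines and lines with fewer than six fields, then
    # print device, mount point and pass number for every entry whose
    # sixth field (fs_passno) is non-zero.
    awk '$1 !~ /^#/ && NF >= 6 && $6 + 0 > 0 { print $1, $2, $6 }' \
        "${1:-/etc/fstab}"
}
```

If the partition in question shows up in that list and is ext2 or ext3, `tune2fs -l` on the device prints the superblock fields (filesystem state, mount count, maximum mount count) that usually explain why fsck keeps running. Whether that has anything to do with the MCE is a separate question this sketch does not answer.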

Thank you,

Alan Davis

* Re: [gentoo-user] MCE in kernel
  2007-09-01  1:08 [gentoo-user] MCE in kernel Alan E. Davis
  2007-09-01  2:56 ` [gentoo-user] " Alan E. Davis
@ 2007-09-03 19:13 ` Dan Farrell
  2007-09-03 20:51   ` Alan E. Davis
  1 sibling, 1 reply; 10+ messages in thread
From: Dan Farrell @ 2007-09-03 19:13 UTC
  To: gentoo-user

On Sat, 1 Sep 2007 11:08:27 +1000
"Alan E. Davis" <lngndvs@gmail.com> wrote:

> I have been unable to boot into my gentoo system due to a Machine
> Check Exception.  This is an AMD 64 system.  MCE for AMD is enabled
> in the kernel (2.6.21 gentoo-sources).
> 
> I am unable to boot in to turn off MCE checking.  

Did you know you can disable this at boot time?  Check it out:

| $ grep mce /usr/src/linux/Documentation/kernel-parameters.txt
|       mce             [IA-32] Machine Check Exception
|       nomce           [IA-32] Machine Check Exception

Just add 'nomce' to your kernel boot line in grub and you should be able
to boot with MCE turned off to reconfigure.
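
A small sketch of how one might confirm that the parameter actually reached the kernel after editing the grub entry. The helper name has_boot_param is hypothetical; /proc/cmdline is the standard place where Linux exposes the boot command line. Note that the grep output above tags the option [IA-32]; whether a given x86-64 kernel honours 'nomce' (or expects 'mce=off' instead) is an assumption to verify against the documentation in your own source tree.

```shell
#!/bin/sh
# has_boot_param is a hypothetical helper: it succeeds if PARAM appears
# as a whole word in a kernel command line string.
# Usage: has_boot_param "CMDLINE" PARAM
has_boot_param() {
    # Pad both sides with spaces so that "nomce" does not match a
    # longer word such as "nomcefoo".
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Check the running kernel, if /proc/cmdline is readable.
if [ -r /proc/cmdline ] && has_boot_param "$(cat /proc/cmdline)" nomce; then
    echo "booted with nomce: machine check reporting is disabled"
fi
```

This only reports what was passed at boot; it says nothing about why the machine check fired in the first place.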
					-- Dan

* Re: [gentoo-user] MCE in kernel
  2007-09-03 19:13 ` [gentoo-user] " Dan Farrell
@ 2007-09-03 20:51   ` Alan E. Davis
  2007-09-03 22:29     ` Dan Farrell
  2007-09-04 15:42     ` Don Jerman
  0 siblings, 2 replies; 10+ messages in thread
From: Alan E. Davis @ 2007-09-03 20:51 UTC
  To: gentoo-user

Thank you.  I have solved the problem for now, but live in fear that there
is something untoward going on in my hardware.

Earlier on, this was intermittent.  I also wonder whether a register or a
CMOS flag was set, because after I booted the Ubuntu partition, the machine
did boot with no complaint.  It hadn't been going on long, though.  In the
end I was able to boot using an earlier kernel with no MCE flag set, then
recompile a newer kernel without it.

I think your solution is the better one, though.

I did follow the instructions in the boot messages and installed an MCE log
translation utility, but I couldn't make sense of what to do with it.
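
For what it is worth, a hedged sketch of how such a utility is commonly used, assuming it is mcelog (the tool named later in this thread). The helper name mce_lines, the grep pattern, and the log path in the comments are illustrative assumptions, not something taken from the boot messages.

```shell
#!/bin/sh
# mce_lines is a hypothetical helper: it prints log lines that look
# MCE-related so they can be handed to a decoder. The pattern is a
# guess at the wording used in kernel logs, not an exhaustive match.
# Usage: mce_lines FILE
mce_lines() {
    grep -i -E 'machine check|mce' "$1"
}

# Typical use, as root, on a saved kernel log (the path is an example):
#   mce_lines /var/log/messages | mcelog --ascii
# Or decode events the kernel has queued on the mcelog device:
#   mcelog
```

The decoded output normally names the CPU and bank that reported the error, which helps narrow down whether memory, cache, or the bus is the likely suspect.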

Thank you again,

Alan


* Re: [gentoo-user] MCE in kernel
  2007-09-03 20:51   ` Alan E. Davis
@ 2007-09-03 22:29     ` Dan Farrell
  2007-09-04  5:21       ` Alan E. Davis
  2007-09-04 15:42     ` Don Jerman
  1 sibling, 1 reply; 10+ messages in thread
From: Dan Farrell @ 2007-09-03 22:29 UTC
  To: gentoo-user

On Tue, 4 Sep 2007 06:51:38 +1000
"Alan E. Davis" <lngndvs@gmail.com> wrote:

> I think your solution is the better one, though.
> 
> I did follow the instructions of the boot messages and installed an
> mce log translation utility, but I didn't make sense of what to do
> with it.

The thing is, you are only masking symptoms.  There may be something
wrong, and perhaps you could save a lot of work later by fixing a
problem before it turns catastrophic.  

From http://en.wikipedia.org/wiki/Machine_Check_Exception:

A Machine Check Exception, also called MCE, is a computer hardware
error which occurs when a computer's central processing unit detects an
unrecoverable hardware problem.

Normal causes for MCE errors are overheating and/or incorrect hardware
installation. Overheating can cause electrons to become more animated
and thus escape from the silicon tracks, resulting in corrupted data.
Some specific manually induced causes could be:

- Overclocking (naturally increases heat output)
- Poorly fitted heatsink/computer fans (the same problem can happen with
  excessive dust in the CPU fan)

Computer software can also cause errors in this way (normally by
corrupting data they are reading or writing). For example:

- Software performing read or write operations to non-existent memory
  regions, which leads to confusion for the processor and/or the system
  bus.

3rd party programs:

- mcelog: a Linux program to decode MCEs on x86-64 processors


* Re: [gentoo-user] MCE in kernel
  2007-09-03 22:29     ` Dan Farrell
@ 2007-09-04  5:21       ` Alan E. Davis
  0 siblings, 0 replies; 10+ messages in thread
From: Alan E. Davis @ 2007-09-04  5:21 UTC
  To: gentoo-user

Thank you Dan:

I'll look into this.  Time to tear the old box apart again.

Thank you again.

Alan


* Re: [gentoo-user] MCE in kernel
  2007-09-03 20:51   ` Alan E. Davis
  2007-09-03 22:29     ` Dan Farrell
@ 2007-09-04 15:42     ` Don Jerman
  2007-09-04 20:41       ` Alan E. Davis
  1 sibling, 1 reply; 10+ messages in thread
From: Don Jerman @ 2007-09-04 15:42 UTC
  To: gentoo-user

On 9/3/07, Alan E. Davis <lngndvs@gmail.com> wrote:
> Thank you.  I have solved the problem for now, but live in fear that there
> is something untoward going on in my hardware.
>
Quite possible.  It can also be caused by misconfigured kernel
drivers.  I recently (accidentally) selected the ATI agpgart driver
instead of the Intel driver.  Most drivers correctly detect when their
corresponding device isn't present, but this one gamely tried to
manage the AGP bridge and fouled up memory whenever X started...

So you may want to review your kernel config and make sure the drivers
you have selected match the devices you're actually using.
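
One way to do that review from a shell, as a rough sketch: list which options of a given family are enabled in the kernel .config. The helper name enabled_opts is hypothetical, and the default path assumes the usual /usr/src/linux symlink points at the tree you built from.

```shell
#!/bin/sh
# enabled_opts is a hypothetical helper: it lists options starting with
# PREFIX that are built in (=y) or modular (=m) in a kernel .config.
# Usage: enabled_opts PREFIX [FILE]
enabled_opts() {
    grep -E "^${1}[A-Z0-9_]*=(y|m)\$" "${2:-/usr/src/linux/.config}"
}

# Example: which AGP chipset drivers are selected?
#   enabled_opts CONFIG_AGP
# Example: which machine check options are selected?
#   enabled_opts CONFIG_X86_MCE
```

Comparing the AGP result against the bridge that `lspci` reports should show quickly whether a driver for the wrong chipset has been selected.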

* Re: [gentoo-user] MCE in kernel
  2007-09-04 15:42     ` Don Jerman
@ 2007-09-04 20:41       ` Alan E. Davis
  0 siblings, 0 replies; 10+ messages in thread
From: Alan E. Davis @ 2007-09-04 20:41 UTC
  To: gentoo-user

Thank you.  I noticed that when I ran "make oldconfig" on a new kernel, the
configs were not what I'd expected.  The wrong CPU type was configured.
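
A sketch of how to see exactly what changed between the old and new configuration. The helper name config_changes is hypothetical. One common cause of surprises like this, offered here as an assumption rather than something the thread confirms, is running "make oldconfig" without first copying the old .config into the new source tree.

```shell
#!/bin/sh
# config_changes is a hypothetical helper: it shows CONFIG_ lines that
# appear in only one of two kernel .config files. Options that are
# "not set" are stored as comments, so they show up only on the side
# where they are enabled.
# Usage: config_changes OLD NEW
config_changes() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    grep '^CONFIG_' "$1" | sort > "$old_sorted"
    grep '^CONFIG_' "$2" | sort > "$new_sorted"
    # comm -3 drops lines common to both files. Lines found only in
    # OLD start at the margin; lines found only in NEW are indented
    # with a tab.
    comm -3 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

On an AMD64 machine one would expect a processor option such as CONFIG_MK8 to survive the upgrade; if it appears only in the OLD column, the CPU type was indeed reset.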

Alan

