[gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.

public inbox for gentoo-user@lists.gentoo.org

* [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Grant @ 2013-09-23 18:45 UTC
  To: Gentoo mailing list

Can anyone tell me how to decipher this which has appeared in dmesg?
Google wasn't very helpful.

[Hardware Error]: MC1 Error: Copyback Parity/Victim error.
[Hardware Error]: Error Status: Corrected error, no action required.
[Hardware Error]: CPU:3 (10:2:3) MC1_STATUS[-|CE|-|-|-]: 0x9000000000000171
[Hardware Error]: cache level: L1, tx: INSN, mem-tx: EV

- Grant

* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Paul Hartman @ 2013-09-23 18:59 UTC
  To: Gentoo User

On Mon, Sep 23, 2013 at 1:45 PM, Grant <emailgrant@gmail.com> wrote:
> Can anyone tell me how to decipher this which has appeared in dmesg?
> Google wasn't very helpful.
>
> [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
> [Hardware Error]: Error Status: Corrected error, no action required.
> [Hardware Error]: CPU:3 (10:2:3) MC1_STATUS[-|CE|-|-|-]: 0x9000000000000171
> [Hardware Error]: cache level: L1, tx: INSN, mem-tx: EV

Looks like a machine check error; it detected an error in the L1 cache
on your CPU.

Since it says "Corrected error, no action required", I would not worry
about it, if that makes you feel any better. :)
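For readers wondering why the kernel calls this a corrected error: both the bracketed flag string and the "Corrected error" verdict come from the high bits of the logged status value, 0x9000000000000171. The Python sketch below decodes them using the standard x86 machine-check bit positions (VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57). The constant names mirror the kernel's MCI_STATUS_* macros; the order of the five bracketed fields (Over, UE/CE, MiscV, PCC, AddrV) is an assumption based on mce_amd.c of this era, and only the CE/UE field is directly confirmed by the log.

```python
# Minimal sketch (not kernel code): decode the architectural flag bits of
# the MCi_STATUS value from the log line
#   MC1_STATUS[-|CE|-|-|-]: 0x9000000000000171
MCI_STATUS_VAL = 1 << 63    # register holds a valid error
MCI_STATUS_OVER = 1 << 62   # an earlier error was overwritten
MCI_STATUS_UC = 1 << 61     # uncorrected error; clear means corrected
MCI_STATUS_EN = 1 << 60     # error reporting was enabled for this error
MCI_STATUS_MISCV = 1 << 59  # MCi_MISC holds additional information
MCI_STATUS_ADDRV = 1 << 58  # MCi_ADDR holds the error address
MCI_STATUS_PCC = 1 << 57    # processor context corrupt


def status_flags(status):
    """Rebuild the bracketed flag string; the field order is assumed to be
    Over | UE/CE | MiscV | PCC | AddrV."""
    fields = [
        "Over" if status & MCI_STATUS_OVER else "-",
        "UE" if status & MCI_STATUS_UC else "CE",
        "MiscV" if status & MCI_STATUS_MISCV else "-",
        "PCC" if status & MCI_STATUS_PCC else "-",
        "AddrV" if status & MCI_STATUS_ADDRV else "-",
    ]
    return "[" + "|".join(fields) + "]"


status = 0x9000000000000171
print("valid:", bool(status & MCI_STATUS_VAL))     # valid: True
print("corrected:", not (status & MCI_STATUS_UC))  # corrected: True
print("flags:", status_flags(status))              # flags: [-|CE|-|-|-]
```

Only VAL and EN are set: UC is clear, so the hardware reports the error as corrected, and PCC is clear, so the processor context was not marked as corrupt.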


* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Grant @ 2013-09-23 19:15 UTC
  To: Gentoo mailing list

>> Can anyone tell me how to decipher this which has appeared in dmesg?
>> Google wasn't very helpful.
>>
>> [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
>> [Hardware Error]: Error Status: Corrected error, no action required.
>> [Hardware Error]: CPU:3 (10:2:3) MC1_STATUS[-|CE|-|-|-]: 0x9000000000000171
>> [Hardware Error]: cache level: L1, tx: INSN, mem-tx: EV
>
> Looks like a machine check error; it detected an error in the L1 cache
> on your CPU.
>
> Since it says "Corrected error, no action required", I would not worry
> about it, if that makes you feel any better. :)

It does!  Thank you.

- Grant


* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Ralf Ramsauer @ 2013-09-23 19:41 UTC
  To: gentoo-user

What kind of architecture / CPU?

I suppose it's an AMD CPU, as this error is printed by
"drivers/edac/mce_amd.c":

$ cd /usr/src/linux; fgrep -R "MC1 Error"
drivers/edac/mce_amd.c: pr_emerg(HW_ERR "MC1 Error: ");

$ fgrep -R "Copyback Parity/Victim error"
drivers/edac/mce_amd.c:                 pr_cont("Copyback Parity/Victim error.\n");
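The grep finds where the message text lives; which message gets printed is driven by the low 16 bits of the status value (the MCA error code, here 0x0171). The Python sketch below splits that code into the cache-level (LL, bits 1:0), transaction-type (TT, bits 3:2) and memory-transaction (RRRR, bits 7:4) fields, mirroring by assumption the EC/LL/TT/R4 helper macros in mce_amd.c, and reproduces only the L1/evict branch that prints "Copyback Parity/Victim error.". Bits 7:4 of 0x0171 are 7 and the log prints "EV" for them, which is consistent with the table below; the other table entries and the "RESV" fallback are assumptions, and the real decoder's other branches are deliberately left out.

```python
# Minimal sketch (not kernel code) of how an AMD MC1 error code is split
# into fields. Table contents are assumed from mce_amd.c; the entries at
# the indices used here ("L1", "INSN", "EV") match the log.
LL_MSGS = ["RESV", "L1", "L2", "L3/GEN"]   # cache level, bits 1:0
TT_MSGS = ["INSN", "DATA", "GEN", "RESV"]  # transaction type, bits 3:2
RRRR_MSGS = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]  # bits 7:4
R4_EVICT = 7  # index of "EV" in RRRR_MSGS


def describe_mc1(status):
    code = status & 0xFFFF  # MCA error code (low 16 bits)
    # Memory-hierarchy error codes have the form 0000 0001 RRRR TTLL.
    if (code & 0xFF00) != 0x0100:
        return ["not a memory-hierarchy error"]
    ll = code & 0x3
    tt = (code >> 2) & 0x3
    r4 = (code >> 4) & 0xF
    lines = []
    if ll == 1 and r4 == R4_EVICT:
        lines.append("MC1 Error: Copyback Parity/Victim error.")
    else:
        lines.append("MC1 Error: (other case, not decoded in this sketch)")
    mem_tx = RRRR_MSGS[r4] if r4 < len(RRRR_MSGS) else "RESV"
    lines.append(f"cache level: {LL_MSGS[ll]}, tx: {TT_MSGS[tt]}, mem-tx: {mem_tx}")
    return lines


for line in describe_mc1(0x9000000000000171):
    print(line)
# MC1 Error: Copyback Parity/Victim error.
# cache level: L1, tx: INSN, mem-tx: EV
```

The two printed lines reproduce the first and last "[Hardware Error]" lines of Grant's dmesg output.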

Regards,
--
Ralf


* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Volker Armin Hemmann @ 2013-09-23 20:07 UTC
  To: gentoo-user

On 23.09.2013 20:59, Paul Hartman wrote:
> On Mon, Sep 23, 2013 at 1:45 PM, Grant <emailgrant@gmail.com> wrote:
>> Can anyone tell me how to decipher this which has appeared in dmesg?
>> Google wasn't very helpful.
>>
>> [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
>> [Hardware Error]: Error Status: Corrected error, no action required.
>> [Hardware Error]: CPU:3 (10:2:3) MC1_STATUS[-|CE|-|-|-]: 0x9000000000000171
>> [Hardware Error]: cache level: L1, tx: INSN, mem-tx: EV
> Looks like a machine check error; it detected an error in the L1 cache
> on your CPU.
>
> Since it says "Corrected error, no action required", I would not worry
> about it, if that makes you feel any better. :)

Since those errors are rare, I would worry about it.


* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Ralf Ramsauer @ 2013-09-23 20:24 UTC
  To: gentoo-user

I share this opinion.
The message says, even if the error was corrected, that there's
something dramatically wrong with your (I suppose) CPU.
"Corrected error" might imply that some low-level feature got disabled
in order to prevent further errors.

Does this error appear only once at early boot or frequently?
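One way to answer this is to count how often the status line shows up in the kernel log. The Python sketch below assumes the line format from Grant's dmesg output (optionally preceded by a dmesg timestamp) and tallies occurrences per CPU, bank and status value; the function name tally_mce is hypothetical, and where kernel messages end up on disk depends on your syslog setup. The kernel ring buffer shown by dmesg only holds messages since the last boot (and may have wrapped), so a persistent log is needed to see repeats across reboots.

```python
import re
from collections import Counter

# Matches e.g. "[Hardware Error]: CPU:3 (10:2:3) MC1_STATUS[-|CE|-|-|-]: 0x9000000000000171",
# with or without a leading dmesg timestamp (re.search ignores the prefix).
STATUS_RE = re.compile(
    r"\[Hardware Error\]: CPU:(\d+) \([0-9a-fA-F:]+\) (MC\d+)_STATUS\[[^\]]*\]: (0x[0-9a-fA-F]+)"
)


def tally_mce(lines):
    """Count machine-check status lines per (cpu, bank, status value)."""
    counts = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            cpu, bank, status = m.groups()
            counts[(int(cpu), bank, int(status, 16))] += 1
    return counts


# The four lines Grant posted; in practice, pass an open log file or the
# lines of dmesg output instead (path and capture method are up to you).
sample = [
    "[Hardware Error]: MC1 Error: Copyback Parity/Victim error.",
    "[Hardware Error]: Error Status: Corrected error, no action required.",
    "[Hardware Error]: CPU:3 (10:2:3) MC1_STATUS[-|CE|-|-|-]: 0x9000000000000171",
    "[Hardware Error]: cache level: L1, tx: INSN, mem-tx: EV",
]
for (cpu, bank, status), n in tally_mce(sample).items():
    print(f"CPU {cpu} {bank} status {status:#018x}: seen {n} time(s)")
# CPU 3 MC1 status 0x9000000000000171: seen 1 time(s)
```

A single hit since boot matches "only once"; a count that keeps growing for the same CPU and bank would point to a recurring fault.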

Regards,
--
Ralf


* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Ralf Ramsauer @ 2013-09-23 20:43 UTC
  To: gentoo-user

I had a deeper look into the kernel sources:

Your error message is exactly thrown by
static bool k8_mc1_mce(u16 ec, u8 xec)

So probably you have a K8 ;-)

Have a look at:
http://www.redhat.com/archives/rhelv5-list/2007-October/msg00075.html

It *might* be an error concerning ECC error correction. Did you recently
change any hardware?

Could you attach your /proc/cpuinfo?

Regards,
--
Ralf


* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Grant @ 2013-09-24  8:01 UTC
  To: Gentoo mailing list

[-- Attachment #1: Type: text/plain, Size: 1888 bytes --]

> I had a deeper look into the kernel sources:
>
> Your error message is exactly thrown by
> static bool k8_mc1_mce(u16 ec, u8 xec)
>
> So probably you have a K8 ;-)
>
> Have a look at:
> http://www.redhat.com/archives/rhelv5-list/2007-October/msg00075.html

I read it; that one sounds like a correctable ECC RAM error.

> It *might* be an error concerning ECC error correction. Did you recently
> change any hardware?

No hardware changed in a very long time.

> Could you attach your /proc/cpuinfo?

Sure, I've attached it.  I'm changing hosts and machines shortly, and
I've only seen this error once, so I'm thinking I don't need to take
action.

- Grant



[-- Attachment #2: cpuinfo.txt --]
[-- Type: text/plain, Size: 3544 bytes --]

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
microcode	: 0x1000065
cpu MHz		: 2094.812
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs hw_pstate npt lbrv svm_lock
bogomips	: 4189.62
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor	: 1
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
microcode	: 0x1000065
cpu MHz		: 2094.812
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 4
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs hw_pstate npt lbrv svm_lock
bogomips	: 4189.62
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor	: 2
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
microcode	: 0x1000065
cpu MHz		: 2094.812
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs hw_pstate npt lbrv svm_lock
bogomips	: 4189.62
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor	: 3
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 2352
stepping	: 3
microcode	: 0x1000065
cpu MHz		: 2094.812
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs hw_pstate npt lbrv svm_lock
bogomips	: 4189.62
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
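The attachment makes it possible to check the K8 guess. Assuming the "(10:2:3)" in the MCE log is family:model:stepping printed in hex (which is how the mce_amd.c format string of that era appears to print it), the cpuinfo values cpu family 16, model 2, stepping 3 give exactly that triple once converted to hex. That points to family 0x10 (family 10h, often called K10) rather than K8, which is family 0xf; presumably the kernel reuses the K8 MC1 decoder for later families, which would explain the k8_ prefix, but that is unverified here. The helper name cpu_signatures below is hypothetical.

```python
# Minimal sketch: turn /proc/cpuinfo text into the hex "family:model:stepping"
# triples that the MCE log prints, e.g. "(10:2:3)".
def cpu_signatures(cpuinfo_text):
    """Return the set of 'family:model:stepping' strings (hex) found in the text."""
    sigs = set()
    fields = {}
    # A trailing "" guarantees the last processor block is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if all(k in fields for k in ("cpu family", "model", "stepping")):
                sigs.add("%x:%x:%x" % (int(fields["cpu family"]),
                                       int(fields["model"]),
                                       int(fields["stepping"])))
            fields = {}
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return sigs


# Excerpt of the attached cpuinfo; on a live system you could pass
# open("/proc/cpuinfo").read() instead.
sample = """processor\t: 3
vendor_id\t: AuthenticAMD
cpu family\t: 16
model\t\t: 2
model name\t: Quad-Core AMD Opteron(tm) Processor 2352
stepping\t: 3
"""
print(cpu_signatures(sample))  # {'10:2:3'}
```

Because the result is a set, feeding in all four processor blocks still yields the single signature 10:2:3.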


* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Ralf Ramsauer @ 2013-09-24 10:37 UTC
  To: gentoo-user

A friend of mine told me that AMD also had some trouble with the TLB
(translation lookaside buffer) on that architecture.
Unfortunately, I have no references for that issue.

I would keep an eye on that error, and if your system must be
highly available, I would even change the hardware.

Regards,

--
Ralf

* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Thanasis @ 2013-09-24 12:18 UTC
  To: gentoo-user; +Cc: Ralf Ramsauer

On 09/24/2013 01:37 PM, Ralf Ramsauer wrote:
> A friend of mine told me that AMD also had some trouble with the TLB
> (translation lookaside buffer) on that architecture.
> Unfortunately, I have no references for that issue.

http://en.wikipedia.org/wiki/AMD_10h#TLB_Bug




* Re: [gentoo-user] [Hardware Error]: MC1 Error: Copyback Parity/Victim error.
From: Volker Armin Hemmann @ 2013-09-24 12:24 UTC
  To: gentoo-user

Completely unrelated. He got a cache ECC error, not the TLB bug. Also,
those were fixed quickly, in hardware and software. No problem there.
I hate autocorrection.
