* [gentoo-user] hw problems
@ 2019-07-22 9:46 pat
2019-07-22 10:02 ` Adam Carter
0 siblings, 1 reply; 7+ messages in thread
From: pat @ 2019-07-22 9:46 UTC
To: Gentoo User
[-- Attachment #1: Type: text/plain, Size: 625 bytes --]
Hi,
Since last week my Gentoo installation has started to freeze randomly. The
first freeze I noticed happened during heavy disk usage, another during
hibernation, etc. I think it might be related to HDD problems, but I
want to be sure. There are some errors in the kernel log, but I can't
tell whether they cause the freezing or not (I've seen such messages
before, so I'm not sure). So, is there a good diagnostic tool to
check the hardware, mainly the HDD? What I need to decide is whether buying
a new HDD will fix the issue.
Thanks a lot
Pat
----------------------------------------
Freehosting PIPNI - http://www.pipni.cz/
[-- Attachment #2: kern.log.0.gz --]
[-- Type: application/x-gzip, Size: 86444 bytes --]
* Re: [gentoo-user] hw problems
2019-07-22 9:46 [gentoo-user] hw problems pat
@ 2019-07-22 10:02 ` Adam Carter
2019-07-22 10:29 ` Raffaele Belardi
0 siblings, 1 reply; 7+ messages in thread
From: Adam Carter @ 2019-07-22 10:02 UTC
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 646 bytes --]
On Mon, Jul 22, 2019 at 7:47 PM <pat@xvalheru.org> wrote:
> Hi,
>
> Since last week my gentoo installation start to randomly freeze. The
> first I've detected was during huge disk usage, another was during
> hibernation, etc. I think it might be related to HDD problems, but I
> want to be sure. In kernel log there are some errors, but I'm not able
> to decide if those causes the freezing or not (I've saw such messages
> earlier too, so I'm not sure). So, is there a good diagnostic tool to
> check HW and mainly HDD? What I need to decide is if buying new HDD will
> fix the issue or not
Install smartmontools then
# smartctl -a /dev/sda
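Besides the printed report, smartctl's exit status is a bitmask that summarises what it found (see the "EXIT STATUS" section of the smartctl man page). Below is a minimal Python sketch that decodes it. The function name `decode_smartctl_exit` is my own, and the commented-out call assumes the disk is /dev/sda as in the command above.

```python
# Meaning of each bit (0..7) in smartctl's exit status, paraphrased from smartctl(8).
SMARTCTL_EXIT_BITS = [
    "command line did not parse",
    "device open failed or device did not identify itself",
    "a SMART or ATA command failed, or a checksum error was found",
    "SMART status check returned DISK FAILING",
    "prefail attributes found at or below threshold",
    "attributes were at or below threshold at some time in the past",
    "device error log contains records of errors",
    "self-test log contains records of errors",
]


def decode_smartctl_exit(status):
    """Return the list of conditions flagged in a smartctl exit status."""
    return [msg for bit, msg in enumerate(SMARTCTL_EXIT_BITS) if status & (1 << bit)]


# Example use (needs root and smartmontools installed):
#   import subprocess
#   result = subprocess.run(["smartctl", "-a", "/dev/sda"])
#   for condition in decode_smartctl_exit(result.returncode):
#       print(condition)
```

An exit status of 0 means none of these conditions was flagged. That alone does not prove the disk is healthy, so the attribute table and error log in the report are still worth reading.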
* Re: [gentoo-user] hw problems
2019-07-22 10:02 ` Adam Carter
@ 2019-07-22 10:29 ` Raffaele Belardi
2019-07-22 10:53 ` Mick
0 siblings, 1 reply; 7+ messages in thread
From: Raffaele Belardi @ 2019-07-22 10:29 UTC
To: gentoo-user
Adam Carter wrote:
> On Mon, Jul 22, 2019 at 7:47 PM <pat@xvalheru.org> wrote:
>
> Hi,
>
> Since last week my gentoo installation start to randomly freeze. The
> first I've detected was during huge disk usage, another was during
> hibernation, etc. I think it might be related to HDD problems, but I
> want to be sure. In kernel log there are some errors, but I'm not able
> to decide if those causes the freezing or not (I've saw such messages
> earlier too, so I'm not sure). So, is there a good diagnostic tool to
> check HW and mainly HDD? What I need to decide is if buying new HDD will
> fix the issue or not
>
>
> Install smartmontools then
>
> # smartctl -a /dev/sda
>
>
I think Adam answered the OP but I just wanted to understand the kernel log:
- the errors are from device pcieport 0000:00:1c.0
- according to "pci 0000:00:1c.0: [8086:9d14] type 01 class 0x060400", this should be a
PCI bridge.
So the error may come from the bridge itself or from a device attached to the bridge, I
suppose?
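The reading of that log line can be checked mechanically. Here is a small Python sketch that splits the line into its fields and decodes the 24-bit class code into base class, subclass and programming interface. The function name and the two small lookup tables are my own and cover only a few standard class codes, not the full list.

```python
import re

# A few standard PCI class codes; deliberately incomplete.
BASE_CLASSES = {0x01: "mass storage controller", 0x02: "network controller", 0x06: "bridge"}
BRIDGE_SUBCLASSES = {0x00: "host bridge", 0x01: "ISA bridge", 0x04: "PCI-to-PCI bridge"}

# Matches kernel enumeration lines such as:
#   pci 0000:00:1c.0: [8086:9d14] type 01 class 0x060400
LINE_RE = re.compile(
    r"pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\] "
    r"type (?P<type>[0-9a-f]{2}) class 0x(?P<cls>[0-9a-f]{6})"
)


def parse_pci_line(line):
    """Parse one PCI enumeration line; return a dict, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    cls = int(m.group("cls"), 16)
    base, sub, prog_if = cls >> 16, (cls >> 8) & 0xFF, cls & 0xFF
    name = BASE_CLASSES.get(base, "unknown")
    if base == 0x06:
        name = BRIDGE_SUBCLASSES.get(sub, "other bridge")
    return {
        "addr": m.group("addr"),
        "vendor": m.group("vendor"),
        "device": m.group("device"),
        "header_type": int(m.group("type"), 16),
        "base_class": base,
        "subclass": sub,
        "prog_if": prog_if,
        "class_name": name,
    }


info = parse_pci_line("pci 0000:00:1c.0: [8086:9d14] type 01 class 0x060400")
print(info["class_name"])
```

Base class 0x06 with subclass 0x04 is a PCI-to-PCI bridge, which agrees with the type 01 header. So the device at 0000:00:1c.0 is a bridge, and whatever sits behind it would have to be identified separately.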
- the disk is attached to ata1: "ata1.00: ATA-10: ST2000LM015-2E8174, SDM1, max UDMA/133"
- ata1 is "ata1: SATA max UDMA/133 abar m2048@0xd1133000 port 0xd1133100 irq 122"
Is there a way to tell where ata1 is physically attached?
In other words, can one tell from the log whether the error comes from the ata1 device or
something else?
thanks,
raffaele
* Re: [gentoo-user] hw problems
2019-07-22 10:29 ` Raffaele Belardi
@ 2019-07-22 10:53 ` Mick
2019-07-23 0:20 ` Adam Carter
0 siblings, 1 reply; 7+ messages in thread
From: Mick @ 2019-07-22 10:53 UTC
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 1961 bytes --]
On Monday, 22 July 2019 11:29:26 BST Raffaele Belardi wrote:
> Adam Carter wrote:
> > On Mon, Jul 22, 2019 at 7:47 PM <pat@xvalheru.org> wrote:
> > Hi,
> >
> > Since last week my gentoo installation start to randomly freeze. The
> > first I've detected was during huge disk usage, another was during
> > hibernation, etc. I think it might be related to HDD problems, but I
> > want to be sure. In kernel log there are some errors, but I'm not able
> > to decide if those causes the freezing or not (I've saw such messages
> > earlier too, so I'm not sure). So, is there a good diagnostic tool to
> > check HW and mainly HDD? What I need to decide is if buying new HDD will
> > fix the issue or not
> >
> > Install smartmontools then
> >
> > # smartctl -a /dev/sda
>
> I think Adam answered the OP but I just wanted to understand the kernel log:
>
> - the errors are from device pcieport 0000:00:1c.0
> - according to "pci 0000:00:1c.0: [8086:9d14] type 01 class 0x060400", this
> is should be a PCI bridge.
>
> So the error may come from the bridge itself or from a device attached to
> the bridge, I suppose?
I think device [8086:9d14] which errors out is a wireless card ... ?
> - the disk is attached to ata1: "ata1.00: ATA-10: ST2000LM015-2E8174, SDM1,
> max UDMA/133" - ata1 is "ata1: SATA max UDMA/133 abar m2048@0xd1133000 port
> 0xd1133100 irq 122"
>
> Is there a way to understand where the ata1 is physically attached to?
> In other words, can one tell from the log if the error comes from the ata1
> device or something else?
>
> thanks,
>
> raffaele
lspci will show the PCI port, but I think the error is related
to the wireless card, which is also bouncing like mad. I'd check that the correct
driver and firmware are available, especially if the firmware needs to be
configured manually (not all are available in linux-firmware).
--
Regards,
Mick
* Re: [gentoo-user] hw problems
2019-07-22 10:53 ` Mick
@ 2019-07-23 0:20 ` Adam Carter
2019-07-23 0:29 ` Adam Carter
0 siblings, 1 reply; 7+ messages in thread
From: Adam Carter @ 2019-07-23 0:20 UTC
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 553 bytes --]
> I think device [8086:9d14] which errors out is a wireless card ... ?
>
>
Looking up that ID at https://pci-ids.ucw.cz/read/PC/8086/9d14, it shows up
as "Sunrise Point-LP PCI Express Root Port #5".
The closest hit I can find otherwise is:
grep '8086 9d' /usr/share/misc/pci.ids
8086 9d60 100 Series PCH/Sunrise Point PCH I2C0 [Skylake/Kaby Lake LPSS
I2C]
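A likely reason the grep above finds only that entry: in pci.ids, a vendor sits on an unindented line, its devices follow on lines indented with one tab, and only subsystem lines (two tabs) carry a "vendor device" pair on the same line. A pattern like '8086 9d' therefore matches subsystem entries rather than the device itself. Below is a minimal Python sketch of a lookup that follows that layout. The function name is my own, and the two-line sample uses only the names already mentioned in this thread.

```python
def lookup_pci_id(lines, vendor, device):
    """Return (vendor_name, device_name) from pci.ids-formatted lines, or None."""
    vendor, device = vendor.lower(), device.lower()
    vendor_name = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith("\t"):
            # Unindented line: a new vendor block (or another section) begins.
            vendor_name = line[4:].strip() if line[:4].lower() == vendor else None
        elif vendor_name is not None and not line.startswith("\t\t"):
            # Single-tab line: a device belonging to the current vendor.
            if line[1:5].lower() == device:
                return vendor_name, line[5:].strip()
    return None


# Two-line excerpt in pci.ids layout, for illustration only.
SAMPLE = [
    "8086  Intel Corporation\n",
    "\t9d14  Sunrise Point-LP PCI Express Root Port #5\n",
]
print(lookup_pci_id(SAMPLE, "8086", "9d14"))
```

Against the real database, the same function can be fed `open("/usr/share/misc/pci.ids", encoding="utf-8", errors="replace")`, assuming the file lives at the path used in the grep above.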
Still, I like the idea of removing (or disabling in BIOS) the WLAN
interface to see if that helps. Otherwise I'd remove any other
non-essential PCIe cards and reseat any essential ones.
* Re: [gentoo-user] hw problems
2019-07-23 0:20 ` Adam Carter
@ 2019-07-23 0:29 ` Adam Carter
2019-07-23 7:14 ` SOLVED " pat
0 siblings, 1 reply; 7+ messages in thread
From: Adam Carter @ 2019-07-23 0:29 UTC
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 978 bytes --]
Also these look nasty;
Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: Machine check events logged
Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee2000000040110a
Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: TSC 0 ADDR fef1cf80 MISC 43880010086
Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: PROCESSOR 0:806e9 TIME 1563152768 SOCKET 0 APIC 0 microcode 30
Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: Machine check events logged
Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 7: ee2000000040110a
Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: TSC 0 ADDR fef1ff00 MISC 43880010086
Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: PROCESSOR 0:806e9 TIME 1563152768 SOCKET 0 APIC 0 microcode 30
Is the system overheating?
https://en.wikipedia.org/wiki/Machine-check_exception has some links to MCE
decode programs
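Without a full decoder, the top bits of the logged status word can still be read by hand, since the flag positions in IA32_MCi_STATUS are architectural (Intel SDM, machine-check architecture chapter). Here is a rough Python sketch; the function name is my own, and it deliberately stops short of interpreting the error code, which is model-dependent.

```python
# Architectural flag bits in IA32_MCi_STATUS.
MCI_STATUS_FLAGS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # MISC register holds additional information
    "ADDRV": 58,  # ADDR register holds an address for the error
    "PCC": 57,    # processor context may be corrupted
}


def decode_mci_status(status):
    """Split a 64-bit machine-check status word into flags and error codes."""
    flags = {name: bool((status >> bit) & 1) for name, bit in MCI_STATUS_FLAGS.items()}
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


decoded = decode_mci_status(0xEE2000000040110A)
for name, is_set in decoded["flags"].items():
    print(name, is_set)
print(hex(decoded["mca_error_code"]))
```

For the value logged in banks 6 and 7, every flag listed is set except EN. In other words, the hardware reported an uncorrected error with valid ADDR and MISC registers, which matches the ADDR and MISC fields in the log. What the error code itself means is best left to one of the decode programs.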
* SOLVED Re: [gentoo-user] hw problems
2019-07-23 0:29 ` Adam Carter
@ 2019-07-23 7:14 ` pat
0 siblings, 0 replies; 7+ messages in thread
From: pat @ 2019-07-23 7:14 UTC
To: gentoo-user
On 2019-07-23 02:29, Adam Carter wrote:
> Also these look nasty;
> Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: Machine
> check events logged
> Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: CPU 0:
> Machine Check: 0 Bank 6: ee2000000040110a
> Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: TSC 0 ADDR
> fef1cf80 MISC 43880010086
> Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: PROCESSOR
> 0:806e9 TIME 1563152768 SOCKET 0 APIC 0 microcode 30
> Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: Machine
> check events logged
> Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: CPU 0:
> Machine Check: 0 Bank 7: ee2000000040110a
> Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: TSC 0 ADDR
> fef1ff00 MISC 43880010086
> Jul 15 01:07:25 draken-korin kernel: mce: [Hardware Error]: PROCESSOR
> 0:806e9 TIME 1563152768 SOCKET 0 APIC 0 microcode 30
>
> Is the system overheating?
>
> https://en.wikipedia.org/wiki/Machine-check_exception has some links
> to MCE decode programs
Hi,
I want to thank everyone for the suggestions. The problem was that the RAM had
died. The system was slow, which made me think it was the HDD.
Thanks again.
Pat