* [gentoo-user] SSD or MoBo playing up?
From: Michael @ 2020-06-18 22:47 UTC
To: gentoo-user
It started thus:
Jun 18 17:52:45 asus kernel: ata1.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
Jun 18 17:52:45 asus kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jun 18 17:52:45 asus kernel: ata1.00: cmd 61/20:08:60:f1:90/00:00:02:00:00/40 tag 1 ncq dma 16384 out\x0a res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 18 17:52:45 asus kernel: ata1.00: status: { DRDY }
Jun 18 17:52:45 asus kernel: ata1: hard resetting link
Jun 18 17:52:55 asus kernel: ata1: softreset failed (1st FIS failed)
Jun 18 17:52:55 asus kernel: ata1: hard resetting link
Jun 18 17:53:05 asus kernel: ata1: softreset failed (1st FIS failed)
Jun 18 17:53:05 asus kernel: ata1: hard resetting link
Jun 18 17:53:40 asus kernel: ata1: softreset failed (1st FIS failed)
Jun 18 17:53:40 asus kernel: ata1: limiting SATA link speed to 3.0 Gbps
Jun 18 17:53:40 asus kernel: ata1: hard resetting link
Jun 18 17:53:45 asus kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Jun 18 17:53:45 asus kernel: ata1.00: link online but device misclassified
Jun 18 17:53:51 asus kernel: ata1.00: qc timeout (cmd 0xec)
Jun 18 17:53:51 asus kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 18 17:53:51 asus kernel: ata1.00: revalidation failed (errno=-5)
Jun 18 17:53:51 asus kernel: ata1: hard resetting link
Jun 18 17:54:01 asus kernel: ata1: softreset failed (1st FIS failed)
Jun 18 17:54:01 asus kernel: ata1: hard resetting link
Jun 18 17:54:11 asus kernel: ata1: softreset failed (1st FIS failed)
Jun 18 17:54:11 asus kernel: ata1: hard resetting link
Jun 18 17:54:46 asus kernel: ata1: softreset failed (1st FIS failed)
Jun 18 17:54:46 asus kernel: ata1: limiting SATA link speed to 1.5 Gbps
Jun 18 17:54:46 asus kernel: ata1: hard resetting link
Jun 18 17:54:51 asus kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 18 17:54:51 asus kernel: ata1.00: link online but device misclassified
Jun 18 17:55:01 asus kernel: ata1.00: qc timeout (cmd 0xec)
Jun 18 17:55:01 asus kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 18 17:55:01 asus kernel: ata1.00: revalidation failed (errno=-5)
Jun 18 17:55:01 asus kernel: ata1: hard resetting link
Jun 18 17:55:11 asus kernel: ata1: softreset failed (1st FIS failed)
[snip ...]
Jun 18 17:57:32 asus kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 18 17:57:32 asus kernel: ata1.00: link online but device misclassified
Jun 18 17:57:32 asus kernel: ata1: EH complete
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#15 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#15 CDB: Write(10) 2a 00 03 b5 5d 18 00 00 08 00
Jun 18 17:57:32 asus kernel: blk_update_request: I/O error, dev sda, sector 62217496 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jun 18 17:57:32 asus kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#16 CDB: Write(10) 2a 00 08 7b 76 68 00 00 08 00
Jun 18 17:57:32 asus kernel: blk_update_request: I/O error, dev sda, sector 142308968 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Jun 18 17:57:32 asus kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 02 90 f1 60 00 00 20 00
Jun 18 17:57:32 asus kernel: blk_update_request: I/O error, dev sda, sector 43053408 op 0x1:(WRITE) flags 0x1800 phys_seg 4 prio class 0
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#24 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00
Jun 18 17:57:32 asus kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Jun 18 17:57:32 asus kernel: BTRFS: error (device sda3) in btrfs_commit_transaction:2280: errno=-5 IO failure (Error while writing out transaction)
Jun 18 17:57:32 asus kernel: BTRFS info (device sda3): forced readonly
Jun 18 17:57:32 asus kernel: BTRFS warning (device sda3): Skipping commit of aborted transaction.
Jun 18 17:57:32 asus kernel: BTRFS: error (device sda3) in cleanup_transaction:1832: errno=-5 IO failure
Jun 18 17:57:32 asus kernel: BTRFS info (device sda3): delayed_refs has NO entry
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#18 CDB: Read(10) 28 00 01 2b 3b c0 00 00 20 00
Jun 18 17:57:32 asus kernel: blk_update_request: I/O error, dev sda, sector 19610560 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Jun 18 17:57:32 asus kernel: BTRFS error (device sda2): bdev /dev/sda2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jun 18 17:57:32 asus kernel: BTRFS info (device sda2): no csum found for inode 14036317 start 0
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#4 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 18 17:57:32 asus kernel: sd 0:0:0:0: [sda] tag#4 CDB: Read(10) 28 00 01 2b 3b c0 00 00 20 00
Jun 18 17:57:32 asus kernel: blk_update_request: I/O error, dev sda, sector 19610560 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
[snip ...]
I shut it down, rebooted with LiveUSB, btrfsck'ed it and ... no errors. :-/
I fstrim'med it, fsck'ed it once more and ran a smartctl short test. Still no
errors.
/dev/sda is an SSD, with /dev/sda2 being mounted as / and /dev/sda3 mounted as
/home.
I took a fresh backup of /home, just in case, but I don't know what to make
of the above. Do you see something familiar in the syslog errors shown above?
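
For anyone reading the sd lines in the log, the CDB bytes can be cross-checked against the sector numbers in the blk_update_request lines. The following is a minimal sketch, assuming the standard 10-byte READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The function name decode_rw10 is made up for illustration and is not part of any tool mentioned in this thread.

```python
# Sketch: decode a 10-byte SCSI READ(10)/WRITE(10) CDB as printed by the kernel.
# decode_rw10 is a hypothetical helper name, not an existing API.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(cdb_hex: str):
    """Return (operation, starting LBA, block count) for a 10-byte CDB."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] not in OPCODES:
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return OPCODES[cdb[0]], lba, blocks


# CDBs copied from the tag#15, tag#16 and tag#18 lines of the log.
for cdb in ("2a 00 03 b5 5d 18 00 00 08 00",
            "2a 00 08 7b 76 68 00 00 08 00",
            "28 00 01 2b 3b c0 00 00 20 00"):
    print(decode_rw10(cdb))
```

The decoded LBAs should line up with the sector numbers the block layer reports for the same requests (62217496, 142308968 and 19610560). That suggests these were ordinary filesystem reads and writes that failed once the device stopped answering, rather than I/O aimed at one particular bad region.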
* Re: [gentoo-user] SSD or MoBo playing up?
From: Adam Carter @ 2020-06-19 0:28 UTC
To: gentoo-user
On Fri, Jun 19, 2020 at 8:48 AM Michael <confabulate@kintzios.com> wrote:
> It started thus:
>
> Jun 18 17:52:45 asus kernel: ata1.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
> Jun 18 17:52:45 asus kernel: ata1.00: failed command: WRITE FPDMA QUEUED
> Jun 18 17:52:45 asus kernel: ata1.00: cmd 61/20:08:60:f1:90/00:00:02:00:00/40 tag 1 ncq dma 16384 out\x0a res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> <snip>
>
What does smartctl -a /dev/sda report?
Tried replacing the cable?
* Re: [gentoo-user] SSD or MoBo playing up?
From: mad.scientist.at.large @ 2020-06-19 0:59 UTC
To: Gentoo User
You might also try a known-good power supply.
If you can't do that, definitely try the drive in another system, and/or try another drive with the current motherboard. With the errors changing over time, the cause could easily be nearly any component in the system. Symptoms are sometimes misleading, and troubleshooting is difficult for most people, sometimes even for those who are experienced and usually good at it. Root causes are often obscured, and failures can propagate in unforeseen ways. Testing suspect components in another system and/or substituting known-good components into the failing system are the most useful troubleshooting techniques, but you have to keep track of what you've tested and what each result suggested, and recheck when something doesn't make sense.
-- “The whole world is watching! The whole world is watching!”
Jun 18, 2020, 18:28 by adamcarter3@gmail.com:
> On Fri, Jun 19, 2020 at 8:48 AM Michael <confabulate@kintzios.com> wrote:
>
>> It started thus:
>>
>> Jun 18 17:52:45 asus kernel: ata1.00: exception Emask 0x0 SAct 0x2 SErr 0x0
>> action 0x6 frozen
>> Jun 18 17:52:45 asus kernel: ata1.00: failed command: WRITE FPDMA QUEUED
>> Jun 18 17:52:45 asus kernel: ata1.00: cmd 61/20:08:60:f1:90/00:00:02:00:00/40
>> tag 1 ncq dma 16384 out\x0a res 40/00:01:00:4f:c2/00:00:00:00:00/00
>> Emask 0x4 (timeout)
>> <snip>
>>
>
> What does smartctl -a /dev/sda report?
>
> Tried replacing the cable?
>
* Re: [gentoo-user] SSD or MoBo playing up?
From: Michael @ 2020-06-19 10:26 UTC
To: gentoo-user
On Friday, 19 June 2020 01:59:55 BST mad.scientist.at.large@tutanota.com
wrote:
> You might also try a known-good power supply.
>
> If you can't do that, definitely try the drive in another system, and/or
> try another drive with the current motherboard. With the errors changing
> over time, the cause could easily be nearly any component in the system.
> Symptoms are sometimes misleading, and troubleshooting is difficult for
> most people, sometimes even for those who are experienced and usually good
> at it. Root causes are often obscured, and failures can propagate in
> unforeseen ways. Testing suspect components in another system and/or
> substituting known-good components into the failing system are the most
> useful troubleshooting techniques, but you have to keep track of what
> you've tested and what each result suggested, and recheck when something
> doesn't make sense.
>
> -- “The whole world is watching! The whole world is watching!”
Thank you both for your replies.
The smartctl output following the short test showed no errors, but it doesn't
present a log of tests either.
Following a reboot, dmesg was happy too.
========================
# smartctl -a /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.4.38-gentoo] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Indilinx Barefoot 3 based SSDs
Device Model: OCZ-ARC100
Serial Number: A22L1061435000660
LU WWN Device Id: 5 e83a97 10000e885
Firmware Version: 1.00
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jun 19 11:12:44 2020 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection:
Disabled.
Self-test execution status: ( 249) Self-test routine in progress...
90% of test remaining.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x1d) SMART execute Offline immediate.
No Auto Offline data collection
support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
recommended polling time: ( 0) minutes.
Extended self-test routine
recommended polling time: ( 0) minutes.
SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Runtime_Bad_Block       0x0000   000   000   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       12417
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       3029
171 Avail_OP_Block_Count    0x0000   100   100   000    Old_age   Offline      -       79543120
174 Pwr_Cycle_Ct_Unplanned  0x0000   100   100   000    Old_age   Offline      -       42
195 Total_Prog_Failures     0x0000   100   100   000    Old_age   Offline      -       0
196 Total_Erase_Failures    0x0000   100   100   000    Old_age   Offline      -       0
197 Total_Unc_Read_Failures 0x0000   100   100   000    Old_age   Offline      -       0
208 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       1547
210 SATA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       0
224 In_Warranty             0x0000   100   100   000    Old_age   Offline      -       1
233 Remaining_Lifetime_Perc 0x0000   049   049   000    Old_age   Offline      -       49
241 Host_Writes_GiB         0x0000   100   100   000    Old_age   Offline      -       19172
242 Host_Reads_GiB          0x0000   100   100   000    Old_age   Offline      -       13725
249 Total_NAND_Prog_Ct_GiB  0x0000   100   100   000    Old_age   Offline      -       719206231
SMART Error Log Version: 1
No Errors Logged
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Selective Self-tests/Logging not supported
============================
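
If it helps to keep an eye on a few of these counters between tests, the attribute table can be pulled apart with a few lines of Python. This is only a sketch: it assumes the usual ten-column layout of the smartctl attribute table with one attribute per line, the SAMPLE text is a handful of rows copied from the output above, and parse_attributes and WATCH are illustrative names rather than anything smartmontools provides.

```python
# Sketch: extract raw values from a smartctl attribute table.
# Assumes ten whitespace-separated columns per row, with RAW_VALUE last.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Runtime_Bad_Block       0x0000   000   000   000    Old_age   Offline      -       0
197 Total_Unc_Read_Failures 0x0000   100   100   000    Old_age   Offline      -       0
210 SATA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       0
233 Remaining_Lifetime_Perc 0x0000   049   049   000    Old_age   Offline      -       49
"""


def parse_attributes(text: str) -> dict:
    """Map attribute name -> raw value string for every numeric-ID row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # at most 10 fields; raw value keeps any spaces
        if len(fields) == 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9].strip()
    return attrs


# Hypothetical watch list: counters one might compare before and after a test.
WATCH = ("Runtime_Bad_Block", "Total_Unc_Read_Failures", "SATA_CRC_Error_Count")

attrs = parse_attributes(SAMPLE)
nonzero = {name: attrs[name] for name in WATCH if attrs.get(name, "0") != "0"}
print(nonzero or "watched counters are all zero")
```

Zero counters here don't by themselves clear the cable, the port or the PSU: a drive that stops answering altogether may never get the chance to log anything, so the syslog remains the primary evidence.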
I'm running a long test now to see if it comes up with anything. Reading the
earlier errors in syslog, the SATA link failed; the kernel tried resetting it
and downgraded the speed to 3.0 Gbps and then to 1.5 Gbps, but the link kept
malfunctioning, which ultimately affected the filesystem.
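
The SStatus values in the "SATA link up" lines encode the same downgrade. Below is a rough sketch of decoding them, assuming the kernel prints the register in hexadecimal and that the fields follow the SATA layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The helper name decode_sstatus and the short field descriptions are mine, for illustration only.

```python
# Sketch: decode the SATA SStatus register value shown in libata messages.
# Assumption: the value in the log ("SStatus 123") is hexadecimal.
SPD = {0: "no negotiated speed", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
DET = {0: "no device detected, no PHY link",
       1: "device detected, PHY link not established",
       3: "device detected, PHY link established",
       4: "PHY offline"}
IPM = {0: "not present / no link", 1: "active", 2: "partial", 6: "slumber"}


def decode_sstatus(hex_value: str) -> dict:
    """Split an SStatus value into its DET, SPD and IPM fields."""
    value = int(hex_value, 16)
    det = value & 0xF          # bits 0-3: device detection
    spd = (value >> 4) & 0xF   # bits 4-7: negotiated interface speed
    ipm = (value >> 8) & 0xF   # bits 8-11: interface power management state
    return {
        "det": det, "spd": spd, "ipm": ipm,
        "detection": DET.get(det, "reserved"),
        "speed": SPD.get(spd, "reserved"),
        "power": IPM.get(ipm, "reserved"),
    }


for raw in ("123", "113"):  # values copied from the syslog excerpt
    print(raw, decode_sstatus(raw))
```

Under those assumptions both values report an established PHY link in the active state, at 3.0 Gbps and then 1.5 Gbps. That fits the "link online but device misclassified" and IDENTIFY timeout messages: the link itself trained, but the device behind it did not respond to commands.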
I've reseated the cables just for good measure. I hope the PSU is still OK,
because I'd rather spend the money on a new PC.