public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] hp H222 SAS controller
@ 2013-07-02 19:56 Stefan G. Weichinger
  2013-07-02 22:42 ` Paul Hartman
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-02 19:56 UTC
  To: gentoo-user


Does anyone use that controller with gentoo?

If yes, which driver/module does support it?

I ordered one for a server and did not really check the facts ;-)

Stefan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [gentoo-user] hp H222 SAS controller
  2013-07-02 19:56 [gentoo-user] hp H222 SAS controller Stefan G. Weichinger
@ 2013-07-02 22:42 ` Paul Hartman
  2013-07-03  7:29   ` Stefan G. Weichinger
  2013-07-11 22:06   ` Stefan G. Weichinger
  0 siblings, 2 replies; 23+ messages in thread
From: Paul Hartman @ 2013-07-02 22:42 UTC
  To: gentoo-user

On Tue, Jul 2, 2013 at 2:56 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
>
> Does anyone use that controller with gentoo?
>
> If yes, which driver/module does support it?
>
> I ordered one for a server and did not really check the facts ;-)

Looks like it uses the LSI SAS2008 chipset (basically LSI controller
with HP branding), so you should enable kernel module mpt2sas
(CONFIG_SCSI_MPT2SAS) and probably some other SAS-related options will
be required as well if you don't already use them.
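One way to confirm that the option actually made it into the kernel configuration is to look for it in the `.config` file (on Gentoo usually `/usr/src/linux/.config`, or the output of `zcat /proc/config.gz` if the running kernel was built with `CONFIG_IKCONFIG_PROC`). The Python sketch below is only an illustration: the helper name `kconfig_value` is hypothetical, and `CONFIG_SCSI_SAS_ATTRS` appears in the sample merely as an assumed example of a "SAS-related option". The real dependency list is best checked in `make menuconfig`.

```python
import re
from typing import Optional

# Enabled options look like "CONFIG_FOO=y", "CONFIG_FOO=m" or "CONFIG_FOO=value".
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# kconfig writes disabled options as "# CONFIG_FOO is not set".
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def kconfig_value(config_text: str, option: str) -> Optional[str]:
    """Return the option's value ('y', 'm', ...) or None if disabled or absent."""
    for raw in config_text.splitlines():
        line = raw.strip()
        for pattern in (_SET, _UNSET):
            m = pattern.match(line)
            # Compare the full name so CONFIG_FOO does not match CONFIG_FOO_BAR.
            if m and m.group(1) == option:
                return m.group(2) if pattern is _SET else None
    return None


# Hypothetical excerpt of a .config, only to show the lookup.
example = """\
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_MPT2SAS=m
# CONFIG_SCSI_MVSAS is not set
"""
for opt in ("CONFIG_SCSI_MPT2SAS", "CONFIG_SCSI_SAS_ATTRS", "CONFIG_SCSI_MVSAS"):
    print(opt, "=", kconfig_value(example, opt) or "not enabled")
```

A value of `m` means the driver is built as a loadable module (here `mpt2sas`), `y` means it is built into the kernel image.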

I actually just installed a card with this same chipset in my Gentoo
machine yesterday! I have not attached disks to it yet, as I am
waiting for the enclosure to be delivered, but so far nothing froze or
burst into flames when the module loaded. :)  I even upgraded the BIOS
and firmware on the card from within linux and everything seems okay,
so far.



* Re: [gentoo-user] hp H222 SAS controller
  2013-07-02 22:42 ` Paul Hartman
@ 2013-07-03  7:29   ` Stefan G. Weichinger
  2013-07-03 19:34     ` Paul Hartman
  2013-07-11 22:06   ` Stefan G. Weichinger
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-03  7:29 UTC
  To: gentoo-user

Am 03.07.2013 00:42, schrieb Paul Hartman:
> On Tue, Jul 2, 2013 at 2:56 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
>>
>> Does anyone use that controller with gentoo?
>>
>> If yes, which driver/module does support it?
>>
>> I ordered one for a server and did not really check the facts ;-)
> 
> Looks like it uses the LSI SAS2008 chipset (basically LSI controller
> with HP branding), so you should enable kernel module mpt2sas
> (CONFIG_SCSI_MPT2SAS) and probably some other SAS-related options will
> be required as well if you don't already use them.
> 
> I actually just installed a card with this same chipset in my Gentoo
> machine yesterday! I have not attached disks to it yet, as I am
> waiting for the enclosure to be delivered, but so far nothing froze or
> burst into flames when the module loaded. :)  I even upgraded the BIOS
> and firmware on the card from within linux and everything seems okay,
> so far.

Thanks a lot, Paul, for that feedback. Seems that you will be the first
to really test it, my box will arrive next week, I assume. This will be
an installation from scratch so no SAS-related stuff there already.

I wonder if it makes sense to attach the disks to that adapter as well?
This box will do amanda backups ... so there will be the amanda holding
disk and it is important to have maximum speed between that holding area
and the tape drive. I plan RAID1 on 2x2TB disks at least or maybe even
RAID0 (it's a rather temporary storage area so the redundancy isn't that
important). Testing will show!

Greets, Stefan




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-03  7:29   ` Stefan G. Weichinger
@ 2013-07-03 19:34     ` Paul Hartman
  2013-07-04  9:21       ` Stefan G. Weichinger
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Hartman @ 2013-07-03 19:34 UTC
  To: gentoo-user

On Wed, Jul 3, 2013 at 2:29 AM, Stefan G. Weichinger <lists@xunil.at> wrote:
> Am 03.07.2013 00:42, schrieb Paul Hartman:
>> On Tue, Jul 2, 2013 at 2:56 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
>>>
>>> Does anyone use that controller with gentoo?
>>>
>>> If yes, which driver/module does support it?
>>>
>>> I ordered one for a server and did not really check the facts ;-)
>>
>> Looks like it uses the LSI SAS2008 chipset (basically LSI controller
>> with HP branding), so you should enable kernel module mpt2sas
>> (CONFIG_SCSI_MPT2SAS) and probably some other SAS-related options will
>> be required as well if you don't already use them.
>>
>> I actually just installed a card with this same chipset in my Gentoo
>> machine yesterday! I have not attached disks to it yet, as I am
>> waiting for the enclosure to be delivered, but so far nothing froze or
>> burst into flames when the module loaded. :)  I even upgraded the BIOS
>> and firmware on the card from within linux and everything seems okay,
>> so far.
>
> Thanks a lot, Paul, for that feedback. Seems that you will be the first
> to really test it, my box will arrive next week, I assume. This will be
> an installation from scratch so no SAS-related stuff there already.
>
> I wonder if it makes sense to attach the disks to that adapter as well?
> This box will do amanda backups ... so there will be the amanda holding
> disk and it is important to have maximum speed between that holding area
> and the tape drive. I plan RAID1 on 2x2TB disks at least or maybe even
> RAID0 (it's a rather temporary storage area so the redundancy isn't that
> important). Testing will show!

Mine will be attached to an external 8-disk storage array with 2
external SAS cables (4 disks per cable). I had a 5-disk 8TB software
RAID5 in my computer that I had to remove due to an unplanned
motherboard upgrade. Right now the disks are in a cheap external
5-disk eSATA/USB JBOD enclosure plugged into the eSATA port on my
motherboard, but it's not able to access all disks at the same time,
so the RAID5 performance is awful. Around 10-20 MB/sec on writes and
max 50MB/sec on reads. (It was previously 100MB+/sec for both
operations.)

In the eSATA enclosure, a single scrub (check) of my array takes FOUR
DAYS to complete! I worry about what will happen if I have to replace
a disk, the rebuild would take forever... what if there is a power
outage and my UPS battery only lasts around 30 minutes?

I bought two of the lowest-quality 4TB Seagate drives for US$140 each
on sale and plan to use them to make a backup copy of my files from
the RAID onto those drives. So far I have never made a backup of my
RAID because I never had enough storage space to duplicate it all.
"RAID is not a backup" has been repeating in my head for all these
years. Horror stories about a corrupt filesystem, or 1 bad sector
causing the whole RAID5 rebuild to fail. Now that I will have extra
drive bays, maybe I can add a second parity drive and try to do an
online upgrade from RAID5 to RAID6. I definitely want to make a good
backup before I try that...
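The capacity arithmetic behind these plans (and Stefan's RAID1-or-RAID0 question earlier in the thread) is simple enough to sketch. The hypothetical `usable_tb` helper below assumes equal-sized member disks and ignores filesystem and metadata overhead; reading the "5-disk 8TB" RAID5 as five 2TB members is an inference, not something stated in the thread, and the minimum disk counts are conventional values rather than hard md limits.

```python
# Conventional minimum number of members per RAID level (an assumption;
# md itself may accept fewer for some levels).
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}


def usable_tb(level: int, disks: int, disk_tb: float) -> float:
    """Usable capacity of a RAID array built from equal-sized disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID{level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return disks * disk_tb          # striping only, no redundancy
    if level == 1:
        return disk_tb                  # every member holds a full copy
    if level == 5:
        return (disks - 1) * disk_tb    # one disk's worth of parity
    return (disks - 2) * disk_tb        # RAID6: two disks' worth of parity


print(usable_tb(5, 5, 2.0))                         # current RAID5
print(usable_tb(6, 6, 2.0))                         # plus a second parity disk
print(usable_tb(1, 2, 2.0), usable_tb(0, 2, 2.0))   # 2x2TB holding disk
```

Under those assumptions, adding a sixth 2TB disk as the second parity drive and converting to RAID6 keeps the same 8TB of usable space, while a 2x2TB holding disk gives 2TB as RAID1 or 4TB as RAID0.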

I am hopeful that the SAS controller and enclosure should give me high
performance again! I will let you know how it goes.

BTW, I am using the latest 3.9 series linux kernel.



* Re: [gentoo-user] hp H222 SAS controller
  2013-07-03 19:34     ` Paul Hartman
@ 2013-07-04  9:21       ` Stefan G. Weichinger
  2013-07-05  2:04         ` Paul Hartman
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-04  9:21 UTC
  To: gentoo-user

Am 03.07.2013 21:34, schrieb Paul Hartman:
> On Wed, Jul 3, 2013 at 2:29 AM, Stefan G. Weichinger <lists@xunil.at> wrote:
>> Am 03.07.2013 00:42, schrieb Paul Hartman:
>>> On Tue, Jul 2, 2013 at 2:56 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
>>>>
>>>> Does anyone use that controller with gentoo?
>>>>
>>>> If yes, which driver/module does support it?
>>>>
>>>> I ordered one for a server and did not really check the facts ;-)
>>>
>>> Looks like it uses the LSI SAS2008 chipset (basically LSI controller
>>> with HP branding), so you should enable kernel module mpt2sas
>>> (CONFIG_SCSI_MPT2SAS) and probably some other SAS-related options will
>>> be required as well if you don't already use them.
>>>
>>> I actually just installed a card with this same chipset in my Gentoo
>>> machine yesterday! I have not attached disks to it yet, as I am
>>> waiting for the enclosure to be delivered, but so far nothing froze or
>>> burst into flames when the module loaded. :)  I even upgraded the BIOS
>>> and firmware on the card from within linux and everything seems okay,
>>> so far.
>>
>> Thanks a lot, Paul, for that feedback. Seems that you will be the first
>> to really test it, my box will arrive next week, I assume. This will be
>> an installation from scratch so no SAS-related stuff there already.
>>
>> I wonder if it makes sense to attach the disks to that adapter as well?
>> This box will do amanda backups ... so there will be the amanda holding
>> disk and it is important to have maximum speed between that holding area
>> and the tape drive. I plan RAID1 on 2x2TB disks at least or maybe even
>> RAID0 (it's a rather temporary storage area so the redundancy isn't that
>> important). Testing will show!
> 
> Mine will be attached to an external 8-disk storage array with 2
> external SAS cables (4 disks per cable). I had a 5-disk 8TB software
> RAID5 in my computer that I had to remove due to an unplanned
> motherboard upgrade. Right now the disks are in a cheap external
> 5-disk eSATA/USB JBOD enclosure plugged into the eSATA port on my
> motherboard, but it's not able to access all disks at the same time,
> so the RAID5 performance is awful. Around 10-20 MB/sec on writes and
> max 50MB/sec on reads. (It was previously 100MB+/sec for both
> operations.)
> 
> In the eSATA enclosure, a single scrub (check) of my array takes FOUR
> DAYS to complete! I worry about what will happen if I have to replace
> a disk, the rebuild would take forever... what if there is a power
> outage and my UPS battery only lasts around 30 minutes?
> 
> I bought two of the lowest-quality 4TB Seagate drives for US$140 each
> on sale and plan to use them to make a backup copy of my files from
> the RAID onto those drives. So far I have never made a backup of my
> RAID because I never had enough storage space to duplicate it all.
> "RAID is not a backup" has been repeating in my head for all these
> years. Horror stories about a corrupt filesystem, or 1 bad sector
> causing the whole RAID5 rebuild to fail. Now that I will have extra
> drive bays, maybe I can add a second parity drive and try to do an
> online upgrade from RAID5 to RAID6. I definitely want to make a good
> backup before I try that...
> 
> I am hopeful that the SAS controller and enclosure should give me high
> performance again! I will let you know how it goes.
> 
> BTW, I am using the latest 3.9 series linux kernel.

My planned box will be a stable gentoo installation so that will mean
3.8.13 for now. No problem, I assume.

Thanks for your description ... good luck with that!
I will maybe pre-install that system in a VM until the hardware gets
here ;-)

Stefan




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-04  9:21       ` Stefan G. Weichinger
@ 2013-07-05  2:04         ` Paul Hartman
  2013-07-08 15:39           ` Paul Hartman
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Hartman @ 2013-07-05  2:04 UTC
  To: gentoo-user

On Thu, Jul 4, 2013 at 4:21 AM, Stefan G. Weichinger <lists@xunil.at> wrote:
> My planned box will be a stable gentoo installation so that will mean
> 3.8.13 for now. No problem, I assume.

No problem; I think the mpt2sas driver appeared in the kernel around the 2.6.3x series.

> Thanks for your description ... good luck with that!
> I will maybe pre-install that system in a VM until the hardware gets
> here ;-)

Today I installed my drives into the SAS enclosure. Everything is
working great and I'm getting maximum speed from all drives with
simultaneous access. So far I have not experienced any errors or
problems. Hopefully your luck is as good as mine!

Here is how it looks in dmesg:

[    4.260179] mpt2sas version 14.100.00.00 loaded
[    4.265444] mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (32800384 kB)
[    4.265498] mpt2sas 0000:06:00.0: irq 98 for MSI/MSI-X
[    4.265607] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 98
[    4.265609] mpt2sas0: iomem(0x00000000fe3c0000), mapped(0xffffc90000038000), size(16384)
[    4.265611] mpt2sas0: ioport(0x000000000000b000), size(256)
[    4.344822] mpt2sas0: sending message unit reset !!
[    4.346817] mpt2sas0: message unit reset: SUCCESS
[    4.390041] mpt2sas0: Allocated physical memory: size(4219 kB)
[    4.390048] mpt2sas0: Current Controller Queue Depth(1867), Max Controller Queue Depth(2040)
[    4.390052] mpt2sas0: Scatter Gather Elements per IO(128)
[    4.450285] mpt2sas0: LSISAS2008: FWVersion(16.00.00.00), ChipRevision(0x03), BiosVersion(07.31.00.00)
[    4.450291] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[    4.453850] mpt2sas0: sending port enable !!
[    4.459383] mpt2sas0: host_add: handle(0x0001), sas_addr(0x500605b0060f40d0), phys(8)
[    4.464832] mpt2sas0: port enable: SUCCESS

The disks appear as SCSI disks like normal, but with the SAS address included:

[    4.466217] scsi 10:0:0:0: Direct-Access     ATA ST4000DM000-1F21 CC52 PQ: 0 ANSI: 6
[    4.466228] scsi 10:0:0:0: SATA: handle(0x0009), sas_addr(0x4433221100000000), phy(0), device_name(0x5000c500508bcc46)
[    4.466234] scsi 10:0:0:0: SATA: enclosure_logical_id(0x500605b0060f40d0), slot(0)
[    4.466359] scsi 10:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    4.466369] scsi 10:0:0:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
[    4.466710] sd 10:0:0:0: Attached scsi generic sg4 type 0
[    4.467172] sd 10:0:0:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
[    4.467180] sd 10:0:0:0: [sdd] 4096-byte physical blocks
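The capacity figures in the "logical blocks" line can be checked by hand: 7814037168 blocks of 512 bytes each. The Python sketch below (the name `parse_capacity` is hypothetical) pulls those numbers out of such a line and converts them. It truncates to two decimals because that appears to be how the kernel arrived at "3.63 TiB"; rounding would give 3.64.

```python
import math
import re
from typing import Dict, Optional, Union

# Matches e.g. "[sdd] 7814037168 512-byte logical blocks"
_CAPACITY = re.compile(r"\[(?P<dev>\w+)\] (?P<blocks>\d+) (?P<size>\d+)-byte logical blocks")


def _truncate(value: float, places: int = 2) -> float:
    """Cut off (not round) after `places` decimals."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


def parse_capacity(line: str) -> Optional[Dict[str, Union[str, int, float]]]:
    """Extract device name and size from an sd 'logical blocks' dmesg line."""
    m = _CAPACITY.search(line)
    if m is None:
        return None
    total = int(m.group("blocks")) * int(m.group("size"))
    return {
        "dev": m.group("dev"),
        "bytes": total,
        "tb": _truncate(total / 10**12),   # decimal terabytes
        "tib": _truncate(total / 2**40),   # binary tebibytes
    }


line = ("[    4.467172] sd 10:0:0:0: [sdd] 7814037168 512-byte logical blocks: "
        "(4.00 TB/3.63 TiB)")
print(parse_capacity(line))
```

The same byte count divided by the 4096-byte physical block size gives the number of physical sectors on the drive.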

Good luck,
Paul



* Re: [gentoo-user] hp H222 SAS controller
  2013-07-05  2:04         ` Paul Hartman
@ 2013-07-08 15:39           ` Paul Hartman
  2013-07-08 15:58             ` Alan McKinnon
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Hartman @ 2013-07-08 15:39 UTC
  To: gentoo-user

On Thu, Jul 4, 2013 at 9:04 PM, Paul Hartman
<paul.hartman+gentoo@gmail.com> wrote:
> ST4000DM000

As a side note on these two Seagate 4TB "Desktop" edition drives I
bought: after about 100 hours of power-on time, both drives have
already encountered dozens of unreadable sectors each. I was able
to correct them (force reallocation) using hdparm... So it should be
"fixed", and I'm reading that this is "normal" with newer drives and
"don't worry about it", but I'm still coming from the time when 1 bad
sector = red alert, replace the drive ASAP.  I guess I will need to
monitor and see if it gets worse.
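For the "monitor and see if it gets worse" part, one option is to save the output of `smartctl -A /dev/sdX` (from smartmontools) from time to time and compare the raw values of the sector-related attributes. The sketch below assumes smartctl's usual ten-column attribute table and watches attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable); the function names are hypothetical and the numbers in the demo are made up.

```python
from typing import Dict

# Sector-related ATA attribute IDs, as printed in the first column of smartctl -A.
WATCHED_IDS = {5, 197, 198}


def raw_counts(smartctl_attrs: str) -> Dict[str, int]:
    """Map attribute name -> RAW_VALUE for the watched attributes."""
    counts: Dict[str, int] = {}
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank or unrelated line
        if int(fields[0]) in WATCHED_IDS and fields[9].isdigit():
            counts[fields[1]] = int(fields[9])
    return counts


def got_worse(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return how much each watched counter grew between two snapshots."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value > before.get(name, 0)
    }


# Made-up snapshots, only to show the comparison.
old = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 8}
new = {"Reallocated_Sector_Ct": 40, "Current_Pending_Sector": 0}
print(got_worse(old, new))
```

A pending-sector count that drops while the reallocated count rises is what a successful remap looks like; a reallocated count that keeps climbing between snapshots is the "getting worse" case.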



* Re: [gentoo-user] hp H222 SAS controller
  2013-07-08 15:39           ` Paul Hartman
@ 2013-07-08 15:58             ` Alan McKinnon
  2013-07-08 18:27               ` Stefan G. Weichinger
  2013-07-14 22:35               ` Paul Hartman
  0 siblings, 2 replies; 23+ messages in thread
From: Alan McKinnon @ 2013-07-08 15:58 UTC
  To: gentoo-user

On 08/07/2013 17:39, Paul Hartman wrote:
> On Thu, Jul 4, 2013 at 9:04 PM, Paul Hartman
> <paul.hartman+gentoo@gmail.com> wrote:
>> ST4000DM000
> 
> As a side note on these two Seagate 4TB "Desktop" edition drives I
> bought: after about 100 hours of power-on time, both drives have
> already encountered dozens of unreadable sectors each. I was able
> to correct them (force reallocation) using hdparm... So it should be
> "fixed", and I'm reading that this is "normal" with newer drives and
> "don't worry about it", but I'm still coming from the time when 1 bad
> sector = red alert, replace the drive ASAP.  I guess I will need to
> monitor and see if it gets worse.
> 


Way back when in the bad old days of drives measured in 100s of megs,
you'd get a few bad sectors now and then, and would have to mark them as
faulty. This didn't bother us much back then.

Nowadays we have drives that are 8,000 times bigger than that, so all other
things being equal we'd expect sectors to fail 8,000 times more ("more"
being a very fuzzy concept, and I know full well I'm using it loosely :-) )

Our drives nowadays also have smart firmware, something we had to
introduce when CHS no longer cut it. This led to sector failures being
somewhat "invisible", leaving us with the happy delusion that drives were
vastly reliable etc etc etc. But you know all this.

A mere few dozen failures in the first 100 hours is a failure rate of
(Alan whips out the trusty sci calculator) 4.8E-6%. Pretty damn
spectacular if you ask me and WELL within probabilities.
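That figure can be reproduced from the drive geometry in Paul's dmesg output earlier in the thread (7814037168 logical blocks of 512 bytes, 4096-byte physical sectors). In the sketch below the bad-sector count of 47 is hypothetical, picked as one "few dozen" value that lands on roughly 4.8E-6%; the function name is also illustrative.

```python
# Geometry of the ST4000DM000 as reported by the kernel in Paul's dmesg output.
LOGICAL_BLOCKS = 7_814_037_168
LOGICAL_SIZE = 512
PHYSICAL_SIZE = 4096


def bad_sector_percent(bad_sectors: int, sector_size: int = PHYSICAL_SIZE) -> float:
    """Share of the disk's sectors (of the given size) that are bad, in percent."""
    total_sectors = LOGICAL_BLOCKS * LOGICAL_SIZE // sector_size
    return 100.0 * bad_sectors / total_sectors


print(f"{bad_sector_percent(47):.1e} %")                 # 4K physical sectors
print(f"{bad_sector_percent(47, LOGICAL_SIZE):.1e} %")   # 512-byte logical sectors
```

For the same count, dividing by the number of 512-byte logical sectors gives a rate eight times smaller, so the result depends on which sector size is used as the denominator.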

There is likely nothing wrong with your drives. If they are faulty, it's
highly likely a systemic manufacturing fault of the mechanicals (servo
systems, motor bearings, etc.)

You do realize that modern hard drives have for the longest time been up
there in the Top X list of Most Reliable Devices Made By Mankind Ever?



-- 
Alan McKinnon
alan.mckinnon@gmail.com




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-08 15:58             ` Alan McKinnon
@ 2013-07-08 18:27               ` Stefan G. Weichinger
  2013-07-08 21:42                 ` Alan McKinnon
  2013-07-08 22:48                 ` Paul Hartman
  2013-07-14 22:35               ` Paul Hartman
  1 sibling, 2 replies; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-08 18:27 UTC
  To: gentoo-user

Am 08.07.2013 17:58, schrieb Alan McKinnon:
> On 08/07/2013 17:39, Paul Hartman wrote:
>> On Thu, Jul 4, 2013 at 9:04 PM, Paul Hartman
>> <paul.hartman+gentoo@gmail.com> wrote:
>>> ST4000DM000
>>
>> As a side note on these two Seagate 4TB "Desktop" edition drives I
>> bought: after about 100 hours of power-on time, both drives have
>> already encountered dozens of unreadable sectors each. I was able
>> to correct them (force reallocation) using hdparm... So it should be
>> "fixed", and I'm reading that this is "normal" with newer drives and
>> "don't worry about it", but I'm still coming from the time when 1 bad
>> sector = red alert, replace the drive ASAP.  I guess I will need to
>> monitor and see if it gets worse.
>>
> 
> 
> Way back when in the bad old days of drives measured in 100s of megs,
> you'd get a few bad sectors now and then, and would have to mark them as
> faulty. This didn't bother us much back then.
> 
> Nowadays we have drives that are 8,000 times bigger than that, so all other
> things being equal we'd expect sectors to fail 8,000 times more ("more"
> being a very fuzzy concept, and I know full well I'm using it loosely :-) )
> 
> Our drives nowadays also have smart firmware, something we had to
> introduce when CHS no longer cut it. This led to sector failures being
> somewhat "invisible", leaving us with the happy delusion that drives were
> vastly reliable etc etc etc. But you know all this.
> 
> A mere few dozen failures in the first 100 hours is a failure rate of
> (Alan whips out the trusty sci calculator) 4.8E-6%. Pretty damn
> spectacular if you ask me and WELL within probabilities.
> 
> There is likely nothing wrong with your drives. If they are faulty, it's
> highly likely a systemic manufacturing fault of the mechanicals (servo
> systems, motor bearings, etc.)
> 
> You do realize that modern hard drives have for the longest time been up
> there in the Top X list of Most Reliable Devices Made By Mankind Ever?

Does it make sense to apply some sort of burn-in procedure before
actually formatting and using the disks? Running badblocks or something?

I'm asking because I'm waiting for that shiny new server, and doing so
might not hurt before installing gentoo. Or is that too paranoid and a
waste of time?

Thanks, greets, Stefan




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-08 18:27               ` Stefan G. Weichinger
@ 2013-07-08 21:42                 ` Alan McKinnon
  2013-07-10 17:12                   ` Stefan G. Weichinger
  2013-07-08 22:48                 ` Paul Hartman
  1 sibling, 1 reply; 23+ messages in thread
From: Alan McKinnon @ 2013-07-08 21:42 UTC
  To: gentoo-user

On 08/07/2013 20:27, Stefan G. Weichinger wrote:
> Am 08.07.2013 17:58, schrieb Alan McKinnon:
>> On 08/07/2013 17:39, Paul Hartman wrote:
>>> On Thu, Jul 4, 2013 at 9:04 PM, Paul Hartman
>>> <paul.hartman+gentoo@gmail.com> wrote:
>>>> ST4000DM000
>>>
>>> As a side note on these two Seagate 4TB "Desktop" edition drives I
>>> bought: after about 100 hours of power-on time, both drives have
>>> already encountered dozens of unreadable sectors each. I was able
>>> to correct them (force reallocation) using hdparm... So it should be
>>> "fixed", and I'm reading that this is "normal" with newer drives and
>>> "don't worry about it", but I'm still coming from the time when 1 bad
>>> sector = red alert, replace the drive ASAP.  I guess I will need to
>>> monitor and see if it gets worse.
>>>
>>
>>
>> Way back when in the bad old days of drives measured in 100s of megs,
>> you'd get a few bad sectors now and then, and would have to mark them as
>> faulty. This didn't bother us much back then.
>>
>> Nowadays we have drives that are 8,000 times bigger than that, so all other
>> things being equal we'd expect sectors to fail 8,000 times more ("more"
>> being a very fuzzy concept, and I know full well I'm using it loosely :-) )
>>
>> Our drives nowadays also have smart firmware, something we had to
>> introduce when CHS no longer cut it. This led to sector failures being
>> somewhat "invisible", leaving us with the happy delusion that drives were
>> vastly reliable etc etc etc. But you know all this.
>>
>> A mere few dozen failures in the first 100 hours is a failure rate of
>> (Alan whips out the trusty sci calculator) 4.8E-6%. Pretty damn
>> spectacular if you ask me and WELL within probabilities.
>>
>> There is likely nothing wrong with your drives. If they are faulty, it's
>> highly likely a systemic manufacturing fault of the mechanicals (servo
>> systems, motor bearings, etc.)
>>
>> You do realize that modern hard drives have for the longest time been up
>> there in the Top X list of Most Reliable Devices Made By Mankind Ever?
> 
> Does it make sense to apply some sort of burn-in procedure before
> actually formatting and using the disks? Running badblocks or something?
> 
> I'm asking because I'm waiting for that shiny new server, and doing so
> might not hurt before installing gentoo. Or is that too paranoid and a
> waste of time?

If it makes you feel better, then by all means go through the motions.

For my money, I reckon that's exactly what it is - motions and ritual. I
have only anecdotal evidence to back it up, but it's fairly strong
anecdotal evidence:

Over the last 5 years, the team I'm in, the teams we work closely with
and the Storage guys have commissioned >1000 pieces of hardware and
probably more than 4000 drives, the vast majority from Dell. I have no
idea what burn-in Dell applies, if any. We've had our fair share of
infant mortality failures, probably less than 20 in 5 years. And here's
the kicker - every single one failed in production.

Most of that hardware, and ALL of the SANs, went through heavy
pre-deployment testing. Usually, this means cloning the -dev system onto
it and running the crap out of it for a decent length of time. Once the
techies were happy, install the production version and switch it on.

I conclude that the likely reason we only found failure in prod is that
only prod gives a decent viable test that approximates real life and dev
is always a mere simulation. It's not usage that kills a few drives
early, it's the almost random pattern of disk access that you get in
real life. That tends to shake out the weak links better than any test.

However, this is all anecdotal so use or discard as you see fit :-). I
no longer worry about data loss as we have 4-hour warranty turnaround
SLAs in place and company policy is to only deploy storage that is
guaranteed to survive loss of any one drive in an array.


-- 
Alan McKinnon
alan.mckinnon@gmail.com




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-08 18:27               ` Stefan G. Weichinger
  2013-07-08 21:42                 ` Alan McKinnon
@ 2013-07-08 22:48                 ` Paul Hartman
  2013-07-10 17:14                   ` Stefan G. Weichinger
  1 sibling, 1 reply; 23+ messages in thread
From: Paul Hartman @ 2013-07-08 22:48 UTC
  To: gentoo-user

On Mon, Jul 8, 2013 at 1:27 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
> Does it make sense to apply some sort of burn-in procedure before
> actually formatting and using the disks? Running badblocks or something?
>
> I'm asking because I'm waiting for that shiny new server, and doing so
> might not hurt before installing gentoo. Or is that too paranoid and a
> waste of time?

Initially I ran the SMART long test and it found no errors. Then I did
badblocks read-only scan and it found some bad sectors. After that,
SMART tests failed to complete due to "failure reading LBA xxxxxxxxx".
I used hdparm to remap those sectors, but didn't feel entirely
confident in the disk at that point in time.

So I ran the badblocks destructive read-write test and it completed
(after a couple days) with zero errors! How can it be?

Checking the SMART statistics afterward, I can see now there are
dozens of newly reallocated sectors. So that means the drive silently
replaced those bad sectors with spares, which is good! That is what it
is supposed to do! I don't feel happy about the fact that those bad
sectors exist in the first place, but the drive did what it was
designed to do when it encountered them.

After the r/w badblocks test cycle finished, I ran SMART long-scan
again and this time it completed with no errors.

So I recommend doing the destructive read-write badblocks test if you
can afford the hours (or days) spent waiting for it to complete.

SMART alone did not detect the errors initially, but neither did
badblocks actually identify the errors during its write test (because
the drive hides them). But the combination of badblocks and the
self-repairing code in the drive's firmware accomplished the goal of
making my disk free of errors (logically).
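The sequence described above can be written down as a list of commands. The Python sketch below only builds and prints the command lines (it runs nothing), so a wrong device name costs nothing at this stage; `burn_in_plan` is a hypothetical helper and `/dev/sdX` a placeholder. It assumes smartmontools' `smartctl -t long`, which starts the self-test inside the drive and returns immediately (the result is read later with `smartctl -l selftest`), and the e2fsprogs `badblocks` flags `-b` (block size), `-s` (progress), `-v` (verbose) and `-w` (destructive write test).

```python
import shlex
from typing import List


def burn_in_plan(device: str, block_size: int = 4096, destructive: bool = False) -> List[List[str]]:
    """Return the burn-in steps as argv lists; nothing is executed here."""
    if not device.startswith("/dev/"):
        raise ValueError("expected a device path such as /dev/sdX")
    bs = str(block_size)
    plan = [
        ["smartctl", "-t", "long", device],        # start the drive's extended self-test
        ["smartctl", "-l", "selftest", device],    # read the result once it has finished
        ["badblocks", "-sv", "-b", bs, device],    # read-only scan (badblocks' default mode)
    ]
    if destructive:
        # -w overwrites the whole disk with test patterns, without asking first.
        plan.append(["badblocks", "-wsv", "-b", bs, device])
    plan += [
        ["smartctl", "-t", "long", device],        # repeat the long self-test
        ["smartctl", "-l", "selftest", device],
        ["smartctl", "-A", device],                # look at the reallocated-sector counts
    ]
    return plan


for argv in burn_in_plan("/dev/sdX", destructive=True):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the destructive step behind an explicit flag is one way to get the confirmation that badblocks itself does not ask for.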

Notes:

WARNING! Be careful to give the correct device name when doing the
badblocks write test! There is no confirmation prompt! It immediately
starts destroying data at the beginning of the disk.

If you have a disk with 4k sector size, be sure to tell badblocks to
use a 4096 byte block size. It uses 1k block size by default, which
can cause the test to be very slow! In my system badblocks with 1k
block size read at 15MB/sec, while 4k block size read at over
160MB/sec! Using 1k block size on a 4k-sector disk also causes all
errors to be reported 4 times each.
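Two pieces of arithmetic follow from these notes. First, with a 1k block size each 4096-byte physical sector spans four badblocks blocks, hence the fourfold reporting; collapsing the block numbers to physical sectors (expressed as the first 512-byte LBA of each, matching the logical block size the kernel reports for this drive) leaves one entry per bad sector. Second, the test duration: assuming badblocks' default write mode of four patterns, each written across the whole disk and then read back, a full `-w` run is about eight passes. The helper names below are hypothetical; the speeds are the ones quoted above.

```python
from typing import Iterable, List

DISK_BYTES = 7_814_037_168 * 512   # the 4TB drive from the dmesg output earlier


def bad_lbas(blocks: Iterable[int], bb_block_size: int,
             physical: int = 4096, logical: int = 512) -> List[int]:
    """Collapse badblocks block numbers into one logical LBA per bad physical sector."""
    sectors = sorted({b * bb_block_size // physical for b in blocks})
    return [s * (physical // logical) for s in sectors]


def sweep_hours(mb_per_s: float, passes: int = 1, disk_bytes: int = DISK_BYTES) -> float:
    """Time for `passes` full sequential sweeps over the disk at a constant speed."""
    return passes * disk_bytes / (mb_per_s * 1e6) / 3600


print(bad_lbas([4000, 4001, 4002, 4003], 1024))   # four 1k blocks, one 4k sector
print(f"{sweep_hours(15):.0f} h per pass at 15 MB/s")
print(f"{sweep_hours(160):.0f} h per pass at 160 MB/s")
print(f"{sweep_hours(160, passes=8) / 24:.1f} days for a -w run at 160 MB/s")
```

At 160MB/s the eight-pass estimate comes out at a bit over two days, in line with the "couple days" reported above, though real throughput drops toward the inner tracks.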

Good luck :)



* Re: [gentoo-user] hp H222 SAS controller
  2013-07-08 21:42                 ` Alan McKinnon
@ 2013-07-10 17:12                   ` Stefan G. Weichinger
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-10 17:12 UTC
  To: gentoo-user

Am 08.07.2013 23:42, schrieb Alan McKinnon:

> If it makes you feel better, then by all means go through the motions.
> 
> For my money, I reckon that's exactly what it is - motions and ritual. I
> have only anecdotal evidence to back it up, but it's fairly strong
> anecdotal evidence:
> 
> Over the last 5 years, the team I'm in, the teams we work closely with
> and the Storage guys have commissioned >1000 pieces of hardware and
> probably more than 4000 drives, the vast majority from Dell. I have no
> idea what burn-in Dell applies, if any. We've had our fair share of
> infant mortality failures, probably less than 20 in 5 years. And here's
> the kicker - every single one failed in production.
> 
> Most of that hardware, and ALL of the SANs, went through heavy
> pre-deployment testing. Usually, this means cloning the -dev system onto
> it and running the crap out of it for a decent length of time. Once the
> techies were happy, install the production version and switch it on.
> 
> I conclude that the likely reason we only found failure in prod is that
> only prod gives a decent viable test that approximates real life and dev
> is always a mere simulation. It's not usage that kills a few drives
> early, it's the almost random pattern of disk access that you get in
> real life. That tends to shake out the weak links better than any test.
> 
> However, this is all anecdotal so use or discard as you see fit :-). I
> no longer worry about data loss as we have 4-hour warranty turnaround
> SLAs in place and company policy is to only deploy storage that is
> guaranteed to survive loss of any one drive in an array.

Thanks for that, good point :-)

Stefan




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-08 22:48                 ` Paul Hartman
@ 2013-07-10 17:14                   ` Stefan G. Weichinger
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-10 17:14 UTC
  To: gentoo-user

Am 09.07.2013 00:48, schrieb Paul Hartman:
> On Mon, Jul 8, 2013 at 1:27 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
>> Does it make sense to apply some sort of burn-in procedure before
>> actually formatting and using the disks? Running badblocks or something?
>>
>> I'm asking because I'm waiting for that shiny new server, and doing so
>> might not hurt before installing gentoo. Or is that too paranoid and a
>> waste of time?
> 
> Initially I ran the SMART long test and it found no errors. Then I did
> badblocks read-only scan and it found some bad sectors. After that,
> SMART tests failed to complete due to "failure reading LBA xxxxxxxxx".
> I used hdparm to remap those sectors, but didn't feel entirely
> confident in the disk at that point in time.
> 
> So I ran the badblocks destructive read-write test and it completed
> (after a couple days) with zero errors! How can it be?
> 
> Checking the SMART statistics afterward, I can see now there are
> dozens of newly reallocated sectors. So that means the drive silently
> replaced those bad sectors with spares, which is good! That is what it
> is supposed to do! I don't feel happy about the fact that those bad
> sectors exist in the first place, but the drive did what it was
> designed to do when it encountered them.
> 
> After the r/w badblocks test cycle finished, I ran SMART long-scan
> again and this time it completed with no errors.
> 
> So I recommend doing the destructive read-write badblocks test if you
> can afford the hours (or days) spent waiting for it to complete.
> 
> SMART alone did not detect the errors initially, but neither did
> badblocks actually identify the errors during its write test (because
> the drive hides them). But the combination of badblocks and the
> self-repairing code in the drive's firmware accomplished the goal of
> making my disk free of errors (logically).
> 
> Notes:
> 
> WARNING! Be careful to give the correct device name when doing the
> badblocks write test! There is no confirmation prompt! It immediately
> starts destroying data at the beginning of the disk.
> 
> If you have a disk with 4k sector size, be sure to tell badblocks to
> use a 4096 byte block size. It uses 1k block size by default, which
> can cause the test to be very slow! In my system badblocks with 1k
> block size read at 15MB/sec, while 4k block size read at over
> 160MB/sec! Using 1k block size on a 4k-sector disk also causes all
> errors to be reported 4 times each.
> 
> Good luck :)
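
Paul's two warnings above (badblocks asks for no confirmation, and a 4k-sector disk should be tested with a 4096-byte block size) can be folded into a small wrapper. This is a minimal Python sketch, not part of badblocks itself: the function name `badblocks_write_cmd`, the `/dev/sd[a-z]+` pattern and the device `/dev/sdb` are hypothetical. It only builds the argument list (`-w` destructive write-mode test, `-s` progress, `-v` verbose, `-b` block size) and refuses unless the device name is given twice.

```python
import re
import shlex

# Hypothetical safety pattern: accept only whole-disk names like /dev/sdb.
WHOLE_DISK = re.compile(r"^/dev/sd[a-z]+$")


def badblocks_write_cmd(device: str, confirm: str, block_size: int = 4096) -> list[str]:
    """Build, but do not run, a destructive badblocks command line.

    -w  destructive write-mode test (erases ALL data on the device)
    -s  show progress
    -v  verbose output
    -b  block size in bytes (4096 for disks with 4k sectors)
    """
    if not WHOLE_DISK.match(device):
        raise ValueError(f"refusing unexpected device name: {device!r}")
    if confirm != device:
        # badblocks itself never asks, so demand the name a second time here.
        raise ValueError("confirmation does not match the device name")
    return ["badblocks", "-w", "-s", "-v", "-b", str(block_size), device]


if __name__ == "__main__":
    cmd = badblocks_write_cmd("/dev/sdb", confirm="/dev/sdb")
    print(shlex.join(cmd))  # prints: badblocks -w -s -v -b 4096 /dev/sdb
```

Passing the returned list to `subprocess.run(cmd, check=True)` would actually start the test and wipe the disk, so the sketch deliberately stops short of that.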

Thanks for your explanations, Paul ... I will see if I have the patience
to wait for hours or days :-)

Stefan
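
Paul's note that a 1k block size reports every error 4 times on a 4k-sector disk comes down to integer division: four consecutive 1024-byte blocks share one 4096-byte sector. A minimal Python sketch (the function name is hypothetical, and it assumes badblocks was run on the whole disk, so block 0 and sector 0 both start at byte 0) that collapses such a list back to physical sector numbers:

```python
def collapse_to_sectors(bad_blocks: list[int], block_size: int = 1024,
                        sector_size: int = 4096) -> list[int]:
    """Map badblocks block numbers to physical sector numbers, without duplicates.

    Assumes the block numbers come from a whole-disk run, so block 0 and
    sector 0 both start at byte 0.
    """
    if sector_size % block_size != 0:
        raise ValueError("sector size must be a multiple of the block size")
    blocks_per_sector = sector_size // block_size  # 4 for 1k blocks on 4k sectors
    return sorted({block // blocks_per_sector for block in bad_blocks})


if __name__ == "__main__":
    # One bad 4k sector (number 1000) seen as four 1k blocks.
    print(collapse_to_sectors([4000, 4001, 4002, 4003]))  # [1000]
```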




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-02 22:42 ` Paul Hartman
  2013-07-03  7:29   ` Stefan G. Weichinger
@ 2013-07-11 22:06   ` Stefan G. Weichinger
  2013-07-11 22:38     ` Paul Hartman
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-11 22:06 UTC (permalink / raw
  To: gentoo-user

On 03.07.2013 00:42, Paul Hartman wrote:
> On Tue, Jul 2, 2013 at 2:56 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
>>
>> Does anyone use that controller with gentoo?
>>
>> If yes, which driver/module does support it?
>>
>> I ordered one for a server and did not really check the facts ;-)
> 
> Looks like it uses the LSI SAS2008 chipset (basically LSI controller
> with HP branding), so you should enable kernel module mpt2sas
> (CONFIG_SCSI_MPT2SAS) and probably some other SAS-related options will
> be required as well if you don't already use them.

lspci shows something else here:


# lspci | grep SATA
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
38:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
3d:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)

so I have to look for Marvell stuff ... module "mv_sas" does not work
yet, no scsi-tape-device visible.

Stefan
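
To see which of the drivers discussed in this thread the running kernel actually has, the kernel configuration can be checked for `CONFIG_SCSI_MPT2SAS` (mpt2sas), `CONFIG_SCSI_MVSAS` (mvsas), `CONFIG_SATA_AHCI` (ahci) and `CONFIG_CHR_DEV_ST` (the `st` tape driver behind /dev/st0). A minimal Python sketch with hypothetical function names; it assumes the kernel exposes /proc/config.gz, which only exists when `CONFIG_IKCONFIG_PROC` is enabled (otherwise pass the text of /usr/src/linux/.config to `option_states` directly).

```python
import gzip
import re

# Kernel options for the drivers discussed in this thread.
OPTIONS = (
    "CONFIG_SCSI_MPT2SAS",  # mpt2sas: LSI SAS2x08-based HBAs
    "CONFIG_SCSI_MVSAS",    # mvsas: Marvell SAS controllers
    "CONFIG_SATA_AHCI",     # ahci: standard SATA controllers
    "CONFIG_CHR_DEV_ST",    # st: SCSI tape, provides /dev/st0
)
SETTING = re.compile(r"^(CONFIG_\w+)=(y|m)$")


def option_states(config_text: str, options=OPTIONS) -> dict[str, str]:
    """Return 'y' (built in), 'm' (module) or 'not set' for each option."""
    states = {opt: "not set" for opt in options}
    for line in config_text.splitlines():
        match = SETTING.match(line.strip())
        if match and match.group(1) in states:
            states[match.group(1)] = match.group(2)
    return states


def running_kernel_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC=y)."""
    with gzip.open(path, "rt") as fh:
        return fh.read()
```

Typical use would be `option_states(running_kernel_config())`; an option reported as `m` still needs its module loaded before the device appears.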




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-11 22:06   ` Stefan G. Weichinger
@ 2013-07-11 22:38     ` Paul Hartman
  2013-07-11 22:47       ` Stefan G. Weichinger
  2013-07-11 23:02       ` Stefan G. Weichinger
  0 siblings, 2 replies; 23+ messages in thread
From: Paul Hartman @ 2013-07-11 22:38 UTC (permalink / raw
  To: gentoo-user

On Thu, Jul 11, 2013 at 5:06 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
> On 03.07.2013 00:42, Paul Hartman wrote:
>> On Tue, Jul 2, 2013 at 2:56 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
>>>
>>> Does anyone use that controller with gentoo?
>>>
>>> If yes, which driver/module does support it?
>>>
>>> I ordered one for a server and did not really check the facts ;-)
>>
>> Looks like it uses the LSI SAS2008 chipset (basically LSI controller
>> with HP branding), so you should enable kernel module mpt2sas
>> (CONFIG_SCSI_MPT2SAS) and probably some other SAS-related options will
>> be required as well if you don't already use them.
>
> lspci shows something else here:
>
>
> # lspci | grep SATA
> 00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
> 38:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
> 3d:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
>
> so I have to look for Marvell stuff ... module "mv_sas" does not work
> yet, no scsi-tape-device visible.

Hmmm, even the data on HP's website says H222 uses LSI SAS2x08 chipset
and mpt2sas driver.  I think maybe those Marvell entries are
SATA/eSATA ports on your motherboard. Or you don't have the same H222
I am seeing online when I search. :)

BTW that Marvell chipset should work with the ordinary kernel AHCI driver.
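
One way to tell an LSI-based HBA apart from onboard ports is the PCI vendor ID that `lspci -nn` prints in brackets as `[vendor:device]`: 0x1000 is LSI Logic, 0x1b4b is Marvell and 0x8086 is Intel. A minimal Python sketch (hypothetical function names; the vendor table covers only these three, and it expects plain `lspci -nn` output, not `-v`) that maps each slot to a vendor:

```python
import re

# PCI vendor IDs for the chips mentioned in this thread (not a full table).
VENDORS = {"1000": "LSI Logic", "1b4b": "Marvell", "8086": "Intel"}
ID_PAIR = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def pci_vendors(lspci_nn_output: str) -> dict[str, str]:
    """Map each PCI slot in `lspci -nn` output to a vendor name or raw vendor ID."""
    result = {}
    for line in lspci_nn_output.splitlines():
        ids = ID_PAIR.findall(line)
        if not ids:
            continue  # not a device line, or lspci was run without -nn
        vendor_id, _device_id = ids[-1]  # last [vendor:device] pair on the line
        result[line.split()[0]] = VENDORS.get(vendor_id, vendor_id)
    return result


def has_lsi_controller(lspci_nn_output: str) -> bool:
    """True if any listed device carries LSI's vendor ID (0x1000)."""
    return "LSI Logic" in pci_vendors(lspci_nn_output).values()
```

If `has_lsi_controller()` is False for the real output, the card is not being enumerated on the PCI bus at all, so no kernel driver can bind to it.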



* Re: [gentoo-user] hp H222 SAS controller
  2013-07-11 22:38     ` Paul Hartman
@ 2013-07-11 22:47       ` Stefan G. Weichinger
  2013-07-11 23:02       ` Stefan G. Weichinger
  1 sibling, 0 replies; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-11 22:47 UTC (permalink / raw
  To: gentoo-user

On 12.07.2013 00:38, Paul Hartman wrote:
> On Thu, Jul 11, 2013 at 5:06 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
>> On 03.07.2013 00:42, Paul Hartman wrote:
>>> On Tue, Jul 2, 2013 at 2:56 PM, Stefan G. Weichinger <lists@xunil.at> wrote:
>>>>
>>>> Does anyone use that controller with gentoo?
>>>>
>>>> If yes, which driver/module does support it?
>>>>
>>>> I ordered one for a server and did not really check the facts ;-)
>>>
>>> Looks like it uses the LSI SAS2008 chipset (basically LSI controller
>>> with HP branding), so you should enable kernel module mpt2sas
>>> (CONFIG_SCSI_MPT2SAS) and probably some other SAS-related options will
>>> be required as well if you don't already use them.
>>
>> lspci shows something else here:
>>
>>
>> # lspci | grep SATA
>> 00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
>> 38:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
>> 3d:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
>>
>> so I have to look for Marvell stuff ... module "mv_sas" does not work
>> yet, no scsi-tape-device visible.
> 
> Hmmm, even the data on HP's website says H222 uses LSI SAS2x08 chipset
> and mpt2sas driver.  I think maybe those Marvell entries are
> SATA/eSATA ports on your motherboard. Or you don't have the same H222
> I am seeing online when I search. :)
> 
> BTW that Marvell chipset should work with the ordinary kernel AHCI driver.

did some screwdriver engineering:

lspci -v > with-card.txt

poweroff; remove card

lspci -v > without-card.txt

vimdiff ...

interesting ... the only difference is a block for a "PCI bridge"; it
seems as if the card isn't detected at all.
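
The with/without comparison above can also be scripted instead of eyeballed in vimdiff. A minimal Python sketch (hypothetical function names) that takes the first line of each blank-line-separated device block in `lspci -v` output, such as the contents of with-card.txt and without-card.txt, and reports devices that appear only with the card installed:

```python
import re


def device_headers(lspci_v_output: str) -> set[str]:
    """First line of each device block in `lspci -v` output.

    Blocks are separated by blank lines; the first line looks like
    "00:1f.2 SATA controller: Intel Corporation ...".
    """
    headers = set()
    for block in re.split(r"\n\s*\n", lspci_v_output.strip()):
        lines = block.strip().splitlines()
        if lines:
            headers.add(lines[0].strip())
    return headers


def added_devices(with_card: str, without_card: str) -> list[str]:
    """Devices listed with the card installed but missing without it."""
    return sorted(device_headers(with_card) - device_headers(without_card))
```

An empty result from `added_devices(open("with-card.txt").read(), open("without-card.txt").read())` means the card never showed up as a PCI device. One caveat: adding or removing a card can renumber buses behind a bridge, which makes otherwise identical devices look different.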

The card has "hp" stickers on it and is labeled "SAS9205-4i4e", which
Google identifies as the H222 ...

Stefan




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-11 22:38     ` Paul Hartman
  2013-07-11 22:47       ` Stefan G. Weichinger
@ 2013-07-11 23:02       ` Stefan G. Weichinger
  2013-07-12 13:52         ` Stefan G. Weichinger
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-11 23:02 UTC (permalink / raw
  To: gentoo-user

On 12.07.2013 00:38, Paul Hartman wrote:

> Hmmm, even the data on HP's website says H222 uses LSI SAS2x08 chipset
> and mpt2sas driver.  I think maybe those Marvell entries are
> SATA/eSATA ports on your motherboard.

Yes, you might be right ...

and I see their IDs in /sys/bus/pci/drivers/ahci

So the LTO4 tape drive is connected *somewhere* else; I don't get any
/dev/st0 even with "st" compiled into the kernel.
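
The same check done by hand in /sys/bus/pci/drivers/ahci works for any driver: each PCI device bound to a driver appears in that driver's sysfs directory as an entry named after its PCI address, next to control files such as `bind` and `unbind`. A minimal Python sketch (hypothetical function name; `sysfs_root` is a parameter only so it can be tried against a fake tree):

```python
import os
import re

# sysfs names PCI devices by address: domain:bus:device.function, e.g. 0000:38:00.0
PCI_ADDR = re.compile(r"^[0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def devices_bound_to(driver: str, sysfs_root: str = "/sys") -> list[str]:
    """List PCI addresses currently bound to a kernel driver such as "ahci"."""
    path = os.path.join(sysfs_root, "bus", "pci", "drivers", driver)
    if not os.path.isdir(path):
        return []  # driver not loaded, or not built into this kernel
    # Skip control files like bind, unbind, new_id and uevent.
    return sorted(name for name in os.listdir(path) if PCI_ADDR.match(name))
```

Once the HBA is detected, `devices_bound_to("mpt2sas")` should list its address; an empty list while `devices_bound_to("ahci")` shows the Marvell ports is consistent with the card not being enumerated at all.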





* Re: [gentoo-user] hp H222 SAS controller
  2013-07-11 23:02       ` Stefan G. Weichinger
@ 2013-07-12 13:52         ` Stefan G. Weichinger
  2013-07-12 17:21           ` Stefan G. Weichinger
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-12 13:52 UTC (permalink / raw
  To: gentoo-user

On 12.07.2013 01:02, Stefan G. Weichinger wrote:
> On 12.07.2013 00:38, Paul Hartman wrote:
> 
>> Hmmm, even the data on HP's website says H222 uses LSI SAS2x08 chipset
>> and mpt2sas driver.  I think maybe those Marvell entries are
>> SATA/eSATA ports on your motherboard.
> 
> Yes, you might be right ...
> 
> and I see the IDs in /sys/bus/pci/drivers/ahci
> 
> So the LTO4 is connected *somewhere* else, I don't get any /dev/st0 even
> with "st" compiled into the kernel.

I checked for PCI(e) devices in the BIOS: the controller does not show
up there!

hmm ...




* Re: [gentoo-user] hp H222 SAS controller
  2013-07-12 13:52         ` Stefan G. Weichinger
@ 2013-07-12 17:21           ` Stefan G. Weichinger
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan G. Weichinger @ 2013-07-12 17:21 UTC (permalink / raw
  To: gentoo-user

On 12.07.2013 15:52, Stefan G. Weichinger wrote:
> On 12.07.2013 01:02, Stefan G. Weichinger wrote:
>> On 12.07.2013 00:38, Paul Hartman wrote:
>>
>>> Hmmm, even the data on HP's website says H222 uses LSI SAS2x08 chipset
>>> and mpt2sas driver.  I think maybe those Marvell entries are
>>> SATA/eSATA ports on your motherboard.
>>
>> Yes, you might be right ...
>>
>> and I see the IDs in /sys/bus/pci/drivers/ahci
>>
>> So the LTO4 is connected *somewhere* else, I don't get any /dev/st0 even
>> with "st" compiled into the kernel.
> 
> I checked for PCI(e) devices in the BIOS: the controller does not show
> up there!
> 
> hmm ...

chose another slot, now it works!






* Re: [gentoo-user] hp H222 SAS controller
  2013-07-08 15:58             ` Alan McKinnon
  2013-07-08 18:27               ` Stefan G. Weichinger
@ 2013-07-14 22:35               ` Paul Hartman
  2013-07-15  7:39                 ` Mick
  1 sibling, 1 reply; 23+ messages in thread
From: Paul Hartman @ 2013-07-14 22:35 UTC (permalink / raw
  To: gentoo-user

On Mon, Jul 8, 2013 at 10:58 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> On 08/07/2013 17:39, Paul Hartman wrote:
>> On Thu, Jul 4, 2013 at 9:04 PM, Paul Hartman
>> <paul.hartman+gentoo@gmail.com> wrote:
>>> ST4000DM000
>>
>> As a side-note these two Seagate 4TB "Desktop" edition drives I bought
>> already, after about 100 hours of power-on usage, both drives
>> have each encountered dozens of unreadable sectors so far. I was able
>> to correct them (force reallocation) using hdparm... So it should be
>> "fixed", and I'm reading that this is "normal" with newer drives and
>> "don't worry about it", but I'm still coming from the time when 1 bad
>> sector = red alert, replace the drive ASAP.  I guess I will need to
>> monitor and see if it gets worse.
>>
>
>
> Way back when in the bad old days of drives measured in 100s of megs,
> you'd get a few bad sectors now and then, and would have to mark them as
> faulty. This didn't bother us much back then.
>
> Nowadays we have drives that are 8,000 times bigger than that, so all other
> things being equal we'd expect sectors to fail 8,000 times more (more
> being a very fuzzy concept, and I know full well I'm using it loosely :-) )
>
> Our drives nowadays also have smart firmware, something we had to
> introduce when CHS no longer cut it, this led to sector failures being
> somewhat "invisible" leaving us with the happy delusion that drives were
> vastly reliable etc etc etc. But you know all this.
>
> A mere few dozen failures in the first 100 hours is a failure rate of
> (Alan whips out the trusty sci calculator) 4.8E-6%. Pretty damn
> spectacular if you ask me and WELL within probabilities.
>
> There is likely nothing wrong with your drives. If they are faulty, it's
> highly likely a systemic manufacturing fault of the mechanicals (servo
> systems, motor bearings, etc.)
>
> You do realize that modern hard drives have for the longest time been up
> there in the Top X list of Most Reliable Devices Made By Mankind Ever?

An update: the Seagate drives have both continued to spit out more
unrecoverable errors and find more and more bad sectors, including
some end-to-end errors flagged with a critical "FAILING NOW" status in
SMART. From what I have read, that error means the drive's internal
cache did not match the data written to disk, which seems like a
serious flaw. The threshold is 1, which means if it happens at all, the
drive should be replaced. It has happened half a dozen times on each
disk so far (but not at the exact same time, so I don't think it is a
host controller problem -- and other disks on the same controller and
cable have had no issues). They have also been disconnecting and
resetting randomly, sometimes requiring me to pull the drive and
reinsert it into the enclosure to make it reappear. It happens even
after I disabled APM, so I know it isn't a spin-down/idle timeout
thing. Temperatures are actually very good (low 30s), so they are not
overheating.
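
Watching for these failures can be automated by parsing the attribute table printed by `smartctl -A /dev/sdX`, where each row has ten columns and the ninth, WHEN_FAILED, shows `-` for healthy attributes and values such as `FAILING_NOW` otherwise. A minimal Python sketch with hypothetical function names; it assumes smartctl's usual column layout, and the two sample rows are illustrative, not output from these drives.

```python
COLUMNS = ("id", "name", "flag", "value", "worst", "thresh",
           "type", "updated", "when_failed", "raw")


def parse_smart_attributes(smartctl_a_output: str) -> list[dict[str, str]]:
    """Parse the attribute table printed by `smartctl -A`."""
    attrs = []
    for line in smartctl_a_output.splitlines():
        # At most 10 fields, so a raw value like "34 (Min/Max 20/40)" stays whole.
        fields = line.split(maxsplit=9)
        if len(fields) == len(COLUMNS) and fields[0].isdigit():
            attrs.append(dict(zip(COLUMNS, fields)))
    return attrs


def failing_attributes(smartctl_a_output: str) -> list[str]:
    """Names of attributes whose WHEN_FAILED column is anything other than '-'."""
    return [a["name"] for a in parse_smart_attributes(smartctl_a_output)
            if a["when_failed"] != "-"]


if __name__ == "__main__":
    # Illustrative rows only.
    sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       48
184 End-to-End_Error        0x0032   094   094   099    Old_age   Always   FAILING_NOW 6
"""
    print(failing_attributes(sample))  # ['End-to-End_Error']
```

Logging the raw value of Reallocated_Sector_Ct from each run also makes it easy to see whether the count keeps growing.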

I think I will try to trade them in to Seagate for a new pair under
warranty replacement. And then probably try to sell the replacements
and be rid of them.

Meanwhile, during that experiment, I bought 2 brand new Western
Digital Red 3TB drives last week. No problems in SMART testing or
creating LVM/RAID/Filesystems. I have now been running the destructive
write/read badblocks tests for 24+ hours and they have been perfect so
far, exactly 0 errors. They are more expensive (3TB for the same price
as the 4TB Seagate) and slightly slower read/write speed (150MB/sec
peak vs 170MB/sec peak), but I value reliability over all other
factors.

These Seagate drives must have some kind of manufacturing defect, or
perhaps were damaged in shipping... UPS have been known to treat
packages like a football!
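
Alan's quoted figure of 4.8E-6% can be reproduced if "a few dozen" is taken as roughly 47 bad sectors (an assumption) on a 4TB drive counted in 4096-byte sectors: 4,000,000,000,000 / 4096 = 976,562,500 sectors, and 47 / 976,562,500 x 100 is about 4.8e-6 percent. A minimal Python sketch of that arithmetic (hypothetical function name):

```python
def bad_sector_percent(bad_sectors: int, capacity_bytes: int,
                       sector_bytes: int = 4096) -> float:
    """Share of a drive's sectors that are bad, as a percentage."""
    total_sectors = capacity_bytes // sector_bytes
    return 100.0 * bad_sectors / total_sectors


if __name__ == "__main__":
    four_tb = 4_000_000_000_000  # drive makers' decimal terabytes
    print(f"{bad_sector_percent(47, four_tb):.2g} %")       # 4.8e-06 %
    print(f"{bad_sector_percent(47, four_tb, 512):.2g} %")  # 6e-07 %
```

Counting 512-byte logical sectors instead gives eight times as many sectors and a figure eight times smaller, so the sector size assumed matters as much as the count.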



* Re: [gentoo-user] hp H222 SAS controller
  2013-07-14 22:35               ` Paul Hartman
@ 2013-07-15  7:39                 ` Mick
  2013-07-15 15:39                   ` Paul Hartman
  2013-07-23 12:52                   ` J. Roeleveld
  0 siblings, 2 replies; 23+ messages in thread
From: Mick @ 2013-07-15  7:39 UTC (permalink / raw
  To: gentoo-user

[-- Attachment #1: Type: Text/Plain, Size: 5001 bytes --]

On Sunday 14 Jul 2013 23:35:50 Paul Hartman wrote:
> On Mon, Jul 8, 2013 at 10:58 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> > On 08/07/2013 17:39, Paul Hartman wrote:
> >> On Thu, Jul 4, 2013 at 9:04 PM, Paul Hartman
> >> <paul.hartman+gentoo@gmail.com> wrote:
> >>> ST4000DM000
> >> 
> >> As a side-note these two Seagate 4TB "Desktop" edition drives I bought
> >> already, after about 100 hours of power-on usage, both drives
> >> have each encountered dozens of unreadable sectors so far. I was able
> >> to correct them (force reallocation) using hdparm... So it should be
> >> "fixed", and I'm reading that this is "normal" with newer drives and
> >> "don't worry about it", but I'm still coming from the time when 1 bad
> >> sector = red alert, replace the drive ASAP.  I guess I will need to
> >> monitor and see if it gets worse.
> > 
> > Way back when in the bad old days of drives measured in 100s of megs,
> > you'd get a few bad sectors now and then, and would have to mark them as
> > faulty. This didn't bother us much back then.
> >
> > Nowadays we have drives that are 8,000 times bigger than that, so all other
> > things being equal we'd expect sectors to fail 8,000 times more (more
> > being a very fuzzy concept, and I know full well I'm using it loosely :-) )
> > 
> > Our drives nowadays also have smart firmware, something we had to
> > introduce when CHS no longer cut it, this led to sector failures being
> > somewhat "invisible" leaving us with the happy delusion that drives were
> > vastly reliable etc etc etc. But you know all this.
> > 
> > A mere few dozen failures in the first 100 hours is a failure rate of
> > (Alan whips out the trusty sci calculator) 4.8E-6%. Pretty damn
> > spectacular if you ask me and WELL within probabilities.
> > 
> > There is likely nothing wrong with your drives. If they are faulty, it's
> > highly likely a systemic manufacturing fault of the mechanicals (servo
> > systems, motor bearings, etc.)
> > 
> > You do realize that modern hard drives have for the longest time been up
> > there in the Top X list of Most Reliable Devices Made By Mankind Ever?
> 
> An update: the Seagate drives have both continued to spit out more
> unrecoverable errors and find more and more bad sectors, including
> some end-to-end errors flagged with a critical "FAILING NOW" status in
> SMART. From what I have read, that error means the drive's internal
> cache did not match the data written to disk, which seems like a
> serious flaw. The threshold is 1, which means if it happens at all, the
> drive should be replaced. It has happened half a dozen times on each
> disk so far (but not at the exact same time, so I don't think it is a
> host controller problem -- and other disks on the same controller and
> cable have had no issues). They have also been disconnecting and
> resetting randomly, sometimes requiring me to pull the drive and
> reinsert it into the enclosure to make it reappear. It happens even
> after I disabled APM, so I know it isn't a spin-down/idle timeout
> thing. Temperatures are actually very good (low 30s), so they are not
> overheating.
> 
> I think I will try to trade them in to Seagate for a new pair under
> warranty replacement. And then probably try to sell the replacements
> and be rid of them.
> 
> Meanwhile, during that experiment, I bought 2 brand new Western
> Digital Red 3TB drives last week. No problems in SMART testing or
> creating LVM/RAID/Filesystems. I have now been running the destructive
> write/read badblocks tests for 24+ hours and they have been perfect so
> far, exactly 0 errors. They are more expensive (3TB for the same price
> as the 4TB Seagate) and slightly slower read/write speed (150MB/sec
> peak vs 170MB/sec peak), but I value reliability over all other
> factors.
> 
> These Seagate drives must have some kind of manufacturing defect, or
> perhaps were damaged in shipping... UPS have been known to treat
> packages like a football!

I've been watching this thread with interest, because I've been trying to find 
out which HDD I should be buying for a new PC.  For every person reporting 
problematic Seagates there's another person complaining about Western Digital 
being too noisy, failing, or in the case of the black versions, far too 
expensive.

Amidst all the anecdotal aphorisms against one or the other manufacturer, I
saw it mentioned that the likelihood of failure doubles when you go from 1TB
to 2TB.  If true, I guess a 3TB drive would have fewer failures than a 4TB
drive.

For what it's worth, I have had a number of Seagates fail on me, but that
was back in the 90s.  On my laptop a Seagate Momentus 7200.4 (ST9500420ASG)
has been running fine for the last 3.5 years, so I was thinking of taking a
punt on a 'Seagate Barracuda 3.5 inch 2TB 7200 RPM 64MB 6GB/S Internal SATA'.
But what you're mentioning here gives me pause.

-- 
Regards,
Mick

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 490 bytes --]


* Re: [gentoo-user] hp H222 SAS controller
  2013-07-15  7:39                 ` Mick
@ 2013-07-15 15:39                   ` Paul Hartman
  2013-07-23 12:52                   ` J. Roeleveld
  1 sibling, 0 replies; 23+ messages in thread
From: Paul Hartman @ 2013-07-15 15:39 UTC (permalink / raw
  To: gentoo-user

On Mon, Jul 15, 2013 at 2:39 AM, Mick <michaelkintzios@gmail.com> wrote:
> I've been watching this thread with interest, because I've been trying to find
> out which HDD I should be buying for a new PC.  For every person reporting
> problematic Seagates there's another person complaining about Western Digital
> being too noisy, failing, or in the case of the black versions, far too
> expensive.
>
> Amidst all the anecdotal aphorisms against one or the other manufacturer, I
> saw it mentioned that the likelihood of failure doubles when you go from 1TB
> to 2TB.  If true, I guess a 3TB drive would have fewer failures than a 4TB
> drive.
>
> For what it's worth, I have had a number of Seagates fail on me, but that
> was back in the 90s.  On my laptop a Seagate Momentus 7200.4 (ST9500420ASG)
> has been running fine for the last 3.5 years, so I was thinking of taking a
> punt on a 'Seagate Barracuda 3.5 inch 2TB 7200 RPM 64MB 6GB/S Internal SATA'.
> But what you're mentioning here gives me pause.

One important thing to know is that there are only 3 HDD manufacturers
remaining: Seagate, WD and Toshiba. Any other brand names you see are
just relabeled versions of those. Maxtor/IBM/Hitachi/Fujitsu/Samsung
and all those who came before them are gone.

My personal preference for the last several years was always Samsung; I
never had a single problem with one of those. Unfortunately they are
no longer in the HDD business...

In general, for 3.5" drives, I think "NAS" or "RAID" or "Enterprise"
branded drives tend to be more expensive, but of a higher quality and
rated to run in 24/7 environments. Even if you're not using it that
way, it suggests that it's a more rugged drive. The "Personal",
"Desktop", "Budget" etc. drives, and drives that come in external
enclosures, tend to be a roll of the dice. Some have speculated that the HDDs
which score lower on quality assurance tests get stuck into these
lower-priced lines (kind of like CPU binning).

The Seagate "Desktop 4TB" drives I got for $140 have extremely
aggressive power-saving and spin-down (sometimes it takes 10 seconds
just to access the drive after it spins down!). They are 5400rpm, but
that is not advertised, and some people claim to have received 7200rpm units.
The specs on lifetime are pretty poor. I read that they are only rated
for something like 200 days of cumulative use. But I expected it to at
least work for a week! I keep running passes of badblocks and it keeps
finding new bad sectors that weren't there the previous time I ran it.
It is literally degrading before my very eyes. I have zero trust in
it.
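
The observation that each badblocks pass finds sectors the previous one didn't is easy to quantify if every run saves its list with badblocks' `-o` option, which writes one bad block number per line. A minimal Python sketch (hypothetical function names and file contents) comparing two such lists:

```python
def read_badblocks_list(text: str) -> set[int]:
    """Parse a badblocks -o output file: one bad block number per line."""
    return {int(token) for token in text.split() if token.isdigit()}


def new_bad_blocks(previous_run: str, latest_run: str) -> list[int]:
    """Block numbers reported by the latest run but not by the previous one."""
    return sorted(read_badblocks_list(latest_run) - read_badblocks_list(previous_run))


if __name__ == "__main__":
    # Hypothetical contents of two saved lists, e.g. pass1.txt and pass2.txt.
    print(new_bad_blocks("100\n200\n", "200\n300\n301\n"))  # [300, 301]
```

Sectors the drive has since reallocated may drop out of later lists, so a shorter list is not by itself a sign that the disk is recovering.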

For 2.5" hard drives, I have seen many, many crashed ones from
every brand, but never had one fail on me personally. I've always
attributed that to human influence: people tend to be rough on laptops,
tossing them onto the table, dropping them, leaving them in a hot or
freezing cold car, etc. Also, the nature of laptop use means a lot of
on/off cycles, which means a lot of heating and cooling, which is really
bad for hard drives.

And for 5.25" hard drives I have an old 1.2 GB Quantum drive that
sounds like a screaming cat going through a jet engine. You can
seriously hear it from outside my house with all the windows and doors
closed. But it actually still works all these years later. :)



* Re: [gentoo-user] hp H222 SAS controller
  2013-07-15  7:39                 ` Mick
  2013-07-15 15:39                   ` Paul Hartman
@ 2013-07-23 12:52                   ` J. Roeleveld
  1 sibling, 0 replies; 23+ messages in thread
From: J. Roeleveld @ 2013-07-23 12:52 UTC (permalink / raw
  To: gentoo-user

On Mon, July 15, 2013 09:39, Mick wrote:
> On Sunday 14 Jul 2013 23:35:50 Paul Hartman wrote:
>> On Mon, Jul 8, 2013 at 10:58 AM, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>> > On 08/07/2013 17:39, Paul Hartman wrote:
>> >> On Thu, Jul 4, 2013 at 9:04 PM, Paul Hartman
>> >> <paul.hartman+gentoo@gmail.com> wrote:
>> >>> ST4000DM000
>> >>
>> >> As a side-note these two Seagate 4TB "Desktop" edition drives I bought
>> >> already, after about 100 hours of power-on usage, both drives
>> >> have each encountered dozens of unreadable sectors so far. I was able
>> >> to correct them (force reallocation) using hdparm... So it should be
>> >> "fixed", and I'm reading that this is "normal" with newer drives and
>> >> "don't worry about it", but I'm still coming from the time when 1 bad
>> >> sector = red alert, replace the drive ASAP.  I guess I will need to
>> >> monitor and see if it gets worse.
>> >
>> > Way back when in the bad old days of drives measured in 100s of megs,
>> > you'd get a few bad sectors now and then, and would have to mark them as
>> > faulty. This didn't bother us much back then.
>> >
>> > Nowadays we have drives that are 8,000 times bigger than that, so all other
>> > things being equal we'd expect sectors to fail 8,000 times more (more
>> > being a very fuzzy concept, and I know full well I'm using it loosely :-) )
>> >
>> > Our drives nowadays also have smart firmware, something we had to
>> > introduce when CHS no longer cut it, this led to sector failures being
>> > somewhat "invisible" leaving us with the happy delusion that drives were
>> > vastly reliable etc etc etc. But you know all this.
>> >
>> > A mere few dozen failures in the first 100 hours is a failure rate of
>> > (Alan whips out the trusty sci calculator) 4.8E-6%. Pretty damn
>> > spectacular if you ask me and WELL within probabilities.
>> >
>> > There is likely nothing wrong with your drives. If they are faulty, it's
>> > highly likely a systemic manufacturing fault of the mechanicals (servo
>> > systems, motor bearings, etc.)
>> >
>> > You do realize that modern hard drives have for the longest time been up
>> > there in the Top X list of Most Reliable Devices Made By Mankind Ever?
>>
>> An update: the Seagate drives have both continued to spit out more
>> unrecoverable errors and find more and more bad sectors, including
>> some end-to-end errors flagged with a critical "FAILING NOW" status in
>> SMART. From what I have read, that error means the drive's internal
>> cache did not match the data written to disk, which seems like a
>> serious flaw. The threshold is 1, which means if it happens at all, the
>> drive should be replaced. It has happened half a dozen times on each
>> disk so far (but not at the exact same time, so I don't think it is a
>> host controller problem -- and other disks on the same controller and
>> cable have had no issues). They have also been disconnecting and
>> resetting randomly, sometimes requiring me to pull the drive and
>> reinsert it into the enclosure to make it reappear. It happens even
>> after I disabled APM, so I know it isn't a spin-down/idle timeout
>> thing. Temperatures are actually very good (low 30s), so they are not
>> overheating.
>>
>> I think I will try to trade them in to Seagate for a new pair under
>> warranty replacement. And then probably try to sell the replacements
>> and be rid of them.
>>
>> Meanwhile, during that experiment, I bought 2 brand new Western
>> Digital Red 3TB drives last week. No problems in SMART testing or
>> creating LVM/RAID/Filesystems. I have now been running the destructive
>> write/read badblocks tests for 24+ hours and they have been perfect so
>> far, exactly 0 errors. They are more expensive (3TB for the same price
>> as the 4TB Seagate) and slightly slower read/write speed (150MB/sec
>> peak vs 170MB/sec peak), but I value reliability over all other
>> factors.
>>
>> These Seagate drives must have some kind of manufacturing defect, or
>> perhaps were damaged in shipping... UPS have been known to treat
>> packages like a football!
>
> I've been watching this thread with interest, because I've been trying to
> find out which HDD I should be buying for a new PC.  For every person
> reporting problematic Seagates there's another person complaining about
> Western Digital being too noisy, failing, or in the case of the black
> versions, far too expensive.
>
> Amidst all the anecdotal aphorisms against one or the other manufacturer,
> I saw it mentioned that the likelihood of failure doubles when you go from
> 1TB to 2TB.  If true, I guess a 3TB drive would have fewer failures than a
> 4TB drive.
>
> For what it's worth, I have had a number of Seagates fail on me, but that
> was back in the 90s.  On my laptop a Seagate Momentus 7200.4
> (ST9500420ASG) has been running fine for the last 3.5 years, so I was
> thinking of taking a punt on a 'Seagate Barracuda 3.5 inch 2TB 7200 RPM
> 64MB 6GB/S Internal SATA'.  But what you're mentioning here gives me pause.

I usually tend to avoid comments about one brand or the other.
Due to some bad experiences in the past with some brands, and not many
issues with WD (only 1 dodgy drive, more details further down in this
email) in the past couple of years, I mainly use WD drives.

I currently have quite a few WD RED 3TB drives in use and these are quite
reliable, even in a not-so-optimal climate (it gets really hot where they
are; room temperature has been above 30C for the past several days).

In the previous location (yes, I moved the server), there were more 30C+
days and the old server case got kicked a few times by accident. This led
to 1 WD Green (1.5TB) out of 6 becoming so unreliable that it is about as
reliable as a car with water in the fuel tank...

I also found the REDs to be a little more expensive than "regular" drives
and a lot cheaper than the RAID Edition drives.

--
Joost





Thread overview: 23+ messages
2013-07-02 19:56 [gentoo-user] hp H222 SAS controller Stefan G. Weichinger
2013-07-02 22:42 ` Paul Hartman
2013-07-03  7:29   ` Stefan G. Weichinger
2013-07-03 19:34     ` Paul Hartman
2013-07-04  9:21       ` Stefan G. Weichinger
2013-07-05  2:04         ` Paul Hartman
2013-07-08 15:39           ` Paul Hartman
2013-07-08 15:58             ` Alan McKinnon
2013-07-08 18:27               ` Stefan G. Weichinger
2013-07-08 21:42                 ` Alan McKinnon
2013-07-10 17:12                   ` Stefan G. Weichinger
2013-07-08 22:48                 ` Paul Hartman
2013-07-10 17:14                   ` Stefan G. Weichinger
2013-07-14 22:35               ` Paul Hartman
2013-07-15  7:39                 ` Mick
2013-07-15 15:39                   ` Paul Hartman
2013-07-23 12:52                   ` J. Roeleveld
2013-07-11 22:06   ` Stefan G. Weichinger
2013-07-11 22:38     ` Paul Hartman
2013-07-11 22:47       ` Stefan G. Weichinger
2013-07-11 23:02       ` Stefan G. Weichinger
2013-07-12 13:52         ` Stefan G. Weichinger
2013-07-12 17:21           ` Stefan G. Weichinger
