[gentoo-user] dma_intr errors on heavy writes -- cause for concern?

public inbox for gentoo-user@lists.gentoo.org

* [gentoo-user] dma_intr errors on heavy writes -- cause for concern?
@ 2006-11-01  1:08 Richard Broersma Jr
  2006-11-01  1:32 ` Hemmann, Volker Armin
  2006-11-01  4:08 ` Richard Fish
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Broersma Jr @ 2006-11-01  1:08 UTC (permalink / raw
  To: Gentoo Users

At certain times my server undergoes heavy disk writes.  When this happens I get the
following errors.  Should I be concerned?  Does anyone know of any resources I can read that
explain what all of this means?

hda and hdc are my remaining Maxtor drives, combined into a RAID1 mirror using mdadm.  I have
another software RAID10 array using 4 Western Digital drives, but I have not yet seen anything
like these errors on that array.
 
Thanks for the help.

Oct 26 16:58:59 [kernel] hda: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=183618872251967,
high=10944537, low=10982975, sector=430413375
Oct 26 16:58:59 [kernel] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 26 16:58:59 [kernel] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 26 16:58:59 [kernel] ide: failed opcode was: unknown
Oct 26 16:59:00 [kernel] hdc: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=190215942397799,
high=11337753, low=11362151, sector=430792551
Oct 26 16:59:00 [kernel] ide: failed opcode was: unknown
Oct 26 18:12:09 [kernel] hdc: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=182519360557247,
high=10879001, low=10916031, sector=430346431
Oct 26 18:12:09 [kernel] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct 26 18:12:09 [kernel] ide: failed opcode was: unknown
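
For readers wondering how the hex values map to the names in braces: each set bit in the status and error values has a name. The Python sketch below decodes them. Only the three status names and the one error name that appear in the log above are confirmed by this thread; the other names are filled in from the conventional ATA register layout and should be treated as assumptions, as should the helper name `decode`.

```python
# Bit names for the IDE status register. "DriveReady", "SeekComplete" and
# "Error" appear in the log above; the rest are assumed from the usual
# ATA register layout.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Bit names for the IDE error register. Only "SectorIdNotFound" is
# confirmed by the log; the rest are assumptions.
ERROR_BITS = {
    0x80: "BadSector/BadCRC",
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


print(decode(0x51, STATUS_BITS))  # status=0x51 from the log
print(decode(0x10, ERROR_BITS))   # error=0x10 from the log
```

Decoding 0x51 and 0x10 this way reproduces the names the kernel printed, which is a quick check that the table matches at least for the bits seen here.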


Regards,

Richard Broersma Jr.
-- 
gentoo-user@gentoo.org mailing list




* Re: [gentoo-user] dma_intr errors on heavy writes -- cause for concern?
  2006-11-01  1:08 [gentoo-user] dma_intr errors on heavy writes -- cause for concern? Richard Broersma Jr
@ 2006-11-01  1:32 ` Hemmann, Volker Armin
  2006-11-01  7:22   ` Alan
  2006-11-01  4:08 ` Richard Fish
  1 sibling, 1 reply; 7+ messages in thread
From: Hemmann, Volker Armin @ 2006-11-01  1:32 UTC (permalink / raw
  To: gentoo-user

On Wednesday 01 November 2006 02:08, Richard Broersma Jr wrote:
> At certain times my server undergoes heavy disk writes.  When this happens
> I get the following errors.  Should I be concerned?  Does anyone know of any
> resources I can read that explain what all of this means?

Yes, you should. Errors like this usually have one of these causes:

disk is dying
cable is defective
PSU drops voltages under load
controller is defective
ram is defective
board is just junk

From most likely to least likely.


* Re: [gentoo-user] dma_intr errors on heavy writes -- cause for concern?
  2006-11-01  1:08 [gentoo-user] dma_intr errors on heavy writes -- cause for concern? Richard Broersma Jr
  2006-11-01  1:32 ` Hemmann, Volker Armin
@ 2006-11-01  4:08 ` Richard Fish
  2006-11-01  6:14   ` Richard Broersma Jr
  1 sibling, 1 reply; 7+ messages in thread
From: Richard Fish @ 2006-11-01  4:08 UTC (permalink / raw
  To: gentoo-user

On 10/31/06, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> At certain times my server undergoes heavy disk writes.  When this happens I get the
> following errors.  Should I be concerned?  Does anyone know of any resources I can read that
> explain what all of this means?

I don't have any resources, but my understanding is that LBAsect is
the logical sector of the block device (i.e., the RAID array), while
sector is the physical sector of the disk.  LBAsect looks
suspicious: it references a sector somewhere around the 91-petabyte
mark.  You didn't mention how large the disks are, but
even so, they are being asked for a sector that is about 220G from the
beginning of the disk, and returning that no such sector exists
(SectorIdNotFound).
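
The arithmetic behind those two figures can be checked directly. The following is a minimal Python sketch, assuming 512-byte sectors; the `fields` helper and the constant names are made up for this illustration. It takes the first logged error, converts LBAsect and sector to byte offsets, and also tests a pattern that the posted numbers happen to share: LBAsect equals high * 2^24 + low, while sector equals (high mod 256) * 2^24 + low. That pattern is only an observation about the posted values, not a statement about how the kernel builds the message.

```python
import re

SECTOR_BYTES = 512  # assumption: 512-byte sectors

# First error from the original post, with the wrapped line joined.
LOG = ("hda: dma_intr: error=0x10 { SectorIdNotFound }, "
       "LBAsect=183618872251967, high=10944537, low=10982975, "
       "sector=430413375")


def fields(line):
    """Extract the decimal name=value pairs from one log line.

    Hex values such as error=0x10 are skipped on purpose.
    """
    return {key: int(val) for key, val in re.findall(r"(\w+)=(\d+)\b", line)}


f = fields(LOG)

lba_bytes = f["LBAsect"] * SECTOR_BYTES
sector_bytes = f["sector"] * SECTOR_BYTES
print(f"LBAsect offset: {lba_bytes / 1e15:.1f} x 10^15 bytes")
print(f"sector offset:  {sector_bytes / 1e9:.1f} x 10^9 bytes")

# Observed relationships between the logged fields.
print(f["LBAsect"] == (f["high"] << 24) | f["low"])
print(f["sector"] == ((f["high"] & 0xFF) << 24) | f["low"])
```

With the first log line this gives roughly 94 x 10^15 bytes for LBAsect (the same order of magnitude as the figure above) and about 220.4 x 10^9 bytes for sector, and both relationships hold.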

So my guess is that your filesystem is getting confused under load,
and trying to access stuff that is beyond the end of your raid array.
So, which fs and kernel version?

-Richard

* Re: [gentoo-user] dma_intr errors on heavy writes -- cause for concern?
  2006-11-01  4:08 ` Richard Fish
@ 2006-11-01  6:14   ` Richard Broersma Jr
  2006-11-01  8:16     ` Richard Fish
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Broersma Jr @ 2006-11-01  6:14 UTC (permalink / raw
  To: gentoo-user

> On 10/31/06, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> > At certain times my server undergoes heavy disk writes.  When this happens I get the
> > following errors.  Should I be concerned?  Does anyone know of any resources I can read that
> > explain what all of this means?
> 
> I don't have any resources, but my understanding is that LBAsect is
> the logical sector of the block device (i.e., the RAID array), while
> sector is the physical sector of the disk.  LBAsect looks
> suspicious: it references a sector somewhere around the 91-petabyte
> mark.  You didn't mention how large the disks are, but
> even so, they are being asked for a sector that is about 220G from the
> beginning of the disk, and returning that no such sector exists
> (SectorIdNotFound).
> 
> So my guess is that your filesystem is getting confused under load,
> and trying to access stuff that is beyond the end of your raid array.
> So, which fs and kernel version?

Oops, I was mistaken.  I forgot that when I re-arranged my disks, my RAID10 ended up partly using hda/hdc.

Linux version 2.6.17-gentoo-r7 (root@db_server01) 
(gcc version 4.1.1 (Gentoo 4.1.1)) #8 Sun Oct 8 20:28:34 PDT 2006

md4 : active raid10 hdg1[3] hde1[2] hdc1[1] hda1[0]
      586098688 blocks 1024K chunks 2 near-copies [4/4] [UUUU]
fstab
/dev/md4 /home ext3 noatime 0 2
df
/dev/md/4   576901664   7284500 540312232   2% /home

Disk /dev/hda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1       38913   312568641   fd  Linux raid autodetect

Disk /dev/hdc: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1               1       38913   312568641   fd  Linux raid autodetect

Disk /dev/hde: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hde1               1       38913   312568641   fd  Linux raid autodetect

Disk /dev/hdg: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hdg1               1       36483   293049666   fd  Linux raid autodetect
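
As a rough cross-check of the numbers above, the Python sketch below tests whether the logged values fall within a 320.0 GB disk and whether the md4 size is plausible for four members with 2 near-copies. It assumes 512-byte sectors and that the block counts from fdisk and /proc/mdstat are in 1 KiB units; the constant and function names are invented for this illustration.

```python
SECTOR_BYTES = 512                 # assumption: 512-byte sectors
DISK_BYTES = 320072933376          # /dev/hda and /dev/hdc, from fdisk above
SMALLEST_MEMBER_KIB = 293049666    # hdg1, the smallest md4 member (1 KiB blocks assumed)
MD4_KIB = 586098688                # md4 size from /proc/mdstat (1 KiB blocks assumed)
MEMBERS, COPIES = 4, 2             # raid10 with "2 near-copies"

# (device, LBAsect, sector) copied from the three logged errors.
ERRORS = [
    ("hda", 183618872251967, 430413375),
    ("hdc", 190215942397799, 430792551),
    ("hdc", 182519360557247, 430346431),
]

disk_sectors = DISK_BYTES // SECTOR_BYTES


def on_disk(sector_number):
    """True if the sector number addresses a sector that exists on the disk."""
    return 0 <= sector_number < disk_sectors


for dev, lbasect, sector in ERRORS:
    print(f"{dev}: sector on disk={on_disk(sector)}, "
          f"LBAsect on disk={on_disk(lbasect)}")

# Upper bound on usable space: every block is stored COPIES times, and each
# member contributes at most as much as the smallest one.
upper_bound_kib = MEMBERS * SMALLEST_MEMBER_KIB // COPIES
print("KiB below the upper bound:", upper_bound_kib - MD4_KIB)
```

All three sector values fall inside the disk while all three LBAsect values fall far outside it, and md4 is 644 KiB smaller than the simple upper bound, which is a plausible margin for metadata and chunk rounding.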

Regards,

Richard Broersma Jr.


* Re: [gentoo-user] dma_intr errors on heavy writes -- cause for concern?
  2006-11-01  1:32 ` Hemmann, Volker Armin
@ 2006-11-01  7:22   ` Alan
  2006-11-01 18:09     ` Hemmann, Volker Armin
  0 siblings, 1 reply; 7+ messages in thread
From: Alan @ 2006-11-01  7:22 UTC (permalink / raw
  To: gentoo-user

On Wed, Nov 01, 2006 at 02:32:36AM +0100, Hemmann, Volker Armin wrote:
> On Wednesday 01 November 2006 02:08, Richard Broersma Jr wrote:
> > At certain times my server undergoes heavy disk writes.  When this
> > happens I get the following errors.  Should I be concerned?  Does anyone
> > know of any resources I can read that explain what all of this means?
> 
> Yes, you should. Errors like this usually have one of these causes:
> 
> disk is dying
> cable is defective
> PSU drops voltages under load
> controller is defective
> ram is defective
> board is just junk
> 
> From most likely to least likely.

Seconded.... dma_intr errors are never good; pretty much every time I've
seen them they've preceded a drive failure.  From what I remember it's
basically the disk failing and then going out of DMA mode to try to
recover and failing (or something like that).  Short answer is start
shopping for new disks if you like your data to be safe :)  Or at
minimum, back up anything important on there ASAP.

-- 
Alan <alan@ufies.org> - http://arcterex.net
--------------------------------------------------------------------
"Backups are for people who don't pray."                 -- big Mike

* Re: [gentoo-user] dma_intr errors on heavy writes -- cause for concern?
  2006-11-01  6:14   ` Richard Broersma Jr
@ 2006-11-01  8:16     ` Richard Fish
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Fish @ 2006-11-01  8:16 UTC (permalink / raw
  To: gentoo-user

On 10/31/06, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> > So my guess is that your filesystem is getting confused under load,
> > and trying to access stuff that is beyond the end of your raid array.
> > So, which fs and kernel version?
>
> Oops, I was mistaken.  I forgot that when I re-arranged my disks, my RAID10 ended up partly using hda/hdc.
>
> Linux version 2.6.17-gentoo-r7 (root@db_server01)
> (gcc version 4.1.1 (Gentoo 4.1.1)) #8 Sun Oct 8 20:28:34 PDT 2006
>
> md4 : active raid10 hdg1[3] hde1[2] hdc1[1] hda1[0]
>       586098688 blocks 1024K chunks 2 near-copies [4/4] [UUUU]

Ok well this sort of changes things for me.  I would start to suspect
hardware...particularly any hardware that is specific to hda/hdc, and
particularly the cables (since you mentioned "re-arranging" things).
Remember that UDMA cables are really sensitive to length (they really must
be less than 18 inches long) and to damage.

One thing you could try is moving the disks around.  Linux software RAID
is pretty tolerant of those kinds of changes, so it should be safe to
exchange hda and hdg, for example.  If the problem follows the hda
drive to hdg, then you know you have a drive about to fail.  If it now
happens with a different drive on hda, then cable, motherboard, or RAM
issues have to be suspected.
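
The swap test can be made a little more systematic by counting errors per device before and after moving a drive. The Python below is only a sketch of that rule of thumb: `errors_per_device` and `interpret_swap` are hypothetical helper names, the sample lines are shortened copies of the log from the first message, and the post-swap counts in the usage example are invented purely to exercise the function.

```python
import re
from collections import Counter


def errors_per_device(log_lines):
    """Count dma_intr 'error=' lines per IDE device name (hda, hdc, ...)."""
    counts = Counter()
    for line in log_lines:
        match = re.search(r"\b(hd[a-z]): dma_intr: error=", line)
        if match:
            counts[match.group(1)] += 1
    return counts


def interpret_swap(old_slot, new_slot, counts_after):
    """Apply the rule of thumb after moving a suspect drive.

    old_slot is where the suspect drive used to be, new_slot is where it
    is now, and counts_after holds the error counts seen since the swap.
    """
    followed = counts_after.get(new_slot, 0) > 0
    stayed = counts_after.get(old_slot, 0) > 0
    if followed and not stayed:
        return "suspect the drive"
    if stayed and not followed:
        return "suspect cable, motherboard, or RAM"
    return "inconclusive"


# Shortened copies of lines from the log in the first message.
BEFORE = [
    "Oct 26 16:58:59 [kernel] hda: dma_intr: error=0x10 { SectorIdNotFound }",
    "Oct 26 16:58:59 [kernel] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
    "Oct 26 16:59:00 [kernel] hdc: dma_intr: error=0x10 { SectorIdNotFound }",
    "Oct 26 18:12:09 [kernel] hdc: dma_intr: error=0x10 { SectorIdNotFound }",
]

print(errors_per_device(BEFORE))

# Hypothetical counts after a swap, for illustration only.
print(interpret_swap("hda", "hdg", Counter({"hdg": 3})))
```

Only the `error=` lines are counted so that each failure is not counted twice via its accompanying `status=` line. A result of "inconclusive" covers both the case of no errors yet and the case of errors in both positions.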

-Richard

* Re: [gentoo-user] dma_intr errors on heavy writes -- cause for concern?
  2006-11-01  7:22   ` Alan
@ 2006-11-01 18:09     ` Hemmann, Volker Armin
  0 siblings, 0 replies; 7+ messages in thread
From: Hemmann, Volker Armin @ 2006-11-01 18:09 UTC (permalink / raw
  To: gentoo-user

On Wednesday 01 November 2006 08:22, Alan wrote:
> On Wed, Nov 01, 2006 at 02:32:36AM +0100, Hemmann, Volker Armin wrote:
> > On Wednesday 01 November 2006 02:08, Richard Broersma Jr wrote:
> > > At certain times my server undergoes heavy disk writes.  When this
> > > happens I get the following errors.  Should I be concerned?  Does
> > > anyone know of any resources I can read that explain what all of
> > > this means?
> >
> > Yes, you should. Errors like this usually have one of these causes:
> >
> > disk is dying
> > cable is defective
> > PSU drops voltages under load
> > controller is defective
> > ram is defective
> > board is just junk
> >
> > From most likely to least likely.
>
> Seconded.... dma_intr errors are never good; pretty much every time I've
> seen them they've preceded a drive failure.  From what I remember it's
> basically the disk failing and then going out of DMA mode to try to
> recover and failing (or something like that).  Short answer is start
> shopping for new disks if you like your data to be safe :)  Or at
> minimum, back up anything important on there ASAP.

Well, in my experience, a broken cable with a loose connection can do that
too. IDE cables are very sensitive; sometimes, when switching drives or
just reseating the connectors, one of the wires can break. And then you get
random errors, which crop up when the cable gets warm or cold, or
under load, or when the fans blow a little bit harder...

I solved some harddrive problems simply by changing the cables.
