public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] Mysteriously dismounting partition
@ 2015-10-26 14:47 Peter Humphrey
  2015-10-27 11:04 ` Stefan G. Weichinger
From: Peter Humphrey @ 2015-10-26 14:47 UTC
  To: gentoo-user

Hello list,

I have a small rescue system in this box, using /dev/sda3 and /dev/sdb3 in a 
traditional partition layout. The disks are (supposedly) identical SSDs. All 
goes well when I boot the system, but by the time I come to write to sdb3 it's 
dismounted itself. It even dismounted itself once in the middle of syncing 
portage. Here's a snippet from fstab:

LABEL=RescueSys     /           ext4   relatime   1 1
LABEL=RescUsrBits   /usr-bits   ext4   relatime   1 2

I keep the portage tree under /usr-bits.

# dmesg | grep sdb3
[    1.753508]  sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 >
[    4.833460] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
[  107.205918] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)

You can see the successful mount at 4.8 s; the entry at 107 s is me mounting 
it again manually.

I've rewritten the partition label, and I've run a smartctl test which 
reported no faults found. I've also just reduced the speed of the chipset, 
which has three settings: good performance, better performance and turbo. It 
adopts the turbo setting by default and I've now set it to "better". It's too 
early yet to see if that will help.

What else can I try?

-- 
Rgds
Peter




* Re: [gentoo-user] Mysteriously dismounting partition
  2015-10-26 14:47 [gentoo-user] Mysteriously dismounting partition Peter Humphrey
@ 2015-10-27 11:04 ` Stefan G. Weichinger
  2015-10-27 12:25   ` Peter Humphrey
From: Stefan G. Weichinger @ 2015-10-27 11:04 UTC
  To: gentoo-user

Am 26.10.2015 um 15:47 schrieb Peter Humphrey:

> I keep the portage tree under /usr-bits.
> 
> # dmesg | grep sdb3
> [    1.753508]  sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 >
> [    4.833460] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
> [  107.205918] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
> 
> You can see the successful mount at 4.8 s; the entry at 107 s is me mounting 
> it again manually.
> 
> I've rewritten the partition label, and I've run a smartctl test which 
> reported no faults found. I've also just reduced the speed of the chipset, 
> which has three settings: good performance, better performance and turbo. It 
> adopts the turbo setting by default and I've now set it to "better". It's too 
> early yet to see if that will help.

interesting ...

What init-system? openrc or systemd?
No trace of the actual unmount in any logs?

Maybe also look/grep for the LABEL of the fs.

Maybe test if using the device-name itself ( /dev/sdb3 ) or the UUID in
fstab changes the behavior.

I use UUIDs here without problems (with systemd).




* Re: [gentoo-user] Mysteriously dismounting partition
  2015-10-27 11:04 ` Stefan G. Weichinger
@ 2015-10-27 12:25   ` Peter Humphrey
  2015-10-27 14:16     ` J. Roeleveld
From: Peter Humphrey @ 2015-10-27 12:25 UTC
  To: gentoo-user

On Tuesday 27 October 2015 12:04:46 Stefan G. Weichinger wrote:
> Am 26.10.2015 um 15:47 schrieb Peter Humphrey:
> > I keep the portage tree under /usr-bits.
> > 
> > # dmesg | grep sdb3
> > [    1.753508]  sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 >
> > [    4.833460] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
> > [  107.205918] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
> > 
> > You can see the successful mount at 4.8 s; the entry at 107 s is me
> > mounting it again manually.
> > 
> > I've rewritten the partition label, and I've run a smartctl test which
> > reported no faults found. I've also just reduced the speed of the chipset,
> > which has three settings: good performance, better performance and turbo.
> > It adopts the turbo setting by default and I've now set it to "better".
> > It's too early yet to see if that will help.
> 
> interesting ...
> 
> What init-system? openrc or systemd?

Openrc.

> No trace of the actual unmount in any logs?

Not that I can find, no.

> Maybe also look/grep for the LABEL of the fs.

Nope, nor that.

> Maybe test if using the device-name itself ( /dev/sdb3 ) or the UUID in
> fstab changes the behavior.

I'll try reverting to /dev/sdb3 and see if that helps.

> I use UUIDs here without problems (with systemd).

The only thing I use UUIDs for here is in mdadm.conf to get the LVs started 
reliably for the main system*. Those live in partitions /dev/sd[ab][5789].

Three more things: I've had the cover off and checked the seating of the SATA 
cables; while the lid was off I watched the MB LEDs during startup, which 
seemed okay; and today the kernel was upgraded from 4.0.5 to 4.0.9; that may 
help too. (Hm ... too many changes at once.)

* Now that I think of it, one of the LVs came up as inactive the other day, 
and nothing I could think of would activate it (consulting man mdadm of 
course). In the end I had to reboot. This machine has shown some bizarre 
behaviour over the last few months. Something is definitely wrong; I just can't 
figure out what it is.

-- 
Rgds
Peter




* Re: [gentoo-user] Mysteriously dismounting partition
  2015-10-27 12:25   ` Peter Humphrey
@ 2015-10-27 14:16     ` J. Roeleveld
  2015-10-27 16:36       ` Peter Humphrey
From: J. Roeleveld @ 2015-10-27 14:16 UTC
  To: gentoo-user

On Tuesday, October 27, 2015 12:25:07 PM Peter Humphrey wrote:
> On Tuesday 27 October 2015 12:04:46 Stefan G. Weichinger wrote:
> > Am 26.10.2015 um 15:47 schrieb Peter Humphrey:
> > > I keep the portage tree under /usr-bits.
> > > 
> > > # dmesg | grep sdb3
> > > [    1.753508]  sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 >
> > > [    4.833460] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
> > > [  107.205918] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
> > > 
> > > You can see the successful mount at 4.8 s; the entry at 107 s is me
> > > mounting it again manually.
> > > 
> > > I've rewritten the partition label, and I've run a smartctl test which
> > > reported no faults found. I've also just reduced the speed of the
> > > chipset, which has three settings: good performance, better performance
> > > and turbo. It adopts the turbo setting by default and I've now set it to
> > > "better". It's too early yet to see if that will help.
> > 
> > interesting ...
> > 
> > What init-system? openrc or systemd?
> 
> Openrc.
> 
> > No trace of the actual unmount in any logs?
> 
> Not that I can find, no.
> 
> > Maybe also look/grep for the LABEL of the fs.
> 
> Nope, nor that.
> 
> > Maybe test if using the device-name itself ( /dev/sdb3 ) or the UUID in
> > fstab changes the behavior.
> 
> I'll try reverting to /dev/sdb3 and see if that helps.
> 
> > I use UUIDs here without problems (with systemd).
> 
> The only thing I use UUIDs for here is in mdadm.conf to get the LVs started
> reliably for the main system*. Those live in partitions /dev/sd[ab][5789].
> 
> Three more things: I've had the cover off and checked the seating of the
> SATA cables; while the lid was off I watched the MB LEDs during startup,
> which seemed okay; and today the kernel was upgraded from 4.0.5 to 4.0.9;
> that may help too. (Hm ... too many changes at once.)
> 
> * Now that I think of it, one of the LVs came up as inactive the other day,
> and nothing I could think of would activate it (consulting man mdadm of
> course). In the end I had to reboot. This machine has shown some bizarre
> behaviour over the last few months. Something is definitely wrong; I just
> can't figure out what it is.

The full log for that entire period might be useful.

If a disk is umounted/removed, something needs to be logged somewhere.
Might even be a comment from the scsi-subsystem or the SATA driver.

I usually only grep the log to try to find specific messages.
If I know the time-period something weird happened in, I tend to go through 
the unfiltered log for that period.
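
For reading the unfiltered log for a known period, a small helper along these
lines may save some scrolling. It is only a sketch: it assumes the traditional
syslog timestamp layout ("Oct 26 14:35:00 host ...") in /var/log/messages, and
the function name log_window and the times in the usage line are made up for
illustration.

```shell
# log_window FILE MON DAY FROM TO
# Print every line of FILE logged on MON DAY between FROM and TO (HH:MM:SS,
# inclusive). Assumes classic syslog timestamps: "Oct 26 14:35:00 host ...".
log_window() {
    awk -v mon="$2" -v day="$3" -v from="$4" -v to="$5" '
        # Fixed-width HH:MM:SS strings compare correctly as plain strings.
        $1 == mon && $2 + 0 == day + 0 && $3 >= from && $3 <= to
    ' "$1"
}
```

Usage would be something like
"log_window /var/log/messages Oct 26 14:30:00 15:00:00 | less".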

--
Joost



* Re: [gentoo-user] Mysteriously dismounting partition
  2015-10-27 14:16     ` J. Roeleveld
@ 2015-10-27 16:36       ` Peter Humphrey
  2015-10-27 17:18         ` J. Roeleveld
From: Peter Humphrey @ 2015-10-27 16:36 UTC
  To: gentoo-user

On Tuesday 27 October 2015 15:16:26 J. Roeleveld wrote:

> If a disk is umounted/removed, something needs to be logged somewhere.
> Might even be a comment from the scsi-subsystem or the SATA driver.
> 
> I usually only grep the log to try to find specific messages.
> If I know the time-period something weird happened in, I tend to go through
> the unfiltered log for that period.

I have been scanning dmesg and /var/log/messages by eye and not noticed 
anything. I'll keep doing it though.
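
One way to narrow down where to look might be to record when the mount
actually disappears. A rough sketch, assuming a POSIX shell and /proc/mounts;
the function names and the 5-second default interval are arbitrary choices for
illustration, not something already in use on this box.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT appears as a mount point in MOUNTS_FILE
# (default /proc/mounts, where the mount point is the second field).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# watch_mount MOUNTPOINT [INTERVAL_SECONDS]
# Poll forever and print a timestamped line whenever the state changes.
watch_mount() {
    mp=$1
    interval=${2:-5}
    last=
    while :; do
        if is_mounted "$mp"; then state=mounted; else state=absent; fi
        if [ "$state" != "$last" ]; then
            printf '%s %s is %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$mp" "$state"
            last=$state
        fi
        sleep "$interval"
    done
}
```

Started right after boot as "watch_mount /usr-bits >> /root/usr-bits.log &",
the timestamp of the "absent" line gives a period to compare against dmesg and
/var/log/messages.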

-- 
Rgds
Peter




* Re: [gentoo-user] Mysteriously dismounting partition
  2015-10-27 16:36       ` Peter Humphrey
@ 2015-10-27 17:18         ` J. Roeleveld
  2015-10-28 10:27           ` Peter Humphrey
From: J. Roeleveld @ 2015-10-27 17:18 UTC
  To: gentoo-user

On 27 October 2015 17:36:00 CET, Peter Humphrey <peter@prh.myzen.co.uk> wrote:
>On Tuesday 27 October 2015 15:16:26 J. Roeleveld wrote:
>
>> If a disk is umounted/removed, something needs to be logged somewhere.
>> Might even be a comment from the scsi-subsystem or the SATA driver.
>>
>> I usually only grep the log to try to find specific messages.
>> If I know the time-period something weird happened in, I tend to go
>> through the unfiltered log for that period.
>
>I have been scanning dmesg and /var/log/messages by eye and not noticed
>anything. I'll keep doing it though.

What does your fstab look like?

And maybe some more info, like which kernel version. Mount version.
And maybe check for some weird crontab entry somewhere?

You could also rule out the use of umount by replacing it with a wrapper script that logs every call with as much info as is possible?
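
A minimal sketch of such a wrapper, in case it helps. Everything specific in it
is an assumption: that the real binary is first moved aside (for example
"mv /bin/umount /bin/umount.real"), that this file is then installed as
/bin/umount, and that /var/log/umount-calls.log is an acceptable place to log.
The function name log_umount_call and the UMOUNT_LOG and REAL_UMOUNT variables
are invented for the example.

```shell
#!/bin/sh
# Hypothetical umount wrapper: log who called umount and how, then run the
# real binary. The paths below are assumptions, not a standard layout.

log_umount_call() {
    log=${UMOUNT_LOG:-/var/log/umount-calls.log}
    parent_cmd=unknown
    if [ -r "/proc/$PPID/cmdline" ]; then
        # cmdline is NUL-separated; turn the NULs into spaces for the log.
        parent_cmd=$(tr '\000' ' ' < "/proc/$PPID/cmdline")
    fi
    printf '%s pid=%s ppid=%s uid=%s parent=[%s] args=[%s]\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$$" "$PPID" "$(id -u)" \
        "$parent_cmd" "$*" >> "$log"
}

# Act as the wrapper only when invoked under the name "umount"; under any
# other name the file just defines the function, so it can be tried safely.
case "${0##*/}" in
    umount)
        log_umount_call "$@"
        exec "${REAL_UMOUNT:-/bin/umount.real}" "$@"
        ;;
esac
```

This only catches callers that run the umount binary; anything that unmounts
through the umount(2) system call directly would not show up, and reinstalling
util-linux would put the original binary back in place of the wrapper.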

--
Joost



* Re: [gentoo-user] Mysteriously dismounting partition
  2015-10-27 17:18         ` J. Roeleveld
@ 2015-10-28 10:27           ` Peter Humphrey
From: Peter Humphrey @ 2015-10-28 10:27 UTC
  To: gentoo-user

On Tuesday 27 October 2015 18:18:18 J. Roeleveld wrote:
> On 27 October 2015 17:36:00 CET, Peter Humphrey <peter@prh.myzen.co.uk> wrote:
> >On Tuesday 27 October 2015 15:16:26 J. Roeleveld wrote:
> >> If a disk is umounted/removed, something needs to be logged somewhere.
> >> Might even be a comment from the scsi-subsystem or the SATA driver.
> >> 
> >> I usually only grep the log to try to find specific messages.
> >> If I know the time-period something weird happened in, I tend to go
> >> through the unfiltered log for that period.
> >
> > I have been scanning dmesg and /var/log/messages by eye and not noticed
> > anything. I'll keep doing it though.
> 
> What does your fstab look like?

I gave the two relevant lines in my first message, but here's the whole thing:

LABEL=RescueSys     /                               ext4 relatime        1 1
LABEL=RescUsrBits   /usr-bits                       ext4 relatime        1 2
/dev/md1            /boot                           ext2 relatime,noauto 1 2
/dev/md5            /mnt/main             ext4   relatime,noauto,dev,exec 0 2
/dev/vg7/local      /mnt/main/usr/local             ext4 relatime,noauto 0 3
/dev/vg7/home       /mnt/main/home                  ext4 relatime,noauto 0 3
/dev/vg7/common     /mnt/main/home/prh/common       ext4 relatime,noauto 0 4
/dev/vg7/virt       /mnt/main/home/prh/.VirtualBox  ext4 relatime,noauto 0 4
/dev/vg7/boinc      /mnt/main/home/prh/boinc        ext4 relatime,noauto 0 4
/dev/vg7/var        /mnt/main/var                   ext4 relatime,noauto 0 2
/dev/vg7/portage    /mnt/main/usr/portage           ext4 relatime,noauto 0 2
/dev/vg7/packages   /mnt/main/usr/portage/packages  ext4 relatime,noauto 0 3
/dev/vg7/distfiles  /mnt/main/usr/portage/distfiles ext4 relatime,noauto 0 3
/dev/vg7/opt        /mnt/main/opt                   ext4 relatime,noauto 0 3
/dev/vg7/atom       /mnt/main/mnt/atom              ext4 relatime,noauto 0 3
/dev/vg7/atomresc   /mnt/main/mnt/atomresc          ext4 relatime,noauto 0 3
/dev/vg7/tpad       /mnt/main/mnt/tpad              ext4 relatime,noauto 0 3
/dev/vg7/vartmp     /mnt/main/var/tmp               ext4 relatime,noauto 0 3
/dev/sdc5           /mnt/sdc              ext4    relatime,noauto,user 0 0
/dev/sdd5           /mnt/sdd              ext4   relatime,noauto,user 0 0
/dev/sr0            /mnt/dvd                      iso9660 noauto,user    0 0
/dev/sda2           none                            swap sw              0 0
/dev/sdb2           none                            swap sw              0 0
tmpfs               /tmp                  tmpfs  nodev,nosuid,size=6G 0 0
proc                /proc                           proc defaults        0 0
shm                 /dev/shm              tmpfs  nodev,nosuid,noexec 0 0
/dev/md8            /mnt/qt5                        ext4 noauto,relatime 0 0
/dev/vg9/home       /mnt/qt5/home                   ext4 noauto,relatime 0 0
/dev/vg9/var        /mnt/qt5/var                    ext4 noauto,relatime 0 0
/dev/vg9/vartmp     /mnt/qt5/var/tmp   ext4  noauto,relatime,nosuid,nodev 0 0
/dev/vg9/local      /mnt/qt5/usr/local              ext4 noauto,relatime 0 0
/dev/vg9/portage    /mnt/qt5/usr/portage            ext4 noauto,relatime 0 0
/dev/vg9/packages   /mnt/qt5/usr/portage/packages   ext4 noauto,relatime 0 0
/dev/vg9/distfiles  /mnt/qt5/usr/portage/distfiles  ext4 noauto,relatime 0 0

> And maybe some more info, like which kernel version. Mount version.
> And maybe check for some weird crontab entry somewhere?

No cron installed. Kernel was 4.0.5 at the time of writing, upgraded to 4.0.9 
yesterday. Mount is in sys-apps/util-linux-2.26.2 - was 2.25.2-r2 until 27 
September but the problem occurred with both versions.

> You could also rule out the use of umount by replacing it with a wrapper
> script that logs every call with as much info as is possible?

Hm. That may be above my bash grade.

I'm inclined to suspect the kernel. No real evidence, just that I've booted 
the rescue system twice since installing the new kernel and everything worked 
as it should.

-- 
Rgds
Peter



