* [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD?
@ 2014-06-19 2:36 microcai
2014-06-19 8:40 ` Amankwah
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: microcai @ 2014-06-19 2:36 UTC (permalink / raw
To: gentoo-user
rsync does a bunch of 4k random IO when updating the portage tree,
which will kill SSDs through a much higher Write Amplification Factor.
I have a two-year-old SSD that reports a Write Amplification Factor
of 26. I think the only reason is that I put the portage tree on this SSD
to speed it up.
What is the suggested way to reduce the Write Amplification of a portage sync?
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-19 2:36 [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? microcai
@ 2014-06-19 8:40 ` Amankwah
2014-06-19 11:44 ` Neil Bothwick
2014-06-19 22:03 ` Full Analyst
2014-06-20 17:48 ` [gentoo-user] " Kai Krakow
2 siblings, 1 reply; 17+ messages in thread
From: Amankwah @ 2014-06-19 8:40 UTC (permalink / raw
To: gentoo-user
On Thu, Jun 19, 2014 at 10:36:59AM +0800, microcai wrote:
> rsync does a bunch of 4k random IO when updating the portage tree,
> which will kill SSDs through a much higher Write Amplification Factor.
>
>
> I have a two-year-old SSD that reports a Write Amplification Factor
> of 26. I think the only reason is that I put the portage tree on this SSD
> to speed it up.
>
> What is the suggested way to reduce the Write Amplification of a portage sync?
>
Maybe the only solution is to move the portage tree to an HDD?
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-19 8:40 ` Amankwah
@ 2014-06-19 11:44 ` Neil Bothwick
2014-06-19 11:56 ` Rich Freeman
0 siblings, 1 reply; 17+ messages in thread
From: Neil Bothwick @ 2014-06-19 11:44 UTC (permalink / raw
To: gentoo-user
On Thu, 19 Jun 2014 16:40:08 +0800, Amankwah wrote:
> Maybe the only solution is to move the portage tree to an HDD?
Or tmpfs if you rarely reboot or have a fast enough connection to your
preferred portage mirror.
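For illustration, a tmpfs mount for the tree could be as simple as the fstab
line below (the 1G size is only an assumption, tune it to your RAM), refilled
with a normal sync after each reboot:
# hypothetical /etc/fstab entry
tmpfs   /usr/portage   tmpfs   size=1G,noatime   0 0
# the tmpfs starts out empty after a reboot, so refill it with
emerge --sync       # or emerge-webrsync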
--
Neil Bothwick
The voices in my head may not be real, but they have some good ideas!
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-19 11:44 ` Neil Bothwick
@ 2014-06-19 11:56 ` Rich Freeman
2014-06-19 12:16 ` Kerin Millar
0 siblings, 1 reply; 17+ messages in thread
From: Rich Freeman @ 2014-06-19 11:56 UTC (permalink / raw
To: gentoo-user
On Thu, Jun 19, 2014 at 7:44 AM, Neil Bothwick <neil@digimed.co.uk> wrote:
> On Thu, 19 Jun 2014 16:40:08 +0800, Amankwah wrote:
>
>> Maybe the only solution is to move the portage tree to an HDD?
>
> Or tmpfs if you rarely reboot or have a fast enough connection to your
> preferred portage mirror.
There has been a proposal to move it to squashfs, which might
potentially also help.
The portage tree is 700M uncompressed, which seems like a bit much to
just leave in RAM all the time.
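As a rough sketch of what that could look like (the image path and compression
options are assumptions, not any official tooling):
# build a compressed, read-only image of the tree and loop-mount it
mksquashfs /usr/portage /var/cache/portage.sqfs -comp xz -noappend
mount -t squashfs -o loop,ro /var/cache/portage.sqfs /usr/portage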
Mine is on an SSD, but the SMART attributes aren't well-documented so
I have no idea what the erase count or WAF is - just the LBA written
count.
Rich
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-19 11:56 ` Rich Freeman
@ 2014-06-19 12:16 ` Kerin Millar
0 siblings, 0 replies; 17+ messages in thread
From: Kerin Millar @ 2014-06-19 12:16 UTC (permalink / raw
To: gentoo-user
On 19/06/2014 12:56, Rich Freeman wrote:
> On Thu, Jun 19, 2014 at 7:44 AM, Neil Bothwick <neil@digimed.co.uk> wrote:
>> On Thu, 19 Jun 2014 16:40:08 +0800, Amankwah wrote:
>>
>>> Maybe the only solution is to move the portage tree to an HDD?
>>
>> Or tmpfs if you rarely reboot or have a fast enough connection to your
>> preferred portage mirror.
>
> There has been a proposal to move it to squashfs, which might
> potentially also help.
>
> The portage tree is 700M uncompressed, which seems like a bit much to
> just leave in RAM all the time.
The tree will not necessarily be left in RAM all of the time. Pages
allocated by tmpfs reside in pagecache. Given sufficient pressure, they
may be migrated to swap. Even then, zswap [1] could be used so as to
reduce write amplification. I like Neil's suggestion, assuming that the
need to reboot is infrequent.
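As a pointer, zswap can be switched on at boot or at runtime; a minimal sketch
(the compressor and pool size are only example values):
# on the kernel command line:
#   zswap.enabled=1 zswap.compressor=lzo zswap.max_pool_percent=20
# or at runtime through sysfs:
echo 1  > /sys/module/zswap/parameters/enabled
echo 20 > /sys/module/zswap/parameters/max_pool_percent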
--Kerin
[1] https://www.kernel.org/doc/Documentation/vm/zswap.txt
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-19 2:36 [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? microcai
2014-06-19 8:40 ` Amankwah
@ 2014-06-19 22:03 ` Full Analyst
2014-06-20 17:48 ` [gentoo-user] " Kai Krakow
2 siblings, 0 replies; 17+ messages in thread
From: Full Analyst @ 2014-06-19 22:03 UTC (permalink / raw
To: gentoo-user
Hello microcai,
I use tmpfs heavily as I have an SSD.
Here is some information that may help you:
tank woody # mount -v | grep tmpfs
devtmpfs on /dev type devtmpfs (rw,relatime,size=8050440k,nr_inodes=2012610,mode=755)
tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,size=1610408k,mode=755)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime)
cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755)
tmpfs on /var/tmp/portage type tmpfs (rw,size=12G)
tmpfs on /usr/portage type tmpfs (rw,size=12G)
tmpfs on /usr/src type tmpfs (rw,size=12G)
tmpfs on /tmp type tmpfs (rw,size=12G)
tmpfs on /home/woody/.mutt/cache type tmpfs (rw)
tank woody # cat /etc/fstab
# /etc/fstab: static file system information.
#
# noatime turns off atimes for increased performance (atimes normally aren't
# needed); notail increases performance of ReiserFS (at the expense of storage
# efficiency). It's safe to drop the noatime options if you want and to
# switch between notail / tail freely.
#
# The root filesystem should have a pass number of either 0 or 1.
# All other filesystems should have a pass number of 0 or greater than 1.
#
# See the manpage fstab(5) for more information.
#
# <fs> <mountpoint> <type> <opts> <dump/pass>
# NOTE: If your BOOT partition is ReiserFS, add the notail option to opts.
/dev/sda1 / ext4 noatime,discard,user_xattr 0 1
/dev/sda3 /home ext4 noatime,discard,user_xattr 0 1
#/dev/sda6 /home ext3 noatime 0 1
#/dev/sda2 none swap sw 0 0
tmpfs /var/tmp/portage tmpfs size=12G 0 0
tmpfs /usr/portage tmpfs size=12G 0 0
tmpfs /usr/src tmpfs size=12G 0 0
tmpfs /tmp tmpfs size=12G 0 0
tmpfs /home/woody/.mutt/cache/ tmpfs size=12G 0 0
#/dev/cdrom /mnt/cdrom auto noauto,ro 0 0
#/dev/fd0 /mnt/floppy auto noauto 0 0
tank woody #
For the /usr/portage directory, if you reboot, all you have to do is run
emerge-webrsync, or do as I do:
tank woody # l /usr/ | grep portage
924221 4 drwxr-xr-x 170 root root 4096 1 mars 02:51 portage_tmpfs
6771 0 drwxr-xr-x 171 root root 3500 11 juin 20:40 portage
tank woody #
The /usr/portage_tmpfs directory is a backup of /usr/portage; this saves me
from retrieving the whole portage tree from Gentoo's servers again.
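After a reboot, the tmpfs could be refilled from that copy with something
along these lines (just a sketch of one way to do it):
# repopulate the tmpfs from the on-disk backup after boot
rsync -a --delete /usr/portage_tmpfs/ /usr/portage/
# and refresh the backup again after a sync
rsync -a --delete /usr/portage/ /usr/portage_tmpfs/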
Please note that I also use www-misc/profile-sync-daemon in order to
store my browsers' cache in /tmp.
I rarely shut down my computer :)
Have fun
On 19/06/2014 04:36, microcai wrote:
> rsync does a bunch of 4k random IO when updating the portage tree,
> which will kill SSDs through a much higher Write Amplification Factor.
>
>
> I have a two-year-old SSD that reports a Write Amplification Factor
> of 26. I think the only reason is that I put the portage tree on this SSD
> to speed it up.
>
> What is the suggested way to reduce the Write Amplification of a portage sync?
>
* [gentoo-user] Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-19 2:36 [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? microcai
2014-06-19 8:40 ` Amankwah
2014-06-19 22:03 ` Full Analyst
@ 2014-06-20 17:48 ` Kai Krakow
2014-06-21 4:54 ` microcai
2014-06-21 14:27 ` Peter Humphrey
2 siblings, 2 replies; 17+ messages in thread
From: Kai Krakow @ 2014-06-20 17:48 UTC (permalink / raw
To: gentoo-user
microcai <microcai@fedoraproject.org> schrieb:
> rsync does a bunch of 4k random IO when updating the portage tree,
> which will kill SSDs through a much higher Write Amplification Factor.
>
>
> I have a two-year-old SSD that reports a Write Amplification Factor
> of 26. I think the only reason is that I put the portage tree on this SSD
> to speed it up.
Use a file system that turns random writes into sequential writes, like the
pretty newcomer f2fs. You could try using it for your rootfs but currently I
suggest just creating a separate partition for it and either mount it as
/usr/portage or symlink that dir into this directory (that way you could use
it for other purposes, too, that generate random short writes, like log
files).
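A bare-bones sketch of that setup, with the device name as a placeholder:
mkfs.f2fs -l portage /dev/sdXN        # sdXN = the partition set aside for it
# /etc/fstab
/dev/sdXN   /usr/portage   f2fs   noatime   0 0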
Then, I'd recommend changing your scheduler to deadline, bump up the io
queue depth to a much higher value (echo -n 2048 >
/sys/block/sdX/queue/nr_requests) and then change the dirty io flusher to
not run as early as it usually would (change vm.dirty_writeback_centisecs to
1500 and vm.dirty_expire_centisecs to 3000). That way the vfs layer has a
chance to better coalesce multi-block writes into one batch write, and f2fs
will take care of doing it in sequential order.
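Spelled out, that tuning could look roughly like this (sdX is a placeholder,
the values are the ones suggested above):
echo deadline > /sys/block/sdX/queue/scheduler
echo -n 2048 > /sys/block/sdX/queue/nr_requests
sysctl -w vm.dirty_writeback_centisecs=1500
sysctl -w vm.dirty_expire_centisecs=3000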
I'd also suggest not to use the discard mount options and instead create a
cronjob that runs fstrim on the SSD devices. But YMMV.
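For example, a weekly cron script along these lines (path and mount points are
assumptions):
#!/bin/sh
# /etc/cron.weekly/fstrim - trim all SSD-backed filesystems
fstrim -v /
fstrim -v /home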
As a safety measure, only ever partition and use only 70-80% of your SSD so
it can reliably do its wear-leveling. It will improve lifetime and keep the
performance up even with filled filesystems.
--
Replies to list only preferred.
* Re: [gentoo-user] Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-20 17:48 ` [gentoo-user] " Kai Krakow
@ 2014-06-21 4:54 ` microcai
2014-06-21 14:27 ` Peter Humphrey
1 sibling, 0 replies; 17+ messages in thread
From: microcai @ 2014-06-21 4:54 UTC (permalink / raw
To: gentoo-user
2014-06-21 1:48 GMT+08:00 Kai Krakow <hurikhan77@gmail.com>:
> microcai <microcai@fedoraproject.org> schrieb:
>
>> rsync does a bunch of 4k random IO when updating the portage tree,
>> which will kill SSDs through a much higher Write Amplification Factor.
>>
>>
>> I have a two-year-old SSD that reports a Write Amplification Factor
>> of 26. I think the only reason is that I put the portage tree on this SSD
>> to speed it up.
>
> Use a file system that turns random writes into sequential writes, like the
> pretty newcomer f2fs. You could try using it for your rootfs but currently I
> suggest just creating a separate partition for it and either mount it as
> /usr/portage or symlink that dir into this directory (that way you could use
> it for other purposes, too, that generate random short writes, like log
> files).
>
> Then, I'd recommend changing your scheduler to deadline, bump up the io
> queue depth to a much higher value (echo -n 2048 >
> /sys/block/sdX/queue/nr_requests) and then change the dirty io flusher to
> not run as early as it usually would (change vm.dirty_writeback_centisecs to
> 1500 and vm.dirty_expire_centisecs to 3000). That way the vfs layer has a
> chance to better coalesce multi-block writes into one batch write, and f2fs
> will take care of doing it in sequential order.
>
> I'd also suggest not to use the discard mount options and instead create a
> cronjob that runs fstrim on the SSD devices. But YMMV.
>
> As a safety measure, only ever partition and use only 70-80% of your SSD so
> it can reliably do its wear-leveling. It will improve lifetime and keep the
> performance up even with filled filesystems.
>
> --
Many thanks to all of you!
I've now put my portage tree on an F2FS partition.
* Re: [gentoo-user] Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-20 17:48 ` [gentoo-user] " Kai Krakow
2014-06-21 4:54 ` microcai
@ 2014-06-21 14:27 ` Peter Humphrey
2014-06-21 14:54 ` Rich Freeman
2014-06-21 19:24 ` Kai Krakow
1 sibling, 2 replies; 17+ messages in thread
From: Peter Humphrey @ 2014-06-21 14:27 UTC (permalink / raw
To: gentoo-user
On Friday 20 June 2014 19:48:14 Kai Krakow wrote:
> microcai <microcai@fedoraproject.org> schrieb:
> > rsync does a bunch of 4k random IO when updating the portage tree,
> > which will kill SSDs through a much higher Write Amplification Factor.
> >
> > I have a two-year-old SSD that reports a Write Amplification Factor
> > of 26. I think the only reason is that I put the portage tree on this SSD
> > to speed it up.
>
> Use a file system that turns random writes into sequential writes, like the
> pretty newcomer f2fs. You could try using it for your rootfs but currently I
> suggest just creating a separate partition for it and either mount it as
> /usr/portage or symlink that dir into this directory (that way you could
> use it for other purposes, too, that generate random short writes, like log
> files).
Well, there's a surprise! Thanks for mentioning f2fs. I've just converted my
Atom box's seven partitions to it, recompiled the kernel to include it,
changed the fstab entries and rebooted. It just worked.
--->8
> I'd also suggest not to use the discard mount options and instead create a
> cronjob that runs fstrim on the SSD devices. But YMMV.
I found that fstrim can't work on f2fs file systems. I don't know whether
discard works yet.
Thanks again.
--
Regards
Peter
* Re: [gentoo-user] Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-21 14:27 ` Peter Humphrey
@ 2014-06-21 14:54 ` Rich Freeman
2014-06-21 19:19 ` [gentoo-user] " Kai Krakow
2014-06-21 19:24 ` Kai Krakow
1 sibling, 1 reply; 17+ messages in thread
From: Rich Freeman @ 2014-06-21 14:54 UTC (permalink / raw
To: gentoo-user
On Sat, Jun 21, 2014 at 10:27 AM, Peter Humphrey <peter@prh.myzen.co.uk> wrote:
>
> I found that fstrim can't work on f2fs file systems. I don't know whether
> discard works yet.
Fstrim is to be preferred over discard in general. However, I suspect
neither is needed for something like f2fs. Being log-based it doesn't
really overwrite data in place. I suspect that it waits until an
entire region of the disk is unused and then it TRIMs the whole
region.
However, I haven't actually used it and only know the little I've read
about it. That is the principle of a log-based filesystem.
I'm running btrfs on my SSD root, which is supposed to be decent for
flash, but the SMART attributes of my drive aren't well-documented so
I couldn't tell you what the erase count is up to.
Rich
* [gentoo-user] Re: Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-21 14:54 ` Rich Freeman
@ 2014-06-21 19:19 ` Kai Krakow
0 siblings, 0 replies; 17+ messages in thread
From: Kai Krakow @ 2014-06-21 19:19 UTC (permalink / raw
To: gentoo-user
Rich Freeman <rich0@gentoo.org> schrieb:
> On Sat, Jun 21, 2014 at 10:27 AM, Peter Humphrey <peter@prh.myzen.co.uk>
> wrote:
>>
>> I found that fstrim can't work on f2fs file systems. I don't know whether
>> discard works yet.
>
> Fstrim is to be preferred over discard in general. However, I suspect
> neither is needed for something like f2fs. Being log-based it doesn't
> really overwrite data in place. I suspect that it waits until an
> entire region of the disk is unused and then it TRIMs the whole
> region.
F2fs prefers to fill an entire erase block before touching the next. It also
tries to coalesce small writes into 16k blocks before submitting them to
disk. And according to the docs it supports trim/discard internally.
> However, I haven't actually used it and only know the little I've read
> about it. That is the principle of a log-based filesystem.
There's an article at LWN [1], and in the comments you can find some
important information about the technical details.
Posted Oct 11, 2012 21:11 UTC (Thu) by arnd:
| * Wear leveling usually works by having a pool of available erase blocks
| in the drive. When you write to a new location, the drive takes one block
| out of that pool and writes the data there. When the drive thinks you
| are done writing to one block, it cleans up any partially written data
| and puts a different block back into the pool.
| * f2fs tries to group writes into larger operations of at least page size
| (16KB or more) to be efficient, current FTLs are horribly bad at 4KB
| page size writes. It also tries to fill erase blocks (multiples of 2MB)
| in the order that the devices can handle.
| * logfs actually works on block devices but hasn't been actively worked on
| over the last few years. f2fs also promises better performance by using
| only 6 erase blocks concurrently rather than 12 in the case of logfs. A
| lot of the underlying principles are the same though.
| * The "industry" is moving away from raw flash interfaces towards eMMC and
| related technologies (UFS, SD, ...). We are not going back to raw flash
| any time soon, which is unfortunate for a number of reasons but also has
| a few significant advantages. Having the FTL take care of bad block
| management and wear leveling is one such advantage, at least if they get
| it right.
According to wikipedia [2], some more interesting features are on the way,
like compression and data deduplication to lower the impact of writes.
[1]: http://lwn.net/Articles/518988/
[2]: http://en.wikipedia.org/wiki/F2FS
--
Replies to list only preferred.
* [gentoo-user] Re: Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-21 14:27 ` Peter Humphrey
2014-06-21 14:54 ` Rich Freeman
@ 2014-06-21 19:24 ` Kai Krakow
2014-06-22 1:40 ` Rich Freeman
1 sibling, 1 reply; 17+ messages in thread
From: Kai Krakow @ 2014-06-21 19:24 UTC (permalink / raw
To: gentoo-user
Peter Humphrey <peter@prh.myzen.co.uk> schrieb:
> On Friday 20 June 2014 19:48:14 Kai Krakow wrote:
>> microcai <microcai@fedoraproject.org> schrieb:
>> > rsync does a bunch of 4k random IO when updating the portage tree,
>> > which will kill SSDs through a much higher Write Amplification Factor.
>> >
>> > I have a two-year-old SSD that reports a Write Amplification Factor
>> > of 26. I think the only reason is that I put the portage tree on this SSD
>> > to speed it up.
>>
>> Use a file system that turns random writes into sequential writes, like
>> the pretty newcomer f2fs. You could try using it for your rootfs but
>> currently I suggest just creating a separate partition for it and either
>> mount it as /usr/portage or symlink that dir into this directory (that
>> way you could use it for other purposes, too, that generate random short
>> writes, like log files).
>
> Well, there's a surprise! Thanks for mentioning f2fs. I've just converted
> my Atom box's seven partitions to it, recompiled the kernel to include it,
> changed the fstab entries and rebooted. It just worked.
It's said to be twice as fast with some workloads (especially write
workloads). Can you confirm that? I haven't tried it that much yet - usually I
use it for pendrives only. I have no experience using it for a rootfs.
And while we are at it, I'd also like to mention bcache. Tho, conversion is
not straight forward. However, I'm going to try that soon for my spinning
rust btrfs.
--
Replies to list only preferred.
* Re: [gentoo-user] Re: Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-21 19:24 ` Kai Krakow
@ 2014-06-22 1:40 ` Rich Freeman
2014-06-22 11:44 ` [gentoo-user] " Kai Krakow
0 siblings, 1 reply; 17+ messages in thread
From: Rich Freeman @ 2014-06-22 1:40 UTC (permalink / raw
To: gentoo-user
On Sat, Jun 21, 2014 at 3:24 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> And while we are at it, I'd also like to mention bcache. Tho, conversion is
> not straight forward. However, I'm going to try that soon for my spinning
> rust btrfs.
I contemplated that, but I'd really like to see btrfs support
something more native. Bcache is way too low-level for me and strikes
me as inefficient as a result. Plus, since it sits UNDER btrfs you'd
probably lose all the fancy volume management features.
ZFS has ssd caching as part of the actual filesystem, and that seems
MUCH cleaner.
Rich
* [gentoo-user] Re: Re: Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-22 1:40 ` Rich Freeman
@ 2014-06-22 11:44 ` Kai Krakow
2014-06-22 13:44 ` Rich Freeman
0 siblings, 1 reply; 17+ messages in thread
From: Kai Krakow @ 2014-06-22 11:44 UTC (permalink / raw
To: gentoo-user
Rich Freeman <rich0@gentoo.org> schrieb:
> On Sat, Jun 21, 2014 at 3:24 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
>> And while we are at it, I'd also like to mention bcache. Tho, conversion
>> is not straight forward. However, I'm going to try that soon for my
>> spinning rust btrfs.
>
> I contemplated that, but I'd really like to see btrfs support
> something more native. Bcache is way too low-level for me and strikes
> me as inefficient as a result. Plus, since it sits UNDER btrfs you'd
> probably lose all the fancy volume management features.
I don't see where you could lose the volume management features. You just
add device on top of the bcache device after you initialized the raw device
with a bcache superblock and attached it. The rest works the same, just that
you use bcacheX instead of sdX devices.
Bcache is a general approach, and it already seems to work very well for that.
There are hot data tracking patches and proposals to support adding
a cache device to the btrfs pool and letting btrfs migrate data back and forth
between the two. That would be native. But it would still lack the advanced
features ZFS implements to make use of such caching devices, with
different strategies for the ZIL, ARC, and L2ARC. That's the gap bcache
tries to bridge.
> ZFS has ssd caching as part of the actual filesystem, and that seems
> MUCH cleaner.
Yes, it is much more mature in that regard. Compared with ZFS, bcache is a
lot like ZIL, while hot data relocation in btrfs would be a lot like L2ARC.
ARC is a special purpose RAM cache separate from the VFS caches which has
special knowledge about ZFS structures to keep performance high. Some
filesystems implement something similar by keeping tree structures
completely in RAM. I think both bcache and hot data tracking take parts of
the work that ARC does for ZFS - note that "hot data tracking" is a generic
VFS interface, while "hot data relocation" is something from btrfs. Both
work together but it is not there yet.
From that point of view, I don't think something like ZIL should be
implemented in btrfs itself but as a generic approach like bcache so every
component in Linux can make use of it. Hot data relocation OTOH is
interesting from another point of view and may become part of future btrfs
as it benefits from knowledge about the filesystem itself, using a generic
interface like "hot data tracking" in VFS - so other components can make use
of that, too.
A ZIL-like cache and hot data relocation could probably solve a lot of
fragmentation issues (especially a ZIL-like cache), so I hope work for that
will get pushed a little more soon.
Having to prepare devices for bcache is kind of a show-stopper because it is
no drop-in component that way. But OTOH I like that approach better than dm-
cache because it protects from using the backing device without going
through the caching layer which could otherwise severely damage your data,
and you get along with fewer devices and don't need to size a meta device
(which probably needs to grow later if you add devices, I don't know).
--
Replies to list only preferred.
* Re: [gentoo-user] Re: Re: Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-22 11:44 ` [gentoo-user] " Kai Krakow
@ 2014-06-22 13:44 ` Rich Freeman
2014-06-24 18:34 ` [gentoo-user] " Kai Krakow
0 siblings, 1 reply; 17+ messages in thread
From: Rich Freeman @ 2014-06-22 13:44 UTC (permalink / raw
To: gentoo-user
On Sun, Jun 22, 2014 at 7:44 AM, Kai Krakow <hurikhan77@gmail.com> wrote:
> I don't see where you could lose the volume management features. You just
> add device on top of the bcache device after you initialized the raw device
> with a bcache superblock and attached it. The rest works the same, just that
> you use bcacheX instead of sdX devices.
Ah, didn't realize you could attach/remove devices to bcache later.
Presumably it handles device failures gracefully, ie exposing them to
the underlying filesystem so that it can properly recover?
>
> From that point of view, I don't think something like ZIL should be
> implemented in btrfs itself but as a generic approach like bcache so every
> component in Linux can make use of it. Hot data relocation OTOH is
> interesting from another point of view and may become part of future btrfs
> as it benefits from knowledge about the filesystem itself, using a generic
> interface like "hot data tracking" in VFS - so other components can make use
> of that, too.
The only problem with doing stuff like this at a lower level (both
write and read caching) is that it isn't RAID-aware. If you write
10GB of data, you use 20GB of cache to do it if you're mirrored,
because the cache doesn't know about mirroring. Offhand I'm not sure
if there are any performance penalties as well around the need for
barriers/etc with the cache not being able to be relied on to do the
right thing in terms of what gets written out - also, the data isn't
redundant while it is on the cache, unless you mirror the cache.
Granted, if you're using it for write intent logging then there isn't
much getting around that.
> Having to prepare devices for bcache is kind of a show-stopper because it is
> no drop-in component that way. But OTOH I like that approach better than dm-
> cache because it protects from using the backing device without going
> through the caching layer which could otherwise severely damage your data,
> and you get along with fewer devices and don't need to size a meta device
> (which probably needs to grow later if you add devices, I don't know).
And this is the main thing keeping me away from it. It is REALLY
painful to migrate to/from. Having it integrated into the filesystem
delivers all the same benefits of not being able to mount it without
the cache present.
Now excuse me while I go fix my btrfs (I tried re-enabling snapper and
it again got the filesystem into a worked-up state after trying to
clean up half a dozen snapshots at the same time - it works fine until
you go and try to write a lot of data to it, then it stops syncing
though you don't necessarily notice until a few hours later when the
write cache exhausts RAM and on reboot your disk reverts back a few
hours). I suspect that if I just treat it gently for a few hours
btrfs will clean up the mess and it will work normally again, but the
damage apparently persists after a reboot if you go heavy in the disk
too quickly...
Rich
* [gentoo-user] Re: Re: Re: Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-22 13:44 ` Rich Freeman
@ 2014-06-24 18:34 ` Kai Krakow
2014-06-24 20:01 ` Rich Freeman
0 siblings, 1 reply; 17+ messages in thread
From: Kai Krakow @ 2014-06-24 18:34 UTC (permalink / raw
To: gentoo-user
Rich Freeman <rich0@gentoo.org> schrieb:
> On Sun, Jun 22, 2014 at 7:44 AM, Kai Krakow <hurikhan77@gmail.com> wrote:
>> I don't see where you could lose the volume management features. You just
>> add device on top of the bcache device after you initialized the raw
>> device with a bcache superblock and attached it. The rest works the same,
>> just that you use bcacheX instead of sdX devices.
>
> Ah, didn't realize you could attach/remove devices to bcache later.
> Presumably it handles device failures gracefully, ie exposing them to
> the underlying filesystem so that it can properly recover?
I'm not sure if multiple partitions can share the same cache device
partition but more or less that's it: Initialize bcache, then attach your
backing devices, then add those bcache devices to your btrfs.
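In rough strokes it would be something like this (device names are
placeholders; treat it as a sketch rather than a tested recipe):
make-bcache -C /dev/sdc1              # set up the SSD partition as a cache set
make-bcache -B /dev/sdb1              # give the backing HDD a bcache superblock
# attach the backing device to the cache set (UUID from bcache-super-show)
echo <cset-uuid> > /sys/block/bcache0/bcache/attach
mkfs.btrfs /dev/bcache0               # or: btrfs device add /dev/bcache0 /mnt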
I don't know how errors are handled, tho. But as with every caching
technique (even in ZFS) your data is likely toast if the cache device dies
in the middle of action. Thus, you should put bcache on LVM RAID if you are
going to use it for write caching (i.e. write-back mode). Read caching
should be okay (write-through mode). Bcache is a little slower than other
flash-cache implementations because it only reports data as written back to
the FS if it reached stable storage (which can be the cache device, tho, if
you are using write-back mode). It was also designed with unexpected reboots
in mind: it will replay transactions from its log on reboot. This
means, you can have unstable data conditions on the raw device which is why
you should never try to use that directly, e.g. from a rescue disk. But
since bcache wraps the partition with its own superblock this mistake should
be impossible.
I'm not sure how gracefully device failures are handled. I suppose in write-
back mode you can get into trouble because it's too late for bcache to tell
the FS that there is a write error when it already confirmed that stable
storage has been hit. Maybe it will just keep the data around so you could
swap devices or will report the error next time when data is written to that
location. It probably interferes with btrfs RAID logic on that matter.
> The only problem with doing stuff like this at a lower level (both
> write and read caching) is that it isn't RAID-aware. If you write
> 10GB of data, you use 20GB of cache to do it if you're mirrored,
> because the cache doesn't know about mirroring.
Yes, it will write double the data to the cache then - but only if btrfs
also did actually read both copies (which it probably does not because it
has checksums and does not need to compare data, and let's just ignore the
case that another process could try to read the same data from the other
raid member later; that case should be optimized out by the OS cache).
Otherwise both caches should work pretty individually with their own set of
data depending on how btrfs uses each device individually. Remember that
btrfs raid is not a block-based raid where block locations would match 1:1
on each device. Btrfs raid can place one mirror of data in two completely
different locations on each member device (which is actually a good thing in
case block errors accumulate in specific locations for a "faulty" model of a
disk). In case of write caching it will of course cache double the data
(because both members will be written to). But I think that's okay for the
same reasons, except it will wear your cache device faster. But in that case
I suggest using individual SSDs for each btrfs member device anyway. It's
not optimal, I know. It could be useful to see some best practices and
pros/cons on that topic (individual cache device per btrfs member vs. bcache
on LVM RAID with bcache partitions on the RAID for all members). I think the
best strategy depends on if you are write-most or read-most.
Thanks for mentioning. Interesting thoughts. ;-)
> Offhand I'm not sure
> if there are any performance penalties as well around the need for
> barriers/etc with the cache not being able to be relied on to do the
> right thing in terms of what gets written out - also, the data isn't
> redundant while it is on the cache, unless you mirror the cache.
This is partially what I outlined above. I think in the case of write-caching,
no barrier pass-through is needed. Bcache will confirm the barriers and
that's all the FS needs to know (because bcache is supervising the FS, all
requests go through the bcache layer, no direct access to the backing
device). Of course, it's then bcache's job to ensure everything gets written
out correctly in the background (whenever it feels to do so). But it can use
its own write-barriers to ensure that for the underlying device - that's
nothing the FS has to care about. Performance should be faster anyway
because, well, you are writing to a faster device - that is what bcache is
all about, isn't it? ;-)
I don't think write-barriers for read caching are needed, at least not from
the point of view of the FS. The caching layer, tho, will use them internally for
its caching structures. Whether that has a bad effect on performance probably
depends on the implementation, but my intuition says: no
performance impact, because putting read data in the cache can be deferred and
then data will be written in the background (write-behind).
> Granted, if you're using it for write intent logging then there isn't
> much getting around that.
Well, sure for bcache. But I think in the case of FS-internal write caching
devices that case could be handled gracefully (the method which you'd
prefer). Since in the internal case the cache has knowledge of the FS's bad
block handling, it can just retry writing the data to another location/disk or
keep it around until the admin has fixed the problem with the backing device.
BTW: SSD firmwares usually suffer from similar problems to those outlined above
because they do writes in the background when they already confirmed
persistence to the OS layer. This is why SSD failures are usually much more
severe compared to HDD failures. Do some research, and you should find tests
about that topic. Especially consumer SSD firmwares have a big problem with
that. So I'm not sure if it really should be bcache's job to fix that
particular problem. You should just ensure good firmware and proper failure
protection at the hardware level if you want to do fancy caching stuff - the
FTL should be able to hide those problems before the whole thing explodes,
then report errors before the point where it can no longer ensure correct
persistence. I suppose that is also the detail where enterprise-grade SSDs
behave differently. HDDs have related issues (SATA vs. enterprise SCSI vs. SAS;
keywords: IO timeouts and bad blocks, and why you should not use consumer
hardware for RAIDs). I think all the same holds true for ZFS.
>> Having to prepare devices for bcache is kind of a show-stopper because it
>> is no drop-in component that way. But OTOH I like that approach better
>> than dm- cache because it protects from using the backing device without
>> going through the caching layer which could otherwise severely damage
>> your data, and you get along with fewer devices and don't need to size a
>> meta device (which probably needs to grow later if you add devices, I
>> don't know).
>
> And this is the main thing keeping me away from it. It is REALLY
> painful to migrate to/from. Having it integrated into the filesystem
> delivers all the same benefits of not being able to mount it without
> the cache present.
The migration pain is what currently keeps me away, too. Otherwise I would
just buy one of those fancy new cheap but still speedy Crucial SSDs and
"just enable" bcache... :-\
> Now excuse me while I go fix my btrfs (I tried re-enabling snapper and
> it again got the filesystem into a worked-up state after trying to
> clean up half a dozen snapshots at the same time - it works fine until
> you go and try to write a lot of data to it, then it stops syncing
> though you don't necessarily notice until a few hours later when the
> write cache exhausts RAM and on reboot your disk reverts back a few
> hours). I suspect that if I just treat it gently for a few hours
> btrfs will clean up the mess and it will work normally again, but the
> damage apparently persists after a reboot if you go heavy in the disk
> too quickly...
You should report that to the btrfs list. You could try to "echo w >
/proc/sysrq-trigger" and look at the blocked processes list in dmesg
afterwards. I'm sure one important btrfs thread is in blocked state then...
--
Replies to list only preferred.
* Re: [gentoo-user] Re: Re: Re: Re: [Gentoo-User] emerge --sync likely to kill SSD?
2014-06-24 18:34 ` [gentoo-user] " Kai Krakow
@ 2014-06-24 20:01 ` Rich Freeman
0 siblings, 0 replies; 17+ messages in thread
From: Rich Freeman @ 2014-06-24 20:01 UTC (permalink / raw
To: gentoo-user
On Tue, Jun 24, 2014 at 2:34 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> I'm not sure if multiple partitions can share the same cache device
> partition but more or less that's it: Initialize bcache, then attach your
> backing devices, then add those bcache devices to your btrfs.
Ah, if you are stuck with one bcache partition per cached device then
that will be fairly painful to manage.
> Yes, it will write double the data to the cache then - but only if btrfs
> also did actually read both copies (which it probably does not because it
> has checksums and does not need to compare data, and let's just ignore the
> case that another process could try to read the same data from the other
> raid member later; that case should be optimized out by the OS cache).
I didn't realize you were proposing read caching only. If you're only
caching reads then obviously that is much safer. I think with btrfs
in raid1 mode with only two devices you can tell it to prefer a
particular device for reading in which case you could just bcache that
drive. It would only read from the other drive if the cache failed.
However, I don't think btrfs lets you manually arrange drives into
array-like structures. It auto-balances everything which is usually a
plus, but if you have 30 disks you can't tell it to treat them as 6x
5-disk RAID5s vs one 30-disk raid5 (I think).
Rich
Thread overview: 17+ messages
2014-06-19 2:36 [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? microcai
2014-06-19 8:40 ` Amankwah
2014-06-19 11:44 ` Neil Bothwick
2014-06-19 11:56 ` Rich Freeman
2014-06-19 12:16 ` Kerin Millar
2014-06-19 22:03 ` Full Analyst
2014-06-20 17:48 ` [gentoo-user] " Kai Krakow
2014-06-21 4:54 ` microcai
2014-06-21 14:27 ` Peter Humphrey
2014-06-21 14:54 ` Rich Freeman
2014-06-21 19:19 ` [gentoo-user] " Kai Krakow
2014-06-21 19:24 ` Kai Krakow
2014-06-22 1:40 ` Rich Freeman
2014-06-22 11:44 ` [gentoo-user] " Kai Krakow
2014-06-22 13:44 ` Rich Freeman
2014-06-24 18:34 ` [gentoo-user] " Kai Krakow
2014-06-24 20:01 ` Rich Freeman