* [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? @ 2014-06-19 2:36 microcai 2014-06-19 8:40 ` Amankwah ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: microcai @ 2014-06-19 2:36 UTC (permalink / raw To: gentoo-user rsync does a bunch of 4k random IO when updating the portage tree, which will wear out SSDs with a much higher Write Amplification Factor. I have a two-year-old SSD that reports a Write Amplification Factor of 26. I think the only reason is that I put the portage tree on this SSD to speed it up. What is the suggested way to reduce the write amplification of a portage sync? ^ permalink raw reply [flat|nested] 17+ messages in thread
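For readers who want to check their own drive, the write amplification factor can be estimated from SMART counters. A rough sketch follows; the attribute names are vendor-specific assumptions (some drives expose `Total_LBAs_Written` and a NAND-writes counter, many do not), so check `smartctl -A` output for your model first:

```shell
# Estimate the write amplification factor (WAF) from SMART counters.
# Attribute names below are assumptions and vary per vendor; convert
# both counters to the same unit first if your drive reports them in
# different units (LBAs vs. 32MiB chunks, etc.).
host=$(smartctl -A /dev/sda | awk '/Total_LBAs_Written/ {print $10}')
nand=$(smartctl -A /dev/sda | awk '/NAND_Writes/ {print $10}')
# WAF = data actually written to flash / data the host asked to write
awk -v n="$nand" -v h="$host" 'BEGIN { printf "WAF: %.1f\n", n / h }'
```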
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-19 2:36 [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? microcai @ 2014-06-19 8:40 ` Amankwah 2014-06-19 11:44 ` Neil Bothwick 2014-06-19 22:03 ` Full Analyst 2014-06-20 17:48 ` [gentoo-user] " Kai Krakow 2 siblings, 1 reply; 17+ messages in thread From: Amankwah @ 2014-06-19 8:40 UTC (permalink / raw To: gentoo-user On Thu, Jun 19, 2014 at 10:36:59AM +0800, microcai wrote: > rsync is doing bunch of 4k ramdon IO when updateing portage tree, > that will kill SSDs with much higher Write Amplification Factror. > > > I have a 2year old SSDs that have reported Write Amplification Factor > of 26. I think the only reason is that I put portage tree on this SSD > to speed it up. > > what is the suggest way to reduce Write Amplification of a portage sync ? > Maybe the only solution is to move the portage tree to an HDD? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-19 8:40 ` Amankwah @ 2014-06-19 11:44 ` Neil Bothwick 2014-06-19 11:56 ` Rich Freeman 0 siblings, 1 reply; 17+ messages in thread From: Neil Bothwick @ 2014-06-19 11:44 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 319 bytes --] On Thu, 19 Jun 2014 16:40:08 +0800, Amankwah wrote: > Maybe the only solution is that move the portage tree to HDD?? Or tmpfs if you rarely reboot or have a fast enough connection to your preferred portage mirror. -- Neil Bothwick The voices in my head may not be real, but they have some good ideas! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-19 11:44 ` Neil Bothwick @ 2014-06-19 11:56 ` Rich Freeman 2014-06-19 12:16 ` Kerin Millar 0 siblings, 1 reply; 17+ messages in thread From: Rich Freeman @ 2014-06-19 11:56 UTC (permalink / raw To: gentoo-user On Thu, Jun 19, 2014 at 7:44 AM, Neil Bothwick <neil@digimed.co.uk> wrote: > On Thu, 19 Jun 2014 16:40:08 +0800, Amankwah wrote: > >> Maybe the only solution is that move the portage tree to HDD?? > > Or tmpfs if you rarely reboot or have a fast enough connection to your > preferred portage mirror. There has been a proposal to move it to squashfs, which might potentially also help. The portage tree is 700M uncompressed, which seems like a bit much to just leave in RAM all the time. Mine is on an SSD, but the SMART attributes aren't well-documented so I have no idea what the erase count or WAF is - just the LBA written count. Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
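The squashfs proposal Rich mentions can be sketched as follows. This is not an official Gentoo mechanism, just an illustration of the idea; paths and compression options are illustrative, and `squashfs-tools` plus kernel squashfs support are assumed:

```shell
# Pack the portage tree into a compressed, read-only squashfs image.
# A 700M tree typically shrinks to a fraction of that with xz.
mksquashfs /usr/portage /var/cache/portage.sqfs -comp xz -noappend

# Loop-mount the image read-only where portage expects the tree;
# reads decompress on the fly, and no small-file writes hit the SSD.
mount -t squashfs -o loop,ro /var/cache/portage.sqfs /usr/portage
```

After a sync, the image has to be rebuilt and remounted, which is the main trade-off of this approach.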
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-19 11:56 ` Rich Freeman @ 2014-06-19 12:16 ` Kerin Millar 0 siblings, 0 replies; 17+ messages in thread From: Kerin Millar @ 2014-06-19 12:16 UTC (permalink / raw To: gentoo-user On 19/06/2014 12:56, Rich Freeman wrote: > On Thu, Jun 19, 2014 at 7:44 AM, Neil Bothwick <neil@digimed.co.uk> wrote: >> On Thu, 19 Jun 2014 16:40:08 +0800, Amankwah wrote: >> >>> Maybe the only solution is that move the portage tree to HDD?? >> >> Or tmpfs if you rarely reboot or have a fast enough connection to your >> preferred portage mirror. > > There has been a proposal to move it to squashfs, which might > potentially also help. > > The portage tree is 700M uncompressed, which seems like a bit much to > just leave in RAM all the time. The tree will not necessarily be left in RAM all of the time. Pages allocated by tmpfs reside in pagecache. Given sufficient pressure, they may be migrated to swap. Even then, zswap [1] could be used so as to reduce write amplification. I like Neil's suggestion, assuming that the need to reboot is infrequent. --Kerin [1] https://www.kernel.org/doc/Documentation/vm/zswap.txt ^ permalink raw reply [flat|nested] 17+ messages in thread
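The tmpfs-plus-zswap combination Kerin describes can be set up roughly like this; the size cap and paths are illustrative:

```shell
# /etc/fstab entry for keeping the portage tree in RAM. The size is
# only an upper bound: tmpfs pages are allocated lazily and, under
# memory pressure, can be migrated to swap as described above.
#
#   tmpfs  /usr/portage  tmpfs  size=2G,mode=755  0 0

# Enable zswap so pages evicted from tmpfs are compressed in RAM
# before they hit the swap device (kernel 3.11+; can also be enabled
# at boot with zswap.enabled=1 on the kernel command line).
echo 1 > /sys/module/zswap/parameters/enabled
```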
* Re: [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-19 2:36 [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? microcai 2014-06-19 8:40 ` Amankwah @ 2014-06-19 22:03 ` Full Analyst 2014-06-20 17:48 ` [gentoo-user] " Kai Krakow 2 siblings, 0 replies; 17+ messages in thread From: Full Analyst @ 2014-06-19 22:03 UTC (permalink / raw To: gentoo-user Hello microcai, I use tmpfs heavily as I have an SSD. Here is some information that may help you: tank woody # mount -v | grep tmpfs devtmpfs on /dev type devtmpfs (rw,relatime,size=8050440k,nr_inodes=2012610,mode=755) tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,size=1610408k,mode=755) shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime) cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755) tmpfs on /var/tmp/portage type tmpfs (rw,size=12G) tmpfs on /usr/portage type tmpfs (rw,size=12G) tmpfs on /usr/src type tmpfs (rw,size=12G) tmpfs on /tmp type tmpfs (rw,size=12G) tmpfs on /home/woody/.mutt/cache type tmpfs (rw) tank woody # cat /etc/fstab # /etc/fstab: static file system information. # # noatime turns off atimes for increased performance (atimes normally aren't # needed); notail increases performance of ReiserFS (at the expense of storage # efficiency). It's safe to drop the noatime options if you want and to # switch between notail / tail freely. # # The root filesystem should have a pass number of either 0 or 1. # All other filesystems should have a pass number of 0 or greater than 1. # # See the manpage fstab(5) for more information. # # <fs> <mountpoint> <type> <opts> <dump/pass> # NOTE: If your BOOT partition is ReiserFS, add the notail option to opts.
/dev/sda1 / ext4 noatime,discard,user_xattr 0 1 /dev/sda3 /home ext4 noatime,discard,user_xattr 0 1 #/dev/sda6 /home ext3 noatime 0 1 #/dev/sda2 none swap sw 0 0 tmpfs /var/tmp/portage tmpfs size=12G 0 0 tmpfs /usr/portage tmpfs size=12G 0 0 tmpfs /usr/src tmpfs size=12G 0 0 tmpfs /tmp tmpfs size=12G 0 0 tmpfs /home/woody/.mutt/cache/ tmpfs size=12G 0 0 #/dev/cdrom /mnt/cdrom auto noauto,ro 0 0 #/dev/fd0 /mnt/floppy auto noauto 0 0 tank woody # For the /usr/portage directory, if you reboot, all you have to do is run emerge-webrsync, or do as I do: tank woody # l /usr/ | grep portage 924221 4 drwxr-xr-x 170 root root 4096 1 mars 02:51 portage_tmpfs 6771 0 drwxr-xr-x 171 root root 3500 11 juin 20:40 portage tank woody # /usr/portage_tmpfs is a backup of /usr/portage; this saves me from retrieving the whole portage tree from Gentoo's servers again. Please note that I also use www-misc/profile-sync-daemon in order to store my browser cache in /tmp. I rarely shut down my computer :) Have fun On 19/06/2014 04:36, microcai wrote: > rsync is doing bunch of 4k ramdon IO when updateing portage tree, > that will kill SSDs with much higher Write Amplification Factror. > > > I have a 2year old SSDs that have reported Write Amplification Factor > of 26. I think the only reason is that I put portage tree on this SSD > to speed it up. > > what is the suggest way to reduce Write Amplification of a portage sync ? > ^ permalink raw reply [flat|nested] 17+ messages in thread
* [gentoo-user] Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-19 2:36 [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? microcai 2014-06-19 8:40 ` Amankwah 2014-06-19 22:03 ` Full Analyst @ 2014-06-20 17:48 ` Kai Krakow 2014-06-21 4:54 ` microcai 2014-06-21 14:27 ` Peter Humphrey 2 siblings, 2 replies; 17+ messages in thread From: Kai Krakow @ 2014-06-20 17:48 UTC (permalink / raw To: gentoo-user microcai <microcai@fedoraproject.org> schrieb: > rsync is doing bunch of 4k ramdon IO when updateing portage tree, > that will kill SSDs with much higher Write Amplification Factror. > > > I have a 2year old SSDs that have reported Write Amplification Factor > of 26. I think the only reason is that I put portage tree on this SSD > to speed it up. Use a file system that turns random writes into sequential writes, like the pretty newcomer f2fs. You could try using it for your rootfs but currently I suggest just creating a separate partition for it and either mount it as /usr/portage or symlink that dir into this directory (that way you could use it for other purposes, too, that generate random short writes, like log files). Then, I'd recommend changing your scheduler to deadline, bump up the io queue depth to a much higher value (echo -n 2048 > /sys/block/sdX/queue/nr_requests) and then change the dirty io flusher to not run as early as it usually would (change vm.dirty_writeback_centisecs to 1500 and vm.dirty_expire_centisecs to 3000). That way the vfs layer has a chance to better coalesce multi-block writes into one batch write, and f2fs will take care of doing it in sequential order. I'd also suggest not to use the discard mount options and instead create a cronjob that runs fstrim on the SSD devices. But YMMV. As a safety measure, only ever partition and use only 70-80% of your SSD so it can reliably do its wear-leveling. It will improve lifetime and keep the performance up even with filled filesystems. -- Replies to list only preferred. 
^ permalink raw reply [flat|nested] 17+ messages in thread
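Kai's tuning advice, collected in one place as a sketch (run as root; `sdX`, the mount point, and the cron path are placeholders):

```shell
# I/O scheduler and queue depth for the SSD holding the tree
echo deadline > /sys/block/sdX/queue/scheduler
echo 2048 > /sys/block/sdX/queue/nr_requests

# Let dirty pages sit longer so the VFS can coalesce small writes
sysctl -w vm.dirty_writeback_centisecs=1500
sysctl -w vm.dirty_expire_centisecs=3000

# Batch TRIM from cron instead of the discard mount option
cat > /etc/cron.weekly/fstrim <<'EOF'
#!/bin/sh
fstrim -v /
EOF
chmod +x /etc/cron.weekly/fstrim
```

Note that the scheduler and sysctl settings do not survive a reboot; they would need to go into an init script or /etc/sysctl.conf to be permanent.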
* Re: [gentoo-user] Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-20 17:48 ` [gentoo-user] " Kai Krakow @ 2014-06-21 4:54 ` microcai 2014-06-21 14:27 ` Peter Humphrey 1 sibling, 0 replies; 17+ messages in thread From: microcai @ 2014-06-21 4:54 UTC (permalink / raw To: gentoo-user 2014-06-21 1:48 GMT+08:00 Kai Krakow <hurikhan77@gmail.com>: > microcai <microcai@fedoraproject.org> schrieb: > >> rsync is doing bunch of 4k ramdon IO when updateing portage tree, >> that will kill SSDs with much higher Write Amplification Factror. >> >> >> I have a 2year old SSDs that have reported Write Amplification Factor >> of 26. I think the only reason is that I put portage tree on this SSD >> to speed it up. > > Use a file system that turns random writes into sequential writes, like the > pretty newcomer f2fs. You could try using it for your rootfs but currently I > suggest just creating a separate partition for it and either mount it as > /usr/portage or symlink that dir into this directory (that way you could use > it for other purposes, too, that generate random short writes, like log > files). > > Then, I'd recommend changing your scheduler to deadline, bump up the io > queue depth to a much higher value (echo -n 2048 > > /sys/block/sdX/queue/nr_requests) and then change the dirty io flusher to > not run as early as it usually would (change vm.dirty_writeback_centisecs to > 1500 and vm.dirty_expire_centisecs to 3000). That way the vfs layer has a > chance to better coalesce multi-block writes into one batch write, and f2fs > will take care of doing it in sequential order. > > I'd also suggest not to use the discard mount options and instead create a > cronjob that runs fstrim on the SSD devices. But YMMV. > > As a safety measure, only ever partition and use only 70-80% of your SSD so > it can reliably do its wear-leveling. It will improve lifetime and keep the > performance up even with filled filesystems. > > -- many thanks to all of you! 
Now I've put my portage tree on an F2FS partition. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-20 17:48 ` [gentoo-user] " Kai Krakow 2014-06-21 4:54 ` microcai @ 2014-06-21 14:27 ` Peter Humphrey 2014-06-21 14:54 ` Rich Freeman 2014-06-21 19:24 ` Kai Krakow 1 sibling, 2 replies; 17+ messages in thread From: Peter Humphrey @ 2014-06-21 14:27 UTC (permalink / raw To: gentoo-user On Friday 20 June 2014 19:48:14 Kai Krakow wrote: > microcai <microcai@fedoraproject.org> schrieb: > > rsync is doing bunch of 4k ramdon IO when updateing portage tree, > > that will kill SSDs with much higher Write Amplification Factror. > > > > I have a 2year old SSDs that have reported Write Amplification Factor > > of 26. I think the only reason is that I put portage tree on this SSD > > to speed it up. > > Use a file system that turns random writes into sequential writes, like the > pretty newcomer f2fs. You could try using it for your rootfs but currently I > suggest just creating a separate partition for it and either mount it as > /usr/portage or symlink that dir into this directory (that way you could > use it for other purposes, too, that generate random short writes, like log > files). Well, there's a surprise! Thanks for mentioning f2fs. I've just converted my Atom box's seven partitions to it, recompiled the kernel to include it, changed the fstab entries and rebooted. It just worked. --->8 > I'd also suggest not to use the discard mount options and instead create a > cronjob that runs fstrim on the SSD devices. But YMMV. I found that fstrim can't work on f2fs file systems. I don't know whether discard works yet. Thanks again. -- Regards Peter ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-21 14:27 ` Peter Humphrey @ 2014-06-21 14:54 ` Rich Freeman 2014-06-21 19:19 ` [gentoo-user] " Kai Krakow 2014-06-21 19:24 ` Kai Krakow 1 sibling, 1 reply; 17+ messages in thread From: Rich Freeman @ 2014-06-21 14:54 UTC (permalink / raw To: gentoo-user On Sat, Jun 21, 2014 at 10:27 AM, Peter Humphrey <peter@prh.myzen.co.uk> wrote: > > I found that fstrim can't work on f2fs file systems. I don't know whether > discard works yet. Fstrim is to be preferred over discard in general. However, I suspect neither is needed for something like f2fs. Being log-based it doesn't really overwrite data in place. I suspect that it waits until an entire region of the disk is unused and then it TRIMs the whole region. However, I haven't actually used it and only know the little I've read about it. That is the principle of a log-based filesystem. I'm running btrfs on my SSD root, which is supposed to be decent for flash, but the SMART attributes of my drive aren't well-documented so I couldn't tell you what the erase count is up to. Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
* [gentoo-user] Re: Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-21 14:54 ` Rich Freeman @ 2014-06-21 19:19 ` Kai Krakow 0 siblings, 0 replies; 17+ messages in thread From: Kai Krakow @ 2014-06-21 19:19 UTC (permalink / raw To: gentoo-user Rich Freeman <rich0@gentoo.org> schrieb: > On Sat, Jun 21, 2014 at 10:27 AM, Peter Humphrey <peter@prh.myzen.co.uk> > wrote: >> >> I found that fstrim can't work on f2fs file systems. I don't know whether >> discard works yet. > > Fstrim is to be preferred over discard in general. However, I suspect > neither is needed for something like f2fs. Being log-based it doesn't > really overwrite data in place. I suspect that it waits until an > entire region of the disk is unused and then it TRIMs the whole > region. F2fs prefers to fill an entire erase block before touching the next. It also tries to coalesce small writes into 16k blocks before submitting them to disk. And according to the docs it supports trim/discard internally. > However, I haven't actually used it and only know the little I've read > about it. That is the principle of a log-based filesystem. There's an article at LWN [1] and in the comments you can find some important information about the technical details. Posted Oct 11, 2012 21:11 UTC (Thu) by arnd: | * Wear leveling usually works by having a pool of available erase blocks | in the drive. When you write to a new location, the drive takes on block | out of that pool and writes the data there. When the drive thinks you | are done writing to one block, it cleans up any partially written data | and puts a different block back into the pool. | * f2fs tries to group writes into larger operations of at least page size | (16KB or more) to be efficient, current FTLs are horribly bad at 4KB | page size writes. It also tries to fill erase blocks (multiples of 2MB) | in the order that the devices can handle.
| * logfs actually works on block devices but hasn't been actively worked on | over the last few years. f2fs also promises better performance by using | only 6 erase blocks concurrently rather than 12 in the case of logfs. A | lot of the underlying principles are the same though. | * The "industry" is moving away from raw flash interfaces towards eMMC and | related technologies (UFS, SD, ...). We are not going back to raw flash | any time soon, which is unfortunate for a number of reasons but also has | a few significant advantages. Having the FTL take care of bad block | management and wear leveling is one such advantage, at least if they get | it right. According to wikipedia [2], some more interesting features are on the way, like compression and data deduplication to lower the impact of writes. [1]: http://lwn.net/Articles/518988/ [2]: http://en.wikipedia.org/wiki/F2FS -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [gentoo-user] Re: Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-21 14:27 ` Peter Humphrey 2014-06-21 14:54 ` Rich Freeman @ 2014-06-21 19:24 ` Kai Krakow 2014-06-22 1:40 ` Rich Freeman 1 sibling, 1 reply; 17+ messages in thread From: Kai Krakow @ 2014-06-21 19:24 UTC (permalink / raw To: gentoo-user Peter Humphrey <peter@prh.myzen.co.uk> schrieb: > On Friday 20 June 2014 19:48:14 Kai Krakow wrote: >> microcai <microcai@fedoraproject.org> schrieb: >> > rsync is doing bunch of 4k ramdon IO when updateing portage tree, >> > that will kill SSDs with much higher Write Amplification Factror. >> > >> > I have a 2year old SSDs that have reported Write Amplification Factor >> > of 26. I think the only reason is that I put portage tree on this SSD >> > to speed it up. >> >> Use a file system that turns random writes into sequential writes, like >> the pretty newcomer f2fs. You could try using it for your rootfs but >> currently I suggest just creating a separate partition for it and either >> mount it as /usr/portage or symlink that dir into this directory (that >> way you could use it for other purposes, too, that generate random short >> writes, like log files). > > Well, there's a surprise! Thanks for mentioning f2fs. I've just converted > my Atom box's seven partitions to it, recompiled the kernel to include it, > changed the fstab entries and rebooted. It just worked. It's said to be twice as fast with some workloads (especially write workloads). Can you confirm that? I didn't try it that much yet - usually I use it for pendrives only. I have no experience using it for rootfs. And while we are at it, I'd also like to mention bcache. Tho, conversion is not straight forward. However, I'm going to try that soon for my spinning rust btrfs. -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Re: Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-21 19:24 ` Kai Krakow @ 2014-06-22 1:40 ` Rich Freeman 2014-06-22 11:44 ` [gentoo-user] " Kai Krakow 0 siblings, 1 reply; 17+ messages in thread From: Rich Freeman @ 2014-06-22 1:40 UTC (permalink / raw To: gentoo-user On Sat, Jun 21, 2014 at 3:24 PM, Kai Krakow <hurikhan77@gmail.com> wrote: > And while we are at it, I'd also like to mention bcache. Tho, conversion is > not straight forward. However, I'm going to try that soon for my spinning > rust btrfs. I contemplated that, but I'd really like to see btrfs support something more native. Bcache is way too low-level for me and strikes me as inefficient as a result. Plus, since it sits UNDER btrfs you'd probably lose all the fancy volume management features. ZFS has ssd caching as part of the actual filesystem, and that seems MUCH cleaner. Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
* [gentoo-user] Re: Re: Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-22 1:40 ` Rich Freeman @ 2014-06-22 11:44 ` Kai Krakow 2014-06-22 13:44 ` Rich Freeman 0 siblings, 1 reply; 17+ messages in thread From: Kai Krakow @ 2014-06-22 11:44 UTC (permalink / raw To: gentoo-user Rich Freeman <rich0@gentoo.org> schrieb: > On Sat, Jun 21, 2014 at 3:24 PM, Kai Krakow <hurikhan77@gmail.com> wrote: >> And while we are at it, I'd also like to mention bcache. Tho, conversion >> is not straight forward. However, I'm going to try that soon for my >> spinning rust btrfs. > > I contemplated that, but I'd really like to see btrfs support > something more native. Bcache is way too low-level for me and strikes > me as inefficient as a result. Plus, since it sits UNDER btrfs you'd > probably lose all the fancy volume management features. I don't see where you could lose the volume management features. You just add device on top of the bcache device after you initialized the raw device with a bcache superblock and attached it. The rest works the same, just that you use bcacheX instead of sdX devices. Bcache is a general approach and it seems to work very well for that already. There are hot data tracking patches and proposals to support adding a cache device to the btrfs pool and let btrfs migrate data back and forth between each. That would be native. But it still would lack the advanced features ZFS implements to make use of such caching devices, implementing even different strategies for ZIL, ARC, and L2ARC. That's the gap bcache tries to jump. > ZFS has ssd caching as part of the actual filesystem, and that seems > MUCH cleaner. Yes, it is much more mature in that regard. Comparing with ZFS, bcache is a lot like ZIL, while hot data relocation in btrfs would be a lot like L2ARC. ARC is a special purpose RAM cache separate from the VFS caches which has special knowledge about ZFS structures to keep performance high. 
Some filesystems implement something similar by keeping tree structures completely in RAM. I think, both bcache and hot data tracking take parts of the work that ARC does for ZFS - note that "hot data tracking" is a generic VFS interface, while "hot data relocation" is something from btrfs. Both work together but it is not there yet. From that point of view, I don't think something like ZIL should be implemented in btrfs itself but as a generic approach like bcache so every component in Linux can make use of it. Hot data relocation OTOH is interesting from another point of view and may become part of future btrfs as it benefits from knowledge about the filesystem itself, using a generic interface like "hot data tracking" in VFS - so other components can make use of that, too. A ZIL-like cache and hot data relocation could probably solve a lot of fragmentation issues (especially a ZIL-like cache), so I hope work for that will get pushed a little more soon. Having to prepare devices for bcache is kind of a show-stopper because it is no drop-in component that way. But OTOH I like that approach better than dm- cache because it protects from using the backing device without going through the caching layer which could otherwise severely damage your data, and you get along with fewer devices and don't need to size a meta device (which probably needs to grow later if you add devices, I don't know). -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [gentoo-user] Re: Re: Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-22 11:44 ` [gentoo-user] " Kai Krakow @ 2014-06-22 13:44 ` Rich Freeman 2014-06-24 18:34 ` [gentoo-user] " Kai Krakow 0 siblings, 1 reply; 17+ messages in thread From: Rich Freeman @ 2014-06-22 13:44 UTC (permalink / raw To: gentoo-user On Sun, Jun 22, 2014 at 7:44 AM, Kai Krakow <hurikhan77@gmail.com> wrote: > I don't see where you could lose the volume management features. You just > add device on top of the bcache device after you initialized the raw device > with a bcache superblock and attached it. The rest works the same, just that > you use bcacheX instead of sdX devices. Ah, didn't realize you could attach/remove devices to bcache later. Presumably it handles device failures gracefully, ie exposing them to the underlying filesystem so that it can properly recover? > > From that point of view, I don't think something like ZIL should be > implemented in btrfs itself but as a generic approach like bcache so every > component in Linux can make use of it. Hot data relocation OTOH is > interesting from another point of view and may become part of future btrfs > as it benefits from knowledge about the filesystem itself, using a generic > interface like "hot data tracking" in VFS - so other components can make use > of that, too. The only problem with doing stuff like this at a lower level (both write and read caching) is that it isn't RAID-aware. If you write 10GB of data, you use 20GB of cache to do it if you're mirrored, because the cache doesn't know about mirroring. Offhand I'm not sure if there are any performance penalties as well around the need for barriers/etc with the cache not being able to be relied on to do the right thing in terms of what gets written out - also, the data isn't redundant while it is on the cache, unless you mirror the cache. Granted, if you're using it for write intent logging then there isn't much getting around that. 
> Having to prepare devices for bcache is kind of a show-stopper because it is > no drop-in component that way. But OTOH I like that approach better than dm- > cache because it protects from using the backing device without going > through the caching layer which could otherwise severely damage your data, > and you get along with fewer devices and don't need to size a meta device > (which probably needs to grow later if you add devices, I don't know). And this is the main thing keeping me away from it. It is REALLY painful to migrate to/from. Having it integrated into the filesystem delivers all the same benefits of not being able to mount it without the cache present. Now excuse me while I go fix my btrfs (I tried re-enabling snapper and it again got the filesystem into a worked-up state after trying to clean up half a dozen snapshots at the same time - it works fine until you go and try to write a lot of data to it, then it stops syncing though you don't necessarily notice until a few hours later when the write cache exhausts RAM and on reboot your disk reverts back a few hours). I suspect that if I just treat it gently for a few hours btrfs will clean up the mess and it will work normally again, but the damage apparently persists after a reboot if you go heavy in the disk too quickly... Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
* [gentoo-user] Re: Re: Re: Re: [Gentoo-User] emerge --sync likely to kill SSD? 2014-06-22 13:44 ` Rich Freeman @ 2014-06-24 18:34 ` Kai Krakow 2014-06-24 20:01 ` Rich Freeman 0 siblings, 1 reply; 17+ messages in thread From: Kai Krakow @ 2014-06-24 18:34 UTC (permalink / raw To: gentoo-user Rich Freeman <rich0@gentoo.org> schrieb: > On Sun, Jun 22, 2014 at 7:44 AM, Kai Krakow <hurikhan77@gmail.com> wrote: >> I don't see where you could lose the volume management features. You just >> add device on top of the bcache device after you initialized the raw >> device with a bcache superblock and attached it. The rest works the same, >> just that you use bcacheX instead of sdX devices. > > Ah, didn't realize you could attach/remove devices to bcache later. > Presumably it handles device failures gracefully, ie exposing them to > the underlying filesystem so that it can properly recover? I'm not sure if multiple partitions can share the same cache device partition but more or less that's it: Initialize bcache, then attach your backing devices, then add those bcache devices to your btrfs. I don't know how errors are handled, tho. But as with every caching technique (even in ZFS) your data is likely toast if the cache device dies in the middle of action. Thus, you should put bcache on LVM RAID if you are going to use it for write caching (i.e. write-back mode). Read caching should be okay (write-through mode). Bcache is a little slower than other flash-cache implementations because it only reports data as written back to the FS if it reached stable storage (which can be the cache device, tho, if you are using write-back mode). It was also designed with unexpected reboots in mind, read. It will replay transactions from its log on reboot. This means, you can have unstable data conditions on the raw device which is why you should never try to use that directly, e.g. from a rescue disk. 
But since bcache wraps the partition with its own superblock this mistake should be impossible. I'm not sure how graceful device failures are handled. I suppose in write- back mode you can get into trouble because it's too late for bcache to tell the FS that there is a write error when it already confirmed that stable storage has been hit. Maybe it will just keep the data around so you could swap devices or will report the error next time when data is written to that location. It probably interferes with btrfs RAID logic on that matter. > The only problem with doing stuff like this at a lower level (both > write and read caching) is that it isn't RAID-aware. If you write > 10GB of data, you use 20GB of cache to do it if you're mirrored, > because the cache doesn't know about mirroring. Yes, it will write double the data to the cache then - but only if btrfs also did actually read both copies (which it probably does not because it has checksums and does not need to compare data, and lets just ignore the case that another process could try to read the same data from the other raid member later, that case should become optimized-out by the OS cache). Otherwise both caches should work pretty individually with their own set of data depending on how btrfs uses each device individually. Remember that btrfs raid is not a block-based raid where block locations would match 1:1 on each device. Btrfs raid can place one mirror of data in two completely different locations on each member device (which is actually a good thing in case block errors accumulate in specific locations for a "faulty" model of a disk). In case of write caching it will of course cache double the data (because both members will be written to). But I think that's okay for the same reasons, except it will wear your cache device faster. But in that case I suggest to use individual SSDs for each btrfs member device anyways. It's not optimal, I know. 
Could be useful to see some best practices and pros/cons on that topic (individual cache device per btrfs member vs. bcache on LVM RAID with bcache partitions on the RAID for all members). I think the best strategy depends on whether you are write-most or read-most. Thanks for mentioning. Interesting thoughts. ;-) > Offhand I'm not sure > if there are any performance penalties as well around the need for > barriers/etc with the cache not being able to be relied on to do the > right thing in terms of what gets written out - also, the data isn't > redundant while it is on the cache, unless you mirror the cache. This is partially what I outlined above. I think in the case of write-caching, no barrier pass-through is needed. Bcache will confirm the barriers and that's all the FS needs to know (because bcache is supervising the FS, all requests go through the bcache layer, no direct access to the backing device). Of course, it's then bcache's job to ensure everything gets written out correctly in the background (whenever it feels like doing so). But it can use its own write-barriers to ensure that for the underlying device - that's nothing the FS has to care about. Performance should be faster anyway because, well, you are writing to a faster device - that is what bcache is all about, isn't it? ;-) I don't think write-barriers for read caching are needed, at least not from the point of view of the FS. The caching layer, tho, will use them internally for its caching structures. Whether that has a bad effect on performance probably depends on the implementation, but my intuition says: No performance impact, because putting read data in the cache can be deferred and then data will be written in the background (write-behind). > Granted, if you're using it for write intent logging then there isn't > much getting around that. Well, sure for bcache. But I think in the case of FS-internal write caching devices that case could be handled gracefully (the method which you'd prefer).
Since in the internal case the cache has knowledge of the FS's bad-block handling, it can just retry writing the data to another location/disk, or keep it around until the admin has fixed the problem with the backing device.

BTW: SSD firmwares usually suffer from problems similar to those outlined above, because they do writes in the background after they have already confirmed persistence to the OS layer. This is why SSD failures are usually much more severe than HDD failures. Do some research and you should find tests on that topic. Especially consumer SSD firmwares have a big problem with that. So I'm not sure it really should be bcache's job to fix that particular problem. You should just ensure good firmware and proper failure protection at the hardware level if you want to do fancy caching stuff - the FTL should be able to hide those problems before the whole thing explodes, and report errors before it can no longer ensure correct persistence. I suppose that is also a detail where enterprise-grade SSDs behave differently. HDDs have related issues (consumer SATA vs. enterprise SCSI/SAS - keywords: IO timeouts and bad blocks, and why you should not use consumer hardware for RAIDs). I think all the same holds true for ZFS.

>> Having to prepare devices for bcache is kind of a show-stopper because it
>> is no drop-in component that way. But OTOH I like that approach better
>> than dm-cache because it protects from using the backing device without
>> going through the caching layer which could otherwise severely damage
>> your data, and you get along with fewer devices and don't need to size a
>> meta device (which probably needs to grow later if you add devices, I
>> don't know).
>
> And this is the main thing keeping me away from it.  It is REALLY
> painful to migrate to/from.  Having it integrated into the filesystem
> delivers all the same benefits of not being able to mount it without
> the cache present.

The migration pain is what currently keeps me away, too.
Otherwise I would just buy one of those fancy new cheap-but-still-speedy Crucial SSDs and "just enable" bcache... :-\

> Now excuse me while I go fix my btrfs (I tried re-enabling snapper and
> it again got the filesystem into a worked-up state after trying to
> clean up half a dozen snapshots at the same time - it works fine until
> you go and try to write a lot of data to it, then it stops syncing
> though you don't necessarily notice until a few hours later when the
> write cache exhausts RAM and on reboot your disk reverts back a few
> hours).  I suspect that if I just treat it gently for a few hours
> btrfs will clean up the mess and it will work normally again, but the
> damage apparently persists after a reboot if you go heavy on the disk
> too quickly...

You should report that to the btrfs list. You could try "echo w > /proc/sysrq-trigger" and look at the blocked-processes list in dmesg afterwards. I'm sure one important btrfs thread will be in a blocked state then...

-- 
Replies to list only preferred.

^ permalink raw reply	[flat|nested] 17+ messages in thread
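For anyone following along, the sysrq trick mentioned above looks roughly like this (run as root; the exact dmesg marker text varies between kernel versions, so the grep pattern is a best guess):

```shell
# Check that the magic sysrq interface is enabled at all
# (1 means everything allowed; other bitmask values may still permit 'w'):
cat /proc/sys/kernel/sysrq

# 'w' dumps all tasks in uninterruptible (blocked) state, including their
# kernel stack traces, into the kernel ring buffer:
echo w > /proc/sysrq-trigger

# Read the dump back; look for btrfs worker threads stuck in D state:
dmesg | grep -i -A 30 'show blocked state'
```

The stack traces of the blocked threads are exactly what the btrfs developers will want to see in a bug report.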
* Re: [gentoo-user] Re: Re: Re: Re: [Gentoo-User] emerge --sync likely to kill SSD?
  2014-06-24 18:34 ` [gentoo-user] " Kai Krakow
@ 2014-06-24 20:01   ` Rich Freeman
  0 siblings, 0 replies; 17+ messages in thread
From: Rich Freeman @ 2014-06-24 20:01 UTC (permalink / raw)
To: gentoo-user

On Tue, Jun 24, 2014 at 2:34 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> I'm not sure if multiple partitions can share the same cache device
> partition but more or less that's it: Initialize bcache, then attach your
> backing devices, then add those bcache devices to your btrfs.

Ah, if you are stuck with one bcache partition per cached device then that will be fairly painful to manage.

> Yes, it will write double the data to the cache then - but only if btrfs
> also did actually read both copies (which it probably does not because it
> has checksums and does not need to compare data, and lets just ignore the
> case that another process could try to read the same data from the other
> raid member later, that case should become optimized-out by the OS cache).

I didn't realize you were proposing read caching only. If you're only caching reads then obviously that is much safer. I think with btrfs in raid1 mode with only two devices you can tell it to prefer a particular device for reading, in which case you could just bcache that drive. It would only read from the other drive if the cache failed.

However, I don't think btrfs lets you manually arrange drives into array-like structures. It auto-balances everything, which is usually a plus, but if you have 30 disks you can't tell it to treat them as 6x 5-disk RAID5s vs. one 30-disk RAID5 (I think).

Rich

^ permalink raw reply	[flat|nested] 17+ messages in thread
end of thread, other threads:[~2014-06-24 20:01 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-19  2:36 [gentoo-user] [Gentoo-User] emerge --sync likely to kill SSD? microcai
2014-06-19  8:40 ` Amankwah
2014-06-19 11:44 ` Neil Bothwick
2014-06-19 11:56 ` Rich Freeman
2014-06-19 12:16 ` Kerin Millar
2014-06-19 22:03 ` Full Analyst
2014-06-20 17:48 ` [gentoo-user] " Kai Krakow
2014-06-21  4:54 ` microcai
2014-06-21 14:27 ` Peter Humphrey
2014-06-21 14:54 ` Rich Freeman
2014-06-21 19:19 ` [gentoo-user] " Kai Krakow
2014-06-21 19:24 ` Kai Krakow
2014-06-22  1:40 ` Rich Freeman
2014-06-22 11:44 ` [gentoo-user] " Kai Krakow
2014-06-22 13:44 ` Rich Freeman
2014-06-24 18:34 ` [gentoo-user] " Kai Krakow
2014-06-24 20:01 ` Rich Freeman