* [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :) @ 2020-04-26 14:52 tuxic 2020-04-26 15:20 ` Rich Freeman 0 siblings, 1 reply; 14+ messages in thread From: tuxic @ 2020-04-26 14:52 UTC (permalink / raw To: Gentoo Hi, jyst out of curiosity: I have a 512 MB NVMe SDD drive installed, which I had (currently) formatted with one 256 MB root partition. I bound /var and /tmp to hardisk. Currently I am doing one Gentoo update a day and I am running unstable. Just to get a feeling, how often I need to fstrim / I do it currently by hand once in a week. Fstrim reports about 200 GiB of trimmed data. From the gut this looks quite a lot -- the whole partition is 256 GB in size. Smartclt report for the drive: Data Units Written: 700,841 [358 GB] Each week 200 GiB fstrimmed data for a partition of 256 GB in size and since the beginning I have written only 358 GB to it. How does this all fit together? Cheers! Meino ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: Rich Freeman @ 2020-04-26 15:20 UTC
To: gentoo-user

On Sun, Apr 26, 2020 at 10:52 AM <tuxic@posteo.de> wrote:
>
> Fstrim reports about 200 GiB of trimmed data.
>
> From the gut this looks like quite a lot -- the whole
> partition is 256 GB in size.
>
> Smartctl report for the drive:
> Data Units Written:                 700,841 [358 GB]
>
> Each week 200 GiB of fstrimmed data for a partition of
> 256 GB in size, and since the beginning I have written
> only 358 GB to it.
>
> How does this all fit together?

It doesn't fit together, because the amount of space trimmed has
nothing to do with the amount of data written.

How much free space is there?  I would think that fstrim would just
trim all unused blocks on the filesystem.  Unless it maintained state
it would have no idea what has changed since the last time it was
run, so if you ran it 10 times in a row it would trim 200 GiB each
time.

Unless your NVMe is brain-dead, the only real downside to running it
more often is the IO.  If you trim 200 GiB of data 100x in a row, the
99x after the first one should all be no-ops if the drive is
well-designed.  An fstrim should just be a metadata operation.

Now, not all flash storage is equally well-implemented, and I suspect
the guidelines to avoid running it often or using discard settings
are from those who either have really cheap drives, or ones from a
long time ago.  A lot of Linux advice tends to be based on what
people did 10+ years ago, and a lot of Linux design decisions get
made to accommodate the guy who wants everything to work fine on his
386+ISA and SGI Indigo in his basement.

My suggestion would be to run fstrim twice in a row and see how fast
it operates and what the results are.  If the second one completes
very quickly, that suggests that the drive is sane.  I'd probably
just run it daily in that case, but weekly is probably fine,
especially if the drive isn't very full.

--
Rich
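As a quick sketch of that double-run test (assuming util-linux's
fstrim, a root shell, and an ext4 root -- adjust the mount point to
taste):

  # time two consecutive trims; on a sane drive/filesystem
  # combination the second run should complete almost instantly
  time fstrim -v /
  time fstrim -v /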
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: tuxic @ 2020-04-26 16:15 UTC
To: gentoo-user

On 04/26 11:20, Rich Freeman wrote:
> On Sun, Apr 26, 2020 at 10:52 AM <tuxic@posteo.de> wrote:
> >
> > Fstrim reports about 200 GiB of trimmed data.
> >
> > From the gut this looks like quite a lot -- the whole
> > partition is 256 GB in size.
> >
> > Smartctl report for the drive:
> > Data Units Written:                 700,841 [358 GB]
> >
> > Each week 200 GiB of fstrimmed data for a partition of
> > 256 GB in size, and since the beginning I have written
> > only 358 GB to it.
> >
> > How does this all fit together?
>
> It doesn't fit together, because the amount of space trimmed has
> nothing to do with the amount of data written.
>
> How much free space is there?  I would think that fstrim would just
> trim all unused blocks on the filesystem.  Unless it maintained
> state it would have no idea what has changed since the last time it
> was run, so if you ran it 10 times in a row it would trim 200 GiB
> each time.
>
> Unless your NVMe is brain-dead, the only real downside to running
> it more often is the IO.  If you trim 200 GiB of data 100x in a
> row, the 99x after the first one should all be no-ops if the drive
> is well-designed.  An fstrim should just be a metadata operation.
>
> Now, not all flash storage is equally well-implemented, and I
> suspect the guidelines to avoid running it often or using discard
> settings are from those who either have really cheap drives, or
> ones from a long time ago.  A lot of Linux advice tends to be based
> on what people did 10+ years ago, and a lot of Linux design
> decisions get made to accommodate the guy who wants everything to
> work fine on his 386+ISA and SGI Indigo in his basement.
>
> My suggestion would be to run fstrim twice in a row and see how
> fast it operates and what the results are.  If the second one
> completes very quickly, that suggests that the drive is sane.  I'd
> probably just run it daily in that case, but weekly is probably
> fine, especially if the drive isn't very full.
>
> --
> Rich

Hi Rich,

thanks for the explanation.

My observations do not fit with your explanation, though.

Early in the morning I did a fstrim, which resulted in the 200 GiB
of freed data.

Based on your posting I did a fstrim now with no wait in between:

host:/root>fstrim -v /
/: 3.3 GiB (3578650624 bytes) trimmed
host:/root>fstrim -v /
/: 0 B (0 bytes) trimmed

This time the first fstrim reports a small amount of trimmed data and
the second one reports no trimmed data at all.

The SSD is a
ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid
State Drive (rev 03)
(cut'n'paste from `lspci`)

host:/root>df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       246G   45G  189G  20% /

Cheers!
Meino
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: Rich Freeman @ 2020-04-26 19:29 UTC
To: gentoo-user

On Sun, Apr 26, 2020 at 12:15 PM <tuxic@posteo.de> wrote:
>
> On 04/26 11:20, Rich Freeman wrote:
> > On Sun, Apr 26, 2020 at 10:52 AM <tuxic@posteo.de> wrote:
> > >
> > > Fstrim reports about 200 GiB of trimmed data.
> > >
> >
> > My suggestion would be to run fstrim twice in a row and see how
> > fast it operates and what the results are.  If the second one
> > completes very quickly, that suggests that the drive is sane.
> > I'd probably just run it daily in that case, but weekly is
> > probably fine, especially if the drive isn't very full.
>
> host:/root>fstrim -v /
> /: 3.3 GiB (3578650624 bytes) trimmed
> host:/root>fstrim -v /
> /: 0 B (0 bytes) trimmed
>
> This time the first fstrim reports a small amount of trimmed data
> and the second one reports no trimmed data at all.

OK, I became a bit less lazy and started looking at the source.

All fstrim does is send an FITRIM ioctl to the kernel for the device.
This is implemented in a filesystem-dependent manner, and I couldn't
actually find any documentation on it (actual documentation on the
ioctl - not the fstrim manpage/etc).  A quick glimpse at the ext4
source suggests that ext4 has a flag that can track whether a group
of blocks has been trimmed since it was last deallocated.  So ext4
will make repeated fstrim runs a no-op, and the drive won't see
these.

At least, that was what I got after about 5-10 minutes of browsing.
I didn't take the time to grok how ext4 tracks free space and so on.

Incidentally, in the other thread, the reason that dry-run didn't
report anything to be trimmed is that this is hard-coded:

printf(_("%s: 0 B (dry run) trimmed on %s\n"), path, devname);
https://github.com/karelzak/util-linux/blob/master/sys-utils/fstrim.c#L109

Otherwise the ioctl returns how much space was trimmed, and fstrim
outputs this.

--
Rich
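For the curious, the ioctl is easy to observe from a shell, assuming
strace is installed (the decoded line below is illustrative of the
expected shape, not real output from this thread, and may vary by
strace version and filesystem):

  # strace decodes the FITRIM ioctl and its fstrim_range argument
  strace -e trace=ioctl fstrim -v /

  # the relevant line should look roughly like:
  # ioctl(3, FITRIM, {start=0, len=0xffffffffffffffff, minlen=0}) = 0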
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: tuxic @ 2020-04-27 1:43 UTC
To: gentoo-user

On 04/26 03:29, Rich Freeman wrote:
> On Sun, Apr 26, 2020 at 12:15 PM <tuxic@posteo.de> wrote:
> >
> > On 04/26 11:20, Rich Freeman wrote:
> > > On Sun, Apr 26, 2020 at 10:52 AM <tuxic@posteo.de> wrote:
> > > >
> > > > Fstrim reports about 200 GiB of trimmed data.
> > > >
> > >
> > > My suggestion would be to run fstrim twice in a row and see how
> > > fast it operates and what the results are.  If the second one
> > > completes very quickly, that suggests that the drive is sane.
> > > I'd probably just run it daily in that case, but weekly is
> > > probably fine, especially if the drive isn't very full.
> >
> > host:/root>fstrim -v /
> > /: 3.3 GiB (3578650624 bytes) trimmed
> > host:/root>fstrim -v /
> > /: 0 B (0 bytes) trimmed
> >
> > This time the first fstrim reports a small amount of trimmed data
> > and the second one reports no trimmed data at all.
>
> OK, I became a bit less lazy and started looking at the source.
>
> All fstrim does is send an FITRIM ioctl to the kernel for the
> device.  This is implemented in a filesystem-dependent manner, and
> I couldn't actually find any documentation on it (actual
> documentation on the ioctl - not the fstrim manpage/etc).  A quick
> glimpse at the ext4 source suggests that ext4 has a flag that can
> track whether a group of blocks has been trimmed since it was last
> deallocated.  So ext4 will make repeated fstrim runs a no-op, and
> the drive won't see these.
>
> At least, that was what I got after about 5-10 minutes of browsing.
> I didn't take the time to grok how ext4 tracks free space and so
> on.
>
> Incidentally, in the other thread, the reason that dry-run didn't
> report anything to be trimmed is that this is hard-coded:
>
> printf(_("%s: 0 B (dry run) trimmed on %s\n"), path, devname);
> https://github.com/karelzak/util-linux/blob/master/sys-utils/fstrim.c#L109
>
> Otherwise the ioctl returns how much space was trimmed, and fstrim
> outputs this.
>
> --
> Rich

Hi Rich,

thank you very much for digging into the depths of the sources and
for explaining it!!!  Much appreciated!!! :) :)

Implementing a dry run with a printf() is new to me... ;)

Cheers!
Meino
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: Rich Freeman @ 2020-04-27 1:58 UTC
To: gentoo-user

On Sun, Apr 26, 2020 at 9:43 PM <tuxic@posteo.de> wrote:
>
> Implementing a dry run with a printf() is new to me... ;)

That is all the fstrim authors could do, since there is no dry-run
option for the actual ioctl, and fstrim itself has no idea how the
filesystem will implement it (short of re-implementing numerous
filesystems in the program and running it on unmounted devices).  It
seems like an fstrim dry run is only minimally functional, though I
guess it will test whether you made any gross errors in syntax and so
on.  I don't see any reason why they couldn't have a dry-run option
for the ioctl, but it would have to be implemented in the various
filesystems.

Really, it seems like ioctl in general in the kernel isn't
super-well-documented.  It isn't like the system call interface.
That is, unless I just missed some ioctl document floating around.
The actual list of ioctls is in the kernel includes, but this does
not define the syntax of the 3rd parameter of the ioctl system call,
which is function-specific.  The structure used by the FITRIM ioctl
is in the includes, but not with any kind of documentation or even a
cross-reference to associate the structure with the ioctl itself.

--
Rich
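For reference, that structure can be pulled straight out of the
installed UAPI headers, assuming the kernel headers live under
/usr/include on your system:

  # the FITRIM argument structure, as shipped in the kernel headers
  grep -A 4 'struct fstrim_range' /usr/include/linux/fs.h

  # struct fstrim_range {
  #         __u64 start;
  #         __u64 len;
  #         __u64 minlen;
  # };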
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: tuxic @ 2020-04-27 3:14 UTC
To: gentoo-user

On 04/26 09:58, Rich Freeman wrote:
> On Sun, Apr 26, 2020 at 9:43 PM <tuxic@posteo.de> wrote:
> >
> > Implementing a dry run with a printf() is new to me... ;)
>
> That is all the fstrim authors could do, since there is no dry-run
> option for the actual ioctl, and fstrim itself has no idea how the
> filesystem will implement it (short of re-implementing numerous
> filesystems in the program and running it on unmounted devices).
> It seems like an fstrim dry run is only minimally functional,
> though I guess it will test whether you made any gross errors in
> syntax and so on.  I don't see any reason why they couldn't have a
> dry-run option for the ioctl, but it would have to be implemented
> in the various filesystems.
>
> Really, it seems like ioctl in general in the kernel isn't
> super-well-documented.  It isn't like the system call interface.
> That is, unless I just missed some ioctl document floating around.
> The actual list of ioctls is in the kernel includes, but this does
> not define the syntax of the 3rd parameter of the ioctl system
> call, which is function-specific.  The structure used by the FITRIM
> ioctl is in the includes, but not with any kind of documentation or
> even a cross-reference to associate the structure with the ioctl
> itself.
>
> --
> Rich

Hi Rich,

thanks for the explanations again.

But I think it is better not to implement a feature at all than to
implement it via printf().

For a dry run I had expected that some checks would be done to tell
whether a non-dry run would be successful.

For example:
When submitting
    fstrim -n /
as normal user I get:
    /: 0 B (dry run) trimmed
Doing the same without dry run set I get:
    fstrim: /: FITRIM ioctl failed: Operation not permitted

When doing a fstrim -n /home/user as normal user, I get the same
behaviour as above -- despite the fact that /home/user is on a hard
disk with no fstrim functionality at all.

If fstrim cannot implement the above correctly, it would be better
not to implement it at all... I think.

Cheers!
Meino
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: William Kenworthy @ 2020-04-27 8:22 UTC
To: gentoo-user

On 27/4/20 11:14 am, tuxic@posteo.de wrote:
> On 04/26 09:58, Rich Freeman wrote:
>

/ on a btrfs raid10 (1x500G and 3x120G SSD), "fstrim -v /" about 2
hours apart:

rattus ~ # fstrim -v /
/: 680.6 GiB (730744291328 bytes) trimmed
rattus ~ # fstrim -v /
/: 17.8 GiB (19087859712 bytes) trimmed
rattus ~ # fstrim -v /
/: 17.8 GiB (19074703360 bytes) trimmed
rattus ~ #

The last two runs took about the same time, though the first was only
slightly longer - I should have timed it properly!

BillK
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: Alan Mackenzie @ 2020-04-27 10:32 UTC
To: gentoo-user

Hello, Rich.

On Sun, Apr 26, 2020 at 15:29:40 -0400, Rich Freeman wrote:

[ .... ]

> Incidentally, in the other thread, the reason that dry-run didn't
> report anything to be trimmed is that this is hard-coded:

> printf(_("%s: 0 B (dry run) trimmed on %s\n"), path, devname);
> https://github.com/karelzak/util-linux/blob/master/sys-utils/fstrim.c#L109

Thanks for looking that up!  There doesn't appear to be much point to
this misleading --dry-run option.  It seems like a good idea which
never got implemented (except in the manual).

> Otherwise the ioctl returns how much space was trimmed, and fstrim
> outputs this.

> --
> Rich

--
Alan Mackenzie (Nuremberg, Germany).
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: Kent Fredric @ 2020-04-27 15:12 UTC
To: gentoo-user

On Sun, 26 Apr 2020 18:15:51 +0200 tuxic@posteo.de wrote:

> Filesystem      Size  Used Avail Use% Mounted on
> /dev/root       246G   45G  189G  20% /

Given that (Size - Used) is roughly 200G, it suggests to me that
perhaps some process somewhere is creating and deleting a lot of
temporary files on this device (or maybe simply re-writing the same
file multiple times).

From userspace this would be invisible, as the "new" file would be in
a new location on the disk, and the "old" file would be invisible and
marked "can be overwritten".

So if you did:

for i in {0..200}; do
  cp a b
  rm a
  mv b a
done

where "a" is a 1G file, I'd expect this to have a *ceiling* of 200G
that would turn up in fstrim output, as once you reached iteration
201, "can overwrite" would allow the SSD to go back and rewrite over
the space used in iteration 1.

While the whole time, the visible disk usage in df -h would never
exceed 46G.

I don't know if this is what is happening; I don't have an SSD and
don't get to use fstrim.

But based on what you've said, the results aren't *too* surprising.

Though it's possible the hardware has some internal magic to elide
some writes, potentially making the "cp" action incur very few
writes.  That would show up in the smartctl data, but ext4 might not
know anything about it, so perhaps fstrim only indicates what ext4
*tracked* as being cleaned, while it may have incurred much less
cleanup on the hardware.

That would explain the difference between the smartctl and fstrim
results.

Maybe compare smartctl output over time with
/sys/fs/ext4/<device>/session_write_kbytes and see if one grows
faster than the other? :)

My local session_write_kbytes is currently at 709G, the partition
it's for is only 552G with 49G free, and it's been booted 33 days, so
"21G of writes a day".

And uh, lifetime_write_kbytes is about 18TB.  Yikes.
(Compiling things involves a *LOT* of ephemeral data.)

Also, probably don't assume the amount of free space on your
partition is all the physical device has at its disposal to use.  It
seems possible that on the hardware, the total pool of "free blocks"
is arbitrarily usable by the device for wear levelling, and a TRIM
command to that device could plausibly report more blocks trimmed
than your current partition size, depending on how it's implemented.

But indeed, lots of speculation here on my part :)
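A rough way to make that comparison, assuming an NVMe device at
/dev/nvme0n1 with an ext4 filesystem on nvme0n1p1 (both names are
placeholders -- substitute your own device):

  # writes seen by ext4 since this mount, in KiB
  cat /sys/fs/ext4/nvme0n1p1/session_write_kbytes

  # writes counted by the drive itself
  smartctl -a /dev/nvme0n1 | grep 'Data Units Written'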
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: tuxic @ 2020-04-27 16:20 UTC
To: gentoo-user

On 04/28 03:12, Kent Fredric wrote:
> On Sun, 26 Apr 2020 18:15:51 +0200 tuxic@posteo.de wrote:
>
> > Filesystem      Size  Used Avail Use% Mounted on
> > /dev/root       246G   45G  189G  20% /
>
> Given that (Size - Used) is roughly 200G, it suggests to me that
> perhaps some process somewhere is creating and deleting a lot of
> temporary files on this device (or maybe simply re-writing the same
> file multiple times).
>
> From userspace this would be invisible, as the "new" file would be
> in a new location on the disk, and the "old" file would be
> invisible and marked "can be overwritten".
>
> So if you did:
>
> for i in {0..200}; do
>   cp a b
>   rm a
>   mv b a
> done
>
> where "a" is a 1G file, I'd expect this to have a *ceiling* of 200G
> that would turn up in fstrim output, as once you reached iteration
> 201, "can overwrite" would allow the SSD to go back and rewrite
> over the space used in iteration 1.
>
> While the whole time, the visible disk usage in df -h would never
> exceed 46G.
>
> I don't know if this is what is happening; I don't have an SSD and
> don't get to use fstrim.
>
> But based on what you've said, the results aren't *too* surprising.
>
> Though it's possible the hardware has some internal magic to elide
> some writes, potentially making the "cp" action incur very few
> writes.  That would show up in the smartctl data, but ext4 might
> not know anything about it, so perhaps fstrim only indicates what
> ext4 *tracked* as being cleaned, while it may have incurred much
> less cleanup on the hardware.
>
> That would explain the difference between the smartctl and fstrim
> results.
>
> Maybe compare smartctl output over time with
> /sys/fs/ext4/<device>/session_write_kbytes and see if one grows
> faster than the other? :)
>
> My local session_write_kbytes is currently at 709G, the partition
> it's for is only 552G with 49G free, and it's been booted 33 days,
> so "21G of writes a day".
>
> And uh, lifetime_write_kbytes is about 18TB.  Yikes.
> (Compiling things involves a *LOT* of ephemeral data.)
>
> Also, probably don't assume the amount of free space on your
> partition is all the physical device has at its disposal to use.
> It seems possible that on the hardware, the total pool of "free
> blocks" is arbitrarily usable by the device for wear levelling, and
> a TRIM command to that device could plausibly report more blocks
> trimmed than your current partition size, depending on how it's
> implemented.
>
> But indeed, lots of speculation here on my part :)

Hi Kent,

Thank you very much for your research and your explanations! :)

Due to some statements I found online I did an interesting little
experiment:

fstrim   -> 200.2 GiB trimmed
fstrim   ->   0.0 GiB trimmed
reboot
fstrim   -> 200.2 GiB trimmed

The reboot seems to be worth the same amount of fstrimmed data as one
week of daily updates and recompilations. ;)

(By the way: this all happens on an ext4 filesystem.)

Background, according to the reports I found online:
The kernel keeps track of what has already been fstrimmed and avoids
re-trimming the same data.  This knowledge gets lost when the PC is
power-cycled or rebooted.
I think the value of the amount of fstrimmed data does not reflect
the amount of data which gets physically trimmed by the SSD
controller.  The kernel only throws the information about "possible
candidates for being trimmed" towards the SSD controller, which is
the real master behind all this.  And as you wrote: the maximum
amount of "possible data for being trimmed" is all the free space of
the filesystem.

Slightly related: do you know the purpose of these values
(smartctl -a <device>)?

Data Units Read:                    656,599 [336 GB]
Data Units Written:                 702,251 [359 GB]
Host Read Commands:                 4,316,042
Host Write Commands:                3,080,180

Are these the raw amounts of data I have sent to the SSD?
Looks like a lot...

Cheers!
Meino
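For what it's worth, smartctl derives the bracketed figures from the
NVMe specification's definition of a "data unit" as 1,000 512-byte
blocks (512,000 bytes), so those counters are the controller's own
tally of host-level reads and writes.  A quick sanity check in the
shell, using the numbers above:

  # 702,251 data units * 512,000 bytes/unit, in GB
  echo $(( 702251 * 512000 / 1000000000 ))   # prints 359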
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: Rich Freeman @ 2020-04-27 16:59 UTC
To: gentoo-user

On Mon, Apr 27, 2020 at 12:20 PM <tuxic@posteo.de> wrote:
>
> The kernel keeps track of what has already been fstrimmed and
> avoids re-trimming the same data.  This knowledge gets lost when
> the PC is power-cycled or rebooted.

I imagine this is filesystem-specific.  When I checked the ext4
source I didn't think to actually check whether those flags are
stored on disk vs in some kind of cache.  I wouldn't be surprised if
this data is also lost by simply unmounting the filesystem.

> I think the value of the amount of fstrimmed data does not reflect
> the amount of data which gets physically trimmed by the SSD
> controller.

Yup.  Though I'd take issue with the term "physically trimmed" - I
don't think that a concept like this really exists.  The only
physical operations are reading, writing, and erasing.  TRIM is
really a logical operation at its heart.

It wouldn't make sense for a TRIM to automatically trigger some kind
of erase operation all the time.  Suppose blocks 1-32 are in a single
erase group.  You send a TRIM command for block 1 only.  It makes no
sense to have the device read blocks 2-32, erase blocks 1-32, and
then write blocks 2-32 back.  That does erase block 1, but it costs a
bunch of IO, and it only replicates the worst-case scenario of what
would happen if you overwrote block 1 in place without trimming it
first.  You might argue that now block 1 can be written later without
having to do another erase, but this is only true if the drive can
remember that it was already erased - otherwise all writes have to be
preceded by reads just to see if the block is already empty.

Maybe that is how they actually do it, but it seems like it would
make more sense for a drive to look for opportunities to erase entire
blocks that don't require a read first, or to try to keep these
unused areas in some kind of extents that are less expensive to
track.  The drive already has to do a lot of mapping for the sake of
wear leveling.

Really, though, a better solution than any of this is for the
filesystem to be more SSD-aware and only perform writes on entire
erase regions at one time.  If the drive is told to write blocks 1-32
then it can just blindly erase their contents first, because it knows
everything there is getting overwritten anyway.  Likewise, a
filesystem could do its own wear-leveling, especially on something
like flash where the cost of fragmentation is not high.  I'm not sure
how well either zfs or ext4 perform in these roles.  Obviously a
solution like f2fs, designed for flash storage, is going to excel
here.

--
Rich
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: antlists @ 2020-04-27 19:07 UTC
To: gentoo-user

On 27/04/2020 17:59, Rich Freeman wrote:
> Really, though, a better solution than any of this is for the
> filesystem to be more SSD-aware and only perform writes on entire
> erase regions at one time.  If the drive is told to write blocks
> 1-32 then it can just blindly erase their contents first, because
> it knows everything there is getting overwritten anyway.  Likewise,
> a filesystem could do its own wear-leveling, especially on
> something like flash where the cost of fragmentation is not high.
> I'm not sure how well either zfs or ext4 perform in these roles.
> Obviously a solution like f2fs, designed for flash storage, is
> going to excel here.

The problem here is "how big is an erase region?".  I've heard
comments that it is several megs.  Trying to consolidate writes into
megabyte blocks is going to be tricky, to say the least, unless
you're dealing with video files or hi-res photos - I think the files
my camera chucks out are in the 10MB region ... (24MP raw ...)

Cheers,
Wol
* Re: [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :)
From: Rich Freeman @ 2020-04-27 19:17 UTC
To: gentoo-user

On Mon, Apr 27, 2020 at 3:07 PM antlists <antlists@youngman.org.uk> wrote:
>
> On 27/04/2020 17:59, Rich Freeman wrote:
> > Really, though, a better solution than any of this is for the
> > filesystem to be more SSD-aware and only perform writes on entire
> > erase regions at one time.  If the drive is told to write blocks
> > 1-32 then it can just blindly erase their contents first, because
> > it knows everything there is getting overwritten anyway.
> > Likewise, a filesystem could do its own wear-leveling, especially
> > on something like flash where the cost of fragmentation is not
> > high.  I'm not sure how well either zfs or ext4 perform in these
> > roles.  Obviously a solution like f2fs, designed for flash
> > storage, is going to excel here.
>
> The problem here is "how big is an erase region?".  I've heard
> comments that it is several megs.

I imagine most SSDs aren't that big, though SMR drives probably are
that and more.  But I agree - for anything like this to work it
really needs to be a host-managed solution ideally, or at least one
where the vendor has published specs on how to align writes/etc.

--
Rich
end of thread, other threads: [~2020-04-27 19:17 UTC | newest]

Thread overview: 14+ messages
2020-04-26 14:52 [gentoo-user] "Amount" of fstrim? (curiosity driven, no paranoia :) tuxic
2020-04-26 15:20 ` Rich Freeman
2020-04-26 16:15   ` tuxic
2020-04-26 19:29     ` Rich Freeman
2020-04-27  1:43       ` tuxic
2020-04-27  1:58         ` Rich Freeman
2020-04-27  3:14           ` tuxic
2020-04-27  8:22             ` William Kenworthy
2020-04-27 10:32       ` Alan Mackenzie
2020-04-27 15:12   ` Kent Fredric
2020-04-27 16:20     ` tuxic
2020-04-27 16:59       ` Rich Freeman
2020-04-27 19:07         ` antlists
2020-04-27 19:17           ` Rich Freeman