From mboxrd@z Thu Jan 1 00:00:00 1970
From: William Kenworthy
To: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] SMR drives (WAS: cryptsetup close and device in use when it is not)
Date: Sun, 1 Aug 2021 11:05:29 +0800
Message-ID: <6ca83a12-24b7-57bd-2dd9-1b1d46209d69@iinet.net.au>
References: <9946c2eb-bb5c-a9c0-ced9-1ac269cd69a0@gmail.com> <6ecbf2d6-2c6f-3f66-5eee-f4766d5e5254@gmail.com> <24805.48814.331408.860941@tux.local> <5483630c-3cd1-bca2-0a6d-62bb85a5adc6@gmail.com> <96fc901a-2ce4-0ea0-0ed1-1c529145c0e9@gmail.com> <6102DB58.7040103@youngman.org.uk> <56d64f52-1b9a-1309-c720-06bb63c9f80a@iinet.net.au> <7a8c52c3-4c96-89ac-ace0-6eb4b8f1401f@iinet.net.au> <6104C897.5010505@youngman.org.uk>
List-Id: Gentoo Linux mail

On 31/7/21 9:30 pm, Rich Freeman wrote:
> On Sat, Jul 31, 2021 at 8:59 AM William Kenworthy wrote:
>> I tried using moosefs with a rpi3B in the mix and it didn't go well
>> once I started adding data - rpi 4's were not available when I set it up.
>
> Pi2/3s only have USB2 as far as I'm aware, and they stick the ethernet
> port on that USB bus besides. So, they're terrible for anything that
> involves IO of any kind.
>
> The Pi4 moves the ethernet off of USB, upgrades it to gigabit, and has
> two USB3 hosts, so this is just all-around a massive improvement.
> Obviously it isn't going to outclass some server-grade system with a
> gazillion PCIe v4 lanes but it is very good for an SBC and the price.
>
> I'd love server-grade ARM hardware but it is just so expensive unless
> there is some source out there I'm not aware of. It is crazy that you
> can't get more than 4-8GiB of RAM on an affordable arm system.

Check out the odroid range.  Same or only slightly more $$$ for a much
better unit than a pi (except for the availability of 8G ram on the
pi4).  None of the pi's I have had have come close, though I do not have
a pi4, which from what I have read looks to be much closer in
performance.  The Odroid site includes comparison charts of odroid
against the rpi, and they also show the rpi getting closer in
performance.  There are a few other companies out there too.  I am
hoping the popularity of the pi 8G will push others to match it.

I found the supplied 4.9 or 4.14 kernels problematic with random
crashes, especially if usb was involved.  I am currently using the 5.12
tobetter kernels and aarch64 or arm32 bit gentoo userlands.

>> I think that SMR disks will work quite well on moosefs or lizardfs - I
>> don't see long continuous writes to one disk but a random distribution
>> of writes across the cluster with gaps between on each disk (1G network).
>
> So, the distributed filesystems divide all IO (including writes)
> across all the drives in the cluster.  When you have a number of
> drives that obviously increases the total amount of IO you can handle
> before the SMR drives start hitting the wall.  Writing 25GB of data to
> a single SMR drive will probably overrun its CMR cache, but if you
> split it across 10 drives and write 2.5GB each, there is a decent
> chance they'll all have room in the cache, take the write quickly, and
> then as long as your writes aren't sustained they can clear the buffer.

Not strictly what I am seeing.  You request a file from MFS and the
first free chunkserver with the data replies.  Writing is similar in
that (depending on the creation arguments) a chunk is written wherever
responds fastest and then replicated.  Replication is done under the
control of an algorithm that replicates a set number of chunks at a time
between a limited number of chunkservers in a stream, depending on
replication status.  So I am seeing individual disk activity that is
busy for a few seconds, and then nothing for a short period - this
pattern has become more pronounced as I added chunkservers and would
seem to match the SMR requirements.  If I replace/rebuild (resilver) a
chunkserver, that one is a lot busier, but still not 100% continuous
write or read.  Moosefs uses a throttled replication methodology.  This
is with 7 chunkservers and 9 disks - more is definitely better for
performance.
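As a rough illustration of the arithmetic in Rich's example and the
bursty pattern above, here is a minimal Python sketch.  The replication
goal, per-drive CMR cache size and burst size are assumptions for
illustration, not measurements from this cluster:

# Rough sketch only: GOAL, CMR_CACHE_GIB and the 25 GiB burst are
# assumed numbers, not measured on this cluster.
GOAL = 2             # copies kept of each chunk (assumed moosefs goal)
CMR_CACHE_GIB = 20   # notional CMR/media cache on one DM-SMR drive

def per_disk_share_gib(burst_gib: float, n_disks: int) -> float:
    """GiB that lands on each disk if chunks spread evenly across disks."""
    return burst_gib * GOAL / n_disks   # every chunk is written GOAL times

for disks in (2, 5, 10):
    share = per_disk_share_gib(25, disks)   # Rich's 25 GiB example burst
    verdict = "fits" if share <= CMR_CACHE_GIB else "overruns"
    print(f"{disks:2d} disks: ~{share:.1f} GiB per disk ({verdict} the CMR cache)")

With enough chunkservers each disk only sees a share it can absorb into
its cache and then destage during the idle gaps, which matches the
busy-for-a-few-seconds-then-idle pattern described above.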
> I think you're still going to have an issue in a rebalancing scenario
> unless you're adding many drives at once so that the network becomes
> rate-limiting instead of the disks.  Having unreplicated data sitting
> around for days or weeks due to slow replication performance is
> setting yourself up for multiple failures.  So, I'd still stay away
> from them.

I think at some point I am going to have to add an SMR disk and see what
happens - can't do it now though.

> If you have 10GbE then your ability to overrun those disks goes way
> up.  Ditto if you're running something like Ceph which can achieve
> higher performance.  I'm just doing bulk storage where I care a lot
> more about capacity than performance.  If I were trying to run a k8s
> cluster or something I'd be on Ceph on SSD or whatever.

Tried ceph - run away fast :)  I have a lot of nearly static data but
also a number of lxc instances (running on an Odroid N2) with both the
LXC instance and data stored on the cluster.  These include email,
calendaring, dns, webservers etc., and all work well.  The online
borgbackup repos are also stored on it.

The limitations of community moosefs are the single point of failure
that is the master, plus the memory resource requirements on the master.
I improved performance and master memory requirements considerably by
pushing the larger data sets (e.g., GiB of mail files) into a container
file stored on MFS and loop mounted onto the mailserver lxc instance.
Convoluted, but I am very happy with the improvement it's made.
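For what it's worth, a minimal sketch of that container-file trick.  The
paths, the 50G size and ext4 are made-up examples; it assumes the
MooseFS mount is at /mnt/mfs and that it runs as root on the host:

# Sketch only: paths, size and filesystem are illustrative assumptions.
import os
import subprocess

IMG = "/mnt/mfs/containers/mailstore.img"  # hypothetical container file on the MFS mount
MNT = "/srv/mail-data"                     # hypothetical mount point handed to the lxc instance

def run(*cmd):
    # echo the command, then run it, stopping on any failure
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

os.makedirs(os.path.dirname(IMG), exist_ok=True)
os.makedirs(MNT, exist_ok=True)

run("truncate", "-s", "50G", IMG)     # one big sparse file on MFS
run("mkfs.ext4", "-F", "-q", IMG)     # -F: allow mkfs on a regular file without prompting
run("mount", "-o", "loop", IMG, MNT)  # kernel loop device over the MFS-backed file

The gain presumably comes from the master only having to keep metadata
for one large file in RAM instead of for every individual mail file.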
>> With a good adaptor, USB3 is great ... otherwise it's been quite
>> frustrating :(  I do suspect linux and its pedantic correctness trying
>> to deal with hardware that isn't truly standardised (as in the
>> manufacturer probably supplies a windows driver that covers it up) is
>> part of the problem.  These adaptors are quite common and I needed to
>> apply the ATA command filter and turn off UAS using the usb tweaks
>> mechanism to stop the crashes and data corruption.  The comments in
>> the kernel driver code for these adaptors are illuminating!
>
> Sometimes I wonder.  I occasionally get errors in dmesg about
> unaligned writes when using zfs.  Others have seen these.  The zfs
> developers seem convinced that the issue isn't with zfs but that it is
> simply reporting the issue, or maybe it happens under loads that you're
> more likely to get with zfs scrubbing (which IMO performs far worse
> than with btrfs - I'm guessing it isn't optimized to scan physically
> sequentially on each disk but may be doing it in a more logical order
> and synchronously between mirror pairs).  Sometimes I wonder if there
> is just some sort of bug in the HBA drivers, or maybe the hardware on
> the motherboard.  Consumer PC hardware (like all PC hardware) is
> basically a black box unless you have pretty sophisticated testing
> equipment and knowledge, so if your SATA host is messing things up how
> would you know?
>
BillK