From: Rich Freeman
Date: Fri, 26 Aug 2022 08:07:20 -0400
Subject: Re: [gentoo-user] Getting maximum space out of a hard drive
To: gentoo-user@lists.gentoo.org

On Fri, Aug 26, 2022 at 7:26 AM Dale wrote:
>
> I looked into the Raspberry and the newest version, about $150 now,
> doesn't even have SATA ports.

The Pi4 is definitely a step up from the previous versions in terms of IO, but it is still pretty limited. It has USB3 and gigabit Ethernet, and they don't share a USB host or anything like that, so you should get close to full performance out of both. The CPU is of course pretty limited, as is RAM.
The biggest benefit is the super-low power consumption, and that is something I take seriously: for a lot of cheap hardware that runs 24x7, the power cost rapidly exceeds the purchase price. I see people buying old servers for $100 or so, and those things will often go through $100 worth of electricity in a few months.

How many hard drives are you talking about? There are two general routes to go for something like this.

The simplest and most traditional way is a NAS box of some kind, with RAID. The issue with that approach is that you're limited by the number of hard drives you can run off of one host, and of course if anything other than a drive fails you're offline.

The other approach is a distributed filesystem. That ramps up the learning curve quite a bit, but for something like media, where IOPS doesn't matter, it eliminates the need to cram a dozen hard drives into one host. Ceph can also do IOPS, but then you're talking 10GbE + NVMe and big bucks, and that is how modern server farms would do it.

I'll describe the traditional route since I suspect that is where you're going to end up. If you only had 2-4 drives total you could probably get away with a Pi4 and USB3 drives, but if you want encryption or anything else CPU-intensive you're probably going to bottleneck on the CPU. It would be fine if you're more concerned with capacity than speed.

For more drives than that, or just to be more robust, any standard amd64 build will be fine. Obviously a motherboard with lots of SATA ports helps here, but port count is almost always a bottleneck on consumer gear, and the typical solution for SATA is a host bus adapter (HBA). They're expensive new, but cheap on eBay (I've had them fail, though, which is probably why companies tend to sell them while they're still working). They also use a ton of power - I've measured them drawing upwards of 60W - since they're designed for servers where nobody seems to care.

A typical HBA can provide 8-32 SATA ports via mini-SAS breakout cables (one mini-SAS port provides 4 SATA ports). HBAs also tend to use a lot of PCIe lanes. You don't necessarily need all of them if you only have a few drives and they're spinning disks, but it is probably easiest to get a CPU with integrated graphics and use the 16x slot for the HBA. That, or get a motherboard with two large slots (the second slot usually isn't 16x, and even 4-8x slots aren't super-common on consumer motherboards).

For software I'd use mdadm plus LVM. ZFS and btrfs are your other options, and those can run on the bare drives, but btrfs is immature and ZFS cannot be reshaped the way mdadm can, so there are tradeoffs.

If you want to reuse your existing drives and don't have a backup to restore, or want to do it live, then the easiest option is to add one new drive to the system to expand capacity. Put mdadm on that drive as a degraded raid1 or whatever, then put LVM on top, and migrate data from an existing disk live over to the new one, freeing up one or more existing drives. Then put mdadm and LVM on those, migrate more data onto them, and so on, until everything is running on top of mdadm. Of course, you need to plan how you want the array to look and have enough drives to get the desired level of redundancy. You can start with degraded arrays (which is no worse than what you have now), and when enough drives are freed up they can be added as pairs to fill things out.
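To make that last step concrete, here is a rough sketch of the commands involved. It is just an illustration, not a tested recipe: the device names (/dev/sdX for the new drive, /dev/sdY for a freed-up old drive, /dev/sdZ for its eventual mirror) and the volume group name vg0 are placeholders you'd replace with your own.

  # create a degraded 2-disk raid1 containing only the new drive
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX missing

  # put LVM on top and make a filesystem
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -l 100%FREE -n media vg0
  mkfs.ext4 /dev/vg0/media

  # copy data off one of the existing drives (rsync or whatever),
  # then wipe it and add it to the pool as another degraded raid1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdY missing
  pvcreate /dev/md1
  vgextend vg0 /dev/md1
  # (grow the LV and filesystem with lvextend + resize2fs as space is added)

  # once a drive is freed up entirely, add it to complete a mirror pair
  mdadm --add /dev/md0 /dev/sdZ

If the old drives were already LVM physical volumes you could skip the copy and use pvmove to shuffle extents onto the new array live; otherwise a plain rsync while the box is up works fine for media.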
If you want to go the distributed storage route, CephFS is the canonical solution at this point, but it is RAM-hungry, so it tends to be expensive. It is also complex, though there are Ansible playbooks and so on to manage that (playbooks with 100+ plays in them make me nervous, though).

For something simpler, MooseFS or LizardFS are probably where I'd start. I'm running LizardFS, but it has been on the edge of death upstream for years, and MooseFS licensing is apparently better now, so I'd probably look at that first. I did a talk on LizardFS recently: https://www.youtube.com/watch?v=dbMRcVrdsQs

--
Rich