From: Rich Freeman
To: gentoo-user@lists.gentoo.org
Date: Sun, 1 Aug 2021 07:37:35 -0400
Subject: Re: [gentoo-user] [OT] SMR drives (WAS: cryptsetup close and device in use when it is not)

On Sat, Jul 31, 2021 at 11:05 PM William Kenworthy wrote:
>
> On 31/7/21 9:30 pm, Rich Freeman wrote:
> >
> > I'd love server-grade ARM hardware but it is just so expensive unless
> > there is some source out there I'm not aware of. It is crazy that you
> > can't get more than 4-8GiB of RAM on an affordable arm system.
>
> Checkout the odroid range. Same or only slightly $$$ more for a much
> better unit than a pi (except for the availability of 8G ram on the pi4)

Oh, they have been on my short list.
I was opining about the lack of cheap hardware with >8GB of RAM, and I
don't believe ODROID offers anything like that.  I'd be happy if they
just took DDR4 on top of whatever onboard RAM they had.

My SBCs for the lizardfs cluster are either Pi4s or RockPro64s.  The
Pi4 addresses basically all the issues in the original Pis as far as
I'm aware, is comparable to most of the ODroid stuff I believe (at
least for the stuff I need), and they're still really cheap.  The
RockPro64 was a bit more expensive but also performs nicely - I bought
that to try playing around with LSI HBAs to get many SATA drives on
one SBC.

I'm mainly storing media, so capacity matters more than speed.  At the
time most existing SBCs either didn't have SATA or had something like
1-2 ports, and that means you end up with a lot of hosts.  Sure, it
would perform better, but it costs more.  Granted, at the start I
didn't want more than 1-2 drives per host anyway until I got up to
maybe 5 or so hosts, just because that is where you see the cluster
perform well and have decent safety margins, but at this point if I
add capacity it will be to existing hosts.

> Tried ceph - run away fast :)

Yeah, it is complex, and most of the tools for managing it created
concerns that if something went wrong they could really mess the whole
thing up fast.  The thing that pushed me away from it was reports that
it doesn't perform well with only a few OSDs, and I wanted something I
could pilot without buying a lot of hardware.

Another issue is that, at least at the time I was looking into it,
they wanted OSDs to have 1GB of RAM per 1TB of storage.  That is a LOT
of RAM.  Aside from the fact that RAM is expensive, it basically
eliminates the ability to use cheap low-power SBCs for all the OSDs,
which is what I'm doing with lizardfs.  I don't care about the SBCs
being on 24x7 when they pull a few watts each at peak, and almost
nothing when idle.  If I want to attach even 4x14TB hard drives to an
SBC, though, it would need 64GB of RAM per the standards of Ceph at
the time.  Good luck finding a cheap low-power ARM board that has 64GB
of RAM - anything that even had DIMM slots was something crazy like
$1k at the time, and at that point I might as well build full PCs.

It seems like they've backed off on the memory requirements, maybe,
but I'd want to check on that.  I've seen stories of bad things
happening when the OSDs don't have much RAM and you run into a
scenario like:

1. Lose a disk, cluster starts to rebuild.
2. Lose another disk, cluster queues another rebuild.
3. Oh, the first disk comes back, cluster queues another rebuild to
   restore the first disk.
4. Replace the second failed disk, cluster queues another rebuild.

Apparently, at least in the old days, all the OSDs had to keep track
of all of that, and they'd run out of RAM and basically melt down
unless you went around adding more RAM to every OSD.

With LizardFS the OSDs basically do nothing at all but pipe stuff to
disk.  If you want to use full-disk encryption then there is a CPU hit
for that, but that is all outside of LizardFS, and dm-crypt at least
is reasonable.  (zfs, on the other hand, does not hardware-accelerate
it on SBCs as far as I can tell, and that hurts.)
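To put that old 1GB-per-1TB rule of thumb in concrete numbers, here is
a rough back-of-the-envelope sketch (illustrative only - the per-TB
figure is the old guidance mentioned above, not necessarily what Ceph
recommends today, and the host layout is just the 4x14TB example):

    # Rough per-host OSD RAM estimate under the old ~1 GB RAM per 1 TB
    # of storage rule of thumb discussed above (illustrative only).
    drives_per_host = 4      # e.g. 4 drives hanging off one SBC
    drive_size_tb = 14       # 14 TB each
    ram_per_tb_gb = 1        # old Ceph rule of thumb, not a current figure

    ram_needed_gb = drives_per_host * drive_size_tb * ram_per_tb_gb
    print(f"~{ram_needed_gb} GB of RAM per host")  # ~56 GB, i.e. a 64 GB board in practice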
> I improved performance and master memory
> requirements considerably by pushing the larger data sets (e.g., Gib of
> mail files) into a container file stored on MFS and loop mounted onto
> the mailserver lxc instance. Convoluted but very happy with the
> improvement its made.

Yeah, I've noticed, as you described in the other email, that memory
use depends on the number of files, and it depends on having it all in
RAM at once.  I'm using it mostly for media storage, so the file count
is modest.  I do use snapshots, but only a few at a time, so it can
handle that.

While the master is running on amd64 with plenty of RAM, I do have
shadow masters set up on SBCs, and I do want to be able to switch over
to one if something goes wrong, so I want RAM use to be acceptable.
It really doesn't matter how much space the files take up - just how
many inodes you have.

--
Rich