From: Rich Freeman
To: gentoo-user@lists.gentoo.org
Date: Sun, 1 Aug 2021 07:37:35 -0400
Subject: Re: [gentoo-user] [OT] SMR drives (WAS: cryptsetup close and device in use when it is not)

On Sat, Jul 31, 2021 at 11:05 PM William Kenworthy wrote:
>
> On 31/7/21 9:30 pm, Rich Freeman wrote:
> >
> > I'd love server-grade ARM hardware but it is just so expensive unless
> > there is some source out there I'm not aware of. It is crazy that you
> > can't get more than 4-8GiB of RAM on an affordable arm system.
>
> Checkout the odroid range. Same or only slightly $$$ more for a much
> better unit than a pi (except for the availability of 8G ram on the pi4)

Oh, they have been on my short list.
I was opining about the lack of cheap hardware with >8GB of RAM, and I
don't believe ODROID offers anything like that.  I'd be happy if they
just took DDR4 on top of whatever onboard RAM they had.

My SBCs for the lizardfs cluster are either Pi4s or RockPro64s.  The
Pi4 addresses basically all the issues in the original Pis as far as
I'm aware, is comparable to most of the ODroid stuff I believe (at
least for the stuff I need), and they're still really cheap.  The
RockPro64 was a bit more expensive but also performs nicely - I bought
that to try playing around with LSI HBAs to get many SATA drives on
one SBC.

I'm mainly storing media, so capacity matters more than speed.  At the
time most existing SBCs either didn't have SATA or had something like
1-2 ports, and that means you end up with a lot of hosts.  Sure, it
would perform better, but it costs more.  Granted, at the start I
didn't want more than 1-2 drives per host anyway until I got up to
maybe 5 or so hosts, just because that is where you see the cluster
perform well and have decent safety margins, but at this point if I
add capacity it will be to existing hosts.

> Tried ceph - run away fast :)

Yeah, it is complex, and most of the tools for managing it created
concerns that if something went wrong they could really mess the whole
thing up fast.  The thing that pushed me away from it was reports that
it doesn't perform well with only a few OSDs, and I wanted something I
could pilot without buying a lot of hardware.

Another issue is that, at least at the time I was looking into it,
they wanted OSDs to have 1GB of RAM per 1TB of storage.  That is a LOT
of RAM.  Aside from the fact that RAM is expensive, it basically
eliminates the ability to use cheap low-power SBCs for all the OSDs,
which is what I'm doing with lizardfs.  I don't care about the SBCs
being on 24x7 when they pull a few watts each at peak, and almost
nothing when idle.  If I want to attach even 4x14TB hard drives to an
SBC, though, it would need 64GB of RAM per the standards of Ceph at
the time.  Good luck finding a cheap low-power ARM board that has 64GB
of RAM - anything that even had DIMM slots was something crazy like
$1k at the time, and at that point I might as well build full PCs.

It seems like they've backed off on the memory requirements, maybe,
but I'd want to check on that.  I've seen stories of bad things
happening when the OSDs don't have much RAM and you run into a
scenario like:

1. Lose a disk, cluster starts to rebuild.
2. Lose another disk, cluster queues another rebuild.
3. Oh, the first disk comes back, cluster queues another rebuild to
   restore the first disk.
4. Replace the second failed disk, cluster queues another rebuild.

Apparently, at least in the old days, all the OSDs had to keep track
of all of that, and they'd run out of RAM and basically melt down
unless you went around adding more RAM to every OSD.

With LizardFS the OSDs basically do nothing at all but pipe stuff to
disk.  If you want to use full-disk encryption then there is a CPU hit
for that, but that is all outside of LizardFS, and dm-crypt at least
is reasonable.  (zfs, on the other hand, does not hardware-accelerate
it on SBCs as far as I can tell, and that hurts.)
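To put that old 1GB-per-1TB rule of thumb in concrete numbers, here is
a rough back-of-the-envelope sketch (illustrative only - the per-TB
figure is the old guidance mentioned above, not necessarily what Ceph
recommends today, and the host layout is just the 4x14TB example):

    # Rough per-host OSD RAM estimate under the old ~1 GB RAM per 1 TB
    # of storage rule of thumb discussed above (illustrative only).
    drives_per_host = 4      # e.g. 4 drives hanging off one SBC
    drive_size_tb = 14       # 14 TB each
    ram_per_tb_gb = 1        # old Ceph rule of thumb, not a current figure

    ram_needed_gb = drives_per_host * drive_size_tb * ram_per_tb_gb
    print(f"~{ram_needed_gb} GB of RAM per host")  # ~56 GB, i.e. a 64 GB board in practice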
> I improved performance and master memory
> requirements considerably by pushing the larger data sets (e.g., Gib of
> mail files) into a container file stored on MFS and loop mounted onto
> the mailserver lxc instance. Convoluted but very happy with the
> improvement its made.

Yeah, I've noticed, as you described in the other email, that memory
use depends on the number of files, and it depends on having it all in
RAM at once.  I'm using it mostly for media storage, so the file count
is modest.  I do use snapshots, but only a few at a time, so it can
handle that.

While the master is running on amd64 with plenty of RAM, I do have
shadow masters set up on SBCs, and I do want to be able to switch over
to one if something goes wrong, so I want RAM use to be acceptable.
It really doesn't matter how much space the files take up - just how
many inodes you have.

--
Rich