From: Rich Freeman <rich0@gentoo.org>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Hard drive storage questions
Date: Fri, 9 Nov 2018 08:25:12 -0500
Message-ID: <CAGfcS_mO8X+ao4apOi5de1jm70L=OC9gumiV+v6FZ+nwPZVDYQ@mail.gmail.com>
In-Reply-To: <f0fefde7-a601-2b8d-be5f-9f9a953ba801@iinet.net.au>
On Fri, Nov 9, 2018 at 3:17 AM Bill Kenworthy <billk@iinet.net.au> wrote:
>
> I'll second your comments on ceph after my experience - great idea for
> large scale systems, otherwise performance is quite poor on small
> systems. Needs at least GB connections with two networks as well as only
> one or two drives per host to work properly.
>
> I think I'll give lizardfs a go - an interesting read.
>
So, ANY distributed/NAS solution is going to want a good network
(gigabit or better) if you care about performance. With Ceph and its
rebuild traffic it probably makes an even bigger difference, but
lizardfs still shuttles data around. With replication, any write is
multiplied, so even moderate use is going to eat a lot of network
bandwidth. If you're talking about hosting OS images for VMs it is a
big deal. If you're talking about hosting TV shows for your Myth
server or whatever, it probably isn't as big a deal unless you have 14
tuners and 12 clients.
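To put a rough number on that (back-of-envelope only, assuming 3
copies and ignoring rebuild/rebalance traffic):

    100 MB/s of client writes x 3 replicas ~= 300 MB/s on the wire
    300 MB/s ~= 2.4 Gbit/s, i.e. more than two gigabit links' worth

so even one busy writer can keep a gigabit storage network pretty
well occupied.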
Lizardfs isn't without its issues. For my purposes it is fine, but it
is NOT as robust as Ceph. Finding direct comparisons online is
difficult, but here are some of my observations (based more on reading
up on both than on serious production use of either):
* Ceph (especially for object storage) is designed to avoid
bottlenecks. Lizardfs has a single master server that ALL metadata
requests have to go through. Once you get into dozens of nodes that
will start to become a bottleneck, but it also eliminates some of the
rigidity of Ceph, since clients don't have to know where all the data
is. I imagine it adds a bit of latency to reads.
* Lizardfs defaults to acking writes after the first node receives
them, then replicates them. Ceph defaults to acking after all
replicas are made. For any application that takes transactions
seriously there is a HUGE data security difference, but it of course
will lower write latency for lizardfs.
* Lizardfs makes it a lot easier to tweak storage policy at the
directory/file level (see the goal example after this list). Cephfs
basically does this more at the mountpoint level.
* Ceph CRUSH maps are much more configurable than Lizardfs goals.
With Ceph you could easily say that you want 2 copies, and that they
have to be on hard drives from different vendors, in different
datacenters (I'll put a rough sketch of such a rule after this list).
With Lizardfs combining tags like this is less convenient, and while
you can say that you want one copy in rack A and one in rack B, you
can't say that you don't care which two racks they land in as long as
they are different.
* The lizardfs high-availability stuff (the equivalent of Ceph
monitors) only recently went FOSS, and probably isn't stabilized on
most distros yet. You can have backup masters that are ready to go,
but you need your own solution for promoting them.
* Lizardfs security seems to be non-existent. Don't stick it on your
intranet if you are a business. Fine for home, or for a segregated
SAN, maybe, or you could stick it all behind some kind of VPN and roll
your own security layer. Ceph security seems pretty robust, but
watching what the ansible playbook did to set it up makes me shudder
at the thought of doing it myself. Lots of keys that all need to be
in sync so that everything can talk to everything else. I'm not sure
whether, for clients, it can outsource authentication to kerberos/etc -
not a need for me, but I wouldn't be surprised if it is supported.
The key syncing makes a lot more sense within the cluster itself.
* Lizardfs is MUCH simpler to set up. For Ceph I recommend the
ansible playbook, though if I were using it in production I'd want to
do some serious config management, as it's rather complex and seems
like the sort of thing that could take out half a datacenter if it had
a bug. For Lizardfs, if you're willing to use the suggested hostnames,
about 95% of it is auto-configuring: storage nodes just reach out to
the default master DNS name and report in, and everything trusts
everything (not just by default - I don't think you can even lock it
down, unless you stick every node behind a VPN to limit who can talk
to whom).
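
To make a couple of the points above concrete, here are two sketches.
I'm writing the syntax from memory and the docs rather than pasting
from a running cluster, so double-check it before relying on it.

A Ceph CRUSH rule for the datacenter half of the example above (2
copies, each in a different datacenter), in the decompiled crushmap
text format (what crushtool -d gives you):

    rule rep_across_dcs {
        id 1
        type replicated
        min_size 2
        max_size 2
        # pick two distinct datacenter buckets, then one host (and an
        # OSD under it) within each
        step take default
        step choose firstn 2 type datacenter
        step chooseleaf firstn 1 type host
        step emit
    }

and then a pool gets pointed at it with something like
"ceph osd pool set mypool crush_rule rep_across_dcs".

The lizardfs rack A / rack B case would be a named goal in
mfsgoals.cfg on the master (chunkservers get their labels from a
LABEL = ... line in mfschunkserver.cfg, if I remember the option name
right):

    # one copy on a chunkserver labelled rackA, one labelled rackB
    15 rack_ab : rackA rackB

which you then apply per directory or file from a client mount:

    lizardfs setgoal -r rack_ab /mnt/lizard/important

Note that this is exactly the rigid form I was complaining about - the
goal names specific labels, and there's no "any two different racks"
wildcard that I'm aware of.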
--
Rich