From: Rich Freeman
Date: Fri, 9 Nov 2018 08:25:12 -0500
Subject: Re: [gentoo-user] Hard drive storage questions
To: gentoo-user@lists.gentoo.org

On Fri, Nov 9, 2018 at 3:17 AM Bill Kenworthy wrote:
>
> I'll second your comments on ceph after my experience - great idea for
> large scale systems, otherwise performance is quite poor on small
> systems. Needs at least GB connections with two networks as well as
> only one or two drives per host to work properly.
>
> I think I'll give lizardfs a go - an interesting read.
>

So, ANY distributed/NAS solution is going to want a good network
(gigabit or better) if you care about performance. With Ceph and its
rebuild traffic it probably makes an even bigger difference, but
lizardfs still shuttles data around. With replication, any kind of
write is multiplied, so even moderate use is going to eat a lot of
network bandwidth (rough numbers below). If you're talking about
hosting OS images for VMs, it is a big deal. If you're talking about
hosting TV shows for your Myth server or whatever, it probably isn't
as big a deal unless you have 14 tuners and 12 clients.

Lizardfs isn't without its issues. For my purposes it is fine, but it
is NOT as robust as Ceph.
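To put a rough number on that write multiplication, here is a
back-of-the-envelope sketch in Python. It is not tied to how either
Ceph or lizardfs actually moves data (client fan-out vs chain
replication differ in the details), and the 20 MB/s figure is just an
example I picked; it only totals the bytes that have to cross the
storage network:

# Rough arithmetic only: total network traffic generated by replicated
# writes, assuming every byte written crosses the network about once
# per replica.
def wire_mbit_per_s(write_mb_per_s, replicas):
    return write_mb_per_s * replicas * 8  # MB/s -> Mbit/s

# Example: VM images doing 20 MB/s of writes with 3 copies is already
# ~480 Mbit/s on the wire - half a gigabit link gone before any read
# or rebuild traffic.
print(wire_mbit_per_s(20, 3))  # 480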
Finding direct comparisons online is difficult, but here are some of
my observations (having not actually used either, but having read up
on both):

* Ceph (especially for object storage) is designed to avoid
bottlenecks. Lizardfs has a single master server that ALL metadata
requests have to go through. When you start getting into dozens of
nodes, that master will start to become a bottleneck, but it also
removes some of Ceph's rigidity, since clients don't have to know
where all the data is. I imagine it adds a bit of latency to reads.

* Lizardfs defaults to acking writes after the first node receives
them and then replicates them; Ceph defaults to acking only after all
replicas are made. For any application that takes transactions
seriously there is a HUGE data-security difference, but it of course
lowers write latency for lizardfs.

* Lizardfs makes it a lot easier to tweak storage policy at the
directory/file level. Cephfs basically does this more at the
mountpoint level.

* Ceph CRUSH maps are much more configurable than Lizardfs goals.
With Ceph you could easily say that you want 2 copies, and that they
have to be on hard drives from different vendors and in different
datacenters (a toy sketch of that kind of rule is at the end of this
mail). With Lizardfs, combining tags like this is less convenient:
you could say that you want one copy in rack A and one in rack B, but
you can't say that you want any two racks as long as they are
different.

* The lizardfs high-availability pieces (the equivalent of Ceph
monitors) only recently went FOSS and probably aren't stable on most
distros yet. You can have backup masters ready to go, but you need
your own solution for promoting them.

* Lizardfs security seems to be non-existent. Don't stick it on your
intranet if you are a business. It is fine for home, or maybe for a
segregated SAN, or you could stick it all behind some kind of VPN and
roll your own security layer. Ceph security seems pretty robust, but
watching what the ansible playbook did to set it up makes me shudder
at the thought of doing it myself: lots of keys that all need to be
in sync so that everything can talk to each other. I'm not sure
whether clients can outsource authentication to kerberos/etc - not a
need for me, but I wouldn't be surprised if it is supported. The key
syncing makes a lot more sense within the cluster itself.

* Lizardfs is MUCH simpler to set up. For Ceph I recommend the
ansible playbook, though if I were using it in production I'd want to
do some serious config management, as it seems rather complex and
like the sort of thing that could take out half a datacenter if it
had a bug. For Lizardfs, if you're willing to use the suggested
hostnames, about 95% of it is auto-configuring: storage nodes just
reach out to the default master DNS name and report in, and
everything trusts everything (not just by default - I don't think you
can even lock it down, short of sticking every node behind a VPN to
limit who can talk to whom).
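To make the CRUSH comparison a bit more concrete, here is a toy
placement check in Python. It is not real CRUSH syntax and not any
lizardfs API - the disk inventory and attribute names are made up -
it just shows the shape of a rule like "2 copies, different vendors,
different datacenters":

from itertools import combinations

# Hypothetical inventory: attributes a placement rule could constrain on.
disks = [
    {"id": "osd.0", "vendor": "seagate", "dc": "dc-a"},
    {"id": "osd.1", "vendor": "wd",      "dc": "dc-a"},
    {"id": "osd.2", "vendor": "seagate", "dc": "dc-b"},
    {"id": "osd.3", "vendor": "wd",      "dc": "dc-b"},
]

def valid_pairs(disks):
    """Yield every pair of disks with a different vendor AND a different DC."""
    for a, b in combinations(disks, 2):
        if a["vendor"] != b["vendor"] and a["dc"] != b["dc"]:
            yield a["id"], b["id"]

print(list(valid_pairs(disks)))  # [('osd.0', 'osd.3'), ('osd.1', 'osd.2')]

A label-based lizardfs goal can pin one copy to "rackA" and one to
"rackB", but it can't express the "any two, as long as they differ"
form of that rule.

--
Rich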