public inbox for gentoo-user@lists.gentoo.org
From: "J. Roeleveld" <joost@antarean.org>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Re: File system testing
Date: Wed, 17 Sep 2014 21:34:06 +0200
Message-ID: <15339117.pAj2kdbPAt@andromeda>
In-Reply-To: <loom.20140917T172559-530@post.gmane.org>

On Wednesday, September 17, 2014 03:55:56 PM James wrote:
> J. Roeleveld <joost <at> antarean.org> writes:
> > > Distributed File Systems (DFS):
> > 
> > > Local (Device) File Systems LFS:
> > Is my understanding correct that the top list all require one of
> > the bottom  list?
> > Eg. the "clustering" FSs only ensure the files on the LFSs are
> > duplicated/spread over the various nodes?
> > 
> > I would normally expect the clustering FS to be either the full layer
> > or a  clustered block-device where an FS can be placed on top.
> 
> I have not performed these installations yet. My research indicates
> that first you put the Local FS on the drive, just like any installation
> of Linux. Then you put the distributed FS on top of this. Some DFS might
> not require a LFS, but FhGFS does and so does HDFS. I will not actually
> be able to accurately answer your questions until I start to build
> up the 3 system cluster. (A week or 2 away is my best guess.)

Playing around with clusters is on my list, but due to other activities having 
a higher priority, I haven't had much time yet.

> > Otherwise it seems more like a network filesystem with caching
> > options (See  AFS).
> 
> OK, I'll add AFS. You may be correct on this one  or AFS might be both.

Personally, I would read up on these and see how they work. Then, based 
on that, decide if they are likely to assist in the specific situation you are 
interested in.
AFS, NFS, CIFS,... can be used for clusters, but, apart from NFS, I wouldn't 
expect much performance out of them.
If you need it to be fault-tolerant and not overly reliant on a single point of 
failure, I wouldn't be using any of these. Only AFS, from my original 
investigation, showed some fault-tolerance, but needed too many 
resources (disk-space) on the clients.

> > I am also interested in these filesystems, but for a slightly different
> 
> > scenario:
> OK, so I am the "test-dummy-crash-victim". I'd be honored to have you,
> Alan, Neil, Mic etc. back-seat-drive on this adventure! (The more
> I read, the more it's time for bourbon, bash, and a bit of cursing
> to get started...)

Good luck and even though I'd love to join in with the testing, I simply do 
not have the time to keep up. I would probably just slow you down.

> > - 2 servers in remote locations (different offices)
> > - 1 of these has all the files stored (server A) at the main office
> > - The other (server B - remote office) needs to "offer" all files
> > from serverA  When server B needs to supply a file, it needs to
> > check if the local copy is still the "valid" version.
> > If yes, supply the local copy, otherwise download
> > from server A. When a file is changed, server A needs to be updated.
> > While server B is sharing a file, the file needs to be locked on server A
> > preventing simultaneous updates.
> 
> Ouch, file locking (precious tells me that is always tricky).

I need the file to be locked on server A while server B holds a proper 
write-lock, so that 2 modifications cannot compete with each other.

> (psst, systemd is causing fits for the clustering geniuses;
> some are espousing a variety of cgroup gymnastics for phantom kills)

phantom kills?

> Spark is fault tolerant, regardless of node/memory/drive failures,
> above the fault tolerance that a file system configuration may support.
> In fact, files lost can be 'regenerated' but it is computationally
> expensive.

Too much for me.

> You have to get your file system(s) set up. Then install
> mesos-0.20.0 and then spark. I have mesos mostly ready. I should
> have spark in alpha-beta this weekend. I'm fairly clueless on the
> DFS/LFS issue, so a DFS that needs no LFS might be a good first choice
> for testing the (3) system cluster.

That, or a 4th node acting like a NAS sharing the filesystem over NFS.

> > I prefer not to supply the same amount of storage at server B as
> > server A has. The remote location generally only needs access to 5% of
> > the total amount of files stored on server A. But not always the same 5%.
> > Does anyone know of a filesystem that can handle this?
> 
> So in clustering, from what I have read, there are all kinds of files
> passed around between the nodes and the master(s). Many are critical
> files not part of the application or scientific calculations.
> So in time, I think in a clustering environment, all you seek is
> very possible, but it's a hunch, gut feeling, not fact. I'd put
> raid mirrors underneath that system, if it makes sense, for now,
> or just dd the stuff with a script or something kludgy (Alan is the
> king of kludge....)

Hmm... mirroring between servers. Always an option, except it will not work 
for me in this case:
1) Remote location will have a domestic ADSL line. I'll be lucky if it has a 
500 kbps uplink.
2) Server A, currently, has around 7TB of current data that also needs to 
be available on the remote site.

With an 8 Mbps downlink, waiting for a file to be copied to the remote site is 
acceptable. After modifications, the new version can be copied back to 
server A slowly during network-idle-time or when server A actually needs it.
If there is a constant mirroring between A and B, the 500 kbps (if I am 
lucky) will be insufficient.
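
To put rough numbers on those link speeds, here is a back-of-the-envelope 
calculation in Python. It is only a sketch: it assumes decimal units 
(1 TB = 10**12 bytes, 1 kbps = 1000 bit/s), a link running at full rate with 
no protocol overhead, and a made-up 1 GB file as the "single file" case. The 
function name is just for illustration.

```python
# Rough transfer-time estimate. Assumptions: decimal units, link used at
# full rate, no protocol overhead. The 1 GB file size is a made-up example.

def transfer_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to move size_bytes over a link of the given bit rate."""
    seconds = size_bytes * 8 / rate_bits_per_s
    return seconds / 86400


TB = 10**12
GB = 10**9

full_store = 7 * TB      # current data on server A
uplink = 500_000         # 500 kbps, remote site -> main office
downlink = 8_000_000     # 8 Mbps, main office -> remote site

print(f"7 TB over the 8 Mbps downlink: {transfer_days(full_store, downlink):.0f} days")
print(f"7 TB over the 500 kbps uplink: {transfer_days(full_store, uplink):.0f} days")

# A single file fetched on demand is a different story.
one_file_minutes = transfer_days(1 * GB, downlink) * 24 * 60
print(f"1 GB over the 8 Mbps downlink: {one_file_minutes:.0f} minutes")
```

Under these assumptions the full 7TB needs roughly 81 days over the 
downlink and about 1296 days over the uplink, while a single file of that 
size arrives in minutes. That is why fetching on demand is workable and a 
full mirror is not.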

> On gentoo planet one of the devs has "Consul" in his overlays. Read
> up on that for ideas that may be relevant to what you need.

Assuming the following is the website:
http://www.consul.io/intro/vs/

Then this seems more a tool to replace Nagios, Puppet and similar. It 
doesn't have any magic inside to actually distribute a filesystem. What I am 
looking for is a filesystem that:
- serves a file that is "cached" at the local site directly, without waiting 
  for it to download from the remote site
- copies any changes to the file to the master store automagically
- is intelligent enough to invalidate local copies only when the master 
  copy has changed
- distributes write-locks to ensure edits can occur only via 1 server at 
  a time
- always gives every user the latest version, regardless of where/when it 
  was last edited
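
To make the behaviour I am after a bit more concrete, here is a toy 
in-memory model in Python. It is not any existing filesystem: the class and 
method names are hypothetical, version numbers stand in for whatever 
validity check a real system would use, and changes are pushed to the master 
immediately rather than during network-idle-time.

```python
# Toy in-memory model of cache-on-demand with validation and write-locks.
# All names are hypothetical; this is not the API of any real filesystem.

class MasterStore:
    """Stands in for server A: holds every file plus a version number."""

    def __init__(self):
        self.files = {}   # path -> (version, data)
        self.locks = {}   # path -> name of the site holding the write-lock

    def version(self, path):
        return self.files[path][0]

    def fetch(self, path):
        return self.files[path]

    def acquire_write_lock(self, path, site):
        # The first site to ask gets the lock; others are refused until commit.
        holder = self.locks.setdefault(path, site)
        return holder == site

    def commit(self, path, data, site):
        if self.locks.get(path) != site:
            raise PermissionError(f"{site} does not hold the lock on {path}")
        version = self.files.get(path, (0, b""))[0] + 1
        self.files[path] = (version, data)
        del self.locks[path]          # release the lock once the change is stored
        return version


class CacheSite:
    """Stands in for server B: stores only the files it has been asked for."""

    def __init__(self, name, master):
        self.name = name
        self.master = master
        self.cache = {}               # path -> (version, data)
        self.downloads = 0

    def read(self, path):
        cached = self.cache.get(path)
        # Download only when there is no local copy or the master copy changed.
        if cached is None or cached[0] != self.master.version(path):
            self.cache[path] = self.master.fetch(path)
            self.downloads += 1
        return self.cache[path][1]

    def write(self, path, data):
        if not self.master.acquire_write_lock(path, self.name):
            raise BlockingIOError(f"{path} is locked by another site")
        version = self.master.commit(path, data, self.name)
        self.cache[path] = (version, data)


master = MasterStore()
master.files["report.odt"] = (1, b"first draft")
site_b = CacheSite("B", master)

site_b.read("report.odt")                  # first read downloads from the master
site_b.read("report.odt")                  # second read is served from the cache
site_b.write("report.odt", b"second draft")
print(site_b.downloads, master.version("report.odt"))
```

The only cheap operation that has to cross the slow link on every read is the 
version check. The file itself moves only when the local copy is missing or 
stale.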

--
Joost 

> 
> > Joost
> 
> James

