* [gentoo-user] File system testing
@ 2014-09-16 19:07 James
2014-09-17 7:45 ` J. Roeleveld
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: James @ 2014-09-16 19:07 UTC (permalink / raw
To: gentoo-user
Hello,
By now many are familiar with my keen interest in clustering Gentoo
systems. Most cluster technologies use a distributed file system
on top of the local (HD/SSD) file system. Naturally not all file
systems, particularly the distributed file systems, have
straightforward instructions. Also, a device file system such as
XFS and a distributed file system (layered on top of it) may not
work very well when paired. So a variety of testing is something
I'm researching. Elimination of any file system listed below,
based on Gentoo user experience, is most welcome information,
as are tips and tricks for setting up any of them.
Distributed File Systems (DFS):
HDFS (poor performance)
Lustre
Ceph
XtreemFS
GlusterFS
MooseFS
FhGFS (BeeGFS) soon to be entirely open sourced?
Any other distributed file systems I should consider using?
Local (Device) File Systems LFS:
btrfs
zfs
ext4
xfs
Obviously I do not want to test all combinations of DFS/LocalFS,
so your comments are extremely welcome as is any and all
related information.
James
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] File system testing
2014-09-16 19:07 [gentoo-user] File system testing James
@ 2014-09-17 7:45 ` J. Roeleveld
2014-09-17 15:55 ` [gentoo-user] " James
2014-09-17 18:10 ` [gentoo-user] " Hervé Guillemet
2014-09-25 20:47 ` [gentoo-user] " thegeezer
2 siblings, 1 reply; 25+ messages in thread
From: J. Roeleveld @ 2014-09-17 7:45 UTC (permalink / raw
To: gentoo-user
On Tuesday, September 16, 2014 07:07:38 PM James wrote:
> Hello,
>
> By now many are familiar with my keen interest in clustering Gentoo
> systems. Most cluster technologies use a distributed file system
> on top of the local (HD/SSD) file system. Naturally not all file
> systems, particularly the distributed file systems, have
> straightforward instructions. Also, a device file system such as
> XFS and a distributed file system (layered on top of it) may not
> work very well when paired. So a variety of testing is something
> I'm researching. Elimination of any file system listed below,
> based on Gentoo user experience, is most welcome information,
> as are tips and tricks for setting up any of them.
>
>
> Distributed File Systems (DFS):
> HDFS (poor performance)
> Lustre
> Ceph
> XtreemFS
> GlusterFS
> MooseFS
> FhGFS (BeeGFS) soon to be entirely open sourced?
> Any other distributed file systems I should consider using?
>
> Local (Device) File Systems LFS:
> btrfs
> zfs
> ext4
> xfs
>
> Obviously I do not want to test all combinations of DFS/LocalFS,
> so your comments are extremely welcome as is any and all
> related information.
>
> James
James,
Is my understanding correct that the top list all require one of the bottom
list?
Eg. the "clustering" FSs only ensure the files on the LFSs are
duplicated/spread over the various nodes?
I would normally expect the clustering FS to be either the full layer or a
clustered block-device where an FS can be placed on top.
Otherwise it seems more like a network filesystem with caching options (See
AFS).
I am also interested in these filesystems, but for a slightly different
scenario:
- 2 servers in remote locations (different offices)
- 1 of these has all the files stored (server A) at the main office
- The other (server B - remote office) needs to "offer" all files from server A
When server B needs to supply a file, it needs to check if the local copy is
still the "valid" version. If yes, supply the local copy, otherwise download
from server A. When a file is changed, server A needs to be updated.
While server B is sharing a file, the file needs to be locked on server A
preventing simultaneous updates.
I prefer not to supply the same amount of storage at server B as server A has.
The remote location generally only needs access to 5% of the total amount of
files stored on server A. But not always the same 5%.
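To make the intended behaviour concrete, here is a rough, purely illustrative
Python sketch of the read path I have in mind (the paths are placeholders and
this is not based on any existing filesystem API; a real solution would also
need the write-lock on server A described above):

    import os, shutil

    MASTER = "/mnt/serverA/files"        # authoritative store (server A), placeholder path
    CACHE = "/var/cache/serverB/files"   # local cache on server B, placeholder path

    def serve(relpath):
        """Return a local path for relpath, fetching from server A only when stale."""
        src = os.path.join(MASTER, relpath)
        dst = os.path.join(CACHE, relpath)
        if not os.path.exists(dst) or os.stat(dst).st_mtime < os.stat(src).st_mtime:
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)       # the slow step over the WAN link
        return dst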
Does anyone know of a filesystem that can handle this?
--
Joost
^ permalink raw reply [flat|nested] 25+ messages in thread
* [gentoo-user] Re: File system testing
2014-09-17 7:45 ` J. Roeleveld
@ 2014-09-17 15:55 ` James
2014-09-17 19:34 ` J. Roeleveld
0 siblings, 1 reply; 25+ messages in thread
From: James @ 2014-09-17 15:55 UTC (permalink / raw
To: gentoo-user
J. Roeleveld <joost <at> antarean.org> writes:
> > Distributed File Systems (DFS):
> > Local (Device) File Systems LFS:
> Is my understanding correct that the top list all require one of
> the bottom list?
> Eg. the "clustering" FSs only ensure the files on the LFSs are
> duplicated/spread over the various nodes?
> I would normally expect the clustering FS to be either the full layer
> or a clustered block-device where an FS can be placed on top.
I have not performed these installations yet. My research indicates
that first you put the local FS on the drive, just like any installation
of Linux. Then you put the distributed FS on top of this. Some DFSs might
not require an LFS, but FhGFS does, and so does HDFS. I will not actually
be able to accurately answer your questions until I start to build
up the 3-system cluster (a week or 2 away is my best guess).
> Otherwise it seems more like a network filesystem with caching
> options (See AFS).
OK, I'll add AFS. You may be correct on this one or AFS might be both.
> I am also interested in these filesystems, but for a slightly different
> scenario:
OK, so as the "test-dummy-crash-victim" I'd be honored to have you,
Alan, Neil, Mic etc etc back-seat-drive on this adventure! (The more
I read, the more it's time for bourbon, bash, and a bit of cursing
to get started...)
> - 2 servers in remote locations (different offices)
> - 1 of these has all the files stored (server A) at the main office
> - The other (server B - remote office) needs to "offer" all files
> from server A. When server B needs to supply a file, it needs to
> check if the local copy is still the "valid" version.
> If yes, supply the local copy, otherwise download
> from server A. When a file is changed, server A needs to be updated.
> While server B is sharing a file, the file needs to be locked on server A
> preventing simultaneous updates.
Ouch, file locking (precious tells me that is always tricky).
(psst, systemd is causing fits for the clustering geniuses;
some are espousing a variety of cgroup gymnastics for phantom kills)
Spark is fault tolerant, regardless of node/memory/drive failures
above the fault tolerance that a file system configuration may support.
In fact, files lost can be 'regenerated' but it is computationally
expensive. You have to get your file system(s) set up. Then install
mesos-0.20.0 and then spark. I have mesos mostly ready. I should
have spark in alpha-beta this weekend. I'm fairly clueless on the
DFS/LFS issue, so a DFS that needs no LFS might be a good first choice
for testing the (3) system cluster.
> I prefer not to supply the same amount of storage at server B as
> server A has. The remote location generally only needs access to 5% of
> the total amount of files stored on server A. But not always the same 5%.
> Does anyone know of a filesystem that can handle this?
So in clustering, from what I have read, there are all kinds of files
passed around between the nodes and the master(s). Many are critical
files not part of the application or scientific calculations.
So in time, I think in a clustering environment, all you seek is
very possible, but it's a hunch, gut feeling, not fact. I'd put
raid mirrors underneath that system, if it makes sense, for now,
or just dd the stuff with a script or something kludgy (Alan is the
king of kludge....)
On gentoo planet one of the devs has "Consul" in his overlays. Read
up on that for ideas that may be relevant to what you need.
> Joost
James
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] File system testing
2014-09-16 19:07 [gentoo-user] File system testing James
2014-09-17 7:45 ` J. Roeleveld
@ 2014-09-17 18:10 ` Hervé Guillemet
2014-09-17 18:21 ` J. Roeleveld
2014-09-18 15:32 ` [gentoo-user] " James
2014-09-25 20:47 ` [gentoo-user] " thegeezer
2 siblings, 2 replies; 25+ messages in thread
From: Hervé Guillemet @ 2014-09-17 18:10 UTC (permalink / raw
To: gentoo-user
Le 16/09/2014 21:07, James a écrit :
>
> By now many are familiar with my keen interest in clustering Gentoo
> systems. Most cluster technologies use a distributed file system
> on top of the local (HD/SSD) file system. Naturally not all file
> systems, particularly the distributed file systems, have
> straightforward instructions. Also, a device file system such as
> XFS and a distributed file system (layered on top of it) may not
> work very well when paired. So a variety of testing is something
> I'm researching. Elimination of any file system listed below,
> based on Gentoo user experience, is most welcome information,
> as are tips and tricks for setting up any of them.
Hi James,
Have you found this document :
http://hal.inria.fr/hal-00789086/PDF/a_survey_of_dfs.pdf
On a related matter, I'd like to host my own file server on a dedicated
box so that I can access my working files from several locations. I'd
like it to be fast and secure, and I don't mind if the files are
replicated on each workstation. What would be the better tools for this?
--
Hervé
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] File system testing
2014-09-17 18:10 ` [gentoo-user] " Hervé Guillemet
@ 2014-09-17 18:21 ` J. Roeleveld
2014-09-17 21:05 ` [gentoo-user] " James
` (2 more replies)
2014-09-18 15:32 ` [gentoo-user] " James
1 sibling, 3 replies; 25+ messages in thread
From: J. Roeleveld @ 2014-09-17 18:21 UTC (permalink / raw
To: gentoo-user
On 17 September 2014 20:10:57 CEST, "Hervé Guillemet" <herve@guillemet.org> wrote:
>Le 16/09/2014 21:07, James a écrit :
>>
>> By now many are familiar with my keen interest in clustering Gentoo
>> systems. Most cluster technologies use a distributed file system
>> on top of the local (HD/SSD) file system. Naturally not all file
>> systems, particularly the distributed file systems, have
>> straightforward instructions. Also, a device file system such as
>> XFS and a distributed file system (layered on top of it) may not
>> work very well when paired. So a variety of testing is something
>> I'm researching. Elimination of any file system listed below,
>> based on Gentoo user experience, is most welcome information,
>> as are tips and tricks for setting up any of them.
>
>Hi James,
>
>Have you found this document :
>
>http://hal.inria.fr/hal-00789086/PDF/a_survey_of_dfs.pdf
>
>On a related matter, I'd like to host my own file server on a dedicated
>box so that I can access my working files from several locations. I'd
>like it to be fast and secure, and I don't mind if the files are
>replicated on each workstation. What would be the better tools for this?
AFS has caching and can survive temporary disappearance of the server.
For me, I need to be able to provide Samba filesharing on top of that layer on 2 different locations as I don't see the network bandwidth to be sufficient for normal operations. (ADSL uplinks tend to be dead slow)
--
Joost
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-17 15:55 ` [gentoo-user] " James
@ 2014-09-17 19:34 ` J. Roeleveld
2014-09-17 20:20 ` Alec Ten Harmsel
0 siblings, 1 reply; 25+ messages in thread
From: J. Roeleveld @ 2014-09-17 19:34 UTC (permalink / raw
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 6077 bytes --]
On Wednesday, September 17, 2014 03:55:56 PM James wrote:
> J. Roeleveld <joost <at> antarean.org> writes:
> > > Distributed File Systems (DFS):
> >
> > > Local (Device) File Systems LFS:
> > Is my understanding correct that the top list all require one of
> > the bottom list?
> > Eg. the "clustering" FSs only ensure the files on the LFSs are
> > duplicated/spread over the various nodes?
> >
> > I would normally expect the clustering FS to be either the full layer
> > or a clustered block-device where an FS can be placed on top.
>
> I have not performed these installations yet. My research indicates
> that first you put the local FS on the drive, just like any installation
> of Linux. Then you put the distributed FS on top of this. Some DFSs might
> not require an LFS, but FhGFS does, and so does HDFS. I will not actually
> be able to accurately answer your questions until I start to build
> up the 3-system cluster (a week or 2 away is my best guess).
Playing around with clusters is on my list, but due to other activities having
a higher priority, I haven't had much time yet.
> > Otherwise it seems more like a network filesystem with caching
> > options (See AFS).
>
> OK, I'll add AFS. You may be correct on this one or AFS might be both.
Personally, I would read up on these and see how they work. Then, based
on that, decide if they are likely to assist in the specific situation you are
interested in.
AFS, NFS, CIFS,... can be used for clusters, but, apart from NFS, I wouldn't
expect much performance out of them.
If you need it to be fault-tolerant and not overly rely on a single point of
failure, I wouldn't be using any of these. Only AFS, from my original
investigation, showed some fault-tolerance, but needed too many
resources (disk-space) on the clients.
> > I am also interested in these filesystems, but for a slightly different
>
> > scenario:
> OK, so as the "test-dummy-crash-victim" I'd be honored to have you,
> Alan, Neil, Mic etc etc back-seat-drive on this adventure! (The more
> I read, the more it's time for bourbon, bash, and a bit of cursing
> to get started...)
Good luck and even though I'd love to join in with the testing, I simply do
not have the time to keep up. I would probably just slow you down.
> > - 2 servers in remote locations (different offices)
> > - 1 of these has all the files stored (server A) at the main office
> > - The other (server B - remote office) needs to "offer" all files
> > from server A. When server B needs to supply a file, it needs to
> > check if the local copy is still the "valid" version.
> > If yes, supply the local copy, otherwise download
> > from server A. When a file is changed, server A needs to be updated.
> > While server B is sharing a file, the file needs to be locked on server A
> > preventing simultaneous updates.
>
> Ouch, file locking (precious tells me that is always tricky).
I need it to be locked on server A while server B has a proper write-lock to
avoid 2 modifications competing with each other.
> (psst, systemd is causing fits for the clustering geniuses;
> some are espousing a variety of cgroup gymnastics for phantom kills)
phantom kills?
> Spark is fault tolerant, regardless of node/memory/drive failures
> above the fault tolerance that a file system configuration may support.
> In fact, files lost can be 'regenerated' but it is computationally
> expensive.
Too much for me.
> You have to get your file system(s) set up. Then install
> mesos-0.20.0 and then spark. I have mesos mostly ready. I should
> have spark in alpha-beta this weekend. I'm fairly clueless on the
> DFS/LFS issue, so a DFS that needs no LFS might be a good first choice
> for testing the (3) system cluster.
That, or a 4th node acting like a NAS sharing the filesystem over NFS.
> > I prefer not to supply the same amount of storage at server B as
> > server A has. The remote location generally only needs access to 5% of
> > the total amount of files stored on server A. But not always the same 5%.
> > Does anyone know of a filesystem that can handle this?
>
> So in clustering, from what I have read, there are all kinds of files
> passed around between the nodes and the master(s). Many are critical
> files not part of the application or scientific calculations.
> So in time, I think in a clustering environment, all you seek is
> very possible, but it's a hunch, gut feeling, not fact. I'd put
> raid mirrors underneath that system, if it makes sense, for now,
> or just dd the stuff with a script or something kludgy (Alan is the
> king of kludge....)
Hmm... mirroring between servers. Always an option, except it will not work
for me in this case:
1) Remote location will have a domestic ADSL line. I'll be lucky if it has a
500kbps uplink
2) Server A, currently, has around 7TB of current data that also needs to
be available on the remote site.
With an 8Mbps downlink, waiting for a file to be copied to the remote site is
acceptable. After modifications, the new version can be copied back to
server A slowly during network-idle-time or when server A actually needs it.
If there is constant mirroring between A and B, the 500kbps (if I am
lucky) will be insufficient.
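A quick back-of-the-envelope calculation makes the difference obvious (the
figures are the rough numbers above; the single 10MB document is just an
assumed size, and changes going back over the ~500kbps uplink only make the
mirroring case worse):

    full_mirror_bits = 7 * 8 * 1000**4    # ~7TB that would have to cross the link, in bits
    downlink_bps = 8 * 1000**2            # ~8Mbps ADSL downlink
    print(full_mirror_bits / downlink_bps / 86400.0)   # ~81 days just for the initial mirror

    one_doc_bits = 10 * 8 * 1000**2       # a single 10MB document (assumed size)
    print(one_doc_bits / downlink_bps)    # ~10 seconds to fetch it on demand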
> On gentoo planet one of the devs has "Consul" in his overlays. Read
> up on that for ideas that may be relevant to what you need.
Assuming the following is the website:
http://www.consul.io/intro/vs/
Then this seems more like a tool to replace Nagios, Puppet and similar. It
doesn't have any magic inside to actually distribute a filesystem in a way
where, when a file is "cached" at the local site, you don't have to wait for it to
download from the remote site, and any changes to the file are copied
to the master store automagically.
Such a filesystem would be intelligent enough to invalidate local copies only
when the master copy got changed.
And it would distribute write-locks to ensure edits can occur via only 1 server at
a time, so every user always gets the latest version, regardless of
where/when it was last edited.
--
Joost
>
> > Joost
>
> James
[-- Attachment #2: Type: text/html, Size: 23291 bytes --]
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-17 19:34 ` J. Roeleveld
@ 2014-09-17 20:20 ` Alec Ten Harmsel
2014-09-17 20:56 ` James
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Alec Ten Harmsel @ 2014-09-17 20:20 UTC (permalink / raw
To: gentoo-user
As far as HDFS goes, I would only set that up if you will use it for
Hadoop or related tools. It's highly specific, and the performance is
not good unless you're doing a massively parallel read (what it was
designed for). I can elaborate why if anyone is actually interested.
We use Lustre for our high performance general storage. I don't have any
numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
sounds familiar, but don't quote me on that).
>
> Personally, I would read up on these and see how they work. Then,
> based on that, decide if they are likely to assist in the specific
> situation you are interested in.
>
Always good advice.
Alec
^ permalink raw reply [flat|nested] 25+ messages in thread
* [gentoo-user] Re: File system testing
2014-09-17 20:20 ` Alec Ten Harmsel
@ 2014-09-17 20:56 ` James
2014-09-18 8:24 ` J. Roeleveld
2014-09-18 8:04 ` J. Roeleveld
2014-09-18 9:17 ` Kerin Millar
2 siblings, 1 reply; 25+ messages in thread
From: James @ 2014-09-17 20:56 UTC (permalink / raw
To: gentoo-user
Alec Ten Harmsel <alec <at> alectenharmsel.com> writes:
> As far as HDFS goes, I would only set that up if you will use it for
> Hadoop or related tools. It's highly specific, and the performance is
> not good unless you're doing a massively parallel read (what it was
> designed for). I can elaborate why if anyone is actually interested.
Actually, from my research and my goal (one really big scientific simulation
running constantly). Many folks are recommending to skip Hadoop/HDFS all
together and go straight to mesos/spark. RDD (in-memory) cluster calculations
are at the heart of my needs. The opposite end of the spectrum, loads
of small files and small apps; I dunno about, but, I'm all ears.
In the end, my (3) node scientific cluster will morph and support
the typical myriad of networked applications, but I can take
a few years to figure that out, or just copy what smart guys like
you and joost do.....
> We use Lustre for our high performance general storage. I don't have any
> numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
> sounds familiar, but don't quote me on that).
AT Umich, you guys should test the FhGFS/btrfs combo. The folks
at UCI swear about it, although they are only publishing a wee bit.
(you know, water cooler gossip)...... Surely the Wolverines do not
want those californians getting up on them?
Are you guys planning a mesos/spark test?
> > Personally, I would read up on these and see how they work. Then,
> > based on that, decide if they are likely to assist in the specific
> > situation you are interested in.
It's a ton of reading. It's not apples-to-apple_cider type of reading.
My head hurts.....
I'm leaning to DFS/LFS
(2) Lustre/btrfs and FhGFS/btrfs
Thoughts/comments?
James
^ permalink raw reply [flat|nested] 25+ messages in thread
* [gentoo-user] Re: File system testing
2014-09-17 18:21 ` J. Roeleveld
@ 2014-09-17 21:05 ` James
2014-09-18 7:29 ` J. Roeleveld
2014-09-18 8:28 ` [gentoo-user] " Kerin Millar
2014-09-25 20:56 ` thegeezer
2 siblings, 1 reply; 25+ messages in thread
From: James @ 2014-09-17 21:05 UTC (permalink / raw
To: gentoo-user
J. Roeleveld <joost <at> antarean.org> writes:
> AFS has caching and can survive temporary disappearance of the server.
Excellent for low bandwidth connections. Most DFSs have mechanisms to
deal with transient failures, but not as generous on the time-scale
as AFS. I believe, if I recall correctly, these high-latency, low-bandwidth
recovery mechanisms were key design parameters, baked in during the
CMU development cycles for AFS?
While attractive for your situation, these features might actually
be detrimental to a high-performance distributed cluster's needs for
a DFS?
> For me, I need to be able to provide Samba filesharing on top of that
> layer on 2 different locations as I don't see the network bandwidth to
> be sufficient for normal operations. (ADSL uplinks tend to be dead slow)
Yea, I'm not going to be testing OpenAFS for my needs, unless I read
some compelling published data on its applicability to high-end
clusters as a best-choice DFS.....
It's probably great for SETI etc etc.
James
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-17 21:05 ` [gentoo-user] " James
@ 2014-09-18 7:29 ` J. Roeleveld
0 siblings, 0 replies; 25+ messages in thread
From: J. Roeleveld @ 2014-09-18 7:29 UTC (permalink / raw
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]
On Wednesday, September 17, 2014 09:05:09 PM James wrote:
> J. Roeleveld <joost <at> antarean.org> writes:
> > AFS has caching and can survive temporary disappearance of the server.
>
> Excellent for low bandwidth connections. Most DFSs have mechanisms to
> deal with transient failures, but not as generous on the time-scale
> as AFS. I believe, if I recall correctly, these high-latency, low-bandwidth
> recovery mechanisms were key design parameters, baked in during the
> CMU development cycles for AFS?
>
> While attractive for your situation, these features might actually
> be detrimental to a high-performance distributed cluster's needs for
> a DFS?
I tend to agree. I'm not sure how up-to-date AFS is, but from re-reading the
wikipedia pages, it sounds like what I need. Provided I can get it to work
together with Samba. I need to allow MS Windows laptops access to the
files on the remote location.
> > For me, I need to be able to provide Samba filesharing on top of that
> > layer on 2 different locations as I don't see the network bandwidth to
> > be sufficient for normal operations. (ADSL uplinks tend to be dead slow)
>
> Yea, I'm not going to be testing OpenAFS for my needs, unless I read
> some compelling published data on its applicability to high-end
> clusters as a best-choice DFS.....
I wouldn't either.
> It's probably great for SETI etc etc.
Doubtful :)
Did you see the following wikipedia page:
http://en.wikipedia.org/wiki/List_of_file_systems
It contains a nice long list of various distributed, clustered,.... filesystems.
What's missing is an indication of how well these are still supported and on which
OSs they (can) work.
--
Joost
[-- Attachment #2: Type: text/html, Size: 7704 bytes --]
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-17 20:20 ` Alec Ten Harmsel
2014-09-17 20:56 ` James
@ 2014-09-18 8:04 ` J. Roeleveld
2014-09-18 9:17 ` Kerin Millar
2 siblings, 0 replies; 25+ messages in thread
From: J. Roeleveld @ 2014-09-18 8:04 UTC (permalink / raw
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 1657 bytes --]
On Wednesday, September 17, 2014 04:20:24 PM Alec Ten Harmsel wrote:
> As far as HDFS goes, I would only set that up if you will use it for
> Hadoop or related tools. It's highly specific, and the performance is
> not good unless you're doing a massively parallel read (what it was
> designed for). I can elaborate why if anyone is actually interested.
>
> We use Lustre for our high performance general storage. I don't have any
> numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
> sounds familiar, but don't quote me on that).
I think any shared filesystem will be fast if you have a lot of bandwidth :)
When comparing network filesystems it makes sense to keep the hardware
identical and reduce the overhead to a percentage. E.g. what is the theoretical
maximum speed for the network used (10Gbit/s), and what is the actual
maximum speed you get with:
1) a single really large file (200GB)
2) a lot (100,000) smaller files (2MB)
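Something as simple as the sketch below already gives those two numbers for
whichever filesystem is mounted, as long as the hardware stays identical (the
mount point, link speed and sizes are placeholders to adjust):

    import os, time

    MOUNT = "/mnt/fs-under-test"     # placeholder: where the filesystem under test is mounted
    LINK_BPS = 10 * 10**9            # theoretical network maximum (10Gbit/s in this example)
    CHUNK = os.urandom(2 * 1024**2)  # 2MB of random data, reused for every write

    def write_test(name, n_files, chunks_per_file):
        start = time.time()
        for i in range(n_files):
            with open(os.path.join(MOUNT, "%s_%d" % (name, i)), "wb") as f:
                for _ in range(chunks_per_file):
                    f.write(CHUNK)
                f.flush()
                os.fsync(f.fileno())
        secs = time.time() - start
        bps = n_files * chunks_per_file * len(CHUNK) * 8 / secs
        print("%s: %.0f Mbit/s, %.1f%% of the link" % (name, bps / 1e6, 100.0 * bps / LINK_BPS))

    write_test("large", 1, 100 * 1024)   # 1) a single really large file (200GB)
    write_test("small", 100000, 1)       # 2) a lot (100,000) of smaller files (2MB)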
Then you can make an estimate on what to expect when using a 1Gbit/s
network. I somehow don't expect James to have InfiniBand available for his
research?
Personally, when choosing between InfiniBand and Ethernet, I'm tempted
to go with dedicated bonded 10Gbit/s links because of the price-
difference. (Quick research shows me that InfiniBand is about 3x as
expensive for the same throughput.)
> > Personally, I would read up on these and see how they work. Then,
> > based on that, decide if they are likely to assist in the specific
> > situation you are interested in.
>
> Always good advice.
It saves time to do some simple research (the reading type) before
actually doing tests.
--
Joost
[-- Attachment #2: Type: text/html, Size: 6197 bytes --]
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-17 20:56 ` James
@ 2014-09-18 8:24 ` J. Roeleveld
2014-09-18 9:48 ` Rich Freeman
2014-09-19 13:41 ` James
0 siblings, 2 replies; 25+ messages in thread
From: J. Roeleveld @ 2014-09-18 8:24 UTC (permalink / raw
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 3874 bytes --]
On Wednesday, September 17, 2014 08:56:28 PM James wrote:
> Alec Ten Harmsel <alec <at> alectenharmsel.com> writes:
> > As far as HDFS goes, I would only set that up if you will use it for
> > Hadoop or related tools. It's highly specific, and the performance is
> > not good unless you're doing a massively parallel read (what it was
> > designed for). I can elaborate why if anyone is actually interested.
>
> Actually, from my research and my goal (one really big scientific simulation
> running constantly).
Out of curiosity, what do you want to simulate?
> Many folks are recommending to skip Hadoop/HDFS all
> together
I agree, Hadoop/HDFS is for data analysis. Like building a profile about
people based on the information companies like Facebook, Google, NSA,
Walmart, Governments, Banks,.... collect about their
customers/users/citizens/slaves/....
> and go straight to mesos/spark. RDD (in-memory) cluster
> calculations are at the heart of my needs. The opposite end of the
> spectrum, loads of small files and small apps; I dunno about, but, I'm all
> ears.
> In the end, my (3) node scientific cluster will morph and support
> the typical myriad of networked applications, but I can take
> a few years to figure that out, or just copy what smart guys like
> you and joost do.....
Nope, I'm simply following what you do and provide suggestions where I
can.
Most of the clusters and distributed computing stuff I do is based on
adding machines to distribute the load. But the mechanisms for these are
implemented in the applications I work with, not what I design underneath.
The filesystems I am interested in are different to the ones you want.
I need to provide access to software installation files to a VM server and
access to documentation which is created by the users.
The VM server is physically next to what I already mentioned as server A.
Access to the VM from the remote site will be using remote desktop
connections.
But to allow faster and easier access to the documentation, I need a
server B at the remote site which functions as described.
AFS might be suitable, but I need to be able to layer Samba on top of that
to allow a seamless operation.
I don't want the laptops to have their own cache and then having to figure
out how to solve the multiple different changes to documents containing
layouts. (MS Word and OpenDocument files)
> > We use Lustre for our high performance general storage. I don't have any
> > numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
> > sounds familiar, but don't quote me on that).
>
> AT Umich, you guys should test the FhGFS/btrfs combo. The folks
> at UCI swear about it, although they are only publishing a wee bit.
> (you know, water cooler gossip)...... Surely the Wolverines do not
> want those californians getting up on them?
>
> Are you guys planning a mesos/spark test?
>
> > > Personally, I would read up on these and see how they work. Then,
> > > based on that, decide if they are likely to assist in the specific
> > > situation you are interested in.
>
> It's a ton of reading. It's not apples-to-apple_cider type of reading.
> My head hurts.....
Take a walk outside. Clear air should help you with the headaches :P
> I'm leaning to DFS/LFS
>
> (2) Lustre/btrfs and FhGFS/btrfs
>
> Thoughts/comments?
I have insufficient knowledge to advise on either of these.
One question, why BTRFS instead of ZFS?
My current understanding is:
- ZFS is production ready, but due to licensing issues, not included in the
kernel
- BTRFS is included, but not yet production ready with all planned features
For me, Raid6-like functionality is an absolute requirement and latest I
know is that that isn't implemented in BTRFS yet. Does anyone know when
that will be implemented and reliable? Eg. what time-frame are we talking
about?
--
Joost
[-- Attachment #2: Type: text/html, Size: 14985 bytes --]
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] File system testing
2014-09-17 18:21 ` J. Roeleveld
2014-09-17 21:05 ` [gentoo-user] " James
@ 2014-09-18 8:28 ` Kerin Millar
2014-09-25 20:56 ` thegeezer
2 siblings, 0 replies; 25+ messages in thread
From: Kerin Millar @ 2014-09-18 8:28 UTC (permalink / raw
To: gentoo-user
On 17/09/2014 19:21, J. Roeleveld wrote:
> On 17 September 2014 20:10:57 CEST, "Hervé Guillemet" <herve@guillemet.org> wrote:
>> Le 16/09/2014 21:07, James a écrit :
>>>
>>> By now many are familiar with my keen interest in clustering Gentoo
>>> systems. Most cluster technologies use a distributed file system
>>> on top of the local (HD/SSD) file system. Naturally not all file
>>> systems, particularly the distributed file systems, have
>>> straightforward instructions. Also, a device file system such as
>>> XFS and a distributed file system (layered on top of it) may not
>>> work very well when paired. So a variety of testing is something
>>> I'm researching. Elimination of any file system listed below,
>>> based on Gentoo user experience, is most welcome information,
>>> as are tips and tricks for setting up any of them.
>>
>> Hi James,
>>
>> Have you found this document :
>>
>> http://hal.inria.fr/hal-00789086/PDF/a_survey_of_dfs.pdf
>>
>> On a related matter, I'd like to host my own file server on a dedicated
>> box so that I can access my working files from several locations. I'd
>> like it to be fast and secure, and I don't mind if the files are
>> replicated on each workstation. What would be the better tools for this?
>
> AFS has caching and can survive temporary disappearance of the server.
>
> For me, I need to be able to provide Samba filesharing on top of that layer on 2 different locations as I don't see the network bandwidth to be sufficient for normal operations. (ADSL uplinks tend to be dead slow)
You might try GlusterFS with two replicating bricks. The latest version
of Samba in portage includes a VFS plugin that can integrate GlusterFS
volumes via GFAPI.
--Kerin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-17 20:20 ` Alec Ten Harmsel
2014-09-17 20:56 ` James
2014-09-18 8:04 ` J. Roeleveld
@ 2014-09-18 9:17 ` Kerin Millar
2014-09-18 13:12 ` Alec Ten Harmsel
2 siblings, 1 reply; 25+ messages in thread
From: Kerin Millar @ 2014-09-18 9:17 UTC (permalink / raw
To: gentoo-user
On 17/09/2014 21:20, Alec Ten Harmsel wrote:
> As far as HDFS goes, I would only set that up if you will use it for
> Hadoop or related tools. It's highly specific, and the performance is
> not good unless you're doing a massively parallel read (what it was
> designed for). I can elaborate why if anyone is actually interested.
I, for one, am very interested.
--Kerin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-18 8:24 ` J. Roeleveld
@ 2014-09-18 9:48 ` Rich Freeman
2014-09-18 10:22 ` J. Roeleveld
2014-09-19 13:41 ` James
1 sibling, 1 reply; 25+ messages in thread
From: Rich Freeman @ 2014-09-18 9:48 UTC (permalink / raw
To: gentoo-user
The HTML...it hurts my eyes... :)
On Thu, Sep 18, 2014 at 4:24 AM, J. Roeleveld <joost@antarean.org> wrote:
>
> On Wednesday, September 17, 2014 08:56:28 PM James wrote:
>
>> Alec Ten Harmsel <alec <at> alectenharmsel.com> writes:
>
>> > As far as HDFS goes, I would only set that up if you will use it for
>> > Hadoop or related tools. It's highly specific, and the performance is
>> > not good unless you're doing a massively parallel read (what it was
>> > designed for). I can elaborate why if anyone is actually interested.
>
FYI - one very big limitation of hdfs is its minimum filesize is
something huge like 1MB or something like that. Hadoop was designed
to take a REALLY big input file and chunk it up. If you use hdfs to
store something like /usr/portage it will turn into the sort of
monstrosity that you'd actually need a cluster to store.
>
> My current understanding is:
>
> - ZFS is production ready, but due to licensing issues, not included in the
> kernel
>
> - BTRFS is included, but not yet production ready with all planned features
>
Your understanding of their maturity is fairly accurate. They also
aren't 100% moving in the same direction - btrfs aims more to be a
general-purpose filesystem replacement especially for smaller systems,
and zfs is more focused on the enterprise, so it lacks features like
raid reshaping (who needs to add 1 disk to a raid5 when you can just
add 5 more disks to your 30 disk storage system).
I think btrfs has a bit more hope of being an ext4 replacement some
day for both this reason and the licensing issue. That in no way
detracts from the usefulness of zfs, especially for larger deployments
where the few areas where btrfs is more flexible would probably be
looked at as gimmicks (kind of like being able to build your whole OS
from source :) ).
> For me, Raid6-like functionality is an absolute requirement and latest I
> know is that that isn't implemented in BTRFS yet. Does anyone know when that
> will be implemented and reliable? Eg. what time-frame are we talking about?
>
I suspect we're talking months before it is really implemented, and
much longer before it is reliable. Right now btrfs can write raid6,
but it can't really read it. That is, it operates just fine until you
actually lose a disk containing something other than parity, and then
it loses access to the data. This code is only in the kernel for
development purposes and nobody advocates using it for production.
Most of the code in btrfs which is reliable has been around for years,
like raid1 support, and obviously it will be years until the raid5/6
code reaches that point. I am using btrfs mainly because once that
day comes it will be much easier to migrate to it from btrfs raid1
than from zfs (which has no mechanism for migrating raid levels
in-place (that is, within an existing vdev) - you would need to add
new drives to the pool, migrate the data, and remove the old drives
from the pool, which is nice if you have a big stack of drives and
spare sata ports lying around like you would in a SAN).
--
Rich
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-18 9:48 ` Rich Freeman
@ 2014-09-18 10:22 ` J. Roeleveld
0 siblings, 0 replies; 25+ messages in thread
From: J. Roeleveld @ 2014-09-18 10:22 UTC (permalink / raw
To: gentoo-user
On Thursday, September 18, 2014 05:48:58 AM Rich Freeman wrote:
> The HTML...it hurts my eyes... :)
Apologies.
> > My current understanding is:
> >
> > - ZFS is production ready, but due to licensing issues, not included in
> > the
> > kernel
> >
> > - BTRFS is included, but not yet production ready with all planned
> > features
>
> Your understanding of their maturity is fairly accurate. They also
> aren't 100% moving in the same direction - btrfs aims more to be a
> general-purpose filesystem replacement especially for smaller systems,
> and zfs is more focused on the enterprise, so it lacks features like
> raid reshaping (who needs to add 1 disk to a raid5 when you can just
> add 5 more disks to your 30 disk storage system).
Thank you for this info. I wasn't aware of this difference in 'design'.
Sounds like ZFS will be more suited for me then.
> I think btrfs has a bit more hope of being an ext4 replacement some
> day for both this reason and the licensing issue. That in no way
> detracts from the usefulness of zfs, especially for larger deployments
> where the few areas where btrfs is more flexible would probably be
> looked at as gimmicks (kind of like being able to build your whole OS
> from source :) ).
Next time I am rebuilding the desktops, I will likely switch them to BTRFS.
Sounds like BTRFS will be more suited there.
> > For me, Raid6-like functionality is an absolute requirement and latest I
> > know is that that isn't implemented in BTRFS yet. Does anyone know when
> > that will be implemented and reliable? Eg. what time-frame are we talking
> > about?
> I suspect we're talking months before it is really implemented, and
> much longer before it is reliable. Right now btrfs can write raid6,
> but it can't really read it. That is, it operates just fine until you
> actually lose a disk containing something other than parity, and then
> it loses access to the data. This code is only in the kernel for
> development purposes and nobody advocates using it for production.
> Most of the code in btrfs which is reliable has been around for years,
> like raid1 support, and obviously it will be years until the raid5/6
> code reaches that point. I am using btrfs mainly because once that
> day comes it will be much easier to migrate to it from btrfs raid1
> than from zfs (which has no mechanism for migrating raid levels
> in-place (that is, within an existing vdev) - you would need to add
> new drives to the pool, migrate the data, and remove the old drives
> from the pool, which is nice if you have a big stack of drives and
> spare sata ports lying around like you would in a SAN).
Exactly, although I prefer not to change the filesystem on a live system
anytime soon. When it comes to redoing the filesystem like that, restoring
from backups will be the fastest solution.
--
Joost
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-18 9:17 ` Kerin Millar
@ 2014-09-18 13:12 ` Alec Ten Harmsel
2014-09-19 15:21 ` Kerin Millar
0 siblings, 1 reply; 25+ messages in thread
From: Alec Ten Harmsel @ 2014-09-18 13:12 UTC (permalink / raw
To: gentoo-user
On 09/18/2014 05:17 AM, Kerin Millar wrote:
> On 17/09/2014 21:20, Alec Ten Harmsel wrote:
>> As far as HDFS goes, I would only set that up if you will use it for
>> Hadoop or related tools. It's highly specific, and the performance is
>> not good unless you're doing a massively parallel read (what it was
>> designed for). I can elaborate why if anyone is actually interested.
>
> I, for one, am very interested.
>
> --Kerin
>
Alright, here goes:
Rich Freeman wrote:
> FYI - one very big limitation of hdfs is its minimum filesize is
> something huge like 1MB or something like that. Hadoop was designed
> to take a REALLY big input file and chunk it up. If you use hdfs to
> store something like /usr/portage it will turn into the sort of
> monstrosity that you'd actually need a cluster to store.
This is exactly correct, except we run with a block size of 128MB, and a large cluster will typically have a block size of 256MB or even 512MB.
HDFS has two main components: a NameNode, which keeps track of which blocks are a part of which file (in memory), and the DataNodes that actually store the blocks. No data ever flows through the NameNode; it negotiates transfers between the client and DataNodes and negotiates transfers for jobs. Since the NameNode stores metadata in-memory, small files are bad because RAM gets wasted.
What exactly is Hadoop/HDFS used for? The most common uses are generating search indices on data (which is a batch job) and doing non-realtime processing of log streams and/or data streams (another batch job) and allowing a large number of analysts run disparate queries on the same large dataset (another batch job). Batch processing - processing the entire dataset - is really where Hadoop shines.
When you put a file into HDFS, it gets split based on the block size. This is done so that a parallel read will be really fast - each map task reads in a single block and processes it. Ergo, if you put in a 1GB file with a 128MB block size and run a MapReduce job, 8 map tasks will be launched. If you put in a 1TB file, 8192 tasks would be launched. Tuning the block size is important to optimize the overhead of launching tasks vs. potentially under-utilizing a cluster. Typically, a cluster with a lot of data has a bigger block size.
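The arithmetic behind those task counts is trivial, but to spell it out (the
sizes are just the example numbers above):

    # map tasks for a plain MapReduce input = number of blocks in the file
    block = 128 * 1024**2                    # 128MB block size

    for size in (1 * 1024**3, 1 * 1024**4):  # a 1GB file and a 1TB file
        tasks = -(-size // block)            # ceiling division
        print("%d bytes -> %d map tasks" % (size, tasks))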
The downsides of HDFS:
* Seeked reads are not supported afaik because no one needs that for batch processing
* Seeked writes into an existing file are not supported because either blocks would be added in the middle of a file and wouldn't be 128MB, or existing blocks would be edited, resulting in blocks larger than 128MB. Both of these scenarios are bad.
Since HDFS users typically do not need seeked reads or seeked writes, these downsides aren't really a big deal.
If something's not clear, let me know.
Alec
^ permalink raw reply [flat|nested] 25+ messages in thread
* [gentoo-user] Re: File system testing
2014-09-17 18:10 ` [gentoo-user] " Hervé Guillemet
2014-09-17 18:21 ` J. Roeleveld
@ 2014-09-18 15:32 ` James
1 sibling, 0 replies; 25+ messages in thread
From: James @ 2014-09-18 15:32 UTC (permalink / raw
To: gentoo-user
Hervé Guillemet <herve <at> guillemet.org> writes:
>
> Le 16/09/2014 21:07, James a écrit :
> >
> > By now many are familiar with my keen interest in clustering gentoo
> > systems. So, what most cluster technologies use is a distributed file
> > system on top of the local (HD/SDD) file system.
> Have you found this document :
> http://hal.inria.fr/hal-00789086/PDF/a_survey_of_dfs.pdf
Hello Herve,
Yes, I read the document and it is a good introduction to some
of my issues on which file system(s) to use for clustering. But it's
more of a survey than a comparison/benchmark study, which would be
really beneficial.
DFSs are moving so fast now, and their setups and features are
rarely a one-to-one match. For example, (currently) the best load balancing
you find is actually in the apps that run above the cluster software. [1]
Some of the performance/resource-utilization of the file systems/resources
is determined by real-time analytics with graphical displays. I'm
not sure that load balancing even belongs in a DFS, yet in the paper
you reference it was prominently discussed. Things are moving so
fast in the distributed-*/cluster/cluster-tools/cluster-apps
space that one really needs a system set up to apply almost daily patches
for testing. I never realized just how much reading is necessary just
to understand the current landscape in clustering.
I'm trying to figure out an ecosystem where Gentoo folks can experiment
with mesos clustering for scientific applications. After that, the
more general case should be mature enough for general-purpose applications.
I'm avoiding the clustered web arena, as that is just too much for
me to digest; somebody else could champion that part of all of
those Apache cluster technologies.
Thanks for the document link!
James
[1]
^ permalink raw reply [flat|nested] 25+ messages in thread
* [gentoo-user] Re: File system testing
2014-09-18 8:24 ` J. Roeleveld
2014-09-18 9:48 ` Rich Freeman
@ 2014-09-19 13:41 ` James
2014-09-19 14:56 ` Rich Freeman
2014-09-19 15:02 ` J. Roeleveld
1 sibling, 2 replies; 25+ messages in thread
From: James @ 2014-09-19 13:41 UTC (permalink / raw
To: gentoo-user
J. Roeleveld <joost <at> antarean.org> writes:
> Out of curiosity, what do you want to simulate?
subsurface flows in porous medium. AKA carbon sequestration
by injection wells. You know, provide proof that those
that remove hydrocarbons actually put the CO2 back
and significantly mitigate the effects of their ventures.
It's like this. I have been struggling with my 17 year old "genius"
son who is a year away from entering medical school, with
learning responsibility. So I got him a hyperactive, highly
intelligent (mix-doberman) puppy to nurture, raise, train, love
and be responsible for. It's one genius pup, teaching another
pup about being responsible.
So goes the earl_bidness.......imho.
> > Many folks are recommending to skip Hadoop/HDFS all together
> I agree, Hadoop/HDFS is for data analysis. Like building a profile
> about people based on the information companies like Facebook,
> Google, NSA, Walmart, Governments, Banks,.... collect about their
> customers/users/citizens/slaves/....
> > and go straight to mesos/spark. RDD (in-memory) cluster
> > calculations are at the heart of my needs. The opposite end of the
> > spectrum, loads of small files and small apps; I dunno about, but, I'm all
> > ears.
> > In the end, my (3) node scientific cluster will morph and support
> > the typical myriad of networked applications, but I can take
> > a few years to figure that out, or just copy what smart guys like
> > you and joost do.....
>
> Nope, I'm simply following what you do and provide suggestions where I can.
> Most of the clusters and distributed computing stuff I do is based on
> adding machines to distribute the load. But the mechanisms for these are
> implemented in the applications I work with, not what I design underneath.
> The filesystems I am interested in are different to the ones you want.
Maybe. I do not know what I want yet. My vision is very lightweight
workstations running lxqt (small memory footprint) or such, and a bad_arse
cluster for the heavy lifting running on whatever heterogeneous resources I
have. From what I've read, the cluster and the file systems are all
redundant at the cluster level (mesos/spark anyway) regardless of what any
given processor/system is doing. All of Alan's fantasies (needs) can be
realized once the cluster stuff is mastered (chronos, ansible etc etc).
> I need to provide access to software installation files to a VM server
> and access to documentation which is created by the users. The
> VM server is physically next to what I already mentioned as server A.
> Access to the VM from the remote site will be using remote desktop
> connections. But to allow faster and easier access to the
> documentation, I need a server B at the remote site which functions as
> described. AFS might be suitable, but I need to be able to layer Samba
> on top of that to allow a seamless operation.
> I don't want the laptops to have their own cache and then having to
> figure out how to solve the multiple different changes to documents
> containing layouts. (MS Word and OpenDocument files).
OK, so your customers (hyperactive problem users) interface to your cluster
to do their work. When finished, you write things out to other servers
with all of the VM servers. Lots of really cool tools are emerging
in the cluster space.
I think these folks have mesos + spark + samba + nfs all in one box. [1]
Build rather than purchase? We have to figure out what you and Alan need on
a cluster, because it is what most folks need/want. It's the admin_advantage
part of clustering. (There are also the Big Science (me) and Web-centric needs.)
Right now they are related projects, but things will coalesce, imho. There is
even "Spark SQL" for postgres admins [2].
[1]
http://www.quantaqct.com/en/01_product/02_detail.php?mid=29&sid=162&id=163&qs=102
[2] https://spark.apache.org/sql/
> > > We use Lustre for our high performance general storage. I don't
> > > have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s
> > > over IB sounds familiar, but don't quote me on that).
> >
> > AT Umich, you guys should test the FhGFS/btrfs combo. The folks
> > at UCI swear about it, although they are only publishing a wee bit.
> > (you know, water cooler gossip)...... Surely the Wolverines do not
> > want those californians getting up on them?
> > Are you guys planning a mesos/spark test?
> > > > Personally, I would read up on these and see how they work. Then,
> > > > based on that, decide if they are likely to assist in the specific
> > > > situation you are interested in.
> > It's a ton of reading. It's not apples-to-apple_cider type of reading.
> > My head hurts.....
> Take a walk outside. Clear air should help you with the headaches :P
Basketball, boobs and bourbon used to work quite well. Now it's mostly
basketball, but I'm working on someone "very cute"......
> > I'm leaning to DFS/LFS
> > (2) Lustre/btrfs and FhGFS/btrfs
> I have insufficient knowledge to advise on either of these.
> One question, why BTRFS instead of ZFS?
I think btrfs has tremendous potential. I tried ZFS a few times,
but the installs are not part of gentoo, so they got borked;
uEFI, grub's move to uuids, etc. were also in the mix. That was almost
a year ago. For whatever reason the clustering folks I have
read and communicated with are using ext4, xfs and btrfs. Prolly
mostly because those are mostly used in their (systemd-inspired)
distros....?
> My current understanding is: - ZFS is production ready, but due to
> licensing issues, not included in the kernel - BTRFS is included, but
> not yet production ready with all planned features.
Yep, the license issue with ZFS is a real killer for me. Besides,
as an old state-machine, C hack, anything with B-tree is fabulous.
Prejudices? Yep, but here, I'm sticking with my gut. Multi-port
RAM can do marvelous things with B-tree data structures. The
rest will become available/stable. Simply, I just trust btrfs, in
my gut.
> For me, Raid6-like functionality is an absolute requirement and latest I
> know is that that isn't implemented in BTRFS yet. Does anyone know when
> that will be implemented and reliable? Eg. what time-frame are we
> talking about?
Now we are "communicating"! We have different visions. I want cheap,
mirrored HD on small numbers of processors (less than 16 for now).
I want max RAM of the highest performance possible. I want my redundancy
in my cluster, with my cluster software deciding when/where/how often
to write out to HD. If the max RAM is not enough, then SSD will
be between the RAM and HD. Also, know this: the GPU will be assimilated
into the processors, just like the FPUs were, some decades ago. Remember
the i386 and the i387 math coprocessor chip? The good folks at OpenGL,
gcc (GNU) and others will soon (eventually?) give us compilers to
automagically use the GPU (and all of that blazingly fast RAM therein),
as slave to Alan's admin authority (some bullship like that).
So, my "Epiphany" is this. The bitches at systemd are to renamed
"StripperD", as they will manage the boot cycle (how fast you can
go down (save power) and come back up (online). The Cluster
will rule off of your hardware, like a "Sheilk" "the ring that rules
them all" be the driver of the gabage collect processes. The cluster
will be like the "knights of the round table"; each node helping, and
standing for those other nodes (nobles) that stumble, always with
extra resources, triple/quad redundancy and solving problems
before that kernel based "piece of" has a chance to anything
other than "go down" or "Come up" online.
We shall see just who the master is of my hardware!
The saddest thing for me is that when I extolled about billion
dollar companies corrupting the kernel development process, I did
not even have those {hat wearing losers} in mind. They are
irrelevant. I was thinking about those semiconductor companies.
You know, the ones that accept billions of dollars from the NSA
and private spooks to embed hardware inside of hardware. The ones
that can use "white noise" as a communications channel. The ones
that can tap a fiber optic cable, with penetration. Those are
the ones to focus on. Not a bunch of "silly boyz"......
My new K_main{} has highlighted a path to neuter systemd.
But I do like how StripperD moves up and down, very quickly.
Cool huh?
It's PARTY TIME!
> Joost
James
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-19 13:41 ` James
@ 2014-09-19 14:56 ` Rich Freeman
2014-09-19 15:06 ` J. Roeleveld
2014-09-19 15:02 ` J. Roeleveld
1 sibling, 1 reply; 25+ messages in thread
From: Rich Freeman @ 2014-09-19 14:56 UTC (permalink / raw
To: gentoo-user
On Fri, Sep 19, 2014 at 9:41 AM, James <wireless@tampabay.rr.com> wrote:
>
> I think btrfs has tremendous potential. I tried ZFS a few times,
> but the installs are not part of gentoo, so they got borked;
> uEFI, grub's move to uuids, etc. were also in the mix. That was almost
> a year ago. For whatever reason the clustering folks I have
> read and communicated with are using ext4, xfs and btrfs. Prolly
> mostly because those are mostly used in their (systemd-inspired)
> distros....?
I do think that btrfs in the long-term is more likely to be mainstream
on linux, but I wouldn't be surprised if getting zfs working on Gentoo
is much easier now. Richard Yao is both a Gentoo dev and a significant
zfs on linux contributor, so I suspect he is doing much of the latter
on the former.
>
> Yep, the license issue with ZFS is a real killer for me. Besides,
> as an old state-machine, C hack, anything with B-tree is fabulous.
> Prejudices? Yep, but here, I'm sticking with my gut. Multi-port
> RAM can do marvelous things with B-tree data structures. The
> rest will become available/stable. Simply, I just trust btrfs, in
> my gut.
I don't know enough about zfs to compare them, but the design of btrfs
has a certain amount of beauty/symmetry/etc to it IMHO. I only have
studied it enough to be dangerous and give some intro talks to my LUG,
but just about everything is stored in b-trees, the design allows both
fixed and non-fixed length nodes within the trees, and just about
everything about the filesystem is dynamic other than the superblocks,
which do little more than ID the filesystem and point to the current
tree roots. The important stuff is all replicated and versioned.
I wouldn't be surprised if it shared many of these design features
with other modern filesystems, and I do not profess to be an expert on
modern filesystem design, so I won't make any claims about btrfs being
better/worse than other filesystems in this regard. However, I would
say that anybody interested in data structures would do well to study
it.
--
Rich
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-19 13:41 ` James
2014-09-19 14:56 ` Rich Freeman
@ 2014-09-19 15:02 ` J. Roeleveld
1 sibling, 0 replies; 25+ messages in thread
From: J. Roeleveld @ 2014-09-19 15:02 UTC (permalink / raw
To: gentoo-user
On Friday, September 19, 2014 01:41:26 PM James wrote:
> J. Roeleveld <joost <at> antarean.org> writes:
> > Out of curiosity, what do you want to simulate?
>
> subsurface flows in porous media, AKA carbon sequestration
> by injection wells. You know, provide proof that those
> that remove hydrocarbons actually put the CO2 back
> and significantly mitigate the effects of their ventures.
Interesting topic. Can't provide advice on that one, though.
> It's like this. I have been struggling with my 17-year-old "genius"
> son, who is a year away from entering medical school, with
> learning responsibility. So I got him a hyperactive, highly
> intelligent (Doberman mix) puppy to nurture, raise, train, love
> and be responsible for. It's one genius pup teaching another
> pup about being responsible.
Overactive kids, always fun.
I try to keep mine busy without computers and TVs for now. (She's going to be
3 in November)
> So goes the earl_bidness.......imho.
>
> > > Many folks are recommending to skip Hadoop/HDFS altogether
> >
> > I agree, Hadoop/HDFS is for data analysis. Like building a profile
> > about people based on the information companies like Facebook,
> > Google, NSA, Walmart, Governments, Banks,.... collect about their
> > customers/users/citizens/slaves/....
> >
> > > and go straight to mesos/spark. RDD (in-memory) cluster
> > > calculations are at the heart of my needs. The opposite end of the
> > > spectrum (loads of small files and small apps) I dunno about, but
> > > I'm all ears.
> > > In the end, my (3) node scientific cluster will morph and support
> > > the typical myriad of networked applications, but I can take
> > > a few years to figure that out, or just copy what smart guys like
> > > you and joost do.....
> >
> >
> > Nope, I'm simply following what you do and provide suggestions where I
> > can.
> > Most of the clusters and distributed computing stuff I do is based on
> > adding machines to distribute the load. But the mechanisms for these are
> > implemented in the applications I work with, not what I design underneath.
> > The filesystems I am interested in are different to the ones you want.
>
> Maybe. I do not know what I want yet. My vision is very lightweight
> workstations running lxqt (small memory footprint) or such, and a bad_arse
> cluster for the heavy lifting running on whatever heterogeneous resources I
> have. From what I've read, the cluster and the file systems are all
> redundant at the cluster level (mesos/spark anyway) regardless of what any
> given processor/system is doing. All of Alan's fantasies (needs) can be
> realized once the cluster stuff is mastered. (chronos, ansible etc etc).
Alan = your son? or?
I would, from the workstation point of view, keep the cluster as a single
entity, to keep things easier.
A cluster FS for workstation/desktop use is generally not suitable for a High
Performance Cluster (HPC) (or vice-versa)
> > I need to provide access to software installation files to a VM server
> > and access to documentation which is created by the users. The
> > VM server is physically next to what I already mentioned as server A.
> > Access to the VM from the remote site will be using remote desktop
> > connections. But to allow faster and easier access to the
> > documentation, I need a server B at the remote site which functions as
> > described. AFS might be suitable, but I need to be able to layer Samba
> > on top of that to allow a seamless operation.
> > I don't want the laptops to have their own cache and then having to
> > figure out how to solve the multiple different changes to documents
> > containing layouts. (MS Word and OpenDocument files).
>
> Ok so your customers (hyperactive problem users) interface to your cluster
> to do their work. When finished you write things out to other servers
> with all of the VM servers. Lots of really cool tools are emerging
> in the cluster space.
Actually, slightly different scenario.
Most work is done at customers systems. Occasionally we need to test software
versions prior to implementing these at customers. For that, we use VMs.
The VM-server we have is currently sufficient for this. When it isn't, we'll
need to add a 2nd VMserver.
On the NAS, we store:
- Documentation about customers + Howto documents on how to best install the
software.
- Installation files downloaded from vendors (We also deal with older versions
that are no longer available. We need to have our own collection to handle
that)
As we are looking into also working from a different location, we need:
- Access to the VM-server (easy, using VPN and Remote Desktops)
- Access to the files (I prefer to have a local 'cache' at the remote location)
It's the access to files part where I need to have some sort of "distributed"
filesystem.
> I think these folks have mesos + spark + samba + nfs all in one box. [1]
> [1] http://www.quantaqct.com/en/01_product/02_detail.php?mid=29&sid=162&id=163&qs=102
Had a quick look; these use MS Windows Storage 2012, which is only failover on
the storage side. I don't see anything related to what we are working with.
> Build rather than purchase? We have to figure out what you and Alan need, on
> a cluster, because it is what most folks need/want. It's the admin_advantage
> part of the cluster. (There are also the Big Science (me) and Web-centric
> needs. Right now they are related projects, but things will coalesce, imho.)
> There is even "Spark_sql" for postgres admins [2].
>
>
> [2] https://spark.apache.org/sql/
Hmm.... that is interesting.
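To make the Spark reference concrete, here is a minimal PySpark sketch of
the two things mentioned above: a cached in-memory RDD computation and the
SQL layer that [2] advertises. It assumes the modern PySpark API; the
local[*] master is just a stand-in for a real mesos/spark cluster and all
names are illustrative.

    from pyspark.sql import SparkSession

    # Local session standing in for a mesos/spark cluster (illustrative only).
    spark = (SparkSession.builder
             .master("local[*]")
             .appName("fs-testing-sketch")
             .getOrCreate())
    sc = spark.sparkContext

    # RDD path: an in-memory dataset, cached and reduced in parallel.
    samples = sc.parallelize(range(1000000)).map(lambda x: x * x).cache()
    print("sum of squares:", samples.sum())

    # Spark SQL path: the same engine queried with plain SQL.
    df = spark.createDataFrame([(i, i % 10) for i in range(1000)],
                               ["value", "bucket"])
    df.createOrReplaceTempView("measurements")
    spark.sql("SELECT bucket, COUNT(*) AS n FROM measurements "
              "GROUP BY bucket ORDER BY bucket").show()

    spark.stop()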
> > > > We use Lustre for our high performance general storage. I don't
> > > > have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s
> > > > over IB sounds familiar, but don't quote me on that).
> > >
> > > At UMich, you guys should test the FhGFS/btrfs combo. The folks
> > > at UCI swear by it, although they are only publishing a wee bit.
> > > (you know, water cooler gossip)...... Surely the Wolverines do not
> > > want those Californians getting up on them?
> > >
> > > Are you guys planning a mesos/spark test?
> > >
> > > > > Personally, I would read up on these and see how they work. Then,
> > > > > based on that, decide if they are likely to assist in the specific
> > > > > situation you are interested in.
> > >
> > > It's a ton of reading. It's not apples-to-apple_cider type of reading.
> > > My head hurts.....
> >
> > Take a walk outside. Clear air should help you with the headaches :P
>
> Basketball, Boobs and Bourbon used to work quite well. Now it's mostly
> basketball, but I'm working on someone "very cute"......
Cloning? Genetics?
Now that I am interested in. I could do with a couple of clones. ;)
Btw, there are women who know more about some aspects of IT than you and me
put together. Some of those even manage to look great as well ;)
> > > I'm leaning to DFS/LFS
> > > (2) Lustre/btrfs and FhGFS/btrfs
> >
> > I have insufficient knowledge to advise on either of these.
> > One question, why BTRFS instead of ZFS?
>
> I think btrfs has tremendous potential. I tried ZFS a few times,
> but the installs are not part of gentoo, so they got borked;
> uEFI, grub with UUIDs, etc. etc. were also in the mix. That was almost
> a year ago.
I did a quick test with Gentoo and ZFS. With the current documentation and
ebuilds, it is actually quite simple to get working, provided you don't intend
to use it for the root filesystem.
> For whatever reason the clustering folks I have
> read and communicated with are using ext4, xfs and btrfs. Prolly
> mostly because those are mostly used in their (systemd-inspired)
> distros....?
I think mostly because they are included natively in the kernel and, when
dealing with HPC, you don't want to use a filesystem that is known to eat
memory for breakfast.
When I switch the NAS over to ZFS, I will be using a dedicated machine with
16GB of memory. Probably going to increase that to 32GB not too long after.
> > My current understanding is: - ZFS is production ready, but due to
> > licensing issues, not included in the kernel - BTRFS is included, but
> > not yet production ready with all planned features.
>
> Yep, the license issue with ZFS is a real killer for me. Besides,
> as an old state-machine, C hack, anything with B-tree is fabulous.
> Prejudices? Yep, but here, I'm sticking with my gut. Multi-port
> RAM can do marvelous things with B-tree data structures. The
> rest will become available/stable. Simply, I just trust btrfs, in
> my gut.
I think both are stable and usable, with the limitations I currently see and
confirmed by Rich.
> > For me, Raid6-like functionality is an absolute requirement and latest I
> > know is that that isn't implemented in BTRFS yet. Does anyone know when
> > that will be implemented and reliable? Eg. what time-frame are we
> > talking about?
>
> Now we are "communicating"! We have different visions. I want cheap,
> mirrored HD on small numbers of processors (less than 16 for now).
> I want max RAM of the highest performance possible. I want my redundancy
> in my cluster, with my cluster software deciding when/where/how-often
> to write out to HD. If the max_ram is not enough, then SSD will
> be between the RAM and HD. Also, know this: the GPU will be assimilated
> into the processors, just like the FPUs were, some decades ago. Remember
> the i386 and the i387 math coprocessor chip? The good folks at opengl,
> gcc (GNU) and others will soon (eventually?) give us compilers to
> automagically use the GPU (and all of that blazingly fast RAM therein),
> as slave to Alan's admin authority (some bullship like that).
Yep, and for HPC and VMs, you want to keep as much memory available for what
matters.
For a file storage cluster, memory is there to assist the serving of files. (As
that is what matters there)
> So, my "Epiphany" is this. The bitches at systemd are to renamed
> "StripperD", as they will manage the boot cycle (how fast you can
> go down (save power) and come back up (online). The Cluster
> will rule off of your hardware, like a "Sheilk" "the ring that rules
> them all" be the driver of the gabage collect processes.
Aargh, garbage collectors...
They tend to spring into action when least convenient...
Try to be able to control when they start cleaning.
> The cluster
> will be like the "knights of the round table"; each node helping, and
> standing in for those other nodes (nobles) that stumble, always with
> extra resources, triple/quad redundancy, and solving problems
> before that kernel-based "piece of" has a chance to do anything
> other than "go down" or "come up" online.
Interesting, need to parse this slowly over the weekend.
> We shall see just who the master is of my hardware!
> The saddest thing for me is that when I complained about billion
> dollar companies corrupting the kernel development process, I did
> not even have those {hat wearing losers} in mind. They are
> irrelevant. I was thinking about those semiconductor companies.
> You know, the ones that accept billions of dollars from the NSA
> and private spooks to embed hardware inside of hardware. The ones
> that can use "white noise" as a communications channel. The ones
> that can tap a fiber optic cable, with penetration. Those are
> the ones to focus on. Not a bunch of "silly boyz"......
For that, you need to keep the important sensitive data off the grid.
> My new K_main{} has highlighted a path to neuter systemd.
> But I do like how StripperD moves up and down, very quickly.
I don't care about boot times or shutdown times. If I did, I'd invest in high
speed ram disks and SSDs.
Having 50 of the fastest SSDs in a Raid-0 config will give more data than the
rest of the system can handle ;)
If you then use that for VMs which can also keep the entire virtual disk in
memory, you really are flying with performance. That's why in-memory
systems are becoming popular again.
> Cool huh?
> It's PARTY TIME!
Parties are nice...
--
Joost
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-19 14:56 ` Rich Freeman
@ 2014-09-19 15:06 ` J. Roeleveld
0 siblings, 0 replies; 25+ messages in thread
From: J. Roeleveld @ 2014-09-19 15:06 UTC (permalink / raw
To: gentoo-user
On Friday, September 19, 2014 10:56:59 AM Rich Freeman wrote:
> On Fri, Sep 19, 2014 at 9:41 AM, James <wireless@tampabay.rr.com> wrote:
> > I think btrfs has tremendous potential. I tried ZFS a few times,
> > but the installs are not part of gentoo, so they got borked;
> > uEFI, grub with UUIDs, etc. etc. were also in the mix. That was almost
> > a year ago. For whatever reason the clustering folks I have
> > read and communicated with are using ext4, xfs and btrfs. Prolly
> > mostly because those are mostly used in their (systemd-inspired)
> > distros....?
>
> I do think that btrfs in the long-term is more likely to be mainstream
> on linux, but I wouldn't be surprised if getting zfs working on Gentoo
> is much easier now. Richard Yao is both a Gentoo dev and significant
> zfs on linux contributor, so I suspect he is doing much of the latter
> on the former.
Don't have the link handy, but there is a howto about it that, when followed,
will give you a ZFS pool running on Gentoo in a very short time. (emerge zfs is
the longest part of the whole thing)
No reboot even needed.
> > Yep, the license issue with ZFS is a real killer for me. Besides,
> > as an old state-machine, C hack, anything with B-tree is fabulous.
> > Prejudices? Yep, but here, I'm sticking with my gut. Multi-port
> > RAM can do marvelous things with B-tree data structures. The
> > rest will become available/stable. Simply, I just trust btrfs, in
> > my gut.
>
> I don't know enough about zfs to compare them, but the design of btrfs
> has a certain amount of beauty/symmetry/etc to it IMHO. I only have
> studied it enough to be dangerous and give some intro talks to my LUG,
> but just about everything is stored in b-trees, the design allows both
> fixed and non-fixed length nodes within the trees, and just about
> everything about the filesystem is dynamic other than the superblocks,
> which do little more than ID the filesystem and point to the current
> tree roots. The important stuff is all replicated and versioned.
>
> I wouldn't be surprised if it shared many of these design features
> with other modern filesystems, and I do not profess to be an expert on
> modern filesystem design, so I won't make any claims about btrfs being
> better/worse than other filesystems in this regard. However, I would
> say that anybody interested in data structures would do well to study
> it.
I like the idea of both and hope BTRFS will also come with the raid-6-like
features and good support for larger drive counts (I've got 16 available for
the filestorage) to make it, for me, a viable alternative to ZFS.
--
Joost
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] Re: File system testing
2014-09-18 13:12 ` Alec Ten Harmsel
@ 2014-09-19 15:21 ` Kerin Millar
0 siblings, 0 replies; 25+ messages in thread
From: Kerin Millar @ 2014-09-19 15:21 UTC (permalink / raw
To: gentoo-user
On 18/09/2014 14:12, Alec Ten Harmsel wrote:
>
> On 09/18/2014 05:17 AM, Kerin Millar wrote:
>> On 17/09/2014 21:20, Alec Ten Harmsel wrote:
>>> As far as HDFS goes, I would only set that up if you will use it for
>>> Hadoop or related tools. It's highly specific, and the performance is
>>> not good unless you're doing a massively parallel read (what it was
>>> designed for). I can elaborate why if anyone is actually interested.
>>
>> I, for one, am very interested.
>>
>> --Kerin
>>
>
> Alright, here goes:
>
> Rich Freeman wrote:
>
>> FYI - one very big limitation of hdfs is that its minimum filesize is
>> something huge like 1MB or something like that. Hadoop was designed
>> to take a REALLY big input file and chunk it up. If you use hdfs to
>> store something like /usr/portage it will turn into the sort of
>> monstrosity that you'd actually need a cluster to store.
>
> This is exactly correct, except we run with a block size of 128MB, and a large cluster will typically have a block size of 256MB or even 512MB.
>
> HDFS has two main components: a NameNode, which keeps track of which blocks are a part of which file (in memory), and the DataNodes that actually store the blocks. No data ever flows through the NameNode; it negotiates transfers between the client and DataNodes and negotiates transfers for jobs. Since the NameNode stores metadata in-memory, small files are bad because RAM gets wasted.
>
> What exactly is Hadoop/HDFS used for? The most common uses are generating search indices on data (which is a batch job) and doing non-realtime processing of log streams and/or data streams (another batch job), and allowing a large number of analysts to run disparate queries on the same large dataset (another batch job). Batch processing - processing the entire dataset - is really where Hadoop shines.
>
> When you put a file into HDFS, it gets split based on the block size. This is done so that a parallel read will be really fast - each map task reads in a single block and processes it. Ergo, if you put in a 1GB file with a 128MB block size and run a MapReduce job, 8 map tasks will be launched. If you put in a 1TB file, 8192 tasks would be launched. Tuning the block size is important to optimize the overhead of launching tasks vs. potentially under-utilizing a cluster. Typically, a cluster with a lot of data has a bigger block size.
>
> The downsides of HDFS:
> * Seeked reads are not supported afaik because no one needs that for batch processing
> * Seeked writes into an existing file are not supported because either blocks would be added in the middle of a file and wouldn't be 128MB, or existing blocks would be edited, resulting in blocks larger than 128MB. Both of these scenarios are bad.
>
> Since HDFS users typically do not need seeked reads or seeked writes, these downsides aren't really a big deal.
>
> If something's not clear, let me know.
Thank you for taking the time to explain.
--Kerin
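To put numbers on the block splitting described above, here is a small
Python sketch of the arithmetic; the 128MB block size and the 1GB/1TB
examples are the ones given above, while the helper name is illustrative:

    import math

    def map_tasks(file_size_bytes, block_size_bytes):
        """One HDFS block per map task, the default split behaviour described above."""
        return max(1, math.ceil(file_size_bytes / block_size_bytes))

    MB = 1024 ** 2
    GB = 1024 ** 3
    TB = 1024 ** 4

    print(map_tasks(1 * GB, 128 * MB))    # -> 8 map tasks for a 1GB file
    print(map_tasks(1 * TB, 128 * MB))    # -> 8192 map tasks for a 1TB file

    # Why small files hurt: every file costs at least one block entry in the
    # NameNode's RAM, so a tree of tiny files (e.g. /usr/portage) occupies as
    # many NameNode entries as a handful of huge inputs, while each map task
    # gets almost no data to chew on.
    print(map_tasks(4 * 1024, 128 * MB))  # a 4KB file still occupies one block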
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] File system testing
2014-09-16 19:07 [gentoo-user] File system testing James
2014-09-17 7:45 ` J. Roeleveld
2014-09-17 18:10 ` [gentoo-user] " Hervé Guillemet
@ 2014-09-25 20:47 ` thegeezer
2 siblings, 0 replies; 25+ messages in thread
From: thegeezer @ 2014-09-25 20:47 UTC (permalink / raw
To: gentoo-user
On 16/09/14 20:07, James wrote:
> Hello,
>
> By now many are familiar with my keen interest in clustering gentoo
> systems. So, what most cluster technologies use is a distributed file
> system on top of the local (HD/SSD) file system. Naturally not
> all file systems, particularly the distributed file systems, have
> straightforward instructions. Also, a device file system, such as
> XFS, and a distributed (on top of the device file system) combination
> may not work very well when paired. So a variety of testing is
> something I'm researching. Elimination of either file system
> listed below, due to Gentoo User Experience is most welcome information,
> as well as tips and tricks to setting up any file system.
>
>
> Distributed File Systems (DFS):
> HDFS (poor performance)
> Lustre
> Ceph
> XtreemFS
> GlusterFS
> MooseFS
> FhGFS (BeeGFS) soon to be entirely open sourced?
> Any other distributed file systems I should consider using?
>
> Local (Device) File Systems LFS:
> btrfs
> zfs
> ext4
> xfs
>
> Obviously I do not want to test all combinations of DFS/LocalFS
> so your comments are extremely welcome as is any and all
> related information.
>
> James
>
>
howdy,
you might also like to see about GFS2, OCFS and OrangeFS.
GFS2 for me was a major effort to get going on gentoo; OCFS worked almost
out of the box, but is from oracle.
in all cases writes were the biggest hurdle for me due to the
distributed lock mechanisms
ymmv
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-user] File system testing
2014-09-17 18:21 ` J. Roeleveld
2014-09-17 21:05 ` [gentoo-user] " James
2014-09-18 8:28 ` [gentoo-user] " Kerin Millar
@ 2014-09-25 20:56 ` thegeezer
2 siblings, 0 replies; 25+ messages in thread
From: thegeezer @ 2014-09-25 20:56 UTC (permalink / raw
To: gentoo-user
On 17/09/14 19:21, J. Roeleveld wrote:
> AFS has caching and can survive temporary disappearance of the server.
> For me, I need to be able to provide Samba filesharing on top of that
> layer on 2 different locations as I don't see the network bandwidth to
> be sufficient for normal operations. (ADSL uplinks tend to be dead
> slow) -- Joost
Riverbed wan appliances were always great for this. I would have loved
to see an open source version of their hash-zip-send as it worked
amazingly well.
however, from [1] you can mount.cifs with option fsc, and perhaps (sorry
not tried myself) then use something like cachefs to make for a
controlled size and location for that cache? also [2] might be of
interest to you
"
fsc Enable local disk caching using FS-Cache (off by default). This
option could be useful to improve performance on a slow link,
heavily loaded server and/or network where reading from the
disk is faster than reading from the server (over the network).
This could also impact scalability positively as the
number of calls to the server are reduced. However, local
caching is not suitable for all workloads for e.g. read-once
type workloads. So, you need to consider carefully your
workload/scenario before using this option. Currently, local
disk caching is functional for CIFS files opened as read-only.
"
[1] https://www.kernel.org/doc/readme/Documentation-filesystems-cifs-README
[2] http://www.cyberciti.biz/faq/centos-redhat-install-configure-cachefilesd-for-nfs/
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2014-09-25 20:57 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-16 19:07 [gentoo-user] File system testing James
2014-09-17 7:45 ` J. Roeleveld
2014-09-17 15:55 ` [gentoo-user] " James
2014-09-17 19:34 ` J. Roeleveld
2014-09-17 20:20 ` Alec Ten Harmsel
2014-09-17 20:56 ` James
2014-09-18 8:24 ` J. Roeleveld
2014-09-18 9:48 ` Rich Freeman
2014-09-18 10:22 ` J. Roeleveld
2014-09-19 13:41 ` James
2014-09-19 14:56 ` Rich Freeman
2014-09-19 15:06 ` J. Roeleveld
2014-09-19 15:02 ` J. Roeleveld
2014-09-18 8:04 ` J. Roeleveld
2014-09-18 9:17 ` Kerin Millar
2014-09-18 13:12 ` Alec Ten Harmsel
2014-09-19 15:21 ` Kerin Millar
2014-09-17 18:10 ` [gentoo-user] " Hervé Guillemet
2014-09-17 18:21 ` J. Roeleveld
2014-09-17 21:05 ` [gentoo-user] " James
2014-09-18 7:29 ` J. Roeleveld
2014-09-18 8:28 ` [gentoo-user] " Kerin Millar
2014-09-25 20:56 ` thegeezer
2014-09-18 15:32 ` [gentoo-user] " James
2014-09-25 20:47 ` [gentoo-user] " thegeezer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox