public inbox for gentoo-science@lists.gentoo.org
* [gentoo-science] [Fwd: [atlas-devel] 3.7.31 and threading problem]
@ 2007-05-18 19:06 M. Edward (Ed) Borasky
  2007-05-18 19:31 ` Markus Dittrich
  0 siblings, 1 reply; 5+ messages in thread
From: M. Edward (Ed) Borasky @ 2007-05-18 19:06 UTC
  To: gentoo-science

[-- Attachment #1: Type: text/plain, Size: 42 bytes --]

Is this applicable to the Gentoo ebuilds?

[-- Attachment #2: [atlas-devel] 3.7.31 and threading problem.eml --]
[-- Type: message/rfc822, Size: 5961 bytes --]

From: Clint Whaley <whaley@cs.utsa.edu>
To: math-atlas-devel@lists.sourceforge.net
Subject: [atlas-devel] 3.7.31 and threading problem
Date: Thu, 17 May 2007 16:45:32 -0500
Message-ID: <200705172145.l4HLjW1A007591@pandora1.cs.utsa.edu>

Guys,

OK, ATLAS 3.7.31 has finally escaped.  It's been a while since the last 
developer release, but I've actually been working pretty much full-time on it
for a while now.

I've added support for MIPS, and some assembly kernels tuned for static
MIPS archs. I've also updated config to handle the MIPS/Linux system I
had access to.  I've also finally got OS X/G5 (AKA PPC970) support in
the new framework.  It took a lot of hoop-jumping.  I presently have full
config support only for OS X/G5 (including 64 & 32 bit arch defs); I have no
idea how OS X/G4 will do, and don't know if things will work under Linux or
AIX (all on the to-do list, if/when I get time & access).

More importantly, we now have much better kernels for PowerPC970FX.  Single
precision now has an assembly kernel, which I wrote because the 
compiler-controlled altivec kernel had to be massaged with every compiler
release, and so it was proving impossible to keep up to date.  In assembly,
I could make things a good deal faster, and so we now achieve almost 79%
of peak in the kernel (the old kernel got something like 62% of peak).

For double precision, a student in my Fundamentals of High Performance
Optimization class, Tony Castaldo, found a cool trick for PPC970 FPU
code: you get much better performance if you issue your instructions
in sets of four (generally, 4 integer or load ops, followed by 4 fpops, etc).
Tony also noticed that by mixing the iterations of the M-loop you could push
performance slightly higher yet.  With these tricks, ATLAS's kernel performance
went from roughly 75% of peak to over 82%.

Therefore, on my 2GHz PowerPC970, I can now achieve over 6Gflop in a DGEMM,
and almost 12Gflop in SGEMM for one processor (though you have to run big
problems to see these high numbers).
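
As a rough cross-check of the figures above, here is a minimal C sketch that
relates "percent of peak" to Gflop/s. The flops-per-cycle values are an
assumption not stated in the message: 4 double-precision flops per cycle (two
FPUs, each issuing one fused multiply-add) and 8 single-precision flops per
cycle (one 4-wide AltiVec multiply-add). The function names are hypothetical
helpers, not part of ATLAS.

```c
#include <assert.h>

/* Theoretical peak in Gflop/s: clock rate (GHz) times flops per cycle.
   The flops-per-cycle value is an assumption supplied by the caller. */
double peak_gflops(double clock_ghz, double flops_per_cycle)
{
    assert(clock_ghz > 0.0 && flops_per_cycle > 0.0);
    return clock_ghz * flops_per_cycle;
}

/* Rate achieved at a given fraction of peak (0.82 means 82% of peak). */
double achieved_gflops(double peak, double fraction_of_peak)
{
    assert(fraction_of_peak >= 0.0 && fraction_of_peak <= 1.0);
    return peak * fraction_of_peak;
}
```

Under those assumptions, peak_gflops(2.0, 4.0) gives 8 Gflop/s for double
precision, and 82% of that is a little over 6.5 Gflop/s, in line with the
"over 6Gflop" DGEMM figure. For single precision, peak_gflops(2.0, 8.0) gives
16 Gflop/s; a full SGEMM would be expected to land somewhat below the 79%
kernel rate, which fits "almost 12Gflop".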

I also noticed something on the threading front which may be critical for
some of you.  For very large problems (e.g., N > 2K) ATLAS's threaded
performance dropped badly, to below serial.  The reason is tied into
memory allocation.  I'm pretty sure there's a fix that will allow the
threaded code to handle this better, but in the meantime, if you experience
this problem, pump up the maximum amount of workspace ATLAS is allowed by
increasing the macro ATL_MaxMalloc in ATLAS/atlas_lv3.h.  It is presently at
16MB; for my machine, I set it to 160MB, and then I never saw the problem
again :)
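
To get a feel for why N > 2K might run into a 16MB cap, the sketch below
compares the size of a single N x N matrix of doubles against the two limits
mentioned above. This is only an illustration: the message says the slowdown
is tied to memory allocation but does not say how much workspace ATLAS
actually requests, so treating one matrix copy as the workspace is an
assumption. LIMIT_16MB, LIMIT_160MB, matrix_bytes and fits_in_workspace are
hypothetical names; the real setting is the ATL_MaxMalloc macro in
ATLAS/atlas_lv3.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two ATL_MaxMalloc values discussed above. */
#define LIMIT_16MB  ((size_t)16 * 1024 * 1024)
#define LIMIT_160MB ((size_t)160 * 1024 * 1024)

/* Bytes needed to hold one dense n x n matrix of doubles. */
size_t matrix_bytes(size_t n)
{
    return n * n * sizeof(double);
}

/* Returns 1 if one n x n double matrix fits under limit_bytes, else 0.
   Assumption: workspace demand is on the order of one matrix copy. */
int fits_in_workspace(size_t n, size_t limit_bytes)
{
    assert(limit_bytes > 0);
    return matrix_bytes(n) <= limit_bytes;
}
```

With 8-byte doubles, a 2048 x 2048 matrix takes 32MB, so
fits_in_workspace(2048, LIMIT_16MB) is 0 while
fits_in_workspace(2048, LIMIT_160MB) is 1, which is at least consistent with
the problem showing up around N > 2K and going away at 160MB.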

Cheers,
Clint


