* [gentoo-science] [Fwd: Re: [atlas-devel] 1) ATLAS shared libraries; 2) "ASM" -> "ASM VOLATILE"]
From: Adam Piątyszek @ 2006-08-25 6:52 UTC
To: Markus Dittrich; +Cc: gentoo-science
Dear Markus, gentoo-science guys,
Please find below Clint's reply to the email I sent yesterday about
our work on ATLAS shared libraries in Gentoo.
Markus, I think we can help answer questions (2) and (3). Of
course, volunteers from gentoo-science are welcome as well.
BR,
/ediap
-------- Original message --------
Subject: Re: [atlas-devel] 1) ATLAS shared libraries; 2) "ASM" -> "ASM VOLATILE"
Date: Thu, 24 Aug 2006 17:44:19 -0500
From: Clint Whaley <whaley@cs.utsa.edu>
Reply-To: List for developer discussion, NOT SUPPORT.
<math-atlas-devel@lists.sourceforge.net>
To: math-atlas-devel@lists.sourceforge.net
Adam,
>1) In parallel to your great work on new ATLAS releases, one Gentoo
>developer (Markus) and I have been working on preparing an updated set of
>patches to build both static and shared libraries of ATLAS.
Great!
>I am aware that you recommend using ATLAS as a static library only,
>because of its better performance (I do not know the real difference, though).
>But in Gentoo, shared libraries are preferred.
>Could you please comment on the performance differences and on possibly
>extending the official ATLAS package with optional support for shared
>libraries? We are keen to support you with our patches.
OK, ATLAS still defaults to .a because it's what I use :) Back in the day,
ATLAS was mainly used in HPC, characterized by big applications that often
ran on parallel machines. Linking in an extra lib was the least of these
guys' worries.
I'm still an HPC guy at heart, so I always use .a when available. However,
I do not believe the performance difference should be noticeable to the average
user. Here's what I *think* are the effects of shared libs:
(1) An extra register is used to store the ptr to a table in memory for
global memory (this is what -fpic does, I think)
-- I don't think this hurts ATLAS, because ATLAS doesn't use global
memory. I assume (I don't know) that ATLAS's assembly is still
allowed to use that register, as long as it saves and restores it . . .
(2) The first time the routine is called, there is greater overhead in a
.so, 'cause you have to load the shared object at that time
-- not sure how much worse this is than .a, since you usually have
to hit the disk on first load; .so is probably more likely to be
on completely different pages, I guess . . .
In these days where ATLAS is used for a whole lot of non-HPC things, as well
as being wrapped and plugged into high-level things like PSEs and Python,
my suspicion is that the *majority* of users would like shared libraries.
So, supporting an out-of-box build to .so is definitely in my plans,
I just haven't got around to doing the work yet. Because I have no real
experience with shared libs, I have questions that will need to be
investigated before I can do this:
(1) Is it true that the extra pointer may still be used if we restore it at
end of assembly routine?
(2) Does adding -fpic, or whatever other compiler flags are required, change
the best cases (thus necessitating doubling the arch defaults)?
(3) What is the overall performance effect of using .so?
I've tried to answer (1) by looking at some docs, but never got convinced
either way. I've been meaning to write a register stress-test to see if
I can make gcc use the reserved register in a function w/o global data.
Perhaps you know?
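
A register stress-test of the kind described above could start from something like the sketch below. It is only an illustration under stated assumptions, not Clint's test: the function name reg_pressure and the arithmetic are made up, and the only point is to keep many values live in a function that touches no global data. Assuming 32-bit x86 and GCC, one way to use it is to compile with "gcc -m32 -O2 -S" once with and once without -fpic and check whether %ebx (traditionally the GOT pointer in i386 PIC code) shows up as an ordinary register in the generated assembly.

```c
#include <stddef.h>

/* Hypothetical stress kernel: it touches no global data and keeps
 * eight accumulators live across the loop, so a 32-bit x86 compiler
 * runs short of general-purpose registers. */
long reg_pressure(const long *v, size_t n)
{
    long a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8;

    for (size_t i = 0; i < n; i++) {
        long x = v[i];
        a += x;      b ^= a;
        c += b & x;  d -= c;
        e += d | x;  f ^= e;
        g += f;      h -= g ^ x;
    }
    /* Combine everything so no accumulator can be optimized away. */
    return a + b + c + d + e + f + g + h;
}
```

Whether the compiler is willing to allocate the PIC register here depends on the GCC version and target, so the assembly output is the thing to inspect rather than the return value.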
You guys could help with (2) & (3) if you like. You could build out-of-box
to .a on whatever machines you can, and then build it to .so using your
gentoo harness, and post some head-to-head timings . . . If, as we suspect,
the difference is essentially zero, that makes .so a lot more attractive . . .
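
A head-to-head timing of this kind could be driven by a small harness like the sketch below. It is a minimal example under assumptions: naive_dgemm and bench_mflops are hypothetical names, and the naive triple loop merely stands in for the ATLAS routine being measured. It uses the standard clock() call, so it reports CPU time rather than wall-clock time.

```c
#include <stddef.h>
#include <time.h>

/* Naive C = A * B for n x n row-major matrices.  This is only a
 * stand-in for the library routine whose .a and .so builds would be
 * compared. */
void naive_dgemm(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Run the kernel `reps` times and return MFLOPS, counting 2*n^3
 * floating-point operations per call.  Returns 0.0 when the run was
 * too short for clock() to measure. */
double bench_mflops(size_t n, int reps,
                    const double *A, const double *B, double *C)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        naive_dgemm(n, A, B, C);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    double flops = 2.0 * (double)n * (double)n * (double)n * reps;
    return secs > 0.0 ? flops / secs / 1e6 : 0.0;
}
```

For the actual comparison, the call to naive_dgemm would be replaced by the ATLAS routine of interest, and the same source would be linked once against the static library and once against the shared one, on the same machine with the same matrix sizes.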
I doubt I'll spend a large amount of time getting .so in before getting
a new stable out, but if it doesn't require a huge amount of changes,
and someone can outline it to me so that I can see the tricks work
generally (i.e., not just one version of Linux), I'd certainly welcome
help with this . . .
>To build shared ATLAS we replaced most of the compiler variables with
>their special redefinitions using the "libtool". You can have a look at
>the patch in this bug-report: https://bugs.gentoo.org/show_bug.cgi?id=144314
On one of the comments there, you'll be happy to know I just added a
--with-netlib-lapack to config which allows ATLAS to automatically build the
combined lapack library, assuming netlib lapack has been installed prior to
the ATLAS build . . .
>2) BTW, in "include/contrib/camm_dpa.h" header file, we needed to change
>the "ASM" into "ASM VOLATILE" to build shared libraries. I wonder if you
>could incorporate this change into the official ATLAS sources, once
>you are sure that it won't break anything (I am not an assembler expert
>at all). ;)
This is a file written by Camm, who's the Debian ATLAS maintainer. He
also builds to .so, I think. So, Camm, is it OK to make this change?
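
For readers unfamiliar with the qualifier, the sketch below shows what "volatile" means on a GCC inline-assembly statement in general. The ASM and VOLATILE macro definitions are assumptions modeled on the spelling used in this thread; the real definitions in camm_dpa.h may differ, and the sketch does not explain why the shared build needed the change. The function name keep_alive is hypothetical. In GCC, an asm statement with output operands may be deleted when its outputs are unused, or moved and merged with identical statements; adding volatile tells the compiler not to do that.

```c
/* Assumed macro spellings, modeled on the thread's "ASM VOLATILE";
 * the real definitions in camm_dpa.h may differ. */
#define ASM      __asm__
#define VOLATILE __volatile__

/* The asm body is empty, so nothing happens at run time, but the
 * "+r"(x) operand tells the compiler that x is read and possibly
 * modified here, and the "memory" clobber makes the statement a
 * compiler-level memory barrier.  VOLATILE keeps the statement from
 * being deleted even when the result is unused. */
long keep_alive(long x)
{
    ASM VOLATILE ("" : "+r"(x) : : "memory");
    return x;
}
```

Because the empty template emits no instructions, this compiles with GCC or Clang on any architecture; real kernels would of course contain architecture-specific instructions in the template string.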
Cheers,
Clint
_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
* [gentoo-science] Re: [Fwd: Re: [atlas-devel] 1) ATLAS shared libraries; 2) "ASM" -> "ASM VOLATILE"]
From: Markus Dittrich @ 2006-08-25 12:42 UTC
To: Adam Piątyszek; +Cc: gentoo-science
On Fri, 25 Aug 2006, Adam Piątyszek wrote:
> (1) Is it true that the extra pointer may still be used if we restore it at
> end of assembly routine?
> (2) Does adding -fpic, or whatever other compiler flags are required, change
> the best cases (thus necessitating doubling the arch defaults)?
> (3) What is the overall performance effect of using .so?
>
> I've tried to answer (1) by looking at some docs, but never got convinced
> either way. I've been meaning to write a register stress-test to see if
> I can make gcc use the reserved register in a function w/o global data.
> Perhaps you know?
>
> You guys could help with (2) & (3) if you like. You could build out-of-box
> to .a on whatever machines you can, and then build it to .so using your
> gentoo harness, and post some head-to-head timings . . . If, as we suspect,
> the difference is essentially zero, that makes .so a lot more attractive . . .
>
Hi Adam,
Thanks for talking to upstream about this; Clint's response
sounds encouraging. We could definitely help out with 2) and 3);
it would be good to know anyway how well we do with our shared
libs. In doing so we should also test the impact of using
the 387 floating point unit versus the SSE instruction set.
According to Clint, the former can give a significant performance
gain on some CPUs. If that is the case it might be worth a
note in the ebuild to make our users aware of it.
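
When comparing the two floating-point paths, it helps to confirm which one a given build actually uses. The sketch below is one possible check, assuming GCC on x86: the same source would be built once with "-mfpmath=387" and once with "-msse2 -mfpmath=sse", and the standard C99 macro FLT_EVAL_METHOD from <float.h> then typically reads 2 for x87 (intermediates kept in long double) and 0 for SSE. The function names fp_eval_method and dot are hypothetical, and dot is just a small kernel to time under each setting.

```c
#include <float.h>
#include <stddef.h>

/* Reports how the compiler evaluates floating-point expressions:
 *   0  -> in the operand's own type (typical for SSE math)
 *   2  -> in long double (typical for x87 math)
 *  -1  -> indeterminate */
int fp_eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}

/* Small kernel whose speed (and rounding) can differ between the
 * x87 and SSE code paths. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

The value reported can vary with the compiler version and language standard selected, so it is a sanity check on the build flags rather than a guarantee.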
We should get a hold of a nice benchmark suite for this purpose;
Clint has posted one on this gcc bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
which we might be able to use. I'll have a look at it.
Best,
Markus
--
Markus Dittrich (markusle)
Gentoo Linux Developer
Scientific applications
--
gentoo-science@gentoo.org mailing list
* Re: [gentoo-science] Re: [Fwd: Re: [atlas-devel] 1) ATLAS shared libraries; 2) "ASM" -> "ASM VOLATILE"]
From: M. Edward (Ed) Borasky @ 2006-08-25 14:23 UTC
To: gentoo-science; +Cc: Adam Piątyszek
Markus Dittrich wrote:
> We should get a hold of a nice benchmark suite for this purpose; Clint
> has posted one on this gcc bug
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
> which we might be able to use. I'll have a look at it.
If you have the time, you can turn off all of the preconceived notions
ATLAS has about your architecture and let it benchmark itself. In fact,
for the hard-core number crunchers, you might actually want to put a USE
flag in the ebuild to do a "brute-force" assume-nothing compile, warning
them that it takes a long time and that it should be run after an
"emerge -f" with Linux in single-user mode. My recollection is that it
used to take about 8 hours on a 1.3 GHz Athlon Thunderbird.