From: "M. Edward (Ed) Borasky"
Date: Fri, 18 May 2007 12:06:42 -0700
To: gentoo-science@lists.gentoo.org
Subject: [gentoo-science] [Fwd: [atlas-devel] 3.7.31 and threading problem]

Is this applicable to the Gentoo ebuilds?

-------- Forwarded Message --------
From: Clint Whaley
Date: Thu, 17 May 2007 16:45:32 -0500
To: math-atlas-devel@lists.sourceforge.net
Subject: [atlas-devel] 3.7.31 and threading problem

Guys,

OK, ATLAS 3.7.31 has finally escaped. It's been a while since the last
developer release, but I've actually been working pretty much full-time on it
for a while now.

I've added support for MIPS, including some assembly kernels tuned for static
MIPS archs. I've also updated config to handle the MIPS/Linux system I had
access to.

I've also finally got OS X/G5 (AKA PPC970) support into the new framework. It
took a lot of hoop-jumping. I presently have full config support only for
OS X/G5 (including 64- and 32-bit arch defs); I have no idea how OS X/G4 will
do, and I don't know if things will work under Linux or AIX (all on the to-do
list, if/when I get time and access).

More importantly, we now have much better kernels for the PowerPC970FX. Single
precision now has an assembly kernel, which I wrote because the
compiler-controlled AltiVec kernel had to be massaged with every compiler
release, and so it was proving impossible to keep up to date. In assembly, I
could make things a good deal faster, and so we now achieve almost 79% of peak
in the kernel (the old kernel got something like 62% of peak).

For double precision, a student in my Fundamentals of High Performance
Optimization class, Tony Castaldo, found a cool trick for PPC970 FPU code: you
get much better performance if you issue your instructions in sets of four
(generally, 4 integer or load ops, followed by 4 fpops, etc.). Tony also
noticed that by mixing the iterations of the M-loop you could push performance
slightly higher yet. With these tricks, ATLAS's kernel performance went from
roughly 75% of peak to over 82%. Therefore, on my 2GHz PowerPC970, I can now
achieve over 6 Gflop in a DGEMM, and almost 12 Gflop in SGEMM, on one
processor (though you have to run big problems to see these high numbers).

I also noticed something on the threading front which may be critical for
some of you. For very large problems (e.g. N > 2K), ATLAS's threaded
performance dropped badly, to below serial. The reason is tied to memory
allocation. I'm pretty sure there's a fix that will allow the threaded code to
handle this better, but in the meantime, if you experience this problem, pump
up the maximum amount of workspace ATLAS is allowed by increasing the macro
ATL_MaxMalloc in ATLAS/atlas_lv3.h. It is presently at 16MB; for my machine, I
set it to 160MB, and then I never saw the problem again :)

Cheers,
Clint
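
For anyone curious what the "sets of four" trick looks like in practice, here
is a rough C sketch of the shape (illustrative only, not ATLAS source; the
function name and variables are made up, and the real kernel is hand-written
assembly precisely because a C compiler is free to reschedule statements like
these):

/* Sketch only (not ATLAS code): four loads, then four floating-point
 * multiply-adds, so each PPC970 dispatch group carries a single kind of op.
 * Computes b * sum(A[0..M-1]); assumes M is a multiple of 4. */
static double grouped_scale_sum(int M, const double *A, double b)
{
    double c0 = 0.0, c1 = 0.0, c2 = 0.0, c3 = 0.0;
    for (int i = 0; i < M; i += 4) {
        /* group 1: four loads */
        double a0 = A[i], a1 = A[i+1], a2 = A[i+2], a3 = A[i+3];
        /* group 2: four floating-point multiply-adds */
        c0 += a0 * b;  c1 += a1 * b;  c2 += a2 * b;  c3 += a3 * b;
    }
    return c0 + c1 + c2 + c3;
}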
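
To check whether a given build hits the large-N threading slowdown Clint
mentions, a minimal timing program against the CBLAS interface is enough:
build it once against the threaded ATLAS libraries and once against the serial
ones and compare (the value of N, the timer, and the flop count below are my
choices, not anything from the release):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <cblas.h>

int main(void)
{
    const int N = 3000;   /* "very large" per the mail: N > 2K */
    double *A = malloc((size_t)N * N * sizeof *A);
    double *B = malloc((size_t)N * N * sizeof *B);
    double *C = malloc((size_t)N * N * sizeof *C);
    if (!A || !B || !C) return 1;
    for (long i = 0; i < (long)N * N; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, A, N, B, N, 0.0, C, N);
    gettimeofday(&t1, NULL);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;

    /* a square GEMM is 2*N^3 flops */
    printf("N=%d: %.2f s, %.2f Gflop/s\n", N, secs, 2.0 * N * N * N / secs / 1e9);
    free(A); free(B); free(C);
    return 0;
}

The threaded link line is typically -lptcblas -latlas and the serial one
-lcblas -latlas, though the library names depend on how the build was
configured.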
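
And the workaround itself, spelled out (check the header location and the
units in your own tree before editing; I'm assuming the macro is a plain byte
count, as the 16MB/160MB figures in the mail suggest):

/* ATLAS/atlas_lv3.h per the mail (the header may live under include/ in
 * your tree).  Raise the workspace cap as Clint describes: */
#define ATL_MaxMalloc 167772160UL   /* was 16777216 (16MB); now 160MB */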
--
gentoo-science@gentoo.org mailing list