From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from lists.gentoo.org ([140.105.134.102] helo=robin.gentoo.org)
	by nuthatch.gentoo.org with esmtp (Exim 4.43) id 1DwSIU-0008DW-HN
	for garchives@archives.gentoo.org; Sat, 23 Jul 2005 22:16:42 +0000
Received: from robin.gentoo.org (localhost [127.0.0.1])
	by robin.gentoo.org (8.13.4/8.13.4) with SMTP id j6NMEeIt022718;
	Sat, 23 Jul 2005 22:14:40 GMT
Received: from rwcrmhc12.comcast.net (rwcrmhc13.comcast.net [204.127.198.39])
	by robin.gentoo.org (8.13.4/8.13.4) with ESMTP id j6NMEdoi018272
	for ; Sat, 23 Jul 2005 22:14:40 GMT
Received: from [192.168.0.123] (pcp04370732pcs.nrockv01.md.comcast.net[69.140.218.245])
	by comcast.net (rwcrmhc13) with ESMTP id <20050723221520015009vv1ve>;
	Sat, 23 Jul 2005 22:15:20 +0000
Message-ID: <42E2C177.6090001@erols.com>
Date: Sat, 23 Jul 2005 18:15:19 -0400
From: Matt Randolph
User-Agent: Mozilla Thunderbird 1.0.5 (X11/20050714)
X-Accept-Language: en-us, en
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-amd64@gentoo.org
Reply-to: gentoo-amd64@lists.gentoo.org
MIME-Version: 1.0
To: gentoo-amd64@lists.gentoo.org
Subject: Re: [gentoo-amd64] x86_64 optimization patches for glibc.
References: <42E258A7.5080501@telia.com>
In-Reply-To: <42E258A7.5080501@telia.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Archives-Salt: 497e95ea-1f66-43d6-8e5b-b55fc0fbefa8
X-Archives-Hash: bf9d533850db82ce942fc8a7765465ad

Simon Strandman wrote:

> Hi!
>
> Some binary distros like Mandrake and suse patches their glibcs with
> x86_64 optimized strings and an x86_64 optimized libm to improve
> performance.
>
> I tried extracting those patches from an mandrake SRPM and add them to
> the glibc 2.3.5 ebuild. The x86_64 optimized strings patch built and
> worked perfectly and gave a large speedup as you can see below. But I
> couldn't get glibc to build with the libm patch because of unresolved
> symbols (and I'm no programmer so I have no idea how to fix that).
>
> I found a small C program on a suse mailing-list to measure glibc
> memory copy performance:
> http://lists.suse.com/archive/suse-amd64/2005-Mar/0220.html
>
> With the glibc 2.3.5 currently in gentoo I get:
> isidor ~ # ./memcpy 2200 1000 1048576
> Memory to memory copy rate = 1291.600098 MBytes / sec. Block size =
> 1048576.
>
> But with glibc 2.3.5 + amd64 optimized strings I get:
> isidor ~ # ./memcpy 2200 1000 1048576
> Memory to memory copy rate = 2389.321777 MBytes / sec. Block size =
> 1048576.
>
> That's an improvement of over 1000mb/s! Suse 9.3 also gives about
> 2300mb/s out of the box.
>
> How about adding these patches to gentoo? Perhaps in glibc 2.3.5-r1
> before it leaves package.mask? I'll create a bugreport about it if you
> agree!
>
> This .tar.bz2 contains the glibc directory from my overlay with the
> mandrake patches included in files/mdk, but the libm patches are
> commented out in the ebuild.
> http://snigel.no-ip.com/~nxsty/linux/glibc.tar.bz2
>

There is a bug in the original memcpy.c that will cause a segfault if
you don't pass it any parameters. Here is a fixed version. I've left
everything else alone (except for a spelling correction).

// memcpy.c - Measure how fast we can copy memory

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>   /* only needed if the commented-out clock() calls are re-enabled */

/* timing function */
#define rdtscll(val) do { \
    unsigned int a,d; \
    asm volatile("rdtsc" : "=a" (a), "=d" (d)); \
    (val) = ((unsigned long)a) | (((unsigned long)d)<<32); \
} while(0)

int main(int argc, char *argv[]) {
    int cpu_rate, num_loops, block_size, block_size_lwords, i, j;
    unsigned char *send_block_p, *rcv_block_p;
    unsigned long start_time, end_time;
    float rate;
    unsigned long *s_p, *r_p;

    if (argc != 4) {
        fprintf(stderr, "Usage: %s <cpu_rate> <num_loops> <block_size>\n", argv[0]);
        return 1;
    }

    cpu_rate = atoi(argv[1]);    /* CPU clock in MHz */
    num_loops = atoi(argv[2]);
    block_size = atoi(argv[3]);

    /* round the block size down to a whole number of unsigned longs */
    block_size_lwords = block_size / sizeof(unsigned long);
    block_size = sizeof(unsigned long) * block_size_lwords;

    send_block_p = malloc(block_size);
    rcv_block_p = malloc(block_size);

    if ((send_block_p == NULL) || (rcv_block_p == NULL)) {
        fprintf(stderr, "Malloc failed to allocate block(s) of size %d.\n",
                block_size);
        return 1;    /* do not go on to memcpy() through a NULL pointer */
    }

//  start_time = clock();
    rdtscll(start_time);

    for (i = 0; i < num_loops; i++) {
        memcpy(rcv_block_p, send_block_p, block_size);

//      s_p = (unsigned long *) send_block_p;
//      r_p = (unsigned long *) rcv_block_p;
//
//      for (j = 0 ; j < block_size_lwords; j++) {
//          *(r_p++) = *(s_p++);
//      }
    }

//  end_time = clock();
    rdtscll(end_time);

    rate = (float) (block_size) * (float) (num_loops) /
           ((float) (end_time - start_time)) *
           ((float) cpu_rate) * 1.0E6 / 1.0E6;

    fprintf(stdout,
            "Memory to memory copy rate = %f MBytes / sec. Block size = %d.\n",
            rate, block_size);

    return 0;
} /* end main() */

-- 
"Pluralitas non est ponenda sine necessitate" - W. of O.

-- 
gentoo-amd64@gentoo.org mailing list
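
P.S. A note on the rate line near the end of main(), for anyone comparing
numbers: the two factors of 1.0E6 cancel, so the printed figure is simply
bytes copied per TSC cycle multiplied by the CPU clock in MHz (the first
argument to ./memcpy). Below is a minimal sketch of that arithmetic as a
standalone function. The name copy_rate_mbytes and its parameters are my own
and not part of memcpy.c, and it assumes the TSC ticks at the clock rate you
pass in, which may not hold if the CPU is scaling its frequency.

```c
/* Hypothetical helper, not part of the posted memcpy.c: turn a TSC cycle
 * count into MBytes / sec, given the CPU clock in MHz.
 * Assumption: the TSC advances at cpu_mhz million ticks per second. */
double copy_rate_mbytes(double bytes_copied, double tsc_cycles, double cpu_mhz)
{
    /* seconds    = tsc_cycles / (cpu_mhz * 1e6)
     * MBytes/sec = bytes_copied / seconds / 1e6
     * The two 1e6 factors cancel, leaving bytes per cycle times MHz. */
    return bytes_copied / tsc_cycles * cpu_mhz;
}
```

As a sanity check with made-up numbers: 1000 copies of a 1048576-byte block
that took exactly one cycle per byte on a 2200 MHz CPU would come out as
copy_rate_mbytes(1048576000.0, 1048576000.0, 2200.0) = 2200.0 MBytes / sec.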