* [gentoo-amd64] amd64 and kernel configuration
From: Dulmandakh Sukhbaatar @ 2005-07-27 6:29 UTC
To: gentoo-amd64

I'm new to amd64 and don't know how to configure the kernel for best
performance, though I've been using Gentoo since 2004.1. Should I enable
SMP, HyperThreading (the name differs from HyperTransport), and NUMA with
a single processor? I found that with HyperTransport performance is
better than without it, but the help text for the SMP option suggests
that if you have a single processor it's better to disable SMP, and with
SMP disabled there is no option for hyperthreading. Finally, is
hyperthreading the same as HyperTransport or not? If I enable
hyperthreading, do I get HyperTransport? Sorry for my poor English :D.
* [gentoo-amd64] Re: amd64 and kernel configuration
From: Duncan @ 2005-07-27 6:10 UTC
To: gentoo-amd64

Dulmandakh Sukhbaatar posted <20050727062947.73020.qmail@mail.mng.mn>,
excerpted below, on Wed, 27 Jul 2005 14:29:47 +0800:

> Should I enable SMP, HyperThreading (the name differs from
> HyperTransport), and NUMA with a single processor? [...] Finally, is
> hyperthreading the same as HyperTransport or not? If I enable
> hyperthreading, do I get HyperTransport?

HyperTransport is the name of the interconnect technology AMD uses. It's
how the CPU connects to everything else. Therefore, you want that on, or
it'll use slower modes.

Hyperthreading is an Intel technology, used to help compensate for their
very deep CPU pipelining, to minimize the time the CPU spends idle in
case of a branch mispredict, by switching to the other thread while the
first one goes back to memory to get all the stuff it thought it wouldn't
need because it predicted the branch choice wrongly. AMD CPUs don't have
such deep pipelining and have other technology to minimize
branch-mispredict penalties, so they don't benefit much from
hyperthreading, and therefore don't include it. If your CPU is indeed an
AMD64 CPU, you don't want hyperthreading. If it's one of the new Intel
x86_64 CPUs, you may or may not want it, depending on which particular
one it is and whether hyperthreading is enabled on it or not.

SMP is short for Symmetrical Multi-Processing. Traditionally, it meant
you had two CPUs. However, hyperthreading is treated by the kernel as two
CPUs, which is why SMP must be enabled to get the hyperthreading option.

Note that the newest thing to come to x86/x86_64 is dual-core CPUs. These
CPUs actually have two logical CPUs in one package. This is better than
hyperthreading because it's the real thing. Both Intel and AMD have
dual-core units available, but they are quite new and still expensive, so
you aren't likely to have one and not know about it. Again, dual-core is
handled as SMP by the kernel, so you'll want SMP on if you have a
dual-core CPU.

If you are using only a single-core AMD64, you'll want SMP off, because
altho the kernel will work with it on, it'll be more bloated than it
needs to be.

Does that clear up the confusion?

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
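For reference, a minimal sketch of how these choices show up in a 2.6-era
kernel configuration (option names below are from kernels of that vintage
and may differ in other versions):

    # In the kernel source tree: flat view of the current settings.
    cd /usr/src/linux
    grep -E 'CONFIG_SMP|CONFIG_SCHED_SMT' .config

    # Single-socket, single-core AMD64: leave SMP off.
    #   CONFIG_SMP is not set

    # Dual-socket or dual-core: enable SMP. SCHED_SMT only helps
    # hyperthreaded Intel CPUs, so it stays off on AMD hardware.
    #   CONFIG_SMP=y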
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
From: NY Kwok @ 2005-07-27 6:19 UTC
To: gentoo-amd64

On 27/07/2005, at 4:10 PM, Duncan wrote:
> SMP is short for Symmetrical Multi-Processing. Traditionally, it meant
> you had two CPUs. However, hyperthreading is treated by the kernel as
> two CPUs, which is why SMP must be enabled to get the hyperthreading
> option. Note that the newest thing to come to x86/x86_64 is dual-core
> CPUs. These CPUs actually have two logical CPUs in one package. This
> is better than hyperthreading because it's the real thing.

Actually, dual-core means they have two physical cores in one package.
Two logical cores = hyperthreading. ;P

On that note, you want the AMD dual-cores as well, because they are much
better designed (they have the crossbar architecture all ready to drop in
additional cores, whereas the current Intel dual-cores are really ugly
hacks and perform terribly compared to the AMD ones).
* [gentoo-amd64] Re: amd64 and kernel configuration
From: Dulmandakh Sukhbaatar @ 2005-07-27 7:50 UTC
To: gentoo-amd64

Thanks. How can I enable HyperTransport, in the kernel or somewhere else?
Does anyone know about NUMA? I read about it, and it seems to be a
technology for multiprocessor systems. Since I have a single CPU, I don't
need it, right?

> On 27/07/2005, at 4:10 PM, Duncan wrote:
>> SMP is short for Symmetrical Multi-Processing. [snip]
>
> Actually, dual-core means they have two physical cores in one package.
> Two logical cores = hyperthreading. ;P
>
> On that note, you want the AMD dual-cores as well, because they are
> much better designed (they have the crossbar architecture all ready to
> drop in additional cores, whereas the current Intel dual-cores are
> really ugly hacks and perform terribly compared to the AMD ones).
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
From: Michal Žeravík @ 2005-07-27 7:04 UTC
To: gentoo-amd64

So does it mean I should enable SMP support for an Athlon 64 (Winchester,
Venice)?

michal

Dulmandakh Sukhbaatar wrote:
> Thanks. How can I enable HyperTransport, in the kernel or somewhere
> else? Does anyone know about NUMA? I read about it, and it seems to be
> a technology for multiprocessor systems. Since I have a single CPU, I
> don't need it, right? [snip]
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
From: netpython @ 2005-07-27 9:58 UTC
To: gentoo-amd64

I have enabled SMP on my Gentoo AMD64 system and my box doesn't run any
slower (or faster).

On 7/27/05, Michal Žeravík <michalz@olomouc.com> wrote:
> So does it mean I should enable SMP support for an Athlon 64
> (Winchester, Venice)? [snip]
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
From: Brett Johnson @ 2005-07-27 12:30 UTC
To: gentoo-amd64

netpython wrote:
> I have enabled SMP on my Gentoo AMD64 system and my box doesn't run
> any slower (or faster).

As stated earlier by Duncan (in what I thought was a great explanation!):

"If you are using only a single-core AMD64, you'll want SMP off, because
altho the kernel will work with it on, it'll be more bloated than it
needs to be."

This just means the physical size of the kernel will be larger than it
needs to be and consume more memory. It will have no impact on overall
system performance.
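One rough way to see the cost being discussed, assuming the bootloader
still has both an SMP and a non-SMP image around to compare (the file
names below are placeholders):

    # Compare the image sizes of the two builds:
    ls -lh /boot/vmlinuz-2.6.12 /boot/vmlinuz-2.6.12-smp

    # The kernel also reports its own code/data footprint at boot:
    dmesg | grep -i 'memory:'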
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
From: Duncan @ 2005-07-27 15:58 UTC
To: gentoo-amd64

Brett Johnson posted <42E77E5B.4040604@blzj.com>, excerpted below, on
Wed, 27 Jul 2005 07:30:19 -0500:

> As stated earlier by Duncan (in what I thought was a great
> explanation!):
>
> "If you are using only a single-core AMD64, you'll want SMP off,
> because altho the kernel will work with it on, it'll be more bloated
> than it needs to be."
>
> This just means the physical size of the kernel will be larger than it
> needs to be and consume more memory. It will have no impact on overall
> system performance.

Exactly so, because when the kernel doesn't detect a second CPU, it'll
disable most of the SMP code and not even touch it, therefore not
affecting performance.

The only exception is the size of the kernel. Kernel memory is locked
memory -- it cannot be swapped out. Therefore, a kernel larger than it
has to be means less real memory available for other things, and more
swapping and/or less caching than would otherwise be necessary. The
effect isn't normally large enough to notice, but it /might/ mean
occasionally waiting an extra few seconds for a swapped-out app to load,
or for a file to be read from disk that would otherwise have still been
in memory cache, were it not for that additional and entirely unused
kernel bloat.

BTW, that's also a good reason to keep drivers you don't use very often
-- likely floppy drivers, perhaps CD/DVD drivers and their filesystems,
perhaps FAT filesystems, perhaps printer drivers and anything related
such as parport drivers, perhaps scanner drivers, etc. -- compiled as
modules, and to only load those modules when needed, unloading them
later. A loaded kernel module is part of the kernel, and as such, again,
locked memory, not swappable. If you only use your floppy drive once a
month, and only use the FAT filesystem when accessing the floppy, it
simply makes no sense to compile it built-in to the kernel, or to keep
those modules loaded when not in use. Far better to free that memory, so
it may be used by something you are actually /using/.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
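A sketch of the module workflow Duncan describes, using his floppy/FAT
example (the option and module names here are the usual ones, but check
your own kernel's):

    # In .config, build them as modules rather than built-in:
    #   CONFIG_BLK_DEV_FD=m
    #   CONFIG_VFAT_FS=m

    # Load them only when the floppy is actually needed...
    modprobe floppy
    modprobe vfat
    mount -t vfat /dev/fd0 /mnt/floppy

    # ...and give the locked kernel memory back afterwards:
    umount /mnt/floppy
    rmmod vfat fat floppy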
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
From: Duncan @ 2005-07-27 10:02 UTC
To: gentoo-amd64

Michal Zeravik posted <42E73212.2010202@olomouc.com>, excerpted below, on
Wed, 27 Jul 2005 09:04:50 +0200:

> So does it mean I should enable SMP support for an Athlon 64
> (Winchester, Venice)?

Only if it's dual-core. I haven't kept track of which code names are
dual-core, but I don't believe those are, or it'd be advertised all over
and you'd definitely know it.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
* [gentoo-amd64] Re: amd64 and kernel configuration
From: Duncan @ 2005-07-27 10:13 UTC
To: gentoo-amd64

Dulmandakh Sukhbaatar posted <20050727075012.79549.qmail@mail.mng.mn>,
excerpted below, on Wed, 27 Jul 2005 15:50:12 +0800:

> Thanks. How can I enable HyperTransport, in the kernel or somewhere
> else? Does anyone know about NUMA? I read about it, and it seems to be
> a technology for multiprocessor systems. Since I have a single CPU, I
> don't need it, right?

NUMA is indeed for multi-processor systems. NUMA is Non-Uniform Memory
Access. With AMD CPUs that have the memory controller on the same chip as
the CPU, that means each CPU can control its own memory. If you run NUMA
mode in this case (and if your BIOS is set up accordingly), the kernel
will try to keep the memory for each task in the memory handled by, and
local to, that CPU. If either the kernel or the BIOS is set to unified
memory, or if you only have memory sticks in the slots for one of the
CPUs, then you won't get NUMA mode and the kernel won't care what memory
addresses the memory for each process lives at.

AFAIK, hypertransport is automatically handled by your choice of chipset.
If the chipset you configure has it, it will be enabled; if not, it
won't. I was therefore a bit puzzled when you mentioned hypertransport
specifically in the previous post, since I don't believe there's a
specific kernel option for it. (It's possible, however, that there is and
I've just forgotten about it, since it's been awhile since I reviewed the
settings for the entire kernel -- I just run make oldconfig and deal with
any new options in each newer kernel, and additionally do any specific
tweaking I might want to try.)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
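On the kernel side, the NUMA support being discussed is an explicit
option on 2.6-era x86_64 kernels. A sketch, with option names from that
era (they have been renamed in later kernels):

    # Relevant settings for a multi-socket Opteron board:
    grep -i numa /usr/src/linux/.config
    #   CONFIG_NUMA=y
    #   CONFIG_K8_NUMA=y    (NUMA via the Opteron's on-chip controller)

    # After boot, whether NUMA actually came up shows in the kernel log:
    dmesg | grep -i numa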
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
From: Paolo Ripamonti @ 2005-07-27 10:27 UTC
To: gentoo-amd64

On 7/27/05, Duncan <1i5t5.duncan@cox.net> wrote:
> I don't believe there's a specific kernel option for it. (It's
> possible, however, that there is and

Spent my morning browsing make menuconfig, and well... there is no entry
regarding hypertransport, so I guess you're right (unless it's time for
me to get an eye exam :-P).

--
Paolo Ripamonti
e-mail paolo.ripamonti@gmail.com
web-site http://paoloripamonti.too.it
### To err is human, to moo bovine! ###
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
From: Duncan @ 2005-07-27 14:19 UTC
To: gentoo-amd64

Paolo Ripamonti posted <40bdd4fd0507270327575194a8@mail.gmail.com>,
excerpted below, on Wed, 27 Jul 2005 12:27:18 +0200:

> Spent my morning browsing make menuconfig, and well... there is no
> entry regarding hypertransport, so I guess you're right (unless it's
> time for me to get an eye exam :-P).

A likely more effective method of at least confirming whether there is
such a thing, if you don't already know where it is, would be to
view/pager/edit the .config file, since that puts it in flat format,
meaning you just have to go down the list, rather than browsing thru all
that nesting... Of course, grepping for "hypertransport" (case
insensitive) might be even more useful, and quicker.

Quickly opening a konsole to /usr/src/linux, firing up mc, and doing a
search on the term in question, yields a surprising number of hits (this
hypertransport-in-the-kernel stuff is new info to me as well).
Hypertransport is, as I mentioned, an AMD connection technology, but
they've created a more or less open standard out of it, and apparently a
decent enough number of MIPS and PPC hardware platforms use it that there
is support in the Linux kernel on those platforms for it.

From the quick search I just did, it appears the kernel DOES have a
CONFIG_HYPERTRANSPORT option, but it only appears on MIPS, as a
sub-option dependent on the Yosemite
chipset/platform/whatever-it-may-be. Apparently on x86/x86_64/ppc,
hypertransport itself isn't an option, but something that you either have
or don't have, based on the characteristics of the chipset drivers
chosen.

Quite interesting, I must say. I had known AMD had opened the standard
and tried to make it a public one, but wasn't aware that some ppc and
mips platforms had incorporated it, so I'm learning something new in all
this, myself! That's actually one of the reasons I so enjoy newsgroups
and lists such as this -- I never know when a question will come up
that'll twist my inquisitiveness off into areas I would have never
explored on my own, and I'll learn something either directly from the
post content, or from my own exploration stimulated by the content of the
post. Those unexpected "ah-ha!" moments, as the new idea clicks into
place, filling an information void I didn't know existed, are something I
crave, in large part /because/ they are sourced outside of myself,
therefore something I wouldn't ordinarily stumble across in my own
semi-structured meanderings in search of information.

So very cool, you guys stimulated me to learn something I would have
missed on my own, today! =8^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
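The search Duncan describes is easy to reproduce directly, without mc (a
sketch; it assumes a configured kernel tree under /usr/src/linux):

    cd /usr/src/linux

    # Flat view of the current configuration:
    grep -i hypertransport .config

    # Search the whole tree, Kconfig help text included:
    grep -ri hypertransport arch/ drivers/ | less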
* Re: [gentoo-amd64] Re: Re: amd64 and kernel configuration
From: Paolo Ripamonti @ 2005-07-27 14:31 UTC
To: gentoo-amd64

On 7/27/05, Duncan <1i5t5.duncan@cox.net> wrote:
> A likely more effective method of at least confirming whether there is
> such a thing, if you don't already know where it is, would be to
> view/pager/edit the .config file, since that puts it in flat format,
> meaning you just have to go down the list, rather than browsing thru
> all that nesting... <snip>

Of course I've browsed it all only because I was searching for that
little stupid pcspeaker module that has recently been moved to some
gracious new position I can never remember, and while surfing I noticed
no HT... (I'm not so masochistic ;-) )

But me too, I must absolutely thank all of you guys for this really
interesting thread! This is the moment when you love even the word
mailing-list!

Cheers!

--
Paolo Ripamonti
e-mail paolo.ripamonti@gmail.com
web-site http://paoloripamonti.too.it
### To err is human, to moo bovine! ###
* [gentoo-amd64] Re: Re: Re: amd64 and kernel configuration
From: Duncan @ 2005-07-27 16:16 UTC
To: gentoo-amd64

Paolo Ripamonti posted <40bdd4fd0507270731319e86c2@mail.gmail.com>,
excerpted below, on Wed, 27 Jul 2005 16:31:36 +0200:

> Of course I've browsed it all only because I was searching for that
> little stupid pcspeaker module that has recently been moved to some
> gracious new position I can never remember

LOL! I've had similar experiences with settings I knew were there but
couldn't find, sometimes because they moved from where they were when I
configured them! I suspect most of us who configure our own kernels have
experienced the same frustration over the years. Still, I feel a bit
better knowing I'm not the only one to get "lost" occasionally! <g>

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
From: Drew Kirkpatrick @ 2005-07-27 10:46 UTC
To: gentoo-amd64

Just to point out, AMD was calling the Opterons and such more of a SUMO
configuration (Sufficiently Uniform Memory Organization, not joking
here), instead of NUMA. Whereas technically it clearly is a NUMA system,
the difference in latency when accessing memory from a bank attached to
another processor's memory controller is very small -- small enough to be
largely ignored and treated like the uniform memory access latencies of
an SMP system. Sorta in between SMP unified-style memory access and NUMA.
This holds for up to 3 hypertransport link hops, or up to 8
chips/sockets. If you add hypertransport switches to scale beyond 8
chips/sockets, it'll most likely be a different story...

What I've always wondered is: the NUMA code in the Linux kernel, is this
for handling traditional NUMA, like in a large computer system (big iron)
where NUMA memory access latencies will vary greatly, or is it simply for
optimizing the memory usage across the memory banks, keeping data in the
memory of the processor using it, etc., etc.? Of course none of this
matters for single chip/socket AMD systems, as dual cores as well as
single cores share a memory controller. Hmm, maybe I should drink some
coffee and shut up until I'm awake...

On 7/27/05, Duncan <1i5t5.duncan@cox.net> wrote:
> NUMA is indeed for multi-processor systems. NUMA is Non-Uniform Memory
> Access. With AMD CPUs that have the memory controller on the same chip
> as the CPU, that means each CPU can control its own memory. If you run
> NUMA mode in this case (and if your BIOS is set up accordingly), the
> kernel will try to keep the memory for each task in the memory handled
> by, and local to, that CPU. [snip]
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
From: Duncan @ 2005-07-27 15:42 UTC
To: gentoo-amd64

Drew Kirkpatrick posted <81469e8e0507270346445f4363@mail.gmail.com>,
excerpted below, on Wed, 27 Jul 2005 05:46:28 -0500:

> Just to point out, AMD was calling the Opterons and such more of a
> SUMO configuration (Sufficiently Uniform Memory Organization, not
> joking here), instead of NUMA. Whereas technically it clearly is a
> NUMA system, the difference in latency when accessing memory from a
> bank attached to another processor's memory controller is very small
> -- small enough to be largely ignored and treated like the uniform
> memory access latencies of an SMP system. Sorta in between SMP
> unified-style memory access and NUMA. This holds for up to 3
> hypertransport link hops, or up to 8 chips/sockets. If you add
> hypertransport switches to scale beyond 8 chips/sockets, it'll most
> likely be a different story...

I wasn't aware of the AMD "SUMO" moniker, but it /does/ make sense, given
the design of the hardware. They have a very good point: while it's
physically NUMA, the latency variances are so close to unified that in
many ways it's indistinguishable -- except for the fact that keeping it
NUMA means allowing independent access of two different apps running on
two different CPUs, to their own memory in parallel, rather than one
having to wait for the other, as would happen if the memory were
interleaved and unified (as it would be for quad-channel access, if that
were enabled).

> What I've always wondered is: the NUMA code in the Linux kernel, is
> this for handling traditional NUMA, like in a large computer system
> (big iron) where NUMA memory access latencies will vary greatly, or is
> it simply for optimizing the memory usage across the memory banks,
> keeping data in the memory of the processor using it, etc., etc.? Of
> course none of this matters for single chip/socket AMD systems, as
> dual cores as well as single cores share a memory controller.

Well, yeah, for single-socket/dual-core, but what about dual-socket
(either single-core or dual-core)? Your questions make sense there, and
that's what I'm running (single-core, tho upgrading to dual-core for a
quad-core-total board sometime next year would be very nice, and just
might be within the limits of my budget), so yes, I'm rather interested!

The answer to your question on how the kernel deals with it, by my
understanding, is this: the Linux kernel SMP/NUMA architecture allows for
"CPU affinity grouping". In earlier kernels it was all automated, but
they are actually getting advanced enough now to allow deliberate manual
splitting of various groups, and combined with userspace control
applications, will ultimately be able to dynamically assign processes to
one or more CPU groups of various sizes, controlling the CPU and memory
resources available to individual processes. So, yes, I guess that means
it's developing some pretty "big iron" qualities, altho many of them are
still in flux and won't be stable, at least in mainline, for another six
months or a year, at minimum.

Let's refocus now back on the implementation and the smaller picture once
again, to examine these "CPU affinity zones" in a bit more detail. The
following is according to the writeups I've seen, mostly on LWN's weekly
kernel pages. (Jon Corbet, LWN editor, does a very good job of balancing
the technical kernel-hacker-level stuff with the middle-ground
not-too-technical kernel-follower stuff, good enough that I find the site
useful enough to subscribe, even tho I could get even the premium content
a week later for free. Yes, that's an endorsement of the site, because
it's where a lot of my info comes from, and I'm certainly not one to try
to keep my knowledge exclusive!)

Anyway... from mainly that source... CPU affinity zones work with sets
and supersets of processors. An Intel hyperthreading pair of virtual
processors on the same physical processor will be at the highest affinity
level, the lowest level aka strongest grouping in the hierarchy, because
they share the same cache memory all the way up to L1 itself, and the
Linux kernel can switch processes between the two virtual CPUs of a
hyperthreaded CPU with zero cost or loss in performance, therefore only
taking into account the relative balance of processes on each of the
hyperthread virtual CPUs.

At the next-lowest-level affinity, we'd have the dual-core AMDs: same
chip, same memory controller, same local memory, same hypertransport
interfaces to the chipset, other CPUs and the rest of the world, and very
tightly cooperative, but with separate L2 and of course separate L1
cache. There's a slight performance penalty in switching processes
between these CPUs, due to the cache flushing it would entail, but it's
only very slight and quite speedy, so thread imbalance between the two
processors doesn't have to get bad at all before it's worth it to switch
the CPUs to maintain balance, even at the cost of that cache flush.

At a slightly lower level of affinity would be the Intel dual cores,
since they aren't quite so tightly coupled, and don't share all the same
interfaces to the outside world. In practice, since only one of these,
the Intel dual core or the AMD dual core, will normally be encountered in
real life, they can be treated at the same level, with possibly a small
internal tweak to the relative weighting of thread imbalance vs
performance loss for switching CPUs, based on which one is actually in
place.

Here things get interesting, because of the different implementations
available. AMD's 2-way thru 8-way Opterons configured for unified memory
access would be first, because again, their dedicated inter-CPU
hypertransport links let them cooperate more closely than conventional
multi-socket CPUs would. Beyond that, it's a tossup between Intel's
unified-memory multi-processors and AMD's NUMA/SUMO-memory Opterons. I'd
still say the Opterons cooperate more closely, even in NUMA/SUMO mode,
than Intel chips will with unified memory, due to that SUMO aspect. At
the same time, they have the parallel memory access advantages of NUMA.

Beyond that, there are several levels of clustering: local/board;
off-board but short-fat-pipe accessible (using technologies such as PCI
interconnect, fibre channel, and that SGI interconnect tech whose name I
don't recall at the moment); conventional (and Beowulf?) type clustering;
and remote clustering. At each of these levels, as with the above, the
cost to switch processes between peers at the same affinity level gets
higher and higher, so the corresponding process imbalance necessary to
trigger a switch likewise gets higher and higher, until at the extreme of
remote clustering, it's almost done manually only, or anyway at the level
of a user-level application managing the transfers rather than the kernel
directly (since, after all, with remote clustering, each remote group is
probably running its own kernel, if not individual machines within that
group).

So, the point of all that is that the kernel sees a hierarchical grouping
of CPUs, and is designed with more flexibility to balance processes and
memory use at the extreme-affinity end, and more hesitation to balance
them, due to the higher cost involved, at the extremely-low-affinity end.
The main writeup I read on the subject dealt with thread/process CPU
switching, not memory switching, but within the context of NUMA the
principles become so intertwined it's impossible to separate them, and
the writeup very clearly made the point that the memory issues involved
in making the transfer were included in the cost accounting as well.

I'm not sure whether this addressed the point you were trying to make, or
hit beside it, but anyway, it was fun trying to put into text, for the
first time since I read about it, the principles in that writeup, along
with other facts I've merged along the way. My dad's a teacher, and I
remember him many times making the point that the best way to learn
something is to attempt to teach it. He used that principle in his own
classes, having the students help each other, and I remember him making
the point about himself as well, at one point, as he struggled to teach
basic accounting principles based only on a textbook and the single
college intro-level class he had himself taken years before, when he
found himself teaching a high school class on the subject. The principle
is certainly true, as by explaining the affinity clustering principles
here, it has forced me to ensure they form a reasonable and
self-consistent infrastructure in my own head, in order to be able to
explain it in the post. So, anyway, thanks for the intellectual
stimulation! <g>

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
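The userspace control Duncan alludes to already had simple front-ends at
the time. A sketch of pinning work to a particular CPU and its local
memory on a 2-way Opteron, assuming util-linux's taskset and the numactl
utility are installed ("./my_job" and the PID are placeholders; numactl
flag spellings vary between versions, with older releases using
--cpubind):

    # Restrict a command to CPU 0:
    taskset -c 0 ./my_job

    # Run on node 1's CPUs and allocate only from node 1's local RAM:
    numactl --cpunodebind=1 --membind=1 ./my_job

    # Inspect the affinity mask of an already-running process:
    taskset -p 12345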
* Re: [gentoo-amd64] Re: Re: amd64 and kernel configuration
From: Jean.Borsenberger @ 2005-07-27 17:07 UTC
To: gentoo-amd64

Well, maybe it's SUMO, but when we switched on the NUMA option in the
kernel of our quad-processor, 16 GB Opteron, it did speed up the OpenMP
benchmarks by 20% to 30% (depending on the program considered).

Note: OpenMP is a Fortran variant in which you put parallelisation
directives, without bothering with the implementation details, using a
single address space for all instances of the user program.

Jean Borsenberger
tel: +33 (0)1 45 07 76 29
Observatoire de Paris Meudon
5 place Jules Janssen
92195 Meudon
France

On Wed, 27 Jul 2005, Duncan wrote:
> I wasn't aware of the AMD "SUMO" moniker, but it /does/ make sense,
> given the design of the hardware. They have a very good point: while
> it's physically NUMA, the latency variances are so close to unified
> that in many ways it's indistinguishable -- except for the fact that
> keeping it NUMA means allowing independent access of two different
> apps running on two different CPUs, to their own memory in parallel,
> rather than one having to wait for the other, as would happen if the
> memory were interleaved and unified (as it would be for quad-channel
> access, if that were enabled). [snip]
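For anyone wanting to approximate that kind of comparison on a single
NUMA-enabled kernel, rather than rebooting between kernels, a sketch (the
benchmark binary name is a placeholder; OMP_NUM_THREADS is the standard
OpenMP thread-count variable):

    # Default policy: the kernel keeps each thread's pages node-local.
    OMP_NUM_THREADS=4 ./openmp_bench

    # Approximate the unified case by interleaving pages across nodes:
    OMP_NUM_THREADS=4 numactl --interleave=all ./openmp_bench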
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
From: Duncan @ 2005-07-27 10:18 UTC
To: gentoo-amd64

NY Kwok posted <25f58b7910e09fd5453bb3ec534330d1@xsmail.com>, excerpted
below, on Wed, 27 Jul 2005 16:19:42 +1000:

> Actually, dual-core means they have two physical cores in one package.
> Two logical cores = hyperthreading. ;P

Absolutely correct. Minor brain fart there, as they say. <g> Thanks for
catching that and correcting it! =8^)

(I was trying to emphasize that with the AMD design, at least, it's still
a single piece of silicon, only with two functionally separate cores,
like two CPUs in one but cooperating a bit better than two entirely
separate CPUs would, and chose the wrong wording to convey what I wanted,
so ended up conveying something rather different instead. =8^\ )

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html