* [gentoo-amd64] Re: amd64 and kernel configuration
2005-07-27 6:29 [gentoo-amd64] amd64 and kernel configuration Dulmandakh Sukhbaatar
@ 2005-07-27 6:10 ` Duncan
2005-07-27 6:19 ` NY Kwok
0 siblings, 1 reply; 18+ messages in thread
From: Duncan @ 2005-07-27 6:10 UTC
To: gentoo-amd64
Dulmandakh Sukhbaatar posted <20050727062947.73020.qmail@mail.mng.mn>,
excerpted below, on Wed, 27 Jul 2005 14:29:47 +0800:
> I'm new to amd64 and don't know how to configure the kernel for best
> performance, but I've been using gentoo since 2004.1. Should I enable SMP,
> HyperThreading (the name differs from hypertransport), and NUMA with a
> single processor? I found out that with hypertransport, performance will be
> better than without it, but the help text for SMP suggests that if you have
> a single processor it's better to disable SMP. But with SMP disabled there
> is no option for hyperthreading. Lastly, is hyperthreading the same as
> hypertransport or not? By enabling hyperthreading, do I enable
> hypertransport? Sorry for my poor english :D.
Hypertransport is the name of the interconnect technology AMD uses. It's
how the CPU connects to everything else. Therefore, you want that on, or
it'll use slower modes.
Hyperthreading is an Intel technology that helps compensate for their
very deep CPU pipelining: it minimizes the time the CPU spends idle after
a branch mispredict by switching to the other thread while the first one
goes back to memory for all the data it thought it wouldn't need, having
predicted the branch wrongly. AMD CPUs don't have such deep pipelining,
and have other techniques to minimize branch-mispredict penalties, so they
don't benefit much from hyperthreading and therefore don't include it.
If your CPU is indeed an AMD64 CPU, you don't want
hyperthreading. If it's one of the new Intel x86_64 CPUs, you may or may
not want it, depending on which particular one it is and whether
hyperthreading is enabled on it or not.
SMP is short for Symmetrical Multi-Processing. Traditionally, it meant
you had two CPUs. However, hyperthreading is treated by the kernel as two
CPUs, which is why SMP must be enabled to get the hyperthreading option.
Note that the newest thing to come to x86/x86_64 is dual-core CPUs. These
CPUs actually have two logical CPUs in one package. This is better than
hyperthreading because it's the real thing. Both Intel and AMD have
dual-core units available, but they are quite new and still expensive, so
you aren't likely to have one and not know about it. Again, dual-core is
handled as SMP by the kernel, so you'll want SMP on if you have a
dual-core CPU. If you are using only a single-core AMD64, you'll want SMP
off, because altho the kernel will work with it on, it'll be more bloated
than it needs to be.
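To make that concrete, the relevant switches in a 2.6-era .config look
roughly like this (option names quoted from memory, so verify them
against your own kernel tree before trusting me):

    # Single-core AMD64: leave SMP off.
    # CONFIG_SMP is not set

    # Dual-core or dual-socket: turn it on.
    CONFIG_SMP=y
    CONFIG_NR_CPUS=2
    # CONFIG_SCHED_SMT is not set   (SMT scheduling only helps Intel hyperthreading)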
Does that clear up the confusion?
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
2005-07-27 6:10 ` [gentoo-amd64] " Duncan
@ 2005-07-27 6:19 ` NY Kwok
2005-07-27 7:50 ` Dulmandakh Sukhbaatar
2005-07-27 10:18 ` Duncan
0 siblings, 2 replies; 18+ messages in thread
From: NY Kwok @ 2005-07-27 6:19 UTC
To: gentoo-amd64
On 27/07/2005, at 4:10 PM, Duncan wrote:
>
> SMP is short for Symmetrical Multi-Processing. Traditionally, it meant
> you had two CPUs. However, hyperthreading is treated by the kernel as
> two
> CPUs, which is why SMP must be enabled to get the hyperthreading
> option.
> Note that the newest thing to come to x86/x86_64 is dual-core CPUs.
> These
> CPUs actually have two logical CPUs in one package. This is better
> than
> hyperthreading because it's the real thing.
Actually, dual-core means they have two physical cores in one package.
Two logical cores = hyperthreading. ;P
On that note, you want the AMD dual-cores as well, because they are
much better designed (they have the crossbar architecture all ready to
drop in additional cores, whereas the current Intel dual-cores are
really ugly hacks and perform terribly compared to the AMD ones).
--
gentoo-amd64@gentoo.org mailing list
* [gentoo-amd64] amd64 and kernel configuration
@ 2005-07-27 6:29 Dulmandakh Sukhbaatar
2005-07-27 6:10 ` [gentoo-amd64] " Duncan
0 siblings, 1 reply; 18+ messages in thread
From: Dulmandakh Sukhbaatar @ 2005-07-27 6:29 UTC
To: gentoo-amd64
I'm new to amd64 and don't know how to configure the kernel for best
performance, but I've been using gentoo since 2004.1. Should I enable SMP,
HyperThreading (the name differs from hypertransport), and NUMA with a single
processor? I found out that with hypertransport, performance will be better
than without it, but the help text for SMP suggests that if you have a single
processor it's better to disable SMP. But with SMP disabled there is no option
for hyperthreading. Lastly, is hyperthreading the same as hypertransport or
not? By enabling hyperthreading, do I enable hypertransport? Sorry for my poor
english :D.
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
2005-07-27 7:50 ` Dulmandakh Sukhbaatar
@ 2005-07-27 7:04 ` Michal Žeravík
2005-07-27 9:58 ` netpython
2005-07-27 10:02 ` Duncan
2005-07-27 10:13 ` [gentoo-amd64] " Duncan
1 sibling, 2 replies; 18+ messages in thread
From: Michal Žeravík @ 2005-07-27 7:04 UTC
To: gentoo-amd64
So does it mean I should enable SMP support for Athlon64 (winchester,
venice)?
michal
Dulmandakh Sukhbaatar wrote:
> Thanks. How can I enable hypertransport in the kernel or somewhere? Does
> anyone know about NUMA? I read about it, and it seems to be a technology
> for multiprocessor systems. Since I have a single CPU, I don't need it. Right?
>
> [snip]
--
gentoo-amd64@gentoo.org mailing list
* [gentoo-amd64] Re: amd64 and kernel configuration
2005-07-27 6:19 ` NY Kwok
@ 2005-07-27 7:50 ` Dulmandakh Sukhbaatar
2005-07-27 7:04 ` Michal Žeravík
2005-07-27 10:13 ` [gentoo-amd64] " Duncan
2005-07-27 10:18 ` Duncan
1 sibling, 2 replies; 18+ messages in thread
From: Dulmandakh Sukhbaatar @ 2005-07-27 7:50 UTC
To: gentoo-amd64
Thanks. How can I enable hypertransport in the kernel or somewhere? Does
anyone know about NUMA? I read about it, and it seems to be a technology for
multiprocessor systems. Since I have a single CPU, I don't need it. Right?
>
> [snip]
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
2005-07-27 7:04 ` Michal Žeravík
@ 2005-07-27 9:58 ` netpython
2005-07-27 12:30 ` Brett Johnson
2005-07-27 10:02 ` Duncan
1 sibling, 1 reply; 18+ messages in thread
From: netpython @ 2005-07-27 9:58 UTC
To: gentoo-amd64
I have enabled SMP on my gentoo AMD64 system and my
box doesn't run any slower (or faster).
On 7/27/05, Michal Žeravík <michalz@olomouc.com> wrote:
> So does it mean I should enable SMP support for Athlon64 (winchester,
> venice)?
>
> michal
>
> [snip]
--
gentoo-amd64@gentoo.org mailing list
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
2005-07-27 7:04 ` Michal Žeravík
2005-07-27 9:58 ` netpython
@ 2005-07-27 10:02 ` Duncan
1 sibling, 0 replies; 18+ messages in thread
From: Duncan @ 2005-07-27 10:02 UTC
To: gentoo-amd64
Michal Zeravik posted <42E73212.2010202@olomouc.com>, excerpted below,
on Wed, 27 Jul 2005 09:04:50 +0200:
> So does it mean I should enable SMP support for Athlon64 (winchester,
> venice)?
Only if it's dual-core. I haven't kept track of which code names are
dual-core, but I don't believe those are, or it'd be advertised all over
and you'd definitely know it.
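If in doubt, a quick way to check how many CPUs the kernel actually sees
(this simply counts logical processors, so a hyperthreading Intel chip
would show 2 as well):

    grep -c '^processor' /proc/cpuinfo

A single-core chip without hyperthreading reports 1; anything higher, and
you want SMP on.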
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
--
gentoo-amd64@gentoo.org mailing list
* [gentoo-amd64] Re: amd64 and kernel configuration
2005-07-27 7:50 ` Dulmandakh Sukhbaatar
2005-07-27 7:04 ` Michal Žeravík
@ 2005-07-27 10:13 ` Duncan
2005-07-27 10:27 ` Paolo Ripamonti
2005-07-27 10:46 ` [gentoo-amd64] " Drew Kirkpatrick
1 sibling, 2 replies; 18+ messages in thread
From: Duncan @ 2005-07-27 10:13 UTC
To: gentoo-amd64
Dulmandakh Sukhbaatar posted <20050727075012.79549.qmail@mail.mng.mn>,
excerpted below, on Wed, 27 Jul 2005 15:50:12 +0800:
> Thanks. How can I enable hypertransport in the kernel or somewhere? Does
> anyone know about NUMA? I read about it, and it seems to be a technology for
> multiprocessor systems. Since I have a single CPU, I don't need it. Right?
NUMA is indeed for multi-processor systems. NUMA is Non-Uniform Memory
Architecture. With AMD CPUs that have the memory controller on the same
chip as the CPU, that means that each CPU can control its own memory. If
you run NUMA mode in this case (and if your BIOS is set up accordingly),
the kernel will try to keep the memory for each task in the memory handled
by, and local to, that CPU. If either the kernel or BIOS is set to unified
memory, or if you only have memory sticks in the slots for one of the
CPUs, then you won't get NUMA mode and the kernel won't care what memory
addresses the memory for each process lives at.
AFAIK, hypertransport is automatically handled by your choice of chipset.
If the chipset you configure has it, it will be enabled, if not, it won't.
I was therefore a bit puzzled when you mentioned hypertransport
specifically in the previous post, since I don't believe there's a
specific kernel option for it. (It's possible, however, that there is and
I've just forgotten about it, since it's been a while since I reviewed
the settings for the entire kernel -- I just run make oldconfig and deal
with any new options in each newer kernel, and additionally do any
specific tweaking I might want to try.)
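If you want to see what your current config actually says in this area,
something along these lines works, assuming the usual /usr/src/linux
location:

    cd /usr/src/linux
    make oldconfig        # carry old answers forward, prompt only for new options
    grep -i numa .config  # which NUMA-related options ended up set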
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
--
gentoo-amd64@gentoo.org mailing list
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
2005-07-27 6:19 ` NY Kwok
2005-07-27 7:50 ` Dulmandakh Sukhbaatar
@ 2005-07-27 10:18 ` Duncan
1 sibling, 0 replies; 18+ messages in thread
From: Duncan @ 2005-07-27 10:18 UTC
To: gentoo-amd64
NY Kwok posted <25f58b7910e09fd5453bb3ec534330d1@xsmail.com>, excerpted
below, on Wed, 27 Jul 2005 16:19:42 +1000:
> Actually, dual-core means they have two physical cores in one package. Two
> logical cores = hyperthreading. ;P
Absolutely correct. Minor brain fart, there, as they say. <g> Thanks
for catching that and correcting it! =8^)
(I was trying to emphasize that with the AMD design, at least, it's still
a single piece of silicon, only with two functionally separate cores, like
two CPUs in one but cooperating a bit better than two entirely separate
CPUs would, and chose the wrong wording to convey what I wanted, so ended
up conveying something rather different, instead. =8^\ )
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
2005-07-27 10:13 ` [gentoo-amd64] " Duncan
@ 2005-07-27 10:27 ` Paolo Ripamonti
2005-07-27 14:19 ` [gentoo-amd64] " Duncan
2005-07-27 10:46 ` [gentoo-amd64] " Drew Kirkpatrick
1 sibling, 1 reply; 18+ messages in thread
From: Paolo Ripamonti @ 2005-07-27 10:27 UTC
To: gentoo-amd64
On 7/27/05, Duncan <1i5t5.duncan@cox.net> wrote:
> I don't believe there's a
> specific kernel option for it. (It's possible, however, that there is and
Spent my morning browsing make menuconfig, well... there is no entry
regarding hypertransport, so I guess you're right (unless it's time for
me to take an eye exam :-P)
--
Paolo Ripamonti
e-mail paolo.ripamonti@gmail.com
web-site http://paoloripamonti.too.it
### To err is human, to moo bovine! ###
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
2005-07-27 10:13 ` [gentoo-amd64] " Duncan
2005-07-27 10:27 ` Paolo Ripamonti
@ 2005-07-27 10:46 ` Drew Kirkpatrick
2005-07-27 15:42 ` [gentoo-amd64] " Duncan
1 sibling, 1 reply; 18+ messages in thread
From: Drew Kirkpatrick @ 2005-07-27 10:46 UTC
To: gentoo-amd64
Just to point out, AMD was calling the Opterons and such more of a
SUMO configuration (Sufficiently Uniform Memory Organization, not
joking here), instead of NUMA. Whereas technically it clearly is a
NUMA system, the differences in latency when accessing memory from a
bank attached to another processor's memory controller are very small.
Small enough to be largely ignored, and treated like uniform memory
access latencies in an SMP system. Sorta in between SMP unified-style
memory access and NUMA. This holds for up to 3 hypertransport link
hops, or up to 8 chips/sockets. If you add hypertransport switches to
scale over 8 chips/sockets, it'll most likely be a different story...
What I've always wondered is, the NUMA code in the linux kernel: is
this for handling traditional NUMA, like in a large computer system
(big iron) where NUMA memory access latencies will vary greatly, or is
it simply for optimizing the memory usage across the memory banks,
keeping data in the memory of the processor using it, etc, etc? Of
course none of this matters for single chip/socket AMD systems, as
dual cores as well as single cores share a memory controller. Hmm,
maybe I should drink some coffee and shut up until I'm awake...
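For anyone who wants to poke at what the kernel actually decided, something
like this should show it on a 2.6 kernel (filenames from memory, so no
guarantees):

    dmesg | grep -i -e numa -e node   # what the kernel detected at boot
    cat /proc/buddyinfo               # free-page counts, one line per node and zone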
On 7/27/05, Duncan <1i5t5.duncan@cox.net> wrote:
> [snip]
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: amd64 and kernel configuration
2005-07-27 9:58 ` netpython
@ 2005-07-27 12:30 ` Brett Johnson
2005-07-27 15:58 ` [gentoo-amd64] " Duncan
0 siblings, 1 reply; 18+ messages in thread
From: Brett Johnson @ 2005-07-27 12:30 UTC
To: gentoo-amd64
netpython wrote:
> I have enabled SMP on my gentoo AMD64 system and my
> box doesn't run any slower (or faster).
>
As stated earlier by Duncan (in what I thought was a great explanation!):
"If you are using only a single-core AMD64, you'll want SMP
off, because altho the kernel will work with it on, it'll be more
bloated than it needs to be."
This just means the physical size of the kernel will be larger than it
needs to be, and consume more memory. It will have no impact on overall
system performance.
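If you're curious what a config change actually costs in size, one rough
way to measure it after a rebuild (paths assume a 2.6 x86_64 tree):

    cd /usr/src/linux
    size vmlinux                       # text/data/bss of the uncompressed kernel
    ls -lh arch/x86_64/boot/bzImage    # the compressed image you actually boot

Build once with SMP on and once with it off, and compare the two.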
--
gentoo-amd64@gentoo.org mailing list
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
2005-07-27 10:27 ` Paolo Ripamonti
@ 2005-07-27 14:19 ` Duncan
2005-07-27 14:31 ` Paolo Ripamonti
0 siblings, 1 reply; 18+ messages in thread
From: Duncan @ 2005-07-27 14:19 UTC
To: gentoo-amd64
Paolo Ripamonti posted <40bdd4fd0507270327575194a8@mail.gmail.com>,
excerpted below, on Wed, 27 Jul 2005 12:27:18 +0200:
> On 7/27/05, Duncan <1i5t5.duncan@cox.net> wrote:
>> I don't believe there's a
>> specific kernel option for it. (It's possible, however, that there is
>
> Spent my morning browsing make menuconfig, well... there is no entry
> regarding hypertransport, so I guess you're right (unless it's time for me
> to take an eye exam :-P)
A likely more effective method of at least confirming whether there is
such a thing, if you don't already know where it is, would be to
view/pager/edit the .config file, since that puts it in flat format,
meaning you just have to go down the list, rather than browsing thru all
that nesting...
Of course, grepping for "hypertransport" (case insensitive) might be even
more useful, and quicker. Quickly opening a konsole to /usr/src/linux,
firing up mc, and doing a search on the term in question, yields a
surprising number of hits (this hypertransport in the kernel stuff is new
info to me as well). Hypertransport is as I mentioned an AMD connection
technology, but they've created a more or less open standard out of it,
and apparently, a decent enough number of MIPS and PPC hardware platforms
use it, that there is support in the Linux kernel on those platforms for
it. From the quick search I just did, it appears the kernel DOES have a
CONFIG_HYPERTRANSPORT option, but it only appears on MIPS, as a sub-option
dependent on the Yosemite chipset/platform/whatever-it-may-be. Apparently
on x86/x86_64/ppc hypertransport itself isn't an option, but something
that you either have or don't have, based on the characteristics of the
chipset drivers chosen.
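For anyone who wants to repeat the search without firing up mc, plain GNU
grep does the job, run from the top of the kernel tree:

    cd /usr/src/linux
    grep -ril hypertransport arch/ drivers/   # files mentioning it, case-insensitive
    grep -i hypertransport .config            # whether your config sets anything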
Quite interesting, I must say. I had known AMD had opened the standard
and tried to make it a public one, but wasn't aware that some ppc and mips
platforms had incorporated it, so I'm learning something new in all this,
myself!
That's actually one of the reasons I so enjoy newsgroups and lists such as
this -- I never know when a question will come up that'll twist my
inquisitiveness off into areas I would have never explored on my own, and
I'll learn something either directly from the post content, or from my own
exploration stimulated by the content of the post. Those unexpected
"ah-ha!" moments, as the new idea clicks into place, filling an
information void I didn't know existed, are something I crave, in large
part /because/ they are sourced outside of myself, therefore something I
wouldn't ordinarily stumble across in my own semi-structured meanderings
in search of information.
So very cool, you guys stimulated me to learn something I would have
missed on my own, today! =8^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: Re: amd64 and kernel configuration
2005-07-27 14:19 ` [gentoo-amd64] " Duncan
@ 2005-07-27 14:31 ` Paolo Ripamonti
2005-07-27 16:16 ` [gentoo-amd64] " Duncan
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Ripamonti @ 2005-07-27 14:31 UTC
To: gentoo-amd64
On 7/27/05, Duncan <1i5t5.duncan@cox.net> wrote:
> A likely more effective method of at least confirming whether there is
> such a thing, if you don't already know where it is, would be to
> view/pager/edit the .config file, since that puts it in flat format,
> meaning you just have to go down the list, rather than browsing thru all
> that nesting...
> <snip>
O'course I've browsed it all only because I was searching for that little
stupid pcspeaker module that has recently been moved to a gracious new
position I never remember, and while surfing I noticed no ht... (I'm
not so masochistic ;-) )
But me too, I must absolutely thank all of you guys for this really
interesting thread!
This is the moment when you love even the word mailing-list!
Cheers!
--
Paolo Ripamonti
e-mail paolo.ripamonti@gmail.com
web-site http://paoloripamonti.too.it
### To err is human, to moo bovine! ###
--
gentoo-amd64@gentoo.org mailing list
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
2005-07-27 10:46 ` [gentoo-amd64] " Drew Kirkpatrick
@ 2005-07-27 15:42 ` Duncan
2005-07-27 17:07 ` Jean.Borsenberger
0 siblings, 1 reply; 18+ messages in thread
From: Duncan @ 2005-07-27 15:42 UTC
To: gentoo-amd64
Drew Kirkpatrick posted <81469e8e0507270346445f4363@mail.gmail.com>,
excerpted below, on Wed, 27 Jul 2005 05:46:28 -0500:
> Just to point out, AMD was calling the Opterons and such more of a SUMO
> configuration (Sufficiently Uniform Memory Organization, not joking here),
> instead of NUMA. Whereas technically it clearly is a NUMA system, the
> differences in latency when accessing memory from a bank attached to
> another processor's memory controller are very small. Small enough to be
> largely ignored, and treated like uniform memory access latencies in an SMP
> system. Sorta in between SMP unified-style memory access and NUMA. This
> holds for up to 3 hypertransport link hops, or up to 8 chips/sockets. If you
> add hypertransport switches to scale over 8 chips/sockets, it'll most
> likely be a different story...
I wasn't aware of the AMD "SUMO" moniker, but it /does/ make sense, given
the design of the hardware. They have a very good point, that while it's
physically NUMA, the latency variances are so close to unified that in
many ways it's indistinguishable -- except for the fact that keeping it
NUMA means allowing independent access of two different apps running on
two different CPUs, to their own memory in parallel, rather than one
having to wait for the other, if the memory were interleaved and unified
(as it would be for quad channel access, if that were enabled).
> What I've always wondered is, the NUMA code in the linux kernel: is this
> for handling traditional NUMA, like in a large computer system (big iron)
> where NUMA memory access latencies will vary greatly, or is it simply for
> optimizing the memory usage across the memory banks, keeping data in the
> memory of the processor using it, etc, etc? Of course none of this matters
> for single chip/socket AMD systems, as dual cores as well as single cores
> share a memory controller. Hmm, maybe I should drink some coffee and
> shut up until I'm awake...
Well, yeah, for single-socket/dual-core, but what about dual socket
(either single core or dual core)? Your questions make sense there, and
that's what I'm running (single core, tho upgrading to dual core for a
quad-core total board sometime next year would be very nice, and just
might be within the limits of my budget), so yes, I'm rather interested!
The answer to your question on how the kernel deals with it, by my
understanding, is this: The Linux kernel SMP/NUMA architecture allows for
"CPU affinity grouping". In earlier kernels, it was all automated, but
they are actually getting advanced enough now to allow deliberate manual
splitting of various groups, and combined with userspace control
applications, will ultimately be able to dynamically assign processes to
one or more CPU groups of various sizes, controlling the CPU and memory
resources available to individual processes. So, yes, I guess that means
it's developing some pretty "big iron" qualities, altho many of them are
still in flux and won't be stable at least in mainline for another six
months or a year, at minimum.
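The userspace side of that is already usable in rough form today, by the
way. Something like the following should work, assuming the schedutils and
numactl packages are installed (./myapp is just a stand-in for whatever
you want to pin):

    taskset -c 0 ./myapp                      # restrict the process to CPU 0
    numactl --cpubind=0 --membind=0 ./myapp   # bind it to CPU 0 and that CPU's local memory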
Let's refocus now back on the implementation and the smaller picture once
again, to examine these "CPU affinity zones" in a bit more detail. The
following is according to the writeups I've seen, mostly on LWN's weekly
kernel pages. (Jon Corbet, LWN editor, does a very good job of balancing
the technical kernel hacker level stuff with the middle-ground
not-too-technical kernel follower stuff, good enough that I find the site
useful enough to subscribe, even tho I could get even the premium content
a week later for free. Yes, that's an endorsement of the site, because
it's where a lot of my info comes from, and I'm certainly not one to try
to keep my knowledge exclusive!)
Anyway... from mainly that source... CPU affinity zones work with sets
and supersets of processors. An Intel hyperthreading pair of virtual
processors on the same physical processor will be at the highest affinity
level, aka the strongest grouping in the hierarchy, because
they share the same cache memory all the way up to L1 itself, and the
Linux kernel can switch processes between the two virtual CPUs of a
hyperthreaded CPU with zero cost or loss in performance, therefore only
taking into account the relative balance of processes on each of the
hyperthread virtual CPUs.
At the next lowest level affinity, we'd have the dual-core AMDs, same
chip, same memory controller, same local memory, same hypertransport
interfaces to the chipset, other CPUs and the rest of the world, and very
tightly cooperative, but with separate L2 and of course separate L1 cache.
There's a slight performance penalty for switching processes between
these CPUs, due to the cache flushing it would entail, but it's only very
slight and quite speedy, so thread imbalance between the two processors
doesn't have to get bad at all, before it's worth it to switch the CPUs to
maintain balance, even at the cost of that cache flush.
At a slightly lower level of affinity would be the Intel dual cores, since
they aren't quite so tightly coupled, and don't share all the same
interfaces to the outside world. In practice, since only one of these,
the Intel dual core or the AMD dual core, will normally be encountered in
real life, they can be treated at the same level, with possibly a small
internal tweak to the relative weighting of thread imbalance vs
performance loss for switching CPUs, based on which one is actually in
place.
Here things get interesting, because of the different implementations
available. AMD's 2-way thru 8-way Opterons configured for unified memory
access would be first, because again, their dedicated inter-CPU
hypertransport links let them cooperate closer than conventional
multi-socket CPUs would. Beyond that, it's a tossup between Intel's
unified memory multi-processors and AMD's NUMA/SUMO memory Opterons. I'd
still say the Opterons cooperate closer, even in NUMA/SUMO mode, than
Intel chips will with unified memory, due to that SUMO aspect. At the
same time, they have the parallel memory access advantages of NUMA.
Beyond that, there's several levels of clustering, local/board, off-board
but short-fat-pipe accessible (using technologies such as PCI
interconnect, fibre-channel, and that SGI interconnect tech whose name
I don't recall at the moment), conventional (and Beowulf?) type
clustering, and remote
clustering. At each of these levels, as with the above, the cost to switch
processes between peers at the same affinity level gets higher and higher,
so the corresponding process imbalance necessary to trigger a switch
likewise gets higher and higher, until at the extreme of remote
clustering, it's almost done manually only, or anyway at the level of a
user level application managing the transfers, rather than the kernel,
directly (since, after all, with remote clustering, each remote group is
probably running its own kernel, if not individual machines within that
group).
So, the point of all that is that the kernel sees a hierarchical grouping
of CPUs, and is designed with more flexibility to balance processes and
memory use at the extreme affinity end, and more hesitation to balance it
due to the higher cost involved, at the extremely low affinity end. The
main writeup I read on the subject dealt with thread/process CPU
switching, not memory switching, but within the context of NUMA, the
principles become so intertwined it's impossible to separate them, and the
writeup very clearly made the point that the memory issues involved in
making the transfer were included in the cost accounting as well.
I'm not sure whether this addressed the point you were trying to make, or
hit beside it, but anyway, it was fun trying to put into text for the
first time since I read about it, the principles in that writeup, along
with other facts I've merged along the way. My dad's a teacher, and I
remember him many times making the point that the best way to learn
something is to attempt to teach it. He used that principle in his own
classes, having the students help each other, and I remember him making
the point about himself as well, at one point, as he struggled to teach
basic accounting principles based only on a textbook and the single
college intro level class he had himself taken years before, when he found
himself teaching a high school class on the subject. The principle is
certainly true, as by explaining the affinity clustering principles here,
it has forced me to ensure they form a reasonable and self consistent
infrastructure in my own head, in order to be able to explain it in the
post. So, anyway, thanks for the intellectual stimulation! <g>
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
--
gentoo-amd64@gentoo.org mailing list
* [gentoo-amd64] Re: Re: amd64 and kernel configuration
2005-07-27 12:30 ` Brett Johnson
@ 2005-07-27 15:58 ` Duncan
0 siblings, 0 replies; 18+ messages in thread
From: Duncan @ 2005-07-27 15:58 UTC
To: gentoo-amd64
Brett Johnson posted <42E77E5B.4040604@blzj.com>, excerpted below, on
Wed, 27 Jul 2005 07:30:19 -0500:
> netpython wrote:
>> I have enabled SMP on my gentoo AMD64 system and my box doesn't run any
>> slower (or faster).
>>
>>
> As stated earlier by Duncan (in what I thought was a great explanation!);
>
> "If you are using only a single-core AMD64, you'll want SMP off, because
> altho the kernel will work with it on, it'll be more bloated than it needs
> to be."
>
> This just means the physical size of the kernel will be larger than it
> needs to be, and consume more memory. It will have no impact on overall
> system performance.
Exactly so, because when the kernel doesn't detect a second CPU, it'll
disable most of the SMP code and not even touch it, therefore not
affecting performance.
The only exception is the size of the kernel. Kernel memory is locked
memory -- it cannot be swapped out. Therefore, a kernel larger than it
has to be means less real memory available for other things, and more
swapping and/or less caching than would otherwise be necessary. The
effect isn't normally large enough to notice, but it /might/ mean
occasionally waiting an extra few seconds for a swapped out app to load,
or a file to be read from disk that otherwise would have still been in
memory cache, were it not for that additional and entirely unused kernel
bloat.
BTW, that's also a good reason to keep drivers you don't use very often,
likely floppy drivers, perhaps CD/DVD drivers and their filesystems,
perhaps FAT filesystems, perhaps printers drivers and/or anything related
such as parport drivers, perhaps scanner drivers, etc... keep them all
compiled as modules, and only load those modules when needed, unloading
them later. A loaded kernel module is part of the kernel, and as such,
again, locked memory, not swappable. If you only use your floppy drive
once a month, and only use the FAT filesystem when accessing the floppy,
it simply makes no sense to compile them into the kernel, or to keep
those modules loaded when not in use. Far better to free that memory, so
it may be used by something you are actually /using/.
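In practice that's just a matter of loading and unloading at need, for
instance (/mnt/floppy is a stand-in for your own mount point):

    modprobe floppy            # load the drivers only when needed
    modprobe vfat
    mount /mnt/floppy
    # ... copy files ...
    umount /mnt/floppy
    modprobe -r vfat           # unload again, freeing the locked kernel memory
    modprobe -r floppy

lsmod shows what's currently loaded, and how big each module is.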
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
--
gentoo-amd64@gentoo.org mailing list
* [gentoo-amd64] Re: Re: Re: amd64 and kernel configuration
2005-07-27 14:31 ` Paolo Ripamonti
@ 2005-07-27 16:16 ` Duncan
0 siblings, 0 replies; 18+ messages in thread
From: Duncan @ 2005-07-27 16:16 UTC
To: gentoo-amd64
Paolo Ripamonti posted <40bdd4fd0507270731319e86c2@mail.gmail.com>,
excerpted below, on Wed, 27 Jul 2005 16:31:36 +0200:
> 'course I've browsed it all only because I was searching for that little
> stupid pcspeaker module that has recently been moved to a gracious new
> position I never remember
LOL! I've had similar experiences with settings I knew were there but
couldn't find, sometimes because they moved from where they were when I
configured them! I suspect most of us that configure our own kernel have
experienced the same frustration, over the years.
Still, I feel a bit better knowing I'm not the only one to get "lost",
occasionally! <g>
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: Re: amd64 and kernel configuration
2005-07-27 15:42 ` [gentoo-amd64] " Duncan
@ 2005-07-27 17:07 ` Jean.Borsenberger
0 siblings, 0 replies; 18+ messages in thread
From: Jean.Borsenberger @ 2005-07-27 17:07 UTC
To: gentoo-amd64
Well, maybe it's SUMO, but when we switched on the NUMA option for
the kernel of our quad-processor, 16GB Opteron it did speed up the OpenMP
benchmarks by 20% to 30% (depending on the program considered).
Note: OpenMP is a Fortran variation in which you put parallelisation
directives, without bothering about the implementation details, using a
single address space for all instances of the user program.
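For anyone wanting to try a similar comparison, a rough sketch (./bench
stands in for your own OpenMP binary, and assumes the numactl utility is
installed):

    export OMP_NUM_THREADS=4           # one thread per processor
    numactl --localalloc ./bench       # prefer node-local memory (what NUMA mode buys you)
    numactl --interleave=all ./bench   # spread pages evenly, closer to unified behaviour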
Jean Borsenberger
tel: +33 (0)1 45 07 76 29
Observatoire de Paris Meudon
5 place Jules Janssen
92195 Meudon France
On Wed, 27 Jul 2005, Duncan wrote:
> [snip]
--
gentoo-amd64@gentoo.org mailing list
end of thread
Thread overview: 18+ messages
2005-07-27 6:29 [gentoo-amd64] amd64 and kernel configuration Dulmandakh Sukhbaatar
2005-07-27 6:10 ` [gentoo-amd64] " Duncan
2005-07-27 6:19 ` NY Kwok
2005-07-27 7:50 ` Dulmandakh Sukhbaatar
2005-07-27 7:04 ` Michal Žeravík
2005-07-27 9:58 ` netpython
2005-07-27 12:30 ` Brett Johnson
2005-07-27 15:58 ` [gentoo-amd64] " Duncan
2005-07-27 10:02 ` Duncan
2005-07-27 10:13 ` [gentoo-amd64] " Duncan
2005-07-27 10:27 ` Paolo Ripamonti
2005-07-27 14:19 ` [gentoo-amd64] " Duncan
2005-07-27 14:31 ` Paolo Ripamonti
2005-07-27 16:16 ` [gentoo-amd64] " Duncan
2005-07-27 10:46 ` [gentoo-amd64] " Drew Kirkpatrick
2005-07-27 15:42 ` [gentoo-amd64] " Duncan
2005-07-27 17:07 ` Jean.Borsenberger
2005-07-27 10:18 ` Duncan