public inbox for gentoo-amd64@lists.gentoo.org
* [gentoo-amd64] Identifying CPUs in the kernel
@ 2007-06-22  9:30 Peter Humphrey
  2007-06-22 15:01 ` Hemmann, Volker Armin
  2007-06-22 15:02 ` [gentoo-amd64] " Duncan
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Humphrey @ 2007-06-22  9:30 UTC (permalink / raw
  To: gentoo-amd64

I've been having trouble with the task scheduler in the kernel. I'm running 
BOINC from a manual installation (because portage builds a useless version, 
but that's another story) and as I have two CPUs I've told it to use them 
both. This used to work well on my previous motherboard, now defunct, but 
it doesn't on this Supermicro H8DCE.

I'm running gkrellm to show me what's happening in the system, including 
processor loads. Nice time is shown separately from user and system time, 
so I can easily see what BOINC's up to.

This is what happens: when BOINC starts up it starts two processes, which it 
thinks are going to occupy up to 100% of each processor's time. But both 
gkrellm and top show both processes running at 50% on CPU1, always that 
one, with CPU0 idling. Then, if I start an emerge or something, that 
divides its time more-or-less equally between the two processors with the 
BOINC processes still confined to CPU1.

Even more confusingly, sometimes top even disagrees with itself about the 
processor loadings, the heading lines showing one CPU loaded and the task 
lines showing the other.

Just occasionally, BOINC will start its processes properly, each using 100% 
of a CPU, but after a while it reverts spontaneously to its usual 
behaviour. I can't find anything in any log to coincide with the reversion.

I've tried all the versions of BOINC I can find, and I've tried all the 
available kernels, including vanilla-sources, with no change. I've also 
tried running folding@home instead of BOINC, and that behaves in the same 
way. I've talked to the BOINC people, who say they haven't seen this 
behaviour anywhere else and that it sounds like a problem with my 
particular configuration of components. I'm trying an installation of 
Kubuntu to see if that's any different, but it's a long process getting to 
an equivalent state so I can't report a result yet.

I'm beginning to think there must be a problem with my motherboard. Can 
anyone suggest something else for me to check?

-- 
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [gentoo-amd64] Identifying CPUs in the kernel
  2007-06-22  9:30 [gentoo-amd64] Identifying CPUs in the kernel Peter Humphrey
@ 2007-06-22 15:01 ` Hemmann, Volker Armin
  2007-06-22 17:50   ` Peter Humphrey
  2007-06-22 15:02 ` [gentoo-amd64] " Duncan
  1 sibling, 1 reply; 14+ messages in thread
From: Hemmann, Volker Armin @ 2007-06-22 15:01 UTC (permalink / raw
  To: gentoo-amd64

On Freitag, 22. Juni 2007, Peter Humphrey wrote:

> I'm beginning to think there must be a problem with my motherboard. Can
> anyone suggest something else for me to check?

Yes, wait for the .22 kernel.

AFAIR there is a bugfix for scheduling on dual-core CPUs in the upcoming 
release.
Aside from that, if everything else is spread more or less equally, I 
would suspect BOINC and not the kernel. But you can try different 
schedulers (Con Kolivas' RSDL, or whatever it's called, or Ingo Molnar's 
CFS - not to be confused with I/O schedulers) to see if there is a 
difference.
-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-06-22  9:30 [gentoo-amd64] Identifying CPUs in the kernel Peter Humphrey
  2007-06-22 15:01 ` Hemmann, Volker Armin
@ 2007-06-22 15:02 ` Duncan
  2007-06-22 18:10   ` Peter Humphrey
  2007-06-27  1:14   ` Joshua Hoblitt
  1 sibling, 2 replies; 14+ messages in thread
From: Duncan @ 2007-06-22 15:02 UTC (permalink / raw
  To: gentoo-amd64

Peter Humphrey <prh@gotadsl.co.uk> posted
200706221030.26924.prh@gotadsl.co.uk, excerpted below, on  Fri, 22 Jun
2007 10:30:26 +0100:

> This is what happens: when BOINC starts up it starts two processes,
> which it thinks are going to occupy up to 100% of each processor's time.
> But both gkrellm and top show both processes running at 50% on CPU1,
> always that one, with CPU0 idling. Then, if I start an emerge or
> something, that divides its time more-or-less equally between the two
> processors with the BOINC processes still confined to CPU1.
> 
> Even more confusingly, sometimes top even disagrees with itself about
> the processor loadings, the heading lines showing one CPU loaded and the
> task lines showing the other.
> 
> Just occasionally, BOINC will start its processes properly, each using
> 100% of a CPU, but after a while it reverts spontaneously to its usual
> behaviour. I can't find anything in any log to coincide with the
> reversion.

Was it you who posted about this before, or someone else?  If it wasn't 
you, take a look back thru the list a couple months, as it did come up 
previously.  You may have someone to compare notes with. =8^)

Separate processes or separate threads?  Two CPUs (um, two separate 
sockets) or two cores on the same CPU/socket?

Some or all of the following you likely already know, but hey, maybe 
it'll help someone else and it never hurts to throw it in anyway...

The kernel task scheduler uses CPU affinity, which is supposed to apply 
a variable resistance to switching CPUs, and a preference for keeping a 
task on the CPU controlling its memory, in a NUMA situation where there's 
local and remote memory and a penalty to be paid for access to remote 
memory.

There are, however, differences in architecture between AMD (with its 
onboard memory controller and closer cooperation, both between cores and 
between CPUs on separate sockets, due to the direct Hypertransport links) 
and Intel (with its off-chip controller and looser inter-core, inter-
chip, and inter-socket cooperation).  There are also differences in the 
way you can configure both the memory (thru the BIOS) and the kernel, for 
separate NUMA access or a unified view of memory.  If these settings 
don't match your actual physical layout, efficiency will be less than 
peak: either there won't be enough resistance when switching between 
CPUs/cores and memory is relatively costly, so tasks will switch 
frequently with little reason, incurring expensive delays each time, or 
there'll be too much resistance and too much favor placed on what the 
kernel thinks is local vs. remote memory when it's all the same and there 
is in fact very little cost to switching cores/CPUs.
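
(If you want to check what the kernel actually thinks the layout is, the 
numactl tool - sys-process/numactl, if I have the package name right - 
can show you.  Something like:

  $ numactl --hardware

should list each node with its CPUs and memory, plus the inter-node 
distances.  One node showing where you expected two, or vice versa, would 
mean the BIOS/kernel settings don't match the hardware.)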

Generally, if you have a single-socket true dual-core (Intel Core Duo or 
any AMD dual-core), you'll run a single memory controller with a single 
unified view of memory, and costs to switch cores will be relatively 
low.  You'll want to disable NUMA and configure your kernel with a 
single scheduling domain.

If you have multiple sockets, or one of the early Intel pseudo-dual-
cores (which were really two separate CPUs simply packaged together, with 
no special cooperation between them), you'll probably want them in 
separate scheduling domains.  If it's AMD with its onboard memory 
controllers, two sockets means two controllers, and you'll also want to 
consider NUMA, tho you can disable it and interleave your memory if you 
wish, for a unified memory view and higher bandwidth, but at the tradeoff 
of higher latency and less efficient memory access when separate tasks 
(each running on a CPU) both want to use memory at the same time.

If you are lucky enough to have four cores, it gets more complex, as 
currently, four-cores operate as two loosely cooperating pairs, with 
closer cooperation between cores of the same pair.  For highest 
efficiency there, you'll have two levels of scheduling domain, mirroring 
the tight local pair-partner cooperation with the rather looser 
cooperation between pairs.

In particular, you'll want to pay attention to the following kernel 
config settings under Processor type and features:

1) Symmetric multiprocessing support (CONFIG_SMP).  You probably have 
this set right or you'd not be using multiple CPUs/cores.

2) Under SMP, /possibly/ SMT (CONFIG_SCHED_SMT), tho for Intel only, and 
on the older Hyperthreading Netburst arch models.

3) Still under SMP, Multi-core scheduler support (CONFIG_SCHED_MC), if 
you have true dual cores.  Again, note that the first "dual core" Intel 
units were simply two separate CPUs in the same package, so you probably 
do NOT want this for them.

4) Non Uniform Memory Access (NUMA) Support (CONFIG_NUMA).  You probably 
do NOT want this on single-socket multi-cores, and on most Intel 
systems.  You probably DO want this on AMD multi-socket Opteron systems, 
BUT note that there may be BIOS settings for this as well.  It won't work 
so efficiently if the BIOS setting doesn't agree with the kernel setting.

5) If you have NUMA support enabled, you'll also want either Old style 
AMD Opteron NUMA detection (CONFIG_K8_NUMA) or (preferred) ACPI NUMA 
detection (CONFIG_X86_64_ACPI_NUMA).

6) Make sure you do *NOT* have NUMA emulation (CONFIG_NUMA_EMU) enabled.  
As the help for that option says, it's only useful for debugging.
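
Pulled together, on a dual-socket single-core Opteron I'd expect the 
relevant fragment of the .config to look roughly like this (option names 
from the 2.6.2x kernels; treat it as a sketch and check against your own 
version):

CONFIG_SMP=y
# CONFIG_SCHED_SMT is not set
# CONFIG_SCHED_MC is not set
CONFIG_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
# ... or CONFIG_K8_NUMA=y for the old-style detection
# CONFIG_NUMA_EMU is not set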

What I'm wondering, of course, is whether you have NUMA turned on when 
you shouldn't, or don't have core scheduling turned on when you should, 
thus artificially increasing the resistance to switching cores/cpus and 
causing the stickiness.


Now for the process vs. thread stuff.  With NUMA turned on, especially if 
core scheduling is turned off, threads of the same app, accessing the 
same memory, will be more likely to be scheduled on the same processor.  
I don't know of anything that will allow specifying a processor per-
thread, at least with the newer NPTL (Native POSIX Thread Library) 
threading.  With 
the older Linux threads model, each thread showed up as a separate 
process, with its own PID, and could therefore be accessed separately by 
the various scheduling tools.

If however, you were correct when you said BOINC starts two separate 
/processes/ (not threads), or if BOINC happens to use the older/heavier 
Linux threads model (which again will cause the threads to show up as 
separate processes), THEN you are in luck! =8^)

There are two scheduling utility packages that include utilities to tie 
processes to one or more specific processors.

sys-process/schedutils is what I have installed.  It's a collection of 
separate utilities, including taskset, by which I can tell the kernel 
which CPUs I want specific processes to run on.  This worked well for me 
since I was more interested in taskset than the other included utilities, 
and only had to learn the single simple command.  It does what I need it 
to do, and does it well. =8^)
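
As an illustration (the PIDs are obviously made up), pinning one process 
to CPU0 and another to CPU1 looks something like:

# taskset -p 0x1 5279
# taskset -p 0x2 5280

where -p takes an affinity bitmask (0x1 = CPU0 only, 0x2 = CPU1 only, 
0x3 = either) followed by the PID; check the manpage for your version.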

If you prefer a single do-it-all scheduler-tool, perhaps easier to learn 
if you plan to fiddle with more than simply which CPU a process runs on, 
and want to learn it all at once, sys-process/schedtool may be more your 
style.
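
The affinity part of schedtool takes the same sort of bitmask, something 
like (PID made up again):

# schedtool -a 0x1 5279

but as I said, I've only really used taskset myself, so check schedtool's 
manpage.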

Hope that's of some help, even if part or all of it is review.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [gentoo-amd64] Identifying CPUs in the kernel
  2007-06-22 15:01 ` Hemmann, Volker Armin
@ 2007-06-22 17:50   ` Peter Humphrey
  2007-06-22 18:18     ` Hemmann, Volker Armin
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Humphrey @ 2007-06-22 17:50 UTC (permalink / raw
  To: gentoo-amd64

On Friday 22 June 2007 16:01:56 Hemmann, Volker Armin wrote:
> On Freitag, 22. Juni 2007, Peter Humphrey wrote:
> > I'm beginning to think there must be a problem with my motherboard. Can
> > anyone suggest something else for me to check?
>
> Yes, wait for the .22 kernel.

The vanilla sources I've tried are .22 and they behave the same.

> AFAIR there is a bugfix for scheduling on dual-core CPUs in the
> upcoming release.

I forgot to say that these are dual sockets, not cores.

> Aside from that, if everything else is spread more or less equally, I
> would suspect BOINC and not the kernel. But you can try different
> schedulers (Con Kolivas' RSDL, or whatever it's called, or Ingo Molnar's
> CFS - not to be confused with I/O schedulers) to see if there is a
> difference.

I'll look into that idea - thanks.

-- 
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-06-22 15:02 ` [gentoo-amd64] " Duncan
@ 2007-06-22 18:10   ` Peter Humphrey
  2007-06-22 23:47     ` Duncan
  2007-06-27  1:14   ` Joshua Hoblitt
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Humphrey @ 2007-06-22 18:10 UTC (permalink / raw
  To: gentoo-amd64

On Friday 22 June 2007 16:02:59 Duncan wrote:

> Was it you who posted about this before, or someone else?  If it wasn't
> you, take a look back thru the list a couple months, as it did come up
> previously.  You may have someone to compare notes with. =8^)

Nope, t'was I, and I've now tried lots more things but with no success.

> Separate processes or separate threads?  Two CPUs (um, two separate
> sockets) or two cores on the same CPU/socket?

Twin sockets, single Opteron 246 cores. And the BOINC scheduler sets off 
separate processes; they have distinct PIDs and on the old motherboard ran 
on distinct CPUs.

> There are, however, differences in architecture between AMD (with its
> onboard memory controller and closer cooperation, both between cores and
> between CPUs on separate sockets, due to the direct Hypertransport links)
> and Intel (with its off-chip controller and looser inter-core, inter-
> chip, and inter-socket cooperation).  There are also differences in the
> way you can configure both the memory (thru the BIOS) and the kernel,
> for separate NUMA access or a unified view of memory.  If these settings
> don't match your actual physical layout, efficiency will be less than
> peak: either there won't be enough resistance when switching between
> CPUs/cores and memory is relatively costly, so tasks will switch
> frequently with little reason, incurring expensive delays each time, or
> there'll be too much resistance and too much favor placed on what the
> kernel thinks is local vs. remote memory when it's all the same and
> there is in fact very little cost to switching cores/CPUs. 

This board has eight DIMM sockets, four arranged next to each CPU socket and 
associated with it electrically and logically. I have four 1GB DIMMs, each 
in (I hope) the right pair of sockets in each bank of four. In other words, 
each CPU has 2GB of local RAM. I suppose I could buy as much RAM again and 
fill up all the sockets :-)

> If it's AMD with its onboard memory controllers, two sockets means two
> controllers, and you'll also want to consider NUMA, tho you can disable it
> and interleave your memory if you wish, for a unified memory view and
> higher bandwidth, but at the tradeoff of higher latency and less efficient
> memory access when separate tasks (each running on a CPU) both want to use
> memory at the same time. 

NUMA  is switched on in BIOS and kernel config. I still find some of the 
BIOS settings mysterious though, so perhaps I don't have it set up right. I 
have tried the two sets of defaults, failsafe and optimised, but with no 
effect on this problem.

> In particular, you'll want to pay attention to the following kernel config
> settings under Processor type and features: 
>
> 1) Symmetric multiprocessing support (CONFIG_SMP).  You probably have
> this set right or you'd not be using multiple CPUs/cores.

Yep.

> 2) Under SMP, /possibly/ SMT (CONFIG_SCHED_SMT), tho for Intel only, and
> on the older Hyperthreading Netburst arch models.

Nope.

> 3) Still under SMP, Multi-core scheduler support (CONFIG_SCHED_MC), if
> you have true dual cores.  Again, note that the first "dual core" Intel
> units were simply two separate CPUs in the same package, so you probably
> do NOT want this for them.

Nope.

> 4) Non Uniform Memory Access (NUMA) Support (CONFIG_NUMA) [...] You
> probably DO want this on AMD multi-socket Opteron systems, BUT note that
> there may be BIOS settings for this as well.  It won't work so efficiently
> if the BIOS setting doesn't agree with the kernel setting. 

Yep.

> 5) If you have NUMA support enabled, you'll also want either Old style
> AMD Opteron NUMA detection (CONFIG_K8_NUMA) or (preferred) ACPI NUMA
> detection (CONFIG_X86_64_ACPI_NUMA).

Tried both together and each separately. No difference.

> 6) Make sure you do *NOT* have NUMA emulation (CONFIG_NUMA_EMU) enabled.

No point in emulating something that's present.  :-)

> What I'm wondering, of course, is whether you have NUMA turned on when
> you shouldn't, or don't have core scheduling turned on when you should,
> thus artificially increasing the resistance to switching cores/cpus and
> causing the stickiness.

I don't think so.

> If [...] you were correct when you said BOINC starts two
> separate /processes/ (not threads),

I'm sure I was correct.

> or if BOINC happens to use the older/heavier Linux threads model (which
> again will cause the threads to show up as separate processes),

I can't be quite certain this isn't happening, but I'm nearly so.

> There are two scheduling utility packages that include utilities to tie
> processes to one or more specific processors. 
>
> sys-process/schedutils is what I have installed.  It's a collection of
> separate utilities, including taskset, by which I can tell the kernel
> which CPUs I want specific processes to run on.

> If you prefer a single do-it-all scheduler-tool, perhaps easier to learn
> if you plan to fiddle with more than simply which CPU a process runs on,
> and want to learn it all at once, sys-process/schedtool may be more your
> style.

I'll look into those - thanks.

-- 
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [gentoo-amd64] Identifying CPUs in the kernel
  2007-06-22 17:50   ` Peter Humphrey
@ 2007-06-22 18:18     ` Hemmann, Volker Armin
  0 siblings, 0 replies; 14+ messages in thread
From: Hemmann, Volker Armin @ 2007-06-22 18:18 UTC (permalink / raw
  To: gentoo-amd64

On Freitag, 22. Juni 2007, Peter Humphrey wrote:
> On Friday 22 June 2007 16:01:56 Hemmann, Volker Armin wrote:
> > On Freitag, 22. Juni 2007, Peter Humphrey wrote:
> > > I'm beginning to think there must be a problem with my motherboard. Can
> > > anyone suggest something else for me to check?
> >
> > Yes, wait for the .22 kernel.
>
> The vanilla sources I've tried are .22 and they behave the same.

.22 is not released yet - and the rcX kernels seem to be badly broken in 
the filesystem area.

There are reported problems on systems using XFS, ext3 and ReiserFS. I 
would stay away from them until 2.6.22 is released.


> > AFAIR there is a bugfix for scheduling on dual-core CPUs in the
> > upcoming release.
>
> I forgot to say that these are dual sockets, not cores.
>
> > Aside from that, if everything else is spread more or less equally, I
> > would suspect BOINC and not the kernel. But you can try different
> > schedulers (Con Kolivas' RSDL, or whatever it's called, or Ingo
> > Molnar's CFS - not to be confused with I/O schedulers) to see if there
> > is a difference.
>
> I'll look into that idea - thanks.

http://people.redhat.com/mingo/cfs-scheduler/

For Con's patch, use Google. ;)
-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-06-22 18:10   ` Peter Humphrey
@ 2007-06-22 23:47     ` Duncan
  2007-06-23  8:16       ` Peter Humphrey
  0 siblings, 1 reply; 14+ messages in thread
From: Duncan @ 2007-06-22 23:47 UTC (permalink / raw
  To: gentoo-amd64

Peter Humphrey <prh@gotadsl.co.uk> posted
200706221910.44194.prh@gotadsl.co.uk, excerpted below, on  Fri, 22 Jun
2007 19:10:44 +0100:

>> What I'm wondering, of course, is whether you have NUMA turned on when
>> you shouldn't, or don't have core scheduling turned on when you should,
>> thus artificially increasing the resistance to switching cores/cpus and
>> causing the stickiness.
> 
> I don't think so.

Yeah, now that you've clarified that it's sockets and confirmed settings, 
you seem to have it right.

On the BIOS settings, some of them will affect whether the board can use 
all four gigs of memory as well, by controlling how it arranges the 
address space and whether there's a hole left between 3.5 and 4 gig for 
32-bit PCI hardware addressing or not.  I've a similar arrangement here 
on a Tyan s2885, only with two-gig sticks, so eight gigs of memory.  If 
you're seeing your full 4 gigs, tho, you've got that set right, both in 
the kernel and in the BIOS.

The other BIOS settings of interest here are the access bitness and 
interleaving.  If it's like mine, you'll be able to set 32-, 64-, or 128-
bit interleaving.  You'll want 64-bit, interleaving the sticks within a 
node for best bandwidth there, but not across nodes, so the nodes can be 
used for NUMA.  In order to actually get the 64-bit interleaved access, 
you'll need the sticks in paired slots on the node, however (1&2 or 3&4, 
not 2&3 or separated).  But it sounds like you have that as well.

Finally, there's the question of how the rest of the system connects to 
the sockets.  Here, everything except memory connects to the first socket 
(CPU0), so the system can run in single-socket mode.  However, that means 
anything doing heavy I/O or the like, including 3D video access, runs 
most efficiently on CPU0.  In particular, using taskset (mentioned in 
what I snipped), I've noticed that even in 2D mode but with Composite on, 
X takes several percentage points more CPU when it's scheduled on CPU1 
than when it's allowed to run on CPU0.  CPU1 works best with CPU-bound 
processes, or with hard drive and other comparatively slow I/O-bound 
processes: the former because it doesn't matter which CPU they run on, 
the latter because the I/O is slow enough that it's the bottleneck in any 
case.  If your board is laid out similarly, it's worth keeping that in 
mind when you're playing around with taskset or the like.

If, as you say, BOINC is running separate processes, then scheduling it 
with taskset should be possible and do what you need.  The only caveat 
would be if the processes terminate and restart.  You may need to hack up 
a script to run from cron, to check every minute or ten or whatever, 
depending on how long the BOINC tasks last, to keep them scheduled on 
separate CPUs (there's a rough sketch at the end of this post).  I have a 
particular game (the original Master of Orion, the only non-source-based 
software I still run) that I run in DOSBOX emulation.  Mainly, I use 
taskset to set DOSBOX on CPU1, while X and anything else I'm running that 
uses significant CPU gets put on CPU0.  That works VERY well, and has 
allowed me to increase the emulation speed dramatically over what was 
possible before, when X and DOSBOX may have been running on the same 
CPU.  That's the big thing I use taskset for, but it works quite well for 
it. =8^)
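
Back to the cron idea, a starting point might look something like this 
(an untested sketch; the pgrep pattern and the two-CPU assumption are 
placeholders for whatever your BOINC apps actually are):

#!/bin/sh
# repin-boinc.sh: untested sketch - re-pin BOINC apps onto alternating
# CPUs, so restarted work units don't pile up on one processor.
CPU=0
for PID in $(pgrep -f setiathome); do
        MASK=$(printf '0x%x' $((1 << CPU)))
        taskset -p $MASK $PID >/dev/null
        CPU=$(( (CPU + 1) % 2 ))
done

Run from cron every few minutes, that would put restarted work units back 
where you want them.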

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-06-22 23:47     ` Duncan
@ 2007-06-23  8:16       ` Peter Humphrey
  2007-06-23  9:52         ` Duncan
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Humphrey @ 2007-06-23  8:16 UTC (permalink / raw
  To: gentoo-amd64

On Saturday 23 June 2007 00:47:27 Duncan wrote:
> Peter Humphrey <prh@gotadsl.co.uk> posted
> 200706221910.44194.prh@gotadsl.co.uk, excerpted below, on  Fri, 22 Jun
>
> 2007 19:10:44 +0100:
> >> What I'm wondering, of course, is whether you have NUMA turned on when
> >> you shouldn't, or don't have core scheduling turned on when you
> >> should, thus artificially increasing the resistance to switching
> >> cores/cpus and causing the stickiness.
> >
> > I don't think so.
>
> Yeah, now that you've clarified that it's sockets and confirmed settings,
> you seem to have it right.

Here's an example of silly output from top. In this case I did this:
# schedtool -a 0x1 5280
to get 5280 onto CPU0 (0x1 being an affinity bitmask selecting CPU0 
only), then when I didn't get any better loadings I restored the affinity 
to its original value of 0x3, meaning both CPUs:
# schedtool -a 0x3 5280

Here's what top showed then. Look at the /nice/ values on lines 3 and 4, and 
compare those with the %CPU and Processor fields of processes 5279 and 5280 
(sorry about the line wraps). This has me deeply puzzled:

top - 09:04:59 up 23 min,  5 users,  load average: 3.60, 4.79, 3.91
Tasks: 124 total,   2 running, 122 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  
0.0%st
Cpu1  :  0.0%us,  0.3%sy, 99.7%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  
0.0%st
Mem:   4088968k total,  1822644k used,  2266324k free,   218296k buffers
Swap:  4176848k total,        0k used,  4176848k free,   735708k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 5279 prh       34  19 60256  38m 3600 S   50  1.0   6:53.97 1 
setiathome-5.12
 5280 prh       34  19 60252  38m 3612 S   50  1.0   6:54.08 0 
setiathome-5.12
 3692 root      15   0  144m  63m 7564 S    0  1.6   0:36.92 0 X
 5272 prh       15   0  4464 2636 1692 S    0  0.1   0:00.70 1 boinc
 5286 prh       15   0 93016  21m  14m S    0  0.5   0:00.66 0 konsole
 5322 prh       15   0  145m  13m  10m S    0  0.3   0:03.01 0 gkrellm2
10357 root      15   0 10732 1340  964 R    0  0.0   0:00.01 1 top
[snip system processes]

I don't think this is a scheduling problem; it goes deeper, so that the 
kernel doesn't have a consistent picture of which processor is which.

-- 
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-06-23  8:16       ` Peter Humphrey
@ 2007-06-23  9:52         ` Duncan
  2007-06-24 10:14           ` Peter Humphrey
  2007-07-13 14:01           ` Peter Humphrey
  0 siblings, 2 replies; 14+ messages in thread
From: Duncan @ 2007-06-23  9:52 UTC (permalink / raw
  To: gentoo-amd64

Peter Humphrey <prh@gotadsl.co.uk> posted
200706230916.07711.prh@gotadsl.co.uk, excerpted below, on  Sat, 23 Jun
2007 09:16:07 +0100:

> Here's what top showed then. Look at the /nice/ values on lines 3 and 4,
> and compare those with the %CPU and Processor fields of processes 5279
> and 5280 (sorry about the line wraps). This has me deeply puzzled:

Fixed the line wraps and removed a bit of extraneous information. =8^)
 
> top - 09:04:59 up 23 min, 5 users, load average: 3.60, 4.79, 3.91
> Tasks: 124 total, 2 running, 122 sleeping, 0 stopped, 0 zombie

> Cpu0: 0.3%us, 0.3%sy,  0.0%ni, 99.3%id, [zeroes]
> Cpu1: 0.0%us, 0.3%sy, 99.7%ni,  0.0%id, [zeroes]

> PID USER  PR NI S %CPU %MEM  TIME+   P COMMAND
> 5279 prh  34 19 S   50  1.0  6:53.97 1 setiathome-5.12
> 5280 prh  34 19 S   50  1.0  6:54.08 0 setiathome-5.12


> I don't think this is a scheduling problem; it goes deeper, so that the
> kernel doesn't have a consistent picture of which processor is which.

Critical question here, is that in SMP Irix or SMP Solaris mode?  (See
the top manpage if you don't know what I mean.)  Asked another way, is
that displaying percent of total CPU time (both CPUs) or percent of
total divided by number of CPUs (so percent of one CPU)?

If it's Irix mode (percent total CPU time), then it's reporting full
usage of both CPUs, one on each.  The CPU0 line would then be the one
screwed up, since it's reporting idle when it clearly has to be in use.

If it's Solaris mode (percent of a single CPU's time, so total of all
percentages should be 200% if you have two CPUs), then the CPUs
lines would seem to be correct, both processes would appear to be
running on CPU1, maxing it out, and the P column of the 5280 line
would have to be screwed up.  (That's assuming you let the figures
stabilize after the last schedtool call you made.)

In either case, I'm not sure where your bug is, but you are correct,
the problem appears to be way deeper than scheduling.  I'd guess it's
ultimately a kernel bug, possibly due to a hardware bug, possibly not,
but you might wish to file it on top initially, just to see if they've
seen similar and can tell you what's going on.  Unless you want to
double-check patching status yourself, you might as well file the bug
with Gentoo first, in case it's a Gentoo bug.  They'll probably end
up closing it "upstream", but at least then when you file it upstream,
you can say you've cleared it with Gentoo first. 

As for top, note that there's a trick you can use with it.  You'll
likely want to trim the memory columns etc. as I did for your bug
report, but you may not want to mess up your regular config to do
so.  Not a problem! =8^)  Create a symlink to top called something
else (say topbug).  Then run it via the symlink, and you can change
and save your settings, and it'll save them in a different rc file
(topbugrc, using my example).  That way, you can run it with the
bug-report settings when you want to, without messing up your regular
config.
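
Concretely (paths illustrative; top normally keeps its settings in 
~/.toprc, so the symlinked name gets its own file):

$ ln -s /usr/bin/top ~/bin/topbug
$ topbug

then press W inside topbug to write the trimmed-down settings out to 
~/.topbugrc.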

Of course, don't forget to mention in your bug report whether you were
in Solaris or Irix SMP mode, because as I explained, it /does/ make a
difference.

Let me know how this goes, post the bug number when you file it or
whatever, as I'd like to follow it too.  You definitely have a
strange one here, and I'd /love/ to see what the real experts have
to say about it!  You are absolutely correct, it doesn't seem to
make any sense at all!

Good luck.  That's one /strange/ problem you have going there!
No /wonder/ you were expressing frustration earlier!

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-06-23  9:52         ` Duncan
@ 2007-06-24 10:14           ` Peter Humphrey
  2007-07-13 14:01           ` Peter Humphrey
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Humphrey @ 2007-06-24 10:14 UTC (permalink / raw
  To: gentoo-amd64

On Saturday 23 June 2007 10:52:33 Duncan wrote:
> Peter Humphrey <prh@gotadsl.co.uk> posted
> > top - 09:04:59 up 23 min, 5 users, load average: 3.60, 4.79, 3.91
> > Tasks: 124 total, 2 running, 122 sleeping, 0 stopped, 0 zombie
> >
> > Cpu0: 0.3%us, 0.3%sy,  0.0%ni, 99.3%id, [zeroes]
> > Cpu1: 0.0%us, 0.3%sy, 99.7%ni,  0.0%id, [zeroes]
> >
> > PID USER  PR NI S %CPU %MEM  TIME+   P COMMAND
> > 5279 prh  34 19 S   50  1.0  6:53.97 1 setiathome-5.12
> > 5280 prh  34 19 S   50  1.0  6:54.08 0 setiathome-5.12
> >
> > I don't think this is a scheduling problem; it goes deeper, so that the
> > kernel doesn't have a consistent picture of which processor is which.
>
> Critical question here, is that in SMP Irix or SMP Solaris mode?  (See
> the top manpage if you don't know what I mean.)  Asked another way, is
> that displaying percent of total CPU time (both CPUs) or percent of
> total divided by number of CPUs (so percent of one CPU)?

That's another oddity. I press <I> (capital letter i) and /top/ says "Irix 
mode off" and shows half the previous percentage CPU in the process lines: 
25 in the example above. I then press <I> again and it says "Irix mode on" 
and shows the 50s again. Is this backwards, or is my utter confusion 
showing?  :-(

I want it to show:
  -	for each CPU, the percent to which it is loaded; and,
  -	for each process, how much of a CPU's time it is consuming.
The presence of two CPUs requires two CPU lines and allows for two lots of 
processes. That seems logical to me. Is it Irix mode or Solaris?

This morning /top/ is showing this:
--- 
top - 10:51:55 up  2:22,  5 users,  load average: 2.43, 2.34, 2.60
Tasks: 121 total,   4 running, 117 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  [zeroes]
Cpu1  :  0.0%us,  0.3%sy, 99.7%ni,  0.0%id,  [zeroes]

PID USER      PR  NI  S %CPU %MEM    TIME+  P COMMAND
5270 prh      34  19  S   50  0.9  67:55.08 1 setiathome-5.12
5271 prh      34  19  S   50  1.1  67:57.80 1 einstein_S5R2_4
--- 

So CPU1 is fully loaded and CPU0 is idling. Gkrellm shows the same. The box 
has been running for 2 hours from cold and I haven't tampered with 
anything; BOINC starts from /etc/init.d/local. When /top/ is behaving this 
way my problem seems to be one of scheduling, but I'm pretty sure it isn't.

> I'm not sure where your bug is, but [...] the problem appears to be way
> deeper than scheduling.  I'd guess it's ultimately a kernel bug, possibly
> due to a hardware bug

That's my thought too.

> [...] you might wish to file it on top initially, just to see if they've
> seen [anything] similar and can tell you what's going on.  Unless you want
> to double-check patching status yourself, you might as well file the bug
> with Gentoo first, in case it's a Gentoo bug.  They'll probably end up
> closing it "upstream", but at least then when you file it upstream, you
> can say you've cleared it with Gentoo first. 

I'll do that, but I'll wait a day or two to see what else comes up here.

> You definitely have a strange one here, and I'd /love/ to see what the
> real experts have to say about it! 

Mm, me too.

-- 
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-06-22 15:02 ` [gentoo-amd64] " Duncan
  2007-06-22 18:10   ` Peter Humphrey
@ 2007-06-27  1:14   ` Joshua Hoblitt
  1 sibling, 0 replies; 14+ messages in thread
From: Joshua Hoblitt @ 2007-06-27  1:14 UTC (permalink / raw
  To: gentoo-amd64

[-- Attachment #1: Type: text/plain, Size: 723 bytes --]

On Fri, Jun 22, 2007 at 03:02:59PM +0000, Duncan wrote:
> There are two scheduling utility packages that include utilities to tie 
> processes to one or more specific processors.
> 
> sys-process/schedutils is what I have installed.  It's a collection of 
> separate utilities, including taskset, by which I can tell the kernel 
> which CPUs I want specific processes to run on.  This worked well for me 
> since I was more interested in taskset than the other included utilities, 
> and only had to learn the single simple command.  It does what I need it 
> to do, and does it well. =8^)

schedutils was merged into util-linux; see http://rlove.org/

ionice is part of util-linux now as well...
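
e.g. something like (PID made up; flags per the util-linux version I 
have):

# ionice -c3 -p 5280

to drop a process into the idle I/O class.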

-J

--

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-06-23  9:52         ` Duncan
  2007-06-24 10:14           ` Peter Humphrey
@ 2007-07-13 14:01           ` Peter Humphrey
  2007-07-13 14:41             ` Peter Humphrey
  2007-07-13 14:55             ` Duncan
  1 sibling, 2 replies; 14+ messages in thread
From: Peter Humphrey @ 2007-07-13 14:01 UTC (permalink / raw
  To: gentoo-amd64

On Saturday 23 June 2007 10:52, Duncan wrote:

> Let me know how this goes [...] as I'd like to follow it too.

Well, after quite a bit more work, I'm still none the wiser. I've built a 
new installation from scratch without the ~amd64 key word, and I've tried 
many combinations of kernel parameters. Once or twice I thought I'd found 
it: for instance, it seemed that removing I2C completely caused the faulty 
behaviour to appear, but then putting it back in didn't correct it.

The new system is as independent of the original as possible. The old one 
was on /dev/hd[a,b] and the new is on /dev/sda, which is a SATA disk. I've 
changed the BIOS to boot from SATA first, though I couldn't hide the old 
disks on their own - hiding them hid the IDE optical drives /dev/hd[c,d] 
as well.

I started with a genkernel kernel, remembering to enable SATA in it 
first ;-) , then switched to compiling manually, progressively stripping 
out all the extraneous modules and built-in features until I had a 
reasonably well-tuned kernel. It works just fine.

I also reverted the BIOS to an optimised set of defaults and changed one 
thing at a time to reach (what I think is) the optimum for me. The system 
is working as it should. Soon I'll erase the old partitions - when I'm sure 
I don't need anything else off them.

So I can't put a case together to raise a bug report, and I'll have to 
accept just-one-of-those-things.

-- 
Rgds
Peter.
Linux Counter 5290, Aug 93
-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-07-13 14:01           ` Peter Humphrey
@ 2007-07-13 14:41             ` Peter Humphrey
  2007-07-13 14:55             ` Duncan
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Humphrey @ 2007-07-13 14:41 UTC (permalink / raw
  To: gentoo-amd64

On Friday 13 July 2007 15:01, I wrote:

[various things]

I meant also to say that I started out on this new build using the same CD as 
I'd used before: 2006.1 minimal. When I'd installed a bare system it booted 
OK and seemed healthy enough, but then if I ran "emerge -e system", the 
resulting system would not load the Ethernet module properly, so I was 
incommunicado. I did this several times (until I tired of banging my head 
against that wall), recompiling the kernel before the remerge, after it, 
both, or neither, before giving up and downloading a 2007.0 ISO. 
That one worked fine.

This year is full of puzzles. I don't like mysteries.

-- 
Rgds
Peter.
Linux Counter 5290, Aug 93
-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [gentoo-amd64]  Re: Identifying CPUs in the kernel
  2007-07-13 14:01           ` Peter Humphrey
  2007-07-13 14:41             ` Peter Humphrey
@ 2007-07-13 14:55             ` Duncan
  1 sibling, 0 replies; 14+ messages in thread
From: Duncan @ 2007-07-13 14:55 UTC (permalink / raw
  To: gentoo-amd64

Peter Humphrey <prh@gotadsl.co.uk> posted
200707131501.46783.prh@gotadsl.co.uk, excerpted below, on  Fri, 13 Jul
2007 15:01:46 +0100:

> So I can't put a case together to raise a bug report, and I'll have to
> accept just-one-of-those-things.

Yeah, that happens.  It can definitely be frustrating, especially since, 
not knowing what caused it, you haven't the foggiest whether or when 
it'll be back. =8^(  However, I've come to the conclusion that sometimes 
you just don't look that horse in the mouth, and chances are, it'll be 
your trusty steed for some time. =8^)

BTW, had a list connectivity glitch and didn't get this reply tacked on 
your last post, but re: Irix/Solaris mode, it seems the manpage says the 
reverse of what it actually does.  Having never used either of the named 
platforms, I can't say whether the manpage or the app is correct in what 
it claims for the modes, but yes, Irix mode ON seems to result in what 
the manpage calls Solaris mode, while Irix mode OFF results in what the 
manpage calls Irix mode.  So at least that part isn't unique to your 
(old) system, as that's the way it works here as well.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
gentoo-amd64@gentoo.org mailing list



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2007-07-13 14:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-22  9:30 [gentoo-amd64] Identifying CPUs in the kernel Peter Humphrey
2007-06-22 15:01 ` Hemmann, Volker Armin
2007-06-22 17:50   ` Peter Humphrey
2007-06-22 18:18     ` Hemmann, Volker Armin
2007-06-22 15:02 ` [gentoo-amd64] " Duncan
2007-06-22 18:10   ` Peter Humphrey
2007-06-22 23:47     ` Duncan
2007-06-23  8:16       ` Peter Humphrey
2007-06-23  9:52         ` Duncan
2007-06-24 10:14           ` Peter Humphrey
2007-07-13 14:01           ` Peter Humphrey
2007-07-13 14:41             ` Peter Humphrey
2007-07-13 14:55             ` Duncan
2007-06-27  1:14   ` Joshua Hoblitt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox