* [gentoo-amd64] Identifying CPUs in the kernel
@ 2007-06-22 9:30 Peter Humphrey
2007-06-22 15:01 ` Hemmann, Volker Armin
2007-06-22 15:02 ` [gentoo-amd64] " Duncan
0 siblings, 2 replies; 14+ messages in thread
From: Peter Humphrey @ 2007-06-22 9:30 UTC
To: gentoo-amd64
I've been having trouble with the task scheduler in the kernel. I'm running
BOINC from a manual installation (because portage builds a useless version,
but that's another story) and as I have two CPUs I've told it to use them
both. This used to work well on my previous motherboard, now defunct, but
it doesn't on this Supermicro H8DCE.
I'm running gkrellm to show me what's happening in the system, including
processor loads. Nice time is shown separately from user and system time,
so I can easily see what BOINC's up to.
This is what happens: when BOINC starts up it starts two processes, which it
thinks are going to occupy up to 100% of each processor's time. But both
gkrellm and top show both processes running at 50% on CPU1, always that
one, with CPU0 idling. Then, if I start an emerge or something, that
divides its time more-or-less equally between the two processors with the
BOINC processes still confined to CPU1.
Even more confusingly, sometimes top even disagrees with itself about the
processor loadings, the heading lines showing one CPU loaded and the task
lines showing the other.
Just occasionally, BOINC will start its processes properly, each using 100%
of a CPU, but after a while it reverts spontaneously to its usual
behaviour. I can't find anything in any log to coincide with the reversion.
I've tried all the versions of BOINC I can find, and I've tried all the
available kernels, including vanilla-sources, with no change. I've also
tried running folding@home instead of BOINC, and that behaves in the same
way. I've talked to the BOINC people, who say they haven't seen this
behaviour anywhere else and that it sounds like a problem with my
particular configuration of components. I'm trying an installation of
Kubuntu to see if that's any different, but it's a long process getting to
an equivalent state so I can't report a result yet.
I'm beginning to think there must be a problem with my motherboard. Can
anyone suggest something else for me to check?
--
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
* Re: [gentoo-amd64] Identifying CPUs in the kernel
2007-06-22 9:30 [gentoo-amd64] Identifying CPUs in the kernel Peter Humphrey
@ 2007-06-22 15:01 ` Hemmann, Volker Armin
2007-06-22 17:50 ` Peter Humphrey
2007-06-22 15:02 ` [gentoo-amd64] " Duncan
1 sibling, 1 reply; 14+ messages in thread
From: Hemmann, Volker Armin @ 2007-06-22 15:01 UTC
To: gentoo-amd64
On Friday, 22 June 2007, Peter Humphrey wrote:
> I'm beginning to think there must be a problem with my motherboard. Can
> anyone suggest something else for me to check?
Yes - wait for the .22 kernel.
AFAIR there is a bugfix for scheduling on dual-core CPUs in the upcoming
release.
Aside from that, if everything else is spread more or less equally, I would
suspect BOINC and not the kernel. But you can try different CPU schedulers
(Con Kolivas' RSDL, or whatever it is called, or Ingo Molnar's CFS - not to
be confused with the I/O schedulers) to see if there is a difference.
* [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-06-22 9:30 [gentoo-amd64] Identifying CPUs in the kernel Peter Humphrey
2007-06-22 15:01 ` Hemmann, Volker Armin
@ 2007-06-22 15:02 ` Duncan
2007-06-22 18:10 ` Peter Humphrey
2007-06-27 1:14 ` Joshua Hoblitt
1 sibling, 2 replies; 14+ messages in thread
From: Duncan @ 2007-06-22 15:02 UTC
To: gentoo-amd64
Peter Humphrey <prh@gotadsl.co.uk> posted
200706221030.26924.prh@gotadsl.co.uk, excerpted below, on Fri, 22 Jun
2007 10:30:26 +0100:
> This is what happens: when BOINC starts up it starts two processes,
> which it thinks are going to occupy up to 100% of each processor's time.
> But both gkrellm and top show both processes running at 50% on CPU1,
> always that one, with CPU0 idling. Then, if I start an emerge or
> something, that divides its time more-or-less equally between the two
> processors with the BOINC processes still confined to CPU1.
>
> Even more confusingly, sometimes top even disagrees with itself about
> the processor loadings, the heading lines showing one CPU loaded and the
> task lines showing the other.
>
> Just occasionally, BOINC will start its processes properly, each using
> 100% of a CPU, but after a while it reverts spontaneously to its usual
> behaviour. I can't find anything in any log to coincide with the
> reversion.
Was it you who posted about this before, or someone else? If it wasn't
you, take a look back thru the list a couple months, as it did come up
previously. You may have someone to compare notes with. =8^)
Separate processes or separate threads? Two CPUs (um, two separate
sockets) or two cores on the same CPU/socket?
Some or all of the following you likely already know, but hey, maybe
it'll help someone else and it never hurts to throw it in anyway...
The kernel task scheduler uses CPU affinity, which is supposed to have a
variable resistance to switching CPUs, and a preference for keeping a
task on the CPU controlling its memory, given a NUMA architecture
situation where there's local and remote memory, and a penalty to be paid
for access to remote memory.
There are however differences in architecture between AMD (with its
onboard memory controller and closer cooperation, both between cores and
between CPUs on separate sockets, due to the direct Hypertransport links)
and Intel (with its off-chip controller and looser inter-core, inter-
chip, and inter-socket cooperation). There are also differences in the
way you can configure both the memory (thru the BIOS) and the kernel, for
separate NUMA access or unified-view memory. If these settings don't
match your actual physical layout, efficiency will be less than peak:
either there won't be enough resistance to a relatively high-cost switch
between CPUs/cores and memory, so tasks will switch frequently with
little reason, incurring expensive delays each time, or there's too much
resistance and too much favor placed on what the kernel thinks is local
vs. remote memory when it's all the same, and there is in fact very
little cost to switching cores/CPUs.
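For illustration (the figures below are invented, not from a real box):
with NUMA active and the BIOS and kernel agreeing, sys-process/numactl
will show the layout the kernel actually detected. On a two-node Opteron
setup with 2 gig per node, expect something roughly like:

  $ numactl --hardware
  available: 2 nodes (0-1)
  node 0 size: 2047 MB
  node 0 free: 1181 MB
  node 1 size: 2047 MB
  node 1 free: 1312 MB

If that reports only one node where you expect two, the BIOS and kernel
settings aren't matching up.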
Generally, if you have a single slot true dual core (Intel core-duo or
any AMD dual core), you'll run a single memory controller with a single
unified view on memory, and costs to switch cores will be relatively
low. You'll want to disable NUMA, and configure your kernel with a
single scheduling domain.
If you have multiple slots or the early Intel pseudo-dual-cores, which
were really two separate CPUs simply packaged together, with no special
cooperation between them, you'll probably want them in separate
scheduling domains. If it's AMD with its onboard memory controllers, two
sockets means two controllers, and you'll also want to consider NUMA, tho
you can disable it and interleave your memory if you wish, for a unified
memory view and higher bandwidth, but at the tradeoff of higher latency
and less efficient memory access when separate tasks (each running on a
CPU) both want to use memory at the same time.
If you are lucky enough to have four cores, it gets more complex, as
current four-core chips operate as two loosely cooperating pairs, with
closer cooperation between cores of the same pair. For highest
efficiency there, you'll have two levels of scheduling domain, mirroring
the tight local pair-partner cooperation and the rather looser
cooperation between pairs.
In particular, you'll want to pay attention to the following kernel
config settings under Processor type and features:
1) Symmetric multiprocessing support (CONFIG_SMP). You probably have
this set right or you'd not be using multiple CPUs/cores.
2) Under SMP, /possibly/ SMT (CONFIG_SCHED_SMT), tho for Intel only, and
on the older Hyperthreading Netburst arch models.
3) Still under SMP, Multi-core scheduler support (CONFIG_SCHED_MC), if
you have true dual cores. Again, note that the first "dual core" Intel
units were simply two separate CPUs in the same package, so you probably
do NOT want this for them.
4) Non Uniform Memory Access (NUMA) Support (CONFIG_NUMA). You probably
do NOT want this on single-socket multi-cores, and on most Intel
systems. You probably DO want this on AMD multi-socket Opteron systems,
BUT note that there may be BIOS settings for this as well. It won't work
so efficiently if the BIOS setting doesn't agree with the kernel setting.
5) If you have NUMA support enabled, you'll also want either Old style
AMD Opteron NUMA detection (CONFIG_K8_NUMA) or (preferred) ACPI NUMA
detection (CONFIG_X86_64_ACPI_NUMA).
6) Make sure you do *NOT* have NUMA emulation (CONFIG_NUMA_EMU) enabled.
As the help for that option says, it's only useful for debugging.
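To make that concrete (an illustrative sketch, not copied from a real
config): on a dual-socket, single-core Opteron box like the one under
discussion, I'd expect the relevant fragment of the kernel config to look
roughly like this, here with the preferred ACPI NUMA detection. A
single-socket dual-core would instead have CONFIG_SCHED_MC=y and the NUMA
options off.

  $ grep -E 'SMP|SCHED_SMT|SCHED_MC|NUMA' /usr/src/linux/.config
  CONFIG_SMP=y
  # CONFIG_SCHED_SMT is not set
  # CONFIG_SCHED_MC is not set
  CONFIG_NUMA=y
  # CONFIG_K8_NUMA is not set
  CONFIG_X86_64_ACPI_NUMA=y
  # CONFIG_NUMA_EMU is not set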
What I'm wondering, of course, is whether you have NUMA turned on when
you shouldn't, or don't have core scheduling turned on when you should,
thus artificially increasing the resistance to switching cores/cpus and
causing the stickiness.
Now for the process vs. thread stuff. With NUMA turned on, especially if
core scheduling is turned off, threads of the same app, accessing the
same memory, will be more likely to be scheduled on the same processor.
I don't know of anything that will allow specifying the processor
per-thread, at least with the newer NPTL (Native POSIX Thread Library)
threading. With the older Linux threads model, each thread showed up as a
separate process, with its own PID, and could therefore be addressed
separately by the various scheduling tools.
If, however, you were correct when you said BOINC starts two separate
/processes/ (not threads), or if BOINC happens to use the older/heavier
Linux threads model (which again will cause the threads to show up as
separate processes), THEN you are in luck! =8^)
There are two scheduling utility packages that include utilities to tie
processes to one or more specific processors.
sys-process/schedutils is what I have installed. It's a collection of
separate utilities, including taskset, by which I can tell the kernel
which CPUs I want specific processes to run on. This worked well for me
since I was more interested in taskset than the other included utilities,
and only had to learn the single simple command. It does what I need it
to do, and does it well. =8^)
If you prefer a single do-it-all scheduler-tool, perhaps easier to learn
if you plan to fiddle with more than simply which CPU a process runs on,
and want to learn it all at once, sys-process/schedtool may be more your
style.
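As a rough example (the PIDs here are made up - substitute your own):
taskset can pin an already-running process by PID or launch a command
pre-pinned, and schedtool's -a option takes the same sort of affinity
mask:

  $ taskset -p 0x1 5279        # pin PID 5279 to CPU0 (mask bit 0)
  $ taskset -p 0x2 5280        # pin PID 5280 to CPU1 (mask bit 1)
  $ taskset 0x2 ./some_app     # start a command already bound to CPU1
  $ schedtool -a 0x3 5280      # schedtool equivalent: 0x3 = both CPUs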
Hope that's of some help, even if part or all of it is review.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: [gentoo-amd64] Identifying CPUs in the kernel
2007-06-22 15:01 ` Hemmann, Volker Armin
@ 2007-06-22 17:50 ` Peter Humphrey
2007-06-22 18:18 ` Hemmann, Volker Armin
0 siblings, 1 reply; 14+ messages in thread
From: Peter Humphrey @ 2007-06-22 17:50 UTC
To: gentoo-amd64
On Friday 22 June 2007 16:01:56 Hemmann, Volker Armin wrote:
> On Friday, 22 June 2007, Peter Humphrey wrote:
> > I'm beginning to think there must be a problem with my motherboard. Can
> > anyone suggest something else for me to check?
>
> Yes - wait for the .22 kernel.
The vanilla sources I've tried are .22 and they behave the same.
> AFAIR there is a bugfix for scheduling on dual-core CPUs in the
> upcoming release.
I forgot to say that these are dual sockets, not cores.
> Aside from that, if everything else is spread more or less equally, I
> would suspect BOINC and not the kernel. But you can try different CPU
> schedulers (Con Kolivas' RSDL, or whatever it is called, or Ingo
> Molnar's CFS - not to be confused with the I/O schedulers) to see if
> there is a difference.
I'll look into that idea - thanks.
--
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
* Re: [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-06-22 15:02 ` [gentoo-amd64] " Duncan
@ 2007-06-22 18:10 ` Peter Humphrey
2007-06-22 23:47 ` Duncan
2007-06-27 1:14 ` Joshua Hoblitt
1 sibling, 1 reply; 14+ messages in thread
From: Peter Humphrey @ 2007-06-22 18:10 UTC
To: gentoo-amd64
On Friday 22 June 2007 16:02:59 Duncan wrote:
> Was it you who posted about this before, or someone else? If it wasn't
> you, take a look back thru the list a couple months, as it did come up
> previously. You may have someone to compare notes with. =8^)
Nope, 'twas I, and I've now tried lots more things, but with no success.
> Separate processes or separate threads? Two CPUs (um, two separate
> sockets) or two cores on the same CPU/socket?
Twin sockets, single-core Opteron 246s. And the BOINC scheduler sets off
separate processes; they have distinct PIDs, and on the old motherboard
they ran on distinct CPUs.
> There are [...] differences in architecture between AMD (with its onboard
> memory controller and closer cooperation, both between cores and between
> CPUs on separate sockets, due to the direct Hypertransport links) and
> Intel (with its off-chip controller and looser inter-core, inter-chip,
> and inter-socket cooperation). There are also differences in the way you
> can configure both the memory (thru the BIOS) and the kernel, for
> separate NUMA access or unified-view memory. If these settings don't
> match your actual physical layout, efficiency will be less than peak:
> either there won't be enough resistance to a relatively high-cost switch
> between CPUs/cores and memory, so tasks will switch frequently with
> little reason, incurring expensive delays each time, or there's too much
> resistance and too much favor placed on what the kernel thinks is local
> vs. remote memory when it's all the same, and there is in fact very
> little cost to switching cores/CPUs.
This board has eight DIMM sockets, four arranged next to each CPU socket and
associated with it electrically and logically. I have four 1GB DIMMs, each
in (I hope) the right pair of sockets in each bank of four. In other words,
each CPU has 2GB of local RAM. I suppose I could buy as much RAM again and
fill up all the sockets :-)
> If it's AMD with its onboard memory controllers, two sockets means two
> controllers, and you'll also want to consider NUMA, tho you can disable it
> and interleave your memory if you wish, for a unified memory view and
> higher bandwidth, but at the tradeoff of higher latency and less efficient
> memory access when separate tasks (each running on a CPU) both want to use
> memory at the same time.
NUMA is switched on in BIOS and kernel config. I still find some of the
BIOS settings mysterious though, so perhaps I don't have it set up right. I
have tried the two sets of defaults, failsafe and optimised, but with no
effect on this problem.
> In particular, you'll want to pay attention to the following kernel config
> settings under Processor type and features:
>
> 1) Symmetric multiprocessing support (CONFIG_SMP). You probably have
> this set right or you'd not be using multiple CPUs/cores.
Yep.
> 2) Under SMP, /possibly/ SMT (CONFIG_SCHED_SMT), tho for Intel only, and
> on the older Hyperthreading Netburst arch models.
Nope.
> 3) Still under SMP, Multi-core scheduler support (CONFIG_SCHED_MC), if
> you have true dual cores. Again, note that the first "dual core" Intel
> units were simply two separate CPUs in the same package, so you probably
> do NOT want this for them.
Nope.
> 4) Non Uniform Memory Access (NUMA) Support (CONFIG_NUMA) [...] You
> probably DO want this on AMD multi-socket Opteron systems, BUT note that
> there may be BIOS settings for this as well. It won't work so efficiently
> if the BIOS setting doesn't agree with the kernel setting.
Yep.
> 5) If you have NUMA support enabled, you'll also want either Old style
> AMD Opteron NUMA detection (CONFIG_K8_NUMA) or (preferred) ACPI NUMA
> detection (CONFIG_X86_64_ACPI_NUMA).
Tried both together and each separately. No difference.
> 6) Make sure you do *NOT* have NUMA emulation (CONFIG_NUMA_EMU) enabled.
No point in emulating something that's present. :-)
> What I'm wondering, of course, is whether you have NUMA turned on when
> you shouldn't, or don't have core scheduling turned on when you should,
> thus artificially increasing the resistance to switching cores/cpus and
> causing the stickiness.
I don't think so.
> If [...] you were correct when you said BOINC starts two
> separate /processes/ (not threads),
I'm sure I was correct.
> or if BOINC happens to use the older/heavier Linux threads model (which
> again will cause the threads to show up as separate processes),
I can't be quite certain this isn't happening, but I'm nearly so.
> There are two scheduling utility packages that include utilities to tie
> processes to one or more specific processors.
>
> sys-process/schedutils is what I have installed. It's a collection of
> separate utilities, including taskset, by which I can tell the kernel
> which CPUs I want specific processes to run on.
> If you prefer a single do-it-all scheduler-tool, perhaps easier to learn
> if you plan to fiddle with more than simply which CPU a process runs on,
> and want to learn it all at once, sys-process/schedtool may be more your
> style.
I'll look into those - thanks.
--
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
* Re: [gentoo-amd64] Identifying CPUs in the kernel
2007-06-22 17:50 ` Peter Humphrey
@ 2007-06-22 18:18 ` Hemmann, Volker Armin
0 siblings, 0 replies; 14+ messages in thread
From: Hemmann, Volker Armin @ 2007-06-22 18:18 UTC
To: gentoo-amd64
On Friday, 22 June 2007, Peter Humphrey wrote:
> On Friday 22 June 2007 16:01:56 Hemmann, Volker Armin wrote:
> > On Friday, 22 June 2007, Peter Humphrey wrote:
> > > I'm beginning to think there must be a problem with my motherboard. Can
> > > anyone suggest something else for me to check?
> >
> > Yes - wait for the .22 kernel.
>
> The vanilla sources I've tried are .22 and they behave the same.
.22 is not released yet - and the RCs seem to be badly broken in the
filesystem area. There are reported problems on systems using XFS, ext3
and ReiserFS. I would stay away from the RCs until 2.6.22 is released.
> > AFAIR there is a bugfix for scheduling on dual-core CPUs in the
> > upcoming release.
>
> I forgot to say that these are dual sockets, not cores.
>
> > Aside from that, if everything else is spread more or less equally, I
> > would suspect BOINC and not the kernel. But you can try different CPU
> > schedulers (Con Kolivas' RSDL, or whatever it is called, or Ingo
> > Molnar's CFS - not to be confused with the I/O schedulers) to see if
> > there is a difference.
>
> I'll look into that idea - thanks.
http://people.redhat.com/mingo/cfs-scheduler/
for Con's patch use google ;)
* [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-06-22 18:10 ` Peter Humphrey
@ 2007-06-22 23:47 ` Duncan
2007-06-23 8:16 ` Peter Humphrey
0 siblings, 1 reply; 14+ messages in thread
From: Duncan @ 2007-06-22 23:47 UTC
To: gentoo-amd64
Peter Humphrey <prh@gotadsl.co.uk> posted
200706221910.44194.prh@gotadsl.co.uk, excerpted below, on Fri, 22 Jun
2007 19:10:44 +0100:
>> What I'm wondering, of course, is whether you have NUMA turned on when
>> you shouldn't, or don't have core scheduling turned on when you should,
>> thus artificially increasing the resistance to switching cores/cpus and
>> causing the stickiness.
>
> I don't think so.
Yeah, now that you've clarified that it's sockets and confirmed settings,
you seem to have it right.
On the BIOS settings, some of them will affect whether the board can use
all four gigs of memory as well, by controlling how it arranges the
address space and whether there's a hole left between 3.5 and 4 gig for
32-bit PCI hardware addressing or not. I've a similar arrangement here
on a Tyan s2885, only with two-gig sticks, so 8 gigs of memory. If you
are seeing your full 4 gigs of memory, tho, you've got that set right,
both in the kernel and in the BIOS.
The other BIOS settings of interest here are the access bitness and
interleaving. If it's like mine, you'll be able to set 32-, 64-, or 128-
bit interleaving. You'll want 64-bit, interleaving the sticks within a
node for best bandwidth there, but not across nodes, so they can be used
NUMA-style. In order to actually get the 64-bit interleaved access,
you'll need the sticks in paired slots on the node, however (1&2 or 3&4,
not 2&3 or separated). But it sounds like you have that as well.
Finally, there's the question of how the rest of the system connects to
the sockets. Here, everything except memory connects to the first
socket (CPU0), so the system can run in single-socket mode. However,
that means anything doing heavy I/O or the like, including 3D video
access, runs most efficiently on CPU0. In particular, using taskset
(mentioned in what I snipped), I've noticed that even in 2D mode but with
Composite on, X takes several percentage points more CPU when it's
scheduled on CPU1 than it does when it's allowed to run on CPU0. CPU1
works best with CPU-bound processes, or with hard-drive and other
comparatively slow I/O-bound processes - the former because it doesn't
matter which CPU they run on, the latter because the I/O is slow enough
that it's the bottleneck in any case. If your board is laid out
similarly, it's worth keeping that in mind when you are playing around
with taskset or the like.
If, as you say, BOINC is running separate processes, then scheduling it
with taskset should be possible and should do what you need. The only
caveat would be if the processes terminate and restart. You may need to
hack up a script to run from cron, checking every minute or ten or
whatever, depending on how long the BOINC tasks last, to keep them
scheduled on separate CPUs. I have a particular game (the original
Master of Orion, the only non-source-based software I still run) that I
run in DOSBOX emulation. Mainly, I use taskset to put DOSBOX on CPU1,
while X and anything else I'm running that uses significant CPU gets put
on CPU0. That works VERY well, and has allowed me to increase the
emulation speed dramatically over what was possible before, when X and
DOSBOX may have been running on the same CPU. That's the big thing I use
taskset for, and it works quite well for it. =8^)
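A minimal, untested sketch of such a cron script (it assumes two CPUs,
and that the science apps show up named setiathome-something, as the SETI
ones do - adjust the pgrep pattern and user to taste):

  #!/bin/sh
  # Re-pin BOINC science apps onto alternating CPUs.
  # Run from cron, e.g.: */10 * * * * /usr/local/sbin/pin-boinc.sh
  cpu=0
  for pid in $(pgrep -u prh setiathome); do
      taskset -p $(( 1 << cpu )) "$pid" >/dev/null
      cpu=$(( (cpu + 1) % 2 ))
  done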
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-06-22 23:47 ` Duncan
@ 2007-06-23 8:16 ` Peter Humphrey
2007-06-23 9:52 ` Duncan
0 siblings, 1 reply; 14+ messages in thread
From: Peter Humphrey @ 2007-06-23 8:16 UTC
To: gentoo-amd64
On Saturday 23 June 2007 00:47:27 Duncan wrote:
> Peter Humphrey <prh@gotadsl.co.uk> posted
> 200706221910.44194.prh@gotadsl.co.uk, excerpted below, on Fri, 22 Jun
> 2007 19:10:44 +0100:
> >> What I'm wondering, of course, is whether you have NUMA turned on when
> >> you shouldn't, or don't have core scheduling turned on when you
> >> should, thus artificially increasing the resistance to switching
> >> cores/cpus and causing the stickiness.
> >
> > I don't think so.
>
> Yeah, now that you've clarified that it's sockets and confirmed settings,
> you seem to have it right.
Here's an example of silly output from top. In this case I did this:
# schedtool -a 0x1 5280
to get 5280 onto CPU0, then when I didn't get any better loadings I restored
the affinity to its original value:
# schedtool -a 0x3 5280
Here's what top showed then. Look at the /nice/ values on lines 3 and 4
(the Cpu0 and Cpu1 lines), and compare those with the %CPU and Processor
fields of processes 5279 and 5280. This has me deeply puzzled:
top - 09:04:59 up 23 min,  5 users,  load average: 3.60, 4.79, 3.91
Tasks: 124 total,   2 running, 122 sleeping,   0 stopped,   0 zombie
Cpu0 :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1 :  0.0%us,  0.3%sy, 99.7%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4088968k total,  1822644k used,  2266324k free,   218296k buffers
Swap:  4176848k total,        0k used,  4176848k free,   735708k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  P COMMAND
 5279 prh       34  19 60256  38m 3600 S   50  1.0  6:53.97 1 setiathome-5.12
 5280 prh       34  19 60252  38m 3612 S   50  1.0  6:54.08 0 setiathome-5.12
 3692 root      15   0  144m  63m 7564 S    0  1.6  0:36.92 0 X
 5272 prh       15   0  4464 2636 1692 S    0  0.1  0:00.70 1 boinc
 5286 prh       15   0 93016  21m  14m S    0  0.5  0:00.66 0 konsole
 5322 prh       15   0  145m  13m  10m S    0  0.3  0:03.01 0 gkrellm2
10357 root      15   0 10732 1340  964 R    0  0.0  0:00.01 1 top
[snip system processes]
I don't think this is a scheduling problem; it goes deeper, so that the
kernel doesn't have a consistent picture of which processor is which.
--
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
* [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-06-23 8:16 ` Peter Humphrey
@ 2007-06-23 9:52 ` Duncan
2007-06-24 10:14 ` Peter Humphrey
2007-07-13 14:01 ` Peter Humphrey
0 siblings, 2 replies; 14+ messages in thread
From: Duncan @ 2007-06-23 9:52 UTC
To: gentoo-amd64
Peter Humphrey <prh@gotadsl.co.uk> posted
200706230916.07711.prh@gotadsl.co.uk, excerpted below, on Sat, 23 Jun
2007 09:16:07 +0100:
> Here's what top showed then. Look at the /nice/ values on lines 3 and 4
> (the Cpu0 and Cpu1 lines), and compare those with the %CPU and Processor
> fields of processes 5279 and 5280. This has me deeply puzzled:
Removed a bit of extraneous information. =8^)
> top - 09:04:59 up 23 min, 5 users, load average: 3.60, 4.79, 3.91
> Tasks: 124 total, 2 running, 122 sleeping, 0 stopped, 0 zombie
> Cpu0: 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, [zeroes]
> Cpu1: 0.0%us, 0.3%sy, 99.7%ni, 0.0%id, [zeroes]
> PID USER PR NI S %CPU %MEM TIME+ P COMMAND
> 5279 prh 34 19 S 50 1.0 6:53.97 1 setiathome-5.12
> 5280 prh 34 19 S 50 1.0 6:54.08 0 setiathome-5.12
> I don't think this is a scheduling problem; it goes deeper, so that the
> kernel doesn't have a consistent picture of which processor is which.
Critical question here: is that in SMP Irix or SMP Solaris mode? (See
the top manpage if you don't know what I mean.) Asked another way, is
that displaying percent of total CPU time (both CPUs), or percent of
total divided by number of CPUs (i.e. percent of one CPU)?
If it's Irix mode (percent total CPU time), then it's reporting full
usage of both CPUs, one on each. The CPU0 line would then be the one
screwed up, since it's reporting idle when it clearly has to be in use.
If it's Solaris mode (percent of a single CPU's time, so total of all
percentages should be 200% if you have two CPUs), then the CPUs
lines would seem to be correct, both processes would appear to be
running on CPU1, maxing it out, and the P column of the 5280 line
would have to be screwed up. (That's assuming you let the figures
stabilize after the last schedtool call you made.)
In either case, I'm not sure where your bug is, but you are correct:
the problem appears to be way deeper than scheduling. I'd guess it's
ultimately a kernel bug, possibly due to a hardware bug, possibly not,
but you might wish to file it against top initially, just to see if
they've seen anything similar and can tell you what's going on. Unless
you want to double-check patching status yourself, you might as well
file the bug with Gentoo first, in case it's a Gentoo bug. They'll
probably end up closing it "upstream", but at least then, when you file
it upstream, you can say you've cleared it with Gentoo first.
As for top, note that there's a trick you can use with it. You'll
likely want to trim the memory columns etc. as I did for your bug
report, but you may not want to mess up your regular config to do
so. Not a problem! =8^) Create a symlink to top called something
else (say topbug). Then run it using the symlink, and you can change
and save your settings, and it'll save them in a different rc file
(topbugrc in my example). That way, you can run it with the bug-report
settings when you want to, without messing up your regular
config.
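For instance, something along these lines (a sketch, assuming ~/bin is
in your PATH):

  $ ln -s /usr/bin/top ~/bin/topbug
  $ topbug    # trim the fields, then press W to save; the settings
              # land in ~/.topbugrc, leaving your normal ~/.toprc alone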
Of course, don't forget to mention in your bug report whether you were
in Solaris or Irix SMP mode, because as I explained, it /does/ make a
difference.
Let me know how this goes, post the bug number when you file it or
whatever, as I'd like to follow it too. You definitely have a
strange one here, and I'd /love/ to see what the real experts have
to say about it! You are absolutely correct, it doesn't seem to
make any sense at all!
Good luck. That's one /strange/ problem you have going there!
No /wonder/ you were expressing frustration earlier!
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-06-23 9:52 ` Duncan
@ 2007-06-24 10:14 ` Peter Humphrey
2007-07-13 14:01 ` Peter Humphrey
1 sibling, 0 replies; 14+ messages in thread
From: Peter Humphrey @ 2007-06-24 10:14 UTC
To: gentoo-amd64
On Saturday 23 June 2007 10:52:33 Duncan wrote:
> Peter Humphrey <prh@gotadsl.co.uk> posted
> > top - 09:04:59 up 23 min, 5 users, load average: 3.60, 4.79, 3.91
> > Tasks: 124 total, 2 running, 122 sleeping, 0 stopped, 0 zombie
> >
> > Cpu0: 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, [zeroes]
> > Cpu1: 0.0%us, 0.3%sy, 99.7%ni, 0.0%id, [zeroes]
> >
> > PID USER PR NI S %CPU %MEM TIME+ P COMMAND
> > 5279 prh 34 19 S 50 1.0 6:53.97 1 setiathome-5.12
> > 5280 prh 34 19 S 50 1.0 6:54.08 0 setiathome-5.12
> >
> > I don't think this is a scheduling problem; it goes deeper, so that the
> > kernel doesn't have a consistent picture of which processor is which.
>
> Critical question here: is that in SMP Irix or SMP Solaris mode? (See
> the top manpage if you don't know what I mean.) Asked another way, is
> that displaying percent of total CPU time (both CPUs), or percent of
> total divided by number of CPUs (i.e. percent of one CPU)?
That's another oddity. I press <I> (capital letter i) and /top/ says "Irix
mode off" and shows half the previous percentage CPU in the process lines:
25 in the example above. I then press <I> again and it says "Irix mode on"
and shows the 50s again. Is this backwards, or is my utter confusion
showing? :-(
I want it to show:
- for each CPU, the percent to which it is loaded; and,
- for each process, how much of a CPU's time it is consuming.
The presence of two CPUs requires two CPU lines and allows for two lots of
processes. That seems logical to me. Is it Irix mode or Solaris?
This morning /top/ is showing this:
---
top - 10:51:55 up 2:22, 5 users, load average: 2.43, 2.34, 2.60
Tasks: 121 total, 4 running, 117 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, [zeroes]
Cpu1 : 0.0%us, 0.3%sy, 99.7%ni, 0.0%id, [zeroes]
PID USER PR NI S %CPU %MEM TIME+ P COMMAND
5270 prh 34 19 S 50 0.9 67:55.08 1 setiathome-5.12
5271 prh 34 19 S 50 1.1 67:57.80 1 einstein_S5R2_4
---
So CPU1 is fully loaded and CPU0 is idling. Gkrellm shows the same. The box
has been running for 2 hours from cold and I haven't tampered with
anything; BOINC starts from /etc/init.d/local. When /top/ is behaving this
way my problem seems to be one of scheduling, but I'm pretty sure it isn't.
> I'm not sure where your bug is, but [...] the problem appears to be way
> deeper than scheduling. I'd guess it's ultimately a kernel bug, possibly
> due to a hardware bug
That's my thought too.
> [...] you might wish to file it against top initially, just to see if
> they've seen anything similar and can tell you what's going on. Unless
> you want to double-check patching status yourself, you might as well
> file the bug with Gentoo first, in case it's a Gentoo bug. They'll
> probably end up closing it "upstream", but at least then, when you file
> it upstream, you can say you've cleared it with Gentoo first.
I'll do that, but I'll wait a day or two to see what else comes up here.
> You definitely have a strange one here, and I'd /love/ to see what the
> real experts have to say about it!
Mm, me too.
--
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
* Re: [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-06-22 15:02 ` [gentoo-amd64] " Duncan
2007-06-22 18:10 ` Peter Humphrey
@ 2007-06-27 1:14 ` Joshua Hoblitt
1 sibling, 0 replies; 14+ messages in thread
From: Joshua Hoblitt @ 2007-06-27 1:14 UTC
To: gentoo-amd64
On Fri, Jun 22, 2007 at 03:02:59PM +0000, Duncan wrote:
> There are two scheduling utility packages that include utilities to tie
> processes to one or more specific processors.
>
> sys-process/schedutils is what I have installed. It's a collection of
> separate utilities, including taskset, by which I can tell the kernel
> which CPUs I want specific processes to run on. This worked well for me
> since I was more interested in taskset than the other included utilities,
> and only had to learn the single simple command. It does what I need it
> to do, and does it well. =8^)
schedutils was merged into util-linux - see http://rlove.org/.
ionice is part of util-linux now as well...
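For instance (illustrative - the version number will vary), gentoolkit's
equery should show taskset belonging to util-linux these days:

  $ equery belongs $(which taskset)
  [ Searching for file(s) /usr/bin/taskset in *... ]
  sys-apps/util-linux-2.13 (/usr/bin/taskset)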
-J
* Re: [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-06-23 9:52 ` Duncan
2007-06-24 10:14 ` Peter Humphrey
@ 2007-07-13 14:01 ` Peter Humphrey
2007-07-13 14:41 ` Peter Humphrey
2007-07-13 14:55 ` Duncan
1 sibling, 2 replies; 14+ messages in thread
From: Peter Humphrey @ 2007-07-13 14:01 UTC (permalink / raw
To: gentoo-amd64
On Saturday 23 June 2007 10:52, Duncan wrote:
> Let me know how this goes [...] as I'd like to follow it too.
Well, after quite a bit more work, I'm still none the wiser. I've built a
new installation from scratch without the ~amd64 key word, and I've tried
many combinations of kernel parameters. Once or twice I thought I'd found
it: for instance, it seemed that removing I2C completely caused the faulty
behaviour to appear, but then putting it back in didn't correct it.
The new system is as independent of the original as possible. The old one
was on /dev/hd[a,b] and the new is on /dev/sda, which is a SATA disk. I've
changed the BIOS to boot from SATA first, though I couldn't hide the old
disks completely - the IDE optical disks /dev/hd[c,d] were then hidden as
well.
I started with a genkernel kernel, remembering to enable SATA in it
first ;-), then switched to compiling manually, progressively stripping
out all the extraneous modules and built-in features until I had a
reasonably well-tuned kernel. It works just fine.
I also reverted the BIOS to an optimised set of defaults and changed one
thing at a time to reach (what I think is) the optimum for me. The system
is working as it should. Soon I'll erase the old partitions - when I'm sure
I don't need anything else off them.
So I can't put a case together to raise a bug report, and I'll have to
accept just-one-of-those-things.
--
Rgds
Peter.
Linux Counter 5290, Aug 93
* Re: [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-07-13 14:01 ` Peter Humphrey
@ 2007-07-13 14:41 ` Peter Humphrey
2007-07-13 14:55 ` Duncan
1 sibling, 0 replies; 14+ messages in thread
From: Peter Humphrey @ 2007-07-13 14:41 UTC
To: gentoo-amd64
On Friday 13 July 2007 15:01, I wrote:
[various things]
I meant also to say that I started out on this new build using the same
CD as I'd used before: 2006.1 minimal. When I'd installed a bare system
it booted OK and seemed healthy enough, but then if I ran "emerge -e
system", the resulting system would not load the Ethernet module
properly, so I was incommunicado. I did this several times (until I
tired of banging my head against that wall), recompiling the kernel
before the re-emerge, after it, both, or neither, before giving up and
downloading a 2007.0 ISO. That one worked fine.
This year is full of puzzles. I don't like mysteries.
--
Rgds
Peter.
Linux Counter 5290, Aug 93
* [gentoo-amd64] Re: Identifying CPUs in the kernel
2007-07-13 14:01 ` Peter Humphrey
2007-07-13 14:41 ` Peter Humphrey
@ 2007-07-13 14:55 ` Duncan
1 sibling, 0 replies; 14+ messages in thread
From: Duncan @ 2007-07-13 14:55 UTC
To: gentoo-amd64
Peter Humphrey <prh@gotadsl.co.uk> posted
200707131501.46783.prh@gotadsl.co.uk, excerpted below, on Fri, 13 Jul
2007 15:01:46 +0100:
> So I can't put a case together to raise a bug report, and I'll have to
> accept just-one-of-those-things.
Yeah, that happens. It can definitely be frustrating, especially since
you don't know what caused it, so you haven't the foggiest whether
it'll be back, or when. =8^( However, I've come to the conclusion that
sometimes you just don't look that horse in the mouth, and chances are
it'll be your trusty steed for some time. =8^)
BTW, I had a list connectivity glitch and didn't get this reply tacked
onto your last post, but re: Irix/Solaris mode, it seems the manpage says
the reverse of what top actually does. Having never used either of the
named platforms, I can't say whether the manpage or the app is correct in
what it claims for the modes, but yes, Irix mode ON seems to result in
what the manpage calls Solaris mode, while Irix mode OFF results in what
the manpage calls Irix mode. So at least that part isn't unique to your
(old) system, as that's the way it works here as well.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman