* [gentoo-amd64] oom killer problems
@ 2005-09-28 20:35 Hemmann, Volker Armin
  2005-09-29  7:14 ` [gentoo-amd64] " Duncan
  2005-10-01 20:47 ` [gentoo-amd64] oom killer problems - solved Hemmann, Volker Armin
  0 siblings, 2 replies; 7+ messages in thread

From: Hemmann, Volker Armin @ 2005-09-28 20:35 UTC (permalink / raw)
To: gentoo-amd64

Hi,

when I try to emerge kdepim-3.4.2 with the kdeenablefinal use-flag, I get a
lot of oom-kills. I got them with 512 MB, so I upgraded to 1 GB and still
have them. What puzzles me is that I have a lot of swap free when it
happens. Could someone please tell me why the oom-killer becomes active
when there is still a lot of free swap? I am just a user, so using easy
words would be much appreciated ;)

[ 5364.515694] oom-killer: gfp_mask=0xd0, order=0
[ 5364.515696] Mem-info:
[ 5364.515698] DMA per-cpu:
[ 5364.515700] cpu 0 hot: low 2, high 6, batch 1 used:5
[ 5364.515703] cpu 0 cold: low 0, high 2, batch 1 used:1
[ 5364.515705] Normal per-cpu:
[ 5364.515707] cpu 0 hot: low 62, high 186, batch 31 used:99
[ 5364.515709] cpu 0 cold: low 0, high 62, batch 31 used:54
[ 5364.515711] HighMem per-cpu: empty
[ 5364.515714] Free pages: 8068kB (0kB HighMem)
[ 5364.515718] Active:242539 inactive:312 dirty:0 writeback:0 unstable:0 free:2017 slab:5622 mapped:241995 pagetables:1927
[ 5364.515723] DMA free:4088kB min:60kB low:72kB high:88kB active:8392kB inactive:0kB present:15996kB pages_scanned:8424 all_unreclaimable? yes
[ 5364.515726] lowmem_reserve[]: 0 1007 1007
[ 5364.515731] Normal free:3980kB min:4028kB low:5032kB high:6040kB active:961764kB inactive:1248kB present:1031872kB pages_scanned:1485831 all_unreclaimable? yes
[ 5364.515734] lowmem_reserve[]: 0 0 0
[ 5364.515738] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable?
no
[ 5364.515740] lowmem_reserve[]: 0 0 0
[ 5364.515742] DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4088kB
[ 5364.515749] Normal: 1*4kB 3*8kB 1*16kB 1*32kB 1*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3980kB
[ 5364.515756] HighMem: empty
[ 5364.515759] Swap cache: add 35992, delete 28384, find 8627/10318, race 0+0
[ 5364.515761] Free swap = 918004kB
[ 5364.515763] Total swap = 996020kB
[ 5364.515765] Free swap: 918004kB
[ 5364.521082] 262064 pages of RAM
[ 5364.521084] 5479 reserved pages
[ 5364.521086] 111904 pages shared
[ 5364.521088] 7608 pages swap cached
[ 5364.521146] Out of Memory: Killed process 7513 (klauncher).
[ 5373.371852] oom-killer: gfp_mask=0x201d2, order=0
[ 5373.371857] Mem-info:
[ 5373.371859] DMA per-cpu:
[ 5373.371862] cpu 0 hot: low 2, high 6, batch 1 used:5
[ 5373.371864] cpu 0 cold: low 0, high 2, batch 1 used:1
[ 5373.371866] Normal per-cpu:
[ 5373.371868] cpu 0 hot: low 62, high 186, batch 31 used:75
[ 5373.371871] cpu 0 cold: low 0, high 62, batch 31 used:34
[ 5373.371873] HighMem per-cpu: empty
[ 5373.371876] Free pages: 9024kB (0kB HighMem)
[ 5373.371880] Active:159724 inactive:82930 dirty:0 writeback:381 unstable:0 free:2256 slab:5596 mapped:240928 pagetables:1927
[ 5373.371884] DMA free:4088kB min:60kB low:72kB high:88kB active:8392kB inactive:0kB present:15996kB pages_scanned:8424 all_unreclaimable? yes
[ 5373.371887] lowmem_reserve[]: 0 1007 1007
[ 5373.371892] Normal free:4936kB min:4028kB low:5032kB high:6040kB active:630504kB inactive:331720kB present:1031872kB pages_scanned:376761 all_unreclaimable? no
[ 5373.371896] lowmem_reserve[]: 0 0 0
[ 5373.371899] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable?
no
[ 5373.371902] lowmem_reserve[]: 0 0 0
[ 5373.371904] DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4088kB
[ 5373.371911] Normal: 170*4kB 26*8kB 5*16kB 2*32kB 1*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4936kB
[ 5373.371918] HighMem: empty
[ 5373.371921] Swap cache: add 36927, delete 28640, find 8691/10408, race 0+0
[ 5373.371923] Free swap = 914904kB
[ 5373.371925] Total swap = 996020kB
[ 5373.371926] Free swap: 914904kB
[ 5373.377339] 262064 pages of RAM
[ 5373.377342] 5479 reserved pages
[ 5373.377344] 112097 pages shared
[ 5373.377346] 8287 pages swap cached
[ 5373.377403] Out of Memory: Killed process 7533 (kwin).

The kernel is 2.6.13-r2. I have 1 GB of RAM and approximately 1 GB of swap.

Glück Auf
Volker

PS: I emerged kdepim without kdeenablefinal, so there is no big pressure;
I am just curious.

--
gentoo-amd64@gentoo.org mailing list

^ permalink raw reply [flat|nested] 7+ messages in thread
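A note on reading these Mem-info dumps: the per-order lines such as "Normal: 1*4kB 3*8kB ... = 3980kB" are the kernel's buddy-allocator free lists, i.e. how many free blocks of each size the zone has. The stated totals can be cross-checked arithmetically; the following sketch is illustrative only (not part of the original thread), using the figures from the first dump above:

```python
import re

def buddy_total_kb(line: str) -> int:
    """Sum the per-order free-list counts in a Mem-info zone line.

    Each 'N*SkB' term means N free blocks of S kB each, so the zone's
    total free memory is the sum of N*S over all block sizes.
    """
    return sum(int(n) * int(s) for n, s in re.findall(r"(\d+)\*(\d+)kB", line))

dma = "DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB"
normal = "Normal: 1*4kB 3*8kB 1*16kB 1*32kB 1*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB"

print(buddy_total_kb(dma))     # 4088, matching the "= 4088kB" in the log
print(buddy_total_kb(normal))  # 3980, matching the "= 3980kB" in the log
print(buddy_total_kb(dma) + buddy_total_kb(normal))  # 8068, the "Free pages: 8068kB" line
```

Note also that in the Normal zone nearly all of that free memory sits in a handful of large blocks, with almost nothing at the small orders, which is relevant to the fragmentation discussion later in the thread.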
* [gentoo-amd64] Re: oom killer problems
  2005-09-28 20:35 [gentoo-amd64] oom killer problems Hemmann, Volker Armin
@ 2005-09-29  7:14 ` Duncan
  2005-09-29 16:27   ` Hemmann, Volker Armin
  2005-10-01 20:47 ` [gentoo-amd64] oom killer problems - solved Hemmann, Volker Armin
  1 sibling, 1 reply; 7+ messages in thread

From: Duncan @ 2005-09-29 7:14 UTC (permalink / raw)
To: gentoo-amd64

Hemmann, Volker Armin posted
<200509282235.32195.volker.armin.hemmann@tu-clausthal.de>, excerpted below,
on Wed, 28 Sep 2005 22:35:32 +0200:

> Hi,
> when I try to emerge kdepim-3.4.2 with the kdeenablefinal use-flag I get
> a lot of oom-kills.
> I got them with 512mb, so I upgraded to 1gig and still have them. What
> puzzles me is, that I have a lot of swap free when it happens.. could
> someone please tell me, why the oom-killer becomes active, when there is
> still a lot of free swap?
> [snip]
>
> kernel is 2.6.13-r2
> I have 1gb of ram, and approximatly 1gb of swap.
>
> I emerged kdepim without kdeenablefinal, so there is no big pressure,
> I am just curious

There's something to say about the "lots of swap left" thing below.
However, that's theory, so I'll cover the practical stuff first and leave
that aspect for later.

kdeenablefinal requires HUGE amounts of memory, no doubt about it. I've
not had serious issues with my gig of memory (dual Opterons, as you seem
to have), using kdeenablefinal here, but I've been doing things rather
differently than you probably have, and any one of those differences may
be the reason I haven't had memory issues of the severity you have.

1. I have swap entirely disabled. Here was my reasoning (apart from the
issue at hand).
I was reading an explanation of some aspects of the kernel VMM (virtual
memory manager) on LWN (Linux Weekly News, lwn.net) when I suddenly
realized that I could probably do without all the complexity they were
describing by turning off swap, since I'd recently upgraded to a gig of
RAM. I reasoned that I normally ran a quarter to a third of that in
application memory, so even if I doubled normal use at times, I'd still
have a third of a gig of free memory available for cache.

Further, I reasoned that if something should use all that memory and STILL
run out, it was likely a runaway process gobbling all the memory
available, and I might as well have it activate the OOM killer at a gig,
without further bogging the system down, than at 2 gig (or whatever),
dragging the system through a swap storm so I couldn't do anything about
it anyway.

For the most part, I've been quite happy with my decision, altho now that
suspend is starting to look like it'll work for dual-CPU systems (suspend
to RAM sort of worked, for the first time here, early in the .13 rcs, but
it was reverted for the .13 release, as it needed more work), I may enable
swap again, if only to get suspend-to-disk functionality. Of course, I'm
not saying disabling swap is the right thing for you, but I've been happy
with it here.

Anyway: a gig of RAM, swap disabled, so the VMM complexity that's part of
managing swap is also disabled. It's possible that's a factor, tho I'm
guessing the stuff below is more likely.

2. Possibly the biggest factor is the KDE packages used. I'm using the
split ebuilds, NOT the monolithic category packages. It's possible that's
the difference. Further, I don't have all the split packages that compose
kdepim-meta merged.
I have kmail and knode merged, with dependencies of course, but I don't
have a handheld to worry about syncing to, so I skipped all those split
ebuilds that form part of kdepim-meta (and are part of the monolithic
ebuild), except where kmail/knode etc. had them as dependencies. Thus, no
kitchensync, korn, kandy, kdepim-kresources, etc.

There are therefore two possibilities here. One is that one of the
individual apps I skipped requires more memory. The other is that the
monolithic ebuild you used does several things at once (possibly due to
your jobs setting, see below) where the split ebuilds do them in series,
therefore limiting the maximum memory required at any given moment.

3. I'm NOT using unsermake. For some reason, it hasn't worked for me
since KDE 3.2 or so. I've tried different versions, but always had either
an error, or, despite my settings, the ebuild doesn't seem to register
unsermake and thus uses the normal make system. Unsermake is better at
parallelizing the various jobs, making more efficient use of multiple
CPUs, but also, given the memory required for enable-final, likely causes
higher memory stress than ordinary GNU make does. If you are using it and
it's otherwise working for you, that may be the difference.

The rest of the possibilities may or may not apply. You didn't include
the output of emerge info, so I can't compare the relevant info from your
system to mine. However, I suspect they /do/ apply, for reasons which
should become clear as I present them below.

4. It appears (from the snipped stuff) you are running dual CPU (or a
single dual-core CPU). How many jobs do you have portage configured for?
With my dual-CPU system, I originally had four set, but after seeing what
compiling KDE with kdeenablefinal did to my memory resources, even a gig,
I decided I had better reduce that to three! If you have four or more
parallel jobs set, THAT could very possibly be your problem, right there.
You can probably do four or more jobs OR kdeenablefinal, but not BOTH --
at least not BOTH while running X and KDE at the same time!

I should mention that I sometimes run multiple emerges (each with three
jobs) in parallel. I *DID* run into OOM issues when trying to do that
with kmail and another large KDE package. Kmail is of course part of
kdepim, and my experience DOES confirm that it's one of the largest in
memory requirements with kdeenablefinal set. I could emerge small things
in parallel with it, stuff like kworldwatch, say, but nothing major, like
konqueror. Thus, I can say almost certainly that six jobs will trigger
the OOM killer when some of them are kmail, and could speculate that five
jobs would do it at some point in the kmail compilation. Four jobs may or
may not work, but three did, for me, under the conditions explained in the
other six points, of course. (Note that the unsermake thing could
compound the issue here, because as I said, it's better at finding things
to run in parallel than the normal make system is.)

5. I'm now running gcc-4.0.1, and have been compiling KDE with
gcc-4.0.0-preX or later since kde-3.4.0. gcc-4.x is still package.mask-ed
on Gentoo, because some packages still don't compile with it. Of course,
that's easily worked around because Gentoo slots gcc, so I have the latest
gcc-3.4.x installed in addition to gcc-4.x, and can (and do) easily switch
between them using gcc-config. However, the fact that gcc-4 is still
masked on Gentoo means you probably aren't running it, while I am, and
that's what I compile KDE with. The 4.x version is different enough from
3.4.x that memory use can be expected to be rather different as well.
It's quite possible that the kdeenablefinal stuff requires even more
memory with gcc-3.x than it does with the 4.x I've been successfully
using.

6. It's also possible something else in the configuration affects
compile-time memory usage.
There are CFLAGS, of course, and I'm also running newer (and still masked,
AFAIK) versions of binutils and glibc, with patches specifically for
gcc-4.

7. I don't do my kernels thru Gentoo, preferring instead to use the
kernel straight off of kernel.org. You say kernel 2.6.13-r2, the r2
indicating a Gentoo revision, but you don't say /which/ Gentoo kernel you
are running. The VMM is complex enough, and has a wide enough variety of
patches circulating for it, that it's possible you hit a bug that isn't in
the mainline kernel.org kernel that I'm running. Or... it may be some
other factor in our differing kernel configs.

...

Now to the theory. Why would OOM trigger when you had all that free swap?
There are two possible explanations I am aware of, and maybe others that
I'm not.

1. "Memory allocation" is a verb as well as a noun.

We know that enablefinal uses lots of memory. The USE flag description
mentions that, and we've discovered it to be /very/ true. If you run
ksysguard on your panel as I do, and monitor memory using it as I do (or
run a VT with a top session running, if compiling at the text console),
you are also aware that memory use during compile sessions, particularly
KDE compile sessions with enablefinal set, varies VERY drastically! From
my observations, each "job" will at times eat more and more memory, until,
with kmail in particular, multiple jobs are taking well over 200 MB of
memory apiece! (See why I mentioned parallel jobs above? At 200, possibly
300+ MB apiece, multiple parallel jobs eat up the memory VERY fast!)
After grabbing more and more memory for awhile, a job will suddenly
complete and release it ALL at once. The memory usage graph will suddenly
drop multiple hundreds of megabytes -- for ONE job!

Well, during the memory-usage-increase phase, each job will allocate more
and more memory, a chunk at a time.
It's possible (tho not likely, from my observations of this particular
usage pattern) that an app could want X MB of memory all at once in order
to complete its task. Until it gets that memory, it can't go any further,
and the task it is trying to do is half complete, so it can't release any
memory either without losing what it has already done. If the allocation
request is big enough (or you have several of them in parallel at the same
time that together are big enough), it can cause the OOM to trigger even
with what looks like quite a bit of free memory left, because all
available cache and other memory that can be freed has already been freed,
and no app can continue to the point of being able to release memory
without grabbing some memory first. If one of them wants a LOT of memory,
and the OOM killer isn't killing it off first (there are various OOM
killer algorithms out there, some using different factors for picking the
app to die than others), stuff will start dying to allow the app wanting
all that memory to get it.

Of course, it could also very plainly be a screwed-up VMM or OOM killer.
These things aren't exactly simple to get right... and if gcc took an
unexpected optimization that has side effects...

2. There is memory and there is "memory", and then there is 'memory' and
"'memory'" and '"memory"' as well. <g>

There is of course the obvious difference between real/physical and
swap/virtual memory, with real memory being far faster (while at the same
time being slower than L2 cache, which is slower than L1 cache, which is
slower than the registers, which can be accessed at full CPU speed, but
that's beside the point for this discussion).

That's only the tip of the iceberg, however. From the software's
perspective, that division mainly affects locked memory vs. swappable
memory.
The kernel is always locked memory -- it cannot be swapped, even drivers
that are never used, which is the reason it makes sense to keep your
kernel as small as possible, leaving more room in real memory for programs
to use. Depending on your kernel and its configuration, various forms of
RAMDISK, ramfs vs. tmpfs vs. ..., may be locked (or not). Likewise, some
kernel patches and configs make it easier or harder for applications to
lock memory as well. Maybe a complicating factor here is that you had a
lot of locked memory, and the compile process required more locked memory
than was left? I'm not sure how much locked memory a normal process on a
normal kernel can have, if any, but given both that and the fact that the
kernel you were running is unknown, it's a possibility.

Then there are the "memory zones". Fortunately, amd64 is less complicated
in this respect than x86. However, various memory zones do still exist,
and not only do some things require memory in a specific zone, but it can
be difficult to transfer in-use memory from one zone to another, even
where it COULD be placed in a different zone. Up until earlier this year,
it was often impossible to transfer memory between zones without using the
backing store (swap). That was the /only/ way possible! However, as I
said, amd64 is less complicated in this respect than x86, so memory zones
weren't likely the issue here -- unless something was going wrong, of
course.

Finally, there's the "contiguous memory" issue. Right after boot, your
system has lots of free memory, in large blobs of contiguous pages. It's
easy to get contiguous memory allocated in blocks of 256, 512, and 1024
pages at once. As uptime increases, however, memory gets fragmented thru
normal use. A system that has been up awhile will have far fewer
1024-page blocks immediately available for use, and fewer 512- and
256-page blocks as well.
Total memory available may be the same, but if it's all in 1- and 2-page
blocks, it'll take some serious time to move stuff around to allocate a
1024-page contiguous block -- if it's even possible to do at all. Given
the type of memory access patterns I've observed during KDE merges with
enablefinal on, while I'm not technically skilled enough to verify my
suspicions, of the listed possibilities which are those I know, I believe
this to be the most likely culprit, the reason the OOM killer was
activating even while swap (and possibly even main memory) was still free.

I'm sure there are other variations on the theme, however, other memory
type restrictions, and it may have been one of /those/ that just so
happened to come up short at the time you needed it. In any case, as
should be quite plain by now, a raw "available memory" number doesn't give
/anything/ /even/ /close/ to the entire picture, at the detail needed to
fully grok why the OOM killer was activating when overall memory wasn't
apparently in short supply at all.

I should also mention those numbers I snipped. I know enough to just
begin to make a bit of sense out of them, but not enough to /understand/
them, at least to the point of understanding what they say is wrong. You
can see the contiguous memory block figures for each of the DMA and Normal
memory zones. Pages are 4 kB, so the 1024-page blocks are 4 MB. I just
don't understand enough about the internals to grok either them or this
log snip, however. I know the general theories, and hopefully explained
them well enough, but don't know how they apply concretely. Perhaps
someone else does.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program,
he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

--
gentoo-amd64@gentoo.org mailing list
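The point about raw "available memory" numbers can be made concrete from the snipped dump itself. The kernel fails an allocation in a zone once that zone's free memory sits below its "min" watermark and reclaim has given up on the zone; the sketch below is deliberately simplified (the real check in mm/page_alloc.c also involves lowmem_reserve and per-order terms, omitted here) and uses the figures from the first dump in this thread:

```python
def zone_exhausted(free_kb: int, min_kb: int, all_unreclaimable: bool) -> bool:
    """Simplified model: a zone can no longer satisfy allocations once its
    free memory is below the 'min' watermark and page reclaim has given up
    on the zone (the log's 'all_unreclaimable? yes')."""
    return free_kb < min_kb and all_unreclaimable

# Figures from the first oom-killer dump in this thread:
dma = zone_exhausted(free_kb=4088, min_kb=60, all_unreclaimable=True)
normal = zone_exhausted(free_kb=3980, min_kb=4028, all_unreclaimable=True)

print(dma)     # False: DMA is still well above its tiny watermark
print(normal)  # True: Normal is below min with nothing left to reclaim,
               # so the OOM killer fires despite ~900 MB of free swap
```

In other words, "Free swap = 918004kB" is irrelevant once the zone the allocation needs is both below its watermark and unreclaimable.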
* Re: [gentoo-amd64] Re: oom killer problems
  2005-09-29  7:14 ` [gentoo-amd64] " Duncan
@ 2005-09-29 16:27 ` Hemmann, Volker Armin
  2005-09-30  0:40   ` John Myers
  0 siblings, 1 reply; 7+ messages in thread

From: Hemmann, Volker Armin @ 2005-09-29 16:27 UTC (permalink / raw)
To: gentoo-amd64

On Thursday 29 September 2005 09:14, Duncan wrote:

> Hemmann, Volker Armin posted
> <200509282235.32195.volker.armin.hemmann@tu-clausthal.de>, excerpted
>
> kdeenablefinal requires HUGE amounts of memory, no doubt about it. I've
> not had serious issues with my gig of memory (dual Opterons as you seem
> to have), using kdeenablefinal here, but I've been doing things rather
> different than you probably have, and any one of the things I've done
> different may be the reason I haven't had the memory issue to the
> severity you have.

yeah, but with my 32bit system, even 512 MB were enough for building
kdepim with kdeenablefinal.

> The rest of the possibilities may or may not apply. You didn't include
> the output of emerge info, so I can't compare the relevant info from
> your system to mine. However, I suspect they /do/ apply, for reasons
> which should be clear as I present them, below.
>
> 4. It appears (from the snipped stuff) you are running dual CPU (or a
> single dual-core CPU). How many jobs do you have portage configured
> for? With my dual-CPU system, I originally had four set, but after
> seeing what KDE compiling with kdeenablefinal did to my memory
> resources, even a gig, I decided I better reduce that to three! If you
> have four or more parallel jobs set, THAT could very possibly be your
> problem, right there. You can probably do four or more jobs OR
> kdeenablefinal, but not BOTH, at least not BOTH, while running X and
> KDE at the same time!

no, single cpu, single core.
Here is my emerge info:

Portage 2.0.52-r1 (default-linux/amd64/2005.1, gcc-3.4.4, glibc-2.3.5-r1, 2.6.13-gentoo-r2 x86_64)
=================================================================
System uname: 2.6.13-gentoo-r2 x86_64 AMD Athlon(tm) 64 Processor 3200+
Gentoo Base System version 1.12.0_pre8
ccache version 2.4 [disabled]
dev-lang/python:     2.3.5, 2.4.2
sys-apps/sandbox:    1.2.13
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.20
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64 ~amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -fweb -ftracer -fpeel-loops -msse3 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=k8 -O2 -fweb -ftracer -fpeel-loops -msse3 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp.tu-clausthal.de/pub/linux/gentoo/"
LC_ALL="de_DE@euro"
LINGUAS="de"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 S3TC X acpi alsa audiofile avi bash-completion berkdb bitmap-fonts bluetooth bzip2 cairo cdparanoia cdr cpudetection crypt curl dvd dvdr dvdread emboss emul-linux-x86 encode exif ffmpeg fftw foomaticdb fortran ftp gif gimp glitz glut glx gnokii gpm gstreamer gtk gtk2 icq id3 imagemagick imlib irmc jabber java javascrip jp2 jpeg jpeg2k kde kdeenablefinal kdepim lame lesstif libwww lm_sensors lzo lzw lzw-tiff mad matroska mjpeg mmap mng motif mp3 mpeg mpeg2 mplayer mysql ncurses nls no-old-linux nocd nosendmail nowin nptl nsplugin nvidia offensive ogg openal opengl oscar pam pdflib perl player png posix python
qt quicktime rar readline reiserfs scanner sdl sendfile sharedmem sms sndfile sockets spell ssl stencil-buffer subtitles svg sysfs tcpd tga theora tiff transcode truetype truetype-fonts type1 type1-fonts unicode usb userlocales v4l v4l2 vcd videos visualization vorbis wmf xanim xine xml xml2 xpm xrandr xsl xv xvid xvmc yv12 zlib zvbi linguas_de userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CTARGET, LANG, LDFLAGS, PORTDIR_OVERLAY

As you can see, MAKEOPTS is at -j2.

> (Note that the unsermake thing could compound the issue here, because
> as I said, it's better at finding things to run in parallel than the
> normal make system is.)
>
> 5. I'm now running gcc-4.0.1, and have been compiling kde with
> gcc-4.0.0-preX or later since kde-3.4.0. gcc-4.x is still
> package.mask-ed on Gentoo, because some packages still don't compile
> with it. Of course, that's easily worked around because Gentoo slots
> gcc, so I have the latest gcc-3.4.x installed, in addition to gcc-4.x,
> and can (and do) easily switch between them using gcc-config. However,
> the fact that gcc-4 is still masked for Gentoo, means you probably
> aren't running it, while I am, and that's what I compile kde with. The
> 4.x version is enough different from 3.4.x that memory use can be
> expected to be rather different as well. It's quite possible that the
> kdeenablefinal stuff requires even more memory with gcc-3.x than it
> does with the 4.x I've been successfully using.

hm, I read some stuff on anandtech that shows that apps compiled with
gcc4 are a LOT slower than apps compiled with 3.4 on the amd64 platform.
So I'll stay away from it until I see some numbers that convince me of
the opposite - and until I can be sure that almost everything builds with
it ;)

> 7. I don't do my kernels thru Gentoo, preferring instead to use the
> kernel straight off of kernel.org. You say kernel 2.6.13-r2, the r2
> indicating a Gentoo revision, but you don't say /which/ Gentoo kernel
> you are running.
> The VMM is complex enough and has a wide enough variety of patches
> circulating for it, that it's possible you hit a bug that wasn't in the
> mainline kernel.org kernel that I'm running. Or... it may be some other
> factor in our differing kernel configs.

yes I said, at the bottom of my mail: kernel is 2.6.13-r2

> ...
>
> Now to the theory. Why would OOM trigger when you had all that free
> swap? There are two possible explanations I am aware of and maybe
> others that I'm not.
>
> 1. "Memory allocation" is a verb as well as a noun.
>
> We know that enablefinal uses lots of memory. The USE flag description
> mentions that and we've discovered it to be /very/ true. If you run
> ksysguard on your panel as I do, and monitor memory using it as I do
> (or run a VT with a top session running if compiling at the text
> console), you are also aware that memory use during compile sessions,
> particularly KDE compile sessions with enablefinal set, varies VERY
> drastically! From my observations, each "job" will at times eat more
> and more memory, until with kmail in particular, multiple jobs are
> taking well over 200MB of memory apiece! (See why I mentioned parallel
> jobs above? At 200, possibly 300+ MB apiece, multiple parallel jobs
> eat up the memory VERY fast!) After grabbing more and more memory for
> awhile, a job will suddenly complete and release it ALL at once. The
> memory usage graph will suddenly drop multiple hundreds of megabytes --
> for ONE job!

I watched the memory consumption with gkrellm2. At first, there were
several hundred MB free, dropping fast to ~150 MB free, which then
dropped more slowly to 20-50 MB free. There it was 'locked' for some
time, until suddenly the oom-killer kicked in (I did not watch gkrellm
continuously; even with a 3200+, kdepim takes more time to build than I
can watch gkrellm without a break). But the behaviour was the same with
512 MB or 1 GB of RAM.
> Well, during the memory usage increase phase, each job will allocate
> more and more memory, a chunk at a time. It's possible (tho not likely
> from my observations of this particular usage pattern) that an app
> could want X MB of memory all at once, in order to complete the task.
> Until it gets that memory it can't go any further, the task it is
> trying to do is half complete so it can't release any memory either,
> without losing what it has already done. If the allocation request is
> big enough, (or you have several of them in parallel all at the same
> time that together are big enough), it can cause the OOM to trigger
> even with what looks like quite a bit of free memory left, because all
> available cache and other memory that can be freed has already been
> freed, and no app can continue to the point of being able to release
> memory, without grabbing some memory first. If one of them is wanting
> a LOT of memory, and the OOM killer isn't killing it off first (there
> are various OOM killer algorithms out there, some using different
> factors for picking the app to die than others), stuff will start
> dying to allow the app wanting all that memory to get it.
>
> Of course, it could also be very plainly a screwed up VMM or OOM
> killer, as well. These things aren't exactly simple to get right...
> and if gcc took an unexpected optimization that has side effects...
>
> 2. There is memory and there is "memory", and then there is 'memory'
> and "'memory'" and '"memory"' as well. <g>
>
> There is of course the obvious difference between real/physical and
> swap/virtual memory, with real memory being far faster (while at the
> same time being slower than L2 cache, which is slower than L1 cache,
> which is slower than the registers, which can be accessed at full CPU
> speed, but that's beside the point for this discussion).
>
> That's only the tip of the iceberg, however.
> From the software's perspective, that division mainly affects locked
> memory vs swappable memory. The kernel is always locked memory -- it
> cannot be swapped, even drivers that are never used, the reason it
> makes sense to keep your kernel as small as possible, leaving more room
> in real memory for programs to use. Depending on your kernel and its
> configuration, various forms of RAMDISK, ramfs vs tmpfs vs ... may be
> locked (or not). Likewise, some kernel patches and configs make it
> easier or harder for applications to lock memory as well. Maybe a
> complicating factor here is that you had a lot of locked memory and the
> compile process required more locked memory than was left? I'm not
> sure how much locked memory a normal process on a normal kernel can
> have, if any, but given both that and the fact that the kernel you were
> running is unknown, it's a possibility.

I don't use ramdisks, and the only tmpfs user is udev - with ~180 kB
used.

> Then there are the "memory zones". Fortunately, amd64 is less
> complicated in this respect than x86. However, various memory zones do
> still exist, and not only do some things require memory in a specific
> zone, but it can be difficult to transfer in-use memory from one zone
> to another, even where it COULD be placed in a different zone. Up
> until earlier this year, it was often impossible to transfer memory
> between zones without using the backing store (swap). That was the
> /only/ way possible! However, as I said, amd64 is less complicated in
> this respect than x86, so memory zones weren't likely the issue here --
> unless something was going wrong, of course.
>
> Finally, there's the "contiguous memory" issue. Right after boot, your
> system has lots of free memory, in large blobs of contiguous pages.
> It's easy to get contiguous memory allocated in blocks of 256, 512, and
> 1024 pages at once. As uptime increases, however, memory gets
> fragmented thru normal use.
> A system that has been up awhile will have far fewer 1024 page blocks
> immediately available for use, and fewer 512 and 256 page blocks as
> well. Total memory available may be the same, but if it's all in 1 and
> 2 page blocks, it'll take some serious time to move stuff around to
> allocate a 1024 page contiguous block -- if it's even possible to do at
> all. Given the type of memory access patterns I've observed during kde
> merges with enablefinal on, while I'm not technically skilled enough to
> verify my suspicions, of the listed possibilities which are those I
> know, I believe this to be the most likely culprit, the reason the OOM
> killer was activating even while swap (and possibly even main memory)
> was still free.
>
> I'm sure there are other variations on the theme, however, other memory
> type restrictions, and it may have been one of /those/ that it just so
> happened came up short at the time you needed it. In any case, as
> should be quite plain by now, a raw "available memory" number doesn't
> give /anything/ /even/ /close/ to the entire picture, at the detail
> needed to fully grok why the OOM killer was activating, when overall
> memory wasn't apparently in short supply at all.
>
> I should also mention those numbers I snipped. I know enough to just
> begin to make a bit of sense out of them, but not enough to
> /understand/ them, at least to the point of understanding what they are
> saying is wrong. You can see the contiguous memory block figures for
> each of the DMA and normal memory zones. 4kB pages, so the 1024 page
> blocks are 4MB. I just don't understand enough about the internals to
> grok either them or this log snip, however. I know the general
> theories and hopefully explained them well enough, but don't know how
> they apply concretely. Perhaps someone else does.

Thanks for your time - I will try vanilla kernel.org kernels this
weekend, and if there is any difference, I will post again.
Glück Auf
Volker
--
gentoo-amd64@gentoo.org mailing list

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [gentoo-amd64] Re: oom killer problems
  2005-09-29 16:27 ` Hemmann, Volker Armin
@ 2005-09-30  0:40 ` John Myers
  0 siblings, 0 replies; 7+ messages in thread
From: John Myers @ 2005-09-30 0:40 UTC (permalink / raw)
To: gentoo-amd64

On Thursday 29 September 2005 09:27, Hemmann, Volker Armin wrote:
> yeah, but with my 32bit system even 512mb were enough for building kdepim
> with kdeenablefinal

32-bit code is a lot smaller than 64-bit code, both in the size of the code itself and in memory usage.
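[Editor's illustration, not part of the original mail.] One concrete reason a 64-bit build of the same source needs more memory is that every pointer is twice as wide, and pointer-heavy C++ template code (like KDE's) multiplies that overhead. A one-liner sketch to check the pointer width of the interpreter you are running:

```python
import ctypes

# Width of a native pointer: 8 bytes on a 64-bit build, 4 on a 32-bit one.
print(ctypes.sizeof(ctypes.c_void_p))
```

This only shows pointer width, of course; the amd64 ABI also changes instruction encoding and default type sizes, so "a lot smaller" covers more than pointers alone.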
* Re: [gentoo-amd64] oom killer problems - solved
  2005-09-28 20:35 [gentoo-amd64] oom killer problems Hemmann, Volker Armin
  2005-09-29  7:14 ` [gentoo-amd64] " Duncan
@ 2005-10-01 20:47 ` Hemmann, Volker Armin
  2005-10-01 22:26   ` Hemmann, Volker Armin
  1 sibling, 1 reply; 7+ messages in thread
From: Hemmann, Volker Armin @ 2005-10-01 20:47 UTC (permalink / raw)
To: gentoo-amd64

Hi,

I tried a vanilla kernel.org kernel (2.6.14-rc2), and no oom-kills occur!

Hmm, does that mean that there is a bug in the gentoo-sources?
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] oom killer problems - solved
  2005-10-01 20:47 ` [gentoo-amd64] oom killer problems - solved Hemmann, Volker Armin
@ 2005-10-01 22:26 ` Hemmann, Volker Armin
  2005-10-01 22:30   ` Hemmann, Volker Armin
  0 siblings, 1 reply; 7+ messages in thread
From: Hemmann, Volker Armin @ 2005-10-01 22:26 UTC (permalink / raw)
To: gentoo-amd64

On Saturday 01 October 2005 22:47, Hemmann, Volker Armin wrote:
> Hi,
>
> I tried a vanilla kernel.org kernel (2.6.14-rc2), and no oom-kills occur!
>
> Hmm, does that mean that there is a bug in the gentoo-sources?

no, it is not a gentoo-sources bug.

UNORDERED_IO=y
was the culprit. Changing it to n made kdepim build without problems.
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] oom killer problems - solved
  2005-10-01 22:26 ` Hemmann, Volker Armin
@ 2005-10-01 22:30 ` Hemmann, Volker Armin
  0 siblings, 0 replies; 7+ messages in thread
From: Hemmann, Volker Armin @ 2005-10-01 22:30 UTC (permalink / raw)
To: gentoo-amd64

On Sunday 02 October 2005 00:26, Hemmann, Volker Armin wrote:
> On Saturday 01 October 2005 22:47, Hemmann, Volker Armin wrote:
> > Hi,
> >
> > I tried a vanilla kernel.org kernel (2.6.14-rc2), and no oom-kills occur!
> >
> > Hmm, does that mean that there is a bug in the gentoo-sources?
>
> no, it is not a gentoo-sources bug.
>
> UNORDERED_IO=y
> was the culprit. Changing it to n made kdepim build without problems.

I was too fast - I got an oom-kill again. I just didn't 'see' it, because this time it only killed klauncher and nothing I was watching directly :(

hmm
--
gentoo-amd64@gentoo.org mailing list
end of thread, other threads: [~2005-10-01 22:32 UTC | newest]

Thread overview: 7+ messages (links below jump to the message on this page):
  2005-09-28 20:35 [gentoo-amd64] oom killer problems Hemmann, Volker Armin
  2005-09-29  7:14 ` [gentoo-amd64] " Duncan
  2005-09-29 16:27   ` Hemmann, Volker Armin
  2005-09-30  0:40     ` John Myers
  2005-10-01 20:47 ` [gentoo-amd64] oom killer problems - solved Hemmann, Volker Armin
  2005-10-01 22:26   ` Hemmann, Volker Armin
  2005-10-01 22:30     ` Hemmann, Volker Armin