* [gentoo-amd64] oom killer problems
@ 2005-09-28 20:35 Hemmann, Volker Armin
  2005-09-29  7:14 ` [gentoo-amd64] " Duncan
  2005-10-01 20:47 ` [gentoo-amd64] oom killer problems - solved Hemmann, Volker Armin
  0 siblings, 2 replies; 7+ messages in thread

From: Hemmann, Volker Armin @ 2005-09-28 20:35 UTC (permalink / raw)
To: gentoo-amd64

Hi,

when I try to emerge kdepim-3.4.2 with the kdeenablefinal use-flag, I get a
lot of oom-kills. I got them with 512 MB, so I upgraded to 1 GB and still
have them. What puzzles me is that I have a lot of swap free when it
happens. Could someone please tell me why the oom-killer becomes active
when there is still a lot of free swap? I am just a user, so using easy
words would be much appreciated ;)

[ 5364.515694] oom-killer: gfp_mask=0xd0, order=0
[ 5364.515696] Mem-info:
[ 5364.515698] DMA per-cpu:
[ 5364.515700] cpu 0 hot: low 2, high 6, batch 1 used:5
[ 5364.515703] cpu 0 cold: low 0, high 2, batch 1 used:1
[ 5364.515705] Normal per-cpu:
[ 5364.515707] cpu 0 hot: low 62, high 186, batch 31 used:99
[ 5364.515709] cpu 0 cold: low 0, high 62, batch 31 used:54
[ 5364.515711] HighMem per-cpu: empty
[ 5364.515714] Free pages: 8068kB (0kB HighMem)
[ 5364.515718] Active:242539 inactive:312 dirty:0 writeback:0 unstable:0 free:2017 slab:5622 mapped:241995 pagetables:1927
[ 5364.515723] DMA free:4088kB min:60kB low:72kB high:88kB active:8392kB inactive:0kB present:15996kB pages_scanned:8424 all_unreclaimable? yes
[ 5364.515726] lowmem_reserve[]: 0 1007 1007
[ 5364.515731] Normal free:3980kB min:4028kB low:5032kB high:6040kB active:961764kB inactive:1248kB present:1031872kB pages_scanned:1485831 all_unreclaimable? yes
[ 5364.515734] lowmem_reserve[]: 0 0 0
[ 5364.515738] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable?
no
[ 5364.515740] lowmem_reserve[]: 0 0 0
[ 5364.515742] DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4088kB
[ 5364.515749] Normal: 1*4kB 3*8kB 1*16kB 1*32kB 1*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3980kB
[ 5364.515756] HighMem: empty
[ 5364.515759] Swap cache: add 35992, delete 28384, find 8627/10318, race 0+0
[ 5364.515761] Free swap = 918004kB
[ 5364.515763] Total swap = 996020kB
[ 5364.515765] Free swap: 918004kB
[ 5364.521082] 262064 pages of RAM
[ 5364.521084] 5479 reserved pages
[ 5364.521086] 111904 pages shared
[ 5364.521088] 7608 pages swap cached
[ 5364.521146] Out of Memory: Killed process 7513 (klauncher).
[ 5373.371852] oom-killer: gfp_mask=0x201d2, order=0
[ 5373.371857] Mem-info:
[ 5373.371859] DMA per-cpu:
[ 5373.371862] cpu 0 hot: low 2, high 6, batch 1 used:5
[ 5373.371864] cpu 0 cold: low 0, high 2, batch 1 used:1
[ 5373.371866] Normal per-cpu:
[ 5373.371868] cpu 0 hot: low 62, high 186, batch 31 used:75
[ 5373.371871] cpu 0 cold: low 0, high 62, batch 31 used:34
[ 5373.371873] HighMem per-cpu: empty
[ 5373.371876] Free pages: 9024kB (0kB HighMem)
[ 5373.371880] Active:159724 inactive:82930 dirty:0 writeback:381 unstable:0 free:2256 slab:5596 mapped:240928 pagetables:1927
[ 5373.371884] DMA free:4088kB min:60kB low:72kB high:88kB active:8392kB inactive:0kB present:15996kB pages_scanned:8424 all_unreclaimable? yes
[ 5373.371887] lowmem_reserve[]: 0 1007 1007
[ 5373.371892] Normal free:4936kB min:4028kB low:5032kB high:6040kB active:630504kB inactive:331720kB present:1031872kB pages_scanned:376761 all_unreclaimable? no
[ 5373.371896] lowmem_reserve[]: 0 0 0
[ 5373.371899] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable?
no
[ 5373.371902] lowmem_reserve[]: 0 0 0
[ 5373.371904] DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4088kB
[ 5373.371911] Normal: 170*4kB 26*8kB 5*16kB 2*32kB 1*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4936kB
[ 5373.371918] HighMem: empty
[ 5373.371921] Swap cache: add 36927, delete 28640, find 8691/10408, race 0+0
[ 5373.371923] Free swap = 914904kB
[ 5373.371925] Total swap = 996020kB
[ 5373.371926] Free swap: 914904kB
[ 5373.377339] 262064 pages of RAM
[ 5373.377342] 5479 reserved pages
[ 5373.377344] 112097 pages shared
[ 5373.377346] 8287 pages swap cached
[ 5373.377403] Out of Memory: Killed process 7533 (kwin).

The kernel is 2.6.13-r2. I have 1 GB of RAM and approximately 1 GB of swap.

Glück Auf
Volker

PS: I emerged kdepim without kdeenablefinal, so there is no big pressure;
I am just curious.

--
gentoo-amd64@gentoo.org mailing list

^ permalink raw reply [flat|nested] 7+ messages in thread
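A note on reading these Mem-info dumps: the per-order lines such as "Normal: 1*4kB 3*8kB ... = 3980kB" are the kernel's buddy-allocator free lists, i.e. how many free blocks of each size the zone has. The stated totals can be cross-checked arithmetically; the following sketch is illustrative only (not part of the original thread), using the figures from the first dump above:

```python
import re

def buddy_total_kb(line: str) -> int:
    """Sum the per-order free-list counts in a Mem-info zone line.

    Each 'N*SkB' term means N free blocks of S kB each, so the zone's
    total free memory is the sum of N*S over all block sizes.
    """
    return sum(int(n) * int(s) for n, s in re.findall(r"(\d+)\*(\d+)kB", line))

dma = "DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB"
normal = "Normal: 1*4kB 3*8kB 1*16kB 1*32kB 1*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB"

print(buddy_total_kb(dma))     # 4088, matching the "= 4088kB" in the log
print(buddy_total_kb(normal))  # 3980, matching the "= 3980kB" in the log
print(buddy_total_kb(dma) + buddy_total_kb(normal))  # 8068, the "Free pages: 8068kB" line
```

Note also that in the Normal zone nearly all of that free memory sits in a handful of large blocks, with almost nothing at the small orders, which is relevant to the fragmentation discussion later in the thread.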
* [gentoo-amd64] Re: oom killer problems
  2005-09-28 20:35 [gentoo-amd64] oom killer problems Hemmann, Volker Armin
@ 2005-09-29  7:14 ` Duncan
  2005-09-29 16:27   ` Hemmann, Volker Armin
  2005-10-01 20:47 ` [gentoo-amd64] oom killer problems - solved Hemmann, Volker Armin
  1 sibling, 1 reply; 7+ messages in thread

From: Duncan @ 2005-09-29 7:14 UTC (permalink / raw)
To: gentoo-amd64

Hemmann, Volker Armin posted
<200509282235.32195.volker.armin.hemmann@tu-clausthal.de>, excerpted below,
on Wed, 28 Sep 2005 22:35:32 +0200:

> Hi,
> when I try to emerge kdepim-3.4.2 with the kdeenablefinal use-flag I get
> a lot of oom-kills.
> I got them with 512mb, so I upgraded to 1gig and still have them. What
> puzzles me is, that I have a lot of swap free when it happens.. could
> someone please tell me, why the oom-killer becomes active, when there is
> still a lot of free swap?
> [snip]
>
> kernel is 2.6.13-r2
> I have 1gb of ram, and approximatly 1gb of swap.
>
> I emerged kdepim without kdeenablefinal, so there is no big pressure,
> I am just curious

There's something to say about the "lots of swap left" thing below.
However, that's theory, so I'll cover the practical stuff first and leave
that aspect for later.

kdeenablefinal requires HUGE amounts of memory, no doubt about it. I've
not had serious issues with my gig of memory (dual Opterons, as you seem
to have), using kdeenablefinal here, but I've been doing things rather
differently than you probably have, and any one of those differences may
be the reason I haven't had memory issues of the severity you have.

1. I have swap entirely disabled. Here was my reasoning (apart from the
issue at hand).
I was reading an explanation of some aspects of the kernel VMM (virtual
memory manager) on LWN (Linux Weekly News, lwn.net) when I suddenly
realized that I could probably do without all the complexity they were
describing by turning off swap, since I'd recently upgraded to a gig of
RAM. I reasoned that I normally ran a quarter to a third of that in
application memory, so even if I doubled normal use at times, I'd still
have a third of a gig of free memory available for cache.

Further, I reasoned that if something should use all that memory and STILL
run out, it was likely a runaway process gobbling all the memory
available, and I might as well have it activate the OOM killer at a gig,
without further bogging the system down, than at 2 gig (or whatever),
dragging the system through a swap storm so I couldn't do anything about
it anyway.

For the most part, I've been quite happy with my decision, altho now that
suspend is starting to look like it'll work for dual-CPU systems (suspend
to RAM sort of worked, for the first time here, early in the .13 rcs, but
it was reverted for the .13 release, as it needed more work), I may enable
swap again, if only to get suspend-to-disk functionality. Of course, I'm
not saying disabling swap is the right thing for you, but I've been happy
with it here.

Anyway: a gig of RAM, swap disabled, so the VMM complexity that's part of
managing swap is also disabled. It's possible that's a factor, tho I'm
guessing the stuff below is more likely.

2. Possibly the biggest factor is the KDE packages used. I'm using the
split ebuilds, NOT the monolithic category packages. It's possible that's
the difference. Further, I don't have all the split packages that compose
kdepim-meta merged.
I have kmail and knode merged, with dependencies of course, but I don't
have a handheld to worry about syncing to, so I skipped all those split
ebuilds that form part of kdepim-meta (and are part of the monolithic
ebuild), except where kmail/knode etc. had them as dependencies. Thus, no
kitchensync, korn, kandy, kdepim-kresources, etc.

There are therefore two possibilities here. One is that one of the
individual apps I skipped requires more memory. The other is that the
monolithic ebuild you used does several things at once (possibly due to
your jobs setting, see below) where the split ebuilds do them in series,
therefore limiting the maximum memory required at any given moment.

3. I'm NOT using unsermake. For some reason, it hasn't worked for me
since KDE 3.2 or so. I've tried different versions, but always had either
an error, or, despite my settings, the ebuild doesn't seem to register
unsermake and thus uses the normal make system. Unsermake is better at
parallelizing the various jobs, making more efficient use of multiple
CPUs, but also, given the memory required for enable-final, likely causes
higher memory stress than ordinary GNU make does. If you are using it and
it's otherwise working for you, that may be the difference.

The rest of the possibilities may or may not apply. You didn't include
the output of emerge info, so I can't compare the relevant info from your
system to mine. However, I suspect they /do/ apply, for reasons which
should become clear as I present them below.

4. It appears (from the snipped stuff) you are running dual CPU (or a
single dual-core CPU). How many jobs do you have portage configured for?
With my dual-CPU system, I originally had four set, but after seeing what
compiling KDE with kdeenablefinal did to my memory resources, even a gig,
I decided I had better reduce that to three! If you have four or more
parallel jobs set, THAT could very possibly be your problem, right there.
You can probably do four or more jobs OR kdeenablefinal, but not BOTH --
at least not BOTH while running X and KDE at the same time!

I should mention that I sometimes run multiple emerges (each with three
jobs) in parallel. I *DID* run into OOM issues when trying to do that
with kmail and another large KDE package. Kmail is of course part of
kdepim, and my experience DOES confirm that it's one of the largest in
memory requirements with kdeenablefinal set. I could emerge small things
in parallel with it, stuff like kworldwatch, say, but nothing major, like
konqueror. Thus, I can say almost certainly that six jobs will trigger
the OOM killer when some of them are kmail, and could speculate that five
jobs would do it at some point in the kmail compilation. Four jobs may or
may not work, but three did, for me, under the conditions explained in the
other six points, of course. (Note that the unsermake thing could
compound the issue here, because as I said, it's better at finding things
to run in parallel than the normal make system is.)

5. I'm now running gcc-4.0.1, and have been compiling KDE with
gcc-4.0.0-preX or later since kde-3.4.0. gcc-4.x is still package.mask-ed
on Gentoo, because some packages still don't compile with it. Of course,
that's easily worked around because Gentoo slots gcc, so I have the latest
gcc-3.4.x installed in addition to gcc-4.x, and can (and do) easily switch
between them using gcc-config. However, the fact that gcc-4 is still
masked on Gentoo means you probably aren't running it, while I am, and
that's what I compile KDE with. The 4.x version is different enough from
3.4.x that memory use can be expected to be rather different as well.
It's quite possible that the kdeenablefinal stuff requires even more
memory with gcc-3.x than it does with the 4.x I've been successfully
using.

6. It's also possible something else in the configuration affects
compile-time memory usage.
There are CFLAGS, of course, and I'm also running newer (and still masked,
AFAIK) versions of binutils and glibc, with patches specifically for
gcc-4.

7. I don't do my kernels thru Gentoo, preferring instead to use the
kernel straight off of kernel.org. You say kernel 2.6.13-r2, the r2
indicating a Gentoo revision, but you don't say /which/ Gentoo kernel you
are running. The VMM is complex enough, and has a wide enough variety of
patches circulating for it, that it's possible you hit a bug that isn't in
the mainline kernel.org kernel that I'm running. Or... it may be some
other factor in our differing kernel configs.

...

Now to the theory. Why would OOM trigger when you had all that free swap?
There are two possible explanations I am aware of, and maybe others that
I'm not.

1. "Memory allocation" is a verb as well as a noun.

We know that enablefinal uses lots of memory. The USE flag description
mentions that, and we've discovered it to be /very/ true. If you run
ksysguard on your panel as I do, and monitor memory using it as I do (or
run a VT with a top session running, if compiling at the text console),
you are also aware that memory use during compile sessions, particularly
KDE compile sessions with enablefinal set, varies VERY drastically! From
my observations, each "job" will at times eat more and more memory, until,
with kmail in particular, multiple jobs are taking well over 200 MB of
memory apiece! (See why I mentioned parallel jobs above? At 200, possibly
300+ MB apiece, multiple parallel jobs eat up the memory VERY fast!)
After grabbing more and more memory for awhile, a job will suddenly
complete and release it ALL at once. The memory usage graph will suddenly
drop multiple hundreds of megabytes -- for ONE job!

Well, during the memory-usage-increase phase, each job will allocate more
and more memory, a chunk at a time.
It's possible (tho not likely, from my observations of this particular
usage pattern) that an app could want X MB of memory all at once in order
to complete its task. Until it gets that memory, it can't go any further,
and the task it is trying to do is half complete, so it can't release any
memory either without losing what it has already done. If the allocation
request is big enough (or you have several of them in parallel at the same
time that together are big enough), it can cause the OOM to trigger even
with what looks like quite a bit of free memory left, because all
available cache and other memory that can be freed has already been freed,
and no app can continue to the point of being able to release memory
without grabbing some memory first. If one of them wants a LOT of memory,
and the OOM killer isn't killing it off first (there are various OOM
killer algorithms out there, some using different factors for picking the
app to die than others), stuff will start dying to allow the app wanting
all that memory to get it.

Of course, it could also very plainly be a screwed-up VMM or OOM killer.
These things aren't exactly simple to get right... and if gcc took an
unexpected optimization that has side effects...

2. There is memory and there is "memory", and then there is 'memory' and
"'memory'" and '"memory"' as well. <g>

There is of course the obvious difference between real/physical and
swap/virtual memory, with real memory being far faster (while at the same
time being slower than L2 cache, which is slower than L1 cache, which is
slower than the registers, which can be accessed at full CPU speed, but
that's beside the point for this discussion).

That's only the tip of the iceberg, however. From the software's
perspective, that division mainly affects locked memory vs. swappable
memory.
The kernel is always locked memory -- it cannot be swapped, even drivers
that are never used, which is the reason it makes sense to keep your
kernel as small as possible, leaving more room in real memory for programs
to use. Depending on your kernel and its configuration, various forms of
RAMDISK, ramfs vs. tmpfs vs. ..., may be locked (or not). Likewise, some
kernel patches and configs make it easier or harder for applications to
lock memory as well. Maybe a complicating factor here is that you had a
lot of locked memory, and the compile process required more locked memory
than was left? I'm not sure how much locked memory a normal process on a
normal kernel can have, if any, but given both that and the fact that the
kernel you were running is unknown, it's a possibility.

Then there are the "memory zones". Fortunately, amd64 is less complicated
in this respect than x86. However, various memory zones do still exist,
and not only do some things require memory in a specific zone, but it can
be difficult to transfer in-use memory from one zone to another, even
where it COULD be placed in a different zone. Up until earlier this year,
it was often impossible to transfer memory between zones without using the
backing store (swap). That was the /only/ way possible! However, as I
said, amd64 is less complicated in this respect than x86, so memory zones
weren't likely the issue here -- unless something was going wrong, of
course.

Finally, there's the "contiguous memory" issue. Right after boot, your
system has lots of free memory, in large blobs of contiguous pages. It's
easy to get contiguous memory allocated in blocks of 256, 512, and 1024
pages at once. As uptime increases, however, memory gets fragmented thru
normal use. A system that has been up awhile will have far fewer
1024-page blocks immediately available for use, and fewer 512- and
256-page blocks as well.
Total memory available may be the same, but if it's all in 1- and 2-page
blocks, it'll take some serious time to move stuff around to allocate a
1024-page contiguous block -- if it's even possible to do at all. Given
the type of memory access patterns I've observed during KDE merges with
enablefinal on, while I'm not technically skilled enough to verify my
suspicions, of the listed possibilities which are those I know, I believe
this to be the most likely culprit, the reason the OOM killer was
activating even while swap (and possibly even main memory) was still free.

I'm sure there are other variations on the theme, however, other memory
type restrictions, and it may have been one of /those/ that just so
happened to come up short at the time you needed it. In any case, as
should be quite plain by now, a raw "available memory" number doesn't give
/anything/ /even/ /close/ to the entire picture, at the detail needed to
fully grok why the OOM killer was activating when overall memory wasn't
apparently in short supply at all.

I should also mention those numbers I snipped. I know enough to just
begin to make a bit of sense out of them, but not enough to /understand/
them, at least to the point of understanding what they say is wrong. You
can see the contiguous memory block figures for each of the DMA and Normal
memory zones. Pages are 4 kB, so the 1024-page blocks are 4 MB. I just
don't understand enough about the internals to grok either them or this
log snip, however. I know the general theories, and hopefully explained
them well enough, but don't know how they apply concretely. Perhaps
someone else does.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program,
he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

--
gentoo-amd64@gentoo.org mailing list
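The point about raw "available memory" numbers can be made concrete from the snipped dump itself. The kernel fails an allocation in a zone once that zone's free memory sits below its "min" watermark and reclaim has given up on the zone; the sketch below is deliberately simplified (the real check in mm/page_alloc.c also involves lowmem_reserve and per-order terms, omitted here) and uses the figures from the first dump in this thread:

```python
def zone_exhausted(free_kb: int, min_kb: int, all_unreclaimable: bool) -> bool:
    """Simplified model: a zone can no longer satisfy allocations once its
    free memory is below the 'min' watermark and page reclaim has given up
    on the zone (the log's 'all_unreclaimable? yes')."""
    return free_kb < min_kb and all_unreclaimable

# Figures from the first oom-killer dump in this thread:
dma = zone_exhausted(free_kb=4088, min_kb=60, all_unreclaimable=True)
normal = zone_exhausted(free_kb=3980, min_kb=4028, all_unreclaimable=True)

print(dma)     # False: DMA is still well above its tiny watermark
print(normal)  # True: Normal is below min with nothing left to reclaim,
               # so the OOM killer fires despite ~900 MB of free swap
```

In other words, "Free swap = 918004kB" is irrelevant once the zone the allocation needs is both below its watermark and unreclaimable.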
* Re: [gentoo-amd64] Re: oom killer problems
  2005-09-29  7:14 ` [gentoo-amd64] " Duncan
@ 2005-09-29 16:27 ` Hemmann, Volker Armin
  2005-09-30  0:40   ` John Myers
  0 siblings, 1 reply; 7+ messages in thread

From: Hemmann, Volker Armin @ 2005-09-29 16:27 UTC (permalink / raw)
To: gentoo-amd64

On Thursday 29 September 2005 09:14, Duncan wrote:

> Hemmann, Volker Armin posted
> <200509282235.32195.volker.armin.hemmann@tu-clausthal.de>, excerpted
>
> kdeenablefinal requires HUGE amounts of memory, no doubt about it. I've
> not had serious issues with my gig of memory (dual Opterons as you seem
> to have), using kdeenablefinal here, but I've been doing things rather
> different than you probably have, and any one of the things I've done
> different may be the reason I haven't had the memory issue to the
> severity you have.

yeah, but with my 32bit system, even 512 MB were enough for building
kdepim with kdeenablefinal.

> The rest of the possibilities may or may not apply. You didn't include
> the output of emerge info, so I can't compare the relevant info from
> your system to mine. However, I suspect they /do/ apply, for reasons
> which should be clear as I present them, below.
>
> 4. It appears (from the snipped stuff) you are running dual CPU (or a
> single dual-core CPU). How many jobs do you have portage configured
> for? With my dual-CPU system, I originally had four set, but after
> seeing what KDE compiling with kdeenablefinal did to my memory
> resources, even a gig, I decided I better reduce that to three! If you
> have four or more parallel jobs set, THAT could very possibly be your
> problem, right there. You can probably do four or more jobs OR
> kdeenablefinal, but not BOTH, at least not BOTH, while running X and
> KDE at the same time!

no, single cpu, single core.
Here is my emerge info:

Portage 2.0.52-r1 (default-linux/amd64/2005.1, gcc-3.4.4, glibc-2.3.5-r1, 2.6.13-gentoo-r2 x86_64)
=================================================================
System uname: 2.6.13-gentoo-r2 x86_64 AMD Athlon(tm) 64 Processor 3200+
Gentoo Base System version 1.12.0_pre8
ccache version 2.4 [disabled]
dev-lang/python:     2.3.5, 2.4.2
sys-apps/sandbox:    1.2.13
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.20
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64 ~amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -fweb -ftracer -fpeel-loops -msse3 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=k8 -O2 -fweb -ftracer -fpeel-loops -msse3 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp.tu-clausthal.de/pub/linux/gentoo/"
LC_ALL="de_DE@euro"
LINGUAS="de"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 S3TC X acpi alsa audiofile avi bash-completion berkdb bitmap-fonts bluetooth bzip2 cairo cdparanoia cdr cpudetection crypt curl dvd dvdr dvdread emboss emul-linux-x86 encode exif ffmpeg fftw foomaticdb fortran ftp gif gimp glitz glut glx gnokii gpm gstreamer gtk gtk2 icq id3 imagemagick imlib irmc jabber java javascrip jp2 jpeg jpeg2k kde kdeenablefinal kdepim lame lesstif libwww lm_sensors lzo lzw lzw-tiff mad matroska mjpeg mmap mng motif mp3 mpeg mpeg2 mplayer mysql ncurses nls no-old-linux nocd nosendmail nowin nptl nsplugin nvidia offensive ogg openal opengl oscar pam pdflib perl player png posix python
qt quicktime rar readline reiserfs scanner sdl sendfile sharedmem sms sndfile sockets spell ssl stencil-buffer subtitles svg sysfs tcpd tga theora tiff transcode truetype truetype-fonts type1 type1-fonts unicode usb userlocales v4l v4l2 vcd videos visualization vorbis wmf xanim xine xml xml2 xpm xrandr xsl xv xvid xvmc yv12 zlib zvbi linguas_de userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CTARGET, LANG, LDFLAGS, PORTDIR_OVERLAY

As you can see, MAKEOPTS is at -j2.

> (Note that the unsermake thing could compound the issue here, because
> as I said, it's better at finding things to run in parallel than the
> normal make system is.)
>
> 5. I'm now running gcc-4.0.1, and have been compiling kde with
> gcc-4.0.0-preX or later since kde-3.4.0. gcc-4.x is still
> package.mask-ed on Gentoo, because some packages still don't compile
> with it. Of course, that's easily worked around because Gentoo slots
> gcc, so I have the latest gcc-3.4.x installed, in addition to gcc-4.x,
> and can (and do) easily switch between them using gcc-config. However,
> the fact that gcc-4 is still masked for Gentoo, means you probably
> aren't running it, while I am, and that's what I compile kde with. The
> 4.x version is enough different from 3.4.x that memory use can be
> expected to be rather different as well. It's quite possible that the
> kdeenablefinal stuff requires even more memory with gcc-3.x than it
> does with the 4.x I've been successfully using.

hm, I read some stuff on anandtech that shows that apps compiled with
gcc4 are a LOT slower than apps compiled with 3.4 on the amd64 platform.
So I'll stay away from it until I see some numbers that convince me of
the opposite - and until I can be sure that almost everything builds with
it ;)

> 7. I don't do my kernels thru Gentoo, preferring instead to use the
> kernel straight off of kernel.org. You say kernel 2.6.13-r2, the r2
> indicating a Gentoo revision, but you don't say /which/ Gentoo kernel
> you are running.
> The VMM is complex enough and has a wide enough variety of patches
> circulating for it, that it's possible you hit a bug that wasn't in the
> mainline kernel.org kernel that I'm running. Or... it may be some other
> factor in our differing kernel configs.

yes I said, at the bottom of my mail: kernel is 2.6.13-r2

> ...
>
> Now to the theory. Why would OOM trigger when you had all that free
> swap? There are two possible explanations I am aware of and maybe
> others that I'm not.
>
> 1. "Memory allocation" is a verb as well as a noun.
>
> We know that enablefinal uses lots of memory. The USE flag description
> mentions that and we've discovered it to be /very/ true. If you run
> ksysguard on your panel as I do, and monitor memory using it as I do
> (or run a VT with a top session running if compiling at the text
> console), you are also aware that memory use during compile sessions,
> particularly KDE compile sessions with enablefinal set, varies VERY
> drastically! From my observations, each "job" will at times eat more
> and more memory, until with kmail in particular, multiple jobs are
> taking well over 200MB of memory apiece! (See why I mentioned parallel
> jobs above? At 200, possibly 300+ MB apiece, multiple parallel jobs
> eat up the memory VERY fast!) After grabbing more and more memory for
> awhile, a job will suddenly complete and release it ALL at once. The
> memory usage graph will suddenly drop multiple hundreds of megabytes --
> for ONE job!

I watched the memory consumption with gkrellm2. At first, there were
several hundred MB free, dropping fast to ~150 MB free, which then
dropped more slowly to 20-50 MB free. There it was 'locked' for some
time, until suddenly the oom-killer kicked in (I did not watch gkrellm
continuously; even with a 3200+, kdepim takes more time to build than I
can watch gkrellm without a break). But the behaviour was the same with
512 MB or 1 GB of RAM.
> Well, during the memory usage increase phase, each job will allocate
> more and more memory, a chunk at a time. It's possible (tho not likely
> from my observations of this particular usage pattern) that an app
> could want X MB of memory all at once, in order to complete the task.
> Until it gets that memory it can't go any further, the task it is
> trying to do is half complete so it can't release any memory either,
> without losing what it has already done. If the allocation request is
> big enough, (or you have several of them in parallel all at the same
> time that together are big enough), it can cause the OOM to trigger
> even with what looks like quite a bit of free memory left, because all
> available cache and other memory that can be freed has already been
> freed, and no app can continue to the point of being able to release
> memory, without grabbing some memory first. If one of them is wanting
> a LOT of memory, and the OOM killer isn't killing it off first (there
> are various OOM killer algorithms out there, some using different
> factors for picking the app to die than others), stuff will start
> dying to allow the app wanting all that memory to get it.
>
> Of course, it could also be very plainly a screwed up VMM or OOM
> killer, as well. These things aren't exactly simple to get right...
> and if gcc took an unexpected optimization that has side effects...
>
> 2. There is memory and there is "memory", and then there is 'memory'
> and "'memory'" and '"memory"' as well. <g>
>
> There is of course the obvious difference between real/physical and
> swap/virtual memory, with real memory being far faster (while at the
> same time being slower than L2 cache, which is slower than L1 cache,
> which is slower than the registers, which can be accessed at full CPU
> speed, but that's beside the point for this discussion).
>
> That's only the tip of the iceberg, however.
> From the software's perspective, that division mainly affects locked
> memory vs swappable memory. The kernel is always locked memory -- it
> cannot be swapped, even drivers that are never used, the reason it
> makes sense to keep your kernel as small as possible, leaving more room
> in real memory for programs to use. Depending on your kernel and its
> configuration, various forms of RAMDISK, ramfs vs tmpfs vs ... may be
> locked (or not). Likewise, some kernel patches and configs make it
> easier or harder for applications to lock memory as well. Maybe a
> complicating factor here is that you had a lot of locked memory and the
> compile process required more locked memory than was left? I'm not
> sure how much locked memory a normal process on a normal kernel can
> have, if any, but given both that and the fact that the kernel you were
> running is unknown, it's a possibility.

I don't use ramdisks, and the only tmpfs user is udev - with ~180 kB
used.

> Then there are the "memory zones". Fortunately, amd64 is less
> complicated in this respect than x86. However, various memory zones do
> still exist, and not only do some things require memory in a specific
> zone, but it can be difficult to transfer in-use memory from one zone
> to another, even where it COULD be placed in a different zone. Up
> until earlier this year, it was often impossible to transfer memory
> between zones without using the backing store (swap). That was the
> /only/ way possible! However, as I said, amd64 is less complicated in
> this respect than x86, so memory zones weren't likely the issue here --
> unless something was going wrong, of course.
>
> Finally, there's the "contiguous memory" issue. Right after boot, your
> system has lots of free memory, in large blobs of contiguous pages.
> It's easy to get contiguous memory allocated in blocks of 256, 512, and
> 1024 pages at once. As uptime increases, however, memory gets
> fragmented thru normal use.
> A system that has been up awhile will have far fewer 1024 page blocks
> immediately available for use, and fewer 512 and 256 page blocks as
> well. Total memory available may be the same, but if it's all in 1 and
> 2 page blocks, it'll take some serious time to move stuff around to
> allocate a 1024 page contiguous block -- if it's even possible to do at
> all. Given the type of memory access patterns I've observed during kde
> merges with enablefinal on, while I'm not technically skilled enough to
> verify my suspicions, of the listed possibilities which are those I
> know, I believe this to be the most likely culprit, the reason the OOM
> killer was activating even while swap (and possibly even main memory)
> was still free.
>
> I'm sure there are other variations on the theme, however, other memory
> type restrictions, and it may have been one of /those/ that it just so
> happened came up short at the time you needed it. In any case, as
> should be quite plain by now, a raw "available memory" number doesn't
> give /anything/ /even/ /close/ to the entire picture, at the detail
> needed to fully grok why the OOM killer was activating, when overall
> memory wasn't apparently in short supply at all.
>
> I should also mention those numbers I snipped. I know enough to just
> begin to make a bit of sense out of them, but not enough to
> /understand/ them, at least to the point of understanding what they are
> saying is wrong. You can see the contiguous memory block figures for
> each of the DMA and normal memory zones. 4kB pages, so the 1024 page
> blocks are 4MB. I just don't understand enough about the internals to
> grok either them or this log snip, however. I know the general
> theories and hopefully explained them well enough, but don't know how
> they apply concretely. Perhaps someone else does.

Thanks for your time - I will try vanilla kernel.org kernels this
weekend, and if there is any difference, I will post again.
Glück Auf
Volker
--
gentoo-amd64@gentoo.org mailing list

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [gentoo-amd64] Re: oom killer problems
  2005-09-29 16:27 ` Hemmann, Volker Armin
@ 2005-09-30  0:40 ` John Myers
  0 siblings, 0 replies; 7+ messages in thread
From: John Myers @ 2005-09-30 0:40 UTC (permalink / raw)
To: gentoo-amd64

On Thursday 29 September 2005 09:27, Hemmann, Volker Armin wrote:
> yeah, but with my 32bit system even 512mb were enough for building kdepim
> with kdeenablefinal

32-bit code is a lot smaller than 64-bit code, both in the size of the code itself and in memory usage.
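[Editor's illustration, not part of the original mail.] One concrete reason a 64-bit build of the same source needs more memory is that every pointer is twice as wide, and pointer-heavy C++ template code (like KDE's) multiplies that overhead. A one-liner sketch to check the pointer width of the interpreter you are running:

```python
import ctypes

# Width of a native pointer: 8 bytes on a 64-bit build, 4 on a 32-bit one.
print(ctypes.sizeof(ctypes.c_void_p))
```

This only shows pointer width, of course; the amd64 ABI also changes instruction encoding and default type sizes, so "a lot smaller" covers more than pointers alone.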
* Re: [gentoo-amd64] oom killer problems - solved
  2005-09-28 20:35 [gentoo-amd64] oom killer problems Hemmann, Volker Armin
  2005-09-29  7:14 ` [gentoo-amd64] " Duncan
@ 2005-10-01 20:47 ` Hemmann, Volker Armin
  2005-10-01 22:26   ` Hemmann, Volker Armin
  1 sibling, 1 reply; 7+ messages in thread
From: Hemmann, Volker Armin @ 2005-10-01 20:47 UTC (permalink / raw)
To: gentoo-amd64

Hi,

I tried a vanilla kernel.org kernel (2.6.14-rc2), and no oom-kills occur!

Hmm, does that mean that there is a bug in the gentoo-sources?
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] oom killer problems - solved
  2005-10-01 20:47 ` [gentoo-amd64] oom killer problems - solved Hemmann, Volker Armin
@ 2005-10-01 22:26 ` Hemmann, Volker Armin
  2005-10-01 22:30   ` Hemmann, Volker Armin
  0 siblings, 1 reply; 7+ messages in thread
From: Hemmann, Volker Armin @ 2005-10-01 22:26 UTC (permalink / raw)
To: gentoo-amd64

On Saturday 01 October 2005 22:47, Hemmann, Volker Armin wrote:
> Hi,
>
> I tried a vanilla kernel.org kernel (2.6.14-rc2), and no oom-kills occur!
>
> Hmm, does that mean that there is a bug in the gentoo-sources?

no, it is not a gentoo-sources bug.

UNORDERED_IO=y
was the culprit. Changing it to n made kdepim build without problems.
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] oom killer problems - solved
  2005-10-01 22:26 ` Hemmann, Volker Armin
@ 2005-10-01 22:30 ` Hemmann, Volker Armin
  0 siblings, 0 replies; 7+ messages in thread
From: Hemmann, Volker Armin @ 2005-10-01 22:30 UTC (permalink / raw)
To: gentoo-amd64

On Sunday 02 October 2005 00:26, Hemmann, Volker Armin wrote:
> On Saturday 01 October 2005 22:47, Hemmann, Volker Armin wrote:
> > Hi,
> >
> > I tried a vanilla kernel.org kernel (2.6.14-rc2), and no oom-kills occur!
> >
> > Hmm, does that mean that there is a bug in the gentoo-sources?
>
> no, it is not a gentoo-sources bug.
>
> UNORDERED_IO=y
> was the culprit. Changing it to n made kdepim build without problems.

I was too fast - I got an oom-kill again. I just didn't 'see' it, because this time it only killed klauncher and nothing I was watching directly :(

hmm
--
gentoo-amd64@gentoo.org mailing list
end of thread, other threads: [~2005-10-01 22:32 UTC | newest]

Thread overview: 7+ messages (links below jump to the message on this page):
  2005-09-28 20:35 [gentoo-amd64] oom killer problems Hemmann, Volker Armin
  2005-09-29  7:14 ` [gentoo-amd64] " Duncan
  2005-09-29 16:27   ` Hemmann, Volker Armin
  2005-09-30  0:40     ` John Myers
  2005-10-01 20:47 ` [gentoo-amd64] oom killer problems - solved Hemmann, Volker Armin
  2005-10-01 22:26   ` Hemmann, Volker Armin
  2005-10-01 22:30     ` Hemmann, Volker Armin