public inbox for gentoo-user@lists.gentoo.org
Subject: [gentoo-user] Machine hangs up with out of memory
From: Kai Peter @ 2021-04-28  9:58 UTC
To: Gentoo User

Hi,

I have an issue with a machine where I'm not able to find the real root
cause. It hangs completely. It seems to be running out of memory, but
why? Hopefully somebody can give me some insight. As far as I can see
right now, it hangs a few hours after an
`emerge --update --newuse --deep --with-bdeps=y @world`.
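
To see which process grows in the hours between the update and the hang, per-process swap and resident usage can be sampled from /proc. Below is a minimal Python sketch, assuming a Linux host with procfs mounted; `top_swap_users` is a hypothetical helper, not an existing tool, and scheduling it (for example from the fcron daemon the host already runs) is left open.

```python
import os


def top_swap_users(limit=10):
    """Return up to `limit` tuples (name, pid, swap_kb, rss_kb), most swap first.

    Reads the VmSwap and VmRSS lines of /proc/<pid>/status, which the kernel
    reports in kB. Kernel threads have no VmSwap line and are skipped.
    """
    if not os.path.isdir("/proc"):
        return []  # no procfs, e.g. not a Linux system
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        fields = {}
        try:
            with open(f"/proc/{entry}/status") as status:
                for line in status:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
        except OSError:
            continue  # the process exited or its status cannot be read
        if "VmSwap" not in fields:
            continue
        swap_kb = int(fields["VmSwap"].split()[0])
        rss_kb = int(fields.get("VmRSS", "0 kB").split()[0])
        rows.append((fields.get("Name", "?"), int(entry), swap_kb, rss_kb))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for name, pid, swap_kb, rss_kb in top_swap_users():
        print(f"{name:<16} pid={pid:<7} swap={swap_kb:>10} kB rss={rss_kb:>10} kB")
```

Run a few times between an update and the next hang, a steadily growing swap figure for one process name would point at the culprit more directly than the OOM report alone.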

The machine is an Intel Atom with 8 GB RAM (physical, the maximum) and
24 GB swap (a file), so 32 GB of RAM plus swap in total. It has a 250 GB
SSD. It runs gentoo-sources-4.14.83, built with genkernel. Portage uses
the stable tree only. It basically provides the hardware for a qemu VM
which does the network management: primary NS, DHCP and an Apache SSL
proxy. This VM has 4 GB RAM and an 8 GB swap file, and it works
smoothly. The Atom machine itself also acts as a basic NFS server for an
independent dedicated server, which does the (re)exports, and as
secondary NS. For this workload I'm convinced that 32 GB in total has to
be enough; correct me if I'm wrong!

The issue started around February (IIRC). The update to gcc-10.2 failed.
I have /var/tmp/portage on tmpfs; I increased its size in fstab from 8
to 16 GB, and afterwards gcc built successfully.
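
Because tmpfs usage is part of the question, here is a minimal Python sketch for checking how full the /var/tmp/portage tmpfs is, for example right after an emerge. It assumes the mount point from the post and uses only os.statvfs; `mount_usage` is a hypothetical helper. For a tmpfs, the reported size is the size= limit from fstab, and the used space lives in RAM or swap.

```python
import os


def mount_usage(path):
    """Return (used_bytes, size_bytes) for the filesystem containing `path`.

    Uses the same fields df does: f_blocks is the total size and f_bfree the
    free blocks, both in units of f_frsize bytes.
    """
    st = os.statvfs(path)
    size = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return used, size


if __name__ == "__main__":
    target = "/var/tmp/portage"  # mount point from the post; adjust if different
    if os.path.isdir(target):
        used, size = mount_usage(target)
        print(f"{target}: {used / 2**30:.2f} GiB used of {size / 2**30:.2f} GiB")
```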

After two hangs I increased the swap from 8 to 24 GB. It didn't help.
Here is the complete log from /var/log/messages:

Apr 28 05:35:57 Syrin kernel: [1454017.499919] isc-net-0000 invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
Apr 28 05:35:57 Syrin kernel: [1454017.499925] isc-net-0000 cpuset=/ mems_allowed=0
Apr 28 05:35:57 Syrin kernel: [1454017.499933] CPU: 0 PID: 27685 Comm: isc-net-0000 Not tainted 4.14.83-gentoo #1
Apr 28 05:35:57 Syrin kernel: [1454017.499935] Hardware name: MSI MS-7877/J1900I, BIOS V1.2 03/25/2014
Apr 28 05:35:57 Syrin kernel: [1454017.499936] Call Trace:
Apr 28 05:35:57 Syrin kernel: [1454017.499948]  dump_stack+0x67/0x98
Apr 28 05:35:57 Syrin kernel: [1454017.499954]  dump_header+0x94/0x20c
Apr 28 05:35:57 Syrin kernel: [1454017.499958]  oom_kill_process+0x24a/0x420
Apr 28 05:35:57 Syrin kernel: [1454017.499962]  ? oom_badness.part.9+0xd3/0x150
Apr 28 05:35:57 Syrin kernel: [1454017.499965]  out_of_memory+0xf9/0x290
Apr 28 05:35:57 Syrin kernel: [1454017.499968]  __alloc_pages_nodemask+0xf48/0xff0
Apr 28 05:35:57 Syrin kernel: [1454017.499974]  filemap_fault+0x294/0x4c0
Apr 28 05:35:57 Syrin kernel: [1454017.499979]  ext4_filemap_fault+0x2c/0x40
Apr 28 05:35:57 Syrin kernel: [1454017.499983]  __do_fault+0x1f/0xb0
Apr 28 05:35:57 Syrin kernel: [1454017.499986]  __handle_mm_fault+0x3ed/0xad0
Apr 28 05:35:57 Syrin kernel: [1454017.499991]  handle_mm_fault+0xaa/0x1f0
Apr 28 05:35:57 Syrin kernel: [1454017.499996]  __do_page_fault+0x250/0x4f0
Apr 28 05:35:57 Syrin kernel: [1454017.500000]  ? page_fault+0x2f/0x50
Apr 28 05:35:57 Syrin kernel: [1454017.500003]  page_fault+0x45/0x50
Apr 28 05:35:57 Syrin kernel: [1454017.500005] RIP: 0000:          (null)
Apr 28 05:35:57 Syrin kernel: [1454017.500007] RSP: 12f83750:0000000000000001 EFLAGS: 7ffa12f837a0
Apr 28 05:35:57 Syrin kernel: [1454017.500010] Mem-Info:
Apr 28 05:35:57 Syrin kernel: [1454017.500017] active_anon:1694713 inactive_anon:211859 isolated_anon:0
Apr 28 05:35:57 Syrin kernel: [1454017.500017]  active_file:328 inactive_file:344 isolated_file:32
Apr 28 05:35:57 Syrin kernel: [1454017.500017]  unevictable:1374 dirty:0 writeback:0 unstable:0
Apr 28 05:35:57 Syrin kernel: [1454017.500017]  slab_reclaimable:4480 slab_unreclaimable:7449
Apr 28 05:35:57 Syrin kernel: [1454017.500017]  mapped:1071 shmem:3 pagetables:16352 bounce:0
Apr 28 05:35:57 Syrin kernel: [1454017.500017]  free:11655 free_pcp:534 free_cma:0
Apr 28 05:35:57 Syrin kernel: [1454017.500021] Node 0 active_anon:6778852kB inactive_anon:847436kB active_file:1312kB inactive_file:1376kB unevictable:5496kB isolated(anon):0kB isolated(file):128kB mapped:4284kB dirty:0kB writeback:0kB shmem:12kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Apr 28 05:35:57 Syrin kernel: [1454017.500026] DMA free:15836kB min:20kB low:32kB high:44kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15920kB managed:15836kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500027] lowmem_reserve[]: 0 2664 7647 7647
Apr 28 05:35:57 Syrin kernel: [1454017.500036] DMA32 free:23732kB min:3892kB low:6620kB high:9348kB active_anon:2319992kB inactive_anon:363260kB active_file:0kB inactive_file:92kB unevictable:0kB writepending:0kB present:2825512kB managed:2734888kB mlocked:0kB kernel_stack:140kB pagetables:22572kB bounce:0kB free_pcp:1136kB local_pcp:476kB free_cma:0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500037] lowmem_reserve[]: 0 0 4982 4982
Apr 28 05:35:57 Syrin kernel: [1454017.500045] Normal free:7052kB min:7284kB low:12384kB high:17484kB active_anon:4458860kB inactive_anon:484176kB active_file:1368kB inactive_file:1568kB unevictable:5496kB writepending:0kB present:5242880kB managed:5102420kB mlocked:5496kB kernel_stack:2276kB pagetables:42836kB bounce:0kB free_pcp:1000kB local_pcp:724kB free_cma:0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500046] lowmem_reserve[]: 0 0 0 0
Apr 28 05:35:57 Syrin kernel: [1454017.500050] DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15836kB
Apr 28 05:35:57 Syrin kernel: [1454017.500070] DMA32: 268*4kB (UE) 1562*8kB (UE) 645*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 23888kB
Apr 28 05:35:57 Syrin kernel: [1454017.500084] Normal: 293*4kB (UME) 490*8kB (UME) 152*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7524kB
Apr 28 05:35:57 Syrin kernel: [1454017.500100] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 28 05:35:57 Syrin kernel: [1454017.500101] 1818 total pagecache pages
Apr 28 05:35:57 Syrin kernel: [1454017.500110] 86 pages in swap cache
Apr 28 05:35:57 Syrin kernel: [1454017.500112] Swap cache stats: add 13659627, delete 13659513, find 1129210/1793572
Apr 28 05:35:57 Syrin kernel: [1454017.500113] Free swap  = 0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500114] Total swap = 25165820kB
Apr 28 05:35:57 Syrin kernel: [1454017.500115] 2021078 pages RAM
Apr 28 05:35:57 Syrin kernel: [1454017.500116] 0 pages HighMem/MovableOnly
Apr 28 05:35:57 Syrin kernel: [1454017.500117] 57792 pages reserved
Apr 28 05:35:57 Syrin kernel: [1454017.500118] 0 pages hwpoisoned
Apr 28 05:35:57 Syrin kernel: [1454017.500119] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Apr 28 05:35:57 Syrin kernel: [1454017.500125] [ 4009]     0  4009    21207      320       7       3       20             0 apcupsd
Apr 28 05:35:57 Syrin kernel: [1454017.500128] [ 4043]     0  4043    54371       48      12       3      220             0 rsyslogd
Apr 28 05:35:57 Syrin kernel: [1454017.500131] [ 4084]     0  4084     1938      178       6       3       18             0 fcron
Apr 28 05:35:57 Syrin kernel: [1454017.500133] [ 4400]     0  4400    17733     1376       8       3        0             0 ntpd
Apr 28 05:35:57 Syrin kernel: [1454017.500136] [ 4429]     0  4429     2789        0       7       3      103             0 rsync
Apr 28 05:35:57 Syrin kernel: [1454017.500139] [ 4460]     0  4460     1689      241       6       3      112         -1000 sshd
Apr 28 05:35:57 Syrin kernel: [1454017.500142] [ 4693]     0  4693  1067352    80035    1546       7   664044             0 qemu-system-x86
Apr 28 05:35:57 Syrin kernel: [1454017.500145] [ 4863]     0  4863     1905      465       7       3       43             0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500148] [ 4864]     0  4864     1905      433       7       3       44             0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500151] [ 4865]     0  4865     1905      431       7       3       43             0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500153] [ 4866]     0  4866     1905      443       7       3       43             0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500156] [ 4867]     0  4867     1905      453       7       3       43             0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500159] [ 4868]     0  4868     1905      433       8       3       43             0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500162] [27439]     0 27439      675      295       5       3       60             0 rpcbind
Apr 28 05:35:57 Syrin kernel: [1454017.500164] [27509]     0 27509      750      419       5       3       80             0 rpc.idmapd
Apr 28 05:35:57 Syrin kernel: [1454017.500167] [27520] 65534 27520      693      421       5       3       53             0 rpc.statd
Apr 28 05:35:57 Syrin kernel: [1454017.500170] [27583]     0 27583      819      365       5       3      108             0 rpc.mountd
Apr 28 05:35:57 Syrin kernel: [1454017.500173] [27684]    40 27684  7570889  1823291   14584      32  5626031             0 named
Apr 28 05:35:57 Syrin kernel: [1454017.500176] [10479]     0 10479     3262      449       7       3      182             0 udevd
Apr 28 05:35:57 Syrin kernel: [1454017.500179] [ 2923]     0  2923     1938      318       6       3       10             0 fcron
Apr 28 05:35:57 Syrin kernel: [1454017.500182] [ 2924]     0  2924      981      406       5       3        0             0 backup.cron
Apr 28 05:35:57 Syrin kernel: [1454017.500184] [ 2930]     0  2930      981      462       5       3        0             0 rsync.sh
Apr 28 05:35:57 Syrin kernel: [1454017.500187] [ 2965]     0  2965    13185     1261      30       3        0             0 rsync
Apr 28 05:35:57 Syrin kernel: [1454017.500190] [ 2966]     0  2966      574      158       5       3        0             0 tee
Apr 28 05:35:57 Syrin kernel: [1454017.500192] [ 2968]     0  2968     2038      482       7       3        0             0 rsync
Apr 28 05:35:57 Syrin kernel: [1454017.500195] [ 2974]     0  2974    14142     1189      31       3        0             0 rsync
Apr 28 05:35:57 Syrin kernel: [1454017.500198] [ 2995]     0  2995     1938      271       6       3        7             0 fcron
Apr 28 05:35:57 Syrin kernel: [1454017.500201] [ 2996]     0  2996      981       68       6       3        0             0 sh
Apr 28 05:35:57 Syrin kernel: [1454017.500203] Out of memory: Kill process 27684 (named) score 904 or sacrifice child
Apr 28 05:35:57 Syrin kernel: [1454017.500234] Killed process 27684 (named) total-vm:30283556kB, anon-rss:7293164kB, file-rss:0kB, shmem-rss:0kB
Apr 28 05:36:00 Syrin kernel: [1454019.937636] oom_reaper: reaped process 27684 (named), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

All the processes after udevd are from that day only (the backup job).
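
The task table prints total_vm, rss and swapents in pages, not kB. Multiplying named's row by a 4 kB page size (assumed here as the x86-64 base page size) reproduces the total-vm:30283556kB and anon-rss:7293164kB in the "Killed process" line, which supports that reading. A minimal sketch with two rows copied from the log above; `parse_row` is a hypothetical helper.

```python
PAGE_KB = 4  # assumed base page size on x86-64, in kB

# Two rows copied from the OOM report above. Columns after "[pid]":
# uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
TABLE = """\
[ 4693]     0  4693  1067352    80035    1546       7   664044             0 qemu-system-x86
[27684]    40 27684  7570889  1823291   14584      32  5626031             0 named
"""
TOTAL_SWAP_KB = 25165820  # "Total swap = 25165820kB" from the same report


def parse_row(line):
    """Split one task-table row and convert its page counts to kB."""
    pid_field, rest = line.split("]", 1)
    cols = rest.split()
    total_vm, rss, swapents = int(cols[2]), int(cols[3]), int(cols[6])
    return {
        "pid": int(pid_field.strip("[ ")),
        "name": cols[8],
        "total_vm_kb": total_vm * PAGE_KB,
        "rss_kb": rss * PAGE_KB,
        "swap_kb": swapents * PAGE_KB,
    }


if __name__ == "__main__":
    for row in map(parse_row, TABLE.splitlines()):
        share = 100 * row["swap_kb"] / TOTAL_SWAP_KB
        print(f'{row["name"]:<16} rss={row["rss_kb"]:>9} kB '
              f'swap={row["swap_kb"]:>9} kB ({share:.1f}% of swap)')
```

By the same conversion, named held about 21.5 GiB of the 24 GiB swap (roughly 89%) and qemu-system-x86 about 2.5 GiB.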

For comparison, the previous call trace was nearly identical:

Apr 15 18:26:37 Syrin kernel: [376620.710330] Call Trace:
Apr 15 18:26:37 Syrin kernel: [376620.710341]  dump_stack+0x67/0x98
Apr 15 18:26:37 Syrin kernel: [376620.710347]  dump_header+0x94/0x20c
Apr 15 18:26:37 Syrin kernel: [376620.710352]  oom_kill_process+0x24a/0x420
Apr 15 18:26:37 Syrin kernel: [376620.710355]  ? oom_badness.part.9+0xd3/0x150
Apr 15 18:26:37 Syrin kernel: [376620.710358]  out_of_memory+0xf9/0x290
Apr 15 18:26:37 Syrin kernel: [376620.710361]  __alloc_pages_nodemask+0xf48/0xff0
Apr 15 18:26:37 Syrin kernel: [376620.710367]  filemap_fault+0x294/0x4c0
Apr 15 18:26:37 Syrin kernel: [376620.710372]  ext4_filemap_fault+0x2c/0x40
Apr 15 18:26:37 Syrin kernel: [376620.710376]  __do_fault+0x1f/0xb0
Apr 15 18:26:37 Syrin kernel: [376620.710380]  __handle_mm_fault+0x3ed/0xad0
Apr 15 18:26:37 Syrin kernel: [376620.710385]  handle_mm_fault+0xaa/0x1f0
Apr 15 18:26:37 Syrin kernel: [376620.710390]  __do_page_fault+0x250/0x4f0
Apr 15 18:26:37 Syrin kernel: [376620.710394]  ? page_fault+0x2f/0x50
Apr 15 18:26:37 Syrin kernel: [376620.710396]  page_fault+0x45/0x50
Apr 15 18:26:37 Syrin kernel: [376620.710400] RIP: 21e50088:0x7f41aedee150
Apr 15 18:26:37 Syrin kernel: [376620.710401] RSP: 21e50070:0000000000000000 EFLAGS: 7f41aedee150
Apr 15 18:26:37 Syrin kernel: [376620.710405] Mem-Info:
Apr 15 18:26:37 Syrin kernel: [376620.710411] active_anon:1695970 inactive_anon:212020 isolated_anon:0
Apr 15 18:26:37 Syrin kernel: [376620.710411]  active_file:339 inactive_file:320 isolated_file:0
Apr 15 18:26:37 Syrin kernel: [376620.710411]  unevictable:1374 dirty:0 writeback:0 unstable:0
Apr 15 18:26:37 Syrin kernel: [376620.710411]  slab_reclaimable:4311 slab_unreclaimable:7552
Apr 15 18:26:37 Syrin kernel: [376620.710411]  mapped:1059 shmem:3 pagetables:16264 bounce:0
Apr 15 18:26:37 Syrin kernel: [376620.710411]  free:11840 free_pcp:0 free_cma:0
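
To put "nearly identical" in numbers, the counters common to both Mem-Info blocks can be compared directly. A minimal sketch; the strings are a subset copied from the Apr 15 and Apr 28 reports, the 4 kB page size is assumed, and `parse_counters` / `anon_kb` are hypothetical helpers.

```python
PAGE_KB = 4  # Mem-Info counters are in pages; assumed 4 kB x86-64 base pages

# Subset of the Mem-Info counters (in pages) from the two OOM reports.
APR_15 = ("active_anon:1695970 inactive_anon:212020 active_file:339 "
          "inactive_file:320 shmem:3 pagetables:16264 free:11840")
APR_28 = ("active_anon:1694713 inactive_anon:211859 active_file:328 "
          "inactive_file:344 shmem:3 pagetables:16352 free:11655")


def parse_counters(text):
    """Turn 'name:value name:value ...' into a dict of ints."""
    return {name: int(value) for name, value in
            (item.split(":") for item in text.split())}


def anon_kb(counters):
    """Anonymous memory (active + inactive) in kB."""
    return (counters["active_anon"] + counters["inactive_anon"]) * PAGE_KB


if __name__ == "__main__":
    before, after = parse_counters(APR_15), parse_counters(APR_28)
    for name in before:
        delta = after[name] - before[name]
        print(f"{name:<14} {before[name]:>9} {after[name]:>9} {delta:>+7}")
    print(f"anon total: {anon_kb(before)} kB vs {anon_kb(after)} kB")
```

Both times anonymous memory comes to roughly 7.3 GiB, while file-backed pages number only a few hundred and shmem stays at 3 pages.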

Unfortunately I don't understand all the details. Any help is highly
appreciated.
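
One detail in the Apr 28 zone lines shows why the kernel gave up: an order-0 allocation is served only from a zone whose free memory stays above that zone's min watermark plus its lowmem_reserve entry for the requested zone class. The sketch below applies a simplified form of that test, modelled loosely on __zone_watermark_ok() in mm/page_alloc.c and ignoring its CMA, high-order and ALLOC_HARDER cases, to the logged figures. Treating index 2 (ZONE_NORMAL) as the class zone is an assumption, and `passes_min_watermark` is a hypothetical helper.

```python
PAGE_KB = 4  # lowmem_reserve values are in pages; assumed 4 kB pages

# Per-zone figures from the Apr 28 report (kB), plus each zone's
# lowmem_reserve[] array (pages) from the line printed right after that zone.
ZONES = {
    "DMA":    {"free": 15836, "min": 20,   "lowmem_reserve": [0, 2664, 7647, 7647]},
    "DMA32":  {"free": 23732, "min": 3892, "lowmem_reserve": [0, 0, 4982, 4982]},
    "Normal": {"free": 7052,  "min": 7284, "lowmem_reserve": [0, 0, 0, 0]},
}
CLASSZONE_IDX = 2  # assumption: ZONE_NORMAL, the highest populated zone here


def passes_min_watermark(zone, classzone_idx=CLASSZONE_IDX):
    """Simplified order-0 watermark test.

    The zone may serve the allocation only if its free memory exceeds its min
    watermark plus the reserve it keeps back for allocations that could have
    been satisfied from a higher zone.
    """
    reserve_kb = zone["lowmem_reserve"][classzone_idx] * PAGE_KB
    return zone["free"] > zone["min"] + reserve_kb


if __name__ == "__main__":
    for name, zone in ZONES.items():
        need = zone["min"] + zone["lowmem_reserve"][CLASSZONE_IDX] * PAGE_KB
        print(f"{name:<7} free={zone['free']:>6} kB  needs >{need:>6} kB  "
              f"ok={passes_min_watermark(zone)}")
```

With these numbers no zone passes (DMA32 misses by less than 100 kB). With Free swap = 0kB and almost no page cache left to reclaim, falling through to out_of_memory() is consistent with the call trace.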

I assume it has something to do with tmpfs not being freed. That's just
an assumption, though; I'm looking for clarity, not trial and error.
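
The tmpfs assumption can be checked against the same Apr 28 report. tmpfs pages held in RAM are counted as shmem, and tmpfs pages pushed out to swap occupy swap slots that belong to no process. The sketch sums the swapents column (in pages) of every listed task and estimates how much of the full swap is left over. It assumes 4 kB pages and that no swap entry is shared between tasks (shared entries would be counted twice); `swap_not_owned_by_tasks_kb` is a hypothetical helper.

```python
PAGE_KB = 4  # assumed x86-64 base page size in kB

SHMEM_PAGES = 3           # "shmem:3" in the Apr 28 Mem-Info
TOTAL_SWAP_KB = 25165820  # "Total swap = 25165820kB"; with "Free swap = 0kB" all of it is in use

# swapents column (pages) of every task in the Apr 28 table, in log order,
# from apcupsd down to sh; 664044 is qemu-system-x86 and 5626031 is named.
SWAPENTS = [20, 220, 18, 0, 103, 112, 664044, 43, 44, 43, 43, 43, 43,
            60, 80, 53, 108, 5626031, 182, 10, 0, 0, 0, 0, 0, 0, 7, 0]


def swap_not_owned_by_tasks_kb(total_swap_kb, swapents, page_kb=PAGE_KB):
    """Swap in use that no listed task accounts for (tmpfs, swap cache, ...).

    Assumes no swap entry is shared between tasks; shared entries would be
    counted twice in `swapents` and make this estimate too low.
    """
    return total_swap_kb - sum(swapents) * page_kb


if __name__ == "__main__":
    print(f"shmem (tmpfs) in RAM:       {SHMEM_PAGES * PAGE_KB} kB")
    print(f"swap not owned by any task: "
          f"{swap_not_owned_by_tasks_kb(TOTAL_SWAP_KB, SWAPENTS)} kB")
```

With these numbers shmem is 12 kB and only about 592 kB of swap is not owned by a listed task, so at the moment of the OOM kill tmpfs appears to have held less than 1 MB, while named and qemu-system-x86 filled almost all of the swap.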

Thanks
Kai





Thread overview: 6+ messages
2021-04-29 18:01 Kai Peter (in reply to messages not archived here)
2021-04-28  9:58 Kai Peter
2021-04-30  1:30 ` Adam Carter
2021-04-30 10:09   ` Michael
2021-04-30 23:07     ` Kai Peter
2021-05-02  0:39     ` Adam Carter
