* [gentoo-user] OOM memory issues
@ 2014-09-18 15:48 James
  2014-09-18 16:27 ` Kerin Millar
  0 siblings, 1 reply; 8+ messages in thread
From: James @ 2014-09-18 15:48 UTC
  To: gentoo-user

Hello,

Out Of Memory seems to invoke mysterious processes that kill
such offending processes. OOM seems to be a common problem
that pops up over and over again within the clustering communities.


I would greatly appreciate (gentoo) illuminations on the OOM issues;
both historically and for folks using/testing systemd. Not a flame_a_thon,
just some technical information, as I need to understand these
issues more deeply, how to find, measure and configure around OOM issues,  
in my quest for gentoo clustering.


Also, I posted a bug to add Ftrace/trace-cmd/kernelshark
to gentoo's ebuilds. They are looking for a maintainer for
this *wonderful* new_age kernel tool for Function Tracing.
It's a wee bit out of my league, so it would be fantastic if
anyone wanted to create an overlay. There are lots of links
in the bug report, even videos showing what it can be used for.


Bug-517428

All information is welcome. Please, no flames.

James




* Re: [gentoo-user] OOM memory issues
  2014-09-18 15:48 [gentoo-user] OOM memory issues James
@ 2014-09-18 16:27 ` Kerin Millar
  2014-09-18 16:56   ` Godzil
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Kerin Millar @ 2014-09-18 16:27 UTC
  To: gentoo-user

On 18/09/2014 16:48, James wrote:
> Hello,
>
> Out Of Memory seems to invoke mysterious processes that kill
> such offending processes. OOM seems to be a common problem
> that pops up over and over again within the clustering communities.
>
>
> I would greatly appreciate (gentoo) illuminations on the OOM issues;
> both historically and for folks using/testing systemd. Not a flame_a_thon,
> just some technical information, as I need to understand these
> issues more deeply, how to find, measure and configure around OOM issues,
> in my quest for gentoo clustering.

The need for the OOM killer stems from the fact that memory can be 
overcommitted. These articles may prove informative:

http://lwn.net/Articles/317814/
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

In my case, the most likely trigger - as rare as it is - would be a 
runaway process that consumes more than its fair share of RAM. 
Therefore, I make a point of adjusting the score of production-critical 
applications to ensure that they are less likely to be culled.
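As a rough illustration of what "adjusting the score" can look like, here is a minimal Python sketch that reads and writes /proc/<pid>/oom_score_adj. The helper names and the proc_root parameter are made up for this example (the latter only so the code can be exercised against a scratch directory); the -1000..1000 range is the one documented for oom_score_adj.

```python
from pathlib import Path

OOM_SCORE_ADJ_MIN = -1000  # effectively exempts the process from the OOM killer
OOM_SCORE_ADJ_MAX = 1000   # makes the process the preferred victim


def read_oom_score_adj(pid, proc_root="/proc"):
    """Return the current oom_score_adj of the given PID as an int."""
    return int(Path(proc_root, str(pid), "oom_score_adj").read_text())


def write_oom_score_adj(pid, value, proc_root="/proc"):
    """Set oom_score_adj for the given PID after a range check.

    Lowering the value normally requires CAP_SYS_RESOURCE
    (in practice, root); raising it does not.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj out of range: {value}")
    Path(proc_root, str(pid), "oom_score_adj").write_text(f"{value}\n")
```

For a production-critical daemon one might call write_oom_score_adj(pid, -500) from an init script, where pid is whatever PID the service reports. The value is inherited by children on fork, so it is usually enough to set it once on the parent.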

If your cases are not pathological, you could increase the amount of 
memory, be it by additional RAM or additional swap [1]. Alternatively, 
if you are able to precisely control the way in which memory is 
allocated and can guarantee that it will not be exhausted, you may elect 
to disable overcommit, though I would not recommend it.
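To see where a box stands before touching the overcommit policy, something like the following Python sketch may help. It parses /proc/meminfo-style text and reports how far Committed_AS is from CommitLimit; the function names are invented for the example, and note that CommitLimit is only enforced when vm.overcommit_memory is set to 2.

```python
OVERCOMMIT_MODES = {
    0: "heuristic (default)",
    1: "always overcommit",
    2: "strict: never exceed CommitLimit",
}


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """Return CommitLimit minus Committed_AS, in kB (may be negative)."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


def describe_overcommit_mode(mode):
    """Map the integer in /proc/sys/vm/overcommit_memory to a description."""
    return OVERCOMMIT_MODES.get(mode, "unknown")
```

On a live system the inputs would come from open("/proc/meminfo").read() and int(open("/proc/sys/vm/overcommit_memory").read()). A negative headroom in mode 0 or 1 simply means the system is currently overcommitted.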

With NUMA, things may be more complicated because there is the potential 
for a particular memory node to be exhausted, unless memory interleaving 
is employed. Indeed, I make a point of using interleaving for MySQL, 
having gotten the idea from the Twitter fork.
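Interleaving itself is typically requested when the daemon is launched (for instance via numactl --interleave=all), but it can be useful to check whether one node is running dry. Below is a small Python sketch, under the assumption that the per-node files at /sys/devices/system/node/nodeN/meminfo use the usual "Node N Key: value kB" layout; the function names are hypothetical.

```python
import re

# Matches lines such as "Node 0 MemFree:        1234567 kB"
NODE_LINE = re.compile(r"^Node\s+(\d+)\s+([^:]+):\s+(\d+)")


def parse_node_meminfo(text):
    """Parse per-node meminfo text into {node_id: {key: value}}."""
    result = {}
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            node = int(match.group(1))
            result.setdefault(node, {})[match.group(2)] = int(match.group(3))
    return result


def free_fraction(node_stats):
    """Return MemFree / MemTotal for one node's stats dict."""
    return node_stats["MemFree"] / node_stats["MemTotal"]
```

Feeding it the concatenated contents of every nodeN/meminfo file gives one dict per node; a node whose free fraction is far below the others is the one most likely to hit a node-local shortage.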

Finally, make sure you are using at least Linux 3.12, because some 
improvements have been made there [2].

--Kerin

[1] At a pinch, additional swap may be allocated as a file
[2] https://lwn.net/Articles/562211/#oom



* Re: [gentoo-user] OOM memory issues
  2014-09-18 16:27 ` Kerin Millar
@ 2014-09-18 16:56   ` Godzil
  2014-09-18 18:27   ` [gentoo-user] " James
  2014-09-18 19:13   ` [gentoo-user] " Rich Freeman
  2 siblings, 0 replies; 8+ messages in thread
From: Godzil @ 2014-09-18 16:56 UTC
  To: gentoo-user@lists.gentoo.org

You could also disable overcommit so that an application that asks for too much memory is denied (you know, the NULL pointer that malloc can return). With overcommit, malloc will almost never return NULL, whatever the memory status is. Without it, all requested memory is really allocated, and malloc will fail if it is unable to reserve the requested amount.

 

> On 18 Sept 2014, at 17:27, Kerin Millar <kerframil@fastmail.co.uk> wrote:
> 
> 
>> On 18/09/2014 16:48, James wrote:
>> Hello,
>> 
>> Out Of Memory seems to invoke mysterious processes that kill
>> such offending processes. OOM seems to be a common problem
>> that pops up over and over again within the clustering communities.
>> 
>> 
>> I would greatly appreciate (gentoo) illuminations on the OOM issues;
>> both historically and for folks using/testing systemd. Not a flame_a_thon,
>> just some technical information, as I need to understand these
>> issues more deeply, how to find, measure and configure around OOM issues,
>> in my quest for gentoo clustering.
> 
> The need for the OOM killer stems from the fact that memory can be overcommitted. These articles may prove informative:
> 
> http://lwn.net/Articles/317814/
> http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
> 
> In my case, the most likely trigger - as rare as it is - would be a runaway process that consumes more than its fair share of RAM. Therefore, I make a point of adjusting the score of production-critical applications to ensure that they are less likely to be culled.
> 
> If your cases are not pathological, you could increase the amount of memory, be it by additional RAM or additional swap [1]. Alternatively, if you are able to precisely control the way in which memory is allocated and can guarantee that it will not be exhausted, you may elect to disable overcommit, though I would not recommend it.
> 
> With NUMA, things may be more complicated because there is the potential for a particular memory node to be exhausted, unless memory interleaving is employed. Indeed, I make a point of using interleaving for MySQL, having gotten the idea from the Twitter fork.
> 
> Finally, make sure you are using at least Linux 3.12, because some improvements have been made there [2].
> 
> --Kerin
> 
> [1] At a pinch, additional swap may be allocated as a file
> [2] https://lwn.net/Articles/562211/#oom
> 



* [gentoo-user] Re: OOM memory issues
  2014-09-18 16:27 ` Kerin Millar
  2014-09-18 16:56   ` Godzil
@ 2014-09-18 18:27   ` James
  2014-09-18 19:16     ` Kerin Millar
  2014-09-18 19:13   ` [gentoo-user] " Rich Freeman
  2 siblings, 1 reply; 8+ messages in thread
From: James @ 2014-09-18 18:27 UTC
  To: gentoo-user

Kerin Millar <kerframil <at> fastmail.co.uk> writes:


> The need for the OOM killer stems from the fact that memory can be 
> overcommitted. These articles may prove informative:

> http://lwn.net/Articles/317814/

Yes, I saw this article. It's dated February 4, 2009. How much has
changed with the kernel/configs/userspace mechanism? Nothing, everything?


> http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

Nice to know.

> In my case, the most likely trigger - as rare as it is - would be a 
> runaway process that consumes more than its fair share of RAM. 
> Therefore, I make a point of adjusting the score of production-critical 
> applications to ensure that they are less likely to be culled.

OK, I see the manual tools for the OOM killer. Are there any graphical tools
for monitoring, configuring, and controlling the OOM-related files and target
processes? Or is all of this performed by hand?
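For what it's worth, the by-hand part can at least be scripted. A minimal Python sketch, assuming the usual /proc/<pid>/oom_score and /proc/<pid>/comm files, that lists the processes the kernel currently rates as the most likely victims (the function name and the proc_root parameter are invented for the example):

```python
from pathlib import Path


def oom_ranking(proc_root="/proc", limit=10):
    """Return up to `limit` (oom_score, pid, name) tuples, highest score first."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            score = int((entry / "oom_score").read_text())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or is not readable
        rows.append((score, int(entry.name), name))
    # Highest score first; ties are broken by ascending PID.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows[:limit]
```

Printing the result of oom_ranking() periodically gives a crude monitor; anything fancier would have to be built on top.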


> If your cases are not pathological, you could increase the amount of 
> memory, be it by additional RAM or additional swap [1]. Alternatively, 
> if you are able to precisely control the way in which memory is 
> allocated and can guarantee that it will not be exhausted, you may elect 
> to disable overcommit, though I would not recommend it.

I do not have a problem. It keeps popping up in my clustering research,
frequently. Many of the clustering environments have heavy memory
requirements, so this will eventually be monitored, diagnosed and managed
in real time by the cluster software, such as the load balancers. These are
very new technologies, hence my need to understand both legacy and current
issues and solutions. You cannot always just add resources. Once set up,
you have to dynamically manage resource consumption, or at least that
is what my current reading reveals.


> With NUMA, things may be more complicated because there is the potential 
> for a particular memory node to be exhausted, unless memory interleaving 
> is employed. Indeed, I make a point of using interleaving for MySQL, 
> having gotten the idea from the Twitter fork.

Well, my first cluster is just three AMD FX-8350 machines with 32G of RAM each.
Once that is working reasonably well, I'm sure I'll be adding
different (multi) processors to the mix, with different RAM characteristics.
There is a *huge interest* in heterogeneous clusters, including but
not limited to GPU/APU hardware. So dynamic, real-time memory
management is quintessentially important for successful clustering.
  

> Finally, make sure you are using at least Linux 3.12, because some 
> improvements have been made there [2].

Yep. Regarding [1], I always set up gigs of swap and rarely use it, since
critical computations must be fast. Many cluster folks are building
systems with both SSD and traditional (RAID) HD setups. The SSD
could be partitioned for the cluster and swap. Lots of experimentation
on how best to deploy SSDs with max_ram in systems for clusters is
ongoing.


Memory management is a primary focus of Apache Spark (in-memory)
computations. Spark can be used with Python, Java and Scala, so it is very cool.


> --Kerin
> [1] At a pinch, additional swap may be allocated as a file
> [2] https://lwn.net/Articles/562211/#oom

[2] is also good to know.

thx,
James









* Re: [gentoo-user] OOM memory issues
  2014-09-18 16:27 ` Kerin Millar
  2014-09-18 16:56   ` Godzil
  2014-09-18 18:27   ` [gentoo-user] " James
@ 2014-09-18 19:13   ` Rich Freeman
  2014-09-19  2:12     ` [gentoo-user] " James
  2 siblings, 1 reply; 8+ messages in thread
From: Rich Freeman @ 2014-09-18 19:13 UTC
  To: gentoo-user

On Thu, Sep 18, 2014 at 12:27 PM, Kerin Millar <kerframil@fastmail.co.uk> wrote:
>
> The need for the OOM killer stems from the fact that memory can be
> overcommitted. These articles may prove informative:
>

A big problem with Linux along these fronts is that we don't really
have good mechanisms for prioritizing memory use.  You can set hard
limits of course, which aren't flexible, but otherwise software is
trusted to just guess how much RAM it should use.

It would be nice if processes could allocate cache RAM, which could be
preferentially freed if the kernel deems necessary.  If some pages are
easier to regenerate than to swap, this could also be flagged (I have
a 50Mbps connection - I'd rather see my browser re-fetch pages than go
to disk when the disk is already busy).  There are probably a lot of
other ways that memory use could be optimized with hinting.

--
Rich



* Re: [gentoo-user] Re: OOM memory issues
  2014-09-18 18:27   ` [gentoo-user] " James
@ 2014-09-18 19:16     ` Kerin Millar
  2014-09-19  1:36       ` James
  0 siblings, 1 reply; 8+ messages in thread
From: Kerin Millar @ 2014-09-18 19:16 UTC
  To: gentoo-user

On 18/09/2014 19:27, James wrote:
> Kerin Millar <kerframil <at> fastmail.co.uk> writes:
>
>
>> The need for the OOM killer stems from the fact that memory can be
>> overcommitted. These articles may prove informative:
>
>> http://lwn.net/Articles/317814/
>
> Yes, I saw this article. It's dated February 4, 2009. How much has
> changed with the kernel/configs/userspace mechanism? Nothing, everything?

A new tunable, "oom_score_adj", was added, which accepts values between 
-1000 and 1000.

https://github.com/torvalds/linux/commit/a63d83f#include/linux/oom.h

As mentioned there, the "oom_adj" tunable remains for reasons of 
backward compatibility. Setting one will adjust the other per the 
appropriate scale.
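A rough sketch of that scaling in Python, based on my reading of how fs/proc/base.c converts between the two in later kernels. Treat the constants and the special-casing of the maximum as assumptions to be checked against your kernel source; the function names are made up.

```python
OOM_DISABLE = -17          # legacy oom_adj value that disables OOM killing
OOM_ADJUST_MAX = 15        # largest legacy oom_adj value
OOM_SCORE_ADJ_MAX = 1000   # largest oom_score_adj value


def _trunc_div(a, b):
    """Integer division that truncates toward zero, as in C."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def oom_adj_to_score_adj(oom_adj):
    """Scale a legacy oom_adj (-17..15) to oom_score_adj (-1000..1000)."""
    if not OOM_DISABLE <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError(f"oom_adj out of range: {oom_adj}")
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX  # the maximum maps to the maximum
    return _trunc_div(oom_adj * OOM_SCORE_ADJ_MAX, -OOM_DISABLE)


def score_adj_to_oom_adj(score_adj):
    """Scale an oom_score_adj back to the legacy oom_adj range."""
    if not -OOM_SCORE_ADJ_MAX <= score_adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj out of range: {score_adj}")
    if score_adj == OOM_SCORE_ADJ_MAX:
        return OOM_ADJUST_MAX
    return _trunc_div(score_adj * -OOM_DISABLE, OOM_SCORE_ADJ_MAX)
```

Because of the integer truncation the round trip is lossy: an oom_score_adj of 500 reads back as an oom_adj of 8, which in turn maps to 470.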

It doesn't look as though Karthikesan's proposal for a cgroup based 
controller was ever accepted.

--Kerin



* [gentoo-user] Re: OOM memory issues
  2014-09-18 19:16     ` Kerin Millar
@ 2014-09-19  1:36       ` James
  0 siblings, 0 replies; 8+ messages in thread
From: James @ 2014-09-19  1:36 UTC
  To: gentoo-user

Kerin Millar <kerframil <at> fastmail.co.uk> writes:


> A new tunable, "oom_score_adj", was added, which accepts values between 
> -1000 and 1000.

> https://github.com/torvalds/linux/commit/a63d83f#include/linux/oom.h


FANTASTIC! Exactly the sort of info I'm looking for: learn the past,
see what has been tried, how to configure it, and if it works or fails,
when and why! Absolutely wonderful link!


> As mentioned there, the "oom_adj" tunable remains for reasons of 
> backward compatibility. Setting one will adjust the other per the 
> appropriate scale.

That said, the mechanism seems too simple-minded to succeed in anything
but an extremely well monitored system. I think the effort now,
particularly in clustering codes, is to have only basic memory monitoring
and control, and leave the "fine grained" memory control needs to the
clustering tools. The simple solution is there (in clustering): you just
prioritize jobs (codes), migrate to systems with spare resources, and bump
other processes to lower priority states. Also, there are (in-memory)
codes like Apache Spark that use Resilient Distributed Datasets (RDDs).

> It doesn't look as though Karthikesan's proposal for a cgroup based 
> controller was ever accepted.

I think many of the old kernel ideas, accepted or not, are being
"repackaged" in the clustering tools, or at least they are inspired
by these codes....

Dude, YOU are the main{}. Keep the info flowing, as I'm sure lots
of folks on this list are reading this .....

EXCELLENT!


> --Kerin

James






* [gentoo-user] Re: OOM memory issues
  2014-09-18 19:13   ` [gentoo-user] " Rich Freeman
@ 2014-09-19  2:12     ` James
  0 siblings, 0 replies; 8+ messages in thread
From: James @ 2014-09-19  2:12 UTC
  To: gentoo-user

Rich Freeman <rich0 <at> gentoo.org> writes:




> A big problem with Linux along these fronts is that we don't really
> have good mechanisms for prioritizing memory use.  You can set hard
> limits of course, which aren't flexible, but otherwise software is
> trusted to just guess how much RAM it should use.

Exactamundo!
Besides fine grained controls I want it in a fat_boy controllable gui!
Clustering is where it's at. Now, much of the fuss I read
in the clustering groups, particularly Spark and other
"in_memory" tools, is all about monitoring and managing
all types of memory and related issues. [1]


> It would be nice if processes could allocate cache RAM, which could be
> preferentially freed if the kernel deems necessary.  If some pages are
> easier to regenerate than to swap, this could also be flagged (I have
> a 50Mbps connection - I'd rather see my browser re-fetch pages than go
> to disk when the disk is already busy).  There are probably a lot of
> other ways that memory use could be optimized with hinting.

I think you need to look into Apache Spark. It is exploding. Technology
to run certain codes 100% in memory looks to be a revolution, driven
by the Mesos/Spark clusters. [2] The weapons on top of Mesos/Spark
are Python, Java and Scala (in portage).


hth,
James

[1] https://issues.apache.org/jira/browse/SPARK-3535

[2] https://amplab.cs.berkeley.edu/

http://radar.oreilly.com/2014/06/a-growing-number-of-applications-are-being-built-with-spark.html



