[why oom_adj does not work] Re: Linux killed Kenny, bastard!

All the mail mirrored from lore.kernel.org

* [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-12 15:51     ` Alan Cox
@ 2009-01-13 13:52       ` Evgeniy Polyakov
  2009-01-13 14:06         ` Alan Cox
  0 siblings, 1 reply; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-13 13:52 UTC
  To: Alan Cox; +Cc: Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

On Mon, Jan 12, 2009 at 03:51:08PM +0000, Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > Well, Kenny has to die, but if we still decide to change the world, here
> > is the first step.
> 
> NAK this entire thing - we have an existing interface that does the job
> far better.

Mwahaha, I just checked how the scores are calculated, so that userspace
could adjust them. Let's start at the beginning:

	list_for_each_entry(child, &p->children, sibling) {
		task_lock(child);
		if (child->mm != mm && child->mm)
			points += child->mm->total_vm/2 + 1;
		task_unlock(child);
	}

	/*
	 * CPU time is in tens of seconds and run time is in thousands
	 * of seconds. There is no particular reason for this other than
	 * that it turned out to work very well in practice.
	 */
	cpu_time = (cputime_to_jiffies(p->utime) + cputime_to_jiffies(p->stime))
		>> (SHIFT_HZ + 3);

	if (uptime >= p->start_time.tv_sec)
		run_time = (uptime - p->start_time.tv_sec) >> 10;
	else
		run_time = 0;

	s = int_sqrt(cpu_time);
	if (s)
		points /= s;
	s = int_sqrt(int_sqrt(run_time));
	if (s)
		points /= s;

Do you _REALLY_ think anyone can calculate this themselves and then work
out the adjustment needed to properly select the oom-killed process?

I can not, and would not even try if I were an admin of the given
system. So, Alan, until you can calculate those numbers in your head and
then do this for a whole heavily loaded system, please do not spread the
idea that oom_adj can be used to tune the oom-killer.
And no, reading data from /proc/.../oom_score is not enough: the scores
change with time, so the adjustment would have to be retuned every time
they do.
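
To make the quoted arithmetic concrete, here is a minimal userspace sketch
in C of just the child contribution and the time scaling from the snippet
above. It is not the kernel's badness() function: scale_points(),
child_contribution() and isqrt_floor() are hypothetical names,
isqrt_floor() stands in for the kernel's int_sqrt(), and the caller has to
supply SHIFT_HZ and the already converted jiffies and seconds.

```c
/* Floor of the square root; a stand-in for the kernel's int_sqrt(). */
static unsigned long isqrt_floor(unsigned long x)
{
	unsigned long r = 0;

	/* (r + 1) <= x / (r + 1) is (r + 1)^2 <= x without overflow. */
	while (r + 1 <= x / (r + 1))
		r++;
	return r;
}

/* Points added for one child that has its own mm (hypothetical helper). */
unsigned long child_contribution(unsigned long child_total_vm)
{
	return child_total_vm / 2 + 1;
}

/*
 * Apply the CPU-time and run-time scaling from the quoted snippet.
 * cpu_jiffies is utime + stime in jiffies, run_secs is uptime minus the
 * task's start time in seconds, shift_hz is the kernel's SHIFT_HZ.
 */
unsigned long scale_points(unsigned long points, unsigned long cpu_jiffies,
			   unsigned int shift_hz, unsigned long run_secs)
{
	unsigned long cpu_time = cpu_jiffies >> (shift_hz + 3);
	unsigned long run_time = run_secs >> 10;
	unsigned long s;

	s = isqrt_floor(cpu_time);
	if (s)
		points /= s;
	s = isqrt_floor(isqrt_floor(run_time));
	if (s)
		points /= s;
	return points;
}
```

Assuming SHIFT_HZ is 10 (HZ=1000), scale_points(100000, 100UL << 13, 10,
16UL << 10) divides by int_sqrt(100) = 10 and then by
int_sqrt(int_sqrt(16)) = 2, giving 5000, while a task with no CPU time
and no run time keeps all 100000 points. The same task therefore scores
differently as it ages, which is the moving target complained about above.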

So far my patch is the sanest way to deal with OOM selection when
we have to differentiate between some processes. I agree that it is not
the best solution, but it is way ahead of what we have right now for
users who are not hardcore kernel hackers.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-13 13:52       ` [why oom_adj does not work] " Evgeniy Polyakov
@ 2009-01-13 14:06         ` Alan Cox
  2009-01-13 14:24           ` Evgeniy Polyakov
  0 siblings, 1 reply; 22+ messages in thread
From: Alan Cox @ 2009-01-13 14:06 UTC
  To: Evgeniy Polyakov; +Cc: Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

> Do you _REALLY_ think anyone can calculate it yourself and then properly
> calculate adjustment used to properly select oom-killed process?

It's always a heuristic.

> So far my patch is the sanest way to deal with the OOM selection

No. You keep maintaining this but your crude hack is useless in a non
co-operative environment, has lots of issues with name aliasing and
doesn't deal with real needs.

We have container interfaces that can do this and far more and do them
right. In fact the very start of all the OpenVZ and container work years
ago was the beancounter patches which were addressed at exactly this
problem (although more specifically 'making sure undergraduates' processes
get killed first')

Alan

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-13 14:06         ` Alan Cox
@ 2009-01-13 14:24           ` Evgeniy Polyakov
  2009-01-13 15:00             ` Balbir Singh
  0 siblings, 1 reply; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-13 14:24 UTC
  To: Alan Cox; +Cc: Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

On Tue, Jan 13, 2009 at 02:06:27PM +0000, Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > Do you _REALLY_ think anyone can calculate it yourself and then properly
> > calculate adjustment used to properly select oom-killed process?
> 
> It's always a heuristic.

For the system, which knows what it is. The user does not, and really can
not work with it, since there is no sane way to implement that heuristic
in applications or even in a (theoretically possible) monitor daemon.

So, effectively, oom adjustment does not work.

> > So far my patch is the sanest way to deal with the OOM selection
> 
> No. You keep maintaining this but your crude hack is useless in a non
> co-operative environment, has lots of issues with name aliasing and
> doesn't deal with real needs.

It was created because of real needs: people need to control the
behaviour of the system, and they want to control which application will
be killed to free memory. The attached patch is not the best solution,
but it works for all the cases I can think of.

Let's take your 'name aliasing' claim: if there are several processes
with the same name, the system will select the one with the worst score
according to its own magical algorithm. So it will not kill a random
process just because it happened to have a tricky name.

And the same applies to the other issues. It just helps the system select
the process to be killed according to userspace's expectation of what
should be killed to free memory.

> We have container interfaces that can do this and far more and do them
> right. In fact the very start of all the OpenVZ and container work years
> ago was the beancounter patches which were addressed at exactly this
> problem (although more specifically 'making sure undergraduates' processes
> get killed first')

Are the beancounters used to limit the amount of virtual RAM and not the
physical one? That really does not work for limiting, for example, some
Java machine which will eat all the virtual space, swapping out a
different node. It works for some (and likely most, I do not argue this)
cases and has overhead. But we are talking not about how to limit the
processes, but about what to do when we happen to have an out-of-memory
condition. And that happens all the time even if you put the processes
into a separate container, since there are situations (that's why this
was started in the first place) when you have a huge process which
should not be killed and a set of either its children or external
processes which should be checked, and some of them (the administrator
would like to specify the less important ones) should be killed without
much harm to the system.

And the patch I presented allows that. It introduces a hint for the
killer about which processes should be checked first. It works exactly
the way people work with their systems: they run different applications
and expect some of them to have higher or lower priority when it comes
to the oom condition. No one proposes to kill exactly the process we
select (although that may be a good idea in some cases), but instead to
show that the oom-killer should check a given group first: the group the
administrator knows to be potentially harmless.

-- 
	Evgeniy Polyakov

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-13 14:24           ` Evgeniy Polyakov
@ 2009-01-13 15:00             ` Balbir Singh
  2009-01-13 15:21               ` Evgeniy Polyakov
  0 siblings, 1 reply; 22+ messages in thread
From: Balbir Singh @ 2009-01-13 15:00 UTC
  To: Evgeniy Polyakov
  Cc: Alan Cox, Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

On Tue, Jan 13, 2009 at 7:54 PM, Evgeniy Polyakov <zbr@ioremap.net> wrote:
> On Tue, Jan 13, 2009 at 02:06:27PM +0000, Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
>> > Do you _REALLY_ think anyone can calculate it yourself and then properly
>> > calculate adjustment used to properly select oom-killed process?
>>
>> It's always a heuristic.
>
> For the system which knows what it is. User does not and really can not
> work with it, since there is no sane way to implement that heuristic in
> the applications or even in (theoretically possible) monitor daemon.
>
> So, effectively, oom adjustment does not work.
>
>> > So far my patch is the sanest way to deal with the OOM selection
>>
>> No. You keep maintaining this but your crude hack is useless in a non
>> co-operative environment, has lots of issues with name aliasing and
>> doesn't deal with real needs.
>
> It is created because of real needs. Because people need to control the
> behaviour of the system and they want to control which application will
> be killed to free the memory. Attached patch is not the best solution,
> but it works for the all cases I can think about.
>

Where does this end? Tomorrow you'll add an interface for applications
that should *not* be killed? What sort of a heuristic is name? I think
the only name the kernel knows about is "init".

> Let's take your 'name aliasing' claim: if there are several processes
> with the same name, system will select the one with the worst score
> according to the own magical algorithm. So it will not kill random
> process just because it happened to have tricky name.
>

Having a name in the kernel is like building a hit-list, why can't the
examples that Alan sent work for you?
Names are tricky as well, if someone used a symbolic link to the
application with a different name, they would no longer be candidates
for OOM first? or vice-versa?

> And the same applies to the other issues. It just helps system to select
> the process to be killed according to userspace expectation of what
> should be killed to free the memory.
>
>> We have container interfaces that can do this and far more and do them
>> right. In fact the very start of all the OpenVZ and container work years
>> ago was the beancounter patches which were addressed at exactly this
>> problem (although more specifically 'making sure undergraduates' processes
>> get killed first')
>
> Are the beancounters used to limit amount of virtual ram and not the
> physical one? It really does not work to limit for example some java
> machine which will eat all virtual space swapping out different node.
> It works for some (and likely the most, I do not argue this) cases and
> has overhead. But we are talking not about how to limit the processes,
> but what to do when we happen to have out-of-memory condition. And it
> happens all the time even if you put the processes into the separate
> container, since there are situations (that's why it was started at
> first), when you have a huge process which should not be killed and set
> of either its children or external processes, which should be checked
> and some of them (administrator would like to specify the less
> important) should be killed without much harm to the system.
>
> And patch I presented allows to do it. It introduces a hint for the
> killer on what processes should be checked first. It works exactly the
> way people work with their system: they run different application and
> expect some of them to be higher or lower priority when things come to
> the oom condition. No one ever proposes to kill exactly the process we
> select (although that may be a good idea in some cases), but instead to
> show that oom-killer should check given group first. The group
> administrator knows to be potentially harmless.
>

You can replace the lines of kernel code you wrote with a simple
one-line script that Alan sent out.

Balbir

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-13 15:00             ` Balbir Singh
@ 2009-01-13 15:21               ` Evgeniy Polyakov
  2009-01-13 18:04                 ` Valdis.Kletnieks
  2009-01-13 19:46                 ` David Rientjes
  0 siblings, 2 replies; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-13 15:21 UTC
  To: Balbir Singh
  Cc: Alan Cox, Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

On Tue, Jan 13, 2009 at 08:30:16PM +0530, Balbir Singh (balbir@linux.vnet.ibm.com) wrote:
> > It is created because of real needs. Because people need to control the
> > behaviour of the system and they want to control which application will
> > be killed to free the memory. Attached patch is not the best solution,
> > but it works for the all cases I can think about.
> >
> 
> Where does this end? Tomorrow you'll add an interface for applications
> that should *not* be killed? What sort of a heuristic is name? I think
> the only name the kernel knows about is "init".

We already have an interface to disable oom for a process :)
But I agree that it could be a good idea to have an interface that
takes a list of names, or whatever else the user knows and works with,
to select what gets killed first/last.

> > Let's take your 'name aliasing' claim: if there are several processes
> > with the same name, system will select the one with the worst score
> > according to the own magical algorithm. So it will not kill random
> > process just because it happened to have tricky name.
> >
> 
> Having a name in the kernel is like building a hit-list, why can't the
> examples that Alan sent work for you?

Using oom_adj? Because there is no way I can determine which number to
put there. It is not even documented for those who do not read kernel
sources. And even then: oom_score changes with time, so even if an
oom_adj of 1/2 or 8 is correct right now, it will not be in a few moments.

Using containers is a bit of an overkill just to determine which process
to kill, especially when several sets of processes are created by the
same parent :)

> Names are tricky as well, if someone used a symbolic link to the
> application with a different name, they would no longer be candidates
> for OOM first? or vice-versa?

It is up to the user to decide what he wants to be checked first.
Only the user knows what he runs.

> You can replace the lines of kernel code you wrote with a simple
> one-line script that Alan sent out.

Almost. But I can not if the tasks are spawned by a parent process. We
can not change that process so that its forked children get a different
adjustment, and we can not change the adjustment of the process itself,
since it should live and the children should die.

-- 
	Evgeniy Polyakov

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-13 15:21               ` Evgeniy Polyakov
@ 2009-01-13 18:04                 ` Valdis.Kletnieks
  2009-01-13 19:46                 ` David Rientjes
  1 sibling, 0 replies; 22+ messages in thread
From: Valdis.Kletnieks @ 2009-01-13 18:04 UTC
  To: Evgeniy Polyakov
  Cc: Balbir Singh, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Tue, 13 Jan 2009 18:21:06 +0300, Evgeniy Polyakov said:

> Using oom_adj? Because there is no way I can determine which number to
> put there. It is not even documented for those who do not read kernel
> sources. Even after that: oom_score changes with time, and having 1/2 or
> 8 oom_adj is correct right now, it will not be in a few moments.

In that case, the *real* problem to be fixed is a lack of documentation.
It should be possible to add a blurb somewhere in Documentation/* that
says:

"echo 10000 > oom_adjust" is guaranteed to make this process the first one
up against the wall when the revolution comes (for some value of 10000, of
course).

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-13 15:21               ` Evgeniy Polyakov
  2009-01-13 18:04                 ` Valdis.Kletnieks
@ 2009-01-13 19:46                 ` David Rientjes
  2009-01-13 21:33                   ` Evgeniy Polyakov
  1 sibling, 1 reply; 22+ messages in thread
From: David Rientjes @ 2009-01-13 19:46 UTC
  To: Evgeniy Polyakov
  Cc: Balbir Singh, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Tue, 13 Jan 2009, Evgeniy Polyakov wrote:

> Using oom_adj? Because there is no way I can determine which number to
> put there. It is not even documented for those who do not read kernel
> sources. Even after that: oom_score changes with time, and having 1/2 or
> 8 oom_adj is correct right now, it will not be in a few moments.
> 

Your oom_adj scores should never need to be changed unless you're tuning 
the inherited value of a child; it simply represents your input into when 
a specific task should be considered rogue enough to target.

However, patches to improve the documentation of the oom killer, or any 
other kernel feature, are always welcome.

> > You can replace the lines of kernel code you wrote with a simple
> > one-line script that Alan sent out.
> 
> Almost. But I can not if tasks are spawned from the parent process. We
> can not change the process to adjust its forked children to have
> different adjustment and can not change it for the process itself, since
> it should live and children should be dead.
> 

Children are already preferred over the chosen parent task, as I've 
explained a few times.  When a task is identified for oom kill by the 
badness heuristics, the oom killer attempts to kill a child that does not 
share the same mm first, which is exactly what you're asking for here.  If 
the parent shares the mm, it needs to exit as well before memory freeing 
may occur.
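
The child-first rule described above can be modelled in a few lines of C.
This is only a sketch of the behaviour as stated in this message, not the
kernel code: struct task_model and pick_victim() are hypothetical names,
and address spaces are compared by pointer identity the way the message
compares mm.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task: a pid and its mm. */
struct task_model {
	int pid;
	const void *mm;	/* equal pointers mean a shared address space */
};

/*
 * Given the task chosen by the badness heuristics and its children,
 * return the task to kill first: the first child that has an mm of its
 * own, or the parent itself when every child shares the parent's mm.
 */
const struct task_model *pick_victim(const struct task_model *parent,
				     const struct task_model *children,
				     size_t nr_children)
{
	size_t i;

	for (i = 0; i < nr_children; i++) {
		if (children[i].mm == NULL)
			continue;	/* nothing to free from this one */
		if (children[i].mm == parent->mm)
			continue;	/* shares the mm, killing it frees nothing */
		return &children[i];
	}
	return parent;
}
```

With a parent and two children, one of which shares the parent's mm, the
sketch picks the other child; with only mm-sharing children it falls back
to the parent, matching the last sentence above.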

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-13 19:46                 ` David Rientjes
@ 2009-01-13 21:33                   ` Evgeniy Polyakov
  2009-01-13 21:39                     ` David Rientjes
  0 siblings, 1 reply; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-13 21:33 UTC
  To: David Rientjes
  Cc: Balbir Singh, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Tue, Jan 13, 2009 at 11:46:14AM -0800, David Rientjes (rientjes@google.com) wrote:
> Children are already preferred over the chosen parent task, as I've 
> explained a few times.  When a task is identified for oom kill by the 
> badness heuristics, the oom killer attempts to kill a child that does not 
> share the same mm first, which is exactly what you're asking for here.  If 
> the parent shares the mm, it needs to exit as well before memory freeing 
> may occur.

I did not really investigate why it happened, but the oom'ed machine had
killed the cgi daemons and the parent process itself. And ssh on top of
that, while it should have been enough just to kill the appropriate
daemon. Apparently things are not as shiny as they should be.

-- 
	Evgeniy Polyakov

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-13 21:33                   ` Evgeniy Polyakov
@ 2009-01-13 21:39                     ` David Rientjes
  2009-01-13 22:05                       ` Evgeniy Polyakov
  0 siblings, 1 reply; 22+ messages in thread
From: David Rientjes @ 2009-01-13 21:39 UTC
  To: Evgeniy Polyakov
  Cc: Balbir Singh, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Wed, 14 Jan 2009, Evgeniy Polyakov wrote:

> I really did not investigate why it happened, but oom'ed machine had
> killed cgi daemons and parent process itself. And ssh to the heap.
> While it should be enough just to kill appropriate daemon. Apparently
> things are not that shine as should be.
> 

As previously mentioned, you have all the diagnostic tools at your 
disposal already:

	echo 1 > /proc/sys/vm/oom_dump_tasks

The badness scoring is straight-forward given that information, so you can 
diagnose why a specific task was not killed and another was chosen.  You 
can also use that information to appropriately tune the oom_adj scores to 
identify your oom killer target preferences.

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-13 21:39                     ` David Rientjes
@ 2009-01-13 22:05                       ` Evgeniy Polyakov
  0 siblings, 0 replies; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-13 22:05 UTC
  To: David Rientjes
  Cc: Balbir Singh, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Tue, Jan 13, 2009 at 01:39:01PM -0800, David Rientjes (rientjes@google.com) wrote:
> > I really did not investigate why it happened, but oom'ed machine had
> > killed cgi daemons and parent process itself. And ssh to the heap.
> > While it should be enough just to kill appropriate daemon. Apparently
> > things are not that shine as should be.
> > 
> 
> As previously mentioned, you have all the diagnostic tools at your 
> disposal already:
> 
> 	echo 1 > /proc/sys/vm/oom_dump_tasks
> 
> The badness scoring is straight-forward given that information, so you can 
> diagnose why a specific task was not killed and another was chosen.  You 
> can also use that information to appropriately tune the oom_adj scores to 
> identify your oom killer target preferences.

There is no ssh left there, so I can not do any diagnostics. I first have
to change the oom score for ssh, but that's a different story.

-- 
	Evgeniy Polyakov

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
       [not found]       ` <bTRYr-2up-21@gated-at.bofh.it>
@ 2009-01-14 19:18         ` Bodo Eggert
  2009-01-14 19:22           ` Evgeniy Polyakov
  0 siblings, 1 reply; 22+ messages in thread
From: Bodo Eggert @ 2009-01-14 19:18 UTC
  To: Evgeniy Polyakov, Alan Cox, Dave Jones, linux-kernel,
	Andrew Morton, Linus Torvalds

Evgeniy Polyakov <zbr@ioremap.net> wrote:
> On Mon, Jan 12, 2009 at 03:51:08PM +0000, Alan Cox (alan@lxorguk.ukuu.org.uk)

>> > Well, Kenny has to die, but if we still decide to change the world, here
>> > is the first step.
>> 
>> NAK this entire thing - we have an existing interface that does the job
>> far better.
> 
> Mwahaha, I just checked how scores are calculated, so that userspace
> could adjust them. Let's start with beginning:

[snip]

> Do you _REALLY_ think anyone can calculate it yourself and then properly
> calculate adjustment used to properly select oom-killed process?

That's easy: Just let your Kenny process run, and check its score. If it's
too low, increase the adjustment until it's just above the other processes'
score. Using binary search, you're done in five steps.

Then, while you're at it, protect the important programs by setting
their adjustment to -17.
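
The search can also be done offline on a snapshot of the scores. The
sketch below is in C and assumes, which this thread does not show, that a
positive oom_adj shifts the score left by that many bits (treating a zero
score as 1), that a negative one shifts it right, and that the largest
positive value is 15. adjusted_points() and min_adj_to_exceed() are
hypothetical names, and the result is only as fresh as the scores fed in.

```c
#define OOM_ADJ_MAX 15	/* assumed upper bound of oom_adj */

/* Score after the adjustment, under the bit-shift assumption above. */
unsigned long long adjusted_points(unsigned long long points, int adj)
{
	if (adj > 0) {
		if (!points)
			points = 1;
		return points << adj;
	}
	if (adj < 0)
		return points >> -adj;
	return points;
}

/*
 * Binary search for the smallest adjustment in [0, OOM_ADJ_MAX] that
 * lifts points strictly above target. Returns -1 if even the maximum
 * adjustment is not enough. Works because the adjusted score never
 * decreases as adj grows.
 */
int min_adj_to_exceed(unsigned long long points, unsigned long long target)
{
	int lo = 0, hi = OOM_ADJ_MAX;

	if (adjusted_points(points, hi) <= target)
		return -1;
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;

		if (adjusted_points(points, mid) > target)
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}
```

For a Kenny process scoring 1000 next to a highest competing score of
5000, min_adj_to_exceed(1000, 5000) returns 3, since 1000 << 2 is 4000
and 1000 << 3 is 8000.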


* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-14 19:18         ` [why oom_adj does not work] Re: Linux killed Kenny, bastard! Bodo Eggert
@ 2009-01-14 19:22           ` Evgeniy Polyakov
  2009-01-15  0:54             ` David Rientjes
  2009-01-15 21:50             ` Bodo Eggert
  0 siblings, 2 replies; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-14 19:22 UTC
  To: Bodo Eggert
  Cc: Alan Cox, Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

On Wed, Jan 14, 2009 at 08:18:49PM +0100, Bodo Eggert (7eggert@gmx.de) wrote:
> > Mwahaha, I just checked how scores are calculated, so that userspace
> > could adjust them. Let's start with beginning:
> 
> [snip]
> 
> > Do you _REALLY_ think anyone can calculate it yourself and then properly
> > calculate adjustment used to properly select oom-killed process?
> 
> > That's easy: Just let your Kenny process run, and check its score. If it's
> too low, increase the adjustment until it's just above the other processes'
> score. Using binary search, you're done in five steps.
> 
> Then, while you're at it, protect the important programs by setting
> their adjustment to -17.

This does not work if the processes are short-lived and are spawned by
the parent on demand. If processes have different priorities with regard
to the oom condition, this problem can not be solved with the existing
interfaces without changing the application. So effectively there is no
solution.

-- 
	Evgeniy Polyakov

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-14 19:22           ` Evgeniy Polyakov
@ 2009-01-15  0:54             ` David Rientjes
  2009-01-15  8:43               ` Evgeniy Polyakov
  2009-01-15 21:50             ` Bodo Eggert
  1 sibling, 1 reply; 22+ messages in thread
From: David Rientjes @ 2009-01-15  0:54 UTC
  To: Evgeniy Polyakov
  Cc: Bodo Eggert, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Wed, 14 Jan 2009, Evgeniy Polyakov wrote:

> This does not work if processes are short-living and are spawned by the
> parent on demand. If processes have different priority in regards to oom
> condition, this problem can not be solved with existing interfaces
> without changing the application. So effectively there is no solution.
> 

Wrong, you can change how the application is forked.  Either immediately 
adjust /proc/$!/oom_adj or use the adjustment inheritance property and 
change /proc/$$/oom_adj to the desired value prior to forking.  Thanks.
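
The same idea works from a launcher that is not a shell. The following is
a minimal sketch in C, assuming the /proc/self/oom_adj path is available:
the child writes its own adjustment between fork() and exec(), so the
parent keeps its value. write_oom_adj() and spawn_with_oom_adj() are
hypothetical helper names, and the adjustment is treated as best effort.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write an adjustment value to the given file. Returns 0 or -1. */
int write_oom_adj(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Fork, set the child's own oom_adj, then exec argv[0]. Returns the
 * child's pid in the parent, or -1 if fork() failed.
 */
pid_t spawn_with_oom_adj(int adj, char *const argv[])
{
	pid_t pid = fork();

	if (pid == 0) {
		/* Child: adjust itself only; carry on if that fails. */
		if (write_oom_adj("/proc/self/oom_adj", adj) != 0)
			perror("oom_adj");
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	return pid;
}
```

A parent that spawns workers on demand would call spawn_with_oom_adj()
with a different value per kind of worker and reap them with waitpid() as
usual. It does require changing the launcher's source.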

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-15  0:54             ` David Rientjes
@ 2009-01-15  8:43               ` Evgeniy Polyakov
  0 siblings, 0 replies; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-15  8:43 UTC
  To: David Rientjes
  Cc: Bodo Eggert, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Wed, Jan 14, 2009 at 04:54:09PM -0800, David Rientjes (rientjes@google.com) wrote:
> > This does not work if processes are short-living and are spawned by the
> > parent on demand. If processes have different priority in regards to oom
> > condition, this problem can not be solved with existing interfaces
> > without changing the application. So effectively there is no solution.
> > 
> 
> Wrong, you can change how the application is forked.  Either immediately 
> adjust /proc/$!/oom_adj or use the adjustment inheritance property and 
> change /proc/$$/oom_adj to the desired value prior to forking.  Thanks.

You and Alan like bash so much... Applications are not always forked from
a shell.

I have already pointed out multiple times where changing the parent's
oom_adj leads, and that this does not work in the real world for some
common cases. The existing scheme only works if some daemon (or the
application itself) explicitly changes oom_adj, but no daemon exists to
monitor /proc, and applications do not change their own or their
children's oom_adj, because it is way too linuxish to add such hacks just
to deal with the system's oom-killer, which can not be properly
configured otherwise.

-- 
	Evgeniy Polyakov

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-14 19:22           ` Evgeniy Polyakov
  2009-01-15  0:54             ` David Rientjes
@ 2009-01-15 21:50             ` Bodo Eggert
  2009-01-15 22:35               ` Evgeniy Polyakov
  1 sibling, 1 reply; 22+ messages in thread
From: Bodo Eggert @ 2009-01-15 21:50 UTC
  To: Evgeniy Polyakov
  Cc: Bodo Eggert, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Wed, 14 Jan 2009, Evgeniy Polyakov wrote:
> On Wed, Jan 14, 2009 at 08:18:49PM +0100, Bodo Eggert (7eggert@gmx.de) wrote:

> > > Mwahaha, I just checked how scores are calculated, so that userspace
> > > could adjust them. Let's start with beginning:
> > 
> > [snip]
> > 
> > > Do you _REALLY_ think anyone can calculate it yourself and then properly
> > > calculate adjustment used to properly select oom-killed process?
> > 
> > > That's easy: Just let your Kenny process run, and check its score. If it's
> > too low, increase the adjustment until it's just above the other processes'
> > score. Using binary search, you're done in five steps.
> > 
> > Then, while you're at it, protect the important programs by setting
> > their adjustment to -17.
> 
> This does not work if processes are short-living and are spawned by the
> parent on demand.

They will have the same name, too. Your Kenny-killer will fail, too.

> If processes have different priority in regards to oom
> condition, this problem can not be solved with existing interfaces
> without changing the application. So effectively there is no solution.

ACK, but being a child should count. Maybe the weight for children should
be increased, if it does not do the right thing? Or maybe the children do
share much (most of the) memory, so killing the parent is the right thing
if you want to free some RAM?

-- 
The complexity of a weapon is inversely proportional to the IQ of the
weapon's operator.

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-15 21:50             ` Bodo Eggert
@ 2009-01-15 22:35               ` Evgeniy Polyakov
  2009-01-17 14:12                 ` Bodo Eggert
  0 siblings, 1 reply; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-15 22:35 UTC
  To: Bodo Eggert
  Cc: Alan Cox, Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

On Thu, Jan 15, 2009 at 10:50:58PM +0100, Bodo Eggert (7eggert@gmx.de) wrote:
> > This does not work if processes are short-living and are spawned by the
> > parent on demand.
> 
> They will have the same name, too. Your Kenny-killer will fail, too.

That is not always the case: processes start executing different binaries
and change their names. That is at least what I observed in the
particular case that started this discussion.

> > If processes have different priority in regards to oom
> > condition, this problem can not be solved with existing interfaces
> > without changing the application. So effectively there is no solution.
> 
> ACK, but being a child should count. Maybe the weight for children should be
> increased, if it does not do the right thing? Or maybe the children do share
> much (most of the) memory, so killing the parent is the right thing if you 
> want to free some RAM?

There could be lots of heuristics applied for the different cases, but
without changing the application, they are somewhat limited to
long-lived processes only. There are really lots of cases where that
does not hold.

-- 
	Evgeniy Polyakov

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-15 22:35               ` Evgeniy Polyakov
@ 2009-01-17 14:12                 ` Bodo Eggert
  2009-01-17 14:22                   ` Evgeniy Polyakov
  0 siblings, 1 reply; 22+ messages in thread
From: Bodo Eggert @ 2009-01-17 14:12 UTC
  To: Evgeniy Polyakov
  Cc: Bodo Eggert, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Fri, 16 Jan 2009, Evgeniy Polyakov wrote:
> On Thu, Jan 15, 2009 at 10:50:58PM +0100, Bodo Eggert (7eggert@gmx.de) wrote:

> > > This does not work if processes are short-living and are spawned by the
> > > parent on demand.
> > 
> > They will have the same name, too. Your Kenny-killer will fail, too.
> 
> It is not always the case, processes start executing different binaries
> and change the names, that's at least what I observed in the particular
> root case of the discussion.

In that case, you can use a wrapper script.

> > > If processes have different priority in regards to oom
> > > condition, this problem can not be solved with existing interfaces
> > > without changing the application. So effectively there is no solution.
> > 
> > ACK, but being a child should count. Maybe the weight for children should be
> > increased, if it does not do the right thing? Or maybe the children do share
> > much (most of the) memory, so killing the parent is the right thing if you
> > want to free some RAM?
> 
> There could be lots of heuristics applied for the different cases, but
> without changing the application, they are somewhat limited to
> long-living processes only. There are really lots of cases when it does
> not stand.

If it's short-lived enough, the processes will out-die the OOM-Killer.
You can only win by suspending or killing the factory.
-- 
Why do men die before their wives?
They want to.

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-17 14:12                 ` Bodo Eggert
@ 2009-01-17 14:22                   ` Evgeniy Polyakov
  2009-01-18 12:37                     ` Bodo Eggert
  0 siblings, 1 reply; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-17 14:22 UTC (permalink / raw
  To: Bodo Eggert
  Cc: Alan Cox, Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

On Sat, Jan 17, 2009 at 03:12:49PM +0100, Bodo Eggert (7eggert@gmx.de) wrote:
> > > > This does not work if processes are short-living and are spawned by the
> > > > parent on demand.
> > > 
> > > They will have the same name, too. Your Kenny-killer will fail, too.
> > 
> > It is not always the case, processes start executing different binaries
> > and change the names, that's at least what I observed in the particular
> > root case of the discussion.
> 
> In that case, you can use a wrapper script.

That may be a solution, except that it is not very convenient, since there
may be really lots of executables, and cooking up a special script for
every one of them will not scale well.

> > There could be lots of heuristics applied for the different cases, but
> > without changing the application, they are somewhat limited to
> > long-living processes only. There are really lots of cases when it does
> > not stand.
> 
> If it's short-lived enough, the processes will out-die the OOM-Killer.
> You can only win by suspending or killing the factory.

No, the admin will limit or forbid connections from the DoSing clients;
the server must always stay alive to serve legitimate users.

-- 
	Evgeniy Polyakov

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-17 14:22                   ` Evgeniy Polyakov
@ 2009-01-18 12:37                     ` Bodo Eggert
  2009-01-18 13:13                       ` Evgeniy Polyakov
  0 siblings, 1 reply; 22+ messages in thread
From: Bodo Eggert @ 2009-01-18 12:37 UTC (permalink / raw
  To: Evgeniy Polyakov
  Cc: Bodo Eggert, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Sat, 17 Jan 2009, Evgeniy Polyakov wrote:

> On Sat, Jan 17, 2009 at 03:12:49PM +0100, Bodo Eggert (7eggert@gmx.de) wrote:
> > > > > This does not work if processes are short-living and are spawned by the
> > > > > parent on demand.
> > > > 
> > > > They will have the same name, too. Your Kenny-killer will fail, too.
> > > 
> > > It is not always the case, processes start executing different binaries
> > > and change the names, that's at least what I observed in the particular
> > > root case of the discussion.
> > 
> > In that case, you can use a wrapper script.
> 
> That may be a solution, except that not very convenient, since there may
> be really lots of executables and cooking up a special script for
> everyone will not scale well.

How many different CGI handlers are you going to have?

And how does kill-kenny scale with the number of users on the system?
I want my browser not to be killed, while the other user wants his
gimp not to be killed. As you can see, it does not even scale for
the most simple multi-user system.

> > > There could be lots of heuristics applied for the different cases, but
> > > without changing the application, they are somewhat limited to
> > > long-living processes only. There are really lots of cases when it does
> > > not stand.
> > 
> > If it's short-lived enough, the processes will out-die the OOM-Killer.
> > You can only win by suspending or killing the factory.
> 
> No, admin will limit/forbid the connection from the DoSing clients,
> server must always live to handle proper users.

If there is no memory, the admin can't even log in.
-- 
Programming is an art form that fights back.

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-18 12:37                     ` Bodo Eggert
@ 2009-01-18 13:13                       ` Evgeniy Polyakov
  2009-01-18 20:25                         ` Bodo Eggert
  0 siblings, 1 reply; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-18 13:13 UTC (permalink / raw
  To: Bodo Eggert
  Cc: Alan Cox, Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

On Sun, Jan 18, 2009 at 01:37:09PM +0100, Bodo Eggert (7eggert@gmx.de) wrote:
> How many different CGI handlers are you going to have?

CGIs are usually limited; an application server is not.

> And how does kill-kenny scale with the number of users on the system?
> I want my browser not to be killed, while the other user wants his
> gimp not to be killed. As you can see, it does not even scale for
> the most simple multi-user system.

It is not about who should not be killed, but who should _be_ in the
first row.

> > No, admin will limit/forbid the connection from the DoSing clients,
> > server must always live to handle proper users.
> 
> If there is no memory, the admin can't even log in.

The admin can observe the situation via KVM or sometimes netconsole and
tune the system for the next run.

-- 
	Evgeniy Polyakov

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-18 13:13                       ` Evgeniy Polyakov
@ 2009-01-18 20:25                         ` Bodo Eggert
  2009-01-18 20:41                           ` Evgeniy Polyakov
  0 siblings, 1 reply; 22+ messages in thread
From: Bodo Eggert @ 2009-01-18 20:25 UTC (permalink / raw
  To: Evgeniy Polyakov
  Cc: Bodo Eggert, Alan Cox, Dave Jones, linux-kernel, Andrew Morton,
	Linus Torvalds

On Sun, 18 Jan 2009, Evgeniy Polyakov wrote:
> On Sun, Jan 18, 2009 at 01:37:09PM +0100, Bodo Eggert (7eggert@gmx.de) wrote:

> > How many different CGI handlers are you going to have?
> 
> CGIs are usually limited, application server is not.
> 
> > And how does kill-kenny scale with the number of users on the system?
> > I want my browser not to be killed, while the other user wants his
> > gimp not to be killed. As you can see, it does not even scale for
> > the most simple multi-user system.
> 
> It is not about who should not be killed, but who should _be_ in the
> first row.

If it comes to the killing, it will start with the first row, or using your 
patch, with the only man in the first row, named kenny. Now imagine a 
phalanx of spawned kennies protecting a running-wild application from being 
killed ...

If you set the oom_adj to mark the goat under normal conditions, the system 
will adjust itself to abnormal conditions.

> > > No, admin will limit/forbid the connection from the DoSing clients,
> > > server must always live to handle proper users.
> > 
> > If there is no memory, the admin can't even log in.
> 
> Admin can observe the situation via kvm or sometimes netconsole and
> tune the system for the next run.

So your kill-kenny does not only require having exactly one goat system-wide 
and no process having the same process name, but also constant supervision.
I think it's a really great design!
-- 
Whenever you have plenty of ammo, you never miss. Whenever you are low on
ammo, you can't hit the broad side of a barn.

* Re: [why oom_adj does not work] Re: Linux killed Kenny, bastard!
  2009-01-18 20:25                         ` Bodo Eggert
@ 2009-01-18 20:41                           ` Evgeniy Polyakov
  0 siblings, 0 replies; 22+ messages in thread
From: Evgeniy Polyakov @ 2009-01-18 20:41 UTC (permalink / raw
  To: Bodo Eggert
  Cc: Alan Cox, Dave Jones, linux-kernel, Andrew Morton, Linus Torvalds

On Sun, Jan 18, 2009 at 09:25:49PM +0100, Bodo Eggert (7eggert@gmx.de) wrote:
> > It is not about who should not be killed, but who should _be_ in the
> > first row.
> 
> If it comes to the killing, it will start with the first row, or using your 
> patch, with the only man in the first row, named kenny. Now imagine a 
> phalanx of spawned kennies protecting a running-wild application from being 
> killed ...
>
> If you set the oom_adj to mark the goat under normal conditions, the system 
> will adjust itself to abnormal conditions.

An admin who sets it up knows what he is doing. I hope you will not argue
about the case where the admin disables the oom-killer and is then unable
to log in.

Once again: this is an additional tunable that makes it easy to solve
the problem shown here multiple times. And while you have not tried to
tune oom_adj yourself, you keep arguing that it works best. It
does not. No solution to the problem shown is simple and
nice-looking; the one I proposed is, IMO, the most convenient for the
people who really work with the systems where the described behaviour was
observed.

> > > > No, admin will limit/forbid the connection from the DoSing clients,
> > > > server must always live to handle proper users.
> > > 
> > > If there is no memory, the admin can't even log in.
> > 
> > Admin can observe the situation via kvm or sometimes netconsole and
> > tune the system for the next run.
> 
> So your kill-kenny does not only require having exactly one goat system-wide 
> and no process having the same process name, but also constant supervision.
> I think it's a really great design!

You should reread (better twice) what we are talking about here, which
patch was proposed and why, and how it works.

-- 
	Evgeniy Polyakov

end of thread, other threads:[~2009-01-18 20:41 UTC | newest]

Thread overview: 22+ messages
     [not found] <bTx3z-8bU-23@gated-at.bofh.it>
     [not found] ` <bTxdg-8sZ-23@gated-at.bofh.it>
     [not found]   ` <bTxdg-8sZ-21@gated-at.bofh.it>
     [not found]     ` <bTxmW-hf-1@gated-at.bofh.it>
     [not found]       ` <bTRYr-2up-21@gated-at.bofh.it>
2009-01-14 19:18         ` [why oom_adj does not work] Re: Linux killed Kenny, bastard! Bodo Eggert
2009-01-14 19:22           ` Evgeniy Polyakov
2009-01-15  0:54             ` David Rientjes
2009-01-15  8:43               ` Evgeniy Polyakov
2009-01-15 21:50             ` Bodo Eggert
2009-01-15 22:35               ` Evgeniy Polyakov
2009-01-17 14:12                 ` Bodo Eggert
2009-01-17 14:22                   ` Evgeniy Polyakov
2009-01-18 12:37                     ` Bodo Eggert
2009-01-18 13:13                       ` Evgeniy Polyakov
2009-01-18 20:25                         ` Bodo Eggert
2009-01-18 20:41                           ` Evgeniy Polyakov
2009-01-12 15:33 Evgeniy Polyakov
2009-01-12 15:44 ` Dave Jones
2009-01-12 15:48   ` Evgeniy Polyakov
2009-01-12 15:51     ` Alan Cox
2009-01-13 13:52       ` [why oom_adj does not work] " Evgeniy Polyakov
2009-01-13 14:06         ` Alan Cox
2009-01-13 14:24           ` Evgeniy Polyakov
2009-01-13 15:00             ` Balbir Singh
2009-01-13 15:21               ` Evgeniy Polyakov
2009-01-13 18:04                 ` Valdis.Kletnieks
2009-01-13 19:46                 ` David Rientjes
2009-01-13 21:33                   ` Evgeniy Polyakov
2009-01-13 21:39                     ` David Rientjes
2009-01-13 22:05                       ` Evgeniy Polyakov
