* ceph-osd mem usage growth
@ 2015-12-10 16:24 Igor Fedotov
  2015-12-10 17:46 ` Samuel Just
  0 siblings, 1 reply; 3+ messages in thread
From: Igor Fedotov @ 2015-12-10 16:24 UTC
  To: ceph-devel

Hi Cephers,

While implementing compression support for EC pools, I ran into an issue
that can be summarized as follows.

Imagine a client that continuously extends a specific object xattr by
rewriting the complete attribute value each time with a new data portion
appended.
As a result, one can observe steadily increasing memory usage in the
ceph-osd processes. This happens for objects in EC pools only.

I briefly investigated the root cause, and it looks like it is due to
growing PG log memory consumption. The PG log entry count stays pretty
stable, but each entry consumes more and more memory over time since it
contains the full attribute value.
As far as I understand, replicated pools do not log the setattr operation
(they actually mark it as unrollbackable), which is why the issue isn't
observed there.

With 3000 log entries and, e.g., a 64 KB attribute value, the memory
consumption is quite visible.
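
As a rough back-of-the-envelope estimate (my own numbers, assuming each of
the ~3000 retained PG log entries keeps a full copy of an ~90 KB attribute
value, as at the 6000-th step of the sample output below):

# Rough estimate only; assumes ~3000 retained PG log entries, each
# carrying a full copy of an ~90 KB attribute value.
entries = 3000
attr_len = 90 * 1024
print "~%d MB held by the PG log" % (entries * attr_len / (2 ** 20))  # ~263 MB

which is on the same order as the RSS growth seen in the sample output.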

So the questions are:
* Are there any ideas on how to resolve this issue? An obvious solution is
to refactor the attribute extension to use multiple keys (a rough sketch of
that approach follows below)...  Anything else?
* Does it make sense to resolve it at all?  IMO it is a sort of
vulnerability for a Ceph process to behave this way...
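
To illustrate the multiple-keys idea, here is a minimal, untested sketch
(the chunk/counter key names are made up for illustration, and the exact
exception raised for a missing xattr may differ):

# Hypothetical sketch: store each appended portion under its own key
# instead of rewriting one ever-growing xattr value, so each PG log
# entry would only carry the newly appended chunk.
def append_xattr_chunk(ioctx, obj, base_key, data):
    try:
        count = int(ioctx.get_xattr(obj, base_key + '.count'))
    except rados.NoData:  # counter key not written yet
        count = 0
    ioctx.set_xattr(obj, '%s.%d' % (base_key, count), data)
    ioctx.set_xattr(obj, base_key + '.count', str(count + 1))

A reader would then reassemble the full value by concatenating
base_key.0 .. base_key.(count-1).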

Please find below a Python script that reproduces the issue; it is to be
started from the folder where ceph.conf is located:

python repro.py <poolname>

######################################
import sys

import psutil
import rados


def print_process_mem_usage(pid):
    # Report virtual and resident memory of a single process, in MB.
    process = psutil.Process(pid)
    mem = process.memory_info()  # memory_info() in psutil >= 2.0
    res_mb = mem.rss / (2 ** 20)
    virt_mb = mem.vms / (2 ** 20)
    print "pid %d: Virt: %i MB, Res: %i MB" % (pid, virt_mb, res_mb)


def print_processes_mem_usage():
    # Print memory usage for every ceph-osd process on this host.
    for proc in psutil.process_iter():
        try:
            if 'ceph-osd' in proc.name():
                print_process_mem_usage(proc.pid)
        except psutil.NoSuchProcess:
            pass


cluster = rados.Rados(conffile='./ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx(sys.argv[1])
try:
    ioctx.remove_object("pyobject")
except rados.ObjectNotFound:
    pass

# Keep rewriting one xattr with an ever-growing value; every 500 steps
# dump the memory usage of the local ceph-osd processes.
for i in range(25000):
    s = ''.zfill(i * 15)  # i*15 zero characters
    ioctx.set_xattr('pyobject', 'somekey', s)
    if i % 500 == 0:
        print '%d-th step, attr len = %d' % (i, len(s))
        print_processes_mem_usage()

ioctx.close()
cluster.shutdown()
#########################
Sample output is shown below:
0-th step, attr len = 0
pid 23723: Virt: 700 MB, Res: 30 MB
pid 23922: Virt: 701 MB, Res: 32 MB
pid 24142: Virt: 700 MB, Res: 32 MB
...
4000-th step, attr len = 60000
pid 23723: Virt: 896 MB, Res: 207 MB
pid 23922: Virt: 900 MB, Res: 212 MB
pid 24142: Virt: 897 MB, Res: 210 MB
...
6000-th step, attr len = 90000
pid 23723: Virt: 1025 MB, Res: 331 MB
pid 23922: Virt: 1032 MB, Res: 338 MB
pid 24142: Virt: 1025 MB, Res: 333 MB
...


Thanks,
Igor


* Re: ceph-osd mem usage growth
  2015-12-10 16:24 ceph-osd mem usage growth Igor Fedotov
@ 2015-12-10 17:46 ` Samuel Just
  2015-12-11 16:09   ` Igor Fedotov
  0 siblings, 1 reply; 3+ messages in thread
From: Samuel Just @ 2015-12-10 17:46 UTC
  To: Igor Fedotov; +Cc: ceph-devel

The short answer is that you aren't supposed to store large things in
xattrs at all.  If you feel it's a "vulnerability", then we could add
a config option to reject xattrs over a particular size.
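
For example, a purely client-side guard along these lines (the wrapper
and the threshold below are illustrative only, not an existing Ceph
option):

# Illustrative client-side guard; MAX_XATTR_SIZE is a made-up threshold,
# not an existing Ceph config option.
MAX_XATTR_SIZE = 64 * 1024

def set_xattr_checked(ioctx, obj, name, value):
    if len(value) > MAX_XATTR_SIZE:
        raise ValueError("refusing to store %d-byte xattr %r (limit %d)"
                         % (len(value), name, MAX_XATTR_SIZE))
    ioctx.set_xattr(obj, name, value)
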
-Sam

On Thu, Dec 10, 2015 at 8:24 AM, Igor Fedotov <ifedotov@mirantis.com> wrote:
> [...]


* Re: ceph-osd mem usage growth
  2015-12-10 17:46 ` Samuel Just
@ 2015-12-11 16:09   ` Igor Fedotov
  0 siblings, 0 replies; 3+ messages in thread
From: Igor Fedotov @ 2015-12-11 16:09 UTC
  To: Samuel Just; +Cc: ceph-devel

Hi Samuel,

thanks for your answer.

One more question:
Why do erasure-coded pools use the PG log for the setxattr op while
replicated ones don't?
What's the rationale for that?

Thanks,
Igor.

On 10.12.2015 20:46, Samuel Just wrote:
> The short answer is that you aren't supposed to store large things in
> xattrs at all.  If you feel it's a "vulnerability", then we could add
> a config option to reject xattrs over a particular size.
> -Sam
>
> On Thu, Dec 10, 2015 at 8:24 AM, Igor Fedotov <ifedotov@mirantis.com> wrote:
>> [...]

