* [PATCH] selftests: cgroup: fix test_kmem_memcg_deletion false positives
@ 2023-08-04 15:37 Lucas Karpinski
2023-08-04 16:37 ` Johannes Weiner
0 siblings, 1 reply; 4+ messages in thread
From: Lucas Karpinski @ 2023-08-04 15:37 UTC (permalink / raw)
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Shakeel Butt, Muchun Song, Tejun Heo, Zefan Li, Shuah Khan
Cc: Muchun Song, cgroups, linux-mm, linux-kselftest, linux-kernel
The test allocates dcache inside a cgroup, then destroys the cgroups and
then checks the sanity of numbers on the parent level. The reason it
fails is because dentries are freed with an RCU delay - a debugging
sleep shows that usage drops as expected shortly after.
Insert a 1s sleep after completing the cgroup creation/deletions. This
should be good enough, assuming that machines running those tests are
otherwise not very busy. This commit is directly inspired by Johannes
over at the link below.
Link: https://lore.kernel.org/all/20230801135632.1768830-1-hannes@cmpxchg.org/
Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
---
tools/testing/selftests/cgroup/test_kmem.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 67cc0182058d..7ac384bbfdd5 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -183,6 +183,9 @@ static int test_kmem_memcg_deletion(const char *root)
if (cg_run_in_subcgroups(parent, alloc_kmem_smp, NULL, 100))
goto cleanup;
+ /* wait for RCU freeing */
+ sleep(1);
+
current = cg_read_long(parent, "memory.current");
slab = cg_read_key_long(parent, "memory.stat", "slab ");
anon = cg_read_key_long(parent, "memory.stat", "anon ");
--
2.41.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] selftests: cgroup: fix test_kmem_memcg_deletion false positives
2023-08-04 15:37 [PATCH] selftests: cgroup: fix test_kmem_memcg_deletion false positives Lucas Karpinski
@ 2023-08-04 16:37 ` Johannes Weiner
2023-08-04 18:59 ` Lucas Karpinski
0 siblings, 1 reply; 4+ messages in thread
From: Johannes Weiner @ 2023-08-04 16:37 UTC (permalink / raw)
To: Lucas Karpinski
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Tejun Heo, Zefan Li, Shuah Khan, cgroups, linux-mm,
linux-kselftest, linux-kernel
On Fri, Aug 04, 2023 at 11:37:33AM -0400, Lucas Karpinski wrote:
> The test allocates dcache inside a cgroup, then destroys the cgroups and
> then checks the sanity of numbers on the parent level. The reason it
> fails is because dentries are freed with an RCU delay - a debugging
> sleep shows that usage drops as expected shortly after.
>
> Insert a 1s sleep after completing the cgroup creation/deletions. This
> should be good enough, assuming that machines running those tests are
> otherwise not very busy. This commit is directly inspired by Johannes
> over at the link below.
>
> Link: https://lore.kernel.org/all/20230801135632.1768830-1-hannes@cmpxchg.org/
>
> Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
Maybe I'm missing something, but there isn't a limit set anywhere that
would cause the dentries to be reclaimed and freed, no? When the
subgroups are deleted, the objects are just moved to the parent. The
counters inside the parent (which are hierarchical) shouldn't change.
So this seems to be a different scenario than test_kmem_basic. If the
test is failing for you, I can't quite see why.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] selftests: cgroup: fix test_kmem_memcg_deletion false positives
2023-08-04 16:37 ` Johannes Weiner
@ 2023-08-04 18:59 ` Lucas Karpinski
2023-08-14 15:55 ` Lucas Karpinski
0 siblings, 1 reply; 4+ messages in thread
From: Lucas Karpinski @ 2023-08-04 18:59 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Tejun Heo, Zefan Li, Shuah Khan, cgroups, linux-mm,
linux-kselftest, linux-kernel
On Fri, Aug 04, 2023 at 12:37:16PM -0400, Johannes Weiner wrote:
> On Fri, Aug 04, 2023 at 11:37:33AM -0400, Lucas Karpinski wrote:
> > The test allocates dcache inside a cgroup, then destroys the cgroups and
> > then checks the sanity of numbers on the parent level. The reason it
> > fails is because dentries are freed with an RCU delay - a debugging
> > sleep shows that usage drops as expected shortly after.
> >
> > Insert a 1s sleep after completing the cgroup creation/deletions. This
> > should be good enough, assuming that machines running those tests are
> > otherwise not very busy. This commit is directly inspired by Johannes
> > over at the link below.
> >
> > Link: https://lore.kernel.org/all/20230801135632.1768830-1-hannes@cmpxchg.org/
> >
> > Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
>
> Maybe I'm missing something, but there isn't a limit set anywhere that
> would cause the dentries to be reclaimed and freed, no? When the
> subgroups are deleted, the objects are just moved to the parent. The
> counters inside the parent (which are hierarchical) shouldn't change.
>
> So this seems to be a different scenario than test_kmem_basic. If the
> test is failing for you, I can't quite see why.
>
You're right, the parent inherited the counters and it should behave
the same whether I'm directly removing the child or if I was moving it
under another cgroup. I do see the behaviour you described on my
x86_64 setup, but the wrong behaviour on my aarch64 dev. platform. I'll
take a closer look, but just wanted to leave an example here of what I
see.
Example of slab size pre/post sleep:
slab_pre = 18164688, slab_post = 3360000
Thanks,
Lucas
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] selftests: cgroup: fix test_kmem_memcg_deletion false positives
2023-08-04 18:59 ` Lucas Karpinski
@ 2023-08-14 15:55 ` Lucas Karpinski
0 siblings, 0 replies; 4+ messages in thread
From: Lucas Karpinski @ 2023-08-14 15:55 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Tejun Heo, Zefan Li, Shuah Khan, cgroups, linux-mm,
linux-kselftest, linux-kernel
On Fri, Aug 04, 2023 at 02:59:28PM -0400, Lucas Karpinski wrote:
> On Fri, Aug 04, 2023 at 12:37:16PM -0400, Johannes Weiner wrote:
> > On Fri, Aug 04, 2023 at 11:37:33AM -0400, Lucas Karpinski wrote:
> > > The test allocates dcache inside a cgroup, then destroys the cgroups and
> > > then checks the sanity of numbers on the parent level. The reason it
> > > fails is because dentries are freed with an RCU delay - a debugging
> > > sleep shows that usage drops as expected shortly after.
> > >
> > > Insert a 1s sleep after completing the cgroup creation/deletions. This
> > > should be good enough, assuming that machines running those tests are
> > > otherwise not very busy. This commit is directly inspired by Johannes
> > > over at the link below.
> > >
> > > Link: https://lore.kernel.org/all/20230801135632.1768830-1-hannes@cmpxchg.org/
> > >
> > > Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
> >
> > Maybe I'm missing something, but there isn't a limit set anywhere that
> > would cause the dentries to be reclaimed and freed, no? When the
> > subgroups are deleted, the objects are just moved to the parent. The
> > counters inside the parent (which are hierarchical) shouldn't change.
> >
> > So this seems to be a different scenario than test_kmem_basic. If the
> > test is failing for you, I can't quite see why.
> >
> You're right, the parent inherited the counters and it should behave
> the same whether I'm directly removing the child or if I was moving it
> under another cgroup. I do see the behaviour you described on my
> x86_64 setup, but the wrong behaviour on my aarch64 dev. platform. I'll
> take a closer look, but just wanted to leave an example here of what I
> see.
>
> Example of slab size pre/post sleep:
> slab_pre = 18164688, slab_post = 3360000
>
> Thanks,
> Lucas
Looked into the failures and I do have a proposed solution, just want
some feedback first. With how the kernel entry in memory.stat is
updated, it takes into account all charged / uncharged pages, it looks
like it makes more sense to use that single entry rather than `slab +
anon + file + kernel_stack + pagetables + percpu + sock' as it would
cover all utilization.
Thanks,
Lucas
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-08-14 15:59 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-04 15:37 [PATCH] selftests: cgroup: fix test_kmem_memcg_deletion false positives Lucas Karpinski
2023-08-04 16:37 ` Johannes Weiner
2023-08-04 18:59 ` Lucas Karpinski
2023-08-14 15:55 ` Lucas Karpinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).