Containers Archive mirror
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Alexey Gladkov <legion@kernel.org>
Cc: linux-kernel@vger.kernel.org, "Kees Cook" <keescook@chromium.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Christian Brauner" <brauner@kernel.org>,
	"Solar Designer" <solar@openwall.com>,
	"Ran Xiaokai" <ran.xiaokai@zte.com.cn>,
	containers@lists.linux.dev, "Michal Koutný" <mkoutny@suse.com>
Subject: Re: [PATCH 6/8] ucounts: Handle inc_rlimit_ucounts wrapping in fork
Date: Fri, 11 Feb 2022 13:56:29 -0600
Message-ID: <87zgmxxjma.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <20220211184041.dlqjk2fgdnkmtpe3@example.org> (Alexey Gladkov's message of "Fri, 11 Feb 2022 19:40:41 +0100")

Alexey Gladkov <legion@kernel.org> writes:

> On Fri, Feb 11, 2022 at 11:50:46AM -0600, Eric W. Biederman wrote:
>> Alexey Gladkov <legion@kernel.org> writes:
>> 
>> > On Thu, Feb 10, 2022 at 08:13:22PM -0600, Eric W. Biederman wrote:
>> >> Move inc_rlimit_ucounts from copy_creds into copy_process immediately
>> >> after copy_creds where it can be called exactly once.  Test for and
>> >> handle it when inc_rlimit_ucounts returns LONG_MAX indicating the
>> >> count has wrapped.
>> >> 
>> >> This is good hygiene and fixes a theoretical bug.  In practice
>> >> PID_MAX_LIMIT is at most 2^22 so there is not a chance the number of
>> >> processes would ever wrap even on an architecture with a 32-bit long.
>> >> 
>> >> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
>> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> >> ---
>> >>  kernel/cred.c | 2 --
>> >>  kernel/fork.c | 2 ++
>> >>  2 files changed, 2 insertions(+), 2 deletions(-)
>> >> 
>> >> diff --git a/kernel/cred.c b/kernel/cred.c
>> >> index 229cff081167..96d5fd6ff26f 100644
>> >> --- a/kernel/cred.c
>> >> +++ b/kernel/cred.c
>> >> @@ -358,7 +358,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
>> >>  		kdebug("share_creds(%p{%d,%d})",
>> >>  		       p->cred, atomic_read(&p->cred->usage),
>> >>  		       read_cred_subscribers(p->cred));
>> >> -		inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>> >>  		return 0;
>> >>  	}
>> >>  
>> >> @@ -395,7 +394,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
>> >>  #endif
>> >>  
>> >>  	p->cred = p->real_cred = get_cred(new);
>> >> -	inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>> >>  	alter_cred_subscribers(new, 2);
>> >>  	validate_creds(new);
>> >>  	return 0;
>> >> diff --git a/kernel/fork.c b/kernel/fork.c
>> >> index 6f62d37f3650..69333078259c 100644
>> >> --- a/kernel/fork.c
>> >> +++ b/kernel/fork.c
>> >> @@ -2026,6 +2026,8 @@ static __latent_entropy struct task_struct *copy_process(
>> >>  		goto bad_fork_free;
>> >>  
>> >>  	retval = -EAGAIN;
>> >> +	if (inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1) == LONG_MAX)
>> >> +		goto bad_fork_cleanup_count;
>> >>  	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
>> >>  		if ((task_ucounts(p) != &init_ucounts) &&
>> >>  		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>> >
>> > It might make sense to do something like:
>> >
>> > 	if (inc_rlimit_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1, rlimit(RLIMIT_NPROC)) == LONG_MAX) {
>> > 		if ((task_ucounts(p) != &init_ucounts) &&
>> > 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>> >
>> > and the new function:
>> >
>> > long inc_rlimit_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, long v, unsigned long rlimit)
>> > {
>> > 	struct ucounts *iter;
>> > 	long ret = 0;
>> > 	long max = rlimit;
>> > 	if (rlimit > LONG_MAX)
>> > 		max = LONG_MAX;
>> > 	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> > 		long new = atomic_long_add_return(v, &iter->ucount[type]);
>> > 		if (new < 0 || new > max)
>> > 			ret = LONG_MAX;
>> > 		else if (iter == ucounts)
>> > 			ret = new;
>> > 		max = READ_ONCE(iter->ns->ucount_max[type]);
>> > 	}
>> > 	return ret;
>> > }
>> >
>> > This will avoid double checking the same userns tree.
>> >
>> > Or even modify inc_rlimit_ucounts. This function is used elsewhere like
>> > this:
>> >
>> >
>> > msgqueue = inc_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
>> > if (msgqueue == LONG_MAX || msgqueue > rlimit(RLIMIT_MSGQUEUE)) {
>> >
>> >
>> > memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
>> > if (!allowed && (memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) {
>> >
>> >
>> > In all cases, we have a max value for comparison.
>> 
>> Good point.  The downside is that it means we can't use the same code
>> in exec.  The upside is that the code is more idiomatic.
>
> My suggestion was before I saw the 8/8 patch :)
>
> We can make something like:
>
> static inline bool is_nproc_overlimit(struct task_struct *task)
> {
> 	return (task_ucounts(task) != &init_ucounts) &&
> 		!has_capability(task, CAP_SYS_RESOURCE) &&
> 		!has_capability(task, CAP_SYS_ADMIN);
> }
>
> In copy_process:
>
> if (inc_rlimit_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1, rlimit(RLIMIT_NPROC)) == LONG_MAX) {
> 	if (is_nproc_overlimit(p))
> 		goto bad_fork_cleanup_count;
> }
>
> In do_execveat_common:
>
> if ((current->flags & PF_NPROC_CHECK) &&
>     is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
>     is_nproc_overlimit(current)) {
> 	retval = -EAGAIN;
> 	goto out_ret;
> }


The more I think about it, the more I suspect 8/8 is the wrong way to go.

The report is that adding the capability calls in kernel/sys.c, which I
moved into execve, broke Apache.  As the change was about removing
inconsistencies, I expect I should just start with the revert and keep
the difference between the two code paths.

My gut feeling is that both the capable() checks and the magic exception
for a user are wrong.  If I am wrong, people can report a bug and the
code can get fixed.

But a bug-fix branch is definitely the wrong place to expand what is
allowed unless the current behavior is clearly a bug.

Eric


