From: Jann Horn
Date: Mon, 7 Jun 2021 21:55:09 +0200
Subject: Re: split_huge_page_to_list() races with page_mapcount() on migration entry in smaps code? [was: Re: [syzbot] kernel BUG in __page_mapcount]
To: Matthew Wilcox
Cc: Linux-MM, Zi Yan, Peter Xu, "Kirill A. Shutemov", Konstantin Khlebnikov, Andrew Morton, chinwen.chang@mediatek.com, kernel list, syzkaller-bugs, Vlastimil Babka, Michel Lespinasse, syzbot
References: <00000000000017977605c395a751@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 7, 2021 at 8:03 PM Matthew Wilcox wrote:
> On Mon, Jun 07, 2021 at 07:27:23PM +0200, Jann Horn wrote:
> > === Short summary ===
> > I believe the issue here is a race between /proc/*/smaps and
> > split_huge_page_to_list():
> >
> > The codepath for /proc/*/smaps walks the pagetables and (e.g. in
> > smaps_account()) calls page_mapcount() not just on pages from normal
> > PTEs but also on migration entries (since commit b1d4d9e0cbd0a
> > "proc/smaps: carefully handle migration entries", from Linux v3.5).
> > page_mapcount() expects compound pages to be stable.
> >
> > The split_huge_page_to_list() path first protects the compound page by
> > locking it and replacing all its PTEs with migration entries (since
> > the THP rewrite in v4.5, I think?), then does the actual splitting
> > using __split_huge_page().
> >
> > So there's a mismatch of expectations here:
> > The smaps code expects that migration entries point to stable compound
> > pages, while the THP code expects that it's okay to split a compound
> > page while it has migration entries.
>
> Will it be a colossal performance penalty if we always get the page
> refcount after looking it up? That will cause split_huge_page() to
> fail to split the page if it hits this race.

Hmm - but with that approach I'm not sure you could even easily take a
refcount on a page whose refcount may be frozen and which may be in
the middle of being shattered?

get_page_unless_zero() is wrong because you can't take references on
tail pages, right? (Or can you?) And try_get_page() is wrong because
it bugs out if the refcount is zero - and even if it didn't do that,
you might end up holding a reference on the head page while the page
you're actually interested in is a tail page?

I guess if it was really necessary, it'd be possible to do some kind
of retry thing that grabs a reference on the compound head, then
checks that the tail page is still associated with the compound head,
and if not, drops the compound head and tries again?
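
The retry idea in the last paragraph could be sketched roughly as
below. This is only a userspace toy model built on C11 atomics, not
kernel code: struct toy_page, toy_get_unless_zero(), toy_put() and
toy_pin_head() are hypothetical names made up for illustration, and
the bounded retry count is an assumption. It models a frozen refcount
as zero and a compound-head link as a plain pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel's layout (assumption). */
struct toy_page {
	atomic_int refcount;             /* 0 models "frozen" or "free" */
	_Atomic(struct toy_page *) head; /* compound head; self for a head or base page */
};

/* Models the "take a reference unless the count is zero" pattern. */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old > 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken with toy_get_unless_zero(). */
static void toy_put(struct toy_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);
}

/*
 * Try to pin the compound head that `page` currently belongs to.
 * Returns the pinned head, or NULL if no stable head could be pinned
 * within max_tries attempts (for example, the refcount stayed frozen).
 */
static struct toy_page *toy_pin_head(struct toy_page *page, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		struct toy_page *head = atomic_load(&page->head);

		if (!toy_get_unless_zero(head))
			continue; /* frozen; a real implementation might back off here */
		if (atomic_load(&page->head) == head)
			return head; /* page is still part of this compound page */
		toy_put(head); /* a split raced with us: drop and retry */
	}
	return NULL;
}
```

In this model, a caller that gets NULL back would have to skip the
page or fall back to some other path; whether that is acceptable for
the smaps walker is exactly the open question above.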