NTFS3 file system kernel mode driver
From: Gabriel Marcano <gem5597@gmail.com>
To: ntfs3@lists.linux.dev
Subject: [BUG] ntfs3: Multiple ntfs_readdir calls with unlinks in between fail to list all elements in directory
Date: Wed, 22 Nov 2023 17:05:31 -0800
Message-ID: <CANVH8qL=r2H_26EjLvrbBj=TBqG_M6yXBerrpWY=dPcjLSKnNA@mail.gmail.com>

I think I've stumbled upon the bug described here:
   https://bugzilla.kernel.org/show_bug.cgi?id=217035

I've also added my own findings to it. I'll include them here as well.

When using `opendir` and `readdir` from userland on a folder with more than
1024 entries, it looks like glibc on my machine only asks the kernel for 1024
entries at a time via `getdents64`. Once those buffered entries are exhausted,
it issues a second `getdents64` call to fetch the remaining files. This works
fine if nothing happens to the directory between calls. However, if all files
returned by the first batch are deleted/unlinked before the second call is
made, that second call returns nothing, which is wrong, as there are still
files in the directory!
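A quick way to watch the batching without glibc's buffering in the way is to
call `getdents64` directly. This probe is mine, not part of the test program
at the end of this mail; it assumes glibc >= 2.30 for the `getdents64` wrapper
and the same /mnt/foo/bar test directory:

/*
 * Standalone probe: call getdents64 directly so each batch the kernel
 * returns is visible, without glibc's readdir buffering in between.
 * Needs glibc >= 2.30 for the getdents64 wrapper.
 */
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/foo/bar", O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char buf[32768]; /* roughly the buffer size glibc uses internally */
    ssize_t nread;
    int batch = 0;

    while ((nread = getdents64(fd, buf, sizeof(buf))) > 0) {
        int count = 0;
        for (ssize_t off = 0; off < nread;) {
            struct dirent64 *d = (struct dirent64 *)(buf + off);
            off += d->d_reclen;
            count++;
        }
        printf("batch %d: %d entries (%zd bytes)\n", ++batch, count, nread);
    }
    if (nread < 0)
        perror("getdents64");

    close(fd);
    return 0;
}

It doesn't reproduce the bug by itself (it never unlinks between batches); it
just shows the batch sizes the kernel hands back.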

Specifically, if I make a directory with 1024 files, the first batch returned
by readdir contains 1024 entries, including '.' and '..'. If I then delete
those 1022 actual files and call `readdir` again to fetch (and delete) the
last two files, the kernel returns nothing.

I was able to build a VM and debug the kernel. On the second call to
`ntfs_readdir`, in the `ntfs_read_hdr` helper function, I noticed that we never
got past this check in fs/ntfs3/dir.c:346 :

       /* Skip already enumerated. */
       if (vbo + off < pos)
               continue;


On the final iteration of the containing loop, the loop is broken by
fs/ntfs3/dir.c:343 :

       if (de_is_last(e))
               return 0;

This checks whether the entry is the last one in the index header (I think?).
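
To make the control flow easier to talk about, here is a tiny userland model
of how I understand that loop. struct entry and read_hdr() are my own
stand-ins, the offsets are made up, and only the two checks quoted above map
directly to the kernel code. The detail that matters for this bug is that the
end-of-header check sits before the "skip already enumerated" check:

/*
 * Userland model of the control flow I believe ntfs_read_hdr() has:
 * walk the entries of one index header, stop at the end marker, and
 * skip everything below the saved position.
 */
#include <stdbool.h>
#include <stdio.h>

struct entry {
    unsigned off;     /* models vbo + off: byte offset of the entry */
    bool last;        /* models de_is_last(): end-of-header marker  */
    const char *name;
};

/* Emit every entry at or beyond 'pos', as a stand-in for ntfs_filldir(). */
static unsigned read_hdr(const struct entry *e, unsigned long long pos)
{
    unsigned emitted = 0;

    for (;; e++) {
        if (e->last)              /* fs/ntfs3/dir.c:343 */
            return emitted;
        /* Skip already enumerated. */
        if (e->off < pos)         /* fs/ntfs3/dir.c:346 */
            continue;
        printf("emit %-6s at offset %u\n", e->name, e->off);
        emitted++;
    }
}

int main(void)
{
    /* Offsets are invented; this is the well-behaved case where nothing
     * was deleted between the two getdents64 calls. */
    const struct entry hdr[] = {
        { 96,  false, "."     },
        { 176, false, ".."    },
        { 256, false, "file0" },
        { 352, false, "file1" },
        { 448, true,  NULL    },  /* end marker */
    };

    read_hdr(hdr, 0);     /* first pass: emits all four entries */
    read_hdr(hdr, 256);   /* resume at pos 256: emits file0 and file1 */
    return 0;
}

In the well-behaved case, resuming at pos 256 emits the entries at offsets
256 and 352 and then stops cleanly at the end marker.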

My interpretation of what's going on is as follows:

1. The first ntfs_readdir call returns 1024 elements (including . and ..),
   with ctx->pos == 199328.
2. We delete all 1022 actual files from userland (but not . and ..).
3. When ntfs_readdir is called again, ctx->pos is still 199328 as expected,
   but because of all the deletions, it looks like the positions of the last
   2 files in the header have somehow moved? I haven't been able to figure
   out how unlinking changes the inodes. This causes the ntfs_read_hdr code
   to skip all entries, since the loop hits the end flag before ever getting
   back to the "current" position (see the sketch after this list).

If this is actually the problem, I'm not sure how to go about fixing it yet,
as I'm not at all versed in NTFS. ext4 checks whether the underlying directory
structure has changed between readdir calls by comparing the inode version;
I don't yet know if there's a similar thing we can do in NTFS. btrfs seems to
avoid the problem by assigning entries strictly increasing offset values that
don't correspond to a physical position. The issue with ntfs3 seems to be that
the offsets are tied to a physical location in some NTFS structure. ntfs-3g
dances around the issue by letting FUSE deal with the ctx->pos aspect at the
kernel level. I'm not sure how vfat gets around the issue, but it doesn't have
this bug.
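
Just to illustrate the ext4-style idea, the shape of such a check would be
roughly the following. This is a hypothetical sketch, not actual ntfs3 code:
ntfs_dir_cursor and ntfs_dir_changed() are names I made up, and I haven't
checked whether ntfs3 even bumps i_version on unlink today.

/*
 * Hypothetical sketch, not actual ntfs3 code: remember the directory's
 * change counter next to the saved position, and distrust the byte offset
 * if the directory was modified between getdents64 calls.
 */
#include <linux/iversion.h>

struct ntfs_dir_cursor {            /* would hang off file->private_data */
    u64 version;                    /* i_version seen on the last pass   */
    loff_t pos;                     /* ctx->pos we handed back last time */
};

static bool ntfs_dir_changed(struct inode *dir, struct ntfs_dir_cursor *c)
{
    /* inode_eq_iversion() is the generic helper ext4 relies on for the
     * same job; whether ntfs3 maintains i_version is unverified. */
    return !inode_eq_iversion(dir, c->version);
}

What ntfs_readdir would then do when the directory has changed (rescan from
the start, or re-seek by the last emitted name) is exactly the part I don't
know how to do for NTFS.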

The steps to reproduce the problem are pretty easy; this comment in the kernel
bugzilla makes it trivial:
https://bugzilla.kernel.org/show_bug.cgi?id=217035#c4
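
Roughly, the setup just needs an ntfs3-mounted directory with 1024 files in
it. This is my own version, not necessarily the exact commands from that
comment; it uses the same /mnt/foo/bar path as the program below:

/* Populate the ntfs3-mounted test directory with 1024 empty files. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char path[64];

    for (int i = 0; i < 1024; i++) {
        snprintf(path, sizeof(path), "/mnt/foo/bar/file%04d", i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) {
            perror(path);
            return 1;
        }
        close(fd);
    }
    return 0;
}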

I'm also including the source code for my crappy C++ program that gets the
directory info from the kernel (through glibc). It reports "count: 1022" on
ntfs3, whereas if I reproduce the test setup from the comment above on my
btrfs root, or on a vfat filesystem, I get "count: 1024" and all files are
deleted properly.

Gabriel Marcano

--

#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

#include <string>
#include <vector>
#include <iostream>
#include <filesystem>

int main()
{
    // Walk the test directory, deleting every non-hidden entry we see.
    // glibc fetches entries from the kernel in batches via getdents64,
    // so the deletions happen between those batches.
    DIR *dp = opendir("/mnt/foo/bar/");
    if (!dp)
    {
        perror("opendir");
        return 1;
    }

    std::vector<std::string> names;
    struct dirent *ep;
    while ((ep = readdir(dp)) != NULL)
    {
        if (ep->d_name[0] != '.')
        {
            names.emplace_back(ep->d_name);
            std::filesystem::remove(
                std::filesystem::path("/mnt/foo/bar") / ep->d_name);
        }
    }
    closedir(dp);

    // On ntfs3 this prints "count: 1022"; on btrfs or vfat, "count: 1024".
    std::cout << "count: " << names.size() << std::endl;
    return 0;
}

--
