Linux-man Archive mirror
From: "Vinícius Schütz Piva" <vinicius.vsczpv@outlook.com>
To: Alejandro Colomar <alx@kernel.org>
Cc: linux-man@vger.kernel.org
Subject: [PATCH] getdents.2: add note to misleading field "d_off" in struct linux_dirent64
Date: Tue, 13 Feb 2024 13:23:55 -0300
Message-ID: <SCZPR80MB71490A2B475CBC153A5B3776FC4F2@SCZPR80MB7149.lamprd80.prod.outlook.com>

Sorry for the duplicate email; I tried sending it to myself to
double-check and forgot to clear the Cc.

The getdents.2 man page details a pair of syscalls: getdents() and
getdents64(), both of which are used to get the entries of a directory.
The results are populated into a structure, with the difference between
the two syscalls being mostly bit-width related.

However, the behaviour of the 'd_off' field in both struct linux_dirent
and struct linux_dirent64 is wrongly documented in this man page.

According to the current documentation, 'd_off' is used to store the
"Offset to the next linux_dirent [...] the distance from the start of
the directory to the start of the next linux_dirent."

This value, though, is filesystem dependent, and much of the time it
stores no such offset.

According to the readdir.3 [1] man page:

 > The value returned in d_off is the same as would be returned by
 > calling telldir(3) at the current position in the directory stream.
 > Be aware that despite its type and name, the d_off field is seldom
 > any kind of directory offset on modern filesystems. Applications
 > should treat this field as an opaque value, making no assumptions
 > about its contents; see also telldir(3).

Of course, readdir(3) is a glibc function rather than a syscall, but
glibc implements it on top of getdents(2) and does not appear to
post-process the data it gets back, so my belief is that it inherits
this behaviour from the syscall [2]. telldir(3) tells a similar story.
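
As a rough illustration of the readdir.3 statement quoted above (not
part of the patch), the sketch below compares each entry's d_off with
what telldir(3) reports right after the entry is read. The helper name
compare_d_off_with_telldir() is made up for this example, and it assumes
a libc whose struct dirent exposes d_off (glibc does on Linux) and a
64-bit long:

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stddef.h>

/* Hypothetical helper: walk `path` with readdir(3) and count how many
   entries have a d_off that differs from telldir(3) at that position.
   Returns the number of entries read, or -1 if the directory cannot be
   opened. Assumes struct dirent has a d_off member. */
long compare_d_off_with_telldir(const char *path, long *mismatches)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    long n = 0;

    if (dir == NULL)
        return -1;

    *mismatches = 0;
    while ((ent = readdir(dir)) != NULL) {
        /* d_off is only compared, never interpreted as a byte offset. */
        if ((long) ent->d_off != telldir(dir))
            (*mismatches)++;
        n++;
    }

    closedir(dir);
    return n;
}
```

On the systems I would expect this to run on, the mismatch count stays
at zero, which fits the idea that d_off is just the seek cookie that
telldir(3) hands back.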

In the example provided at the end of getdents.2, the d_off value of
the very last entry is notable:

--------------- nread=120 ---------------
inode#    file type  d_reclen  d_off   d_name
       2  directory    16         12  .
       2  directory    16         24  ..
      11  directory    24         44  lost+found
      12  regular      16         56  a
  228929  directory    16         68  sub
   16353  directory    16         80  sub2
  130817  directory    16       4096  sub3

which makes a very sudden jump that is obviously not where the entry is
located.

Rerunning this same example on an ext4 partition gives you garbage
values:

--------------- nread=176 ---------------
inode#    file type  d_reclen  d_off   d_name
    2050  directory    24 4842312636391754590  sub2
       2  directory    24 4844777444668968292  ..
    2051  directory    24 7251781863886579875  sub3
      12  regular      24 7470722685224223838  a
    2049  directory    24 7653193867028490235  sub
      11  directory    32 7925945214358802294  lost+found
       2  directory    24 9223372036854775807  .

In fact, I've had a hard time reproducing nice d_off values on ext2 too,
so what the filesystem does with d_off must have changed since then.

On tmpfs it's a count:

--------------- nread=144 ---------------
inode#    file type  d_reclen  d_off   d_name
       1  directory    24          1  .
       1  directory    24          2  ..
       5  directory    24          3  sub3
       4  directory    24          4  sub2
       3  directory    24          5  sub
       2  regular      24          6  a

I'm also not the first to notice this, as you can see from this
Stack Overflow question opened last year:

https://stackoverflow.com/q/75119224

Safe to say, it's a very unreliable field.
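
For completeness, here is a minimal sketch (again, not part of the
patch) of walking the getdents64() buffer without touching d_off at all:
the next record is found by adding d_reclen to the current buffer
position, as the example program in getdents.2 already does. The helper
name count_dir_entries() is made up for this illustration, and the
structure is declared locally following the layout in the man page,
since glibc does not export it:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout as documented in getdents(2); declared here by hand. */
struct linux_dirent64 {
    uint64_t       d_ino;    /* 64-bit inode number */
    int64_t        d_off;    /* Opaque value; never used below */
    unsigned short d_reclen; /* Size of this dirent */
    unsigned char  d_type;   /* File type */
    char           d_name[]; /* Filename (null-terminated) */
};

/* Hypothetical helper: count the entries of `path` other than "." and
   "..". Returns -1 on error. */
long count_dir_entries(const char *path)
{
    _Alignas(struct linux_dirent64) char buf[8192];
    long count = 0;
    int fd = open(path, O_RDONLY | O_DIRECTORY);

    if (fd == -1)
        return -1;

    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof(buf));
        if (nread == -1) {
            close(fd);
            return -1;
        }
        if (nread == 0)          /* end of directory */
            break;

        for (long bpos = 0; bpos < nread; ) {
            struct linux_dirent64 *d =
                (struct linux_dirent64 *) (buf + bpos);

            if (strcmp(d->d_name, ".") != 0 &&
                strcmp(d->d_name, "..") != 0)
                count++;

            bpos += d->d_reclen; /* advance by record length, not d_off */
        }
    }

    close(fd);
    return count;
}
```

The only role left for d_off in code like this would be as a position
to hand back to lseek(2), which is how I understand telldir(3) and
seekdir(3) to use it.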

Below is a patch that adds a warning beside the d_off field in both
structures, plus a brief explanation of why this field can be
misleading (while also directing the user towards the readdir.3 man
page).

[1] https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man3/readdir.3
[2] https://elixir.bootlin.com/glibc/glibc-2.39/source/sysdeps/unix/sysv/linux/readdir.c

Signed-off-by: Vinícius Schütz Piva <vinicius.vsczpv@outlook.com>
---
 man2/getdents.2 | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/man2/getdents.2 b/man2/getdents.2
index 0d4c379..3427f4b 100644
--- a/man2/getdents.2
+++ b/man2/getdents.2
@@ -67,7 +67,7 @@ structure is declared as follows:
 .EX
 struct linux_dirent {
     unsigned long  d_ino;     /* Inode number */
-    unsigned long  d_off;     /* Offset to next \fIlinux_dirent\fP */
+    unsigned long  d_off;     /* Not an offset; see below */
     unsigned short d_reclen;  /* Length of this \fIlinux_dirent\fP */
     char           d_name[];  /* Filename (null\-terminated) */
                       /* length is actually (d_reclen \- 2 \-
@@ -84,8 +84,12 @@ struct linux_dirent {
 .I d_ino
 is an inode number.
 .I d_off
-is the distance from the start of the directory to the start of the next
-.IR linux_dirent .
+is a filesystem-specific value with no specific meaning to userspace,
+though on older filesystems it used to be the distance from the start
+of the directory to the start of the next
+.IR linux_dirent ;
+see
+.BR readdir (3).
 .I d_reclen
 is the size of this entire
 .IR linux_dirent .
@@ -167,7 +171,7 @@ structures of the following type:
 .EX
 struct linux_dirent64 {
     ino64_t        d_ino;    /* 64\-bit inode number */
-    off64_t        d_off;    /* 64\-bit offset to next structure */
+    off64_t        d_off;    /* Not an offset; see readdir(3) */
     unsigned short d_reclen; /* Size of this dirent */
     unsigned char  d_type;   /* File type */
     char           d_name[]; /* Filename (null\-terminated) */
-- 
2.39.2

