From: Myron Stowe <myron.stowe@redhat.com>
To: kay@vrfy.org
Cc: linux-hotplug@vger.kernel.org, greg@kroah.com,
alex.williamson@redhat.com, linux-pci@vger.kernel.org,
yuxiangl@marvell.com, yxlraid@gmail.com,
linux-kernel@vger.kernel.org
Subject: [PATCH] udevadm-info: Don't access sysfs entries backing device I/O port space
Date: Sat, 16 Mar 2013 21:35:12 +0000 [thread overview]
Message-ID: <20130316213512.2974.17303.stgit@amt.stowe> (raw)
I've been working on identifying the root cause of an issue exposed by
'udevadm' that was first exposed on the linux-pci mail list [1] and
believe that there is now enough of an understanding to propose a fix.
What was originally witnessed was the platform hanging after "udevadm info
--attribute-walk --path=/sys/devices/pci0000:00/<...>/block/sda" is ran.
Xiangliang was able to isolate the failure to accesses involving a Marvell
9125 device's I/O BARs, or more specifically, accesses to the I/O port
space backing the device's I/O BARs (a.k.a. the device's I/O port
resources, or regions). With this knowledge he was able to reproduce the
hang targeting the device's sysfs 'resource<N>' entries, where N
represents an I/O BARs, with "cat /sys/devices/<...>/resource<N>".
In my research, looking for possible solutions, I noticed that kernel
commit 8633328 introduced sysfs based reading and writing of I/O port
related 'resource<N>' entries as part of adding virtualization based
device assignment functionality [2]. Note that these regions directly map
to the device's control and status registers [3].
Putting together these pieces of information we now understand that:
o udevadm based attribute walking initiates read accesses of all the
entries in a device's sysfs directory [4],
o sysfs 'resource<N>' entries correspond to the device's internal
status and control registers used for driving the device,
o If the 'resource<N>' entry corresponds to a device's I/O BAR, the
device's control and status registers are directly accessible by
userspace.
Allowing userspace access to a device's registers introduces potential
simultaneous interaction with the device by a second, competing, entity;
There is the device's driver, which believes it exclusively owns the
device, and an unknown, potential second entity, which can effect control
and status changes to the device asynchronously.
Device status and control registers being accessed from an entity that has
no idea what is being read or written is just asking for trouble. Even
just reading can have consequences as the register may be a "read once to
clear" or some similar type. I think we have just been lucky, or
blissfully ignorant, concerning problems that may have, and still could
be, occurring due to this situation.
There is an aspect at play here that I still do not understand (likely
something obvious that I'm just not seeing). The sysfs read routine for
accessing I/O port 'resource<N>' entries only supports 1, 2, and 4 byte
reads (which respectively correspond to inb/outb, inw/outw, and inl/outl
I/O port accessors). When "udevadm ..." executes, the udev internals
attempt reads of 4K byte chunks.
"udevadm info --attribute-walk --path=<pci_device_path>"
print_device_chain()
print_all_attributes()
...
udev_device_get_sysattr_value()
char value[4096];
...
size = read(fd, value, sizeof(value));
...
-- ^ userspace ^ -- v kernel v --
pci_read_resource_io(..., count) # sysfs read setup in pci_create_attr()
pci_resource_io(..., count, ...)
...
if (port + count - 1 > pci_resource_end(pdev, i))
return -EINVAL;
...
What I don't understand is how the device's I/O port space is successfully
getting read. It looks to me like 'pci_resource_io()' would fail the
access size check and return '-EINVAL' having never attempted the read's
access to I/O port space causing the system to hang.
I'm keep looking into this but I do *not* have access to a platform with a
Marvell 9125 device.
Reference(s)/Foot Note(s):
[1] https://lkml.org/lkml/2013/3/7/242
[2] Note that due to the implementation specifics only the 'resource<N>'
entries representing I/O BARs can be read or written via sysfs.
Sysfs' 'resource<N>' entries representing MMIO do not have sysfs
based read/write routines as only mmap'ing of these entries is
exposed (./drivers/pci/pci-sysfs.c::pci_create_attr()).
[3] The kernel's sysfs documentation states: "Attributes should be ASCII
text files..." (./Documentation/filesystems/sysfs.txt). I wonder if
this is just out-of-date infomation as sysfs obviously supports
creating binary files (./fs/sysfs/bin.c::sysfs_create_bin_file()).
[4] Note that udevadm-info does skip a specifically named set of entries
(./src/udevadm-info.c::skip_attribute()).
---
Myron Stowe (1):
udevadm-info: Don't access sysfs 'resource<N>' files
src/udevadm-info.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
--
next reply other threads:[~2013-03-16 21:35 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-16 21:35 Myron Stowe [this message]
2013-03-16 21:35 ` [PATCH] udevadm-info: Don't access sysfs 'resource<N>' files Myron Stowe
2013-03-16 22:11 ` Greg KH
2013-03-16 22:55 ` Bjorn Helgaas
2013-03-16 23:50 ` Myron Stowe
2013-03-17 1:03 ` Greg KH
2013-03-17 4:11 ` Alex Williamson
2013-03-17 5:36 ` Greg KH
2013-03-17 13:38 ` Alex Williamson
2013-03-17 14:00 ` Kay Sievers
2013-03-17 14:20 ` Myron Stowe
2013-03-17 14:29 ` Kay Sievers
2013-03-17 14:36 ` Myron Stowe
2013-03-17 14:43 ` Kay Sievers
2013-03-18 16:24 ` Alex Williamson
2013-03-18 16:41 ` Greg KH
2013-03-18 16:51 ` Alex Williamson
2013-03-18 17:20 ` Bjørn Mork
2013-03-18 17:54 ` Alex Williamson
2013-03-18 18:02 ` Robert Brown
2013-03-18 18:25 ` Bjørn Mork
2013-03-18 18:59 ` Alex Williamson
2013-03-19 16:57 ` Myron Stowe
2013-03-19 17:06 ` Myron Stowe
2013-03-17 14:33 ` Myron Stowe
2013-03-17 22:28 ` Alex Williamson
2013-03-18 14:50 ` Don Dutile
2013-03-18 16:34 ` Alex Williamson
2013-03-17 14:12 ` Myron Stowe
2013-03-19 1:54 ` Robert Hancock
2013-03-19 2:03 ` Greg KH
2013-03-19 2:09 ` Robert Hancock
2013-03-19 2:35 ` Greg KH
2013-03-19 3:08 ` Robert Hancock
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130316213512.2974.17303.stgit@amt.stowe \
--to=myron.stowe@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=greg@kroah.com \
--cc=kay@vrfy.org \
--cc=linux-hotplug@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=yuxiangl@marvell.com \
--cc=yxlraid@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).