From: Paul Leiber <paul@onlineschubla.de>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: [BUG] Passed through PCI devices lost after Windows HVM DomU reboot
Date: Mon, 7 Jun 2021 23:44:03 +0000
Message-ID: <FRYP281MB05828EB0C49C963C7954578CB0389@FRYP281MB0582.DEUP281.PROD.OUTLOOK.COM>

Dear developers,

I am a mostly very happy Xen beginner. My Debian PV DomUs work like a charm out of the box. The only remaining Windows instance is a MediaPortal TV Server backend on a Windows Server 2012 HVM DomU. But I have problems with reliably passing through PCIe cards to this Windows HVM DomU. Further testing has led me to suspect that there might be a bug where PCI passthrough does not work after a Windows DomU reboot.

Please be patient with me if I am not reporting this bug as is customary; this is my first official bug report ever. (If it is indeed a bug.)

Background: I am running a standard apt-get Xen installation based on Debian Buster. My hardware is a Fujitsu D3417-B1 with an Intel Xeon CPU E3-1235L v5, 32 GB ECC RAM, and a Hauppauge HVR-2205 TV tuner card. To get PCI passthrough to work, I needed to set "permissive=1" and limit the Dom0 memory size. I could then pass through the PCIe TV tuner card without any problem to my Windows Server 2012 DomU. It was detected and worked very well in the Windows DomU. However, sometimes the card somehow got "lost" in the DomU, i.e. it disappeared from Device Manager and was no longer functional. I could then reattach it to the DomU with "xl pci-attach". My TV software (MediaPortal) then seemed to recognize a new PCIe card instance (e.g. an internal ID number of the tuner card was incremented). I then needed to reapply some settings. Other than that, the card was fully functional.

After more testing, I have come to the following conclusion: It seems that every time I do a _reboot_ from within a Windows DomU, the PCI device does not get attached to the DomU. After the DomU reboot, it is immediately available for attachment in the Dom0 when I check for it with "xl pci-assignable-list", and I can reattach it to the DomU with "xl pci-attach" without any major problems besides some annoying side effects (e.g. the need to reapply settings). If I _shut down_ the DomU from within the DomU (with the Windows shutdown mechanism) or from the Dom0 (with "xl shutdown") and restart the DomU with "xl create", the PCIe device gets attached automatically at DomU boot and the unwanted side effects do not occur.

What I would expect is that the passed through PCIe device is available in my Windows DomU after each reboot (e. g. after Windows Update automatically installs patches and reboots).

Steps which I can take to provoke the unwanted behavior:
1. Install Xen on Debian Buster following mostly https://wiki.xenproject.org/wiki/Xen_Project_Beginners_Guide
2. Set up PCI passthrough following mostly https://wiki.xenproject.org/wiki/Xen_PCI_Passthrough (see additional details below)
3. Set up a Windows Server 2012 HVM (cfg below)
4. Start Windows Server 2012 HVM with "xl create /etc/xen/matrix.cfg", connect with Windows HVM via VNC for installation and initial settings, then via RemoteDesktop
5. Check for PCIe device in Windows Device Manager: it is available
6. Initiate reboot in Windows (Go to Server Manager -> local server -> reboot)
7. Connect with rebooted Windows via RemoteDesktop
8. Check for PCIe device in Windows Device Manager: it is not available
9. Check for PCIe device in Dom0 with "xl pci-assignable-list": it is available for passthrough
10. Attach the PCIe device to the Windows DomU, e.g. via "xl pci-attach 9 01:00.0"
11. Check for PCIe device in Windows Device Manager: it is available again
12. To repeat, go back to step 6
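
Steps 9 and 10 above could be automated in Dom0 as a stopgap. The following is only a minimal Python sketch, not a fix: it assumes that "xl pci-assignable-list" prints one BDF per line (e.g. 0000:01:00.0), and the helper names normalize_bdf, is_assignable and reattach_if_free are made up for illustration.

```python
import subprocess


def normalize_bdf(bdf: str) -> str:
    """Return a BDF with an explicit PCI domain, e.g. '01:00.0' -> '0000:01:00.0'."""
    bdf = bdf.strip().lower()
    return bdf if bdf.count(":") == 2 else "0000:" + bdf


def is_assignable(assignable_output: str, bdf: str) -> bool:
    """True if bdf appears in the text printed by 'xl pci-assignable-list'.

    Assumption: the command prints one BDF per line.
    """
    wanted = normalize_bdf(bdf)
    lines = (line for line in assignable_output.splitlines() if line.strip())
    return any(normalize_bdf(line) == wanted for line in lines)


def reattach_if_free(domain: str, bdf: str) -> bool:
    """Reattach bdf to domain if Dom0 currently lists it as assignable.

    Returns True if 'xl pci-attach' was invoked, False otherwise.
    Must run as root in Dom0; nothing here is executed on import.
    """
    listing = subprocess.run(
        ["xl", "pci-assignable-list"],
        check=True, capture_output=True, text=True,
    ).stdout
    if not is_assignable(listing, bdf):
        return False
    subprocess.run(["xl", "pci-attach", domain, normalize_bdf(bdf)], check=True)
    return True
```

Calling reattach_if_free("matrix", "01:00.0") after a DomU reboot would correspond to step 10. The side effects described above (the guest seeing a new device instance) would presumably still occur, since this is still a hot-plug into the running guest.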

The xl log for a normal cold start (PCIe device attached normally) looks like this:
Waiting for domain matrix (domid 10) to die [pid 3910]

The log after a reboot (PCIe device not attached automatically) looks like this:
Waiting for domain matrix (domid 8) to die [pid 3113]
Domain 8 has shut down, reason code 1 0x1
Action for shutdown reason code 1 is restart
libxl: warning: libxl_domain.c:1739:libxl_retrieve_domain_configuration: Domain 8:Device present in JSON but not in xenstore, ignored
Domain 8 needs to be cleaned up: destroying the domain
Done. Rebooting now
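
To spot affected reboots without reading the log by hand, the warning line can be matched mechanically. This is a minimal sketch that assumes the message format is exactly as quoted in the log above; the function name find_dropped_device_warnings is made up, and the caller has to supply the log text (the log file location is not assumed here).

```python
import re

# Matches the libxl warning quoted above and captures the domain ID.
WARNING_RE = re.compile(
    r"Domain (\d+):Device present in JSON but not in xenstore, ignored"
)


def find_dropped_device_warnings(log_text: str) -> list[int]:
    """Return the domain IDs for which the warning appears in log_text."""
    return [int(domid) for domid in WARNING_RE.findall(log_text)]
```

For the reboot log above this returns the domid 8 once; for the cold-start log it returns an empty list.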

Searching for this exact error message ("Device present in JSON but not in xenstore, ignored"), I found the following quite old bug report which sounds suspiciously similar to my experience, only for PV DomUs:
https://bugzilla.redhat.com/show_bug.cgi?id=233801

Additional information which might be helpful:
- I could reproduce this behavior with two different TV tuner cards from different manufacturers (Hauppauge HVR-2205 or Digital Devices Max M4) and a network card (Intel 82574L)
- I tested the behavior with a fresh install of Windows 10, with the same results.
- I used the Hauppauge PCIe card in a Linux PV DomU (with VDR software) where the card was attached very reliably - as far as I can remember, there was only one occurrence of a non-working TV card, but I can't remember the details (i.e. whether there was a preceding reboot).
- The unwanted behavior did not occur with the bare metal system before I switched to Xen, i. e. Windows Server 2012 running directly on the hardware and the Hauppauge PCIe card.

A description of my problem (which was a little less detailed) on the Xen Users mailing list did not get a reply, therefore I am turning to the developer mailing list. Could anybody on this list please give me advice on what I can do to solve this issue? Is there any more information you need to help me, or any more testing I could do?

Thanks in advance,

Paul



Additional information:


While trying to fix this, I changed the kernel boot parameters. I figured out that the kernel boot option "xen-pciback.hide" is not necessary, as the driver is not built into the kernel; I therefore changed the parameters from
	dom0_mem=1024M,max:1024M xen-pciback.hide=(01:00.0)
to the currently used parameters:
	dom0_mem=1024M,max:1024M


The Digital Devices PCIe device is assigned to xen-pciback via /etc/modprobe.d/xen-pciback.conf. There is no driver on the Dom0 for the tuner card, therefore no precautions against loading other drivers are necessary:
	options xen-pciback hide=(0000:01:00.0)


The Hauppauge card needs an additional line to prevent its driver from being loaded in Dom0:
	install saa7164 /sbin/modprobe xen-pciback ; /sbin/modprobe --first-time --ignore-i$
	options xen-pciback hide=(0000:01:00.0)


While experimenting by trial and error, I changed the pci line in the Xen config file, but adding "power_mgmt=1" and "seize=1" didn't change the behavior:
	pci=['01:00.0,permissive=1,power_mgmt=1,seize=1']


Xen config file for the Windows domU (besides the above mentioned changes in the line pci=[...], there were some probably minor changes between first installation and the current status, e. g. I started with VNC and later switched to SPICE):

# kernel = "/usr/lib/xen-4.0/boot/hvmloader"
type='hvm'
memory = 4096
vcpus=2
name = "matrix"
vif = ['bridge=xenbr0,mac=00:16:3E:54:A8:2B']
disk = ['phy:/dev/vg0/matrix,hda,w','phy:/dev/vg0/compudms-data,hdb,w']
device_model_version = 'qemu-xen'
boot="c"
hdtype = 'ahci'
acpi = 1
apic = 1
xen_platform_pci = 1
vendor_device = 'xenserver'
#  PCI Passthrough
pci=['01:00.0,permissive=1,power_mgmt=1']
viridian = 1
stdvga = 1
sdl = 0
serial = 'pty'
usb = 1
usbdevice = 'tablet'
keymap = 'de'
# SPICE
spice=1
spicehost='0.0.0.0'
spiceport=6000
# spicedisable_ticketing enabled is for no spice password, instead use spicepasswd
spicedisable_ticketing=1
#spicepasswd="test"
spicevdagent=1
spice_clipboard_sharing=1
# this will automatically redirect up to 4 usb devices from spice client to domUs
#spiceusbredirection=4
# This adds intel hd audio emulated card used for spice audio
soundhw="hda"


xl info:

host                   : xxx
release                : 4.19.0-14-amd64
version                : #1 SMP Debian 4.19.171-2 (2021-01-30)
machine                : x86_64
nr_cpus                : 4
max_cpu_id             : 3
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 1992.100
hw_caps                : bfebfbff:77faf3ff:2c100800:00000121:0000000f:009c6fbf:00000000:00000100
virt_caps              : hvm hvm_directio
total_memory           : 32542
free_memory            : 20836
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 11
xen_extra              : .4
xen_version            : 4.11.4
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=1024M,max:1024M
cc_compiler            : gcc (Debian 8.3.0-6) 8.3.0
cc_compile_by          : pkg-xen-devel
cc_compile_domain      : lists.alioth.debian.org
cc_compile_date        : Fri Dec 11 21:33:51 UTC 2020
build_id               : 6d8e0fa3ddb825695eb6c6832631b4fa2331fe41
xend_config_format     : 4


lspci -vvv (excerpt):

01:00.0 Multimedia controller: Digital Devices GmbH Device 000a
        Subsystem: Digital Devices GmbH Device 0050
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at f7200000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range A, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
        Kernel driver in use: pciback


xenstore-ls -fp (excerpt):

/libxl/10 = ""   (n0)
/libxl/10/device = ""   (n0)
/libxl/10/device/vbd = ""   (n0)
/libxl/10/device/vbd/768 = ""   (n0)
/libxl/10/device/vbd/768/frontend = "/local/domain/10/device/vbd/768"   (n0)
/libxl/10/device/vbd/768/backend = "/local/domain/0/backend/vbd/10/768"   (n0)
/libxl/10/device/vbd/768/params = "/dev/vg0/matrix"   (n0)
/libxl/10/device/vbd/768/script = "/etc/xen/scripts/block"   (n0)
/libxl/10/device/vbd/768/frontend-id = "10"   (n0)
/libxl/10/device/vbd/768/online = "1"   (n0)
/libxl/10/device/vbd/768/removable = "0"   (n0)
/libxl/10/device/vbd/768/bootable = "1"   (n0)
/libxl/10/device/vbd/768/state = "1"   (n0)
/libxl/10/device/vbd/768/dev = "hda"   (n0)
/libxl/10/device/vbd/768/type = "phy"   (n0)
/libxl/10/device/vbd/768/mode = "w"   (n0)
/libxl/10/device/vbd/768/device-type = "disk"   (n0)
/libxl/10/device/vbd/768/discard-enable = "1"   (n0)
/libxl/10/device/vbd/832 = ""   (n0)
/libxl/10/device/vbd/832/frontend = "/local/domain/10/device/vbd/832"   (n0)
/libxl/10/device/vbd/832/backend = "/local/domain/0/backend/vbd/10/832"   (n0)
/libxl/10/device/vbd/832/params = "/dev/vg0/compudms-data"   (n0)
/libxl/10/device/vbd/832/script = "/etc/xen/scripts/block"   (n0)
/libxl/10/device/vbd/832/frontend-id = "10"   (n0)
/libxl/10/device/vbd/832/online = "1"   (n0)
/libxl/10/device/vbd/832/removable = "0"   (n0)
/libxl/10/device/vbd/832/bootable = "1"   (n0)
/libxl/10/device/vbd/832/state = "1"   (n0)
/libxl/10/device/vbd/832/dev = "hdb"   (n0)
/libxl/10/device/vbd/832/type = "phy"   (n0)
/libxl/10/device/vbd/832/mode = "w"   (n0)
/libxl/10/device/vbd/832/device-type = "disk"   (n0)
/libxl/10/device/vbd/832/discard-enable = "1"   (n0)
/libxl/10/device/console = ""   (n0)
/libxl/10/device/console/0 = ""   (n0)
/libxl/10/device/console/0/frontend = "/local/domain/10/console"   (n0)
/libxl/10/device/console/0/backend = "/local/domain/0/backend/console/10/0"   (n0)
/libxl/10/device/console/0/frontend-id = "10"   (n0)
/libxl/10/device/console/0/online = "1"   (n0)
/libxl/10/device/console/0/state = "1"   (n0)
/libxl/10/device/console/0/protocol = "vt100"   (n0)
/libxl/10/device/vkbd = ""   (n0)
/libxl/10/device/vkbd/0 = ""   (n0)
/libxl/10/device/vkbd/0/frontend = "/local/domain/10/device/vkbd/0"   (n0)
/libxl/10/device/vkbd/0/backend = "/local/domain/0/backend/vkbd/10/0"   (n0)
/libxl/10/device/vkbd/0/frontend-id = "10"   (n0)
/libxl/10/device/vkbd/0/online = "1"   (n0)
/libxl/10/device/vkbd/0/state = "1"   (n0)
/libxl/10/device/vif = ""   (n0)
/libxl/10/device/vif/0 = ""   (n0)
/libxl/10/device/vif/0/frontend = "/local/domain/10/device/vif/0"   (n0)
/libxl/10/device/vif/0/backend = "/local/domain/0/backend/vif/10/0"   (n0)
/libxl/10/device/vif/0/frontend-id = "10"   (n0)
/libxl/10/device/vif/0/online = "1"   (n0)
/libxl/10/device/vif/0/state = "1"   (n0)
/libxl/10/device/vif/0/script = "/etc/xen/scripts/vif-bridge"   (n0)
/libxl/10/device/vif/0/mac = "00:16:3e:54:a8:2b"   (n0)
/libxl/10/device/vif/0/bridge = "xenbr0"   (n0)
/libxl/10/device/vif/0/handle = "0"   (n0)
/libxl/10/device/vif/0/type = "vif_ioemu"   (n0)
/libxl/10/device/pci = ""   (n0)
/libxl/10/device/pci/0 = ""   (n0)
/libxl/10/device/pci/0/frontend = "/local/domain/10/device/pci/0"   (n0)
/libxl/10/device/pci/0/backend = "/local/domain/0/backend/pci/10/0"   (n0)
/libxl/10/device/pci/0/frontend-id = "10"   (n0)
/libxl/10/device/pci/0/online = "1"   (n0)
/libxl/10/device/pci/0/state = "1"   (n0)
/libxl/10/device/pci/0/domain = "matrix"   (n0)
/libxl/10/device/pci/0/key-0 = "0000:01:00.0"   (n0)
/libxl/10/device/pci/0/dev-0 = "0000:01:00.0"   (n0)
/libxl/10/device/pci/0/vdevfn-0 = "48"   (n0)
/libxl/10/device/pci/0/opts-0 = "msitranslate=0,power_mgmt=1,permissive=1"   (n0)
/libxl/10/device/pci/0/state-0 = "1"   (n0)
/libxl/10/device/pci/0/num_devs = "1"   (n0)
/libxl/10/type = "hvm"   (n0)
/libxl/10/dm-version = "qemu_xen"   (n0)
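
To compare the state before and after a reboot, the PCI entries can be pulled out of such a dump. This is a minimal sketch that assumes the "xenstore-ls -fp" line layout shown in the excerpt above; the function name pci_devices_in_xenstore is made up, and whether /libxl/<domid>/device/pci is the path libxl consults when it emits the warning is not something I know.

```python
import re


def pci_devices_in_xenstore(dump: str, domid: int) -> list[str]:
    """Return the BDFs stored under /libxl/<domid>/device/pci/*/dev-N.

    dump is the text printed by 'xenstore-ls -fp'.
    """
    pattern = re.compile(
        rf'^/libxl/{domid}/device/pci/\d+/dev-\d+ = "([^"]+)"',
        re.MULTILINE,
    )
    return pattern.findall(dump)
```

For the excerpt above and domid 10 this yields the single entry 0000:01:00.0. An empty result for the new domid after a reboot would match the "not in xenstore" part of the warning.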



