All the mail mirrored from lore.kernel.org
* [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug
@ 2013-12-05 22:32 Michael Roth
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node Michael Roth
                   ` (14 more replies)
  0 siblings, 15 replies; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:32 UTC
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

These patches are based on ppc-next, and can also be obtained from:

https://github.com/mdroth/qemu/commits/spapr-pci-hotplug-v2-ppc-next

v2:
  * re-ordered patches to fix build bisectability (Alexey)
  * replaced g_warning with DPRINTF in RTAS calls for guest errors (Alexey)
  * replaced g_warning with fprintf for qemu errors (Alexey)
  * updated RTAS calls to use pre-existing error/success macros (Alexey)
  * replaced DR_*/SENSOR_* macros with INDICATOR_* for set-indicator/
    get-sensor-state (Alexey)

OVERVIEW

These patches add support for PCI hotplug for SPAPR guests. We advertise
each PHB as DR-capable (as defined by PAPR 13.5/13.6) with 32 hotpluggable
PCI slots per PHB, which models a standard PCI expansion device for Power
machines where the DRC name/loc-code/index for each slot are generated
based on bus/slot number.
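
In patches 01/14 and 02/14, each slot's DRC index is computed as
SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (slot << 3). Here phb_index is
buid - SPAPR_PCI_BASE_BUID in spapr_add_phb_to_drc_table() and phb->index
in spapr_create_drc_phb_dt_entries(). The series resolves an index passed
to an RTAS call with a linear search (spapr_find_drc_entry()). The sketch
below shows the forward mapping together with a hypothetical arithmetic
inverse. It assumes slot indexes are only ever produced by this formula,
and neither helper name exists in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from include/hw/ppc/spapr.h in patch 01/14. */
#define SPAPR_DRC_PHB_SLOT_MAX  32u
#define SPAPR_DRC_DEV_ID_BASE   0x40000000u

/* Forward mapping used by the series: 8 bits per PHB, slot number
 * shifted left by 3 (so the low 3 bits of the offset stay clear). */
static uint32_t slot_drc_index(uint32_t phb_index, uint32_t slot)
{
    assert(slot < SPAPR_DRC_PHB_SLOT_MAX);
    return SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (slot << 3);
}

/* Hypothetical inverse (not in the series): recover the PHB index and
 * slot from a DRC index. Returns 0 on success, or -1 if the index
 * cannot have been produced by slot_drc_index(). */
static int drc_index_to_slot(uint32_t drc_index, uint32_t *phb_index,
                             uint32_t *slot)
{
    uint32_t off;

    if (drc_index < SPAPR_DRC_DEV_ID_BASE) {
        return -1;              /* e.g. a PHB index such as 0x2000001 */
    }
    off = drc_index - SPAPR_DRC_DEV_ID_BASE;
    if (off & 0x7) {
        return -1;              /* low 3 bits are never set above */
    }
    *phb_index = off >> 8;
    *slot = (off >> 3) & (SPAPR_DRC_PHB_SLOT_MAX - 1);
    return 0;
}
```

For example, slot 2 of PHB 1 maps to 0x40000110, and decoding that value
gives back PHB 1, slot 2.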

This is, in theory, compatible with existing guest kernels via the rpaphp
hotplug module, and with existing userspace tools such as
drmgr/librtas/rtas_errd for managing devices.

NOTES / ADDITIONAL DEPENDENCIES

Due to an issue with rpaphp, older guest kernels require a workaround
that uses the bus rescan/remove sysfs interfaces instead of the hotplug
interfaces provided by rpaphp.

Guest kernel fixes for rpaphp are in progress and available for testing
here (these still have a benign issue with duplicate EEH sysfs entries,
but the full guest-driven hotplug workflow is functional):

  https://github.com/mdroth/linux/commits/pci-hotplug-fixes

Alternatively, there are updated userspace tools that add a "-v" option
to drmgr to use bus rescan/remove instead of relying on rpaphp:

  https://github.com/tyreld/powerpc-utils/commits/hotplug

It's possible to test guest-driven hotplug without either of these by
using a workaround (see USAGE below), but this is not recommended.

PAPR does not currently define a mechanism for generating PCI
hotplug/unplug events; it relies on guest-driven management of devices.
As part of this series we therefore also introduce an extension to the
existing EPOW power event reporting mechanism (where a guest queries for
events via check-exception RTAS calls in response to an external
interrupt) to surface hotplug/unplug events, along with the information
needed to manage the devices automatically via the rtas_errd guest
service. To enable this QEMU-driven hotplug/unplug workflow (for parity
with ACPI/SHPC-based guests), updated versions of librtas/ppc64-diag are
required, which are available here:

  https://github.com/tyreld/ppc64-diag/commits/hotplug
  https://github.com/tyreld/librtas/commits/hotplug

Lacking those, users must manage device hotplug/unplug manually.

Additionally, to support unplug, PAPR requires additional OF properties
(ibm,my-drc-index and loc-code) for hotpluggable slots that are already
populated at boot time, so an updated SLOF is required to allow device
unplug after a guest reboot. These properties cannot currently be added
to the boot-time FDT, since they would conflict with SLOF-generated
device nodes. We either need to teach SLOF to re-use/merge existing
entries, or simply have it generate the required property values for
board-qemu, which is the approach taken here. A patch for SLOF is
available below, along with a pre-built SLOF binary that includes it
(for testing):

  https://github.com/mdroth/SLOF/commit/2e09a2950db0ce8ed464b80cccfea56dccf85d66
  https://github.com/mdroth/qemu/blob/19a390e3270a7defc7158ce29e52ff2b27d666ae/pc-bios/slof.bin

PATCH LAYOUT

Patches
        1-3   advertise PHBs and associated slots as hotpluggable to guests
        4-7   add RTAS interfaces required for device configuration
        8-10  add helpers and a potential fix to deal with QEMU-managed BAR
              assignments
        11    enables device_add/device_del for spapr machines and
              guest-driven hotplug
        12-14 define hotplug event structure and emit them in response to
              device_add/device_del

USAGE

With unmodified guests:
  hotplug:
    qemu:
      device_add e1000,id=slot0
    guest:
      drmgr -c pci -s "Slot 0" -n -a
      echo 1 >/sys/bus/pci/rescan
  unplug:
    guest:
      drmgr -c pci -s "Slot 0" -n -r
      echo 1 >/sys/bus/pci/devices/0000:00:00.0/remove
    qemu:
      device_del slot0

With only updated guest kernel:
  hotplug:
    qemu:
      device_add e1000,id=slot0
    guest:
      modprobe rpaphp
      drmgr -c pci -s "Slot 0" -n -a
  unplug:
    guest:
      drmgr -c pci -s "Slot 0" -n -r
    qemu:
      device_del slot0

With only updated powerpc-utils/drmgr:
  hotplug:
    qemu:
      device_add e1000,id=slot0
    guest:
      drmgr -c pci -s "Slot 0" -n -v -a
  unplug:
    guest:
      drmgr -c pci -s "Slot 0" -n -v -r
    qemu:
      device_del slot0

With updated librtas/ppc64-diag and either an updated guest kernel or drmgr:
  hotplug:
    qemu:
      device_add e1000,id=slot0
  unplug:
    qemu:
      device_del slot0
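
For guests without an updated kernel or drmgr, the sysfs steps above
amount to writing "1" to sysfs control files. The C sketch below performs
the same steps. It assumes the standard Linux sysfs PCI layout shown above
(/sys/bus/pci/rescan and /sys/bus/pci/devices/<domain:bus:slot.func>/remove).
pci_remove_path() and sysfs_write_value() are hypothetical helpers, and
the device address that corresponds to a given slot depends on the guest.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the sysfs "remove" file for a PCI function, e.g.
 * /sys/bus/pci/devices/0000:00:00.0/remove. Returns 0 on success,
 * -1 if buf is too small. */
static int pci_remove_path(char *buf, size_t len, unsigned domain,
                           unsigned bus, unsigned slot, unsigned func)
{
    int n;

    assert(bus < 256 && slot < 32 && func < 8);
    n = snprintf(buf, len, "/sys/bus/pci/devices/%04x:%02x:%02x.%u/remove",
                 domain, bus, slot, func);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a value such as "1" to a sysfs control file, like
 * `echo 1 >/sys/bus/pci/rescan`. Returns 0 on success, -1 on error. */
static int sysfs_write_value(const char *path, const char *value)
{
    size_t len = strlen(value);
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL) {
        return -1;
    }
    ok = fwrite(value, 1, len, f) == len;
    /* stdio buffers the data, so the real write (and any error) happens here */
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

For example, pci_remove_path(buf, sizeof(buf), 0, 0, 0, 0) yields the
path used in the unplug steps for "Slot 0", and
sysfs_write_value("/sys/bus/pci/rescan", "1") mirrors the echo command
(run as root inside the guest).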

 hw/pci/pci.c                |    5 +-
 hw/ppc/spapr.c              |  174 +++++++++-
 hw/ppc/spapr_events.c       |  228 ++++++++++---
 hw/ppc/spapr_pci.c          |  768 ++++++++++++++++++++++++++++++++++++++++++-
 include/exec/memory.h       |   34 ++
 include/hw/pci-host/spapr.h |    1 +
 include/hw/pci/pci.h        |    1 +
 include/hw/ppc/spapr.h      |   77 ++++-
 memory.c                    |   50 +++
 9 files changed, 1286 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
@ 2013-12-05 22:32 ` Michael Roth
  2013-12-16  2:59   ` Alexey Kardashevskiy
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 02/14] spapr_pci: populate DRC dt entries for PHBs Michael Roth
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:32 UTC
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

From: Nathan Fontenot <nfont@linux.vnet.ibm.com>

This adds entries to the root OF node to advertise our PHBs as
DR-capable, in accordance with the PAPR specification.

Each PHB is given a name of PHB<bus#>, advertised as a PHB type,
and associated with a power domain of -1 (indicating to guests that
power management is handled automatically by hardware).
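
The root-node "ibm,drc-names" property that spapr_create_drc_dt_entries()
builds consists of a 32-bit count followed by one NUL-terminated string
per entry. In the code, entries are numbered by table position ("PHB 1",
"PHB 2", ...). The standalone sketch below builds that layout.
build_phb_drc_names() and put_be32() are hypothetical helpers, not part
of the patch. Device tree property cells are big-endian, so the sketch
writes the count explicitly in that byte order. The patch instead stores
it in host byte order, which only matches on big-endian hosts. The sketch
also bounds-checks each write rather than relying on a fixed 1024-byte
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Store v at p as a big-endian (FDT cell) 32-bit value. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Build an "ibm,drc-names"-style blob for `count` PHB entries:
 * a count cell followed by "PHB 1\0PHB 2\0...".
 * Returns the number of bytes used, or -1 if buf is too small. */
static int build_phb_drc_names(unsigned char *buf, size_t len, uint32_t count)
{
    size_t off = 4;
    uint32_t i;

    assert(buf != NULL);
    if (len < off) {
        return -1;
    }
    put_be32(buf, count);
    for (i = 0; i < count; i++) {
        /* snprintf returns the full length it wanted, excluding the NUL */
        int n = snprintf((char *)buf + off, len - off, "PHB %u",
                         (unsigned)(i + 1));
        if (n < 0 || (size_t)n + 1 > len - off) {
            return -1;
        }
        off += (size_t)n + 1;   /* keep the NUL as the string separator */
    }
    return (int)off;
}
```

The returned length is what would be passed as the size argument to
fdt_setprop().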

We currently allocate entries for up to 32 DR-capable PHBs, though
this limit can be increased later.

DrcEntry objects that track the state of the DR-connector associated
with each PHB are stored in a 32-entry array, and each DrcEntry in turn
has a dynamically allocated array of child DR-connectors, which we will
use later to track the state of the DR-connectors associated with a
PHB's physical slots.
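
spapr_add_phb_to_drc_table() either returns the entry already registered
for a BUID or claims a free one, using phb_buid == 0 as the "unused"
marker. The sketch below reproduces that lookup-or-insert behaviour on a
trimmed-down, hypothetical MiniDrcEntry table, with no child entries and
no configure-connector state. One deliberate difference: this sketch
claims the first free entry, whereas the loop in the patch keeps
overwriting empty_drc and so ends up with the last free entry. Either
choice appears to work, as long as the mapping stays stable across calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DRC_TABLE_SIZE 32   /* SPAPR_DRC_TABLE_SIZE in the patch */

/* Trimmed-down DrcEntry: only the fields this sketch needs. */
typedef struct {
    uint32_t drc_index;
    uint64_t phb_buid;   /* 0 means "entry unused" */
    uint32_t state;
} MiniDrcEntry;

static MiniDrcEntry table[DRC_TABLE_SIZE];

static void init_table(void)
{
    for (int i = 0; i < DRC_TABLE_SIZE; i++) {
        table[i].drc_index = 0x2000001 + i;   /* same as spapr_init_drc_table() */
        table[i].phb_buid = 0;
        table[i].state = 0;
    }
}

/* Return the entry already registered for buid (state left unchanged),
 * otherwise claim the first free entry. Returns NULL if the table is full. */
static MiniDrcEntry *add_phb(uint64_t buid, uint32_t state)
{
    MiniDrcEntry *empty = NULL;

    assert(buid != 0);   /* 0 is reserved as the "unused" marker */
    for (int i = 0; i < DRC_TABLE_SIZE; i++) {
        if (table[i].phb_buid == buid) {
            return &table[i];
        }
        if (table[i].phb_buid == 0 && empty == NULL) {
            empty = &table[i];
        }
    }
    if (empty != NULL) {
        empty->phb_buid = buid;
        empty->state = state;
    }
    return empty;
}
```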

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |  132 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |   33 ++++++++++++
 2 files changed, 165 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7e53a5f..ec3ba43 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -81,6 +81,7 @@
 #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
 
 sPAPREnvironment *spapr;
+DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
 
 int spapr_allocate_irq(int hint, bool lsi)
 {
@@ -276,6 +277,130 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
     return (p - prop) * sizeof(uint32_t);
 }
 
+static void spapr_init_drc_table(void)
+{
+    int i;
+
+    memset(drc_table, 0, sizeof(drc_table));
+
+    /* For now we only care about PHB entries */
+    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+        drc_table[i].drc_index = 0x2000001 + i;
+    }
+}
+
+DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state)
+{
+    DrcEntry *empty_drc = NULL;
+    DrcEntry *found_drc = NULL;
+    int i, phb_index;
+
+    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+        if (drc_table[i].phb_buid == 0) {
+            empty_drc = &drc_table[i];
+        }
+
+        if (drc_table[i].phb_buid == buid) {
+            found_drc = &drc_table[i];
+            break;
+        }
+    }
+
+    if (found_drc) {
+        return found_drc;
+    }
+
+    if (empty_drc) {
+        empty_drc->phb_buid = buid;
+        empty_drc->state = state;
+        empty_drc->cc_state.fdt = NULL;
+        empty_drc->cc_state.offset = 0;
+        empty_drc->cc_state.depth = 0;
+        empty_drc->cc_state.state = CC_STATE_IDLE;
+        empty_drc->child_entries =
+            g_malloc0(sizeof(DrcEntry) * SPAPR_DRC_PHB_SLOT_MAX);
+        phb_index = buid - SPAPR_PCI_BASE_BUID;
+        for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
+            empty_drc->child_entries[i].drc_index =
+                SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (i << 3);
+        }
+        return empty_drc;
+    }
+
+    return NULL;
+}
+
+static void spapr_create_drc_dt_entries(void *fdt)
+{
+    char char_buf[1024];
+    uint32_t int_buf[SPAPR_DRC_TABLE_SIZE + 1];
+    uint32_t *entries;
+    int offset, fdt_offset;
+    int i, ret;
+
+    fdt_offset = fdt_path_offset(fdt, "/");
+
+    /* ibm,drc-indexes */
+    memset(int_buf, 0, sizeof(int_buf));
+    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
+
+    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
+        int_buf[i] = drc_table[i-1].drc_index;
+    }
+
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
+                      sizeof(int_buf));
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-indexes property\n");
+    }
+
+    /* ibm,drc-power-domains */
+    memset(int_buf, 0, sizeof(int_buf));
+    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
+
+    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
+        int_buf[i] = 0xffffffff;
+    }
+
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
+                      sizeof(int_buf));
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
+    }
+
+    /* ibm,drc-names */
+    memset(char_buf, 0, sizeof(char_buf));
+    entries = (uint32_t *)&char_buf[0];
+    *entries = SPAPR_DRC_TABLE_SIZE;
+    offset = sizeof(*entries);
+
+    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+        offset += sprintf(char_buf + offset, "PHB %d", i + 1);
+        char_buf[offset++] = '\0';
+    }
+
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf, offset);
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
+    }
+
+    /* ibm,drc-types */
+    memset(char_buf, 0, sizeof(char_buf));
+    entries = (uint32_t *)&char_buf[0];
+    *entries = SPAPR_DRC_TABLE_SIZE;
+    offset = sizeof(*entries);
+
+    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+        offset += sprintf(char_buf + offset, "PHB");
+        char_buf[offset++] = '\0';
+    }
+
+    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf, offset);
+    if (ret) {
+        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
+    }
+}
+
 #define _FDT(exp) \
     do { \
         int ret = (exp);                                           \
@@ -307,6 +432,8 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     int i, smt = kvmppc_smt_threads();
     unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
 
+    spapr_init_drc_table();
+
     fdt = g_malloc0(FDT_MAX_SIZE);
     _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
 
@@ -590,6 +717,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
     int ret;
     void *fdt;
     sPAPRPHBState *phb;
+    DrcEntry *drc_entry;
 
     fdt = g_malloc(FDT_MAX_SIZE);
 
@@ -609,6 +737,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
     }
 
     QLIST_FOREACH(phb, &spapr->phbs, list) {
+        drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
+        g_assert(drc_entry);
         ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
     }
 
@@ -633,6 +763,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
         spapr_populate_chosen_stdout(fdt, spapr->vio_bus);
     }
 
+    spapr_create_drc_dt_entries(fdt);
+
     _FDT((fdt_pack(fdt)));
 
     if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index b2f11e9..0f2e705 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -299,6 +299,39 @@ typedef struct sPAPREnvironment {
 #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
 #define KVMPPC_HCALL_MAX        KVMPPC_H_LOGICAL_MEMOP
 
+/* For dlparable/hotpluggable slots */
+#define SPAPR_DRC_TABLE_SIZE    32
+#define SPAPR_DRC_PHB_SLOT_MAX  32
+#define SPAPR_DRC_DEV_ID_BASE   0x40000000
+
+typedef struct ConfigureConnectorState {
+    void *fdt;
+    int offset_start;
+    int offset;
+    int depth;
+    PCIDevice *dev;
+    enum {
+        CC_STATE_IDLE = 0,
+        CC_STATE_PENDING = 1,
+        CC_STATE_ACTIVE,
+    } state;
+} ConfigureConnectorState;
+
+typedef struct DrcEntry DrcEntry;
+
+struct DrcEntry {
+    uint32_t drc_index;
+    uint64_t phb_buid;
+    void *fdt;
+    int fdt_offset;
+    uint32_t state;
+    ConfigureConnectorState cc_state;
+    DrcEntry *child_entries;
+};
+
+extern DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
+DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
+
 extern sPAPREnvironment *spapr;
 
 /*#define DEBUG_SPAPR_HCALLS*/
-- 
1.7.9.5

* [Qemu-devel] [PATCH v2 02/14] spapr_pci: populate DRC dt entries for PHBs
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node Michael Roth
@ 2013-12-05 22:32 ` Michael Roth
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 03/14] spapr: add helper to retrieve a PHB/device DrcEntry Michael Roth
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:32 UTC
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Reserve 32 entries of type PCI in each PHB's initial FDT. This
advertises to guests that each PHB is a DR-capable device with
physical hotpluggable slots, which is necessary to allow devices to be
hotplugged into it later via bus rescan or the guest rpaphp hotplug
module.

Each entry is assigned a name of "Slot <phb_index*32 + slot>" (with
slot in the range 0-31), advertised as a hotpluggable PCI slot (DRC
type "28"), and assigned to power domain -1 to indicate to the guest
that power management is handled by the hardware.

This models a DR-capable PCI expansion device attached to a host/lpar
via a single PHB with 32 physical hotpluggable slots (as opposed to a
virtual bridge device with external management console). Hotplug will
be handled by the guest via bus rescan or the rpaphp hotplug module.
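
Each PHB node thus gets an "ibm,drc-indexes" property: a count cell
followed by one DRC index per slot. Each slot's "Slot N" name uses
N = phb_index * 32 + slot. The sketch below builds the index array and
computes the name numbers, assuming the same formula as
spapr_create_drc_phb_dt_entries(). to_fdt_cell(), fill_phb_drc_indexes()
and slot_name_number() are hypothetical helpers. Device tree cells are
big-endian, so the sketch converts explicitly rather than storing
host-endian values as the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from include/hw/ppc/spapr.h in patch 01/14. */
#define SPAPR_DRC_PHB_SLOT_MAX  32u
#define SPAPR_DRC_DEV_ID_BASE   0x40000000u

/* Return v laid out in memory as a big-endian FDT cell, whatever the
 * host byte order (similar in spirit to QEMU's cpu_to_be32()). */
static uint32_t to_fdt_cell(uint32_t v)
{
    const unsigned char bytes[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v,
    };
    uint32_t cell;

    memcpy(&cell, bytes, sizeof(cell));
    return cell;
}

/* Fill an "ibm,drc-indexes"-style array: cells[0] is the slot count and
 * cells[1..32] hold one DRC index per slot of PHB phb_index. */
static void fill_phb_drc_indexes(uint32_t cells[SPAPR_DRC_PHB_SLOT_MAX + 1],
                                 uint32_t phb_index)
{
    uint32_t slot;

    cells[0] = to_fdt_cell(SPAPR_DRC_PHB_SLOT_MAX);
    for (slot = 0; slot < SPAPR_DRC_PHB_SLOT_MAX; slot++) {
        cells[slot + 1] = to_fdt_cell(SPAPR_DRC_DEV_ID_BASE +
                                      (phb_index << 8) + (slot << 3));
    }
}

/* Number N used in the "Slot N" DRC name for a given PHB and slot. */
static uint32_t slot_name_number(uint32_t phb_index, uint32_t slot)
{
    assert(slot < SPAPR_DRC_PHB_SLOT_MAX);
    return phb_index * SPAPR_DRC_PHB_SLOT_MAX + slot;
}
```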

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              |    3 +-
 hw/ppc/spapr_pci.c          |  102 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/pci-host/spapr.h |    1 +
 3 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ec3ba43..0607559 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -739,7 +739,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
     QLIST_FOREACH(phb, &spapr->phbs, list) {
         drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
         g_assert(drc_entry);
-        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
+        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, drc_entry->drc_index,
+                                    fdt);
     }
 
     if (ret < 0) {
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 7763149..7568a03 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -759,8 +759,104 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
 #define b_fff(x)        b_x((x), 8, 3)  /* function number */
 #define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
 
+static void spapr_create_drc_phb_dt_entries(void *fdt, int bus_off, int phb_index)
+{
+    char char_buf[1024];
+    uint32_t int_buf[SPAPR_DRC_PHB_SLOT_MAX + 1];
+    uint32_t *entries;
+    int i, ret, offset;
+
+    /* ibm,drc-indexes */
+    memset(int_buf, 0, sizeof(int_buf));
+    int_buf[0] = SPAPR_DRC_PHB_SLOT_MAX;
+
+    for (i = 1; i <= SPAPR_DRC_PHB_SLOT_MAX; i++) {
+        int_buf[i] = SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + ((i - 1) << 3);
+    }
+
+    ret = fdt_setprop(fdt, bus_off, "ibm,drc-indexes", int_buf,
+                      sizeof(int_buf));
+    if (ret) {
+        fprintf(stderr, "error adding 'ibm,drc-indexes' field for PHB FDT\n");
+    }
+
+    /* ibm,drc-power-domains */
+    memset(int_buf, 0, sizeof(int_buf));
+    int_buf[0] = SPAPR_DRC_PHB_SLOT_MAX;
+
+    for (i = 1; i <= SPAPR_DRC_PHB_SLOT_MAX; i++) {
+        int_buf[i] = 0xffffffff;
+    }
+
+    ret = fdt_setprop(fdt, bus_off, "ibm,drc-power-domains", int_buf,
+                      sizeof(int_buf));
+    if (ret) {
+        fprintf(stderr,
+                "error adding 'ibm,drc-power-domains' field for PHB FDT\n");
+    }
+
+    /* ibm,drc-names */
+    memset(char_buf, 0, sizeof(char_buf));
+    entries = (uint32_t *)&char_buf[0];
+    *entries = SPAPR_DRC_PHB_SLOT_MAX;
+    offset = sizeof(*entries);
+
+    for (i = 1; i <= SPAPR_DRC_PHB_SLOT_MAX; i++) {
+        offset += sprintf(char_buf + offset, "Slot %d",
+                          (phb_index * SPAPR_DRC_PHB_SLOT_MAX) + i - 1);
+        char_buf[offset++] = '\0';
+    }
+
+    ret = fdt_setprop(fdt, bus_off, "ibm,drc-names", char_buf, offset);
+    if (ret) {
+        fprintf(stderr, "error adding 'ibm,drc-names' field for PHB FDT\n");
+    }
+
+    /* ibm,drc-types */
+    memset(char_buf, 0, sizeof(char_buf));
+    entries = (uint32_t *)&char_buf[0];
+    *entries = SPAPR_DRC_PHB_SLOT_MAX;
+    offset = sizeof(*entries);
+
+    for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
+        offset += sprintf(char_buf + offset, "28");
+        char_buf[offset++] = '\0';
+    }
+
+    ret = fdt_setprop(fdt, bus_off, "ibm,drc-types", char_buf, offset);
+    if (ret) {
+        fprintf(stderr, "error adding 'ibm,drc-types' field for PHB FDT\n");
+    }
+
+    /* We want the initial indicator state to be 0 ("empty"); when we
+     * hot-plug an adapter into the slot, we need to set the indicator
+     * to 1 ("present").
+     */
+
+    /* ibm,indicator-9003 */
+    memset(int_buf, 0, sizeof(int_buf));
+    int_buf[0] = SPAPR_DRC_PHB_SLOT_MAX;
+
+    ret = fdt_setprop(fdt, bus_off, "ibm,indicator-9003", int_buf,
+                      sizeof(int_buf));
+    if (ret) {
+        fprintf(stderr, "error adding 'ibm,indicator-9003' field for PHB FDT\n");
+    }
+
+    /* ibm,sensor-9003 */
+    memset(int_buf, 0, sizeof(int_buf));
+    int_buf[0] = SPAPR_DRC_PHB_SLOT_MAX;
+
+    ret = fdt_setprop(fdt, bus_off, "ibm,sensor-9003", int_buf,
+                      sizeof(int_buf));
+    if (ret) {
+        fprintf(stderr, "error adding 'ibm,sensor-9003' field for PHB FDT\n");
+    }
+}
+
 int spapr_populate_pci_dt(sPAPRPHBState *phb,
                           uint32_t xics_phandle,
+                          uint32_t drc_index,
                           void *fdt)
 {
     int bus_off, i, j;
@@ -842,6 +938,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
                  phb->dma_liobn, phb->dma_window_start,
                  phb->dma_window_size);
 
+    spapr_create_drc_phb_dt_entries(fdt, bus_off, phb->index);
+    if (drc_index) {
+        _FDT(fdt_setprop(fdt, bus_off, "ibm,my-drc-index", &drc_index,
+                         sizeof(drc_index)));
+    }
+
     return 0;
 }
 
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 970b4a9..43d19a5 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -86,6 +86,7 @@ PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index);
 
 int spapr_populate_pci_dt(sPAPRPHBState *phb,
                           uint32_t xics_phandle,
+                          uint32_t drc_index,
                           void *fdt);
 
 void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr);
-- 
1.7.9.5

* [Qemu-devel] [PATCH v2 03/14] spapr: add helper to retrieve a PHB/device DrcEntry
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node Michael Roth
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 02/14] spapr_pci: populate DRC dt entries for PHBs Michael Roth
@ 2013-12-05 22:32 ` Michael Roth
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 04/14] spapr_pci: add set-indicator RTAS interface Michael Roth
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:32 UTC
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |   36 ++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    2 ++
 2 files changed, 38 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0607559..2250ee1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -277,6 +277,42 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
     return (p - prop) * sizeof(uint32_t);
 }
 
+DrcEntry *spapr_phb_to_drc_entry(uint64_t buid)
+{
+    int i;
+
+    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+        if (drc_table[i].phb_buid == buid) {
+            return &drc_table[i];
+        }
+    }
+
+    return NULL;
+}
+
+DrcEntry *spapr_find_drc_entry(int drc_index)
+{
+    int i, j;
+
+    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+        DrcEntry *phb_entry = &drc_table[i];
+        if (phb_entry->drc_index == drc_index) {
+            return phb_entry;
+        }
+        if (phb_entry->child_entries == NULL) {
+            continue;
+        }
+        for (j = 0; j < SPAPR_DRC_PHB_SLOT_MAX; j++) {
+            DrcEntry *entry = &phb_entry->child_entries[j];
+            if (entry->drc_index == drc_index) {
+                return entry;
+            }
+        }
+    }
+
+    return NULL;
+}
+
 static void spapr_init_drc_table(void)
 {
     int i;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 0f2e705..6ae5c54 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -331,6 +331,8 @@ struct DrcEntry {
 
 extern DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
 DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
+DrcEntry *spapr_phb_to_drc_entry(uint64_t buid);
+DrcEntry *spapr_find_drc_entry(int drc_index);
 
 extern sPAPREnvironment *spapr;
 
-- 
1.7.9.5

* [Qemu-devel] [PATCH v2 04/14] spapr_pci: add set-indicator RTAS interface
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (2 preceding siblings ...)
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 03/14] spapr: add helper to retrieve a PHB/device DrcEntry Michael Roth
@ 2013-12-05 22:32 ` Michael Roth
  2013-12-16  4:26   ` Alexey Kardashevskiy
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 05/14] spapr_pci: add get/set-power-level RTAS interfaces Michael Roth
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:32 UTC
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

From: Mike Day <ncmike@ncultra.org>

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c     |   93 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |   28 +++++++++++++++
 2 files changed, 121 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 7568a03..1046ec8 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -35,6 +35,16 @@
 
 #include "hw/pci/pci_bus.h"
 
+/* #define DEBUG_SPAPR */
+
+#ifdef DEBUG_SPAPR
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
 /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
 #define RTAS_QUERY_FN           0
 #define RTAS_CHANGE_FN          1
@@ -404,6 +414,80 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
     rtas_st(rets, 2, 1);/* 0 == level; 1 == edge */
 }
 
+static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                               uint32_t token, uint32_t nargs,
+                               target_ulong args, uint32_t nret,
+                               target_ulong rets)
+{
+    uint32_t indicator = rtas_ld(args, 0);
+    uint32_t drc_index = rtas_ld(args, 1);
+    uint32_t indicator_state = rtas_ld(args, 2);
+    uint32_t encoded = 0, shift = 0, mask = 0;
+    uint32_t *pind;
+    DrcEntry *drc_entry = NULL;
+
+    if (drc_index == 0) { /* platform indicator */
+        pind = &spapr->state;
+    } else {
+        drc_entry = spapr_find_drc_entry(drc_index);
+        if (!drc_entry) {
+            DPRINTF("rtas_set_indicator: unable to find drc_entry for %x\n",
+                    drc_index);
+            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+            return;
+        }
+        pind = &drc_entry->state;
+    }
+
+    switch (indicator) {
+    case 9:  /* EPOW */
+        shift = INDICATOR_EPOW_SHIFT;
+        mask = INDICATOR_EPOW_MASK;
+        break;
+    case 9001: /* Isolation state */
+        /* encode the new value into the correct bit field */
+        shift = INDICATOR_ISOLATION_SHIFT;
+        mask = INDICATOR_ISOLATION_MASK;
+        break;
+    case 9002: /* DR */
+        shift = INDICATOR_DR_SHIFT;
+        mask = INDICATOR_DR_MASK;
+        break;
+    case 9003: /* Allocation State */
+        shift = INDICATOR_ALLOCATION_SHIFT;
+        mask = INDICATOR_ALLOCATION_MASK;
+        break;
+    case 9005: /* global interrupt */
+        shift = INDICATOR_GLOBAL_INTERRUPT_SHIFT;
+        mask = INDICATOR_GLOBAL_INTERRUPT_MASK;
+        break;
+    case 9006: /* error log */
+        shift = INDICATOR_ERROR_LOG_SHIFT;
+        mask = INDICATOR_ERROR_LOG_MASK;
+        break;
+    case 9007: /* identify */
+        shift = INDICATOR_IDENTIFY_SHIFT;
+        mask = INDICATOR_IDENTIFY_MASK;
+        break;
+    case 9009: /* reset */
+        shift = INDICATOR_RESET_SHIFT;
+        mask = INDICATOR_RESET_MASK;
+        break;
+    default:
+        DPRINTF("rtas_set_indicator: indicator not implemented: %d\n",
+                indicator);
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    encoded = ENCODE_DRC_STATE(indicator_state, mask, shift);
+    /* clear the current indicator value */
+    *pind &= ~mask;
+    /* set the new value */
+    *pind |= encoded;
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
 static int pci_spapr_swizzle(int slot, int pin)
 {
     return (slot + pin) % PCI_NUM_PINS;
@@ -637,6 +721,14 @@ static int spapr_phb_init(SysBusDevice *s)
         sphb->lsi_table[i].irq = irq;
     }
 
+    /* make sure the platform EPOW sensor is initialized - the
+     * guest will probe it when there is a hotplug event.
+     */
+    spapr->state &= ~(uint32_t)INDICATOR_EPOW_MASK;
+    spapr->state |= ENCODE_DRC_STATE(0,
+                                     INDICATOR_EPOW_MASK,
+                                     INDICATOR_EPOW_SHIFT);
+
     return 0;
 }
 
@@ -958,6 +1050,7 @@ void spapr_pci_rtas_init(void)
                             rtas_ibm_query_interrupt_source_number);
         spapr_rtas_register("ibm,change-msi", rtas_ibm_change_msi);
     }
+    spapr_rtas_register("set-indicator", rtas_set_indicator);
 }
 
 static void spapr_pci_register_types(void)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6ae5c54..b48c55f 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -38,6 +38,9 @@ typedef struct sPAPREnvironment {
     int htab_save_index;
     bool htab_first_pass;
     int htab_fd;
+
+    /* platform state - sensors and indicators */
+    uint32_t state;
 } sPAPREnvironment;
 
 #define H_SUCCESS         0
@@ -334,6 +337,31 @@ DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
 DrcEntry *spapr_phb_to_drc_entry(uint64_t buid);
 DrcEntry *spapr_find_drc_entry(int drc_index);
 
+/* For set-indicator RTAS interface */
+#define INDICATOR_ISOLATION_MASK            0x0001   /* 9001 one bit */
+#define INDICATOR_GLOBAL_INTERRUPT_MASK     0x0002   /* 9005 one bit */
+#define INDICATOR_ERROR_LOG_MASK            0x0004   /* 9006 one bit */
+#define INDICATOR_IDENTIFY_MASK             0x0008   /* 9007 one bit */
+#define INDICATOR_RESET_MASK                0x0010   /* 9009 one bit */
+#define INDICATOR_DR_MASK                   0x00e0   /* 9002 three bits */
+#define INDICATOR_ALLOCATION_MASK           0x0300   /* 9003 two bits */
+#define INDICATOR_EPOW_MASK                 0x1c00   /* 9 three bits */
+
+#define INDICATOR_ISOLATION_SHIFT           0x00     /* bit 0 */
+#define INDICATOR_GLOBAL_INTERRUPT_SHIFT    0x01     /* bit 1 */
+#define INDICATOR_ERROR_LOG_SHIFT           0x02     /* bit 2 */
+#define INDICATOR_IDENTIFY_SHIFT            0x03     /* bit 3 */
+#define INDICATOR_RESET_SHIFT               0x04     /* bit 4 */
+#define INDICATOR_DR_SHIFT                  0x05     /* bits 5-7 */
+#define INDICATOR_ALLOCATION_SHIFT          0x08     /* bits 8-9 */
+#define INDICATOR_EPOW_SHIFT                0x0a     /* bits 10-12 */
+
+#define DECODE_DRC_STATE(state, m, s)                  \
+    ((((uint32_t)(state) & (uint32_t)(m))) >> (s))
+
+#define ENCODE_DRC_STATE(val, m, s) \
+    (((uint32_t)(val) << (s)) & (uint32_t)(m))
+
 extern sPAPREnvironment *spapr;
 
 /*#define DEBUG_SPAPR_HCALLS*/
-- 
1.7.9.5
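
rtas_set_indicator() maps each RTAS indicator token to a mask/shift pair,
then does a read-modify-write of the 32-bit state word with
ENCODE_DRC_STATE(). The same logic can be sketched as a lookup table.
set_indicator(), get_indicator(), find_indicator() and struct
indicator_field are hypothetical. The token numbers, masks and shifts are
copied from the INDICATOR_* definitions in this patch. As in the patch, a
value wider than its field is silently truncated by the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same definitions as ENCODE_DRC_STATE/DECODE_DRC_STATE in spapr.h. */
#define ENCODE_DRC_STATE(val, m, s) (((uint32_t)(val) << (s)) & (uint32_t)(m))
#define DECODE_DRC_STATE(state, m, s) (((uint32_t)(state) & (uint32_t)(m)) >> (s))

struct indicator_field {
    uint32_t token;   /* RTAS indicator token */
    uint32_t mask;    /* INDICATOR_*_MASK */
    uint32_t shift;   /* INDICATOR_*_SHIFT */
};

static const struct indicator_field indicator_fields[] = {
    { 9,    0x1c00, 10 },  /* EPOW */
    { 9001, 0x0001,  0 },  /* isolation state */
    { 9002, 0x00e0,  5 },  /* DR */
    { 9003, 0x0300,  8 },  /* allocation state */
    { 9005, 0x0002,  1 },  /* global interrupt */
    { 9006, 0x0004,  2 },  /* error log */
    { 9007, 0x0008,  3 },  /* identify */
    { 9009, 0x0010,  4 },  /* reset */
};

static const struct indicator_field *find_indicator(uint32_t token)
{
    size_t i;

    for (i = 0; i < sizeof(indicator_fields) / sizeof(indicator_fields[0]); i++) {
        if (indicator_fields[i].token == token) {
            return &indicator_fields[i];
        }
    }
    return NULL;
}

/* Read-modify-write of one field; -1 stands in for RTAS_OUT_PARAM_ERROR. */
static int set_indicator(uint32_t *state, uint32_t token, uint32_t value)
{
    const struct indicator_field *f = find_indicator(token);

    assert(state != NULL);
    if (f == NULL) {
        return -1;
    }
    *state &= ~f->mask;                                   /* clear old value */
    *state |= ENCODE_DRC_STATE(value, f->mask, f->shift); /* set new value */
    return 0;
}

static int get_indicator(uint32_t state, uint32_t token, uint32_t *value)
{
    const struct indicator_field *f = find_indicator(token);

    if (f == NULL) {
        return -1;
    }
    *value = DECODE_DRC_STATE(state, f->mask, f->shift);
    return 0;
}
```

A table keeps the token-to-field mapping in one place, so the same data
could serve both a setter and a getter.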

* [Qemu-devel] [PATCH v2 05/14] spapr_pci: add get/set-power-level RTAS interfaces
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (3 preceding siblings ...)
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 04/14] spapr_pci: add set-indicator RTAS interface Michael Roth
@ 2013-12-05 22:32 ` Michael Roth
  2013-12-16  3:09   ` Alexey Kardashevskiy
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 06/14] spapr_pci: add get-sensor-state RTAS interface Michael Roth
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:32 UTC
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

From: Nathan Fontenot <nfont@linux.vnet.ibm.com>

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 1046ec8..8df44a3 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -488,6 +488,26 @@ static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
 
+static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                                 uint32_t token, uint32_t nargs,
+                                 target_ulong args, uint32_t nret,
+                                 target_ulong rets)
+{
+    uint32_t power_lvl = rtas_ld(args, 1);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, power_lvl);
+}
+
+static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args, uint32_t nret,
+                                  target_ulong rets)
+{
+    /* return SUCCESS with a power level of 100 */
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, 100);
+}
+
 static int pci_spapr_swizzle(int slot, int pin)
 {
     return (slot + pin) % PCI_NUM_PINS;
@@ -1051,6 +1071,8 @@ void spapr_pci_rtas_init(void)
         spapr_rtas_register("ibm,change-msi", rtas_ibm_change_msi);
     }
     spapr_rtas_register("set-indicator", rtas_set_indicator);
+    spapr_rtas_register("set-power-level", rtas_set_power_level);
+    spapr_rtas_register("get-power-level", rtas_get_power_level);
 }
 
 static void spapr_pci_register_types(void)
-- 
1.7.9.5

* [Qemu-devel] [PATCH v2 06/14] spapr_pci: add get-sensor-state RTAS interface
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (4 preceding siblings ...)
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 05/14] spapr_pci: add get/set-power-level RTAS interfaces Michael Roth
@ 2013-12-05 22:32 ` Michael Roth
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 07/14] spapr_pci: add ibm,configure-connector " Michael Roth
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:32 UTC (permalink / raw
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

From: Mike Day <ncmike@ncultra.org>

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c     |   70 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |    7 ++++-
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 8df44a3..5c099a8 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -508,6 +508,75 @@ static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     rtas_st(rets, 1, 100);
 }
 
+static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                                  uint32_t token, uint32_t nargs,
+                                  target_ulong args, uint32_t nret,
+                                  target_ulong rets)
+{
+    uint32_t sensor = rtas_ld(args, 0);
+    uint32_t drc_index = rtas_ld(args, 1);
+    uint32_t sensor_state = 0, decoded = 0;
+    uint32_t shift = 0, mask = 0;
+    DrcEntry *drc_entry = NULL;
+
+    if (drc_index == 0) {  /* platform state sensor/indicator */
+        sensor_state = spapr->state;
+    } else { /* we should have a drc entry */
+        drc_entry = spapr_find_drc_entry(drc_index);
+        if (!drc_entry) {
+            DPRINTF("unable to find DRC entry for index %x", drc_index);
+            sensor_state = 0; /* empty */
+            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+            return;
+        }
+        sensor_state = drc_entry->state;
+    }
+    switch (sensor) {
+    case 9:  /* EPOW */
+        shift = INDICATOR_EPOW_SHIFT;
+        mask = INDICATOR_EPOW_MASK;
+        break;
+    case 9001: /* Isolation state */
+        /* decode the current value from the correct bit field */
+        shift = INDICATOR_ISOLATION_SHIFT;
+        mask = INDICATOR_ISOLATION_MASK;
+        break;
+    case 9002: /* DR */
+        shift = INDICATOR_DR_SHIFT;
+        mask = INDICATOR_DR_MASK;
+        break;
+    case 9003: /* entity sense */
+        shift = INDICATOR_ENTITY_SENSE_SHIFT;
+        mask = INDICATOR_ENTITY_SENSE_MASK;
+        break;
+    case 9005: /* global interrupt */
+        shift = INDICATOR_GLOBAL_INTERRUPT_SHIFT;
+        mask = INDICATOR_GLOBAL_INTERRUPT_MASK;
+        break;
+    case 9006: /* error log */
+        shift = INDICATOR_ERROR_LOG_SHIFT;
+        mask = INDICATOR_ERROR_LOG_MASK;
+        break;
+    case 9007: /* identify */
+        shift = INDICATOR_IDENTIFY_SHIFT;
+        mask = INDICATOR_IDENTIFY_MASK;
+        break;
+    case 9009: /* reset */
+        shift = INDICATOR_RESET_SHIFT;
+        mask = INDICATOR_RESET_MASK;
+        break;
+    default:
+        DPRINTF("rtas_get_sensor_state: sensor not implemented: %d",
+                sensor);
+        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+        return;
+    }
+
+    decoded = DECODE_DRC_STATE(sensor_state, mask, shift);
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    rtas_st(rets, 1, decoded);
+}
+
 static int pci_spapr_swizzle(int slot, int pin)
 {
     return (slot + pin) % PCI_NUM_PINS;
@@ -1073,6 +1142,7 @@ void spapr_pci_rtas_init(void)
     spapr_rtas_register("set-indicator", rtas_set_indicator);
     spapr_rtas_register("set-power-level", rtas_set_power_level);
     spapr_rtas_register("get-power-level", rtas_get_power_level);
+    spapr_rtas_register("get-sensor-state", rtas_get_sensor_state);
 }
 
 static void spapr_pci_register_types(void)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index b48c55f..7c8a521 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -337,7 +337,7 @@ DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
 DrcEntry *spapr_phb_to_drc_entry(uint64_t buid);
 DrcEntry *spapr_find_drc_entry(int drc_index);
 
-/* For set-indicator RTAS interface */
+/* For set-indicator/get-sensor-state RTAS interfaces */
 #define INDICATOR_ISOLATION_MASK            0x0001   /* 9001 one bit */
 #define INDICATOR_GLOBAL_INTERRUPT_MASK     0x0002   /* 9005 one bit */
 #define INDICATOR_ERROR_LOG_MASK            0x0004   /* 9006 one bit */
@@ -346,6 +346,7 @@ DrcEntry *spapr_find_drc_entry(int drc_index);
 #define INDICATOR_DR_MASK                   0x00e0   /* 9002 three bits */
 #define INDICATOR_ALLOCATION_MASK           0x0300   /* 9003 two bits */
 #define INDICATOR_EPOW_MASK                 0x1c00   /* 9 three bits */
+#define INDICATOR_ENTITY_SENSE_MASK         0xe000   /* 9003 three bits */
 
 #define INDICATOR_ISOLATION_SHIFT           0x00     /* bit 0 */
 #define INDICATOR_GLOBAL_INTERRUPT_SHIFT    0x01     /* bit 1 */
@@ -355,6 +356,10 @@ DrcEntry *spapr_find_drc_entry(int drc_index);
 #define INDICATOR_DR_SHIFT                  0x05     /* bits 5-7 */
 #define INDICATOR_ALLOCATION_SHIFT          0x08     /* bits 8-9 */
 #define INDICATOR_EPOW_SHIFT                0x0a     /* bits 10-12 */
+#define INDICATOR_ENTITY_SENSE_SHIFT        0x0d     /* bits 13-15 */
+
+#define INDICATOR_ENTITY_SENSE_EMPTY 0
+#define INDICATOR_ENTITY_SENSE_PRESENT 1
 
 #define DECODE_DRC_STATE(state, m, s)                  \
     ((((uint32_t)(state) & (uint32_t)(m))) >> (s))
-- 
1.7.9.5
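
rtas_get_sensor_state() is a switch from sensor token to a mask/shift pair, followed by a single DECODE_DRC_STATE(). The sketch below expresses the same mapping as a lookup table. It is a hedged alternative rather than the patch's code: `struct sensor_field` and `decode_sensor()` are hypothetical names, and a return value of -1 stands in for RTAS_OUT_PARAM_ERROR:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per sensor token: where its value lives in the DRC state word.
 * Masks/shifts are the INDICATOR_* values from this series; identify
 * (9007) and reset (9009) are left out to keep the sketch short. */
struct sensor_field {
    uint32_t token;
    uint32_t mask;
    uint32_t shift;
};

static const struct sensor_field sensor_fields[] = {
    { 9,    0x1c00, 0x0a },   /* EPOW */
    { 9001, 0x0001, 0x00 },   /* isolation state */
    { 9002, 0x00e0, 0x05 },   /* DR indicator */
    { 9003, 0xe000, 0x0d },   /* entity sense */
    { 9005, 0x0002, 0x01 },   /* global interrupt */
    { 9006, 0x0004, 0x02 },   /* error log */
};

/* Returns the decoded field, or -1 for an unknown token (where the patch
 * returns RTAS_OUT_PARAM_ERROR). */
static int64_t decode_sensor(uint32_t state, uint32_t token)
{
    for (size_t i = 0; i < sizeof(sensor_fields) / sizeof(sensor_fields[0]); i++) {
        if (sensor_fields[i].token == token) {
            return (state & sensor_fields[i].mask) >> sensor_fields[i].shift;
        }
    }
    return -1;
}
```

A table keeps the token-to-field mapping in one place, which may help since set-indicator uses the same INDICATOR_* masks.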

* [Qemu-devel] [PATCH v2 07/14] spapr_pci: add ibm,configure-connector RTAS interface
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (5 preceding siblings ...)
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 06/14] spapr_pci: add get-sensor-state RTAS interface Michael Roth
@ 2013-12-05 22:32 ` Michael Roth
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 08/14] memory: add memory_region_find_subregion Michael Roth
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:32 UTC (permalink / raw
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c |  111 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 5c099a8..6e7ee31 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -577,6 +577,115 @@ static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     rtas_st(rets, 1, decoded);
 }
 
+/* configure-connector work area offsets, int32_t units */
+#define CC_IDX_NODE_NAME_OFFSET 2
+#define CC_IDX_PROP_NAME_OFFSET 2
+#define CC_IDX_PROP_LEN 3
+#define CC_IDX_PROP_DATA_OFFSET 4
+
+#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
+#define CC_RET_NEXT_SIB 1
+#define CC_RET_NEXT_CHILD 2
+#define CC_RET_NEXT_PROPERTY 3
+#define CC_RET_PREV_PARENT 4
+#define CC_RET_ERROR RTAS_OUT_HW_ERROR
+#define CC_RET_SUCCESS RTAS_OUT_SUCCESS
+
+static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
+                                         sPAPREnvironment *spapr,
+                                         uint32_t token, uint32_t nargs,
+                                         target_ulong args, uint32_t nret,
+                                         target_ulong rets)
+{
+    uint64_t wa_addr = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 0);
+    DrcEntry *drc_entry = NULL;
+    ConfigureConnectorState *ccs;
+    void *wa_buf;
+    int32_t *wa_buf_int;
+    hwaddr map_len = 0x1024;
+    uint32_t drc_index;
+    int rc = 0, next_offset, tag, prop_len, node_name_len;
+    const struct fdt_property *prop;
+    const char *node_name, *prop_name;
+
+    wa_buf = cpu_physical_memory_map(wa_addr, &map_len, 1);
+    if (!wa_buf) {
+        rc = CC_RET_ERROR;
+        goto error_exit;
+    }
+    wa_buf_int = wa_buf;
+
+    drc_index = *(uint32_t *)wa_buf;
+    drc_entry = spapr_find_drc_entry(drc_index);
+    if (!drc_entry) {
+        rc = -1;
+        goto error_exit;
+    }
+
+    ccs = &drc_entry->cc_state;
+    if (ccs->state == CC_STATE_PENDING) {
+        /* fdt should've been attached to drc_entry during
+         * realize/hotplug
+         */
+        g_assert(ccs->fdt);
+        ccs->depth = 0;
+        ccs->offset = ccs->offset_start;
+        ccs->state = CC_STATE_ACTIVE;
+    }
+
+    if (ccs->state == CC_STATE_IDLE) {
+        rc = -1;
+        goto error_exit;
+    }
+
+retry:
+    tag = fdt_next_tag(ccs->fdt, ccs->offset, &next_offset);
+
+    switch (tag) {
+    case FDT_BEGIN_NODE:
+        ccs->depth++;
+        node_name = fdt_get_name(ccs->fdt, ccs->offset, &node_name_len);
+        wa_buf_int[CC_IDX_NODE_NAME_OFFSET] = CC_VAL_DATA_OFFSET;
+        strcpy(wa_buf + wa_buf_int[CC_IDX_NODE_NAME_OFFSET], node_name);
+        rc = CC_RET_NEXT_CHILD;
+        break;
+    case FDT_END_NODE:
+        ccs->depth--;
+        if (ccs->depth == 0) {
+            /* reached the end of top-level node, declare success */
+            ccs->state = CC_STATE_PENDING;
+            rc = CC_RET_SUCCESS;
+        } else {
+            rc = CC_RET_PREV_PARENT;
+        }
+        break;
+    case FDT_PROP:
+        prop = fdt_get_property_by_offset(ccs->fdt, ccs->offset, &prop_len);
+        prop_name = fdt_string(ccs->fdt, fdt32_to_cpu(prop->nameoff));
+        wa_buf_int[CC_IDX_PROP_NAME_OFFSET] = CC_VAL_DATA_OFFSET;
+        wa_buf_int[CC_IDX_PROP_LEN] = prop_len;
+        wa_buf_int[CC_IDX_PROP_DATA_OFFSET] =
+            CC_VAL_DATA_OFFSET + strlen(prop_name) + 1;
+        strcpy(wa_buf + wa_buf_int[CC_IDX_PROP_NAME_OFFSET], prop_name);
+        memcpy(wa_buf + wa_buf_int[CC_IDX_PROP_DATA_OFFSET],
+               prop->data, prop_len);
+        rc = CC_RET_NEXT_PROPERTY;
+        break;
+    case FDT_END:
+        rc = CC_RET_ERROR;
+        break;
+    default:
+        ccs->offset = next_offset;
+        goto retry;
+    }
+
+    ccs->offset = next_offset;
+
+error_exit:
+    cpu_physical_memory_unmap(wa_buf, 0x1024, 1, 0x1024);
+    rtas_st(rets, 0, rc);
+}
+
 static int pci_spapr_swizzle(int slot, int pin)
 {
     return (slot + pin) % PCI_NUM_PINS;
@@ -1143,6 +1252,8 @@ void spapr_pci_rtas_init(void)
     spapr_rtas_register("set-power-level", rtas_set_power_level);
     spapr_rtas_register("get-power-level", rtas_get_power_level);
     spapr_rtas_register("get-sensor-state", rtas_get_sensor_state);
+    spapr_rtas_register("ibm,configure-connector",
+                        rtas_ibm_configure_connector);
 }
 
 static void spapr_pci_register_types(void)
-- 
1.7.9.5
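
For an FDT_PROP tag, rtas_ibm_configure_connector() fills the guest's work area with int32 header words (property-name offset at index 2, length at index 3, data offset at index 4), followed by the NUL-terminated name and then the raw property bytes. The name starts at byte offset CC_VAL_DATA_OFFSET = (4 + 1) * 4 = 20. The sketch below reproduces that packing into a plain host buffer. `pack_property()` is a hypothetical helper. Like the patch, it stores the header words in host byte order and does no endianness conversion:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Work-area layout constants copied from the patch (int32_t units) */
#define CC_IDX_PROP_NAME_OFFSET 2
#define CC_IDX_PROP_LEN 3
#define CC_IDX_PROP_DATA_OFFSET 4
#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)

/* Hypothetical helper: pack one property into a 4-byte-aligned work area
 * of wa_size bytes. Returns 0 on success, -1 if it would not fit. */
static int pack_property(void *wa, size_t wa_size, const char *name,
                         const void *data, int32_t len)
{
    int32_t *wa_int = wa;
    size_t name_len;

    assert(wa != NULL && name != NULL);
    name_len = strlen(name) + 1;            /* include the terminating NUL */
    if (len < 0 || CC_VAL_DATA_OFFSET + name_len + (size_t)len > wa_size) {
        return -1;
    }
    wa_int[CC_IDX_PROP_NAME_OFFSET] = CC_VAL_DATA_OFFSET;
    wa_int[CC_IDX_PROP_LEN] = len;
    wa_int[CC_IDX_PROP_DATA_OFFSET] = CC_VAL_DATA_OFFSET + (int32_t)name_len;
    memcpy((char *)wa + wa_int[CC_IDX_PROP_NAME_OFFSET], name, name_len);
    memcpy((char *)wa + wa_int[CC_IDX_PROP_DATA_OFFSET], data, (size_t)len);
    return 0;
}
```

The size check is not in the patch, which copies the name and data into its fixed-size mapping without comparing them to the mapped length.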

* [Qemu-devel] [PATCH v2 08/14] memory: add memory_region_find_subregion
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (6 preceding siblings ...)
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 07/14] spapr_pci: add ibm,configure-connector " Michael Roth
@ 2013-12-05 22:32 ` Michael Roth
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 09/14] pci: make pci_bar useable outside pci.c Michael Roth
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:32 UTC (permalink / raw
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Similar to memory_region_find, but only search for overlaps among
regions that are children of the region passed in. This is useful
for finding free ranges within a parent range to map to, in
addition to the use-cases similarly served by memory_region_find.
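
A simplified model of the "find a free range" use: this is roughly what spapr_find_bar_addr() does later in this series. It performs a size-aligned scan that skips past each overlapping child. Here children are plain (addr, size) pairs, and `find_free_aligned()`/`NO_FREE_RANGE` are hypothetical names standing in for the QEMU types and PCI_BAR_UNMAPPED:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t addr;
    uint64_t size;
};

#define NO_FREE_RANGE UINT64_MAX   /* hypothetical "not found" marker */

/* Lowest size-aligned window [a, a + size) with a >= start and
 * a + size <= limit that overlaps none of the busy ranges. */
static uint64_t find_free_aligned(const struct range *busy, size_t n,
                                  uint64_t start, uint64_t size,
                                  uint64_t limit)
{
    uint64_t mask, a;

    assert(size != 0 && (size & (size - 1)) == 0);  /* power of two */
    mask = ~(size - 1);
    a = (start + size - 1) & mask;   /* round start up to alignment */

    while (a + size <= limit) {
        size_t i;

        for (i = 0; i < n; i++) {
            /* same overlap test as memory_region_find_subregion() */
            if (!(busy[i].addr + busy[i].size - 1 < a ||
                  busy[i].addr >= a + size)) {
                break;
            }
        }
        if (i == n) {
            return a;                /* nothing overlaps [a, a + size) */
        }
        /* skip to the first aligned address past the overlapping range */
        a = (busy[i].addr + busy[i].size + size - 1) & mask;
    }
    return NO_FREE_RANGE;
}
```

This sketch accepts a window that ends exactly at `limit`. The patch's final `search_addr + size >= limit` test would reject that case.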

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 include/exec/memory.h |   34 +++++++++++++++++++++++++++++++++
 memory.c              |   50 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 480dfbf..784b262 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -883,6 +883,40 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
                                        hwaddr addr, uint64_t size);
 
 /**
+ * memory_region_find_subregion: translate an address/size relative
+ * to a MemoryRegion into a #MemoryRegionSection corresponding to
+ * a child of that region.
+ *
+ * This is similar to memory_region_find, but locates the first
+ * #MemoryRegion within @mr that overlaps the range, as opposed to
+ * the first #MemoryRegion within @mr's address space.
+ *
+ * Returns a #MemoryRegionSection that describes a contiguous overlap.
+ * It will have the following characteristics:
+ *    .@size = 0 iff no overlap was found
+ *    .@mr is non-%NULL iff an overlap was found
+ *
+ * Remember that in the return value the @offset_within_region is
+ * relative to the returned region (in the .@mr field), not to the
+ * @mr argument.
+ *
+ * Similarly, the .@offset_within_address_space is relative to the
+ * address space that contains both regions, the passed and the
+ * returned one.  However, in the special case where the @mr argument
+ * has no parent (and thus is the root of the address space), the
+ * following will hold:
+ *    .@offset_within_address_space >= @addr
+ *    .@offset_within_address_space + .@size <= @addr + @size
+ *
+ * @mr: a MemoryRegion within which @addr is a relative address
+ * @addr: start of the area within @mr to be searched
+ * @size: size of the area to be searched
+ */
+MemoryRegionSection memory_region_find_subregion(MemoryRegion *mr,
+                                                 hwaddr addr,
+                                                 uint64_t size);
+
+/**
  * address_space_sync_dirty_bitmap: synchronize the dirty log for all memory
  *
  * Synchronizes the dirty page log for an entire address space.
diff --git a/memory.c b/memory.c
index 28f6449..487e710 100644
--- a/memory.c
+++ b/memory.c
@@ -1574,6 +1574,56 @@ bool memory_region_present(MemoryRegion *parent, hwaddr addr)
     return true;
 }
 
+MemoryRegionSection memory_region_find_subregion(MemoryRegion *mr,
+                                                 hwaddr addr, uint64_t size)
+{
+    MemoryRegionSection ret = { 0 };
+    MemoryRegion *submr = NULL;
+
+    QTAILQ_FOREACH(submr, &mr->subregions, subregions_link) {
+        if (!(submr->addr + memory_region_size(submr) - 1 < addr ||
+            submr->addr >= addr + size)) {
+            break;
+        }
+    }
+
+    if (submr) {
+        hwaddr as_addr;
+        MemoryRegion *root;
+        Int128 last_range_addr = int128_make64(addr + size);
+        Int128 last_region_addr =
+            int128_make64(submr->addr + memory_region_size(submr));
+
+        for (root = submr, as_addr = submr->addr; root->parent; ) {
+            root = root->parent;
+            as_addr += root->addr;
+        }
+        ret.mr = submr;
+        ret.size = submr->size;
+        ret.offset_within_address_space = as_addr;
+        /* if the region begins before the range we're checking, subtract the
+         * difference from our offset/size
+         */
+        if (submr->addr <= addr) {
+            ret.offset_within_region = addr - submr->addr;
+            ret.offset_within_address_space += ret.offset_within_region;
+            ret.size = int128_sub(ret.size,
+                                  int128_make64(ret.offset_within_region));
+        }
+        /* if the region extends beyond the range we're checking, subtract the
+         * difference from our size
+         */
+        if (int128_gt(last_region_addr, last_range_addr)) {
+            ret.size = int128_sub(ret.size,
+                                  int128_sub(last_region_addr,
+                                             last_range_addr));
+        }
+        memory_region_ref(ret.mr);
+    }
+
+    return ret;
+}
+
 MemoryRegionSection memory_region_find(MemoryRegion *mr,
                                        hwaddr addr, uint64_t size)
 {
-- 
1.7.9.5
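
The overlap test and clipping in memory_region_find_subregion() can be checked in isolation. This sketch models each child only as an (addr, size) pair in the parent's coordinate space. It uses plain uint64_t instead of QEMU's Int128 and omits both the walk up to the address-space root and the reference counting. `struct child`, `struct overlap` and `find_child_overlap()` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct child {
    uint64_t addr;   /* offset of the child within its parent */
    uint64_t size;
};

struct overlap {
    const struct child *mr;          /* NULL if nothing overlaps */
    uint64_t offset_within_region;
    uint64_t size;
};

/* First child overlapping [addr, addr + size), clipped to that range */
static struct overlap find_child_overlap(const struct child *c, size_t n,
                                         uint64_t addr, uint64_t size)
{
    struct overlap ret = { NULL, 0, 0 };

    assert(size > 0);
    for (size_t i = 0; i < n; i++) {
        if (!(c[i].addr + c[i].size - 1 < addr || c[i].addr >= addr + size)) {
            uint64_t region_end = c[i].addr + c[i].size;
            uint64_t range_end = addr + size;

            ret.mr = &c[i];
            ret.size = c[i].size;
            if (c[i].addr <= addr) {
                /* child starts before the range: skip its leading part */
                ret.offset_within_region = addr - c[i].addr;
                ret.size -= ret.offset_within_region;
            }
            if (region_end > range_end) {
                /* child ends after the range: trim its tail */
                ret.size -= region_end - range_end;
            }
            return ret;
        }
    }
    return ret;
}
```

When a child covers the whole search range, the two subtractions together reduce the result to exactly the range's size, at the matching offset.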

* [Qemu-devel] [PATCH v2 09/14] pci: make pci_bar useable outside pci.c
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (7 preceding siblings ...)
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 08/14] memory: add memory_region_find_subregion Michael Roth
@ 2013-12-05 22:33 ` Michael Roth
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions Michael Roth
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:33 UTC (permalink / raw
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/pci/pci.c         |    3 ++-
 include/hw/pci/pci.h |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index ed32059..f15bbb0 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -103,7 +103,8 @@ static const VMStateDescription vmstate_pcibus = {
         VMSTATE_END_OF_LIST()
     }
 };
-static int pci_bar(PCIDevice *d, int reg)
+
+int pci_bar(PCIDevice *d, int reg)
 {
     uint8_t type;
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index b783e68..8fe7c5d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -325,6 +325,7 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
 int pci_device_load(PCIDevice *s, QEMUFile *f);
 MemoryRegion *pci_address_space(PCIDevice *dev);
 MemoryRegion *pci_address_space_io(PCIDevice *dev);
+int pci_bar(PCIDevice *d, int reg);
 
 typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
 typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
-- 
1.7.9.5
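
pci_bar() maps a region index to the configuration-space offset of the register that holds that region's address. This is what spapr_map_bars() needs before writing a new BAR value. The sketch below re-derives that mapping from the standard PCI configuration-header layout rather than copying QEMU's function body. The constants are the standard offsets, with the same names as in QEMU's PCI headers, and `bar_config_offset()` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_BASE_ADDRESS_0             0x10  /* first BAR register */
#define PCI_ROM_ADDRESS                0x30  /* expansion ROM, type 0 header */
#define PCI_ROM_ADDRESS1               0x38  /* expansion ROM, type 1 (bridge) header */
#define PCI_HEADER_TYPE_BRIDGE         1
#define PCI_HEADER_TYPE_MULTI_FUNCTION 0x80
#define PCI_ROM_SLOT                   6     /* region index QEMU uses for the ROM */

/* header_type: raw value of the header-type config register (offset 0x0e) */
static int bar_config_offset(uint8_t header_type, int reg)
{
    assert(reg >= 0 && reg <= PCI_ROM_SLOT);
    if (reg != PCI_ROM_SLOT) {
        return PCI_BASE_ADDRESS_0 + reg * 4;   /* BARs 0-5 are consecutive */
    }
    /* ignore the multi-function bit when checking for a bridge header */
    header_type = (uint8_t)(header_type & ~PCI_HEADER_TYPE_MULTI_FUNCTION);
    return header_type == PCI_HEADER_TYPE_BRIDGE ? PCI_ROM_ADDRESS1
                                                  : PCI_ROM_ADDRESS;
}
```

For a 64-bit BAR, the upper 32 bits live in the next register, at the returned offset + 4. This corresponds to the `bar_address + 4` write in spapr_map_bars().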

* [Qemu-devel] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (8 preceding siblings ...)
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 09/14] pci: make pci_bar useable outside pci.c Michael Roth
@ 2013-12-05 22:33 ` Michael Roth
  2013-12-05 23:33   ` Peter Maydell
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 11/14] spapr_pci: enable basic hotplug operations Michael Roth
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:33 UTC (permalink / raw
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Some kernels program a 0 address for io regions. PCI 3.0 spec
section 6.2.5.1 doesn't seem to disallow this.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/pci/pci.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index f15bbb0..fe5729c 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1030,7 +1030,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
         /* Check if 32 bit BAR wraps around explicitly.
          * TODO: make priorities correct and remove this work around.
          */
-        if (last_addr <= new_addr || new_addr == 0 || last_addr >= UINT32_MAX) {
+        if (last_addr <= new_addr || last_addr >= UINT32_MAX) {
             return PCI_BAR_UNMAPPED;
         }
         return new_addr;
-- 
1.7.9.5
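
The effect of dropping `new_addr == 0` can be seen by evaluating the old and new conditions side by side. This is a reduced sketch of just the predicate in the hunk above. It assumes `last_addr = new_addr + size - 1`, since that computation sits outside the hunk. The function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old predicate: true means the BAR is treated as unmapped */
static bool bar_unmapped_old(uint64_t new_addr, uint64_t size)
{
    uint64_t last_addr = new_addr + size - 1;
    return last_addr <= new_addr || new_addr == 0 || last_addr >= UINT32_MAX;
}

/* New predicate from the patch: an address of 0 is now accepted */
static bool bar_unmapped_new(uint64_t new_addr, uint64_t size)
{
    uint64_t last_addr = new_addr + size - 1;
    return last_addr <= new_addr || last_addr >= UINT32_MAX;
}
```

Only the zero-address case changes. Wrap-around and regions reaching the 32-bit limit are still rejected.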

* [Qemu-devel] [PATCH v2 11/14] spapr_pci: enable basic hotplug operations
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (9 preceding siblings ...)
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions Michael Roth
@ 2013-12-05 22:33 ` Michael Roth
  2013-12-16  4:36   ` Alexey Kardashevskiy
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 12/14] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:33 UTC (permalink / raw
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

From: Mike Day <ncmike@ncultra.org>

This enables hotplug for PHB bridges. Upon hotplug we generate the
OF-nodes required by PAPR specification and IEEE 1275-1994
"PCI Bus Binding to Open Firmware" for the device.

We associate the corresponding FDT for these nodes with the DrcEntry
corresponding to the slot, which will be fetched via
ibm,configure-connector RTAS calls by the guest as described by PAPR
specification. The FDT is cleaned up in the case of unplug.

Amongst the required OF-node properties for each device are the "reg"
and "assigned-addresses" properties which describe the BAR-assignments
for IO/MEM/ROM regions. To handle these assignments we scan the address
space associated with each region for a contiguous range of appropriate
size based on PCI specification and encode these in accordance with
Open Firmware PCI Bus Binding spec.
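
Each "reg"/"assigned-addresses" entry starts with a phys.hi cell laid out as in the Open Firmware PCI Bus Binding, `npt000ss bbbbbbbb dddddfff rrrrrrrr`. Its fields are the n bit, the space code (ss), the bus, device and function numbers, and the config-space register of the BAR. Below is a hedged sketch of the encoding used by fill_resource_props(), in host byte order (the patch additionally applies cpu_to_be32()). `phys_hi()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OF_SS_IO     0x01000000u   /* ss = 01: I/O space */
#define OF_SS_MEM32  0x02000000u   /* ss = 10: 32-bit memory space */
#define OF_N_BIT     0x80000000u   /* n = 1: non-relocatable */

/* Hypothetical helper mirroring fill_resource_props(): build phys.hi for
 * one BAR. bar_reg is the BAR's config-space offset (e.g. from pci_bar()). */
static uint32_t phys_hi(unsigned bus, unsigned slot, unsigned func,
                        unsigned bar_reg, bool is_io, bool assigned)
{
    uint32_t dev_id, hi;

    assert(bus < 256 && slot < 32 && func < 8);
    dev_id = (bus << 8) | (slot << 3) | func;
    hi = (dev_id << 8) | (bar_reg & 0xff);   /* bbbbbbbb dddddfff rrrrrrrr */
    hi |= is_io ? OF_SS_IO : OF_SS_MEM32;
    if (assigned) {
        hi |= OF_N_BIT;    /* the patch sets this for "assigned-addresses" */
    }
    return hi;
}
```

As in the patch, every memory BAR here gets ss = 10. The binding reserves ss = 11 (0x03000000) for 64-bit memory space, which may matter for 64-bit BARs.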

These assignments will be used by the guest when the rpaphp hotplug
module is used, but may be re-assigned by guests for cases where we
rely on bus rescan.

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c     |  375 ++++++++++++++++++++++++++++++++++++++++++++++--
 include/hw/ppc/spapr.h |    1 +
 2 files changed, 368 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 6e7ee31..9b4f829 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -56,6 +56,17 @@
 #define RTAS_TYPE_MSI           1
 #define RTAS_TYPE_MSIX          2
 
+#define FDT_MAX_SIZE            0x10000
+#define _FDT(exp) \
+    do { \
+        int ret = (exp);                                           \
+        if (ret < 0) {                                             \
+            return ret;                                            \
+        }                                                          \
+    } while (0)
+
+static void spapr_drc_state_reset(DrcEntry *drc_entry);
+
 static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
 {
     sPAPRPHBState *sphb;
@@ -448,6 +459,22 @@ static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         /* encode the new value into the correct bit field */
         shift = INDICATOR_ISOLATION_SHIFT;
         mask = INDICATOR_ISOLATION_MASK;
+        if (drc_entry) {
+            /* transition from unisolated to isolated for a hotplug slot
+             * entails completion of guest-side device unplug/cleanup, so
+             * we can now safely remove the device if qemu is waiting for
+             * it to be released
+             */
+            if (DECODE_DRC_STATE(*pind, mask, shift) != indicator_state) {
+                if (indicator_state == 0 && drc_entry->awaiting_release) {
+                    /* device_del has been called and host is waiting
+                     * for guest to release/isolate device, go ahead
+                     * and remove it now
+                     */
+                    spapr_drc_state_reset(drc_entry);
+                }
+            }
+        }
         break;
     case 9002: /* DR */
         shift = INDICATOR_DR_SHIFT;
@@ -776,6 +803,345 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     return &phb->iommu_as;
 }
 
+/* for 'reg'/'assigned-addresses' OF properties */
+#define RESOURCE_CELLS_SIZE 2
+#define RESOURCE_CELLS_ADDRESS 3
+#define RESOURCE_CELLS_TOTAL \
+    (RESOURCE_CELLS_SIZE + RESOURCE_CELLS_ADDRESS)
+
+static void fill_resource_props(PCIDevice *d, int bus_num,
+                                uint32_t *reg, int *reg_size,
+                                uint32_t *assigned, int *assigned_size)
+{
+    uint32_t *reg_row, *assigned_row;
+    uint32_t dev_id = ((bus_num << 8) |
+                        (PCI_SLOT(d->devfn) << 3) | PCI_FUNC(d->devfn));
+    int i, idx = 0;
+
+    reg[0] = cpu_to_be32(dev_id << 8);
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        if (!d->io_regions[i].size) {
+            continue;
+        }
+        reg_row = &reg[(idx + 1) * RESOURCE_CELLS_TOTAL];
+        assigned_row = &assigned[idx * RESOURCE_CELLS_TOTAL];
+        reg_row[0] = cpu_to_be32((dev_id << 8) | (pci_bar(d, i) & 0xff));
+        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
+            reg_row[0] |= cpu_to_be32(0x01000000);
+        } else {
+            reg_row[0] |= cpu_to_be32(0x02000000);
+        }
+        assigned_row[0] = cpu_to_be32(reg_row[0] | 0x80000000);
+        assigned_row[3] = reg_row[3] = cpu_to_be32(d->io_regions[i].size >> 32);
+        assigned_row[4] = reg_row[4] = cpu_to_be32(d->io_regions[i].size);
+        assigned_row[1] = cpu_to_be32(d->io_regions[i].addr >> 32);
+        assigned_row[2] = cpu_to_be32(d->io_regions[i].addr);
+        idx++;
+    }
+
+    *reg_size = (idx + 1) * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
+    *assigned_size = idx * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
+}
+
+static hwaddr spapr_find_bar_addr(sPAPRPHBState *phb, PCIIORegion *r)
+{
+    MemoryRegionSection mrs = { 0 };
+    hwaddr search_addr;
+    hwaddr size = r->size;
+    hwaddr addr_mask = ~(size - 1);
+    hwaddr increment = size;
+    hwaddr limit;
+
+    if (r->type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
+        /* beginning portion of mmio address space for bus does not get
+         * mapped into system memory, so calculate addr starting at the
+         * corresponding offset into mmio as.
+         */
+        search_addr = (SPAPR_PCI_MEM_WIN_BUS_OFFSET + increment) & addr_mask;
+    } else {
+        search_addr = increment;
+    }
+    limit = memory_region_size(r->address_space);
+
+    do {
+        mrs = memory_region_find_subregion(r->address_space, search_addr, size);
+        if (mrs.mr) {
+            hwaddr mr_last_addr;
+            mr_last_addr = mrs.mr->addr + memory_region_size(mrs.mr) - 1;
+            search_addr = (mr_last_addr + 1) & addr_mask;
+            if (search_addr <= mr_last_addr) {
+                search_addr += increment;
+            }
+            /* this memory region overlaps, unref and continue searching */
+            memory_region_unref(mrs.mr);
+        }
+    } while (int128_nz(mrs.size) && search_addr + size <= limit);
+
+    if (search_addr + size >= limit) {
+        return PCI_BAR_UNMAPPED;
+    }
+
+    return search_addr;
+}
+
+static int spapr_map_bars(sPAPRPHBState *phb, PCIDevice *dev)
+{
+    PCIIORegion *r;
+    int i, ret = -1;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        uint32_t bar_address = pci_bar(dev, i);
+        uint32_t bar_value;
+        uint16_t cmd_value = pci_default_read_config(dev, PCI_COMMAND, 2);
+        hwaddr addr;
+
+        r = &dev->io_regions[i];
+
+        /* this region isn't registered */
+        if (!r->size) {
+            continue;
+        }
+
+        /* find a hw addr we can map */
+        addr = spapr_find_bar_addr(phb, r);
+        if (addr == PCI_BAR_UNMAPPED) {
+            /* we can't find a free range within address space for this BAR */
+            fprintf(stderr,
+                    "Unable to map BAR %d, no free range available\n", i);
+            return -1;
+        }
+        /* we can probably map this region into memory if there is not
+         * a race condition with some other allocator. write the address
+         * to the device BAR which will force a call to pci_update_mappings
+         */
+        if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
+            pci_default_write_config(dev, PCI_COMMAND,
+                                     cmd_value | PCI_COMMAND_IO, 2);
+        } else {
+            pci_default_write_config(dev, PCI_COMMAND,
+                                     cmd_value | PCI_COMMAND_MEMORY, 2);
+        }
+
+        bar_value = addr;
+
+        if (i == PCI_ROM_SLOT) {
+            bar_value |= PCI_ROM_ADDRESS_ENABLE;
+        }
+        /* write the new bar value */
+        pci_default_write_config(dev, bar_address, bar_value, 4);
+
+        /* if this is a 64-bit BAR, we need to also write the
+         * upper 32 bit value.
+         */
+        if (r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+            bar_value = (addr >> 32) & 0xffffffffUL;
+            pci_default_write_config(dev, bar_address + 4, bar_value, 4);
+        }
+        ret = 0;
+    }
+    return ret;
+}
+
+static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
+                                       int phb_index)
+{
+    int slot = PCI_SLOT(dev->devfn);
+    char slotname[16];
+    bool is_bridge = 1;
+    DrcEntry *drc_entry, *drc_entry_slot;
+    uint32_t reg[RESOURCE_CELLS_TOTAL * 8] = { 0 };
+    uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] = { 0 };
+    int reg_size, assigned_size;
+
+    drc_entry = spapr_phb_to_drc_entry(phb_index + SPAPR_PCI_BASE_BUID);
+    g_assert(drc_entry);
+    drc_entry_slot = &drc_entry->child_entries[slot];
+
+    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
+        PCI_HEADER_TYPE_NORMAL) {
+        is_bridge = 0;
+    }
+
+    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
+                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
+    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
+                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
+    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
+                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
+    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
+                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2) << 8));
+
+    _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
+                          pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
+
+    /* if this device is NOT a bridge */
+    if (!is_bridge) {
+        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
+            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
+        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
+            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
+        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
+            pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
+        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
+            pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
+    }
+
+    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
+        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
+
+    /* the following fdt cells are masked off the pci status register */
+    int pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
+    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
+                          PCI_STATUS_DEVSEL_MASK & pci_status));
+    _FDT(fdt_setprop_cell(fdt, offset, "fast-back-to-back",
+                          PCI_STATUS_FAST_BACK & pci_status));
+    _FDT(fdt_setprop_cell(fdt, offset, "66mhz-capable",
+                          PCI_STATUS_66MHZ & pci_status));
+    _FDT(fdt_setprop_cell(fdt, offset, "udf-supported",
+                          PCI_STATUS_UDF & pci_status));
+
+    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
+    snprintf(slotname, sizeof(slotname), "Slot %d", slot + phb_index * 32);
+    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", slotname, strlen(slotname)));
+    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index",
+                          drc_entry_slot->drc_index));
+
+    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
+                          RESOURCE_CELLS_ADDRESS));
+    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
+                          RESOURCE_CELLS_SIZE));
+    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
+                          RESOURCE_CELLS_SIZE));
+    fill_resource_props(dev, phb_index, reg, &reg_size,
+                        assigned, &assigned_size);
+    _FDT(fdt_setprop(fdt, offset, "reg", reg, reg_size));
+    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
+                     assigned, assigned_size));
+
+    return 0;
+}
+
+static int spapr_device_hotplug_add(DeviceState *qdev, PCIDevice *dev)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
+    DrcEntry *drc_entry, *drc_entry_slot;
+    ConfigureConnectorState *ccs;
+    int slot = PCI_SLOT(dev->devfn);
+    int offset, ret;
+    void *fdt_orig, *fdt;
+    char nodename[512];
+    uint32_t encoded = ENCODE_DRC_STATE(INDICATOR_ENTITY_SENSE_PRESENT,
+                                        INDICATOR_ENTITY_SENSE_MASK,
+                                        INDICATOR_ENTITY_SENSE_SHIFT);
+
+    drc_entry = spapr_phb_to_drc_entry(phb->buid);
+    g_assert(drc_entry);
+    drc_entry_slot = &drc_entry->child_entries[slot];
+
+    drc_entry->state &= ~(uint32_t)INDICATOR_ENTITY_SENSE_MASK;
+    drc_entry->state |= encoded; /* DR entity present */
+    drc_entry_slot->state &= ~(uint32_t)INDICATOR_ENTITY_SENSE_MASK;
+    drc_entry_slot->state |= encoded; /* and the slot */
+
+    /* need to allocate memory region for device BARs */
+    spapr_map_bars(phb, dev);
+
+    /* add OF node for pci device and required OF DT properties */
+    fdt_orig = g_malloc0(FDT_MAX_SIZE);
+    offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
+    fdt_begin_node(fdt_orig, "");
+    fdt_end_node(fdt_orig);
+    fdt_finish(fdt_orig);
+
+    fdt = g_malloc0(FDT_MAX_SIZE);
+    fdt_open_into(fdt_orig, fdt, FDT_MAX_SIZE);
+    sprintf(nodename, "pci@%d", slot);
+    offset = fdt_add_subnode(fdt, 0, nodename);
+    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index);
+    g_assert(!ret);
+    g_free(fdt_orig);
+
+    /* hold on to node, configure_connector will pass it to the guest later */
+    ccs = &drc_entry_slot->cc_state;
+    ccs->fdt = fdt;
+    ccs->offset_start = offset;
+    ccs->state = CC_STATE_PENDING;
+    ccs->dev = dev;
+
+    return 0;
+}
+
+/* check whether guest has released/isolated device */
+static bool spapr_drc_state_is_releasable(DrcEntry *drc_entry)
+{
+    return !DECODE_DRC_STATE(drc_entry->state,
+                             INDICATOR_ISOLATION_MASK,
+                             INDICATOR_ISOLATION_SHIFT);
+}
+
+/* finalize device unplug/deletion */
+static void spapr_drc_state_reset(DrcEntry *drc_entry)
+{
+    ConfigureConnectorState *ccs = &drc_entry->cc_state;
+    uint32_t sense_empty = ENCODE_DRC_STATE(INDICATOR_ENTITY_SENSE_EMPTY,
+                                            INDICATOR_ENTITY_SENSE_MASK,
+                                            INDICATOR_ENTITY_SENSE_SHIFT);
+
+    g_free(ccs->fdt);
+    ccs->fdt = NULL;
+    object_unparent(OBJECT(ccs->dev));
+    ccs->dev = NULL;
+    ccs->state = CC_STATE_IDLE;
+    drc_entry->state &= ~INDICATOR_ENTITY_SENSE_MASK;
+    drc_entry->state |= sense_empty;
+    drc_entry->awaiting_release = false;
+}
+
+static void spapr_device_hotplug_remove(DeviceState *qdev, PCIDevice *dev)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
+    DrcEntry *drc_entry, *drc_entry_slot;
+    ConfigureConnectorState *ccs;
+    int slot = PCI_SLOT(dev->devfn);
+
+    drc_entry = spapr_phb_to_drc_entry(phb->buid);
+    g_assert(drc_entry);
+    drc_entry_slot = &drc_entry->child_entries[slot];
+    ccs = &drc_entry_slot->cc_state;
+    /* shouldn't be removing devices we haven't created an fdt for */
+    g_assert(ccs->state != CC_STATE_IDLE);
+    /* if the device has already been released/isolated by guest, go ahead
+     * and remove it now. Otherwise, flag it as pending guest release so it
+     * can be removed later
+     */
+    if (spapr_drc_state_is_releasable(drc_entry_slot)) {
+        spapr_drc_state_reset(drc_entry_slot);
+    } else {
+        if (drc_entry_slot->awaiting_release) {
+            fprintf(stderr, "waiting for guest to release the device\n");
+        } else {
+            drc_entry_slot->awaiting_release = true;
+        }
+    }
+}
+
+static int spapr_device_hotplug(DeviceState *qdev, PCIDevice *dev,
+                                PCIHotplugState state)
+{
+    if (state == PCI_COLDPLUG_ENABLED) {
+        return 0;
+    }
+
+    if (state == PCI_HOTPLUG_ENABLED) {
+        spapr_device_hotplug_add(qdev, dev);
+    } else {
+        spapr_device_hotplug_remove(qdev, dev);
+    }
+
+    return 0;
+}
+
 static int spapr_phb_init(SysBusDevice *s)
 {
     DeviceState *dev = DEVICE(s);
@@ -889,6 +1255,7 @@ static int spapr_phb_init(SysBusDevice *s)
                            &sphb->memspace, &sphb->iospace,
                            PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
     phb->bus = bus;
+    pci_bus_hotplug(phb->bus, spapr_device_hotplug, DEVICE(sphb));
 
     sphb->dma_window_start = 0;
     sphb->dma_window_size = 0x40000000;
@@ -1181,14 +1548,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
         return bus_off;
     }
 
-#define _FDT(exp) \
-    do { \
-        int ret = (exp);                                           \
-        if (ret < 0) {                                             \
-            return ret;                                            \
-        }                                                          \
-    } while (0)
-
     /* Write PHB properties */
     _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
     _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 7c8a521..1c9b725 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -328,6 +328,7 @@ struct DrcEntry {
     void *fdt;
     int fdt_offset;
     uint32_t state;
+    bool awaiting_release;
     ConfigureConnectorState cc_state;
     DrcEntry *child_entries;
 };
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH v2 12/14] spapr_events: re-use EPOW event infrastructure for hotplug events
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (10 preceding siblings ...)
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 11/14] spapr_pci: enable basic hotplug operations Michael Roth
@ 2013-12-05 22:33 ` Michael Roth
  2013-12-16  5:05   ` Alexey Kardashevskiy
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 13/14] spapr_events: event-scan RTAS interface Michael Roth
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:33 UTC
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

From: Nathan Fontenot <nfont@linux.vnet.ibm.com>

This extends the data structures currently used to report EPOW events to
guests via the check-exception RTAS interface so that they also cover
hotplug/unplug events.

This event type is currently undocumented and is being finalized for
inclusion in the PAPR specification, but we implement it here as an
extension for guest userspace tools to consume (existing guest kernels
simply log these events via a sysfs interface that's read by rtas_errd).
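
A minimal C sketch of how the new hotplug section could be laid out on the wire. It assumes the 8-byte v6 section header used elsewhere in spapr_events.c (id, length, version, subtype, creator component id) followed by the four one-byte fields and a 4-byte DRC union, i.e. 16 bytes in total; pack_hp_section() and the put_be*() helpers are hypothetical names, not QEMU functions. Every multi-byte field, including the DRC index, is written big-endian to match the RTAS_LOG_V6_B0_BIGENDIAN flag set in the v6 header:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants copied from the patch's struct rtas_event_log_v6_hp. */
#define HP_SECTION_ID    0x4850  /* ASCII "HP" */
#define HP_TYPE_PCI      5
#define HP_ACTION_ADD    1
#define HP_ACTION_REMOVE 2
#define HP_ID_DRC_INDEX  2
/* Assumed size: 8-byte header + 4 one-byte fields + 4-byte DRC union. */
#define HP_SECTION_LEN   16

/* Store values big-endian regardless of host byte order, which is what
 * cpu_to_be16()/cpu_to_be32() achieve in the QEMU code. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical: serialize a PCI hotplug section identified by DRC index. */
static size_t pack_hp_section(uint8_t buf[HP_SECTION_LEN], uint8_t action,
                              uint32_t drc_index)
{
    memset(buf, 0, HP_SECTION_LEN);
    put_be16(buf + 0, HP_SECTION_ID);   /* hdr.section_id */
    put_be16(buf + 2, HP_SECTION_LEN);  /* hdr.section_length */
    buf[4] = 1;                         /* hdr.section_version */
    /* buf[5] (subtype) and buf[6..7] (creator component id) stay 0 */
    buf[8] = HP_TYPE_PCI;               /* hotplug_type */
    buf[9] = action;                    /* hotplug_action */
    buf[10] = HP_ID_DRC_INDEX;          /* hotplug_identifier */
    /* buf[11] is reserved */
    put_be32(buf + 12, drc_index);      /* drc.index, big-endian */
    return HP_SECTION_LEN;
}
```

For a PCI add event on a slot whose DRC index is 0x20000008 (an illustrative value), the section starts with the bytes 'H' 'P' and ends with 20 00 00 08, independent of the host's byte order.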

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    2 +-
 hw/ppc/spapr_events.c  |  219 +++++++++++++++++++++++++++++++++++++++---------
 include/hw/ppc/spapr.h |    4 +-
 3 files changed, 184 insertions(+), 41 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2250ee1..7079e4e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1522,7 +1522,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
                                             kernel_size, kernel_le,
                                             boot_device, kernel_cmdline,
-                                            spapr->epow_irq);
+                                            spapr->check_exception_irq);
     assert(spapr->fdt_skel != NULL);
 }
 
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 16fa49e..9dfdbcf 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -32,6 +32,8 @@
 
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_vio.h"
+#include "hw/pci/pci.h"
+#include "hw/pci-host/spapr.h"
 
 #include <libfdt.h>
 
@@ -77,6 +79,7 @@ struct rtas_error_log {
 #define   RTAS_LOG_TYPE_ECC_UNCORR              0x00000009
 #define   RTAS_LOG_TYPE_ECC_CORR                0x0000000a
 #define   RTAS_LOG_TYPE_EPOW                    0x00000040
+#define   RTAS_LOG_TYPE_HOTPLUG                 0x000000e5
     uint32_t extended_length;
 } QEMU_PACKED;
 
@@ -166,6 +169,38 @@ struct epow_log_full {
     struct rtas_event_log_v6_epow epow;
 } QEMU_PACKED;
 
+struct rtas_event_log_v6_hp {
+#define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
+    struct rtas_event_log_v6_section_header hdr;
+    uint8_t hotplug_type;
+#define RTAS_LOG_V6_HP_TYPE_CPU                          1
+#define RTAS_LOG_V6_HP_TYPE_MEMORY                       2
+#define RTAS_LOG_V6_HP_TYPE_SLOT                         3
+#define RTAS_LOG_V6_HP_TYPE_PHB                          4
+#define RTAS_LOG_V6_HP_TYPE_PCI                          5
+    uint8_t hotplug_action;
+#define RTAS_LOG_V6_HP_ACTION_ADD                        1
+#define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
+    uint8_t hotplug_identifier;
+#define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
+#define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
+    uint8_t reserved;
+    union {
+        uint32_t index;
+        uint32_t count;
+        char name[1];
+    } drc;
+} QEMU_PACKED;
+
+struct hp_log_full {
+    struct rtas_error_log hdr;
+    struct rtas_event_log_v6 v6hdr;
+    struct rtas_event_log_v6_maina maina;
+    struct rtas_event_log_v6_mainb mainb;
+    struct rtas_event_log_v6_hp hp;
+} QEMU_PACKED;
+
 #define EVENT_MASK_INTERNAL_ERRORS           0x80000000
 #define EVENT_MASK_EPOW                      0x40000000
 #define EVENT_MASK_HOTPLUG                   0x10000000
@@ -181,29 +216,61 @@ struct epow_log_full {
         }                                                          \
     } while (0)
 
-void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
+void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
 {
-    uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
-    uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
+    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
+    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
 
     _FDT((fdt_begin_node(fdt, "event-sources")));
 
     _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
     _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
     _FDT((fdt_property(fdt, "interrupt-ranges",
-                       epow_irq_ranges, sizeof(epow_irq_ranges))));
+                       irq_ranges, sizeof(irq_ranges))));
 
     _FDT((fdt_begin_node(fdt, "epow-events")));
-    _FDT((fdt_property(fdt, "interrupts",
-                       epow_interrupts, sizeof(epow_interrupts))));
+    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
     _FDT((fdt_end_node(fdt)));
 
     _FDT((fdt_end_node(fdt)));
 }
 
 static struct epow_log_full *pending_epow;
+static struct hp_log_full *pending_hp;
 static uint32_t next_plid;
 
+static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
+{
+    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
+        | RTAS_LOG_V6_B0_BIGENDIAN;
+    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
+        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
+    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
+}
+
+static void spapr_init_maina(struct rtas_event_log_v6_maina *maina,
+                             int section_count)
+{
+    struct tm tm;
+    int year;
+
+    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
+    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
+    /* FIXME: section version, subtype and creator id? */
+    qemu_get_timedate(&tm, spapr->rtc_offset);
+    year = tm.tm_year + 1900;
+    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
+                                       | (to_bcd(year % 100) << 16)
+                                       | (to_bcd(tm.tm_mon + 1) << 8)
+                                       | to_bcd(tm.tm_mday));
+    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
+                                       | (to_bcd(tm.tm_min) << 16)
+                                       | (to_bcd(tm.tm_sec) << 8));
+    maina->creator_id = 'H'; /* Hypervisor */
+    maina->section_count = section_count;
+    maina->plid = next_plid++;
+}
+
 static void spapr_powerdown_req(Notifier *n, void *opaque)
 {
     sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
@@ -212,8 +279,6 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     struct rtas_event_log_v6_maina *maina;
     struct rtas_event_log_v6_mainb *mainb;
     struct rtas_event_log_v6_epow *epow;
-    struct tm tm;
-    int year;
 
     if (pending_epow) {
         /* For now, we just throw away earlier events if two come
@@ -237,27 +302,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
                                        - sizeof(pending_epow->hdr));
 
-    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
-        | RTAS_LOG_V6_B0_BIGENDIAN;
-    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
-        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
-    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
-
-    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
-    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
-    /* FIXME: section version, subtype and creator id? */
-    qemu_get_timedate(&tm, spapr->rtc_offset);
-    year = tm.tm_year + 1900;
-    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
-                                       | (to_bcd(year % 100) << 16)
-                                       | (to_bcd(tm.tm_mon + 1) << 8)
-                                       | to_bcd(tm.tm_mday));
-    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
-                                       | (to_bcd(tm.tm_min) << 16)
-                                       | (to_bcd(tm.tm_sec) << 8));
-    maina->creator_id = 'H'; /* Hypervisor */
-    maina->section_count = 3; /* Main-A, Main-B and EPOW */
-    maina->plid = next_plid++;
+    spapr_init_v6hdr(v6hdr);
+    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
 
     mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
     mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
@@ -274,9 +320,93 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
     epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
     epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
 
-    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
+    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
+}
+
+static void spapr_hotplug_req_event(uint8_t hp_type, uint8_t hp_action,
+                                    sPAPRPHBState *phb, int slot)
+{
+    struct rtas_error_log *hdr;
+    struct rtas_event_log_v6 *v6hdr;
+    struct rtas_event_log_v6_maina *maina;
+    struct rtas_event_log_v6_mainb *mainb;
+    struct rtas_event_log_v6_hp *hp;
+    DrcEntry *drc_entry;
+
+    if (pending_hp) {
+        /* Just toss any pending hotplug events for now, this will
+         * need to be fixed later on.
+         */
+        g_free(pending_hp);
+    }
+
+    pending_hp = g_malloc0(sizeof(*pending_hp));
+    hdr = &pending_hp->hdr;
+    v6hdr = &pending_hp->v6hdr;
+    maina = &pending_hp->maina;
+    mainb = &pending_hp->mainb;
+    hp = &pending_hp->hp;
+
+    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
+                               | RTAS_LOG_SEVERITY_EVENT
+                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
+                               | RTAS_LOG_OPTIONAL_PART_PRESENT
+                               | RTAS_LOG_INITIATOR_HOTPLUG
+                               | RTAS_LOG_TYPE_HOTPLUG);
+    hdr->extended_length = cpu_to_be32(sizeof(*pending_hp)
+                                       - sizeof(pending_hp->hdr));
+
+    spapr_init_v6hdr(v6hdr);
+    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
+
+    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
+    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
+    mainb->subsystem_id = 0x80; /* External environment */
+    mainb->event_severity = 0x00; /* Informational / non-error */
+    mainb->event_subtype = 0x00; /* Normal shutdown */
+
+    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
+    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
+    hp->hdr.section_version = 1; /* includes extended modifier */
+    hp->hotplug_action = hp_action;
+
+    hp->hotplug_type = hp_type;
+
+    drc_entry = spapr_phb_to_drc_entry(phb->buid);
+    if (!drc_entry) {
+        drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
+    }
+
+    switch (hp_type) {
+    case RTAS_LOG_V6_HP_TYPE_PCI:
+        hp->drc.index = cpu_to_be32(drc_entry->child_entries[slot].drc_index);
+        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
+        break;
+    }
+
+    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
+}
+
+void spapr_pci_hotplug_add_event(DeviceState *qdev, int slot)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
+
+    spapr_hotplug_req_event(RTAS_LOG_V6_HP_TYPE_PCI,
+                            RTAS_LOG_V6_HP_ACTION_ADD, phb, slot);
 }
 
+void spapr_pci_hotplug_remove_event(DeviceState *qdev, int slot)
+{
+    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
+
+    /* TODO: removal is generally initiated by guest, need to
+     * document what exactly the guest is supposed to do with
+     * this event. What does ACPI or shpc do?
+     */
+    spapr_hotplug_req_event(RTAS_LOG_V6_HP_TYPE_PCI,
+                            RTAS_LOG_V6_HP_ACTION_REMOVE, phb, slot);
+}
+
 static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                             uint32_t token, uint32_t nargs,
                             target_ulong args,
@@ -298,15 +428,26 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
     }
 
-    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
-        if (sizeof(*pending_epow) < len) {
-            len = sizeof(*pending_epow);
-        }
+    if (mask & EVENT_MASK_EPOW) {
+        if (pending_epow) {
+            if (sizeof(*pending_epow) < len) {
+                len = sizeof(*pending_epow);
+            }
 
-        cpu_physical_memory_write(buf, pending_epow, len);
-        g_free(pending_epow);
-        pending_epow = NULL;
-        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+            cpu_physical_memory_write(buf, pending_epow, len);
+            g_free(pending_epow);
+            pending_epow = NULL;
+            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+        } else if (pending_hp) {
+            if (sizeof(*pending_hp) < len) {
+                len = sizeof(*pending_hp);
+            }
+
+            cpu_physical_memory_write(buf, pending_hp, len);
+            g_free(pending_hp);
+            pending_hp = NULL;
+            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+        }
     } else {
         rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
     }
@@ -314,7 +455,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
 void spapr_events_init(sPAPREnvironment *spapr)
 {
-    spapr->epow_irq = spapr_allocate_msi(0);
+    spapr->check_exception_irq = spapr_allocate_msi(0);
     spapr->epow_notifier.notify = spapr_powerdown_req;
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register("check-exception", check_exception);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 1c9b725..9eef2ce 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -31,7 +31,7 @@ typedef struct sPAPREnvironment {
     uint64_t rtc_offset;
     bool has_graphics;
 
-    uint32_t epow_irq;
+    uint32_t check_exception_irq;
     Notifier epow_notifier;
 
     /* Migration state */
@@ -473,5 +473,7 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
                  uint32_t liobn, uint64_t window, uint32_t size);
 int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
                       sPAPRTCETable *tcet);
+void spapr_pci_hotplug_add_event(DeviceState *qdev, int slot);
+void spapr_pci_hotplug_remove_event(DeviceState *qdev, int slot);
 
 #endif /* !defined (__HW_SPAPR_H__) */
-- 
1.7.9.5

* [Qemu-devel] [PATCH v2 13/14] spapr_events: event-scan RTAS interface
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (11 preceding siblings ...)
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 12/14] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
@ 2013-12-05 22:33 ` Michael Roth
  2013-12-16  4:57   ` Alexey Kardashevskiy
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 14/14] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
  2014-01-10  8:29 ` [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Alexey Kardashevskiy
  14 siblings, 1 reply; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:33 UTC
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

We don't actually rely on this interface to surface hotplug events;
instead we rely on the similar, but interrupt-driven, check-exception
RTAS interface used for EPOW events. However, guest kernels only
initialize their event-reporting interfaces (which userspace tools in
turn use to handle these events) if this interface exists, so we
implement it as a stub.
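
The stub's only job is to hand back a well-formed status word. A minimal sketch of that behaviour, modelling the RTAS return buffer as a plain byte array and assuming, as QEMU's rtas_st() does, that return word n is stored as a big-endian 32-bit value at rets + 4 * n. The *_sketch functions are hypothetical stand-ins, not QEMU APIs:

```c
#include <stdint.h>

#define RTAS_OUT_NO_ERRORS_FOUND 1  /* status the stub always returns */

/* Model of rtas_st(): store val as big-endian word n of the return buffer. */
static void rtas_st_sketch(uint8_t *rets, int n, uint32_t val)
{
    uint8_t *p = rets + 4 * n;

    p[0] = (uint8_t)(val >> 24);
    p[1] = (uint8_t)(val >> 16);
    p[2] = (uint8_t)(val >> 8);
    p[3] = (uint8_t)val;
}

/* Model of how the guest reads word n of the return buffer back. */
static uint32_t read_ret_sketch(const uint8_t *rets, int n)
{
    const uint8_t *p = rets + 4 * n;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Equivalent of the patch's event_scan(): never reports an event, since
 * hotplug notifications arrive through check-exception instead. */
static void event_scan_sketch(uint8_t *rets)
{
    rtas_st_sketch(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
}
```

Whatever the buffer held before the call, the guest sees status 1 ("no errors found") and has nothing further to parse.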

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c         |    1 +
 hw/ppc/spapr_events.c  |    9 +++++++++
 include/hw/ppc/spapr.h |    2 ++
 3 files changed, 12 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7079e4e..e7a249b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -643,6 +643,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
         refpoints, sizeof(refpoints))));
 
     _FDT((fdt_property_cell(fdt, "rtas-error-log-max", RTAS_ERROR_LOG_MAX)));
+    _FDT((fdt_property_cell(fdt, "rtas-event-scan-rate", RTAS_EVENT_SCAN_RATE)));
 
     _FDT((fdt_end_node(fdt)));
 
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 9dfdbcf..69211c5 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -453,10 +453,19 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     }
 }
 
+static void event_scan(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                       uint32_t token, uint32_t nargs,
+                       target_ulong args,
+                       uint32_t nret, target_ulong rets)
+{
+    rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
+}
+
 void spapr_events_init(sPAPREnvironment *spapr)
 {
     spapr->check_exception_irq = spapr_allocate_msi(0);
     spapr->epow_notifier.notify = spapr_powerdown_req;
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register("check-exception", check_exception);
+    spapr_rtas_register("event-scan", event_scan);
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9eef2ce..293375b 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -445,6 +445,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
 
 #define RTAS_ERROR_LOG_MAX      2048
 
+#define RTAS_EVENT_SCAN_RATE    1
+
 typedef struct sPAPRTCETable sPAPRTCETable;
 
 #define TYPE_SPAPR_TCE_TABLE "spapr-tce-table"
-- 
1.7.9.5

* [Qemu-devel] [PATCH v2 14/14] spapr_pci: emit hotplug add/remove events during hotplug
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (12 preceding siblings ...)
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 13/14] spapr_events: event-scan RTAS interface Michael Roth
@ 2013-12-05 22:33 ` Michael Roth
  2013-12-16  5:06   ` Alexey Kardashevskiy
  2014-01-10  8:29 ` [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Alexey Kardashevskiy
  14 siblings, 1 reply; 39+ messages in thread
From: Michael Roth @ 2013-12-05 22:33 UTC
  To: qemu-devel; +Cc: aik, agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

This uses the extension of the existing EPOW interrupt/event mechanism
to notify userspace tools like librtas/drmgr so that they can handle
in-guest configuration/cleanup operations in response to
device_add/device_del.

Userspace tools that don't implement this extension will need to be
run manually after device_add and before device_del, respectively.
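
The resulting control flow in spapr_device_hotplug() can be sketched as a standalone program. HotplugStateSketch stands in for QEMU's PCIHotplugState, and the sketch_* functions are hypothetical placeholders that only record what would happen, so the ordering (stage the device state first, then raise the event) can be checked:

```c
/* Stand-in for QEMU's PCIHotplugState. */
typedef enum {
    SKETCH_HOTPLUG_DISABLED,
    SKETCH_HOTPLUG_ENABLED,
    SKETCH_COLDPLUG_ENABLED,
} HotplugStateSketch;

/* Actions recorded by the placeholders below. */
enum {
    ACT_STAGE_NODE = 1,  /* like spapr_device_hotplug_add() */
    ACT_ADD_EVENT,       /* like spapr_pci_hotplug_add_event() */
    ACT_RELEASE_SLOT,    /* like spapr_device_hotplug_remove() */
    ACT_REMOVE_EVENT,    /* like spapr_pci_hotplug_remove_event() */
};

static int action_log[8];
static int action_count;

static void record(int action)
{
    if (action_count < 8) {
        action_log[action_count++] = action;
    }
}

static void sketch_hotplug_add(int slot)    { (void)slot; record(ACT_STAGE_NODE); }
static void sketch_add_event(int slot)      { (void)slot; record(ACT_ADD_EVENT); }
static void sketch_hotplug_remove(int slot) { (void)slot; record(ACT_RELEASE_SLOT); }
static void sketch_remove_event(int slot)   { (void)slot; record(ACT_REMOVE_EVENT); }

static int sketch_device_hotplug(int slot, HotplugStateSketch state)
{
    if (state == SKETCH_COLDPLUG_ENABLED) {
        return 0;                     /* present at boot: nothing to announce */
    }

    if (state == SKETCH_HOTPLUG_ENABLED) {
        sketch_hotplug_add(slot);     /* build the node configure-connector hands out */
        sketch_add_event(slot);       /* then tell the guest to fetch it */
    } else {
        sketch_hotplug_remove(slot);  /* remove now, or flag as awaiting release */
        sketch_remove_event(slot);    /* ask guest tools to release the device */
    }
    return 0;
}
```

Staging the device before raising the event means that, by the time guest tools react to the interrupt, the device-tree node they will request is already in place.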

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 9b4f829..9821462 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1129,14 +1129,18 @@ static void spapr_device_hotplug_remove(DeviceState *qdev, PCIDevice *dev)
 static int spapr_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state)
 {
+    int slot = PCI_SLOT(dev->devfn);
+
     if (state == PCI_COLDPLUG_ENABLED) {
         return 0;
     }
 
     if (state == PCI_HOTPLUG_ENABLED) {
         spapr_device_hotplug_add(qdev, dev);
+        spapr_pci_hotplug_add_event(qdev, slot);
     } else {
         spapr_device_hotplug_remove(qdev, dev);
+        spapr_pci_hotplug_remove_event(qdev, slot);
     }
 
     return 0;
-- 
1.7.9.5

* Re: [Qemu-devel] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions Michael Roth
@ 2013-12-05 23:33   ` Peter Maydell
  2013-12-10 21:42     ` Michael Roth
  2013-12-12 14:34     ` [Qemu-devel] " Michael S. Tsirkin
  0 siblings, 2 replies; 39+ messages in thread
From: Peter Maydell @ 2013-12-05 23:33 UTC
  To: Michael Roth
  Cc: qemu-ppc@nongnu.org, Alexey Kardashevskiy, QEMU Developers,
	Alexander Graf, Mike Day, Paul Mackerras, tyreld, nfont

On 5 December 2013 22:33, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> Some kernels program a 0 address for io regions. PCI 3.0 spec
> sectio 6.2.5.1 doesn't seem to disallow this.

Hmm. The last PCI spec I looked at said 0 wasn't a valid MMIO
address, so the variant of this patch I wrote a while back made it
a per PCI device flag whether a particular device let you get away
with it:
 http://patchwork.ozlabs.org/patch/269133/

(the device in question for me was the versatile-pci host bridge).
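
A per-device opt-in of that kind might look roughly like the following sketch. bar_effective_address() and BAR_UNMAPPED are hypothetical, loosely modelled on QEMU's pci_bar_address() and PCI_BAR_UNMAPPED, and allow_zero stands for the per-device flag; with the flag clear, a programmed address of 0 is treated as "not mapped":

```c
#include <stdbool.h>
#include <stdint.h>

#define BAR_UNMAPPED UINT64_MAX  /* hypothetical "not mapped" marker */

/* Return the address a BAR should be mapped at, or BAR_UNMAPPED.
 * allow_zero models a per-device flag that permits address 0. */
static uint64_t bar_effective_address(uint64_t new_addr, uint64_t size,
                                      bool allow_zero)
{
    if (size == 0) {
        return BAR_UNMAPPED;            /* unimplemented BAR */
    }
    if (new_addr == 0 && !allow_zero) {
        return BAR_UNMAPPED;            /* 0 taken to mean "not programmed" */
    }
    if (new_addr + size - 1 < new_addr) {
        return BAR_UNMAPPED;            /* region would wrap past the top */
    }
    return new_addr;
}
```

This keeps rejecting 0 by default and only accepts it for devices that opt in, so existing devices keep their current behaviour.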

And presumably whoever put that specific check for 0 into
QEMU had a reason for it.

On the other hand I can't now find whatever document it was
that I was reading that claimed 0 wasn't valid :-(

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions
  2013-12-05 23:33   ` Peter Maydell
@ 2013-12-10 21:42     ` Michael Roth
  2013-12-10 22:14       ` Peter Maydell
  2013-12-12 14:34     ` [Qemu-devel] " Michael S. Tsirkin
  1 sibling, 1 reply; 39+ messages in thread
From: Michael Roth @ 2013-12-10 21:42 UTC
  To: Peter Maydell
  Cc: qemu-ppc@nongnu.org, Alexey Kardashevskiy, QEMU Developers,
	Alexander Graf, Mike Day, Paul Mackerras, tyreld, nfont

Quoting Peter Maydell (2013-12-05 17:33:48)
> On 5 December 2013 22:33, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > Some kernels program a 0 address for io regions. PCI 3.0 spec
> > sectio 6.2.5.1 doesn't seem to disallow this.
> 
> Hmm. The last PCI spec I looked at said 0 wasn't a valid MMIO
> address, so the variant of this patch I wrote a while back made it
> a per PCI device flag whether a particular device let you get away
> with it:
>  http://patchwork.ozlabs.org/patch/269133/
> 
> (the device in question for me was the versatile-pci host bridge).
> 
> And presumably whoever put that specific check for 0 into
> QEMU had a reason for it.
> 
> On the other hand I can't now find whatever document it was
> that I was reading that claimed 0 wasn't valid :-(

Can't seem to find anything either, checked the 2.3 spec as well. I tried to
look up the git history for the new_addr == 0 check but unfortunately it seemed
to be part of the initial check-in.

The only clue I've found regarding special-casing for a 0-bar is this:

"Power-up software can determine how much address space the device requires by
writing a value of all 1's to the register and then reading the value back. The
device will return 0's in all don't-care address bits, effectively specifying
the address space required. Unimplemented Base Address registers are hardwired
to zero." - PCI 3.0, 6.2.5.1

But that's only applicable in cases where we're sizing the bar. However, the
way things are implemented in pci_bar_address(), update hooks for
unimplemented/zero-sized bars as well as zero-address bars would be handled
by the same code, so I wonder if that was initially added to check the former?
It's a bit of a stretch, since QEMU sets the reported sizes and not the guest,
but maybe?

Does that seem familiar wrt the documentation you mentioned, or was it something
else?
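
For reference, the sizing procedure quoted above works roughly like this. Software writes all 1's to the BAR and reads it back. It then masks off the low type/flag bits. Because don't-care address bits read back as 0, the lowest remaining set bit gives the size, and an unimplemented BAR reads back as 0. The sketch below assumes the read-back value is already available. bar_size_from_readback() is a hypothetical helper, not a QEMU function:

```c
#include <assert.h>
#include <stdint.h>

/* Low bits of a BAR carry type information, not address bits:
 * memory BARs use bits 0-3 (space, type, prefetchable),
 * I/O BARs use bits 0-1. */
#define MEM_BAR_FLAG_BITS 0xfu
#define IO_BAR_FLAG_BITS  0x3u

/* Hypothetical helper: derive a BAR's size from the value read back
 * after writing all 1's. Returns 0 for an unimplemented BAR, which is
 * hardwired to zero. */
static uint32_t bar_size_from_readback(uint32_t readback)
{
    /* bit 0 set means this is an I/O BAR */
    uint32_t flags = (readback & 0x1u) ? IO_BAR_FLAG_BITS : MEM_BAR_FLAG_BITS;
    uint32_t addr_bits = readback & ~flags;

    if (addr_bits == 0) {
        return 0; /* unimplemented BAR: reads back as zero */
    }
    /* Don't-care address bits read back as 0, so the lowest set
     * address bit is the size (this also copes with I/O BARs whose
     * upper 16 bits read back as 0). */
    return addr_bits & (~addr_bits + 1);
}
```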

> 
> thanks
> -- PMM

* Re: [Qemu-devel] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions
  2013-12-10 21:42     ` Michael Roth
@ 2013-12-10 22:14       ` Peter Maydell
  2013-12-10 23:03         ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2013-12-10 22:14 UTC (permalink / raw
  To: Michael Roth
  Cc: qemu-ppc@nongnu.org, Alexey Kardashevskiy, QEMU Developers,
	Alexander Graf, Mike Day, Paul Mackerras, tyreld, nfont

On 10 December 2013 21:42, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> Quoting Peter Maydell (2013-12-05 17:33:48)
>> And presumably whoever put that specific check for 0 into
>> QEMU had a reason for it.
>>
>> On the other hand I can't now find whatever document it was
>> that I was reading that claimed 0 wasn't valid :-(
>
> Can't seem to find anything either, checked the 2.3 spec as well. I tried to
> look up the git history for the new_addr == 0 check but unfortunately it seemed
> to be part of the initial check-in.
>
> The only clue I've found regarding special-casing for a 0-bar is this:
>
> "Power-up software can determine how much address space the device requires by
> writing a value of all 1's to the register and then reading the value back. The
> device will return 0's in all don't-care address bits, effectively specifying
> the address space required. Unimplemented Base Address registers are hardwired
> to zero." - PCI 3.0, 6.2.5.1

Googling again brought up this mailing list thread:

http://www.pcisig.com/reflector/msg00459.html

which includes what is supposedly a quote from the PCI 2.1 spec:

# "Note: A Base Address register does not contain a valid
# address when it is equal to "0""

(I don't have access to the 2.1 version to check.)

This text seems to have been removed from the 2.2 spec.

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions
  2013-12-10 22:14       ` Peter Maydell
@ 2013-12-10 23:03         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 39+ messages in thread
From: Benjamin Herrenschmidt @ 2013-12-10 23:03 UTC (permalink / raw
  To: Peter Maydell
  Cc: Michael Roth, QEMU Developers, Mike Day, Paul Mackerras, tyreld,
	nfont, qemu-ppc@nongnu.org

On Tue, 2013-12-10 at 22:14 +0000, Peter Maydell wrote:

> Googling again brought up this mailing list thread:
> 
> http://www.pcisig.com/reflector/msg00459.html
> 
> which includes what is supposedly a quote from the PCI 2.1 spec:
> 
> # "Note: A Base Address register does not contain a valid
> # address when it is equal to "0""
> 
> (I don't have access to the 2.1 version to check.)
> 
> This text seems to have been removed from the 2.2 spec.

I have seen practical cases of both:

 - Systems where the FW sets up a BAR to 0 and considers it valid

 - Adapters that treat a BAR set to 0 as disabled

There's no win here. However, it makes sense for qemu not to treat 0
as a special value; it's not necessary.
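
The two behaviours Ben describes suggest that a zero BAR should only be special for devices that are known to treat it that way, which is the per-device approach in Peter's earlier patch. A minimal sketch of the decision a BAR-mapping routine would make under that policy follows. The struct, the zero_bar_is_disabled flag and bar_mapping_address() are hypothetical names for illustration. They are not QEMU's pci_bar_address() or the flag from the patchwork link:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sentinel for "not mapped", similar in spirit to PCI_BAR_UNMAPPED */
#define BAR_UNMAPPED UINT64_MAX

/* Hypothetical per-device quirk: some adapters treat a BAR of 0 as
 * "disabled", while some firmware programs 0 as a real address. */
struct bar_device {
    bool zero_bar_is_disabled;
};

/* Decide where (if anywhere) a BAR should be mapped. With the quirk
 * unset, 0 is treated as an ordinary, valid address. */
static uint64_t bar_mapping_address(const struct bar_device *dev,
                                    bool decode_enabled, uint64_t bar_addr)
{
    if (!decode_enabled) {
        return BAR_UNMAPPED;   /* COMMAND.IO / COMMAND.MEMORY is clear */
    }
    if (bar_addr == 0 && dev->zero_bar_is_disabled) {
        return BAR_UNMAPPED;   /* device-specific: 0 means "off" */
    }
    return bar_addr;
}
```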

Cheers,
Ben.

* Re: [Qemu-devel] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions
  2013-12-05 23:33   ` Peter Maydell
  2013-12-10 21:42     ` Michael Roth
@ 2013-12-12 14:34     ` Michael S. Tsirkin
  1 sibling, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-12-12 14:34 UTC (permalink / raw
  To: Peter Maydell
  Cc: QEMU Developers, Alexey Kardashevskiy, Alexander Graf,
	Michael Roth, Mike Day, qemu-ppc@nongnu.org, tyreld, nfont,
	Paul Mackerras

On Thu, Dec 05, 2013 at 11:33:48PM +0000, Peter Maydell wrote:
> On 5 December 2013 22:33, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > Some kernels program a 0 address for io regions. PCI 3.0 spec
> > section 6.2.5.1 doesn't seem to disallow this.
> 
> Hmm. The last PCI spec I looked at said 0 wasn't a valid MMIO
> address, so the variant of this patch I wrote a while back made it
> a per PCI device flag whether a particular device let you get away
> with it:
>  http://patchwork.ozlabs.org/patch/269133/
> 
> (the device in question for me was the versatile-pci host bridge).
> 
> And presumably whoever put that specific check for 0 into
> QEMU had a reason for it.

It used to be the case that if you created a conflicting
value for the BAR, you corrupted dispatch tables forever.
Now that dispatch tables are rebuilt on any change, that
is less of an issue, but maybe that code is there to handle it,
e.g. to avoid conflicting with the APIC or other non-PCI devices.
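
The hazard described here is a BAR range that overlaps something that is not a PCI device. A rough sketch of such a guard checks a candidate BAR range against a list of reserved ranges using a half-open interval overlap test. The reserved table, struct addr_range and bar_conflicts() are hypothetical. In the usage checks, 0xfee00000 is the conventional x86 local APIC base and serves only as an example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct addr_range {
    uint64_t base;
    uint64_t size;
};

/* Half-open ranges [base, base + size) overlap iff each one starts
 * before the other one ends. Assumes base + size does not overflow. */
static bool ranges_overlap(struct addr_range x, struct addr_range y)
{
    return x.base < y.base + y.size && y.base < x.base + x.size;
}

/* Hypothetical guard: reject a BAR placement that would collide with
 * any reserved (non-PCI) range. */
static bool bar_conflicts(struct addr_range bar,
                          const struct addr_range *reserved, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ranges_overlap(bar, reserved[i])) {
            return true;
        }
    }
    return false;
}
```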

> On the other hand I can't now find whatever document it was
> that I was reading that claimed 0 wasn't valid :-(
> 
> thanks
> -- PMM

* Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node Michael Roth
@ 2013-12-16  2:59   ` Alexey Kardashevskiy
  2013-12-16  4:54     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-16  2:59 UTC (permalink / raw
  To: Michael Roth, qemu-devel; +Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

On 12/06/2013 09:32 AM, Michael Roth wrote:
> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> 
> This adds entries to the root OF node to advertise our PHBs as being
> DR-capable in accordance with the PAPR specification.
> 
> Each PHB is given a name of PHB<bus#>, advertised as a PHB type,
> and associated with a power domain of -1 (indicating to guests that
> power management is handled automatically by hardware).
> 
> We currently allocate entries for up to 32 DR-capable PHBs, though
> this limit can be increased later.
> 
> DrcEntry objects to track the state of the DR-connector associated
> with each PHB are stored in a 32-entry array, and each DrcEntry in
> turn has a dynamically-allocated array of child DR-connectors,
> which we will use later to track the state of DR-connectors
> associated with a PHB's physical slots.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |  132 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |   33 ++++++++++++
>  2 files changed, 165 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 7e53a5f..ec3ba43 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -81,6 +81,7 @@
>  #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>  
>  sPAPREnvironment *spapr;
> +DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
>  
>  int spapr_allocate_irq(int hint, bool lsi)
>  {
> @@ -276,6 +277,130 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
>      return (p - prop) * sizeof(uint32_t);
>  }
>  
> +static void spapr_init_drc_table(void)
> +{
> +    int i;
> +
> +    memset(drc_table, 0, sizeof(drc_table));
> +
> +    /* For now we only care about PHB entries */
> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> +        drc_table[i].drc_index = 0x2000001 + i;
> +    }
> +}
> +
> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state)
> +{
> +    DrcEntry *empty_drc = NULL;
> +    DrcEntry *found_drc = NULL;
> +    int i, phb_index;
> +
> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> +        if (drc_table[i].phb_buid == 0) {
> +            empty_drc = &drc_table[i];
> +        }
> +
> +        if (drc_table[i].phb_buid == buid) {
> +            found_drc = &drc_table[i];
> +            break;
> +        }
> +    }
> +
> +    if (found_drc) {
> +        return found_drc;
> +    }
> +
> +    if (empty_drc) {
> +        empty_drc->phb_buid = buid;
> +        empty_drc->state = state;
> +        empty_drc->cc_state.fdt = NULL;
> +        empty_drc->cc_state.offset = 0;
> +        empty_drc->cc_state.depth = 0;
> +        empty_drc->cc_state.state = CC_STATE_IDLE;
> +        empty_drc->child_entries =
> +            g_malloc0(sizeof(DrcEntry) * SPAPR_DRC_PHB_SLOT_MAX);
> +        phb_index = buid - SPAPR_PCI_BASE_BUID;
> +        for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
> +            empty_drc->child_entries[i].drc_index =
> +                SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (i << 3);
> +        }
> +        return empty_drc;
> +    }
> +
> +    return NULL;
> +}
> +
> +static void spapr_create_drc_dt_entries(void *fdt)
> +{
> +    char char_buf[1024];
> +    uint32_t int_buf[SPAPR_DRC_TABLE_SIZE + 1];
> +    uint32_t *entries;
> +    int offset, fdt_offset;
> +    int i, ret;
> +
> +    fdt_offset = fdt_path_offset(fdt, "/");
> +
> +    /* ibm,drc-indexes */
> +    memset(int_buf, 0, sizeof(int_buf));
> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
> +
> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
> +        int_buf[i] = drc_table[i-1].drc_index;
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
> +                      sizeof(int_buf));
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-indexes property\n");
> +    }
> +
> +    /* ibm,drc-power-domains */
> +    memset(int_buf, 0, sizeof(int_buf));
> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
> +
> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
> +        int_buf[i] = 0xffffffff;
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
> +                      sizeof(int_buf));
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
> +    }
> +
> +    /* ibm,drc-names */
> +    memset(char_buf, 0, sizeof(char_buf));
> +    entries = (uint32_t *)&char_buf[0];
> +    *entries = SPAPR_DRC_TABLE_SIZE;
> +    offset = sizeof(*entries);
> +
> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> +        offset += sprintf(char_buf + offset, "PHB %d", i + 1);
> +        char_buf[offset++] = '\0';
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf, offset);
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
> +    }
> +
> +    /* ibm,drc-types */
> +    memset(char_buf, 0, sizeof(char_buf));
> +    entries = (uint32_t *)&char_buf[0];
> +    *entries = SPAPR_DRC_TABLE_SIZE;
> +    offset = sizeof(*entries);
> +
> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> +        offset += sprintf(char_buf + offset, "PHB");
> +        char_buf[offset++] = '\0';
> +    }
> +
> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf, offset);
> +    if (ret) {
> +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
> +    }
> +}
> +
>  #define _FDT(exp) \
>      do { \
>          int ret = (exp);                                           \
> @@ -307,6 +432,8 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>      int i, smt = kvmppc_smt_threads();
>      unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
>  
> +    spapr_init_drc_table();
> +
>      fdt = g_malloc0(FDT_MAX_SIZE);
>      _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
>  
> @@ -590,6 +717,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>      int ret;
>      void *fdt;
>      sPAPRPHBState *phb;
> +    DrcEntry *drc_entry;
>  
>      fdt = g_malloc(FDT_MAX_SIZE);
>  
> @@ -609,6 +737,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>      }
>  
>      QLIST_FOREACH(phb, &spapr->phbs, list) {
> +        drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
> +        g_assert(drc_entry);
>          ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
>      }
>  
> @@ -633,6 +763,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>          spapr_populate_chosen_stdout(fdt, spapr->vio_bus);
>      }
>  
> +    spapr_create_drc_dt_entries(fdt);
> +
>      _FDT((fdt_pack(fdt)));
>  
>      if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index b2f11e9..0f2e705 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -299,6 +299,39 @@ typedef struct sPAPREnvironment {
>  #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
>  #define KVMPPC_HCALL_MAX        KVMPPC_H_LOGICAL_MEMOP
>  
> +/* For dlparable/hotpluggable slots */
> +#define SPAPR_DRC_TABLE_SIZE    32
> +#define SPAPR_DRC_PHB_SLOT_MAX  32
> +#define SPAPR_DRC_DEV_ID_BASE   0x40000000
> +
> +typedef struct ConfigureConnectorState {
> +    void *fdt;
> +    int offset_start;
> +    int offset;
> +    int depth;
> +    PCIDevice *dev;
> +    enum {
> +        CC_STATE_IDLE = 0,
> +        CC_STATE_PENDING = 1,
> +        CC_STATE_ACTIVE,
> +    } state;
> +} ConfigureConnectorState;
> +
> +typedef struct DrcEntry DrcEntry;
> +
> +struct DrcEntry {
> +    uint32_t drc_index;
> +    uint64_t phb_buid;
> +    void *fdt;
> +    int fdt_offset;
> +    uint32_t state;
> +    ConfigureConnectorState cc_state;
> +    DrcEntry *child_entries;
> +};
> +
> +extern DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
> +
>  extern sPAPREnvironment *spapr;

So far we have been trying to keep everything sPAPR-related in
sPAPREnvironment. Is @drc_table really that special?
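
One way to address this is to keep the table inside the machine-state structure instead of in a global, while keeping the index layout from the patch. In that layout, PHB connectors start at 0x2000001 + i, and slot connectors are SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (slot << 3). The sketch below works under assumptions. machine_state and drc_entry are simplified, hypothetical stand-ins for sPAPREnvironment and DrcEntry, not their real definitions:

```c
#include <assert.h>
#include <stdint.h>

#define DRC_TABLE_SIZE     32          /* SPAPR_DRC_TABLE_SIZE in the patch */
#define DRC_PHB_SLOT_MAX   32          /* SPAPR_DRC_PHB_SLOT_MAX */
#define DRC_DEV_ID_BASE    0x40000000u /* SPAPR_DRC_DEV_ID_BASE */
#define DRC_PHB_INDEX_BASE 0x2000001u  /* first PHB index used by the patch */

struct drc_entry {
    uint32_t drc_index;
    uint64_t phb_buid;                 /* 0 means "slot not in use" */
    struct drc_entry *child_entries;   /* one per PCI slot, allocated later */
};

/* Hypothetical stand-in for sPAPREnvironment that owns the DRC table */
struct machine_state {
    struct drc_entry drc_table[DRC_TABLE_SIZE];
};

static void init_drc_table(struct machine_state *ms)
{
    for (int i = 0; i < DRC_TABLE_SIZE; i++) {
        /* designated initializer zeroes phb_buid and child_entries */
        ms->drc_table[i] = (struct drc_entry){
            .drc_index = DRC_PHB_INDEX_BASE + (uint32_t)i,
        };
    }
}

/* Slot index layout from the patch: PHB number from bit 8 up,
 * slot number in bits 3-7. */
static uint32_t slot_drc_index(uint32_t phb_index, uint32_t slot)
{
    assert(slot < DRC_PHB_SLOT_MAX);
    return DRC_DEV_ID_BASE + (phb_index << 8) + (slot << 3);
}
```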


>  
>  /*#define DEBUG_SPAPR_HCALLS*/
> 


-- 
Alexey

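For reference, the ibm,drc-names value built in spapr_create_drc_dt_entries() is a 32-bit count followed by that many NUL-terminated strings. Flattened device tree property cells are stored big-endian, so this sketch converts the count explicitly with htonl() (in QEMU code, cpu_to_be32() plays that role). It also bounds-checks each string instead of calling sprintf() into a fixed buffer. build_drc_names() is a hypothetical helper, not part of the patch:

```c
#include <arpa/inet.h>  /* htonl(): host to big-endian */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encode "PHB 1" .. "PHB n" as an ibm,drc-names
 * value. Returns the number of bytes used, or -1 if buf is too small. */
static int build_drc_names(char *buf, size_t buf_len, uint32_t count)
{
    uint32_t be_count = htonl(count);   /* FDT cells are big-endian */
    size_t offset = sizeof(be_count);

    if (buf_len < offset) {
        return -1;
    }
    memcpy(buf, &be_count, sizeof(be_count));

    for (uint32_t i = 0; i < count; i++) {
        /* snprintf's return value excludes the NUL; +1 keeps it */
        int n = snprintf(buf + offset, buf_len - offset, "PHB %u",
                         (unsigned)(i + 1));
        if (n < 0 || (size_t)n + 1 > buf_len - offset) {
            return -1;                  /* string would be truncated */
        }
        offset += (size_t)n + 1;
    }
    return (int)offset;
}
```
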
* Re: [Qemu-devel] [PATCH v2 05/14] spapr_pci: add get/set-power-level RTAS interfaces
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 05/14] spapr_pci: add get/set-power-level RTAS interfaces Michael Roth
@ 2013-12-16  3:09   ` Alexey Kardashevskiy
  2014-01-16 21:01     ` Michael Roth
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-16  3:09 UTC (permalink / raw
  To: Michael Roth, qemu-devel; +Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

On 12/06/2013 09:32 AM, Michael Roth wrote:
> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci.c |   22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 1046ec8..8df44a3 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -488,6 +488,26 @@ static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
>  
> +static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                                 uint32_t token, uint32_t nargs,
> +                                 target_ulong args, uint32_t nret,
> +                                 target_ulong rets)
> +{
> +    uint32_t power_lvl = rtas_ld(args, 1);
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    rtas_st(rets, 1, power_lvl);
> +}
> +
> +static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                                  uint32_t token, uint32_t nargs,
> +                                  target_ulong args, uint32_t nret,
> +                                  target_ulong rets)
> +{
> +    /* return SUCCESS with a power level of 100 */
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +    rtas_st(rets, 1, 100);
> +}
> +

The PAPR spec says that rtas_set_power_level() returns "Actual_level: the
power level actually set", but rtas_get_power_level() always returns 100
(full power).

Is this inconsistency here for a reason?
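
If the two calls are meant to agree, one option is to record the level accepted by set-power-level and report it from get-power-level. This sketch shows that option and is not what the patch does. The power_domain struct and its functions are hypothetical. The sketch also assumes every requested level is accepted as-is, and it does not model how a real platform might round or clamp levels:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-domain state; 100 is the "full power" level that
 * the patch's get-power-level always reports. */
struct power_domain {
    uint32_t level;
};

static void power_domain_init(struct power_domain *pd)
{
    pd->level = 100;
}

/* Returns the "actual level" that set-power-level reports back. */
static uint32_t power_domain_set(struct power_domain *pd, uint32_t requested)
{
    pd->level = requested;  /* assumption: every level is accepted as-is */
    return pd->level;
}

/* get-power-level reports whatever was last set. */
static uint32_t power_domain_get(const struct power_domain *pd)
{
    return pd->level;
}
```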


>  static int pci_spapr_swizzle(int slot, int pin)
>  {
>      return (slot + pin) % PCI_NUM_PINS;
> @@ -1051,6 +1071,8 @@ void spapr_pci_rtas_init(void)
>          spapr_rtas_register("ibm,change-msi", rtas_ibm_change_msi);
>      }
>      spapr_rtas_register("set-indicator", rtas_set_indicator);
> +    spapr_rtas_register("set-power-level", rtas_set_power_level);
> +    spapr_rtas_register("get-power-level", rtas_get_power_level);
>  }
>  
>  static void spapr_pci_register_types(void)
> 


-- 
Alexey

* Re: [Qemu-devel] [PATCH v2 04/14] spapr_pci: add set-indicator RTAS interface
  2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 04/14] spapr_pci: add set-indicator RTAS interface Michael Roth
@ 2013-12-16  4:26   ` Alexey Kardashevskiy
  2014-01-16 20:54     ` Michael Roth
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-16  4:26 UTC (permalink / raw
  To: Michael Roth, qemu-devel; +Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

On 12/06/2013 09:32 AM, Michael Roth wrote:
> From: Mike Day <ncmike@ncultra.org>
> 
> Signed-off-by: Mike Day <ncmike@ncultra.org>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci.c     |   93 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h |   28 +++++++++++++++
>  2 files changed, 121 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 7568a03..1046ec8 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -35,6 +35,16 @@
>  
>  #include "hw/pci/pci_bus.h"
>  
> +/* #define DEBUG_SPAPR */
> +
> +#ifdef DEBUG_SPAPR
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
>  /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
>  #define RTAS_QUERY_FN           0
>  #define RTAS_CHANGE_FN          1
> @@ -404,6 +414,80 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
>      rtas_st(rets, 2, 1);/* 0 == level; 1 == edge */
>  }
>  
> +static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                               uint32_t token, uint32_t nargs,
> +                               target_ulong args, uint32_t nret,
> +                               target_ulong rets)
> +{
> +    uint32_t indicator = rtas_ld(args, 0);
> +    uint32_t drc_index = rtas_ld(args, 1);
> +    uint32_t indicator_state = rtas_ld(args, 2);
> +    uint32_t encoded = 0, shift = 0, mask = 0;
> +    uint32_t *pind;
> +    DrcEntry *drc_entry = NULL;
> +
> +    if (drc_index == 0) { /* platform indicator */
> +        pind = &spapr->state;
> +    } else {
> +        drc_entry = spapr_find_drc_entry(drc_index);
> +        if (!drc_entry) {
> +            DPRINTF("rtas_set_indicator: unable to find drc_entry for %x",
> +                    drc_index);
> +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +            return;
> +        }
> +        pind = &drc_entry->state;
> +    }
> +
> +    switch (indicator) {
> +    case 9:  /* EPOW */
> +        shift = INDICATOR_EPOW_SHIFT;
> +        mask = INDICATOR_EPOW_MASK;
> +        break;
> +    case 9001: /* Isolation state */
> +        /* encode the new value into the correct bit field */
> +        shift = INDICATOR_ISOLATION_SHIFT;
> +        mask = INDICATOR_ISOLATION_MASK;
> +        break;
> +    case 9002: /* DR */
> +        shift = INDICATOR_DR_SHIFT;
> +        mask = INDICATOR_DR_MASK;
> +        break;
> +    case 9003: /* Allocation State */
> +        shift = INDICATOR_ALLOCATION_SHIFT;
> +        mask = INDICATOR_ALLOCATION_MASK;
> +        break;
> +    case 9005: /* global interrupt */
> +        shift = INDICATOR_GLOBAL_INTERRUPT_SHIFT;
> +        mask = INDICATOR_GLOBAL_INTERRUPT_MASK;
> +        break;
> +    case 9006: /* error log */
> +        shift = INDICATOR_ERROR_LOG_SHIFT;
> +        mask = INDICATOR_ERROR_LOG_MASK;
> +        break;
> +    case 9007: /* identify */
> +        shift = INDICATOR_IDENTIFY_SHIFT;
> +        mask = INDICATOR_IDENTIFY_MASK;
> +        break;
> +    case 9009: /* reset */
> +        shift = INDICATOR_RESET_SHIFT;
> +        mask = INDICATOR_RESET_MASK;
> +        break;
> +    default:
> +        DPRINTF("rtas_set_indicator: indicator not implemented: %d",
> +                indicator);
> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> +        return;
> +    }
> +
> +    encoded = ENCODE_DRC_STATE(indicator_state, mask, shift);
> +    /* clear the current indicator value */
> +    *pind &= ~mask;
> +    /* set the new value */
> +    *pind |= encoded;
> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +}
> +
>  static int pci_spapr_swizzle(int slot, int pin)
>  {
>      return (slot + pin) % PCI_NUM_PINS;
> @@ -637,6 +721,14 @@ static int spapr_phb_init(SysBusDevice *s)
>          sphb->lsi_table[i].irq = irq;
>      }
>  
> +    /* make sure the platform EPOW sensor is initialized - the
> +     * guest will probe it when there is a hotplug event.
> +     */
> +    spapr->state &= ~(uint32_t)INDICATOR_EPOW_MASK;
> +    spapr->state |= ENCODE_DRC_STATE(0,
> +                                     INDICATOR_EPOW_MASK,
> +                                     INDICATOR_EPOW_SHIFT);
> +
>      return 0;
>  }
>  
> @@ -958,6 +1050,7 @@ void spapr_pci_rtas_init(void)
>                              rtas_ibm_query_interrupt_source_number);
>          spapr_rtas_register("ibm,change-msi", rtas_ibm_change_msi);
>      }
> +    spapr_rtas_register("set-indicator", rtas_set_indicator);
>  }
>  
>  static void spapr_pci_register_types(void)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 6ae5c54..b48c55f 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -38,6 +38,9 @@ typedef struct sPAPREnvironment {
>      int htab_save_index;
>      bool htab_first_pass;
>      int htab_fd;
> +
> +    /* platform state - sensors and indicators */
> +    uint32_t state;
>  } sPAPREnvironment;
>  
>  #define H_SUCCESS         0
> @@ -334,6 +337,31 @@ DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
>  DrcEntry *spapr_phb_to_drc_entry(uint64_t buid);
>  DrcEntry *spapr_find_drc_entry(int drc_index);
>  
> +/* For set-indicator RTAS interface */
> +#define INDICATOR_ISOLATION_MASK            0x0001   /* 9001 one bit */
> +#define INDICATOR_GLOBAL_INTERRUPT_MASK     0x0002   /* 9005 one bit */
> +#define INDICATOR_ERROR_LOG_MASK            0x0004   /* 9006 one bit */
> +#define INDICATOR_IDENTIFY_MASK             0x0008   /* 9007 one bit */
> +#define INDICATOR_RESET_MASK                0x0010   /* 9009 one bit */
> +#define INDICATOR_DR_MASK                   0x00e0   /* 9002 three bits */
> +#define INDICATOR_ALLOCATION_MASK           0x0300   /* 9003 two bits */
> +#define INDICATOR_EPOW_MASK                 0x1c00   /* 9 three bits */
> +
> +#define INDICATOR_ISOLATION_SHIFT           0x00     /* bit 0 */
> +#define INDICATOR_GLOBAL_INTERRUPT_SHIFT    0x01     /* bit 1 */
> +#define INDICATOR_ERROR_LOG_SHIFT           0x02     /* bit 2 */
> +#define INDICATOR_IDENTIFY_SHIFT            0x03     /* bit 3 */
> +#define INDICATOR_RESET_SHIFT               0x04     /* bit 4 */
> +#define INDICATOR_DR_SHIFT                  0x05     /* bits 5-7 */
> +#define INDICATOR_ALLOCATION_SHIFT          0x08     /* bits 8-9 */
> +#define INDICATOR_EPOW_SHIFT                0x0a     /* bits 10-12 */
> +
> +#define DECODE_DRC_STATE(state, m, s)                  \
> +    ((((uint32_t)(state) & (uint32_t)(m))) >> (s))
> +
> +#define ENCODE_DRC_STATE(val, m, s) \
> +    (((uint32_t)(val) << (s)) & (uint32_t)(m))
> +


Why put these definitions in the header when they are only used by
spapr_pci.c? It gives the (wrong) impression that they are shared between
files...
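
For readers following the bit layout: each indicator occupies its own field of the 32-bit state word. rtas_set_indicator() clears that field and ORs in the newly encoded value. The self-contained sketch below shows that read-modify-write. It copies the DR and isolation masks/shifts and the encode/decode macros from the patch, kept local to the .c file. set_field() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the patch */
#define INDICATOR_ISOLATION_MASK   0x0001   /* 9001, bit 0 */
#define INDICATOR_DR_MASK          0x00e0   /* 9002, bits 5-7 */
#define INDICATOR_ISOLATION_SHIFT  0x00
#define INDICATOR_DR_SHIFT         0x05

#define DECODE_DRC_STATE(state, m, s) \
    ((((uint32_t)(state) & (uint32_t)(m))) >> (s))
#define ENCODE_DRC_STATE(val, m, s) \
    (((uint32_t)(val) << (s)) & (uint32_t)(m))

/* Hypothetical helper mirroring rtas_set_indicator(): replace one
 * field and leave the other indicators untouched. High bits of a
 * value too wide for the field are dropped by the mask. */
static void set_field(uint32_t *state, uint32_t val, uint32_t mask,
                      uint32_t shift)
{
    *state &= ~mask;                              /* clear the old value */
    *state |= ENCODE_DRC_STATE(val, mask, shift); /* set the new value */
}
```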



>  extern sPAPREnvironment *spapr;
>  
>  /*#define DEBUG_SPAPR_HCALLS*/
> 


-- 
Alexey

* Re: [Qemu-devel] [PATCH v2 11/14] spapr_pci: enable basic hotplug operations
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 11/14] spapr_pci: enable basic hotplug operations Michael Roth
@ 2013-12-16  4:36   ` Alexey Kardashevskiy
  2014-01-16 21:22     ` Michael Roth
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-16  4:36 UTC (permalink / raw
  To: Michael Roth, qemu-devel
  Cc: Michael S. Tsirkin, agraf, ncmike, paulus, tyreld, nfont,
	qemu-ppc

On 12/06/2013 09:33 AM, Michael Roth wrote:
> From: Mike Day <ncmike@ncultra.org>
> 
> This enables hotplug for PHB bridges. Upon hotplug we generate the
> OF-nodes required by PAPR specification and IEEE 1275-1994
> "PCI Bus Binding to Open Firmware" for the device.
> 
> We associate the corresponding FDT for these nodes with the DrcEntry
> corresponding to the slot, which will be fetched via
> ibm,configure-connector RTAS calls by the guest as described by PAPR
> specification. The FDT is cleaned up in the case of unplug.
> 
> Amongst the required OF-node properties for each device are the "reg"
> and "assigned-addresses" properties which describe the BAR-assignments
> for IO/MEM/ROM regions. To handle these assignments we scan the address
> space associated with each region for a contiguous range of appropriate
> size based on PCI specification and encode these in accordance with
> Open Firmware PCI Bus Binding spec.
> 
> These assignments will be used by the guest when the rpaphp hotplug
> module is used, but may be re-assigned by guests for cases where we
> rely on bus rescan.
> 
> Signed-off-by: Mike Day <ncmike@ncultra.org>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci.c     |  375 ++++++++++++++++++++++++++++++++++++++++++++++--
>  include/hw/ppc/spapr.h |    1 +
>  2 files changed, 368 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 6e7ee31..9b4f829 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -56,6 +56,17 @@
>  #define RTAS_TYPE_MSI           1
>  #define RTAS_TYPE_MSIX          2
>  
> +#define FDT_MAX_SIZE            0x10000
> +#define _FDT(exp) \
> +    do { \
> +        int ret = (exp);                                           \
> +        if (ret < 0) {                                             \
> +            return ret;                                            \
> +        }                                                          \
> +    } while (0)
> +
> +static void spapr_drc_state_reset(DrcEntry *drc_entry);
> +
>  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
>  {
>      sPAPRPHBState *sphb;
> @@ -448,6 +459,22 @@ static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>          /* encode the new value into the correct bit field */
>          shift = INDICATOR_ISOLATION_SHIFT;
>          mask = INDICATOR_ISOLATION_MASK;
> +        if (drc_entry) {
> +            /* transition from unisolated to isolated for a hotplug slot
> +             * entails completion of guest-side device unplug/cleanup, so
> +             * we can now safely remove the device if qemu is waiting for
> +             * it to be released
> +             */
> +            if (DECODE_DRC_STATE(*pind, mask, shift) != indicator_state) {
> +                if (indicator_state == 0 && drc_entry->awaiting_release) {
> +                    /* device_del has been called and host is waiting
> +                     * for guest to release/isolate device, go ahead
> +                     * and remove it now
> +                     */
> +                    spapr_drc_state_reset(drc_entry);
> +                }
> +            }
> +        }
>          break;
>      case 9002: /* DR */
>          shift = INDICATOR_DR_SHIFT;
> @@ -776,6 +803,345 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>      return &phb->iommu_as;
>  }
>  
> +/* for 'reg'/'assigned-addresses' OF properties */
> +#define RESOURCE_CELLS_SIZE 2
> +#define RESOURCE_CELLS_ADDRESS 3
> +#define RESOURCE_CELLS_TOTAL \
> +    (RESOURCE_CELLS_SIZE + RESOURCE_CELLS_ADDRESS)
> +
> +static void fill_resource_props(PCIDevice *d, int bus_num,
> +                                uint32_t *reg, int *reg_size,
> +                                uint32_t *assigned, int *assigned_size)
> +{
> +    uint32_t *reg_row, *assigned_row;
> +    uint32_t dev_id = ((bus_num << 8) |
> +                        (PCI_SLOT(d->devfn) << 3) | PCI_FUNC(d->devfn));
> +    int i, idx = 0;
> +
> +    reg[0] = cpu_to_be32(dev_id << 8);
> +
> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> +        if (!d->io_regions[i].size) {
> +            continue;
> +        }
> +        reg_row = &reg[(idx + 1) * RESOURCE_CELLS_TOTAL];
> +        assigned_row = &assigned[idx * RESOURCE_CELLS_TOTAL];
> +        reg_row[0] = cpu_to_be32((dev_id << 8) | (pci_bar(d, i) & 0xff));
> +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> +            reg_row[0] |= cpu_to_be32(0x01000000);
> +        } else {
> +            reg_row[0] |= cpu_to_be32(0x02000000);
> +        }
> +        assigned_row[0] = cpu_to_be32(reg_row[0] | 0x80000000);


0x80000000 == relocatable? 0x01000000/0x02000000 - space codes? There are
macros (b_n, b_ss) in this file; can you please use them?
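
For reference, the IEEE 1275 PCI Bus Binding lays out the first cell (phys.hi) of each "reg"/"assigned-addresses" entry as npt000ss bbbbbbbb dddddfff rrrrrrrr. In that layout, 0x01000000 and 0x02000000 are the space codes for I/O and 32-bit memory. 0x80000000 is the n bit, which marks the address as non-relocatable (n = 0 means relocatable). The sketch below uses bit-field macros modelled on the b_* helpers in spapr_pci.c, redefined locally so the example stands alone. phys_hi() is hypothetical. The sketch composes the word in host order, which leaves a single big-endian conversion for the point where the cell is stored:

```c
#include <assert.h>
#include <stdint.h>

/* Bit-field helpers in the style of spapr_pci.c's b_* macros */
#define B_X(x, p, l)     (((uint32_t)(x) & ((1u << (l)) - 1)) << (p))
#define B_N(x)           B_X((x), 31, 1)  /* 1 = non-relocatable */
#define B_SS(x)          B_X((x), 24, 2)  /* space code */
#define B_BBBBBBBB(x)    B_X((x), 16, 8)  /* bus number */
#define B_DDDDD(x)       B_X((x), 11, 5)  /* device (slot) number */
#define B_FFF(x)         B_X((x), 8, 3)   /* function number */
#define B_RRRRRRRR(x)    B_X((x), 0, 8)   /* config register (BAR offset) */

enum pci_space { SPACE_CONFIG = 0, SPACE_IO = 1, SPACE_MEM32 = 2, SPACE_MEM64 = 3 };

/* Hypothetical helper: phys.hi for one BAR, in host byte order. */
static uint32_t phys_hi(int non_relocatable, enum pci_space ss, int bus,
                        int slot, int func, int bar_offset)
{
    return B_N(non_relocatable) | B_SS(ss) | B_BBBBBBBB(bus) |
           B_DDDDD(slot) | B_FFF(func) | B_RRRRRRRR(bar_offset);
}
```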


> +        assigned_row[3] = reg_row[3] = cpu_to_be32(d->io_regions[i].size >> 32);
> +        assigned_row[4] = reg_row[4] = cpu_to_be32(d->io_regions[i].size);
> +        assigned_row[1] = cpu_to_be32(d->io_regions[i].addr >> 32);
> +        assigned_row[2] = cpu_to_be32(d->io_regions[i].addr);
> +        idx++;
> +    }
> +
> +    *reg_size = (idx + 1) * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> +    *assigned_size = idx * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> +}
> +
> +static hwaddr spapr_find_bar_addr(sPAPRPHBState *phb, PCIIORegion *r)


This does not use @phb at all and therefore could perhaps go to
hw/pci/pci.c (which may be tricky, though)?
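
Indeed, the search itself only needs the region size and the set of ranges already in use. Below is a phb-independent sketch of the same idea: a first-fit search for a naturally aligned free window. It works over a sorted array of occupied ranges instead of QEMU's MemoryRegion API, so used_range and find_free_aligned() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_FREE_RANGE UINT64_MAX

struct used_range {
    uint64_t base;
    uint64_t size;
};

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* First-fit search for a size-aligned window of `size` bytes in
 * [start, limit). `used` must be sorted by base and non-overlapping.
 * BAR sizes are powers of two and BARs are naturally aligned. */
static uint64_t find_free_aligned(uint64_t start, uint64_t limit,
                                  uint64_t size,
                                  const struct used_range *used, size_t n)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    uint64_t candidate = align_up(start, size);

    for (size_t i = 0; i < n; i++) {
        uint64_t used_end = used[i].base + used[i].size;
        if (used_end <= candidate) {
            continue;                  /* range lies entirely below us */
        }
        if (candidate + size <= used[i].base) {
            break;                     /* the gap before this range fits */
        }
        candidate = align_up(used_end, size);  /* skip past the overlap */
    }
    /* A window ending exactly at `limit` is still acceptable. */
    return candidate + size <= limit ? candidate : NO_FREE_RANGE;
}
```

One difference worth noting is the final check. The patch's spapr_find_bar_addr() tests search_addr + size >= limit, which appears to reject a BAR that fits exactly at the end of the address space. The sketch accepts it.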


> +{
> +    MemoryRegionSection mrs = { 0 };
> +    hwaddr search_addr;
> +    hwaddr size = r->size;
> +    hwaddr addr_mask = ~(size - 1);
> +    hwaddr increment = size;
> +    hwaddr limit;
> +
> +    if (r->type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
> +        /* beginning portion of mmio address space for bus does not get
> +         * mapped into system memory, so calculate addr starting at the
> +         * corresponding offset into mmio as.
> +         */
> +        search_addr = (SPAPR_PCI_MEM_WIN_BUS_OFFSET + increment) & addr_mask;
> +    } else {
> +        search_addr = increment;
> +    }
> +    limit = memory_region_size(r->address_space);
> +
> +    do {
> +        mrs = memory_region_find_subregion(r->address_space, search_addr, size);
> +        if (mrs.mr) {
> +            hwaddr mr_last_addr;
> +            mr_last_addr = mrs.mr->addr + memory_region_size(mrs.mr) - 1;
> +            search_addr = (mr_last_addr + 1) & addr_mask;
> +            if (search_addr <= mr_last_addr) {
> +                search_addr += increment;
> +            }
> +            /* this memory region overlaps, unref and continue searching */
> +            memory_region_unref(mrs.mr);
> +        }
> +    } while (int128_nz(mrs.size) && search_addr + size <= limit);
> +
> +    if (search_addr + size >= limit) {
> +        return PCI_BAR_UNMAPPED;
> +    }
> +
> +    return search_addr;
> +}
> +
> +static int spapr_map_bars(sPAPRPHBState *phb, PCIDevice *dev)

This does not really use @phb either: it only passes it to
spapr_find_bar_addr(), and that function does not use it.

Yet another candidate to get moved to hw/pci/pci.c? If you do so, you'll
get even more reviews :)
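
Along the same lines, the BAR-programming part of spapr_map_bars() is generic PCI logic. The sketch below shows how the value(s) written to the BAR register(s) are derived. A 64-bit memory BAR spans two consecutive dwords, and the expansion ROM BAR also needs its enable bit set. It redefines the two standard config-space constants locally with their PCI values, and bar_write_values() is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_BASE_ADDRESS_MEM_TYPE_64 0x04  /* BAR bits 2:1 == 10b */
#define PCI_ROM_ADDRESS_ENABLE       0x01  /* expansion ROM BAR bit 0 */

struct bar_writes {
    uint32_t lo;        /* value for the BAR register itself */
    uint32_t hi;        /* value for BAR + 4, valid only if has_hi */
    bool has_hi;
};

/* Hypothetical helper: split a chosen address into config-space writes.
 * `type` holds the BAR's type bits; `is_rom` selects the ROM BAR. */
static struct bar_writes bar_write_values(uint64_t addr, uint32_t type,
                                          bool is_rom)
{
    struct bar_writes w = { .lo = (uint32_t)addr };

    if (is_rom) {
        w.lo |= PCI_ROM_ADDRESS_ENABLE;
    }
    if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
        w.hi = (uint32_t)(addr >> 32);  /* upper dword goes in BAR + 4 */
        w.has_hi = true;
    }
    return w;
}
```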


> +{
> +    PCIIORegion *r;
> +    int i, ret = -1;
> +
> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> +        uint32_t bar_address = pci_bar(dev, i);
> +        uint32_t bar_value;
> +        uint16_t cmd_value = pci_default_read_config(dev, PCI_COMMAND, 2);
> +        hwaddr addr;
> +
> +        r = &dev->io_regions[i];
> +
> +        /* this region isn't registered */
> +        if (!r->size) {
> +            continue;
> +        }
> +
> +        /* find a hw addr we can map */
> +        addr = spapr_find_bar_addr(phb, r);
> +        if (addr == PCI_BAR_UNMAPPED) {
> +            /* we can't find a free range within address space for this BAR */
> +            fprintf(stderr,
> +                    "Unable to map BAR %d, no free range available\n", i);
> +            return -1;
> +        }
> +        /* we can probably map this region into memory if there is not
> +         * a race condition with some other allocator. write the address
> +         * to the device BAR which will force a call to pci_update_mappings
> +         */
> +        if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
> +            pci_default_write_config(dev, PCI_COMMAND,
> +                                     cmd_value | PCI_COMMAND_IO, 2);
> +        } else {
> +            pci_default_write_config(dev, PCI_COMMAND,
> +                                     cmd_value | PCI_COMMAND_MEMORY, 2);
> +        }
> +
> +        bar_value = addr;
> +
> +        if (i == PCI_ROM_SLOT) {
> +            bar_value |= PCI_ROM_ADDRESS_ENABLE;
> +        }
> +        /* write the new bar value */
> +        pci_default_write_config(dev, bar_address, bar_value, 4);
> +
> +        /* if this is a 64-bit BAR, we need to also write the
> +         * upper 32 bit value.
> +         */
> +        if (r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> +            bar_value = (addr >> 32) & 0xffffffffUL;
> +            pci_default_write_config(dev, bar_address + 4, bar_value, 4);
> +        }
> +        ret = 0;
> +    }
> +    return ret;
> +}
> +
> +static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
> +                                       int phb_index)
> +{
> +    int slot = PCI_SLOT(dev->devfn);
> +    char slotname[16];
> +    bool is_bridge = 1;
> +    DrcEntry *drc_entry, *drc_entry_slot;
> +    uint32_t reg[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> +    uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> +    int reg_size, assigned_size;
> +
> +    drc_entry = spapr_phb_to_drc_entry(phb_index + SPAPR_PCI_BASE_BUID);
> +    g_assert(drc_entry);
> +    drc_entry_slot = &drc_entry->child_entries[slot];
> +
> +    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==


s/1/PCI_HEADER_TYPE_BRIDGE/


> +        PCI_HEADER_TYPE_NORMAL) {
> +        is_bridge = 0;
> +    }
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
> +                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
> +                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
> +                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
> +                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2) << 8));
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
> +                          pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
> +
> +    /* if this device is NOT a bridge */
> +    if (!is_bridge) {


s/!is_bridge/pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
PCI_HEADER_TYPE_NORMAL/

and get rid of is_bridge?
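
One more thing worth checking in this test: bit 7 of the Header Type
register is the multi-function flag, so comparing the raw byte against
PCI_HEADER_TYPE_NORMAL would appear to classify a multi-function endpoint
(0x80) as a bridge. A minimal standalone sketch that masks the flag off
first; the header-type values mirror pci_regs.h, while
HEADER_TYPE_LAYOUT_MASK and header_is_normal() are local names introduced
here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in pci_regs.h */
#define PCI_HEADER_TYPE_NORMAL   0
#define PCI_HEADER_TYPE_BRIDGE   1
#define PCI_HEADER_TYPE_CARDBUS  2

/* Bits 6:0 select the header layout; bit 7 flags a multi-function device */
#define HEADER_TYPE_LAYOUT_MASK  0x7f

/* @hdr is the byte read from config space offset 0x0e (PCI_HEADER_TYPE) */
static bool header_is_normal(uint8_t hdr)
{
    return (hdr & HEADER_TYPE_LAYOUT_MASK) == PCI_HEADER_TYPE_NORMAL;
}
```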



> +        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
> +            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
> +            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
> +            pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
> +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
> +            pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
> +    }
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
> +        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
> +
> +    /* the following fdt cells are masked off the pci status register */
> +    int pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
> +    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
> +                          PCI_STATUS_DEVSEL_MASK & pci_status));
> +    _FDT(fdt_setprop_cell(fdt, offset, "fast-back-to-back",
> +                          PCI_STATUS_FAST_BACK & pci_status));
> +    _FDT(fdt_setprop_cell(fdt, offset, "66mhz-capable",
> +                          PCI_STATUS_66MHZ & pci_status));
> +    _FDT(fdt_setprop_cell(fdt, offset, "udf-supported",
> +                          PCI_STATUS_UDF & pci_status));
> +
> +    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
> +    sprintf(slotname, "Slot %d", slot + phb_index * 32);
> +    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", slotname, strlen(slotname)));
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index",
> +                          drc_entry_slot->drc_index));
> +
> +    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
> +                          RESOURCE_CELLS_ADDRESS));
> +    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
> +                          RESOURCE_CELLS_SIZE));
> +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
> +                          RESOURCE_CELLS_SIZE));
> +    fill_resource_props(dev, phb_index, reg, &reg_size,
> +                        assigned, &assigned_size);
> +    _FDT(fdt_setprop(fdt, offset, "reg", reg, reg_size));
> +    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
> +                     assigned, assigned_size));
> +
> +    return 0;
> +}
> +
> +static int spapr_device_hotplug_add(DeviceState *qdev, PCIDevice *dev)
> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
> +    DrcEntry *drc_entry, *drc_entry_slot;
> +    ConfigureConnectorState *ccs;
> +    int slot = PCI_SLOT(dev->devfn);
> +    int offset, ret;
> +    void *fdt_orig, *fdt;
> +    char nodename[512];
> +    uint32_t encoded = ENCODE_DRC_STATE(INDICATOR_ENTITY_SENSE_PRESENT,
> +                                        INDICATOR_ENTITY_SENSE_MASK,
> +                                        INDICATOR_ENTITY_SENSE_SHIFT);
> +
> +    drc_entry = spapr_phb_to_drc_entry(phb->buid);
> +    g_assert(drc_entry);
> +    drc_entry_slot = &drc_entry->child_entries[slot];
> +
> +    drc_entry->state &= ~(uint32_t)INDICATOR_ENTITY_SENSE_MASK;
> +    drc_entry->state |= encoded; /* DR entity present */
> +    drc_entry_slot->state &= ~(uint32_t)INDICATOR_ENTITY_SENSE_MASK;
> +    drc_entry_slot->state |= encoded; /* and the slot */


"and the slot" what?
s/uint32_t encoded/const uint32_t present/ and remove the comments?
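
A minimal sketch of that suggestion: a const, field-specific value plus a
small read-modify-write helper makes the per-line comments unnecessary.
ENCODE_FIELD()/DECODE_FIELD(), set_entity_sense(), mark_present() and the
ENTITY_SENSE_* layout are hypothetical stand-ins for ENCODE_DRC_STATE() and
the INDICATOR_ENTITY_SENSE_* constants, whose real values are not shown in
this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only */
#define ENTITY_SENSE_SHIFT   4
#define ENTITY_SENSE_MASK    (0x7u << ENTITY_SENSE_SHIFT)
#define ENTITY_SENSE_EMPTY   0u
#define ENTITY_SENSE_PRESENT 1u

#define ENCODE_FIELD(val, mask, shift)   (((uint32_t)(val) << (shift)) & (mask))
#define DECODE_FIELD(state, mask, shift) (((state) & (mask)) >> (shift))

/* Replace only the entity-sense field of @state, leaving other bits intact */
static uint32_t set_entity_sense(uint32_t state, uint32_t sense)
{
    const uint32_t encoded = ENCODE_FIELD(sense, ENTITY_SENSE_MASK,
                                          ENTITY_SENSE_SHIFT);

    return (state & ~ENTITY_SENSE_MASK) | encoded;
}

/* Both the PHB connector and the slot connector report "present" */
static void mark_present(uint32_t *phb_state, uint32_t *slot_state)
{
    *phb_state = set_entity_sense(*phb_state, ENTITY_SENSE_PRESENT);
    *slot_state = set_entity_sense(*slot_state, ENTITY_SENSE_PRESENT);
}
```

Because ENTITY_SENSE_MASK is already unsigned, the `(uint32_t)` casts in
front of the mask are not needed either.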


> +    /* need to allocate memory region for device BARs */
> +    spapr_map_bars(phb, dev);
> +
> +    /* add OF node for pci device and required OF DT properties */
> +    fdt_orig = g_malloc0(FDT_MAX_SIZE);
> +    offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
> +    fdt_begin_node(fdt_orig, "");
> +    fdt_end_node(fdt_orig);
> +    fdt_finish(fdt_orig);
> +
> +    fdt = g_malloc0(FDT_MAX_SIZE);
> +    fdt_open_into(fdt_orig, fdt, FDT_MAX_SIZE);
> +    sprintf(nodename, "pci@%d", slot);
> +    offset = fdt_add_subnode(fdt, 0, nodename);
> +    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index);
> +    g_assert(!ret);
> +    g_free(fdt_orig);
> +
> +    /* hold on to node, configure_connector will pass it to the guest later */
> +    ccs = &drc_entry_slot->cc_state;
> +    ccs->fdt = fdt;
> +    ccs->offset_start = offset;
> +    ccs->state = CC_STATE_PENDING;
> +    ccs->dev = dev;
> +
> +    return 0;
> +}
> +
> +/* check whether guest has released/isolated device */
> +static bool spapr_drc_state_is_releasable(DrcEntry *drc_entry)
> +{
> +    return !DECODE_DRC_STATE(drc_entry->state,
> +                             INDICATOR_ISOLATION_MASK,
> +                             INDICATOR_ISOLATION_SHIFT);
> +}

It looks like this is the only standalone helper that calls
DECODE_DRC_STATE, and it is used just once, while other places call
DECODE_DRC_STATE()/ENCODE_DRC_STATE() directly. I'd remove this function
and call DECODE_DRC_STATE() directly in the code below.
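
For illustration, inlining the decode at its single call site might look
like the sketch below. The ISOLATION_* layout, DECODE_FIELD() and
remove_action_for() are hypothetical stand-ins for INDICATOR_ISOLATION_*,
DECODE_DRC_STATE() and the remove path; the only behavior carried over from
the patch is that a zero isolation field means the guest has already
released the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only */
#define ISOLATION_SHIFT 2
#define ISOLATION_MASK  (0x1u << ISOLATION_SHIFT)

#define DECODE_FIELD(state, mask, shift) (((state) & (mask)) >> (shift))

typedef enum {
    REMOVE_NOW,             /* guest already released it: finalize now */
    REMOVE_DEFERRED,        /* flag it and wait for the guest */
    REMOVE_STILL_WAITING,   /* a previous request is still pending */
} remove_action;

static remove_action remove_action_for(uint32_t state, bool *awaiting_release)
{
    /* decode inline instead of via a one-use helper */
    if (!DECODE_FIELD(state, ISOLATION_MASK, ISOLATION_SHIFT)) {
        return REMOVE_NOW;
    }
    if (*awaiting_release) {
        return REMOVE_STILL_WAITING;
    }
    *awaiting_release = true;
    return REMOVE_DEFERRED;
}
```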


> +
> +/* finalize device unplug/deletion */
> +static void spapr_drc_state_reset(DrcEntry *drc_entry)
> +{
> +    ConfigureConnectorState *ccs = &drc_entry->cc_state;
> +    uint32_t sense_empty = ENCODE_DRC_STATE(INDICATOR_ENTITY_SENSE_EMPTY,
> +                                            INDICATOR_ENTITY_SENSE_MASK,
> +                                            INDICATOR_ENTITY_SENSE_SHIFT);
> +
> +    g_free(ccs->fdt);
> +    ccs->fdt = NULL;
> +    object_unparent(OBJECT(ccs->dev));
> +    ccs->dev = NULL;
> +    ccs->state = CC_STATE_IDLE;
> +    drc_entry->state &= ~INDICATOR_ENTITY_SENSE_MASK;
> +    drc_entry->state |= sense_empty;
> +    drc_entry->awaiting_release = false;
> +}
> +
> +static void spapr_device_hotplug_remove(DeviceState *qdev, PCIDevice *dev)
> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
> +    DrcEntry *drc_entry, *drc_entry_slot;
> +    ConfigureConnectorState *ccs;
> +    int slot = PCI_SLOT(dev->devfn);
> +
> +    drc_entry = spapr_phb_to_drc_entry(phb->buid);
> +    g_assert(drc_entry);
> +    drc_entry_slot = &drc_entry->child_entries[slot];
> +    ccs = &drc_entry_slot->cc_state;
> +    /* shouldn't be removing devices we haven't created an fdt for */
> +    g_assert(ccs->state != CC_STATE_IDLE);


Instead of g_assert(), wouldn't it be better to return -1 here, then
return that code from spapr_device_hotplug() and let the common PCI code
handle it?

Or are we absolutely sure that spapr_device_hotplug() cannot possibly fail,
so we are ready to kill the guest if it does? I do not know, just asking :)
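
A minimal sketch of the return-code alternative, using hypothetical names
(slot_state, slot_add(), slot_remove(), hotplug_cb()) rather than the
patch's types. Whether the common PCI hotplug code does anything useful
with a negative return is an assumption that would need checking.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical, trimmed-down mirror of the CC_STATE_* values */
typedef enum { CC_IDLE = 0, CC_PENDING, CC_ACTIVE } cc_state;

typedef struct {
    cc_state state;
    int awaiting_release;
} slot_state;

static int slot_add(slot_state *slot)
{
    if (slot->state != CC_IDLE) {
        return -EBUSY;      /* slot already holds a device */
    }
    slot->state = CC_PENDING;
    return 0;
}

static int slot_remove(slot_state *slot)
{
    if (slot->state == CC_IDLE) {
        /* no fdt was ever built for this slot: report it, don't abort */
        fprintf(stderr, "remove requested for an empty slot\n");
        return -ENODEV;
    }
    slot->awaiting_release = 1;
    return 0;
}

/* The callback forwards failures to its caller instead of asserting */
static int hotplug_cb(slot_state *slot, int adding)
{
    return adding ? slot_add(slot) : slot_remove(slot);
}
```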


> +    /* if the device has already been released/isolated by guest, go ahead
> +     * and remove it now. Otherwise, flag it as pending guest release so it
> +     * can be removed later
> +     */
> +    if (spapr_drc_state_is_releasable(drc_entry_slot)) {
> +        spapr_drc_state_reset(drc_entry_slot);
> +    } else {
> +        if (drc_entry_slot->awaiting_release) {
> +            fprintf(stderr, "waiting for guest to release the device");
> +        } else {
> +            drc_entry_slot->awaiting_release = true;
> +        }
> +    }
> +}
> +
> +static int spapr_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> +                                PCIHotplugState state)
> +{

sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);

> +    if (state == PCI_COLDPLUG_ENABLED) {
> +        return 0;
> +    }
> +
> +    if (state == PCI_HOTPLUG_ENABLED) {
> +        spapr_device_hotplug_add(qdev, dev);
> +    } else {
> +        spapr_device_hotplug_remove(qdev, dev);
> +    }

and here s/qdev/phb/? spapr_device_hotplug_(add|remove) and
spapr_pci_hotplug_(add|remove)_event (from later patches) do not use
qdev as a DeviceState anyway; they cast it to sPAPRPHBState and use that.
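
To illustrate: a minimal sketch with hypothetical stand-in types
(device_state, phb_state) where the callback converts once at the boundary
and the helpers take the PHB type directly. to_phb() is a plain
container-style cast; QEMU's SPAPR_PCI_HOST_BRIDGE() is a QOM cast with type
checking, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for DeviceState and sPAPRPHBState */
typedef struct {
    const char *id;
} device_state;

typedef struct {
    device_state parent;    /* embedded base object, as in QOM */
    int last_slot;          /* -1 when no slot is occupied */
} phb_state;

/* Plays the role of SPAPR_PCI_HOST_BRIDGE(qdev), without type checks */
static phb_state *to_phb(device_state *qdev)
{
    return (phb_state *)((char *)qdev - offsetof(phb_state, parent));
}

static int phb_slot_add(phb_state *phb, int slot)
{
    phb->last_slot = slot;
    return 0;
}

static int phb_slot_remove(phb_state *phb, int slot)
{
    if (phb->last_slot != slot) {
        return -1;          /* nothing plugged into that slot */
    }
    phb->last_slot = -1;
    return 0;
}

/* Convert once at the callback boundary; helpers never see device_state */
static int phb_hotplug(device_state *qdev, int slot, int adding)
{
    phb_state *phb = to_phb(qdev);

    return adding ? phb_slot_add(phb, slot) : phb_slot_remove(phb, slot);
}
```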



> +
> +    return 0;
> +}
> +
>  static int spapr_phb_init(SysBusDevice *s)
>  {
>      DeviceState *dev = DEVICE(s);
> @@ -889,6 +1255,7 @@ static int spapr_phb_init(SysBusDevice *s)
>                             &sphb->memspace, &sphb->iospace,
>                             PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
>      phb->bus = bus;
> +    pci_bus_hotplug(phb->bus, spapr_device_hotplug, DEVICE(sphb));
>  
>      sphb->dma_window_start = 0;
>      sphb->dma_window_size = 0x40000000;
> @@ -1181,14 +1548,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>          return bus_off;
>      }
>  
> -#define _FDT(exp) \
> -    do { \
> -        int ret = (exp);                                           \
> -        if (ret < 0) {                                             \
> -            return ret;                                            \
> -        }                                                          \
> -    } while (0)
> -
>      /* Write PHB properties */
>      _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
>      _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 7c8a521..1c9b725 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -328,6 +328,7 @@ struct DrcEntry {
>      void *fdt;
>      int fdt_offset;
>      uint32_t state;
> +    bool awaiting_release;
>      ConfigureConnectorState cc_state;
>      DrcEntry *child_entries;
>  };
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
  2013-12-16  2:59   ` Alexey Kardashevskiy
@ 2013-12-16  4:54     ` Alexey Kardashevskiy
  2014-01-16 20:51       ` Michael Roth
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-16  4:54 UTC (permalink / raw
  To: Michael Roth, qemu-devel; +Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

On 12/16/2013 01:59 PM, Alexey Kardashevskiy wrote:
> On 12/06/2013 09:32 AM, Michael Roth wrote:
>> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>
>> This adds entries to the root OF node to advertise our PHBs as being
>> DR-capable in accordance with the PAPR specification.
>>
>> Each PHB is given a name of PHB<bus#>, advertised as a PHB type,
>> and associated with a power domain of -1 (indicating to guests that
>> power management is handled automatically by hardware).
>>
>> We currently allocate entries for up to 32 DR-capable PHBs, though
>> this limit can be increased later.
>>
>> DrcEntry objects to track the state of the DR-connector associated
>> with each PHB are stored in a 32-entry array, and each DrcEntry in
>> turn has a dynamically-sized number of child DR-connectors,
>> which we will use later to track the state of DR-connectors
>> associated with a PHB's physical slots.
>>
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr.c         |  132 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr.h |   33 ++++++++++++
>>  2 files changed, 165 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 7e53a5f..ec3ba43 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -81,6 +81,7 @@
>>  #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>>  
>>  sPAPREnvironment *spapr;
>> +DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
>>  
>>  int spapr_allocate_irq(int hint, bool lsi)
>>  {
>> @@ -276,6 +277,130 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
>>      return (p - prop) * sizeof(uint32_t);
>>  }
>>  
>> +static void spapr_init_drc_table(void)
>> +{
>> +    int i;
>> +
>> +    memset(drc_table, 0, sizeof(drc_table));
>> +
>> +    /* For now we only care about PHB entries */
>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
>> +        drc_table[i].drc_index = 0x2000001 + i;
>> +    }
>> +}
>> +
>> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state)
>> +{
>> +    DrcEntry *empty_drc = NULL;
>> +    DrcEntry *found_drc = NULL;
>> +    int i, phb_index;
>> +
>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
>> +        if (drc_table[i].phb_buid == 0) {
>> +            empty_drc = &drc_table[i];
>> +        }
>> +
>> +        if (drc_table[i].phb_buid == buid) {
>> +            found_drc = &drc_table[i];
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (found_drc) {
>> +        return found_drc;
>> +    }
>> +
>> +    if (empty_drc) {
>> +        empty_drc->phb_buid = buid;
>> +        empty_drc->state = state;
>> +        empty_drc->cc_state.fdt = NULL;
>> +        empty_drc->cc_state.offset = 0;
>> +        empty_drc->cc_state.depth = 0;
>> +        empty_drc->cc_state.state = CC_STATE_IDLE;
>> +        empty_drc->child_entries =
>> +            g_malloc0(sizeof(DrcEntry) * SPAPR_DRC_PHB_SLOT_MAX);
>> +        phb_index = buid - SPAPR_PCI_BASE_BUID;
>> +        for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
>> +            empty_drc->child_entries[i].drc_index =
>> +                SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (i << 3);
>> +        }
>> +        return empty_drc;
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static void spapr_create_drc_dt_entries(void *fdt)
>> +{
>> +    char char_buf[1024];
>> +    uint32_t int_buf[SPAPR_DRC_TABLE_SIZE + 1];
>> +    uint32_t *entries;
>> +    int offset, fdt_offset;
>> +    int i, ret;
>> +
>> +    fdt_offset = fdt_path_offset(fdt, "/");
>> +
>> +    /* ibm,drc-indexes */
>> +    memset(int_buf, 0, sizeof(int_buf));
>> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
>> +
>> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
>> +        int_buf[i] = drc_table[i-1].drc_index;
>> +    }
>> +
>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
>> +                      sizeof(int_buf));
>> +    if (ret) {
>> +        fprintf(stderr, "Couldn't finalize ibm,drc-indexes property\n");
>> +    }
>> +
>> +    /* ibm,drc-power-domains */
>> +    memset(int_buf, 0, sizeof(int_buf));
>> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
>> +
>> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
>> +        int_buf[i] = 0xffffffff;
>> +    }
>> +
>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
>> +                      sizeof(int_buf));
>> +    if (ret) {
>> +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
>> +    }
>> +
>> +    /* ibm,drc-names */
>> +    memset(char_buf, 0, sizeof(char_buf));
>> +    entries = (uint32_t *)&char_buf[0];
>> +    *entries = SPAPR_DRC_TABLE_SIZE;
>> +    offset = sizeof(*entries);
>> +
>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
>> +        offset += sprintf(char_buf + offset, "PHB %d", i + 1);
>> +        char_buf[offset++] = '\0';
>> +    }
>> +
>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf, offset);
>> +    if (ret) {
>> +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
>> +    }
>> +
>> +    /* ibm,drc-types */
>> +    memset(char_buf, 0, sizeof(char_buf));
>> +    entries = (uint32_t *)&char_buf[0];
>> +    *entries = SPAPR_DRC_TABLE_SIZE;
>> +    offset = sizeof(*entries);
>> +
>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
>> +        offset += sprintf(char_buf + offset, "PHB");
>> +        char_buf[offset++] = '\0';
>> +    }
>> +
>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf, offset);
>> +    if (ret) {
>> +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
>> +    }
>> +}
>> +
>>  #define _FDT(exp) \
>>      do { \
>>          int ret = (exp);                                           \
>> @@ -307,6 +432,8 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>>      int i, smt = kvmppc_smt_threads();
>>      unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
>>  
>> +    spapr_init_drc_table();
>> +
>>      fdt = g_malloc0(FDT_MAX_SIZE);
>>      _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
>>  
>> @@ -590,6 +717,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>>      int ret;
>>      void *fdt;
>>      sPAPRPHBState *phb;
>> +    DrcEntry *drc_entry;
>>  
>>      fdt = g_malloc(FDT_MAX_SIZE);
>>  
>> @@ -609,6 +737,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>>      }
>>  
>>      QLIST_FOREACH(phb, &spapr->phbs, list) {
>> +        drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
>> +        g_assert(drc_entry);
>>          ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
>>      }
>>  
>> @@ -633,6 +763,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>>          spapr_populate_chosen_stdout(fdt, spapr->vio_bus);
>>      }
>>  
>> +    spapr_create_drc_dt_entries(fdt);
>> +
>>      _FDT((fdt_pack(fdt)));
>>  
>>      if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index b2f11e9..0f2e705 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -299,6 +299,39 @@ typedef struct sPAPREnvironment {
>>  #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
>>  #define KVMPPC_HCALL_MAX        KVMPPC_H_LOGICAL_MEMOP
>>  
>> +/* For dlparable/hotpluggable slots */
>> +#define SPAPR_DRC_TABLE_SIZE    32
>> +#define SPAPR_DRC_PHB_SLOT_MAX  32
>> +#define SPAPR_DRC_DEV_ID_BASE   0x40000000
>> +
>> +typedef struct ConfigureConnectorState {
>> +    void *fdt;
>> +    int offset_start;
>> +    int offset;
>> +    int depth;
>> +    PCIDevice *dev;
>> +    enum {
>> +        CC_STATE_IDLE = 0,
>> +        CC_STATE_PENDING = 1,
>> +        CC_STATE_ACTIVE,
>> +    } state;
>> +} ConfigureConnectorState;
>> +
>> +typedef struct DrcEntry DrcEntry;
>> +
>> +struct DrcEntry {
>> +    uint32_t drc_index;
>> +    uint64_t phb_buid;
>> +    void *fdt;
>> +    int fdt_offset;
>> +    uint32_t state;
>> +    ConfigureConnectorState cc_state;
>> +    DrcEntry *child_entries;
>> +};
>> +
>> +extern DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
>> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
>> +
>>  extern sPAPREnvironment *spapr;
> 
> So far we have been trying to keep everything sPAPR-related in sPAPREnvironment.
> Is @drc_table really that special?


One more note: we are trying to add a "spapr" or "sPAPR" prefix to all
global types defined in headers (such as sPAPRPHBState, spapr_pci_lsi,
VIOsPAPRBus, sPAPREnvironment); it would be nice to have "spapr" in some
form in these new types too.

Or we could move the whole patch (except spapr_create_drc_dt_entries()) to
hw/ppc/spapr_pci.c (and keep the original names) as it seems to be the only
user of the whole DrcEntry and ConfigureConnectorState thing.
And put a pointer to drc_table[] into @spapr (or make it static?)
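
Related to spapr_create_drc_dt_entries(): FDT property cells are
big-endian, and fdt_setprop() copies the buffer byte for byte, so the plain
uint32_t stores into int_buf[] and char_buf appear to produce host-endian
counts and indexes (harmless on a big-endian host, wrong on a little-endian
one). A minimal standalone sketch of the two layouts with explicit
big-endian stores; put_be32(), build_drc_indexes() and build_drc_names()
are hypothetical names, and inside QEMU cpu_to_be32() would do the same job.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store a 32-bit value big-endian, as FDT cells are, whatever the host */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* ibm,drc-indexes layout: <count> <index 0> ... <index count-1> */
static size_t build_drc_indexes(uint8_t *buf, const uint32_t *idx, uint32_t n)
{
    put_be32(buf, n);
    for (uint32_t i = 0; i < n; i++) {
        put_be32(buf + 4 * (i + 1), idx[i]);
    }
    return 4 * ((size_t)n + 1);
}

/* ibm,drc-names layout: <count> followed by n NUL-terminated strings */
static size_t build_drc_names(char *buf, size_t len, uint32_t n)
{
    size_t off = 4;

    assert(len > off);
    put_be32((uint8_t *)buf, n);
    for (uint32_t i = 0; i < n; i++) {
        int w = snprintf(buf + off, len - off, "PHB %u", (unsigned)(i + 1));
        assert(w > 0 && (size_t)w < len - off);
        off += (size_t)w + 1;   /* keep the NUL that snprintf wrote */
    }
    return off;
}
```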

The only remaining user of DrcEntry is spapr_hotplug_req_event(), but this
can easily be fixed with a small helper like this:

int spapr_phb_slot_to_drc_index(uint64_t buid, int slot)
{
    DrcEntry *drc_entry = spapr_phb_to_drc_entry(buid);
    if (!drc_entry) {
        drc_entry = spapr_add_phb_to_drc_table(buid, 2);
    }
    return drc_entry->child_entries[slot].drc_index;
}
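
For reference, the numbering such a helper would return, worked through
with the constants from this patch. phb_drc_index() and slot_drc_index()
are hypothetical names; SPAPR_DRC_DEV_ID_BASE, the 0x2000001 PHB base and
the shifts come from the patch. Shifting the slot by 3 presumably leaves the
low three bits for the PCI function number, so 32 slots occupy offsets
0x00-0xF8, just below the next PHB's 0x100 step.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from the patch */
#define SPAPR_DRC_DEV_ID_BASE   0x40000000u
#define SPAPR_DRC_PHB_SLOT_MAX  32u
#define PHB_DRC_INDEX_BASE      0x2000001u  /* drc_table[i].drc_index base */

/* DRC index of the PHB connector itself */
static uint32_t phb_drc_index(uint32_t phb_index)
{
    return PHB_DRC_INDEX_BASE + phb_index;
}

/* DRC index of slot @slot under PHB @phb_index, as in the patch */
static uint32_t slot_drc_index(uint32_t phb_index, uint32_t slot)
{
    assert(slot < SPAPR_DRC_PHB_SLOT_MAX);
    return SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (slot << 3);
}
```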


> 
> 
>>  
>>  /*#define DEBUG_SPAPR_HCALLS*/
>>
> 
> 


-- 
Alexey


* Re: [Qemu-devel] [PATCH v2 13/14] spapr_events: event-scan RTAS interface
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 13/14] spapr_events: event-scan RTAS interface Michael Roth
@ 2013-12-16  4:57   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 39+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-16  4:57 UTC (permalink / raw
  To: Michael Roth, qemu-devel; +Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

On 12/06/2013 09:33 AM, Michael Roth wrote:
> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> We don't actually rely on this interface to surface hotplug events, and
> instead rely on the similar-but-interrupt-driven check-exception RTAS
> interface used for EPOW events. However, the existence of this interface
> is needed to ensure guest kernels initialize the event-reporting
> interfaces which will in turn be used by userspace tools to handle these
> events, so we implement this interface as a stub.
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    1 +
>  hw/ppc/spapr_events.c  |    9 +++++++++
>  include/hw/ppc/spapr.h |    2 ++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 7079e4e..e7a249b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -643,6 +643,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>          refpoints, sizeof(refpoints))));
>  
>      _FDT((fdt_property_cell(fdt, "rtas-error-log-max", RTAS_ERROR_LOG_MAX)));
> +    _FDT((fdt_property_cell(fdt, "rtas-event-scan-rate", RTAS_EVENT_SCAN_RATE)));
>  
>      _FDT((fdt_end_node(fdt)));
>  
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 9dfdbcf..69211c5 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -453,10 +453,19 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      }
>  }
>  
> +static void event_scan(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                            uint32_t token, uint32_t nargs,
> +                            target_ulong args,
> +                            uint32_t nret, target_ulong rets)
> +{
> +    rtas_st(rets, 0, 1); /* no error events found */


s/1/RTAS_OUT_NO_ERRORS_FOUND/ ?


> +}
> +
>  void spapr_events_init(sPAPREnvironment *spapr)
>  {
>      spapr->check_exception_irq = spapr_allocate_msi(0);
>      spapr->epow_notifier.notify = spapr_powerdown_req;
>      qemu_register_powerdown_notifier(&spapr->epow_notifier);
>      spapr_rtas_register("check-exception", check_exception);
> +    spapr_rtas_register("event-scan", event_scan);
>  }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9eef2ce..293375b 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -445,6 +445,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
>  
>  #define RTAS_ERROR_LOG_MAX      2048
>  
> +#define RTAS_EVENT_SCAN_RATE    1
> +
>  typedef struct sPAPRTCETable sPAPRTCETable;
>  
>  #define TYPE_SPAPR_TCE_TABLE "spapr-tce-table"
> 


-- 
Alexey


* Re: [Qemu-devel] [PATCH v2 12/14] spapr_events: re-use EPOW event infrastructure for hotplug events
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 12/14] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
@ 2013-12-16  5:05   ` Alexey Kardashevskiy
  2014-01-16 21:32     ` Michael Roth
  0 siblings, 1 reply; 39+ messages in thread
From: Alexey Kardashevskiy @ 2013-12-16  5:05 UTC (permalink / raw
  To: Michael Roth, qemu-devel; +Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

On 12/06/2013 09:33 AM, Michael Roth wrote:
> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> 
> This extends the data structures currently used to report EPOW events to
> guests via the check-exception RTAS interface to also include event types
> for hotplug/unplug events.
> 
> This is currently undocumented and being finalized for inclusion in the PAPR
> specification, but we implement it here as an extension for guest
> userspace tools to implement (existing guest kernels simply log these
> events via a sysfs interface that's read by rtas_errd).
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c         |    2 +-
>  hw/ppc/spapr_events.c  |  219 +++++++++++++++++++++++++++++++++++++++---------
>  include/hw/ppc/spapr.h |    4 +-
>  3 files changed, 184 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2250ee1..7079e4e 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1522,7 +1522,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
>                                              kernel_size, kernel_le,
>                                              boot_device, kernel_cmdline,
> -                                            spapr->epow_irq);
> +                                            spapr->check_exception_irq);
>      assert(spapr->fdt_skel != NULL);
>  }
>  
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 16fa49e..9dfdbcf 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -32,6 +32,8 @@
>  
>  #include "hw/ppc/spapr.h"
>  #include "hw/ppc/spapr_vio.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci-host/spapr.h"
>  
>  #include <libfdt.h>
>  
> @@ -77,6 +79,7 @@ struct rtas_error_log {
>  #define   RTAS_LOG_TYPE_ECC_UNCORR              0x00000009
>  #define   RTAS_LOG_TYPE_ECC_CORR                0x0000000a
>  #define   RTAS_LOG_TYPE_EPOW                    0x00000040
> +#define   RTAS_LOG_TYPE_HOTPLUG                 0x000000e5
>      uint32_t extended_length;
>  } QEMU_PACKED;
>  
> @@ -166,6 +169,38 @@ struct epow_log_full {
>      struct rtas_event_log_v6_epow epow;
>  } QEMU_PACKED;
>  
> +struct rtas_event_log_v6_hp {
> +#define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
> +    struct rtas_event_log_v6_section_header hdr;
> +    uint8_t hotplug_type;
> +#define RTAS_LOG_V6_HP_TYPE_CPU                          1
> +#define RTAS_LOG_V6_HP_TYPE_MEMORY                       2
> +#define RTAS_LOG_V6_HP_TYPE_SLOT                         3
> +#define RTAS_LOG_V6_HP_TYPE_PHB                          4
> +#define RTAS_LOG_V6_HP_TYPE_PCI                          5
> +    uint8_t hotplug_action;
> +#define RTAS_LOG_V6_HP_ACTION_ADD                        1
> +#define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> +    uint8_t hotplug_identifier;
> +#define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
> +#define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
> +#define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
> +    uint8_t reserved;
> +    union {
> +        uint32_t index;
> +        uint32_t count;
> +        char name[1];
> +    } drc;
> +} QEMU_PACKED;
> +
> +struct hp_log_full {
> +    struct rtas_error_log hdr;
> +    struct rtas_event_log_v6 v6hdr;
> +    struct rtas_event_log_v6_maina maina;
> +    struct rtas_event_log_v6_mainb mainb;
> +    struct rtas_event_log_v6_hp hp;
> +} QEMU_PACKED;
> +
>  #define EVENT_MASK_INTERNAL_ERRORS           0x80000000
>  #define EVENT_MASK_EPOW                      0x40000000
>  #define EVENT_MASK_HOTPLUG                   0x10000000
> @@ -181,29 +216,61 @@ struct epow_log_full {
>          }                                                          \
>      } while (0)
>  
> -void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
> +void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
>  {
> -    uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
> -    uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
> +    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> +    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
>  
>      _FDT((fdt_begin_node(fdt, "event-sources")));
>  
>      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
>      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
>      _FDT((fdt_property(fdt, "interrupt-ranges",
> -                       epow_irq_ranges, sizeof(epow_irq_ranges))));
> +                       irq_ranges, sizeof(irq_ranges))));
>  
>      _FDT((fdt_begin_node(fdt, "epow-events")));
> -    _FDT((fdt_property(fdt, "interrupts",
> -                       epow_interrupts, sizeof(epow_interrupts))));
> +    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
>      _FDT((fdt_end_node(fdt)));
>  
>      _FDT((fdt_end_node(fdt)));
>  }
>  
>  static struct epow_log_full *pending_epow;
> +static struct hp_log_full *pending_hp;
>  static uint32_t next_plid;
>  
> +static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
> +{
> +    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
> +        | RTAS_LOG_V6_B0_BIGENDIAN;
> +    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
> +        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
> +    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> +}
> +
> +static void spapr_init_maina(struct rtas_event_log_v6_maina *maina,
> +                             int section_count)
> +{
> +    struct tm tm;
> +    int year;
> +
> +    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
> +    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> +    /* FIXME: section version, subtype and creator id? */
> +    qemu_get_timedate(&tm, spapr->rtc_offset);
> +    year = tm.tm_year + 1900;
> +    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
> +                                       | (to_bcd(year % 100) << 16)
> +                                       | (to_bcd(tm.tm_mon + 1) << 8)
> +                                       | to_bcd(tm.tm_mday));
> +    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
> +                                       | (to_bcd(tm.tm_min) << 16)
> +                                       | (to_bcd(tm.tm_sec) << 8));
> +    maina->creator_id = 'H'; /* Hypervisor */
> +    maina->section_count = section_count;
> +    maina->plid = next_plid++;
> +}
> +
>  static void spapr_powerdown_req(Notifier *n, void *opaque)
>  {
>      sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
> @@ -212,8 +279,6 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>      struct rtas_event_log_v6_maina *maina;
>      struct rtas_event_log_v6_mainb *mainb;
>      struct rtas_event_log_v6_epow *epow;
> -    struct tm tm;
> -    int year;
>  
>      if (pending_epow) {
>          /* For now, we just throw away earlier events if two come
> @@ -237,27 +302,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>      hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
>                                         - sizeof(pending_epow->hdr));
>  
> -    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
> -        | RTAS_LOG_V6_B0_BIGENDIAN;
> -    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
> -        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
> -    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> -
> -    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
> -    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> -    /* FIXME: section version, subtype and creator id? */
> -    qemu_get_timedate(&tm, spapr->rtc_offset);
> -    year = tm.tm_year + 1900;
> -    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
> -                                       | (to_bcd(year % 100) << 16)
> -                                       | (to_bcd(tm.tm_mon + 1) << 8)
> -                                       | to_bcd(tm.tm_mday));
> -    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
> -                                       | (to_bcd(tm.tm_min) << 16)
> -                                       | (to_bcd(tm.tm_sec) << 8));
> -    maina->creator_id = 'H'; /* Hypervisor */
> -    maina->section_count = 3; /* Main-A, Main-B and EPOW */
> -    maina->plid = next_plid++;
> +    spapr_init_v6hdr(v6hdr);
> +    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
>  
>      mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
>      mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> @@ -274,9 +320,93 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>      epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
>      epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
>  
> -    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
> +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> +}
> +
> +static void spapr_hotplug_req_event(uint8_t hp_type, uint8_t hp_action,
> +                                    sPAPRPHBState *phb, int slot)


This only uses @buid from sPAPRPHBState; what is the point in passing the
whole struct? Any plans to use other fields there?


> +{
> +    struct rtas_error_log *hdr;
> +    struct rtas_event_log_v6 *v6hdr;
> +    struct rtas_event_log_v6_maina *maina;
> +    struct rtas_event_log_v6_mainb *mainb;
> +    struct rtas_event_log_v6_hp *hp;
> +    DrcEntry *drc_entry;
> +
> +    if (pending_hp) {
> +        /* Just toss any pending hotplug events for now, this will
> +         * need to be fixed later on.
> +         */
> +        g_free(pending_hp);
> +    }
> +
> +    pending_hp = g_malloc0(sizeof(*pending_hp));
> +    hdr = &pending_hp->hdr;
> +    v6hdr = &pending_hp->v6hdr;
> +    maina = &pending_hp->maina;
> +    mainb = &pending_hp->mainb;
> +    hp = &pending_hp->hp;
> +
> +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> +                               | RTAS_LOG_SEVERITY_EVENT
> +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> +                               | RTAS_LOG_INITIATOR_HOTPLUG
> +                               | RTAS_LOG_TYPE_HOTPLUG);
> +    hdr->extended_length = cpu_to_be32(sizeof(*pending_hp)
> +                                       - sizeof(pending_hp->hdr));
> +
> +    spapr_init_v6hdr(v6hdr);
> +    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
> +
> +    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> +    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> +    mainb->subsystem_id = 0x80; /* External environment */
> +    mainb->event_severity = 0x00; /* Informational / non-error */
> +    mainb->event_subtype = 0x00; /* Normal shutdown */
> +
> +    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
> +    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
> +    hp->hdr.section_version = 1; /* includes extended modifier */
> +    hp->hotplug_action = hp_action;
> +
> +    hp->hotplug_type = hp_type;
> +
> +    drc_entry = spapr_phb_to_drc_entry(phb->buid);
> +    if (!drc_entry) {
> +        drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
> +    }
> +
> +    switch (hp_type) {
> +    case RTAS_LOG_V6_HP_TYPE_PCI:
> +        hp->drc.index = drc_entry->child_entries[slot].drc_index;
> +        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
> +        break;
> +    }
> +
> +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> +}
> +
> +void spapr_pci_hotplug_add_event(DeviceState *qdev, int slot)
> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
> +
> +    return spapr_hotplug_req_event(RTAS_LOG_V6_HP_TYPE_PCI,
> +                                   RTAS_LOG_V6_HP_ACTION_ADD, phb, slot);
>  }
>  
> +void spapr_pci_hotplug_remove_event(DeviceState *qdev, int slot)
> +{
> +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
> +
> +    /* TODO: removal is generally initiated by guest, need to
> +     * document what exactly the guest is supposed to do with
> +     * this event. What does ACPI or shpc do?
> +     */
> +    return spapr_hotplug_req_event(RTAS_LOG_V6_HP_TYPE_PCI,
> +                                   RTAS_LOG_V6_HP_ACTION_REMOVE, phb, slot);
> + }
> +
>  static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>                              uint32_t token, uint32_t nargs,
>                              target_ulong args,
> @@ -298,15 +428,26 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>          xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
>      }
>  
> -    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
> -        if (sizeof(*pending_epow) < len) {
> -            len = sizeof(*pending_epow);
> -        }
> +    if (mask & EVENT_MASK_EPOW) {
> +        if (pending_epow) {
> +            if (sizeof(*pending_epow) < len) {
> +                len = sizeof(*pending_epow);
> +            }
>  
> -        cpu_physical_memory_write(buf, pending_epow, len);
> -        g_free(pending_epow);
> -        pending_epow = NULL;
> -        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +            cpu_physical_memory_write(buf, pending_epow, len);
> +            g_free(pending_epow);
> +            pending_epow = NULL;
> +            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +        } else if (pending_hp) {
> +            if (sizeof(*pending_hp) < len) {
> +                len = sizeof(*pending_hp);
> +            }
> +
> +            cpu_physical_memory_write(buf, pending_hp, len);
> +            g_free(pending_hp);
> +            pending_hp = NULL;
> +            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> +        }
>      } else {
>          rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
>      }
> @@ -314,7 +455,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>  
>  void spapr_events_init(sPAPREnvironment *spapr)
>  {
> -    spapr->epow_irq = spapr_allocate_msi(0);
> +    spapr->check_exception_irq = spapr_allocate_msi(0);
>      spapr->epow_notifier.notify = spapr_powerdown_req;
>      qemu_register_powerdown_notifier(&spapr->epow_notifier);
>      spapr_rtas_register("check-exception", check_exception);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 1c9b725..9eef2ce 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -31,7 +31,7 @@ typedef struct sPAPREnvironment {
>      uint64_t rtc_offset;
>      bool has_graphics;
>  
> -    uint32_t epow_irq;
> +    uint32_t check_exception_irq;
>      Notifier epow_notifier;
>  
>      /* Migration state */
> @@ -473,5 +473,7 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>                   uint32_t liobn, uint64_t window, uint32_t size);
>  int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>                        sPAPRTCETable *tcet);
> +void spapr_pci_hotplug_add_event(DeviceState *qdev, int slot);
> +void spapr_pci_hotplug_remove_event(DeviceState *qdev, int slot);
>  
>  #endif /* !defined (__HW_SPAPR_H__) */
> 
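For reference, here is a minimal standalone sketch of the BCD packing that spapr_init_maina() performs for the Main-A creation_date and creation_time fields. The date gets one BCD byte each for century, year within the century, month (1-12) and day. The time gets hour, minute and second, with the low byte left zero. to_bcd_sketch() is a hypothetical stand-in that is assumed to behave like QEMU's to_bcd() for values 0-99. The results are host-order values; the patch also converts them with cpu_to_be32().

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for to_bcd(): encode 0..99 as two BCD digits. */
static uint8_t to_bcd_sketch(uint8_t val)
{
    assert(val < 100);
    return (uint8_t)(((val / 10) << 4) | (val % 10));
}

/* Century, year-in-century, month (1-12), day: one BCD byte each. */
static uint32_t pack_creation_date(const struct tm *tm)
{
    int year = tm->tm_year + 1900;

    return ((uint32_t)to_bcd_sketch(year / 100) << 24)
         | ((uint32_t)to_bcd_sketch(year % 100) << 16)
         | ((uint32_t)to_bcd_sketch(tm->tm_mon + 1) << 8)
         | (uint32_t)to_bcd_sketch(tm->tm_mday);
}

/* Hour, minute, second in the top three bytes; low byte left zero. */
static uint32_t pack_creation_time(const struct tm *tm)
{
    return ((uint32_t)to_bcd_sketch(tm->tm_hour) << 24)
         | ((uint32_t)to_bcd_sketch(tm->tm_min) << 16)
         | ((uint32_t)to_bcd_sketch(tm->tm_sec) << 8);
}
```

For example, 2013-12-05 22:32:07 packs to 0x20131205 and 0x22320700.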


-- 
Alexey


* Re: [Qemu-devel] [PATCH v2 14/14] spapr_pci: emit hotplug add/remove events during hotplug
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 14/14] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
@ 2013-12-16  5:06   ` Alexey Kardashevskiy
From: Alexey Kardashevskiy @ 2013-12-16  5:06 UTC
  To: Michael Roth, qemu-devel; +Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

On 12/06/2013 09:33 AM, Michael Roth wrote:
> From: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> 
> This uses an extension of the existing EPOW interrupt/event mechanism
> to notify userspace tools like librtas/drmgr so they can handle
> in-guest configuration/cleanup operations in response to
> device_add/device_del.
> 
> Userspace tools that don't implement this extension will need
> to be run manually after device_add and before device_del.
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 9b4f829..9821462 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1129,14 +1129,18 @@ static void spapr_device_hotplug_remove(DeviceState *qdev, PCIDevice *dev)
>  static int spapr_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>                                  PCIHotplugState state)
>  {

sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);

> +    int slot = PCI_SLOT(dev->devfn);
> +
>      if (state == PCI_COLDPLUG_ENABLED) {
>          return 0;
>      }
>  
>      if (state == PCI_HOTPLUG_ENABLED) {
>          spapr_device_hotplug_add(qdev, dev);
> +        spapr_pci_hotplug_add_event(qdev, slot);

	spapr_pci_hotplug_add_event(phb->buid, slot);

>      } else {
>          spapr_device_hotplug_remove(qdev, dev);
> +        spapr_pci_hotplug_remove_event(qdev, slot);

	spapr_pci_hotplug_remove_event(phb->buid, slot);

and change spapr_pci_hotplug_(add|remove)_event to take @buid instead of
qdev/phb. Or we could even remove these helpers and call
spapr_hotplug_req_event() directly. Wouldn't that make things easier to read?
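A rough sketch of the buid-keyed variant suggested above, assuming the event only needs the hotplug action and the slot's DRC index. HpEventSketch, hotplug_req_event_sketch(), slot_drc_index_sketch() and the SKETCH_* constants are hypothetical. The action values are placeholders rather than the real RTAS_LOG_V6_HP_ACTION_* values, and SKETCH_PCI_BASE_BUID is not QEMU's real base BUID. The DRC index formula is the one used by spapr_add_phb_to_drc_table() in patch 01/14. Nothing in the sketch needs sPAPRPHBState or DeviceState.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholders, not QEMU's real values. */
#define SKETCH_DRC_DEV_ID_BASE   0x40000000u
#define SKETCH_DRC_PHB_SLOT_MAX  32
#define SKETCH_PCI_BASE_BUID     0x1000ull

enum { SKETCH_HP_ACTION_ADD = 1, SKETCH_HP_ACTION_REMOVE = 2 };

/* Only the fields the event actually needs. */
typedef struct {
    uint8_t  action;     /* add or remove */
    uint32_t drc_index;  /* identifies the slot to the guest */
} HpEventSketch;

/* Per-slot DRC index, same formula as spapr_add_phb_to_drc_table(). */
static uint32_t slot_drc_index_sketch(uint64_t buid, int slot)
{
    uint32_t phb_index = (uint32_t)(buid - SKETCH_PCI_BASE_BUID);

    assert(slot >= 0 && slot < SKETCH_DRC_PHB_SLOT_MAX);
    return SKETCH_DRC_DEV_ID_BASE + (phb_index << 8) + ((uint32_t)slot << 3);
}

/* Event builder keyed only on the PHB's BUID and the slot number. */
static HpEventSketch hotplug_req_event_sketch(uint8_t action, uint64_t buid,
                                              int slot)
{
    HpEventSketch ev = {
        .action = action,
        .drc_index = slot_drc_index_sketch(buid, slot),
    };
    return ev;
}
```

At the call site in spapr_device_hotplug() this would become something like hotplug_req_event_sketch(SKETCH_HP_ACTION_ADD, phb->buid, PCI_SLOT(dev->devfn)).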



>      }
>  
>      return 0;
> 


-- 
Alexey


* Re: [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug
  2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
                   ` (13 preceding siblings ...)
  2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 14/14] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
@ 2014-01-10  8:29 ` Alexey Kardashevskiy
From: Alexey Kardashevskiy @ 2014-01-10  8:29 UTC
  To: Michael Roth, qemu-devel; +Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Ping?


On 12/06/2013 09:32 AM, Michael Roth wrote:
> These patches are based on ppc-next, and can also be obtained from:
> 
> https://github.com/mdroth/qemu/commits/spapr-pci-hotplug-v2-ppc-next
> 
> v2:
>   * re-ordered patches to fix build bisectability (Alexey)
>   * replaced g_warning with DPRINTF in RTAS calls for guest errors (Alexey)
>   * replaced g_warning with fprintf for qemu errors (Alexey)
>   * updated RTAS calls to use pre-existing error/success macros (Alexey)
>   * replaced DR_*/SENSOR_* macros with INDICATOR_* for set-indicator/
>     get-sensor-state (Alexey)
> 
> OVERVIEW
> 
> These patches add support for PCI hotplug for SPAPR guests. We advertise
> each PHB as DR-capable (as defined by PAPR 13.5/13.6) with 32 hotpluggable
> PCI slots per PHB, which models a standard PCI expansion device for Power
> machines where the DRC name/loc-code/index for each slot are generated
> based on bus/slot number.
> 
> This is compatible with existing guest kernels via the rpaphp hotplug
> module, and existing userspace tools such as drmgr/librtas/rtas_errd for
> managing devices, in theory...
> 
> NOTES / ADDITIONAL DEPENDENCIES
> 
> Due to an issue with rpaphp, older guest kernels need a workaround that
> relies on the bus rescan/remove sysfs interfaces instead of the
> rpaphp-provided hotplug interfaces.
> 
> Guest kernel fixes for rpaphp are in progress and available for testing
> here (there's still currently a benign issue with duplicate eeh sysfs
> entries with these, but the full guest-driven hotplug workflow is
> functional):
> 
>   https://github.com/mdroth/linux/commits/pci-hotplug-fixes
> 
> Alternatively, there are updated userspace tools which add a "-v" option
> to drmgr to utilize bus rescan/remove instead of relying on rpaphp:
> 
>   https://github.com/tyreld/powerpc-utils/commits/hotplug
> 
> It's possible to test guest-driven hotplug without either of these by
> using a workaround (see USAGE below), but this is not recommended.
> 
> PAPR does not currently define a mechanism for generating PCI
> hotplug/unplug events, and relies on guest-driven management of devices,
> so as part of this series we also introduce an extension to the existing
> EPOW power event reporting mechanism (where a guest will query for events
> via check-exception RTAS calls in response to an external interrupt) to
> surface hotplug/unplug events with the information needed to manage the
> devices automatically via the rtas_errd guest service. In order to enable
> this qemu-driven hotplug/unplug workflow (for parity with ACPI/SHPC-based
> guests), updated versions of librtas/ppc64-diag are required, which are
> available here:
> 
>   https://github.com/tyreld/ppc64-diag/commits/hotplug
>   https://github.com/tyreld/librtas/commits/hotplug
> 
> Lacking those, users must manage device hotplug/unplug manually.
> 
> Additionally, to support unplug, PAPR requires additional OF properties
> (ibm,my-drc-index and loc-code) on hotpluggable slots that are already
> populated at boot time, so an updated SLOF is required to allow device
> unplug after a guest reboot. (These properties cannot currently be added
> to the boot-time FDT, since they would conflict with SLOF-generated
> device nodes. We either need to teach SLOF to re-use/merge existing
> entries, or simply have it generate the required property values for
> board-qemu, which is the approach taken here.) A patch for SLOF is
> available below, along with a pre-built SLOF binary which includes it
> (for testing):
> 
>   https://github.com/mdroth/SLOF/commit/2e09a2950db0ce8ed464b80cccfea56dccf85d66
>   https://github.com/mdroth/qemu/blob/19a390e3270a7defc7158ce29e52ff2b27d666ae/pc-bios/slof.bin
> 
> PATCH LAYOUT
> 
> Patches
>         1-3   advertise PHBs and associated slots as hotpluggable to guests
>         4-7   add RTAS interfaces required for device configuration
>         8-10  add helpers and a potential fix for QEMU-managed BAR
>               assignments
>         11    enable device_add/device_del for spapr machines and
>               guest-driven hotplug
>         12-14 define hotplug event structure and emit them in response to
>               device_add/device_del
> 
> USAGE
> 
> With unmodified guests:
>   hotplug:
>     qemu:
>       device_add e1000,id=slot0
>     guest:
>       drmgr -c pci -s "Slot 0" -n -a
>       echo 1 >/sys/bus/pci/rescan
>   unplug:
>     guest:
>       drmgr -c pci -s "Slot 0" -n -r
>       echo 1 >/sys/bus/pci/devices/0000:00:00.0/remove
>     qemu:
>       device_del slot0
> 
> With only updated guest kernel:
>   hotplug:
>     qemu:
>       device_add e1000,id=slot0
>     guest:
>       modprobe rpaphp
>       drmgr -c pci -s "Slot 0" -n -a
>   unplug:
>     guest:
>       drmgr -c pci -s "Slot 0" -n -r
>     qemu:
>       device_del slot0
> 
> With only updated powerpc-utils/drmgr:
>   hotplug:
>     qemu:
>       device_add e1000,id=slot0
>     guest:
>       drmgr -c pci -s "Slot 0" -n -v -a
>   unplug:
>     guest:
>       drmgr -c pci -s "Slot 0" -n -v -r
>     qemu:
>       device_del slot0
> 
> With updated librtas/ppc64-diag and either an updated guest kernel or drmgr:
>   hotplug:
>     qemu:
>       device_add e1000,id=slot0
>   unplug:
>     qemu:
>       device_del slot0
> 
>  hw/pci/pci.c                |    5 +-
>  hw/ppc/spapr.c              |  174 +++++++++-
>  hw/ppc/spapr_events.c       |  228 ++++++++++---
>  hw/ppc/spapr_pci.c          |  768 ++++++++++++++++++++++++++++++++++++++++++-
>  include/exec/memory.h       |   34 ++
>  include/hw/pci-host/spapr.h |    1 +
>  include/hw/pci/pci.h        |    1 +
>  include/hw/ppc/spapr.h      |   77 ++++-
>  memory.c                    |   50 +++
>  9 files changed, 1286 insertions(+), 52 deletions(-)
> 
> 
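On the guest side, the EPOW extension described above works like this: the external interrupt prompts the guest to call check-exception, and each call hands back (and frees) one pending log. Below is a minimal sketch of that dequeue step. It assumes one pending log per source, as in patch 13/14, and that EPOW logs take priority over hotplug logs. EventQueueSketch, check_exception_sketch() and the SKETCH_* values are hypothetical stand-ins for the real RTAS constants and globals. In the check_exception() hunk quoted earlier in this thread, a call with the EPOW mask bit set but nothing pending appears to leave the return status unset. This sketch reports "no errors found" in that case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the RTAS status codes and event mask bit. */
#define SKETCH_OUT_SUCCESS          0
#define SKETCH_OUT_NO_ERRORS_FOUND  1
#define SKETCH_EVENT_MASK_EPOW      0x40000000u

/* At most one pending log per source; a newer event replaces an older one. */
typedef struct {
    void  *pending_epow;
    size_t epow_len;
    void  *pending_hp;
    size_t hp_len;
} EventQueueSketch;

/* Copy at most buf_len bytes of the next pending log into buf, then free it.
 * EPOW logs are reported before hotplug logs. If the mask does not select
 * these events, or nothing is pending, report "no errors found". */
static int check_exception_sketch(EventQueueSketch *q, uint32_t mask,
                                  void *buf, size_t buf_len)
{
    void **pending = NULL;
    size_t len = 0;

    if (mask & SKETCH_EVENT_MASK_EPOW) {
        if (q->pending_epow) {
            pending = &q->pending_epow;
            len = q->epow_len;
        } else if (q->pending_hp) {
            pending = &q->pending_hp;
            len = q->hp_len;
        }
    }

    if (!pending) {
        return SKETCH_OUT_NO_ERRORS_FOUND;
    }

    memcpy(buf, *pending, len < buf_len ? len : buf_len);
    free(*pending);
    *pending = NULL;
    return SKETCH_OUT_SUCCESS;
}
```

Because each pending log is a single pointer, the sketch keeps the patch's "newest event wins" limitation. A real queue would be needed to avoid dropping back-to-back hotplug events.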


-- 
Alexey


* Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
  2013-12-16  4:54     ` Alexey Kardashevskiy
@ 2014-01-16 20:51       ` Michael Roth
  2014-01-20  2:58         ` Alexey Kardashevskiy
From: Michael Roth @ 2014-01-16 20:51 UTC
  To: Alexey Kardashevskiy, qemu-devel
  Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Quoting Alexey Kardashevskiy (2013-12-15 22:54:42)
> On 12/16/2013 01:59 PM, Alexey Kardashevskiy wrote:
> > On 12/06/2013 09:32 AM, Michael Roth wrote:
> >> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> >>
> >> This adds entries to the root OF node to advertise our PHBs as being
> >> DR-capable in accordance with the PAPR specification.
> >>
> >> Each PHB is given a name of PHB<bus#>, advertised as a PHB type,
> >> and associated with a power domain of -1 (indicating to guests that
> >> power management is handled automatically by hardware).
> >>
> >> We currently allocate entries for up to 32 DR-capable PHBs, though
> >> this limit can be increased later.
> >>
> >> DrcEntry objects that track the state of the DR-connector associated
> >> with each PHB are stored in a 32-entry array, and each DrcEntry in
> >> turn has a dynamically-sized number of child DR-connectors,
> >> which we will use later to track the state of DR-connectors
> >> associated with a PHB's physical slots.
> >>
> >> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> >> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr.c         |  132 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/ppc/spapr.h |   33 ++++++++++++
> >>  2 files changed, 165 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 7e53a5f..ec3ba43 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -81,6 +81,7 @@
> >>  #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
> >>  
> >>  sPAPREnvironment *spapr;
> >> +DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
> >>  
> >>  int spapr_allocate_irq(int hint, bool lsi)
> >>  {
> >> @@ -276,6 +277,130 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
> >>      return (p - prop) * sizeof(uint32_t);
> >>  }
> >>  
> >> +static void spapr_init_drc_table(void)
> >> +{
> >> +    int i;
> >> +
> >> +    memset(drc_table, 0, sizeof(drc_table));
> >> +
> >> +    /* For now we only care about PHB entries */
> >> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> >> +        drc_table[i].drc_index = 0x2000001 + i;
> >> +    }
> >> +}
> >> +
> >> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state)
> >> +{
> >> +    DrcEntry *empty_drc = NULL;
> >> +    DrcEntry *found_drc = NULL;
> >> +    int i, phb_index;
> >> +
> >> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> >> +        if (drc_table[i].phb_buid == 0) {
> >> +            empty_drc = &drc_table[i];
> >> +        }
> >> +
> >> +        if (drc_table[i].phb_buid == buid) {
> >> +            found_drc = &drc_table[i];
> >> +            break;
> >> +        }
> >> +    }
> >> +
> >> +    if (found_drc) {
> >> +        return found_drc;
> >> +    }
> >> +
> >> +    if (empty_drc) {
> >> +        empty_drc->phb_buid = buid;
> >> +        empty_drc->state = state;
> >> +        empty_drc->cc_state.fdt = NULL;
> >> +        empty_drc->cc_state.offset = 0;
> >> +        empty_drc->cc_state.depth = 0;
> >> +        empty_drc->cc_state.state = CC_STATE_IDLE;
> >> +        empty_drc->child_entries =
> >> +            g_malloc0(sizeof(DrcEntry) * SPAPR_DRC_PHB_SLOT_MAX);
> >> +        phb_index = buid - SPAPR_PCI_BASE_BUID;
> >> +        for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
> >> +            empty_drc->child_entries[i].drc_index =
> >> +                SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (i << 3);
> >> +        }
> >> +        return empty_drc;
> >> +    }
> >> +
> >> +    return NULL;
> >> +}
> >> +
> >> +static void spapr_create_drc_dt_entries(void *fdt)
> >> +{
> >> +    char char_buf[1024];
> >> +    uint32_t int_buf[SPAPR_DRC_TABLE_SIZE + 1];
> >> +    uint32_t *entries;
> >> +    int offset, fdt_offset;
> >> +    int i, ret;
> >> +
> >> +    fdt_offset = fdt_path_offset(fdt, "/");
> >> +
> >> +    /* ibm,drc-indexes */
> >> +    memset(int_buf, 0, sizeof(int_buf));
> >> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
> >> +
> >> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
> >> +        int_buf[i] = drc_table[i-1].drc_index;
> >> +    }
> >> +
> >> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
> >> +                      sizeof(int_buf));
> >> +    if (ret) {
> >> +        fprintf(stderr, "Couldn't finalize ibm,drc-indexes property\n");
> >> +    }
> >> +
> >> +    /* ibm,drc-power-domains */
> >> +    memset(int_buf, 0, sizeof(int_buf));
> >> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
> >> +
> >> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
> >> +        int_buf[i] = 0xffffffff;
> >> +    }
> >> +
> >> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
> >> +                      sizeof(int_buf));
> >> +    if (ret) {
> >> +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
> >> +    }
> >> +
> >> +    /* ibm,drc-names */
> >> +    memset(char_buf, 0, sizeof(char_buf));
> >> +    entries = (uint32_t *)&char_buf[0];
> >> +    *entries = SPAPR_DRC_TABLE_SIZE;
> >> +    offset = sizeof(*entries);
> >> +
> >> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> >> +        offset += sprintf(char_buf + offset, "PHB %d", i + 1);
> >> +        char_buf[offset++] = '\0';
> >> +    }
> >> +
> >> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf, offset);
> >> +    if (ret) {
> >> +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
> >> +    }
> >> +
> >> +    /* ibm,drc-types */
> >> +    memset(char_buf, 0, sizeof(char_buf));
> >> +    entries = (uint32_t *)&char_buf[0];
> >> +    *entries = SPAPR_DRC_TABLE_SIZE;
> >> +    offset = sizeof(*entries);
> >> +
> >> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> >> +        offset += sprintf(char_buf + offset, "PHB");
> >> +        char_buf[offset++] = '\0';
> >> +    }
> >> +
> >> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf, offset);
> >> +    if (ret) {
> >> +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
> >> +    }
> >> +}
> >> +
> >>  #define _FDT(exp) \
> >>      do { \
> >>          int ret = (exp);                                           \
> >> @@ -307,6 +432,8 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
> >>      int i, smt = kvmppc_smt_threads();
> >>      unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
> >>  
> >> +    spapr_init_drc_table();
> >> +
> >>      fdt = g_malloc0(FDT_MAX_SIZE);
> >>      _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
> >>  
> >> @@ -590,6 +717,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
> >>      int ret;
> >>      void *fdt;
> >>      sPAPRPHBState *phb;
> >> +    DrcEntry *drc_entry;
> >>  
> >>      fdt = g_malloc(FDT_MAX_SIZE);
> >>  
> >> @@ -609,6 +737,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
> >>      }
> >>  
> >>      QLIST_FOREACH(phb, &spapr->phbs, list) {
> >> +        drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
> >> +        g_assert(drc_entry);
> >>          ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
> >>      }
> >>  
> >> @@ -633,6 +763,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
> >>          spapr_populate_chosen_stdout(fdt, spapr->vio_bus);
> >>      }
> >>  
> >> +    spapr_create_drc_dt_entries(fdt);
> >> +
> >>      _FDT((fdt_pack(fdt)));
> >>  
> >>      if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index b2f11e9..0f2e705 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -299,6 +299,39 @@ typedef struct sPAPREnvironment {
> >>  #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
> >>  #define KVMPPC_HCALL_MAX        KVMPPC_H_LOGICAL_MEMOP
> >>  
> >> +/* For dlparable/hotpluggable slots */
> >> +#define SPAPR_DRC_TABLE_SIZE    32
> >> +#define SPAPR_DRC_PHB_SLOT_MAX  32
> >> +#define SPAPR_DRC_DEV_ID_BASE   0x40000000
> >> +
> >> +typedef struct ConfigureConnectorState {
> >> +    void *fdt;
> >> +    int offset_start;
> >> +    int offset;
> >> +    int depth;
> >> +    PCIDevice *dev;
> >> +    enum {
> >> +        CC_STATE_IDLE = 0,
> >> +        CC_STATE_PENDING = 1,
> >> +        CC_STATE_ACTIVE,
> >> +    } state;
> >> +} ConfigureConnectorState;
> >> +
> >> +typedef struct DrcEntry DrcEntry;
> >> +
> >> +struct DrcEntry {
> >> +    uint32_t drc_index;
> >> +    uint64_t phb_buid;
> >> +    void *fdt;
> >> +    int fdt_offset;
> >> +    uint32_t state;
> >> +    ConfigureConnectorState cc_state;
> >> +    DrcEntry *child_entries;
> >> +};
> >> +
> >> +extern DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
> >> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
> >> +
> >>  extern sPAPREnvironment *spapr;
> > 
> > So far we were trying to keep everything sPAPR-related in sPAPREnvironment.
> > Is @drc_table really that special?
> 
> 
> One more note: we are trying to add a "spapr" or "sPAPR" prefix to all
> global types defined in headers (such as sPAPRPHBState, spapr_pci_lsi,
> VIOsPAPRBus, sPAPREnvironment), so it would be nice to have "spapr" in some
> form in these new types too.
> 
> Or we could move the whole patch (except spapr_create_drc_dt_entries()) to
> hw/ppc/spapr_pci.c (and keep the original names) as it seems to be the only
> user of the whole DrcEntry and ConfigureConnectorState thing.
> And put a pointer to drc_table[] into @spapr (or make it static?)

That would work, but I think we'd need to move spapr_create_drc_dt_entries()
as well, or at least the bits that rely on DrcEntry. However, I worry
about scoping DrcEntry to spapr_pci.c at this early stage, as DR-capable
components other than PCI, such as CPUs and memory, may come to rely on the
state captured by the DrcEntry nodes, both for boot-time FDT generation and
for run-time management via ibm,configure-connector.

Assuming that seems like a reasonable expectation, I'd prefer the first
option: using spapr-specific prefixes for global types and moving
drc_table into sPAPREnvironment.
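Here is a minimal sketch of that option. It assumes the table simply becomes a member of the machine state and that every helper takes the state explicitly instead of touching a global. The sPAPR*Sketch types and spapr_sketch_* functions are hypothetical names, and the per-slot array is fixed-size rather than g_malloc0()'d. The DRC index values follow the formulas in this patch: 0x2000001 + i for PHBs, and SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (slot << 3) for slots. One difference from the patch is intentional: the patch keeps the last free entry it scans past, while this version claims the first free one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DRC_TABLE_SIZE    32
#define SKETCH_DRC_PHB_SLOT_MAX  32
#define SKETCH_DRC_DEV_ID_BASE   0x40000000u

/* Hypothetical spapr-prefixed stand-in for DrcEntry. */
typedef struct sPAPRDrcEntrySketch {
    uint32_t drc_index;
    uint64_t phb_buid;                              /* 0 == unused entry */
    uint32_t state;
    uint32_t child_drc_index[SKETCH_DRC_PHB_SLOT_MAX];
} sPAPRDrcEntrySketch;

/* Hypothetical slice of sPAPREnvironment that now owns the table. */
typedef struct sPAPREnvSketch {
    uint64_t pci_base_buid;
    sPAPRDrcEntrySketch drc_table[SKETCH_DRC_TABLE_SIZE];
} sPAPREnvSketch;

/* Reset every entry and assign the PHB-level DRC indexes. */
static void spapr_sketch_init_drc_table(sPAPREnvSketch *env)
{
    for (int i = 0; i < SKETCH_DRC_TABLE_SIZE; i++) {
        env->drc_table[i] = (sPAPRDrcEntrySketch){ .drc_index = 0x2000001u + i };
    }
}

/* Return the entry already bound to buid, or bind the first free one.
 * Returns NULL when the table is full. */
static sPAPRDrcEntrySketch *spapr_sketch_add_phb(sPAPREnvSketch *env,
                                                 uint64_t buid, uint32_t state)
{
    sPAPRDrcEntrySketch *empty = NULL;

    assert(buid != 0);  /* 0 marks an unused entry */
    for (int i = 0; i < SKETCH_DRC_TABLE_SIZE; i++) {
        sPAPRDrcEntrySketch *e = &env->drc_table[i];

        if (e->phb_buid == buid) {
            return e;
        }
        if (e->phb_buid == 0 && !empty) {
            empty = e;
        }
    }
    if (!empty) {
        return NULL;
    }

    uint32_t phb_index = (uint32_t)(buid - env->pci_base_buid);

    empty->phb_buid = buid;
    empty->state = state;
    for (int i = 0; i < SKETCH_DRC_PHB_SLOT_MAX; i++) {
        empty->child_drc_index[i] =
            SKETCH_DRC_DEV_ID_BASE + (phb_index << 8) + ((uint32_t)i << 3);
    }
    return empty;
}
```

Taking the state pointer explicitly also keeps the door open for non-PCI DR connectors (CPUs, memory) to share the same table without another global.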

> 
> The only remaining user of DrcEntry is spapr_hotplug_req_event(), but this
> can easily be fixed with a small helper like this:
> 
> uint32_t spapr_phb_slot_to_drc_index(uint64_t buid, int slot)
> {
>         DrcEntry *drc_entry = spapr_phb_to_drc_entry(buid);
>         if (!drc_entry) {
>                 drc_entry = spapr_add_phb_to_drc_table(buid, 2 /* Unusable */);
>         }
>         return drc_entry->child_entries[slot].drc_index;
> }
> 
> 
> > 
> > 
> >>  
> >>  /*#define DEBUG_SPAPR_HCALLS*/
> >>
> > 
> > 
> 
> 
> -- 
> Alexey
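The ibm,drc-names and ibm,drc-types properties built in spapr_create_drc_dt_entries() share one layout: a 32-bit entry count followed by that many NUL-terminated strings. Below is a small bounds-checked sketch that builds such a buffer. build_drc_names() and put_be32() are hypothetical helpers. Device tree cells are big-endian, so the sketch writes the count explicitly in big-endian order, whereas the patch stores it with a host-order assignment. It also uses snprintf() instead of sprintf() so the fixed-size buffer cannot overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store v at p as a big-endian 32-bit cell (device tree byte order). */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Build <count> "PHB 1\0" "PHB 2\0" ... into buf.
 * Returns the number of bytes used, or -1 if buf is too small. */
static int build_drc_names(uint8_t *buf, size_t size, int count)
{
    size_t offset = 4;

    if (size < offset) {
        return -1;
    }
    put_be32(buf, (uint32_t)count);

    for (int i = 0; i < count; i++) {
        int n = snprintf((char *)buf + offset, size - offset, "PHB %d", i + 1);

        /* n excludes the terminating NUL, which must also fit. */
        if (n < 0 || (size_t)n + 1 > size - offset) {
            return -1;
        }
        offset += (size_t)n + 1;   /* keep the NUL as the separator */
    }
    return (int)offset;
}
```

The resulting buffer and length would then be passed to fdt_setprop(fdt, fdt_offset, "ibm,drc-names", buf, len), as the patch does with char_buf and offset.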


* Re: [Qemu-devel] [PATCH v2 04/14] spapr_pci: add set-indicator RTAS interface
  2013-12-16  4:26   ` Alexey Kardashevskiy
@ 2014-01-16 20:54     ` Michael Roth
From: Michael Roth @ 2014-01-16 20:54 UTC
  To: Alexey Kardashevskiy, qemu-devel
  Cc: agraf, ncmike, qemu-ppc, tyreld, nfont, paulus

Quoting Alexey Kardashevskiy (2013-12-15 22:26:32)
> On 12/06/2013 09:32 AM, Michael Roth wrote:
> > From: Mike Day <ncmike@ncultra.org>
> > 
> > Signed-off-by: Mike Day <ncmike@ncultra.org>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_pci.c     |   93 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr.h |   28 +++++++++++++++
> >  2 files changed, 121 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 7568a03..1046ec8 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -35,6 +35,16 @@
> >  
> >  #include "hw/pci/pci_bus.h"
> >  
> > +/* #define DEBUG_SPAPR */
> > +
> > +#ifdef DEBUG_SPAPR
> > +#define DPRINTF(fmt, ...) \
> > +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> > +#else
> > +#define DPRINTF(fmt, ...) \
> > +    do { } while (0)
> > +#endif
> > +
> >  /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
> >  #define RTAS_QUERY_FN           0
> >  #define RTAS_CHANGE_FN          1
> > @@ -404,6 +414,80 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
> >      rtas_st(rets, 2, 1);/* 0 == level; 1 == edge */
> >  }
> >  
> > +static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > +                               uint32_t token, uint32_t nargs,
> > +                               target_ulong args, uint32_t nret,
> > +                               target_ulong rets)
> > +{
> > +    uint32_t indicator = rtas_ld(args, 0);
> > +    uint32_t drc_index = rtas_ld(args, 1);
> > +    uint32_t indicator_state = rtas_ld(args, 2);
> > +    uint32_t encoded = 0, shift = 0, mask = 0;
> > +    uint32_t *pind;
> > +    DrcEntry *drc_entry = NULL;
> > +
> > +    if (drc_index == 0) { /* platform indicator */
> > +        pind = &spapr->state;
> > +    } else {
> > +        drc_entry = spapr_find_drc_entry(drc_index);
> > +        if (!drc_entry) {
> > +            DPRINTF("rtas_set_indicator: unable to find drc_entry for %x",
> > +                    drc_index);
> > +            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > +            return;
> > +        }
> > +        pind = &drc_entry->state;
> > +    }
> > +
> > +    switch (indicator) {
> > +    case 9:  /* EPOW */
> > +        shift = INDICATOR_EPOW_SHIFT;
> > +        mask = INDICATOR_EPOW_MASK;
> > +        break;
> > +    case 9001: /* Isolation state */
> > +        /* encode the new value into the correct bit field */
> > +        shift = INDICATOR_ISOLATION_SHIFT;
> > +        mask = INDICATOR_ISOLATION_MASK;
> > +        break;
> > +    case 9002: /* DR */
> > +        shift = INDICATOR_DR_SHIFT;
> > +        mask = INDICATOR_DR_MASK;
> > +        break;
> > +    case 9003: /* Allocation State */
> > +        shift = INDICATOR_ALLOCATION_SHIFT;
> > +        mask = INDICATOR_ALLOCATION_MASK;
> > +        break;
> > +    case 9005: /* global interrupt */
> > +        shift = INDICATOR_GLOBAL_INTERRUPT_SHIFT;
> > +        mask = INDICATOR_GLOBAL_INTERRUPT_MASK;
> > +        break;
> > +    case 9006: /* error log */
> > +        shift = INDICATOR_ERROR_LOG_SHIFT;
> > +        mask = INDICATOR_ERROR_LOG_MASK;
> > +        break;
> > +    case 9007: /* identify */
> > +        shift = INDICATOR_IDENTIFY_SHIFT;
> > +        mask = INDICATOR_IDENTIFY_MASK;
> > +        break;
> > +    case 9009: /* reset */
> > +        shift = INDICATOR_RESET_SHIFT;
> > +        mask = INDICATOR_RESET_MASK;
> > +        break;
> > +    default:
> > +        DPRINTF("rtas_set_indicator: indicator not implemented: %d",
> > +                indicator);
> > +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> > +        return;
> > +    }
> > +
> > +    encoded = ENCODE_DRC_STATE(indicator_state, mask, shift);
> > +    /* clear the current indicator value */
> > +    *pind &= ~mask;
> > +    /* set the new value */
> > +    *pind |= encoded;
> > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +}
> > +
> >  static int pci_spapr_swizzle(int slot, int pin)
> >  {
> >      return (slot + pin) % PCI_NUM_PINS;
> > @@ -637,6 +721,14 @@ static int spapr_phb_init(SysBusDevice *s)
> >          sphb->lsi_table[i].irq = irq;
> >      }
> >  
> > +    /* make sure the platform EPOW sensor is initialized - the
> > +     * guest will probe it when there is a hotplug event.
> > +     */
> > +    spapr->state &= ~(uint32_t)INDICATOR_EPOW_MASK;
> > +    spapr->state |= ENCODE_DRC_STATE(0,
> > +                                     INDICATOR_EPOW_MASK,
> > +                                     INDICATOR_EPOW_SHIFT);
> > +
> >      return 0;
> >  }
> >  
> > @@ -958,6 +1050,7 @@ void spapr_pci_rtas_init(void)
> >                              rtas_ibm_query_interrupt_source_number);
> >          spapr_rtas_register("ibm,change-msi", rtas_ibm_change_msi);
> >      }
> > +    spapr_rtas_register("set-indicator", rtas_set_indicator);
> >  }
> >  
> >  static void spapr_pci_register_types(void)
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 6ae5c54..b48c55f 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -38,6 +38,9 @@ typedef struct sPAPREnvironment {
> >      int htab_save_index;
> >      bool htab_first_pass;
> >      int htab_fd;
> > +
> > +    /* platform state - sensors and indicators */
> > +    uint32_t state;
> >  } sPAPREnvironment;
> >  
> >  #define H_SUCCESS         0
> > @@ -334,6 +337,31 @@ DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
> >  DrcEntry *spapr_phb_to_drc_entry(uint64_t buid);
> >  DrcEntry *spapr_find_drc_entry(int drc_index);
> >  
> > +/* For set-indicator RTAS interface */
> > +#define INDICATOR_ISOLATION_MASK            0x0001   /* 9001 one bit */
> > +#define INDICATOR_GLOBAL_INTERRUPT_MASK     0x0002   /* 9005 one bit */
> > +#define INDICATOR_ERROR_LOG_MASK            0x0004   /* 9006 one bit */
> > +#define INDICATOR_IDENTIFY_MASK             0x0008   /* 9007 one bit */
> > +#define INDICATOR_RESET_MASK                0x0010   /* 9009 one bit */
> > +#define INDICATOR_DR_MASK                   0x00e0   /* 9002 three bits */
> > +#define INDICATOR_ALLOCATION_MASK           0x0300   /* 9003 two bits */
> > +#define INDICATOR_EPOW_MASK                 0x1c00   /* 9 three bits */
> > +
> > +#define INDICATOR_ISOLATION_SHIFT           0x00     /* bit 0 */
> > +#define INDICATOR_GLOBAL_INTERRUPT_SHIFT    0x01     /* bit 1 */
> > +#define INDICATOR_ERROR_LOG_SHIFT           0x02     /* bit 2 */
> > +#define INDICATOR_IDENTIFY_SHIFT            0x03     /* bit 3 */
> > +#define INDICATOR_RESET_SHIFT               0x04     /* bit 4 */
> > +#define INDICATOR_DR_SHIFT                  0x05     /* bits 5-7 */
> > +#define INDICATOR_ALLOCATION_SHIFT          0x08     /* bits 8-9 */
> > +#define INDICATOR_EPOW_SHIFT                0x0a     /* bits 10-12 */
> > +
> > +#define DECODE_DRC_STATE(state, m, s)                  \
> > +    ((((uint32_t)(state) & (uint32_t)(m))) >> (s))
> > +
> > +#define ENCODE_DRC_STATE(val, m, s) \
> > +    (((uint32_t)(val) << (s)) & (uint32_t)(m))
> > +
> 
> 
> Why put these definitions in the header when they are only used by
> spapr_pci.c? It gives the (wrong) impression that they are shared between
> files...

Agreed these should live in spapr_pci.c, will fix.

> 
> 
> 
> >  extern sPAPREnvironment *spapr;
> >  
> >  /*#define DEBUG_SPAPR_HCALLS*/
> > 
> 
> 
> -- 
> Alexey

* Re: [Qemu-devel] [PATCH v2 05/14] spapr_pci: add get/set-power-level RTAS interfaces
  2013-12-16  3:09   ` Alexey Kardashevskiy
@ 2014-01-16 21:01     ` Michael Roth
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2014-01-16 21:01 UTC
  To: Alexey Kardashevskiy, qemu-devel
  Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Quoting Alexey Kardashevskiy (2013-12-15 21:09:09)
> On 12/06/2013 09:32 AM, Michael Roth wrote:
> > From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > 
> > Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_pci.c |   22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 1046ec8..8df44a3 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -488,6 +488,26 @@ static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >      rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >  }
> >  
> > +static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > +                                 uint32_t token, uint32_t nargs,
> > +                                 target_ulong args, uint32_t nret,
> > +                                 target_ulong rets)
> > +{
> > +    uint32_t power_lvl = rtas_ld(args, 1);
> > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +    rtas_st(rets, 1, power_lvl);
> > +}
> > +
> > +static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> > +                                  uint32_t token, uint32_t nargs,
> > +                                  target_ulong args, uint32_t nret,
> > +                                  target_ulong rets)
> > +{
> > +    /* return SUCCESS with a power level of 100 */
> > +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +    rtas_st(rets, 1, 100);
> > +}
> > +
> 
> The PAPR spec says that rtas_set_power_level() returns "Actual_level: the
> power level actually set", but rtas_get_power_level() always returns 100
> (full power).
> 
> Is this inconsistency here for a reason?

We advertise all PHB power domains as "live insert connectors" (power domain
-1), for which calling rtas_set_power_level is considered invalid, so I think
it should work either way. I'll remove the inconsistency and re-test to
confirm.

> 
> 
> >  static int pci_spapr_swizzle(int slot, int pin)
> >  {
> >      return (slot + pin) % PCI_NUM_PINS;
> > @@ -1051,6 +1071,8 @@ void spapr_pci_rtas_init(void)
> >          spapr_rtas_register("ibm,change-msi", rtas_ibm_change_msi);
> >      }
> >      spapr_rtas_register("set-indicator", rtas_set_indicator);
> > +    spapr_rtas_register("set-power-level", rtas_set_power_level);
> > +    spapr_rtas_register("get-power-level", rtas_get_power_level);
> >  }
> >  
> >  static void spapr_pci_register_types(void)
> > 
> 
> 
> -- 
> Alexey

* Re: [Qemu-devel] [PATCH v2 11/14] spapr_pci: enable basic hotplug operations
  2013-12-16  4:36   ` Alexey Kardashevskiy
@ 2014-01-16 21:22     ` Michael Roth
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2014-01-16 21:22 UTC
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Michael S. Tsirkin, agraf, ncmike, paulus, tyreld, nfont,
	qemu-ppc

Quoting Alexey Kardashevskiy (2013-12-15 22:36:32)
> On 12/06/2013 09:33 AM, Michael Roth wrote:
> > From: Mike Day <ncmike@ncultra.org>
> > 
> > This enables hotplug for PHB bridges. Upon hotplug we generate the
> > OF-nodes required by PAPR specification and IEEE 1275-1994
> > "PCI Bus Binding to Open Firmware" for the device.
> > 
> > We associate the corresponding FDT for these nodes with the DrcEntry
> > corresponding to the slot, which will be fetched via
> > ibm,configure-connector RTAS calls by the guest as described by PAPR
> > specification. The FDT is cleaned up in the case of unplug.
> > 
> > Amongst the required OF-node properties for each device are the "reg"
> > and "assigned-addresses" properties which describe the BAR-assignments
> > for IO/MEM/ROM regions. To handle these assignments we scan the address
> > space associated with each region for a contiguous range of appropriate
> > size based on PCI specification and encode these in accordance with
> > Open Firmware PCI Bus Binding spec.
> > 
> > These assignments will be used by the guest when the rpaphp hotplug
> > module is used, but may be re-assigned by guests for cases where we
> > rely on bus rescan.
> > 
> > Signed-off-by: Mike Day <ncmike@ncultra.org>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr_pci.c     |  375 ++++++++++++++++++++++++++++++++++++++++++++++--
> >  include/hw/ppc/spapr.h |    1 +
> >  2 files changed, 368 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 6e7ee31..9b4f829 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -56,6 +56,17 @@
> >  #define RTAS_TYPE_MSI           1
> >  #define RTAS_TYPE_MSIX          2
> >  
> > +#define FDT_MAX_SIZE            0x10000
> > +#define _FDT(exp) \
> > +    do { \
> > +        int ret = (exp);                                           \
> > +        if (ret < 0) {                                             \
> > +            return ret;                                            \
> > +        }                                                          \
> > +    } while (0)
> > +
> > +static void spapr_drc_state_reset(DrcEntry *drc_entry);
> > +
> >  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> >  {
> >      sPAPRPHBState *sphb;
> > @@ -448,6 +459,22 @@ static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >          /* encode the new value into the correct bit field */
> >          shift = INDICATOR_ISOLATION_SHIFT;
> >          mask = INDICATOR_ISOLATION_MASK;
> > +        if (drc_entry) {
> > +            /* transition from unisolated to isolated for a hotplug slot
> > +             * entails completion of guest-side device unplug/cleanup, so
> > +             * we can now safely remove the device if qemu is waiting for
> > +             * it to be released
> > +             */
> > +            if (DECODE_DRC_STATE(*pind, mask, shift) != indicator_state) {
> > +                if (indicator_state == 0 && drc_entry->awaiting_release) {
> > +                    /* device_del has been called and host is waiting
> > +                     * for guest to release/isolate device, go ahead
> > +                     * and remove it now
> > +                     */
> > +                    spapr_drc_state_reset(drc_entry);
> > +                }
> > +            }
> > +        }
> >          break;
> >      case 9002: /* DR */
> >          shift = INDICATOR_DR_SHIFT;
> > @@ -776,6 +803,345 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> >      return &phb->iommu_as;
> >  }
> >  
> > +/* for 'reg'/'assigned-addresses' OF properties */
> > +#define RESOURCE_CELLS_SIZE 2
> > +#define RESOURCE_CELLS_ADDRESS 3
> > +#define RESOURCE_CELLS_TOTAL \
> > +    (RESOURCE_CELLS_SIZE + RESOURCE_CELLS_ADDRESS)
> > +
> > +static void fill_resource_props(PCIDevice *d, int bus_num,
> > +                                uint32_t *reg, int *reg_size,
> > +                                uint32_t *assigned, int *assigned_size)
> > +{
> > +    uint32_t *reg_row, *assigned_row;
> > +    uint32_t dev_id = ((bus_num << 8) |
> > +                        (PCI_SLOT(d->devfn) << 3) | PCI_FUNC(d->devfn));
> > +    int i, idx = 0;
> > +
> > +    reg[0] = cpu_to_be32(dev_id << 8);
> > +
> > +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > +        if (!d->io_regions[i].size) {
> > +            continue;
> > +        }
> > +        reg_row = &reg[(idx + 1) * RESOURCE_CELLS_TOTAL];
> > +        assigned_row = &assigned[idx * RESOURCE_CELLS_TOTAL];
> > +        reg_row[0] = cpu_to_be32((dev_id << 8) | (pci_bar(d, i) & 0xff));
> > +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> > +            reg_row[0] |= cpu_to_be32(0x01000000);
> > +        } else {
> > +            reg_row[0] |= cpu_to_be32(0x02000000);
> > +        }
> > +        assigned_row[0] = cpu_to_be32(reg_row[0] | 0x80000000);
> 
> 
> 0x80000000 == relocatable? 0x01000000/0x02000000 - space codes? There are
> macros (b_n, b_ss) in this file; can you please use them?

Ah, those look handy, thanks

> 
> 
> > +        assigned_row[3] = reg_row[3] = cpu_to_be32(d->io_regions[i].size >> 32);
> > +        assigned_row[4] = reg_row[4] = cpu_to_be32(d->io_regions[i].size);
> > +        assigned_row[1] = cpu_to_be32(d->io_regions[i].addr >> 32);
> > +        assigned_row[2] = cpu_to_be32(d->io_regions[i].addr);
> > +        idx++;
> > +    }
> > +
> > +    *reg_size = (idx + 1) * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> > +    *assigned_size = idx * RESOURCE_CELLS_TOTAL * sizeof(uint32_t);
> > +}
> > +
> > +static hwaddr spapr_find_bar_addr(sPAPRPHBState *phb, PCIIORegion *r)
> 
> 
> This does not use @phb at all, so perhaps it could go to hw/pci/pci.c
> (which may be tricky, though)?

SPAPR_PCI_MEM_WIN_BUS_OFFSET is certainly platform-specific, so that would
need to be generalized. Beyond that, I don't have enough experience to know
how useful this would be for other platforms, but perhaps it's general enough
for others to add to or reuse in the future. I'll take a stab at moving it to
hw/pci/pci.c and Cc the appropriate maintainers.

> 
> 
> > +{
> > +    MemoryRegionSection mrs = { 0 };
> > +    hwaddr search_addr;
> > +    hwaddr size = r->size;
> > +    hwaddr addr_mask = ~(size - 1);
> > +    hwaddr increment = size;
> > +    hwaddr limit;
> > +
> > +    if (r->type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
> > +        /* beginning portion of mmio address space for bus does not get
> > +         * mapped into system memory, so calculate addr starting at the
> > +         * corresponding offset into mmio as.
> > +         */
> > +        search_addr = (SPAPR_PCI_MEM_WIN_BUS_OFFSET + increment) & addr_mask;
> > +    } else {
> > +        search_addr = increment;
> > +    }
> > +    limit = memory_region_size(r->address_space);
> > +
> > +    do {
> > +        mrs = memory_region_find_subregion(r->address_space, search_addr, size);
> > +        if (mrs.mr) {
> > +            hwaddr mr_last_addr;
> > +            mr_last_addr = mrs.mr->addr + memory_region_size(mrs.mr) - 1;
> > +            search_addr = (mr_last_addr + 1) & addr_mask;
> > +            if (search_addr <= mr_last_addr) {
> > +                search_addr += increment;
> > +            }
> > +            /* this memory region overlaps, unref and continue searching */
> > +            memory_region_unref(mrs.mr);
> > +        }
> > +    } while (int128_nz(mrs.size) && search_addr + size <= limit);
> > +
> > +    if (search_addr + size >= limit) {
> > +        return PCI_BAR_UNMAPPED;
> > +    }
> > +
> > +    return search_addr;
> > +}
> > +
> > +static int spapr_map_bars(sPAPRPHBState *phb, PCIDevice *dev)
> 
> This does not really use @phb either: it only passes it on to
> spapr_find_bar_addr(), which does not use it.
> 
> Yet another candidate to get moved to hw/pci/pci.c? If you do so, you'll
> get even more reviews :)
> 
> 
> > +{
> > +    PCIIORegion *r;
> > +    int i, ret = -1;
> > +
> > +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > +        uint32_t bar_address = pci_bar(dev, i);
> > +        uint32_t bar_value;
> > +        uint16_t cmd_value = pci_default_read_config(dev, PCI_COMMAND, 2);
> > +        hwaddr addr;
> > +
> > +        r = &dev->io_regions[i];
> > +
> > +        /* this region isn't registered */
> > +        if (!r->size) {
> > +            continue;
> > +        }
> > +
> > +        /* find a hw addr we can map */
> > +        addr = spapr_find_bar_addr(phb, r);
> > +        if (addr == PCI_BAR_UNMAPPED) {
> > +            /* we can't find a free range within address space for this BAR */
> > +            fprintf(stderr,
> > +                    "Unable to map BAR %d, no free range available\n", i);
> > +            return -1;
> > +        }
> > +        /* we can probably map this region into memory if there is not
> > +         * a race condition with some other allocator. write the address
> > +         * to the device BAR which will force a call to pci_update_mappings
> > +         */
> > +        if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
> > +            pci_default_write_config(dev, PCI_COMMAND,
> > +                                     cmd_value | PCI_COMMAND_IO, 2);
> > +        } else {
> > +            pci_default_write_config(dev, PCI_COMMAND,
> > +                                     cmd_value | PCI_COMMAND_MEMORY, 2);
> > +        }
> > +
> > +        bar_value = addr;
> > +
> > +        if (i == PCI_ROM_SLOT) {
> > +            bar_value |= PCI_ROM_ADDRESS_ENABLE;
> > +        }
> > +        /* write the new bar value */
> > +        pci_default_write_config(dev, bar_address, bar_value, 4);
> > +
> > +        /* if this is a 64-bit BAR, we need to also write the
> > +         * upper 32 bit value.
> > +         */
> > +        if (r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> > +            bar_value = (addr >> 32) & 0xffffffffUL;
> > +            pci_default_write_config(dev, bar_address + 4, bar_value, 4);
> > +        }
> > +        ret = 0;
> > +    }
> > +    return ret;
> > +}
> > +
> > +static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
> > +                                       int phb_index)
> > +{
> > +    int slot = PCI_SLOT(dev->devfn);
> > +    char slotname[16];
> > +    bool is_bridge = 1;
> > +    DrcEntry *drc_entry, *drc_entry_slot;
> > +    uint32_t reg[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> > +    uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] = { 0 };
> > +    int reg_size, assigned_size;
> > +
> > +    drc_entry = spapr_phb_to_drc_entry(phb_index + SPAPR_PCI_BASE_BUID);
> > +    g_assert(drc_entry);
> > +    drc_entry_slot = &drc_entry->child_entries[slot];
> > +
> > +    if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
> 
> 
> s/1/PCI_HEADER_TYPE_BRIDGE/
> 
> 
> > +        PCI_HEADER_TYPE_NORMAL) {
> > +        is_bridge = 0;
> > +    }
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
> > +                          pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "device-id",
> > +                          pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "revision-id",
> > +                          pci_default_read_config(dev, PCI_REVISION_ID, 1)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "class-code",
> > +                          pci_default_read_config(dev, PCI_CLASS_DEVICE, 2) << 8));
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
> > +                          pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
> > +
> > +    /* if this device is NOT a bridge */
> > +    if (!is_bridge) {
> 
> 
> s/!is_bridge/pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
> PCI_HEADER_TYPE_NORMAL/
> 
> and get rid of is_bridge?
> 
> 
> 
> > +        _FDT(fdt_setprop_cell(fdt, offset, "min-grant",
> > +            pci_default_read_config(dev, PCI_MIN_GNT, 1)));
> > +        _FDT(fdt_setprop_cell(fdt, offset, "max-latency",
> > +            pci_default_read_config(dev, PCI_MAX_LAT, 1)));
> > +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
> > +            pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
> > +        _FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
> > +            pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
> > +    }
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
> > +        pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
> > +
> > +    /* the following fdt cells are masked off the pci status register */
> > +    int pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
> > +    _FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
> > +                          PCI_STATUS_DEVSEL_MASK & pci_status));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "fast-back-to-back",
> > +                          PCI_STATUS_FAST_BACK & pci_status));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "66mhz-capable",
> > +                          PCI_STATUS_66MHZ & pci_status));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "udf-supported",
> > +                          PCI_STATUS_UDF & pci_status));
> > +
> > +    _FDT(fdt_setprop_string(fdt, offset, "name", "pci"));
> > +    sprintf(slotname, "Slot %d", slot + phb_index * 32);
> > +    _FDT(fdt_setprop(fdt, offset, "ibm,loc-code", slotname, strlen(slotname)));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index",
> > +                          drc_entry_slot->drc_index));
> > +
> > +    _FDT(fdt_setprop_cell(fdt, offset, "#address-cells",
> > +                          RESOURCE_CELLS_ADDRESS));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
> > +                          RESOURCE_CELLS_SIZE));
> > +    _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x",
> > +                          RESOURCE_CELLS_SIZE));
> > +    fill_resource_props(dev, phb_index, reg, &reg_size,
> > +                        assigned, &assigned_size);
> > +    _FDT(fdt_setprop(fdt, offset, "reg", reg, reg_size));
> > +    _FDT(fdt_setprop(fdt, offset, "assigned-addresses",
> > +                     assigned, assigned_size));
> > +
> > +    return 0;
> > +}
> > +
> > +static int spapr_device_hotplug_add(DeviceState *qdev, PCIDevice *dev)
> > +{
> > +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
> > +    DrcEntry *drc_entry, *drc_entry_slot;
> > +    ConfigureConnectorState *ccs;
> > +    int slot = PCI_SLOT(dev->devfn);
> > +    int offset, ret;
> > +    void *fdt_orig, *fdt;
> > +    char nodename[512];
> > +    uint32_t encoded = ENCODE_DRC_STATE(INDICATOR_ENTITY_SENSE_PRESENT,
> > +                                        INDICATOR_ENTITY_SENSE_MASK,
> > +                                        INDICATOR_ENTITY_SENSE_SHIFT);
> > +
> > +    drc_entry = spapr_phb_to_drc_entry(phb->buid);
> > +    g_assert(drc_entry);
> > +    drc_entry_slot = &drc_entry->child_entries[slot];
> > +
> > +    drc_entry->state &= ~(uint32_t)INDICATOR_ENTITY_SENSE_MASK;
> > +    drc_entry->state |= encoded; /* DR entity present */
> > +    drc_entry_slot->state &= ~(uint32_t)INDICATOR_ENTITY_SENSE_MASK;
> > +    drc_entry_slot->state |= encoded; /* and the slot */
> 
> 
> "and the slot" what?
> s/uint32_t encoded/const uint32_t present/ and remove the comments?
> 
> 
> > +    /* need to allocate memory region for device BARs */
> > +    spapr_map_bars(phb, dev);
> > +
> > +    /* add OF node for pci device and required OF DT properties */
> > +    fdt_orig = g_malloc0(FDT_MAX_SIZE);
> > +    offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
> > +    fdt_begin_node(fdt_orig, "");
> > +    fdt_end_node(fdt_orig);
> > +    fdt_finish(fdt_orig);
> > +
> > +    fdt = g_malloc0(FDT_MAX_SIZE);
> > +    fdt_open_into(fdt_orig, fdt, FDT_MAX_SIZE);
> > +    sprintf(nodename, "pci@%d", slot);
> > +    offset = fdt_add_subnode(fdt, 0, nodename);
> > +    ret = spapr_populate_pci_child_dt(dev, fdt, offset, phb->index);
> > +    g_assert(!ret);
> > +    g_free(fdt_orig);
> > +
> > +    /* hold on to node, configure_connector will pass it to the guest later */
> > +    ccs = &drc_entry_slot->cc_state;
> > +    ccs->fdt = fdt;
> > +    ccs->offset_start = offset;
> > +    ccs->state = CC_STATE_PENDING;
> > +    ccs->dev = dev;
> > +
> > +    return 0;
> > +}
> > +
> > +/* check whether guest has released/isolated device */
> > +static bool spapr_drc_state_is_releasable(DrcEntry *drc_entry)
> > +{
> > +    return !DECODE_DRC_STATE(drc_entry->state,
> > +                             INDICATOR_ISOLATION_MASK,
> > +                             INDICATOR_ISOLATION_SHIFT);
> > +}
> 
> This looks like the only wrapper function around DECODE_DRC_STATE(), and
> it is used just once, while other callers use
> DECODE_DRC_STATE()/ENCODE_DRC_STATE() directly. I'd remove this function
> and call DECODE_DRC_STATE() directly in the code below.
> 
> 
> > +
> > +/* finalize device unplug/deletion */
> > +static void spapr_drc_state_reset(DrcEntry *drc_entry)
> > +{
> > +    ConfigureConnectorState *ccs = &drc_entry->cc_state;
> > +    uint32_t sense_empty = ENCODE_DRC_STATE(INDICATOR_ENTITY_SENSE_EMPTY,
> > +                                            INDICATOR_ENTITY_SENSE_MASK,
> > +                                            INDICATOR_ENTITY_SENSE_SHIFT);
> > +
> > +    g_free(ccs->fdt);
> > +    ccs->fdt = NULL;
> > +    object_unparent(OBJECT(ccs->dev));
> > +    ccs->dev = NULL;
> > +    ccs->state = CC_STATE_IDLE;
> > +    drc_entry->state &= ~INDICATOR_ENTITY_SENSE_MASK;
> > +    drc_entry->state |= sense_empty;
> > +    drc_entry->awaiting_release = false;
> > +}
> > +
> > +static void spapr_device_hotplug_remove(DeviceState *qdev, PCIDevice *dev)
> > +{
> > +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
> > +    DrcEntry *drc_entry, *drc_entry_slot;
> > +    ConfigureConnectorState *ccs;
> > +    int slot = PCI_SLOT(dev->devfn);
> > +
> > +    drc_entry = spapr_phb_to_drc_entry(phb->buid);
> > +    g_assert(drc_entry);
> > +    drc_entry_slot = &drc_entry->child_entries[slot];
> > +    ccs = &drc_entry_slot->cc_state;
> > +    /* shouldn't be removing devices we haven't created an fdt for */
> > +    g_assert(ccs->state != CC_STATE_IDLE);
> 
> 
> Instead of g_assert(), wouldn't it be better to return -1 here, propagate
> that return code from spapr_device_hotplug(), and let the common PCI code
> handle it?
> 
> Or are we absolutely sure that spapr_device_hotplug() cannot possibly fail,
> so we are ready to kill the guest if it does? I do not know, just asking :)

CC_STATE_IDLE describes an empty slot in the context of PCI, so we should
never hit the assertion. It's more of a development aid to ensure the code
handles initial device creation properly; it should not be triggerable by
users.

> 
> 
> > +    /* if the device has already been released/isolated by guest, go ahead
> > +     * and remove it now. Otherwise, flag it as pending guest release so it
> > +     * can be removed later
> > +     */
> > +    if (spapr_drc_state_is_releasable(drc_entry_slot)) {
> > +        spapr_drc_state_reset(drc_entry_slot);
> > +    } else {
> > +        if (drc_entry_slot->awaiting_release) {
> > +            fprintf(stderr, "waiting for guest to release the device");
> > +        } else {
> > +            drc_entry_slot->awaiting_release = true;
> > +        }
> > +    }
> > +}
> > +
> > +static int spapr_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> > +                                PCIHotplugState state)
> > +{
> 
> sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
> 
> > +    if (state == PCI_COLDPLUG_ENABLED) {
> > +        return 0;
> > +    }
> > +
> > +    if (state == PCI_HOTPLUG_ENABLED) {
> > +        spapr_device_hotplug_add(qdev, dev);
> > +    } else {
> > +        spapr_device_hotplug_remove(qdev, dev);
> > +    }
> 
> and here s/qdev/phb/? spapr_device_hotplug_(add|remove) and
> spapr_pci_hotplug_(add|remove)_event (from later patches) do not use
> qdev as a DeviceState anyway; they cast it to sPAPRPHBState and use that.
> 

That's cleaner, will fix.

> 
> 
> > +
> > +    return 0;
> > +}
> > +
> >  static int spapr_phb_init(SysBusDevice *s)
> >  {
> >      DeviceState *dev = DEVICE(s);
> > @@ -889,6 +1255,7 @@ static int spapr_phb_init(SysBusDevice *s)
> >                             &sphb->memspace, &sphb->iospace,
> >                             PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
> >      phb->bus = bus;
> > +    pci_bus_hotplug(phb->bus, spapr_device_hotplug, DEVICE(sphb));
> >  
> >      sphb->dma_window_start = 0;
> >      sphb->dma_window_size = 0x40000000;
> > @@ -1181,14 +1548,6 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >          return bus_off;
> >      }
> >  
> > -#define _FDT(exp) \
> > -    do { \
> > -        int ret = (exp);                                           \
> > -        if (ret < 0) {                                             \
> > -            return ret;                                            \
> > -        }                                                          \
> > -    } while (0)
> > -
> >      /* Write PHB properties */
> >      _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
> >      _FDT(fdt_setprop_string(fdt, bus_off, "compatible", "IBM,Logical_PHB"));
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 7c8a521..1c9b725 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -328,6 +328,7 @@ struct DrcEntry {
> >      void *fdt;
> >      int fdt_offset;
> >      uint32_t state;
> > +    bool awaiting_release;
> >      ConfigureConnectorState cc_state;
> >      DrcEntry *child_entries;
> >  };
> > 
> 
> 
> -- 
> Alexey

* Re: [Qemu-devel] [PATCH v2 12/14] spapr_events: re-use EPOW event infrastructure for hotplug events
  2013-12-16  5:05   ` Alexey Kardashevskiy
@ 2014-01-16 21:32     ` Michael Roth
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2014-01-16 21:32 UTC
  To: Alexey Kardashevskiy, qemu-devel
  Cc: agraf, ncmike, qemu-ppc, tyreld, nfont, paulus

Quoting Alexey Kardashevskiy (2013-12-15 23:05:55)
> On 12/06/2013 09:33 AM, Michael Roth wrote:
> > From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > 
> > This extends the data structures currently used to report EPOW events to
> > guests via the check-exception RTAS interfaces to also include event types
> > for hotplug/unplug events.
> > 
> > This is currently undocumented and being finalized for inclusion in the PAPR
> > specification, but we implement it here as an extension for guest
> > userspace tools to support (existing guest kernels simply log these
> > events via a sysfs interface that's read by rtas_errd).
> > 
> > Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c         |    2 +-
> >  hw/ppc/spapr_events.c  |  219 +++++++++++++++++++++++++++++++++++++++---------
> >  include/hw/ppc/spapr.h |    4 +-
> >  3 files changed, 184 insertions(+), 41 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 2250ee1..7079e4e 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1522,7 +1522,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
> >      spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
> >                                              kernel_size, kernel_le,
> >                                              boot_device, kernel_cmdline,
> > -                                            spapr->epow_irq);
> > +                                            spapr->check_exception_irq);
> >      assert(spapr->fdt_skel != NULL);
> >  }
> >  
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index 16fa49e..9dfdbcf 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -32,6 +32,8 @@
> >  
> >  #include "hw/ppc/spapr.h"
> >  #include "hw/ppc/spapr_vio.h"
> > +#include "hw/pci/pci.h"
> > +#include "hw/pci-host/spapr.h"
> >  
> >  #include <libfdt.h>
> >  
> > @@ -77,6 +79,7 @@ struct rtas_error_log {
> >  #define   RTAS_LOG_TYPE_ECC_UNCORR              0x00000009
> >  #define   RTAS_LOG_TYPE_ECC_CORR                0x0000000a
> >  #define   RTAS_LOG_TYPE_EPOW                    0x00000040
> > +#define   RTAS_LOG_TYPE_HOTPLUG                 0x000000e5
> >      uint32_t extended_length;
> >  } QEMU_PACKED;
> >  
> > @@ -166,6 +169,38 @@ struct epow_log_full {
> >      struct rtas_event_log_v6_epow epow;
> >  } QEMU_PACKED;
> >  
> > +struct rtas_event_log_v6_hp {
> > +#define RTAS_LOG_V6_SECTION_ID_HOTPLUG              0x4850 /* HP */
> > +    struct rtas_event_log_v6_section_header hdr;
> > +    uint8_t hotplug_type;
> > +#define RTAS_LOG_V6_HP_TYPE_CPU                          1
> > +#define RTAS_LOG_V6_HP_TYPE_MEMORY                       2
> > +#define RTAS_LOG_V6_HP_TYPE_SLOT                         3
> > +#define RTAS_LOG_V6_HP_TYPE_PHB                          4
> > +#define RTAS_LOG_V6_HP_TYPE_PCI                          5
> > +    uint8_t hotplug_action;
> > +#define RTAS_LOG_V6_HP_ACTION_ADD                        1
> > +#define RTAS_LOG_V6_HP_ACTION_REMOVE                     2
> > +    uint8_t hotplug_identifier;
> > +#define RTAS_LOG_V6_HP_ID_DRC_NAME                       1
> > +#define RTAS_LOG_V6_HP_ID_DRC_INDEX                      2
> > +#define RTAS_LOG_V6_HP_ID_DRC_COUNT                      3
> > +    uint8_t reserved;
> > +    union {
> > +        uint32_t index;
> > +        uint32_t count;
> > +        char name[1];
> > +    } drc;
> > +} QEMU_PACKED;
> > +
> > +struct hp_log_full {
> > +    struct rtas_error_log hdr;
> > +    struct rtas_event_log_v6 v6hdr;
> > +    struct rtas_event_log_v6_maina maina;
> > +    struct rtas_event_log_v6_mainb mainb;
> > +    struct rtas_event_log_v6_hp hp;
> > +} QEMU_PACKED;
> > +
> >  #define EVENT_MASK_INTERNAL_ERRORS           0x80000000
> >  #define EVENT_MASK_EPOW                      0x40000000
> >  #define EVENT_MASK_HOTPLUG                   0x10000000
> > @@ -181,29 +216,61 @@ struct epow_log_full {
> >          }                                                          \
> >      } while (0)
> >  
> > -void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
> > +void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
> >  {
> > -    uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
> > -    uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
> > +    uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
> > +    uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> >  
> >      _FDT((fdt_begin_node(fdt, "event-sources")));
> >  
> >      _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
> >      _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
> >      _FDT((fdt_property(fdt, "interrupt-ranges",
> > -                       epow_irq_ranges, sizeof(epow_irq_ranges))));
> > +                       irq_ranges, sizeof(irq_ranges))));
> >  
> >      _FDT((fdt_begin_node(fdt, "epow-events")));
> > -    _FDT((fdt_property(fdt, "interrupts",
> > -                       epow_interrupts, sizeof(epow_interrupts))));
> > +    _FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts))));
> >      _FDT((fdt_end_node(fdt)));
> >  
> >      _FDT((fdt_end_node(fdt)));
> >  }
> >  
> >  static struct epow_log_full *pending_epow;
> > +static struct hp_log_full *pending_hp;
> >  static uint32_t next_plid;
> >  
> > +static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
> > +{
> > +    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
> > +        | RTAS_LOG_V6_B0_BIGENDIAN;
> > +    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
> > +        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
> > +    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> > +}
> > +
> > +static void spapr_init_maina(struct rtas_event_log_v6_maina *maina,
> > +                             int section_count)
> > +{
> > +    struct tm tm;
> > +    int year;
> > +
> > +    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
> > +    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> > +    /* FIXME: section version, subtype and creator id? */
> > +    qemu_get_timedate(&tm, spapr->rtc_offset);
> > +    year = tm.tm_year + 1900;
> > +    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
> > +                                       | (to_bcd(year % 100) << 16)
> > +                                       | (to_bcd(tm.tm_mon + 1) << 8)
> > +                                       | to_bcd(tm.tm_mday));
> > +    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
> > +                                       | (to_bcd(tm.tm_min) << 16)
> > +                                       | (to_bcd(tm.tm_sec) << 8));
> > +    maina->creator_id = 'H'; /* Hypervisor */
> > +    maina->section_count = section_count;
> > +    maina->plid = next_plid++;
> > +}
> > +
> >  static void spapr_powerdown_req(Notifier *n, void *opaque)
> >  {
> >      sPAPREnvironment *spapr = container_of(n, sPAPREnvironment, epow_notifier);
> > @@ -212,8 +279,6 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >      struct rtas_event_log_v6_maina *maina;
> >      struct rtas_event_log_v6_mainb *mainb;
> >      struct rtas_event_log_v6_epow *epow;
> > -    struct tm tm;
> > -    int year;
> >  
> >      if (pending_epow) {
> >          /* For now, we just throw away earlier events if two come
> > @@ -237,27 +302,8 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >      hdr->extended_length = cpu_to_be32(sizeof(*pending_epow)
> >                                         - sizeof(pending_epow->hdr));
> >  
> > -    v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
> > -        | RTAS_LOG_V6_B0_BIGENDIAN;
> > -    v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
> > -        | RTAS_LOG_V6_B2_LOG_FORMAT_PLATFORM_EVENT;
> > -    v6hdr->company = cpu_to_be32(RTAS_LOG_V6_COMPANY_IBM);
> > -
> > -    maina->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINA);
> > -    maina->hdr.section_length = cpu_to_be16(sizeof(*maina));
> > -    /* FIXME: section version, subtype and creator id? */
> > -    qemu_get_timedate(&tm, spapr->rtc_offset);
> > -    year = tm.tm_year + 1900;
> > -    maina->creation_date = cpu_to_be32((to_bcd(year / 100) << 24)
> > -                                       | (to_bcd(year % 100) << 16)
> > -                                       | (to_bcd(tm.tm_mon + 1) << 8)
> > -                                       | to_bcd(tm.tm_mday));
> > -    maina->creation_time = cpu_to_be32((to_bcd(tm.tm_hour) << 24)
> > -                                       | (to_bcd(tm.tm_min) << 16)
> > -                                       | (to_bcd(tm.tm_sec) << 8));
> > -    maina->creator_id = 'H'; /* Hypervisor */
> > -    maina->section_count = 3; /* Main-A, Main-B and EPOW */
> > -    maina->plid = next_plid++;
> > +    spapr_init_v6hdr(v6hdr);
> > +    spapr_init_maina(maina, 3 /* Main-A, Main-B and EPOW */);
> >  
> >      mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> >      mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> > @@ -274,9 +320,93 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
> >      epow->event_modifier = RTAS_LOG_V6_EPOW_MODIFIER_NORMAL;
> >      epow->extended_modifier = RTAS_LOG_V6_EPOW_XMODIFIER_PARTITION_SPECIFIC;
> >  
> > -    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->epow_irq));
> > +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> > +}
> > +
> > +static void spapr_hotplug_req_event(uint8_t hp_type, uint8_t hp_action,
> > +                                    sPAPRPHBState *phb, int slot)
> 
> 
> This only uses @buid from sPAPRPHBState, so what is the point of passing the
> whole struct? Any plans to use other fields there?

None that I can think of; I'll switch to passing only the buid. We may need
to expand this in the future for PHB/CPU/memory hotplug event types, but
those will require more parameters anyway.
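For illustration, a standalone sketch of what the event helper needs once it
takes only the BUID: the slot's DRC index is derived with the same formula
spapr_add_phb_to_drc_table() uses, and the index-based hotplug section is
written out byte by byte. slot_drc_index(), encode_pci_hp_section() and the
put_be*() helpers are hypothetical names; the 8-byte section header layout and
the SPAPR_PCI_BASE_BUID value are assumptions, not taken from this patch. The
sketch stores the DRC index big-endian like the other multi-byte log fields
(the quoted code assigns drc_index without cpu_to_be32()).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants copied from the quoted patches. */
#define SPAPR_DRC_PHB_SLOT_MAX          32
#define SPAPR_DRC_DEV_ID_BASE           0x40000000u
#define RTAS_LOG_V6_SECTION_ID_HOTPLUG  0x4850 /* "HP" */
#define RTAS_LOG_V6_HP_TYPE_PCI         5
#define RTAS_LOG_V6_HP_ID_DRC_INDEX     2

/* Assumption: value of SPAPR_PCI_BASE_BUID in hw/pci-host/spapr.h. */
#define SPAPR_PCI_BASE_BUID             0x800000020000000ULL

/* Assumption: 8-byte v6 section header (id, length, version, subtype,
 * creator id), 4 one-byte HP fields, then a 4-byte DRC index. */
#define HP_SECTION_LEN                  16

/* Same formula as spapr_add_phb_to_drc_table(), but needs only the BUID. */
static uint32_t slot_drc_index(uint64_t buid, int slot)
{
    uint32_t phb_index = (uint32_t)(buid - SPAPR_PCI_BASE_BUID);

    assert(slot >= 0 && slot < SPAPR_DRC_PHB_SLOT_MAX);
    return SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + ((uint32_t)slot << 3);
}

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: fill an index-based PCI hotplug section.
 * Multi-byte fields are big-endian, matching RTAS_LOG_V6_B0_BIGENDIAN. */
static void encode_pci_hp_section(uint8_t out[HP_SECTION_LEN], uint64_t buid,
                                  int slot, uint8_t action)
{
    memset(out, 0, HP_SECTION_LEN);                   /* reserved fields = 0 */
    put_be16(out + 0, RTAS_LOG_V6_SECTION_ID_HOTPLUG);
    put_be16(out + 2, HP_SECTION_LEN);                /* section_length */
    out[4] = 1;                                       /* section_version */
    out[8] = RTAS_LOG_V6_HP_TYPE_PCI;                 /* hotplug_type */
    out[9] = action;                                  /* hotplug_action */
    out[10] = RTAS_LOG_V6_HP_ID_DRC_INDEX;            /* hotplug_identifier */
    put_be32(out + 12, slot_drc_index(buid, slot));   /* drc.index */
}
```

For example, encoding slot 3 on the second PHB (buid = SPAPR_PCI_BASE_BUID + 1)
with action 1 (RTAS_LOG_V6_HP_ACTION_ADD) puts DRC index 0x40000118 in bytes
12..15 of the section.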

> 
> 
> > +{
> > +    struct rtas_error_log *hdr;
> > +    struct rtas_event_log_v6 *v6hdr;
> > +    struct rtas_event_log_v6_maina *maina;
> > +    struct rtas_event_log_v6_mainb *mainb;
> > +    struct rtas_event_log_v6_hp *hp;
> > +    DrcEntry *drc_entry;
> > +
> > +    if (pending_hp) {
> > +        /* Just toss any pending hotplug events for now, this will
> > +         * need to be fixed later on.
> > +         */
> > +        g_free(pending_hp);
> > +    }
> > +
> > +    pending_hp = g_malloc0(sizeof(*pending_hp));
> > +    hdr = &pending_hp->hdr;
> > +    v6hdr = &pending_hp->v6hdr;
> > +    maina = &pending_hp->maina;
> > +    mainb = &pending_hp->mainb;
> > +    hp = &pending_hp->hp;
> > +
> > +    hdr->summary = cpu_to_be32(RTAS_LOG_VERSION_6
> > +                               | RTAS_LOG_SEVERITY_EVENT
> > +                               | RTAS_LOG_DISPOSITION_NOT_RECOVERED
> > +                               | RTAS_LOG_OPTIONAL_PART_PRESENT
> > +                               | RTAS_LOG_INITIATOR_HOTPLUG
> > +                               | RTAS_LOG_TYPE_HOTPLUG);
> > +    hdr->extended_length = cpu_to_be32(sizeof(*pending_hp)
> > +                                       - sizeof(pending_hp->hdr));
> > +
> > +    spapr_init_v6hdr(v6hdr);
> > +    spapr_init_maina(maina, 3 /* Main-A, Main-B, HP */);
> > +
> > +    mainb->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_MAINB);
> > +    mainb->hdr.section_length = cpu_to_be16(sizeof(*mainb));
> > +    mainb->subsystem_id = 0x80; /* External environment */
> > +    mainb->event_severity = 0x00; /* Informational / non-error */
> > +    mainb->event_subtype = 0x00; /* Normal shutdown */
> > +
> > +    hp->hdr.section_id = cpu_to_be16(RTAS_LOG_V6_SECTION_ID_HOTPLUG);
> > +    hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
> > +    hp->hdr.section_version = 1; /* includes extended modifier */
> > +    hp->hotplug_action = hp_action;
> > +
> > +    hp->hotplug_type = hp_type;
> > +
> > +    drc_entry = spapr_phb_to_drc_entry(phb->buid);
> > +    if (!drc_entry) {
> > +        drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
> > +    }
> > +
> > +    switch (hp_type) {
> > +    case RTAS_LOG_V6_HP_TYPE_PCI:
> > +        hp->drc.index = drc_entry->child_entries[slot].drc_index;
> > +        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
> > +        break;
> > +    }
> > +
> > +    qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_irq));
> > +}
> > +
> > +void spapr_pci_hotplug_add_event(DeviceState *qdev, int slot)
> > +{
> > +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
> > +
> > +    return spapr_hotplug_req_event(RTAS_LOG_V6_HP_TYPE_PCI,
> > +                                   RTAS_LOG_V6_HP_ACTION_ADD, phb, slot);
> >  }
> >  
> > +void spapr_pci_hotplug_remove_event(DeviceState *qdev, int slot)
> > +{
> > +    sPAPRPHBState *phb = SPAPR_PCI_HOST_BRIDGE(qdev);
> > +
> > +    /* TODO: removal is generally initiated by guest, need to
> > +     * document what exactly the guest is supposed to do with
> > +     * this event. What does ACPI or shpc do?
> > +     */
> > +    return spapr_hotplug_req_event(RTAS_LOG_V6_HP_TYPE_PCI,
> > +                                   RTAS_LOG_V6_HP_ACTION_REMOVE, phb, slot);
> > + }
> > +
> >  static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >                              uint32_t token, uint32_t nargs,
> >                              target_ulong args,
> > @@ -298,15 +428,26 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >          xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
> >      }
> >  
> > -    if ((mask & EVENT_MASK_EPOW) && pending_epow) {
> > -        if (sizeof(*pending_epow) < len) {
> > -            len = sizeof(*pending_epow);
> > -        }
> > +    if (mask & EVENT_MASK_EPOW) {
> > +        if (pending_epow) {
> > +            if (sizeof(*pending_epow) < len) {
> > +                len = sizeof(*pending_epow);
> > +            }
> >  
> > -        cpu_physical_memory_write(buf, pending_epow, len);
> > -        g_free(pending_epow);
> > -        pending_epow = NULL;
> > -        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +            cpu_physical_memory_write(buf, pending_epow, len);
> > +            g_free(pending_epow);
> > +            pending_epow = NULL;
> > +            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +        } else if (pending_hp) {
> > +            if (sizeof(*pending_hp) < len) {
> > +                len = sizeof(*pending_hp);
> > +            }
> > +
> > +            cpu_physical_memory_write(buf, pending_hp, len);
> > +            g_free(pending_hp);
> > +            pending_hp = NULL;
> > +            rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> > +        }
> >      } else {
> >          rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
> >      }
> > @@ -314,7 +455,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> >  
> >  void spapr_events_init(sPAPREnvironment *spapr)
> >  {
> > -    spapr->epow_irq = spapr_allocate_msi(0);
> > +    spapr->check_exception_irq = spapr_allocate_msi(0);
> >      spapr->epow_notifier.notify = spapr_powerdown_req;
> >      qemu_register_powerdown_notifier(&spapr->epow_notifier);
> >      spapr_rtas_register("check-exception", check_exception);
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 1c9b725..9eef2ce 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -31,7 +31,7 @@ typedef struct sPAPREnvironment {
> >      uint64_t rtc_offset;
> >      bool has_graphics;
> >  
> > -    uint32_t epow_irq;
> > +    uint32_t check_exception_irq;
> >      Notifier epow_notifier;
> >  
> >      /* Migration state */
> > @@ -473,5 +473,7 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
> >                   uint32_t liobn, uint64_t window, uint32_t size);
> >  int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
> >                        sPAPRTCETable *tcet);
> > +void spapr_pci_hotplug_add_event(DeviceState *qdev, int slot);
> > +void spapr_pci_hotplug_remove_event(DeviceState *qdev, int slot);
> >  
> >  #endif /* !defined (__HW_SPAPR_H__) */
> > 
> 
> 
> -- 
> Alexey

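As quoted above, the check_exception() hunk only reaches the hotplug branch
when the guest sets EVENT_MASK_EPOW (EVENT_MASK_HOTPLUG is not consulted), and
when that bit is set but nothing is pending it returns without storing any
status in rets. A minimal standalone sketch of one way to dispatch per event
class while always reporting a status; struct pending_event, dequeue_event()
and the OUT_* codes are hypothetical stand-ins for the QEMU types and the
RTAS_OUT_* values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mask bits from the quoted patch. */
#define EVENT_MASK_EPOW     0x40000000u
#define EVENT_MASK_HOTPLUG  0x10000000u

/* Hypothetical stand-ins for RTAS_OUT_SUCCESS / RTAS_OUT_NO_ERRORS_FOUND. */
enum { OUT_SUCCESS = 0, OUT_NO_ERRORS_FOUND = 1 };

/* At most one pending log per event class, as in the patch. */
struct pending_event {
    const void *log;    /* NULL when nothing is pending */
    size_t size;
};

/* Copy the first pending log whose class is enabled in mask (EPOW first),
 * truncated to buf_len, clear it, and always return a status. */
static int dequeue_event(uint32_t mask, struct pending_event *epow,
                         struct pending_event *hp, void *buf, size_t buf_len)
{
    struct pending_event *ev = NULL;

    assert(epow != NULL && hp != NULL);
    if ((mask & EVENT_MASK_EPOW) && epow->log) {
        ev = epow;
    } else if ((mask & EVENT_MASK_HOTPLUG) && hp->log) {
        ev = hp;
    }
    if (!ev) {
        return OUT_NO_ERRORS_FOUND;
    }
    memcpy(buf, ev->log, ev->size < buf_len ? ev->size : buf_len);
    ev->log = NULL;     /* the patch g_free()s the log at this point */
    ev->size = 0;
    return OUT_SUCCESS;
}
```

In QEMU the caller would store the returned status with rtas_st(rets, 0, ...)
in every case, so the guest never sees a stale return value.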

* Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
  2014-01-16 20:51       ` Michael Roth
@ 2014-01-20  2:58         ` Alexey Kardashevskiy
  2014-01-20 14:12           ` Mike Day
  2014-01-20 17:24           ` Michael Roth
  0 siblings, 2 replies; 39+ messages in thread
From: Alexey Kardashevskiy @ 2014-01-20  2:58 UTC (permalink / raw
  To: Michael Roth, qemu-devel; +Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

On 01/17/2014 07:51 AM, Michael Roth wrote:
> Quoting Alexey Kardashevskiy (2013-12-15 22:54:42)
>> On 12/16/2013 01:59 PM, Alexey Kardashevskiy wrote:
>>> On 12/06/2013 09:32 AM, Michael Roth wrote:
>>>> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>>>
>>>> This adds entries to the root OF node to advertise our PHBs as being
>>>> DR-capable in accordance with the PAPR specification.
>>>>
>>>> Each PHB is given a name of PHB<bus#>, advertised as a PHB type,
>>>> and associated with a power domain of -1 (indicating to guests that
>>>> power management is handled automatically by hardware).
>>>>
>>>> We currently allocate entries for up to 32 DR-capable PHBs, though
>>>> this limit can be increased later.
>>>>
>>>> DrcEntry objects, which track the state of the DR-connector associated
>>>> with each PHB, are stored in a 32-entry array, and each DrcEntry in
>>>> turn has a dynamically allocated array of child DR-connectors, which
>>>> we will use later to track the state of the DR-connectors associated
>>>> with a PHB's physical slots.
>>>>
>>>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>>> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/ppc/spapr.c         |  132 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/ppc/spapr.h |   33 ++++++++++++
>>>>  2 files changed, 165 insertions(+)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index 7e53a5f..ec3ba43 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -81,6 +81,7 @@
>>>>  #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>>>>  
>>>>  sPAPREnvironment *spapr;
>>>> +DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
>>>>  
>>>>  int spapr_allocate_irq(int hint, bool lsi)
>>>>  {
>>>> @@ -276,6 +277,130 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
>>>>      return (p - prop) * sizeof(uint32_t);
>>>>  }
>>>>  
>>>> +static void spapr_init_drc_table(void)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    memset(drc_table, 0, sizeof(drc_table));
>>>> +
>>>> +    /* For now we only care about PHB entries */
>>>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
>>>> +        drc_table[i].drc_index = 0x2000001 + i;
>>>> +    }
>>>> +}
>>>> +
>>>> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state)
>>>> +{
>>>> +    DrcEntry *empty_drc = NULL;
>>>> +    DrcEntry *found_drc = NULL;
>>>> +    int i, phb_index;
>>>> +
>>>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
>>>> +        if (drc_table[i].phb_buid == 0) {
>>>> +            empty_drc = &drc_table[i];
>>>> +        }
>>>> +
>>>> +        if (drc_table[i].phb_buid == buid) {
>>>> +            found_drc = &drc_table[i];
>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    if (found_drc) {
>>>> +        return found_drc;
>>>> +    }
>>>> +
>>>> +    if (empty_drc) {
>>>> +        empty_drc->phb_buid = buid;
>>>> +        empty_drc->state = state;
>>>> +        empty_drc->cc_state.fdt = NULL;
>>>> +        empty_drc->cc_state.offset = 0;
>>>> +        empty_drc->cc_state.depth = 0;
>>>> +        empty_drc->cc_state.state = CC_STATE_IDLE;
>>>> +        empty_drc->child_entries =
>>>> +            g_malloc0(sizeof(DrcEntry) * SPAPR_DRC_PHB_SLOT_MAX);
>>>> +        phb_index = buid - SPAPR_PCI_BASE_BUID;
>>>> +        for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
>>>> +            empty_drc->child_entries[i].drc_index =
>>>> +                SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (i << 3);
>>>> +        }
>>>> +        return empty_drc;
>>>> +    }
>>>> +
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +static void spapr_create_drc_dt_entries(void *fdt)
>>>> +{
>>>> +    char char_buf[1024];
>>>> +    uint32_t int_buf[SPAPR_DRC_TABLE_SIZE + 1];
>>>> +    uint32_t *entries;
>>>> +    int offset, fdt_offset;
>>>> +    int i, ret;
>>>> +
>>>> +    fdt_offset = fdt_path_offset(fdt, "/");
>>>> +
>>>> +    /* ibm,drc-indexes */
>>>> +    memset(int_buf, 0, sizeof(int_buf));
>>>> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
>>>> +
>>>> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
>>>> +        int_buf[i] = drc_table[i-1].drc_index;
>>>> +    }
>>>> +
>>>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
>>>> +                      sizeof(int_buf));
>>>> +    if (ret) {
>>>> +        fprintf(stderr, "Couldn't finalize ibm,drc-indexes property\n");
>>>> +    }
>>>> +
>>>> +    /* ibm,drc-power-domains */
>>>> +    memset(int_buf, 0, sizeof(int_buf));
>>>> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
>>>> +
>>>> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
>>>> +        int_buf[i] = 0xffffffff;
>>>> +    }
>>>> +
>>>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
>>>> +                      sizeof(int_buf));
>>>> +    if (ret) {
>>>> +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
>>>> +    }
>>>> +
>>>> +    /* ibm,drc-names */
>>>> +    memset(char_buf, 0, sizeof(char_buf));
>>>> +    entries = (uint32_t *)&char_buf[0];
>>>> +    *entries = SPAPR_DRC_TABLE_SIZE;
>>>> +    offset = sizeof(*entries);
>>>> +
>>>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
>>>> +        offset += sprintf(char_buf + offset, "PHB %d", i + 1);
>>>> +        char_buf[offset++] = '\0';
>>>> +    }
>>>> +
>>>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf, offset);
>>>> +    if (ret) {
>>>> +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
>>>> +    }
>>>> +
>>>> +    /* ibm,drc-types */
>>>> +    memset(char_buf, 0, sizeof(char_buf));
>>>> +    entries = (uint32_t *)&char_buf[0];
>>>> +    *entries = SPAPR_DRC_TABLE_SIZE;
>>>> +    offset = sizeof(*entries);
>>>> +
>>>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
>>>> +        offset += sprintf(char_buf + offset, "PHB");
>>>> +        char_buf[offset++] = '\0';
>>>> +    }
>>>> +
>>>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf, offset);
>>>> +    if (ret) {
>>>> +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
>>>> +    }
>>>> +}
>>>> +
>>>>  #define _FDT(exp) \
>>>>      do { \
>>>>          int ret = (exp);                                           \
>>>> @@ -307,6 +432,8 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>>>>      int i, smt = kvmppc_smt_threads();
>>>>      unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
>>>>  
>>>> +    spapr_init_drc_table();
>>>> +
>>>>      fdt = g_malloc0(FDT_MAX_SIZE);
>>>>      _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
>>>>  
>>>> @@ -590,6 +717,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>>>>      int ret;
>>>>      void *fdt;
>>>>      sPAPRPHBState *phb;
>>>> +    DrcEntry *drc_entry;
>>>>  
>>>>      fdt = g_malloc(FDT_MAX_SIZE);
>>>>  
>>>> @@ -609,6 +737,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>>>>      }
>>>>  
>>>>      QLIST_FOREACH(phb, &spapr->phbs, list) {
>>>> +        drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
>>>> +        g_assert(drc_entry);
>>>>          ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
>>>>      }
>>>>  
>>>> @@ -633,6 +763,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>>>>          spapr_populate_chosen_stdout(fdt, spapr->vio_bus);
>>>>      }
>>>>  
>>>> +    spapr_create_drc_dt_entries(fdt);
>>>> +
>>>>      _FDT((fdt_pack(fdt)));
>>>>  
>>>>      if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>>> index b2f11e9..0f2e705 100644
>>>> --- a/include/hw/ppc/spapr.h
>>>> +++ b/include/hw/ppc/spapr.h
>>>> @@ -299,6 +299,39 @@ typedef struct sPAPREnvironment {
>>>>  #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
>>>>  #define KVMPPC_HCALL_MAX        KVMPPC_H_LOGICAL_MEMOP
>>>>  
>>>> +/* For dlparable/hotpluggable slots */
>>>> +#define SPAPR_DRC_TABLE_SIZE    32
>>>> +#define SPAPR_DRC_PHB_SLOT_MAX  32
>>>> +#define SPAPR_DRC_DEV_ID_BASE   0x40000000
>>>> +
>>>> +typedef struct ConfigureConnectorState {
>>>> +    void *fdt;
>>>> +    int offset_start;
>>>> +    int offset;
>>>> +    int depth;
>>>> +    PCIDevice *dev;
>>>> +    enum {
>>>> +        CC_STATE_IDLE = 0,
>>>> +        CC_STATE_PENDING = 1,
>>>> +        CC_STATE_ACTIVE,
>>>> +    } state;
>>>> +} ConfigureConnectorState;
>>>> +
>>>> +typedef struct DrcEntry DrcEntry;
>>>> +
>>>> +struct DrcEntry {
>>>> +    uint32_t drc_index;
>>>> +    uint64_t phb_buid;
>>>> +    void *fdt;
>>>> +    int fdt_offset;
>>>> +    uint32_t state;
>>>> +    ConfigureConnectorState cc_state;
>>>> +    DrcEntry *child_entries;
>>>> +};
>>>> +
>>>> +extern DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
>>>> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
>>>> +
>>>>  extern sPAPREnvironment *spapr;
>>>
>>> So far we were trying to keep everything sPAPR-related in sPAPREnvironment.
>>> Is @drc_table really that special?
>>
>>
>> One more note - we are trying to add a "spapr" or "sPAPR" prefix to all
>> global types defined in headers (such as sPAPRPHBState, spapr_pci_lsi,
>> VIOsPAPRBus, sPAPREnvironment), so it would be nice to have "spapr" in
>> some form in these new types too.
>>
>> Or we could move the whole patch (except spapr_create_drc_dt_entries()) to
>> hw/ppc/spapr_pci.c (and keep the original names), as it seems to be the
>> only user of the whole DrcEntry and ConfigureConnectorState thing, and put
>> a pointer to drc_table[] into @spapr (or make it static?).
> 
> That would work, but I think we'd need to move spapr_create_drc_dt_entries()
> as well, or the bits that rely on DrcEntry at least. Though I worry
> about scoping DrcEntry to spapr_pci.c at this early stage, as DR-capable
> components other than PCI may come to rely on state that's captured by the
> DrcEntry nodes, such as boot-time FDT generation and run-time management
> (via ibm,configure-connector) of CPUs and memory.
> 
> Assuming that seems like a reasonable expectation, I think I'd prefer the
> first option of using spapr-specific prefixes for global types and moving
> drc_table into sPAPREnvironment.


I did not realize DRC is not just for PCI. How hard would it be to add hotplug
support for a whole PHB? The current QEMU trend is to make the QEMU monitor's
"device_add" equivalent to the command line's "-device", which is not (yet)
true for PHBs but could be. Thanks.
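The PHB-level connectors that whole-PHB hotplug would use are the ones this
patch already advertises in the root node. A standalone sketch of how two of
those properties are laid out: ibm,drc-indexes is a count cell followed by one
index per connector (0x2000001 + i, as in spapr_init_drc_table()), and
ibm,drc-names is a count cell followed by NUL-terminated names.
build_drc_indexes() and build_drc_names() are hypothetical helpers.
Devicetree property cells are big-endian, so the sketch converts explicitly;
the quoted spapr_create_drc_dt_entries() stores host-order values, which only
matches on big-endian hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SPAPR_DRC_TABLE_SIZE  32          /* from the quoted patch */
#define PHB_DRC_INDEX_BASE    0x2000001u  /* as in spapr_init_drc_table() */

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* ibm,drc-indexes: count cell, then one cell per PHB connector.
 * Returns the property length in bytes; out must hold 4 * (n + 1). */
static size_t build_drc_indexes(uint8_t *out, uint32_t n)
{
    uint32_t i;

    assert(n <= SPAPR_DRC_TABLE_SIZE);
    put_be32(out, n);
    for (i = 0; i < n; i++) {
        put_be32(out + 4 * (i + 1), PHB_DRC_INDEX_BASE + i);
    }
    return 4 * ((size_t)n + 1);
}

/* ibm,drc-names: count cell, then n NUL-terminated "PHB <i+1>" strings.
 * Returns the property length, or 0 if out_len is too small. */
static size_t build_drc_names(char *out, size_t out_len, uint32_t n)
{
    size_t off = 4;
    uint32_t i;

    if (out_len < off) {
        return 0;
    }
    put_be32((uint8_t *)out, n);
    for (i = 0; i < n; i++) {
        int len = snprintf(out + off, out_len - off, "PHB %u",
                           (unsigned)(i + 1));

        if (len < 0 || (size_t)len + 1 > out_len - off) {
            return 0;               /* name plus its NUL does not fit */
        }
        off += (size_t)len + 1;     /* snprintf wrote the NUL already */
    }
    return off;
}
```

The returned lengths are what would be passed to fdt_setprop(); using
snprintf() with the remaining space also guards the fixed 1024-byte buffer if
SPAPR_DRC_TABLE_SIZE grows.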



>> The only remaining user of DrcEntry is spapr_hotplug_req_event(), but this
>> can easily be fixed by a small helper like this:
>>
>> uint32_t spapr_phb_slot_to_drc_index(uint64_t buid, int slot)
>> {
>>         DrcEntry *drc_entry = spapr_phb_to_drc_entry(buid);
>>         if (!drc_entry) {
>>                 drc_entry = spapr_add_phb_to_drc_table(buid, 2 /* Unusable */);
>>         }
>>         g_assert(drc_entry);
>>         return drc_entry->child_entries[slot].drc_index;
>> }
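Combining the two suggestions above (spapr-prefixed types, with the table
owned by machine state instead of a global), the helper could look roughly
like the following standalone sketch. sPAPRDrcEntry, sPAPRDrcSlot,
sPAPREnvSketch and drc_find_or_add_phb() are hypothetical names, the entry is
trimmed to the fields the lookup needs, and the SPAPR_PCI_BASE_BUID value is
assumed; the other constants and the index formula come from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPAPR_DRC_TABLE_SIZE    32
#define SPAPR_DRC_PHB_SLOT_MAX  32
#define SPAPR_DRC_DEV_ID_BASE   0x40000000u
#define SPAPR_PCI_BASE_BUID     0x800000020000000ULL /* assumed value */

/* Hypothetical spapr-prefixed types: a trimmed-down DrcEntry. */
typedef struct sPAPRDrcSlot {
    uint32_t drc_index;
} sPAPRDrcSlot;

typedef struct sPAPRDrcEntry {
    uint32_t drc_index;
    uint64_t phb_buid;          /* 0 means the entry is unused */
    uint32_t state;
    sPAPRDrcSlot child_entries[SPAPR_DRC_PHB_SLOT_MAX];
} sPAPRDrcEntry;

/* Stand-in for sPAPREnvironment owning the table instead of a global. */
typedef struct sPAPREnvSketch {
    sPAPRDrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
} sPAPREnvSketch;

/* Return the entry for buid, claiming the first free one if needed;
 * NULL if the table is full. */
static sPAPRDrcEntry *drc_find_or_add_phb(sPAPREnvSketch *env, uint64_t buid,
                                          uint32_t state)
{
    sPAPRDrcEntry *empty = NULL;
    uint32_t phb_index = (uint32_t)(buid - SPAPR_PCI_BASE_BUID);
    int i;

    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
        if (env->drc_table[i].phb_buid == buid) {
            return &env->drc_table[i];
        }
        if (!empty && env->drc_table[i].phb_buid == 0) {
            empty = &env->drc_table[i];
        }
    }
    if (!empty) {
        return NULL;
    }
    empty->phb_buid = buid;
    empty->state = state;
    for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
        empty->child_entries[i].drc_index =
            SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + ((uint32_t)i << 3);
    }
    return empty;
}

static uint32_t spapr_phb_slot_to_drc_index(sPAPREnvSketch *env,
                                            uint64_t buid, int slot)
{
    sPAPRDrcEntry *e = drc_find_or_add_phb(env, buid, 2 /* Unusable */);

    assert(e != NULL);                                  /* table full */
    assert(slot >= 0 && slot < SPAPR_DRC_PHB_SLOT_MAX);
    return e->child_entries[slot].drc_index;
}
```

Returning NULL when all 32 entries are in use, and asserting on it in the
caller, avoids the NULL dereference that the event path would otherwise hit
once the table is full.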




-- 
Alexey


* Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
  2014-01-20  2:58         ` Alexey Kardashevskiy
@ 2014-01-20 14:12           ` Mike Day
  2014-01-20 17:24           ` Michael Roth
  1 sibling, 0 replies; 39+ messages in thread
From: Mike Day @ 2014-01-20 14:12 UTC (permalink / raw
  To: Alexey Kardashevskiy
  Cc: qemu-devel@nongnu.org, Alexander Graf, Michael Roth,
	qemu-ppc@nongnu.org, tyreld, nfont, Paul Mackerras

On Sun, Jan 19, 2014 at 9:58 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
> I did not realize DRC is not just for PCI. How hard would it be to add hotplug
> support for a whole PHB? The current QEMU trend is to make the QEMU monitor's
> "device_add" equivalent to the command line's "-device", which is not (yet)
> true for PHBs but could be. Thanks.

We discussed this approach (hotplugging the whole bus) during the design
phase and at one point started to work on it. I don't think we
established whether or not the Linux /sys/bus/pci/* sysfs interface would
work with it.

Mike


* Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
  2014-01-20  2:58         ` Alexey Kardashevskiy
  2014-01-20 14:12           ` Mike Day
@ 2014-01-20 17:24           ` Michael Roth
  2014-01-20 17:59             ` Mike Day
  1 sibling, 1 reply; 39+ messages in thread
From: Michael Roth @ 2014-01-20 17:24 UTC (permalink / raw
  To: Alexey Kardashevskiy, qemu-devel
  Cc: agraf, ncmike, paulus, tyreld, nfont, qemu-ppc

Quoting Alexey Kardashevskiy (2014-01-19 20:58:20)
> On 01/17/2014 07:51 AM, Michael Roth wrote:
> > Quoting Alexey Kardashevskiy (2013-12-15 22:54:42)
> >> On 12/16/2013 01:59 PM, Alexey Kardashevskiy wrote:
> >>> On 12/06/2013 09:32 AM, Michael Roth wrote:
> >>>> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> >>>>
> >>>> This adds entries to the root OF node to advertise our PHBs as being
> >>>> DR-capable in accordance with the PAPR specification.
> >>>>
> >>>> Each PHB is given a name of PHB<bus#>, advertised as a PHB type,
> >>>> and associated with a power domain of -1 (indicating to guests that
> >>>> power management is handled automatically by hardware).
> >>>>
> >>>> We currently allocate entries for up to 32 DR-capable PHBs, though
> >>>> this limit can be increased later.
> >>>>
> >>>> DrcEntry objects, which track the state of the DR-connector associated
> >>>> with each PHB, are stored in a 32-entry array, and each DrcEntry in
> >>>> turn has a dynamically allocated array of child DR-connectors, which
> >>>> we will use later to track the state of the DR-connectors associated
> >>>> with a PHB's physical slots.
> >>>>
> >>>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> >>>> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> >>>> ---
> >>>>  hw/ppc/spapr.c         |  132 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  include/hw/ppc/spapr.h |   33 ++++++++++++
> >>>>  2 files changed, 165 insertions(+)
> >>>>
> >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>> index 7e53a5f..ec3ba43 100644
> >>>> --- a/hw/ppc/spapr.c
> >>>> +++ b/hw/ppc/spapr.c
> >>>> @@ -81,6 +81,7 @@
> >>>>  #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
> >>>>  
> >>>>  sPAPREnvironment *spapr;
> >>>> +DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
> >>>>  
> >>>>  int spapr_allocate_irq(int hint, bool lsi)
> >>>>  {
> >>>> @@ -276,6 +277,130 @@ static size_t create_page_sizes_prop(CPUPPCState *env, uint32_t *prop,
> >>>>      return (p - prop) * sizeof(uint32_t);
> >>>>  }
> >>>>  
> >>>> +static void spapr_init_drc_table(void)
> >>>> +{
> >>>> +    int i;
> >>>> +
> >>>> +    memset(drc_table, 0, sizeof(drc_table));
> >>>> +
> >>>> +    /* For now we only care about PHB entries */
> >>>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> >>>> +        drc_table[i].drc_index = 0x2000001 + i;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state)
> >>>> +{
> >>>> +    DrcEntry *empty_drc = NULL;
> >>>> +    DrcEntry *found_drc = NULL;
> >>>> +    int i, phb_index;
> >>>> +
> >>>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> >>>> +        if (drc_table[i].phb_buid == 0) {
> >>>> +            empty_drc = &drc_table[i];
> >>>> +        }
> >>>> +
> >>>> +        if (drc_table[i].phb_buid == buid) {
> >>>> +            found_drc = &drc_table[i];
> >>>> +            break;
> >>>> +        }
> >>>> +    }
> >>>> +
> >>>> +    if (found_drc) {
> >>>> +        return found_drc;
> >>>> +    }
> >>>> +
> >>>> +    if (empty_drc) {
> >>>> +        empty_drc->phb_buid = buid;
> >>>> +        empty_drc->state = state;
> >>>> +        empty_drc->cc_state.fdt = NULL;
> >>>> +        empty_drc->cc_state.offset = 0;
> >>>> +        empty_drc->cc_state.depth = 0;
> >>>> +        empty_drc->cc_state.state = CC_STATE_IDLE;
> >>>> +        empty_drc->child_entries =
> >>>> +            g_malloc0(sizeof(DrcEntry) * SPAPR_DRC_PHB_SLOT_MAX);
> >>>> +        phb_index = buid - SPAPR_PCI_BASE_BUID;
> >>>> +        for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
> >>>> +            empty_drc->child_entries[i].drc_index =
> >>>> +                SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (i << 3);
> >>>> +        }
> >>>> +        return empty_drc;
> >>>> +    }
> >>>> +
> >>>> +    return NULL;
> >>>> +}
> >>>> +
> >>>> +static void spapr_create_drc_dt_entries(void *fdt)
> >>>> +{
> >>>> +    char char_buf[1024];
> >>>> +    uint32_t int_buf[SPAPR_DRC_TABLE_SIZE + 1];
> >>>> +    uint32_t *entries;
> >>>> +    int offset, fdt_offset;
> >>>> +    int i, ret;
> >>>> +
> >>>> +    fdt_offset = fdt_path_offset(fdt, "/");
> >>>> +
> >>>> +    /* ibm,drc-indexes */
> >>>> +    memset(int_buf, 0, sizeof(int_buf));
> >>>> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
> >>>> +
> >>>> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
> >>>> +        int_buf[i] = drc_table[i-1].drc_index;
> >>>> +    }
> >>>> +
> >>>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
> >>>> +                      sizeof(int_buf));
> >>>> +    if (ret) {
> >>>> +        fprintf(stderr, "Couldn't finalize ibm,drc-indexes property\n");
> >>>> +    }
> >>>> +
> >>>> +    /* ibm,drc-power-domains */
> >>>> +    memset(int_buf, 0, sizeof(int_buf));
> >>>> +    int_buf[0] = SPAPR_DRC_TABLE_SIZE;
> >>>> +
> >>>> +    for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
> >>>> +        int_buf[i] = 0xffffffff;
> >>>> +    }
> >>>> +
> >>>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
> >>>> +                      sizeof(int_buf));
> >>>> +    if (ret) {
> >>>> +        fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
> >>>> +    }
> >>>> +
> >>>> +    /* ibm,drc-names */
> >>>> +    memset(char_buf, 0, sizeof(char_buf));
> >>>> +    entries = (uint32_t *)&char_buf[0];
> >>>> +    *entries = SPAPR_DRC_TABLE_SIZE;
> >>>> +    offset = sizeof(*entries);
> >>>> +
> >>>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> >>>> +        offset += sprintf(char_buf + offset, "PHB %d", i + 1);
> >>>> +        char_buf[offset++] = '\0';
> >>>> +    }
> >>>> +
> >>>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf, offset);
> >>>> +    if (ret) {
> >>>> +        fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
> >>>> +    }
> >>>> +
> >>>> +    /* ibm,drc-types */
> >>>> +    memset(char_buf, 0, sizeof(char_buf));
> >>>> +    entries = (uint32_t *)&char_buf[0];
> >>>> +    *entries = SPAPR_DRC_TABLE_SIZE;
> >>>> +    offset = sizeof(*entries);
> >>>> +
> >>>> +    for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
> >>>> +        offset += sprintf(char_buf + offset, "PHB");
> >>>> +        char_buf[offset++] = '\0';
> >>>> +    }
> >>>> +
> >>>> +    ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf, offset);
> >>>> +    if (ret) {
> >>>> +        fprintf(stderr, "Couldn't finalize ibm,drc-types property\n");
> >>>> +    }
> >>>> +}
> >>>> +
> >>>>  #define _FDT(exp) \
> >>>>      do { \
> >>>>          int ret = (exp);                                           \
> >>>> @@ -307,6 +432,8 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
> >>>>      int i, smt = kvmppc_smt_threads();
> >>>>      unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
> >>>>  
> >>>> +    spapr_init_drc_table();
> >>>> +
> >>>>      fdt = g_malloc0(FDT_MAX_SIZE);
> >>>>      _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
> >>>>  
> >>>> @@ -590,6 +717,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
> >>>>      int ret;
> >>>>      void *fdt;
> >>>>      sPAPRPHBState *phb;
> >>>> +    DrcEntry *drc_entry;
> >>>>  
> >>>>      fdt = g_malloc(FDT_MAX_SIZE);
> >>>>  
> >>>> @@ -609,6 +737,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
> >>>>      }
> >>>>  
> >>>>      QLIST_FOREACH(phb, &spapr->phbs, list) {
> >>>> +        drc_entry = spapr_add_phb_to_drc_table(phb->buid, 2 /* Unusable */);
> >>>> +        g_assert(drc_entry);
> >>>>          ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
> >>>>      }
> >>>>  
> >>>> @@ -633,6 +763,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
> >>>>          spapr_populate_chosen_stdout(fdt, spapr->vio_bus);
> >>>>      }
> >>>>  
> >>>> +    spapr_create_drc_dt_entries(fdt);
> >>>> +
> >>>>      _FDT((fdt_pack(fdt)));
> >>>>  
> >>>>      if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
> >>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >>>> index b2f11e9..0f2e705 100644
> >>>> --- a/include/hw/ppc/spapr.h
> >>>> +++ b/include/hw/ppc/spapr.h
> >>>> @@ -299,6 +299,39 @@ typedef struct sPAPREnvironment {
> >>>>  #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
> >>>>  #define KVMPPC_HCALL_MAX        KVMPPC_H_LOGICAL_MEMOP
> >>>>  
> >>>> +/* For dlparable/hotpluggable slots */
> >>>> +#define SPAPR_DRC_TABLE_SIZE    32
> >>>> +#define SPAPR_DRC_PHB_SLOT_MAX  32
> >>>> +#define SPAPR_DRC_DEV_ID_BASE   0x40000000
> >>>> +
> >>>> +typedef struct ConfigureConnectorState {
> >>>> +    void *fdt;
> >>>> +    int offset_start;
> >>>> +    int offset;
> >>>> +    int depth;
> >>>> +    PCIDevice *dev;
> >>>> +    enum {
> >>>> +        CC_STATE_IDLE = 0,
> >>>> +        CC_STATE_PENDING = 1,
> >>>> +        CC_STATE_ACTIVE,
> >>>> +    } state;
> >>>> +} ConfigureConnectorState;
> >>>> +
> >>>> +typedef struct DrcEntry DrcEntry;
> >>>> +
> >>>> +struct DrcEntry {
> >>>> +    uint32_t drc_index;
> >>>> +    uint64_t phb_buid;
> >>>> +    void *fdt;
> >>>> +    int fdt_offset;
> >>>> +    uint32_t state;
> >>>> +    ConfigureConnectorState cc_state;
> >>>> +    DrcEntry *child_entries;
> >>>> +};
> >>>> +
> >>>> +extern DrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
> >>>> +DrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
> >>>> +
> >>>>  extern sPAPREnvironment *spapr;
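For reference, the ibm,drc-names and ibm,drc-types values built by spapr_create_drc_dt_entries() use a "32-bit count, then NUL-terminated strings" layout. The sketch below builds the ibm,drc-names payload standalone, without libfdt. The helper names are hypothetical. It stores the count explicitly big-endian because device-tree cells are big-endian. The quoted patch writes that count, and the ibm,drc-indexes cells, in host byte order, which is only correct on big-endian hosts. The sketch also bounds each write with snprintf instead of sprintf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Store a 32-bit value in big-endian order, as device-tree cells require. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: encode n names ("PHB 1".."PHB n") as a 32-bit
 * big-endian count followed by n NUL-terminated strings.
 * Returns the number of bytes used, or -1 if buf is too small.
 */
static int encode_drc_names(uint8_t *buf, size_t size, uint32_t n)
{
    size_t offset = sizeof(uint32_t);
    uint32_t i;

    assert(buf != NULL);
    if (size < offset) {
        return -1;
    }
    put_be32(buf, n);

    for (i = 0; i < n; i++) {
        size_t room = size - offset;
        int len = snprintf((char *)buf + offset, room, "PHB %u",
                           (unsigned)(i + 1));

        if (len < 0 || (size_t)len >= room) {
            return -1;              /* name plus its NUL would not fit */
        }
        offset += (size_t)len + 1;  /* step past the NUL snprintf wrote */
    }
    return (int)offset;
}
```

With n = 32 (SPAPR_DRC_TABLE_SIZE), the payload is 219 bytes. That is 4 bytes for the count, 9 × 6 bytes for "PHB 1".."PHB 9", and 23 × 7 bytes for "PHB 10".."PHB 32", well within the patch's 1024-byte char_buf.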
> >>>
> >>> So far we were trying to keep everything sPAPR-related in sPAPREnvironment.
> >>> Is @drc_table really that special?
> >>
> >>
> >> One more note - we are trying to add a "spapr" or "sPAPR" prefix to all
> >> global types and defines in headers (such as sPAPRPHBState, spapr_pci_lsi,
> >> VIOsPAPRBus, sPAPREnvironment); it would be nice to have "spapr" in some
> >> form in these new types too.
> >>
> >> Or we could move the whole patch (except spapr_create_drc_dt_entries()) to
> >> hw/ppc/spapr_pci.c (and keep the original names) as it seems to be the only
> >> user of the whole DrcEntry and ConfigureConnectorState thing.
> >> And put a pointer to drc_table[] into @spapr (or make it static?)
> > 
> > That would work, but I think we'd need to move spapr_create_drc_dt_entries()
> > as well, or the bits that rely on DrcEntry at least. Though I worry
> > about scoping DrcEntry to spapr_pci.c at this early stage, as DR-capable
> > components other than PCI may come to rely on state that's captured by the
> > DrcEntry nodes, such as boot-time FDT generation and run-time management
> > (via ibm,configure-connector) of CPUs and memory.
> > 
> > Assuming that seems like a reasonable expectation, I think I'd prefer the
> > first option of using spapr-specific prefixes for global types and moving
> > drc_table into sPAPREnvironment.
> 
> 
> I did not realize DRC is not just for PCI. How hard would it be to add hotplug
> support for a whole PHB? The current QEMU trend is to make the QEMU
> monitor's "device_add" equivalent to the command line's "-device", which is not
> (yet) true for PHB but could be. Thanks.
> 

Would need to look at it a bit more closely to say for certain, but after
discussing it a bit with Tyrel/Mike, I think the main considerations would be:

1) PHB hotplug/unplug would need to signal a different event type in its
   check-exception/epow message. We have stubs in place for a PHB event type,
   so that's mostly a matter of adding special-casing in the hotplug callback
   for spapr-pci-host-bridge devices.
2) The required properties for the OF node corresponding to a PHB will be
   different. Currently these are generated as part of the hotplug callback and
   attached to the corresponding ConfigureConnectorState node, to be fed to the
   guest via subsequent ibm,configure-connector RTAS calls, so we'd just hook
   the PHB's OF node generation code in there as well.
3) The sysfs/kernel interface for handling PHB hotplug would be different:
   we'd be relying on the rpadlpar kernel module
   (/sys/bus/pci/slots/control/add_slot) rather than rpaphp
   (/sys/bus/pci/slots/<slot>/power) or the PCI rescan fallback.
   This is mostly a matter of updating the guest tools, namely rtas_errd,
   to handle the event accordingly.

We also haven't done anything extensive using rpadlpar operations within QEMU
guests, so there may be various odds and ends, and possibly kernel changes,
needed to get that working properly (as was the case for rpaphp, though thanks
to the PCI rescan workaround a new kernel isn't required for existing guests; a
similar fallback likely won't be available for rpadlpar).

But from a high-level view at least it seems fairly straight-forward. I'll see
if we can get a prototype working.
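On the guest side, point 3 above comes down to which sysfs attribute a tool such as rtas_errd writes. The sketch below rests on several assumptions. It uses the standard Linux /sys/bus/pci/rescan attribute for the fallback. It writes "1" as the value for both rescan and the rpaphp power attribute. It writes the DRC name as the value for rpadlpar's add_slot. It also assumes the rpaphp slot directory is named after the DRC name. The enum and function names are hypothetical and are not taken from rtas_errd or librtas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: the three guest-side mechanisms discussed above. */
typedef enum {
    HP_PCI_RESCAN,          /* fallback: rescan all PCI buses */
    HP_RPAPHP_POWER_ON,     /* rpaphp: power on one slot */
    HP_RPADLPAR_ADD_SLOT,   /* rpadlpar: add a slot/PHB by DRC name */
} HotplugMethod;

/*
 * Fill in the sysfs attribute to write and the value to write to it.
 * Returns 0 on success, -1 on bad arguments or if the path does not fit.
 */
static int build_hotplug_request(HotplugMethod m, const char *drc_name,
                                 char *path, size_t path_len,
                                 const char **value)
{
    int n;

    assert(path != NULL && value != NULL);
    if (m != HP_PCI_RESCAN && (drc_name == NULL || strlen(drc_name) == 0)) {
        return -1;
    }

    switch (m) {
    case HP_PCI_RESCAN:
        n = snprintf(path, path_len, "/sys/bus/pci/rescan");
        *value = "1";
        break;
    case HP_RPAPHP_POWER_ON:
        /* Assumes the slot directory is named after the DRC name. */
        n = snprintf(path, path_len, "/sys/bus/pci/slots/%s/power", drc_name);
        *value = "1";
        break;
    case HP_RPADLPAR_ADD_SLOT:
        n = snprintf(path, path_len, "/sys/bus/pci/slots/control/add_slot");
        *value = drc_name;
        break;
    default:
        return -1;
    }
    return (n < 0 || (size_t)n >= path_len) ? -1 : 0;
}

/* Write the value to the attribute; needs root inside the guest. */
static int submit_hotplug_request(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ret = 0;

    if (f == NULL) {
        return -1;
    }
    if (fputs(value, f) == EOF) {
        ret = -1;
    }
    if (fclose(f) == EOF) {
        ret = -1;
    }
    return ret;
}
```

Separating the request-building step from the write keeps the event-to-interface mapping testable outside a real guest. Only submit_hotplug_request() touches sysfs.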

> 
> 
> >> The only remaining user of DrcEntry is spapr_hotplug_req_event() but this
> >> can be easily fixed by a small helper like this:
> >>
> >> uint32_t spapr_phb_slot_to_drc_index(uint64_t buid, int slot)
> >> {
> >>         DrcEntry *drc_entry = spapr_phb_to_drc_entry(buid);
> >>         if (!drc_entry) {
> >>                 drc_entry = spapr_add_phb_to_drc_table(buid, 2 /* Unusable */);
> >>         }
> >>         g_assert(drc_entry);
> >>         return drc_entry->child_entries[slot].drc_index;
> >> }
> 
> 
> 
> 
> -- 
> Alexey
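Alexey's suggested helper can be exercised standalone. The sketch below keys everything on the `buid` parameter and uses simplified stand-ins for DrcEntry. It assumes a value for SPAPR_PCI_BASE_BUID, and the function names are hypothetical. The table logic mirrors spapr_add_phb_to_drc_table() from the patch, which already returns an existing entry for a known BUID, so the separate lookup folds into one call. The sketch also bounds-checks the slot and reports a full table instead of dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the quoted spapr.h hunk. */
#define SPAPR_DRC_TABLE_SIZE    32
#define SPAPR_DRC_PHB_SLOT_MAX  32
#define SPAPR_DRC_DEV_ID_BASE   0x40000000u
/* Assumed BUID of the first PHB; illustrative only. */
#define SPAPR_PCI_BASE_BUID     0x800000020000000ULL

/* Simplified stand-ins for DrcEntry: only the fields used here. */
typedef struct SlotEntry {
    uint32_t drc_index;
} SlotEntry;

typedef struct PhbEntry {
    uint64_t phb_buid;          /* 0 marks an unused entry */
    SlotEntry child_entries[SPAPR_DRC_PHB_SLOT_MAX];
} PhbEntry;

static PhbEntry drc_table[SPAPR_DRC_TABLE_SIZE];

/* Return the entry whose BUID matches, or NULL (buid 0 finds a free entry). */
static PhbEntry *phb_to_drc_entry(uint64_t buid)
{
    for (int i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
        if (drc_table[i].phb_buid == buid) {
            return &drc_table[i];
        }
    }
    return NULL;
}

/* Like spapr_add_phb_to_drc_table(): reuse an existing entry or claim a free one. */
static PhbEntry *add_phb_to_drc_table(uint64_t buid)
{
    PhbEntry *e;
    uint64_t phb_index;

    assert(buid != 0);
    e = phb_to_drc_entry(buid);
    if (e) {
        return e;
    }
    e = phb_to_drc_entry(0);
    if (!e) {
        return NULL;            /* table full */
    }
    e->phb_buid = buid;
    phb_index = buid - SPAPR_PCI_BASE_BUID;
    for (int i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
        /* 256 indexes per PHB; (slot << 3) mirrors the PCI devfn layout. */
        e->child_entries[i].drc_index = SPAPR_DRC_DEV_ID_BASE +
            (uint32_t)(phb_index << 8) + ((uint32_t)i << 3);
    }
    return e;
}

/* Hypothetical fixed helper: returns the slot's DRC index, or 0 on error. */
static uint32_t phb_slot_to_drc_index(uint64_t buid, int slot)
{
    PhbEntry *e;

    if (slot < 0 || slot >= SPAPR_DRC_PHB_SLOT_MAX) {
        return 0;
    }
    e = add_phb_to_drc_table(buid);
    return e ? e->child_entries[slot].drc_index : 0;
}
```

As an example, slot 3 on the second PHB (BUID = base + 1) gets 0x40000000 + (1 << 8) + (3 << 3) = 0x40000118.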

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
  2014-01-20 17:24           ` Michael Roth
@ 2014-01-20 17:59             ` Mike Day
  2014-01-20 18:51               ` Michael Roth
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Day @ 2014-01-20 17:59 UTC (permalink / raw
  To: Michael Roth
  Cc: Alexey Kardashevskiy, qemu-devel@nongnu.org, Alexander Graf,
	Paul Mackerras, tyreld, nfont, qemu-ppc@nongnu.org

On Mon, Jan 20, 2014 at 12:24 PM, Michael Roth
<mdroth@linux.vnet.ibm.com> wrote:
> Quoting Alexey Kardashevskiy (2014-01-19 20:58:20)
>
> Would need to look at it a bit more closely to say for certain, but after
> discussing it a bit with Tyrel/Mike, I think the main considerations would be:
>
> 1) PHB hotplug/unplug would need to signal a different event type in its
>    check-exception/epow message. We have stubs in place for a PHB event type,
>    so that's mostly a matter of adding special-casing in the hotplug callback
>    for spapr-pci-host-bridge devices.
> 2) The required properties for the OF node corresponding to a PHB will be
>    different. Currently these are generated as part of the hotplug callback and
>    attached to the corresponding ConfigureConnectorState node, to be fed to the
>    guest via subsequent ibm,configure-connector RTAS calls, so we'd just hook
>    the PHB's OF node generation code in there as well.
> 3) The sysfs/kernel interface for handling PHB hotplug would be different:
>    we'd be relying on the rpadlpar kernel module
>    (/sys/bus/pci/slots/control/add_slot) rather than rpaphp
>    (/sys/bus/pci/slots/<slot>/power) or the PCI rescan fallback.
>    This is mostly a matter of updating the guest tools, namely rtas_errd,
>    to handle the event accordingly.
>
> We also haven't done anything extensive using rpadlpar operations within QEMU
> guests, so there may be various odds and ends, and possibly kernel changes,
> needed to get that working properly (as was the case for rpaphp, though thanks
> to the PCI rescan workaround a new kernel isn't required for existing guests; a
> similar fallback likely won't be available for rpadlpar).
>
> But from a high-level view at least it seems fairly straight-forward. I'll see
> if we can get a prototype working.

The fact that it "just works" now by rescanning the pci filesystem is
a significant benefit. I don't think we want to lose it. There can be
many PHBs on one of these systems. Maybe we could make the PHB
hot-pluggable and also always have one PHB plugged in at startup. Then
the guest will see the bus when it starts and it will build the pci
file system.

Mike


* Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
  2014-01-20 17:59             ` Mike Day
@ 2014-01-20 18:51               ` Michael Roth
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Roth @ 2014-01-20 18:51 UTC (permalink / raw
  To: Mike Day
  Cc: Alexey Kardashevskiy, qemu-devel@nongnu.org, Alexander Graf,
	Paul Mackerras, tyreld, nfont, qemu-ppc@nongnu.org

Quoting Mike Day (2014-01-20 11:59:28)
> On Mon, Jan 20, 2014 at 12:24 PM, Michael Roth
> <mdroth@linux.vnet.ibm.com> wrote:
> > Quoting Alexey Kardashevskiy (2014-01-19 20:58:20)
> >
> > Would need to look at it a bit more closely to say for certain, but after
> > discussing it a bit with Tyrel/Mike, I think the main considerations would be:
> >
> > 1) PHB hotplug/unplug would need to signal a different event type in its
> >    check-exception/epow message. We have stubs in place for a PHB event type,
> >    so that's mostly a matter of adding special-casing in the hotplug callback
> >    for spapr-pci-host-bridge devices.
> > 2) The required properties for the OF node corresponding to a PHB will be
> >    different. Currently these are generated as part of the hotplug callback and
> >    attached to the corresponding ConfigureConnectorState node, to be fed to the
> >    guest via subsequent ibm,configure-connector RTAS calls, so we'd just hook
> >    the PHB's OF node generation code in there as well.
> > 3) The sysfs/kernel interface for handling PHB hotplug would be different:
> >    we'd be relying on the rpadlpar kernel module
> >    (/sys/bus/pci/slots/control/add_slot) rather than rpaphp
> >    (/sys/bus/pci/slots/<slot>/power) or the PCI rescan fallback.
> >    This is mostly a matter of updating the guest tools, namely rtas_errd,
> >    to handle the event accordingly.
> >
> > We also haven't done anything extensive using rpadlpar operations within QEMU
> > guests, so there may be various odds and ends, and possibly kernel changes,
> > needed to get that working properly (as was the case for rpaphp, though thanks
> > to the PCI rescan workaround a new kernel isn't required for existing guests; a
> > similar fallback likely won't be available for rpadlpar).
> >
> > But from a high-level view at least it seems fairly straight-forward. I'll see
> > if we can get a prototype working.
> 
> The fact that it "just works" now by rescanning the pci filesystem is
> a significant benefit. I don't think we want to lose it. There can be
> many PHBs on one of these systems. Maybe we could make the PHB
> hot-pluggable and also always have one PHB plugged in at startup. Then
> the guest will see the bus when it starts and it will build the pci
> file system.

I'm not sure I understand the proposal, but to be clear: this doesn't entail a
change to the existing behavior. It's just one of the constraints specific to
supporting PHB hotplug in the future; PCI devices can still be hotplugged via
rpaphp or rescan either way.

As for alternatives to PHB hotplug, there are options like introducing a compatible
pci-bridge device (or perhaps the standard pci-bridge code will work) that can be
hotplugged using rpaphp/rescan to add child buses, but I think that's a separate
issue (unless the only goal we care about here is increasing the PCI device limit
while the guest is running, which maybe it is?).

> 
> Mike


end of thread

Thread overview: 39+ messages
2013-12-05 22:32 [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Michael Roth
2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node Michael Roth
2013-12-16  2:59   ` Alexey Kardashevskiy
2013-12-16  4:54     ` Alexey Kardashevskiy
2014-01-16 20:51       ` Michael Roth
2014-01-20  2:58         ` Alexey Kardashevskiy
2014-01-20 14:12           ` Mike Day
2014-01-20 17:24           ` Michael Roth
2014-01-20 17:59             ` Mike Day
2014-01-20 18:51               ` Michael Roth
2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 02/14] spapr_pci: populate DRC dt entries for PHBs Michael Roth
2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 03/14] spapr: add helper to retrieve a PHB/device DrcEntry Michael Roth
2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 04/14] spapr_pci: add set-indicator RTAS interface Michael Roth
2013-12-16  4:26   ` Alexey Kardashevskiy
2014-01-16 20:54     ` Michael Roth
2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 05/14] spapr_pci: add get/set-power-level RTAS interfaces Michael Roth
2013-12-16  3:09   ` Alexey Kardashevskiy
2014-01-16 21:01     ` Michael Roth
2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 06/14] spapr_pci: add get-sensor-state RTAS interface Michael Roth
2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 07/14] spapr_pci: add ibm,configure-connector " Michael Roth
2013-12-05 22:32 ` [Qemu-devel] [PATCH v2 08/14] memory: add memory_region_find_subregion Michael Roth
2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 09/14] pci: make pci_bar useable outside pci.c Michael Roth
2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 10/14] pci: allow 0 address for PCI IO regions Michael Roth
2013-12-05 23:33   ` Peter Maydell
2013-12-10 21:42     ` Michael Roth
2013-12-10 22:14       ` Peter Maydell
2013-12-10 23:03         ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
2013-12-12 14:34     ` [Qemu-devel] " Michael S. Tsirkin
2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 11/14] spapr_pci: enable basic hotplug operations Michael Roth
2013-12-16  4:36   ` Alexey Kardashevskiy
2014-01-16 21:22     ` Michael Roth
2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 12/14] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
2013-12-16  5:05   ` Alexey Kardashevskiy
2014-01-16 21:32     ` Michael Roth
2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 13/14] spapr_events: event-scan RTAS interface Michael Roth
2013-12-16  4:57   ` Alexey Kardashevskiy
2013-12-05 22:33 ` [Qemu-devel] [PATCH v2 14/14] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
2013-12-16  5:06   ` Alexey Kardashevskiy
2014-01-10  8:29 ` [Qemu-devel] [PATCH v2 00/14] spapr: add support for pci hotplug Alexey Kardashevskiy
