* [Qemu-devel] RFC: Implement emulation of pSeries logical partitions
@ 2011-02-12 14:54 David Gibson
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 01/15] Add TAGS and *~ to .gitignore David Gibson
                   ` (15 more replies)
  0 siblings, 16 replies; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

This patch series adds a "pseries" machine to qemu, allowing it to
emulate IBM pSeries logical partitions.  Along the way we add a bunch
of support for more modern ppc CPUs than are currently supported.  It
also makes some significant cleanups to the translation code for hash
page table based ppc MMUs.

This is a first version of this series for review.  There are a number
of additional patches adding features such as virtual IO devices to
the emulated pSeries platform, which will be added to the series once
they're a bit more polished.


* [Qemu-devel] [PATCH 01/15] Add TAGS and *~ to .gitignore
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 02/15] Clean up PowerPC SLB handling code David Gibson
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

Add the etags generated output file and editor backup files to
.gitignore.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 .gitignore |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/.gitignore b/.gitignore
index 26703e1..1d79680 100644
--- a/.gitignore
+++ b/.gitignore
@@ -63,3 +63,5 @@ pc-bios/optionrom/multiboot.raw
 .stgit-*
 cscope.*
 tags
+TAGS
+*~
-- 
1.7.1


* [Qemu-devel] [PATCH 02/15] Clean up PowerPC SLB handling code
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 01/15] Add TAGS and *~ to .gitignore David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:17   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 03/15] Allow qemu_devtree_setprop() to take arbitrary values David Gibson
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

Currently the SLB information when emulating a PowerPC 970 is
stored in a structure with the unhelpfully named fields 'tmp'
and 'tmp64'.  While the layout in these fields does match the
description of the SLB in the architecture document, it is not
convenient either for looking up the SLB, or for emulating the
slbmte instruction.

This patch, therefore, reorganizes the SLB entry structure to be
divided into the "ESID related" and "VSID related" fields, as
they are divided in the instructions accessing the SLB.

In addition to making the code smaller and more readable, this will
make it easier to implement support for the 1TB segments used in
more recent PowerPC chips.
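
As a rough illustration of why the split layout makes lookup simpler,
here is a minimal standalone sketch.  The mask values are copied from
the cpu.h hunk below; the slb_entry type and the slb_find() helper are
hypothetical stand-ins for illustration, not the code in this patch
(which returns -5 on a miss and also reports the page size).

```c
#include <stdint.h>

/* Mask and bit values copied from the patch's cpu.h hunk. */
#define SLB_ESID_V          0x0000000008000000ULL /* valid */
#define SLB_VSID_SHIFT      12
#define SLB_VSID_VSID       0x3FFFFFFFFFFFF000ULL
#define SLB_VSID_ATTR       0x0000000000000FFFULL
#define SEGMENT_SHIFT_256M  28
#define SEGMENT_MASK_256M   (~((1ULL << SEGMENT_SHIFT_256M) - 1))

/* Simplified stand-in for ppc_slb_t (hypothetical name). */
typedef struct {
    uint64_t esid; /* ESID bits plus the valid bit */
    uint64_t vsid; /* VSID bits plus the attribute bits */
} slb_entry;

/* Hypothetical helper: returns the matching slot, or -1 if none matches. */
static int slb_find(const slb_entry *slb, int nr, uint64_t eaddr,
                    uint64_t *vsid, uint64_t *attr)
{
    /* A valid entry for this 256M segment has exactly this ESID word. */
    uint64_t want = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;
    int n;

    for (n = 0; n < nr; n++) {
        if (slb[n].esid == want) {
            *vsid = (slb[n].vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
            *attr = slb[n].vsid & SLB_VSID_ATTR;
            return n;
        }
    }
    return -1;
}
```

Because the valid bit is folded into the comparison value, a single
equality test covers both "entry is valid" and "segment matches".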

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h       |   29 +++++++-
 target-ppc/helper.c    |  178 ++++++++++++++----------------------------------
 target-ppc/helper.h    |    1 -
 target-ppc/op_helper.c |    9 +--
 4 files changed, 80 insertions(+), 137 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index deb8d7c..a20c132 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -43,6 +43,8 @@
 # define TARGET_VIRT_ADDR_SPACE_BITS 64
 #endif
 
+#define TARGET_PAGE_BITS_16M 24
+
 #else /* defined (TARGET_PPC64) */
 /* PowerPC 32 definitions */
 #define TARGET_LONG_BITS 32
@@ -359,10 +361,31 @@ union ppc_tlb_t {
 
 typedef struct ppc_slb_t ppc_slb_t;
 struct ppc_slb_t {
-    uint64_t tmp64;
-    uint32_t tmp;
+    uint64_t esid;
+    uint64_t vsid;
 };
 
+/* Bits in the SLB ESID word */
+#define SLB_ESID_ESID           0xFFFFFFFFF0000000ULL
+#define SLB_ESID_V              0x0000000008000000ULL /* valid */
+
+/* Bits in the SLB VSID word */
+#define SLB_VSID_SHIFT          12
+#define SLB_VSID_SSIZE_SHIFT    62
+#define SLB_VSID_B              0xc000000000000000ULL
+#define SLB_VSID_B_256M         0x0000000000000000ULL
+#define SLB_VSID_VSID           0x3FFFFFFFFFFFF000ULL
+#define SLB_VSID_KS             0x0000000000000800ULL
+#define SLB_VSID_KP             0x0000000000000400ULL
+#define SLB_VSID_N              0x0000000000000200ULL /* no-execute */
+#define SLB_VSID_L              0x0000000000000100ULL
+#define SLB_VSID_C              0x0000000000000080ULL /* class */
+#define SLB_VSID_LP             0x0000000000000030ULL
+#define SLB_VSID_ATTR           0x0000000000000FFFULL
+
+#define SEGMENT_SHIFT_256M      28
+#define SEGMENT_MASK_256M       ~((1ULL << SEGMENT_SHIFT_256M) - 1)
+
 /*****************************************************************************/
 /* Machine state register bits definition                                    */
 #define MSR_SF   63 /* Sixty-four-bit mode                            hflags */
@@ -755,7 +778,7 @@ void ppc_store_sdr1 (CPUPPCState *env, target_ulong value);
 void ppc_store_asr (CPUPPCState *env, target_ulong value);
 target_ulong ppc_load_slb (CPUPPCState *env, int slb_nr);
 target_ulong ppc_load_sr (CPUPPCState *env, int sr_nr);
-void ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs);
+int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs);
 #endif /* defined(TARGET_PPC64) */
 void ppc_store_sr (CPUPPCState *env, int srnum, target_ulong value);
 #endif /* !defined(CONFIG_USER_ONLY) */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 4b49101..2094ca3 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -672,85 +672,36 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
 }
 
 #if defined(TARGET_PPC64)
-static ppc_slb_t *slb_get_entry(CPUPPCState *env, int nr)
-{
-    ppc_slb_t *retval = &env->slb[nr];
-
-#if 0 // XXX implement bridge mode?
-    if (env->spr[SPR_ASR] & 1) {
-        target_phys_addr_t sr_base;
-
-        sr_base = env->spr[SPR_ASR] & 0xfffffffffffff000;
-        sr_base += (12 * nr);
-
-        retval->tmp64 = ldq_phys(sr_base);
-        retval->tmp = ldl_phys(sr_base + 8);
-    }
-#endif
-
-    return retval;
-}
-
-static void slb_set_entry(CPUPPCState *env, int nr, ppc_slb_t *slb)
-{
-    ppc_slb_t *entry = &env->slb[nr];
-
-    if (slb == entry)
-        return;
-
-    entry->tmp64 = slb->tmp64;
-    entry->tmp = slb->tmp;
-}
-
-static inline int slb_is_valid(ppc_slb_t *slb)
-{
-    return (int)(slb->tmp64 & 0x0000000008000000ULL);
-}
-
-static inline void slb_invalidate(ppc_slb_t *slb)
-{
-    slb->tmp64 &= ~0x0000000008000000ULL;
-}
-
 static inline int slb_lookup(CPUPPCState *env, target_ulong eaddr,
                              target_ulong *vsid, target_ulong *page_mask,
                              int *attr, int *target_page_bits)
 {
-    target_ulong mask;
-    int n, ret;
+    uint64_t esid;
+    int n;
 
-    ret = -5;
     LOG_SLB("%s: eaddr " TARGET_FMT_lx "\n", __func__, eaddr);
-    mask = 0x0000000000000000ULL; /* Avoid gcc warning */
+
+    esid = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;
+
     for (n = 0; n < env->slb_nr; n++) {
-        ppc_slb_t *slb = slb_get_entry(env, n);
-
-        LOG_SLB("%s: seg %d %016" PRIx64 " %08"
-                    PRIx32 "\n", __func__, n, slb->tmp64, slb->tmp);
-        if (slb_is_valid(slb)) {
-            /* SLB entry is valid */
-            mask = 0xFFFFFFFFF0000000ULL;
-            if (slb->tmp & 0x8) {
-                /* 16 MB PTEs */
-                if (target_page_bits)
-                    *target_page_bits = 24;
-            } else {
-                /* 4 KB PTEs */
-                if (target_page_bits)
-                    *target_page_bits = TARGET_PAGE_BITS;
-            }
-            if ((eaddr & mask) == (slb->tmp64 & mask)) {
-                /* SLB match */
-                *vsid = ((slb->tmp64 << 24) | (slb->tmp >> 8)) & 0x0003FFFFFFFFFFFFULL;
-                *page_mask = ~mask;
-                *attr = slb->tmp & 0xFF;
-                ret = n;
-                break;
+        ppc_slb_t *slb = &env->slb[n];
+
+        LOG_SLB("%s: slot %d %016" PRIx64 " %016"
+                    PRIx64 "\n", __func__, n, slb->esid, slb->vsid);
+        if (slb->esid == esid) {
+            *vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
+            *page_mask = ~SEGMENT_MASK_256M;
+            *attr = slb->vsid & SLB_VSID_ATTR;
+            if (target_page_bits) {
+                *target_page_bits = (slb->vsid & SLB_VSID_L)
+                    ? TARGET_PAGE_BITS_16M
+                    : TARGET_PAGE_BITS;
             }
+            return n;
         }
     }
 
-    return ret;
+    return -5;
 }
 
 void ppc_slb_invalidate_all (CPUPPCState *env)
@@ -760,11 +711,10 @@ void ppc_slb_invalidate_all (CPUPPCState *env)
     do_invalidate = 0;
     /* XXX: Warning: slbia never invalidates the first segment */
     for (n = 1; n < env->slb_nr; n++) {
-        ppc_slb_t *slb = slb_get_entry(env, n);
+        ppc_slb_t *slb = &env->slb[n];
 
-        if (slb_is_valid(slb)) {
-            slb_invalidate(slb);
-            slb_set_entry(env, n, slb);
+        if (slb->esid & SLB_ESID_V) {
+            slb->esid &= ~SLB_ESID_V;
             /* XXX: given the fact that segment size is 256 MB or 1TB,
              *      and we still don't have a tlb_flush_mask(env, n, mask)
              *      in Qemu, we just invalidate all TLBs
@@ -781,68 +731,44 @@ void ppc_slb_invalidate_one (CPUPPCState *env, uint64_t T0)
     target_ulong vsid, page_mask;
     int attr;
     int n;
+    ppc_slb_t *slb;
 
     n = slb_lookup(env, T0, &vsid, &page_mask, &attr, NULL);
-    if (n >= 0) {
-        ppc_slb_t *slb = slb_get_entry(env, n);
-
-        if (slb_is_valid(slb)) {
-            slb_invalidate(slb);
-            slb_set_entry(env, n, slb);
-            /* XXX: given the fact that segment size is 256 MB or 1TB,
-             *      and we still don't have a tlb_flush_mask(env, n, mask)
-             *      in Qemu, we just invalidate all TLBs
-             */
-            tlb_flush(env, 1);
-        }
+    if (n < 0) {
+        return;
     }
-}
 
-target_ulong ppc_load_slb (CPUPPCState *env, int slb_nr)
-{
-    target_ulong rt;
-    ppc_slb_t *slb = slb_get_entry(env, slb_nr);
+    slb = &env->slb[n];
 
-    if (slb_is_valid(slb)) {
-        /* SLB entry is valid */
-        /* Copy SLB bits 62:88 to Rt 37:63 (VSID 23:49) */
-        rt = slb->tmp >> 8;             /* 65:88 => 40:63 */
-        rt |= (slb->tmp64 & 0x7) << 24; /* 62:64 => 37:39 */
-        /* Copy SLB bits 89:92 to Rt 33:36 (KsKpNL) */
-        rt |= ((slb->tmp >> 4) & 0xF) << 27;
-    } else {
-        rt = 0;
-    }
-    LOG_SLB("%s: %016" PRIx64 " %08" PRIx32 " => %d "
-            TARGET_FMT_lx "\n", __func__, slb->tmp64, slb->tmp, slb_nr, rt);
+    if (slb->esid & SLB_ESID_V) {
+        slb->esid &= ~SLB_ESID_V;
 
-    return rt;
+        /* XXX: given the fact that segment size is 256 MB or 1TB,
+         *      and we still don't have a tlb_flush_mask(env, n, mask)
+         *      in Qemu, we just invalidate all TLBs
+         */
+        tlb_flush(env, 1);
+    }
 }
 
-void ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
+int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
 {
-    ppc_slb_t *slb;
-
-    uint64_t vsid;
-    uint64_t esid;
-    int flags, valid, slb_nr;
-
-    vsid = rs >> 12;
-    flags = ((rs >> 8) & 0xf);
+    int slot = rb & 0xfff;
+    uint64_t esid = rb & ~0xfff;
+    ppc_slb_t *slb = &env->slb[slot];
 
-    esid = rb >> 28;
-    valid = (rb & (1 << 27));
-    slb_nr = rb & 0xfff;
+    if (slot >= env->slb_nr) {
+        return -1;
+    }
 
-    slb = slb_get_entry(env, slb_nr);
-    slb->tmp64 = (esid << 28) | valid | (vsid >> 24);
-    slb->tmp = (vsid << 8) | (flags << 3);
+    slb->esid = esid;
+    slb->vsid = rs;
 
     LOG_SLB("%s: %d " TARGET_FMT_lx " - " TARGET_FMT_lx " => %016" PRIx64
-            " %08" PRIx32 "\n", __func__, slb_nr, rb, rs, slb->tmp64,
-            slb->tmp);
+            " %016" PRIx64 "\n", __func__, slot, rb, rs,
+            slb->esid, slb->vsid);
 
-    slb_set_entry(env, slb_nr, slb);
+    return 0;
 }
 #endif /* defined(TARGET_PPC64) */
 
@@ -860,24 +786,22 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
 {
     target_phys_addr_t sdr, hash, mask, sdr_mask, htab_mask;
     target_ulong sr, vsid, vsid_mask, pgidx, page_mask;
-#if defined(TARGET_PPC64)
-    int attr;
-#endif
     int ds, vsid_sh, sdr_sh, pr, target_page_bits;
     int ret, ret2;
 
     pr = msr_pr;
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64) {
+        int attr;
+
         LOG_MMU("Check SLBs\n");
         ret = slb_lookup(env, eaddr, &vsid, &page_mask, &attr,
                          &target_page_bits);
         if (ret < 0)
             return ret;
-        ctx->key = ((attr & 0x40) && (pr != 0)) ||
-            ((attr & 0x80) && (pr == 0)) ? 1 : 0;
+        ctx->key = !!(pr ? (attr & SLB_VSID_KP) : (attr & SLB_VSID_KS));
         ds = 0;
-        ctx->nx = attr & 0x10 ? 1 : 0;
+        ctx->nx = !!(attr & SLB_VSID_N);
         ctx->eaddr = eaddr;
         vsid_mask = 0x00003FFFFFFFFF80ULL;
         vsid_sh = 7;
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 2bf9283..d512cb0 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -340,7 +340,6 @@ DEF_HELPER_1(74xx_tlbi, void, tl)
 DEF_HELPER_FLAGS_0(tlbia, TCG_CALL_CONST, void)
 DEF_HELPER_FLAGS_1(tlbie, TCG_CALL_CONST, void, tl)
 #if defined(TARGET_PPC64)
-DEF_HELPER_FLAGS_1(load_slb, TCG_CALL_CONST, tl, tl)
 DEF_HELPER_FLAGS_2(store_slb, TCG_CALL_CONST, void, tl, tl)
 DEF_HELPER_FLAGS_0(slbia, TCG_CALL_CONST, void)
 DEF_HELPER_FLAGS_1(slbie, TCG_CALL_CONST, void, tl)
diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index 17e070a..bf41627 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -3746,14 +3746,11 @@ void helper_store_sr (target_ulong sr_num, target_ulong val)
 
 /* SLB management */
 #if defined(TARGET_PPC64)
-target_ulong helper_load_slb (target_ulong slb_nr)
-{
-    return ppc_load_slb(env, slb_nr);
-}
-
 void helper_store_slb (target_ulong rb, target_ulong rs)
 {
-    ppc_store_slb(env, rb, rs);
+    if (ppc_store_slb(env, rb, rs) < 0) {
+        helper_raise_exception_err(POWERPC_EXCP_PROGRAM, POWERPC_EXCP_INVAL);
+    }
 }
 
 void helper_slbia (void)
-- 
1.7.1


* [Qemu-devel] [PATCH 03/15] Allow qemu_devtree_setprop() to take arbitrary values
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 01/15] Add TAGS and *~ to .gitignore David Gibson
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 02/15] Clean up PowerPC SLB handling code David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:18   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 04/15] Add a hook to allow hypercalls to be emulated on PowerPC David Gibson
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

From: David Gibson <dwg@au1.ibm.com>

Currently qemu_devtree_setprop() expects the new property value to be
given as a uint32_t *.  While property values consisting of u32s are
common, in general they can have any bytestring value.

Therefore, this patch alters the function to take a void * instead,
allowing callers to easily give anything as the property value.
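
To illustrate the caller-side effect, here is a small self-contained
sketch.  The toy_prop type and toy_setprop() function are hypothetical
stand-ins, not qemu_devtree_setprop() itself, and the sketch uses
const void * where the patch uses plain void *.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory property, standing in for a device tree node. */
struct toy_prop {
    unsigned char data[64];
    int size;
};

/* Accepts any byte string, in the spirit of the void * signature. */
static int toy_setprop(struct toy_prop *p, const void *val, int size)
{
    if (size < 0 || (size_t)size > sizeof(p->data)) {
        return -1; /* value does not fit in the toy buffer */
    }
    memcpy(p->data, val, (size_t)size);
    p->size = size;
    return 0;
}
```

With this shape, a caller can pass an array of u32 cells, a string, or
a packed structure without casting at each call site.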

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 device_tree.c |    2 +-
 device_tree.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 426a631..21be070 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -74,7 +74,7 @@ fail:
 }
 
 int qemu_devtree_setprop(void *fdt, const char *node_path,
-                         const char *property, uint32_t *val_array, int size)
+                         const char *property, void *val_array, int size)
 {
     int offset;
 
diff --git a/device_tree.h b/device_tree.h
index f05c4e7..cecd98f 100644
--- a/device_tree.h
+++ b/device_tree.h
@@ -17,7 +17,7 @@
 void *load_device_tree(const char *filename_path, int *sizep);
 
 int qemu_devtree_setprop(void *fdt, const char *node_path,
-                         const char *property, uint32_t *val_array, int size);
+                         const char *property, void *val_array, int size);
 int qemu_devtree_setprop_cell(void *fdt, const char *node_path,
                               const char *property, uint32_t val);
 int qemu_devtree_setprop_string(void *fdt, const char *node_path,
-- 
1.7.1


* [Qemu-devel] [PATCH 04/15] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (2 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 03/15] Allow qemu_devtree_setprop() to take arbitrary values David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:19   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 05/15] Implement PowerPC slbmfee and slbmfev instructions David Gibson
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

From: David Gibson <dwg@au1.ibm.com>

PowerPC and POWER chips since the POWER4 and 970 have a special
hypervisor mode, and a corresponding form of the system call
instruction which traps to the hypervisor.

qemu currently has stub implementations of hypervisor mode.  That
is, the outline is there to allow qemu to run a PowerPC hypervisor
under emulation.  There are a number of details missing so this
won't actually work at present, but the idea is there.

What there is no provision for at all is for qemu to instead emulate
the hypervisor itself.  That is, to have hypercalls trap into qemu
and have their results emulated by qemu, rather than running
hypervisor code within the emulated system.

KVM implementations that are aware of the hypervisor hardware are in
the works, and it would be useful for debugging and development to
also allow full emulation of the same para-virtualized guests as such
a KVM.

Therefore, this patch adds a hook which will allow a machine to
set up emulation of hypervisor calls.
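
The dispatch this hook enables can be sketched in isolation as follows.
The toy_cpu type, its gpr3 field, and both functions are hypothetical
simplifications; only the emulate_hypercall and hcall_opaque field
names and the "lev == 1" test are taken from the patch.

```c
#include <stddef.h>

/* Simplified stand-in for CPUPPCState; only the hook fields are modelled. */
typedef struct toy_cpu toy_cpu;
struct toy_cpu {
    unsigned long gpr3;                           /* hypothetical result register */
    void (*emulate_hypercall)(toy_cpu *, void *); /* hook installed by the machine */
    void *hcall_opaque;                           /* machine-private state */
};

/* Hypothetical machine-side handler: counts calls and stores a result. */
static void toy_hcall(toy_cpu *cpu, void *opaque)
{
    int *count = opaque;

    (*count)++;
    cpu->gpr3 = 0;
}

/* Returns 1 if the hook consumed the system call, or 0 if the normal
 * exception path should run instead. */
static int toy_syscall(toy_cpu *cpu, int lev)
{
    if (lev == 1 && cpu->emulate_hypercall) {
        cpu->emulate_hypercall(cpu, cpu->hcall_opaque);
        return 1;
    }
    return 0;
}
```

A machine that leaves the hook NULL keeps the existing behaviour, so
the change is opt-in per machine.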

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h    |    2 ++
 target-ppc/helper.c |    4 ++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index a20c132..eaddc27 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -692,6 +692,8 @@ struct CPUPPCState {
     int bfd_mach;
     uint32_t flags;
     uint64_t insns_flags;
+    void (*emulate_hypercall)(CPUState *, void *);
+    void *hcall_opaque;
 
     int error_code;
     uint32_t pending_interrupts;
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 2094ca3..19aa067 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -2152,6 +2152,10 @@ static inline void powerpc_excp(CPUState *env, int excp_model, int excp)
     case POWERPC_EXCP_SYSCALL:   /* System call exception                    */
         dump_syscall(env);
         lev = env->error_code;
+	if ((lev == 1) && env->emulate_hypercall) {
+	    env->emulate_hypercall(env, env->hcall_opaque);
+	    return;
+	}	    
         if (lev == 1 || (lpes0 == 0 && lpes1 == 0))
             new_msr |= (target_ulong)MSR_HVB;
         goto store_next;
-- 
1.7.1


* [Qemu-devel] [PATCH 05/15] Implement PowerPC slbmfee and slbmfev instructions
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (3 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 04/15] Add a hook to allow hypercalls to be emulated on PowerPC David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:23   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 06/15] Implement missing parts of the logic for the POWER PURR David Gibson
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

From: David Gibson <dwg@au1.ibm.com>

For a 64-bit PowerPC target, qemu correctly implements translation
through the segment lookaside buffer.  Likewise it supports the
slbmte instruction which is used to load entries into the SLB.

However, it does not emulate the slbmfee and slbmfev instructions
which read SLB entries back into registers.  Because these are
only occasionally used in guests (mostly for debugging) we get
away with it.

However, given the recent SLB cleanups, it becomes quite easy to
implement these, and thereby allow, amongst other things, a guest
Linux to use xmon's command to dump the SLB.
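
With the ESID and VSID words stored as-is, reading an entry back
reduces to a bounds check and a copy.  A minimal sketch, assuming a
hypothetical slb_entry type and slb_read() helper (the patch itself
uses two separate functions, one per word):

```c
#include <stdint.h>

/* Simplified stand-in for ppc_slb_t (hypothetical name). */
typedef struct {
    uint64_t esid;
    uint64_t vsid;
} slb_entry;

/* Hypothetical helper: the low 12 bits of rb select the slot.
 * want_vsid == 0 reads the ESID word (as slbmfee would),
 * want_vsid != 0 reads the VSID word (as slbmfev would).
 * Returns 0 on success, or -1 if the slot is out of range. */
static int slb_read(const slb_entry *slb, int nr, uint64_t rb,
                    int want_vsid, uint64_t *rt)
{
    uint64_t slot = rb & 0xfff;

    if (nr <= 0 || slot >= (uint64_t)nr) {
        return -1;
    }
    *rt = want_vsid ? slb[slot].vsid : slb[slot].esid;
    return 0;
}
```

In this sketch the range check happens before the array is indexed, and
the caller is expected to turn the -1 return into a program exception.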

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h       |    2 ++
 target-ppc/helper.c    |   26 ++++++++++++++++++++++++++
 target-ppc/helper.h    |    2 ++
 target-ppc/op_helper.c |   20 ++++++++++++++++++++
 target-ppc/translate.c |   29 ++++++++++++++++++++++++++++-
 5 files changed, 78 insertions(+), 1 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index eaddc27..9a7495a 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -781,6 +781,8 @@ void ppc_store_asr (CPUPPCState *env, target_ulong value);
 target_ulong ppc_load_slb (CPUPPCState *env, int slb_nr);
 target_ulong ppc_load_sr (CPUPPCState *env, int sr_nr);
 int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs);
+int ppc_load_slb_esid (CPUPPCState *env, target_ulong rb, target_ulong *rt);
+int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt);
 #endif /* defined(TARGET_PPC64) */
 void ppc_store_sr (CPUPPCState *env, int srnum, target_ulong value);
 #endif /* !defined(CONFIG_USER_ONLY) */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 19aa067..4830981 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -770,6 +770,32 @@ int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
 
     return 0;
 }
+
+int ppc_load_slb_esid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
+{
+    int slot = rb & 0xfff;
+    ppc_slb_t *slb = &env->slb[slot];
+
+    if (slot >= env->slb_nr) {
+        return -1;
+    }
+
+    *rt = slb->esid;
+    return 0;
+}
+
+int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
+{
+    int slot = rb & 0xfff;
+    ppc_slb_t *slb = &env->slb[slot];
+
+    if (slot >= env->slb_nr) {
+        return -1;
+    }
+
+    *rt = slb->vsid;
+    return 0;
+}
 #endif /* defined(TARGET_PPC64) */
 
 /* Perform segment based translation */
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index d512cb0..1a69cf8 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -341,6 +341,8 @@ DEF_HELPER_FLAGS_0(tlbia, TCG_CALL_CONST, void)
 DEF_HELPER_FLAGS_1(tlbie, TCG_CALL_CONST, void, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_2(store_slb, TCG_CALL_CONST, void, tl, tl)
+DEF_HELPER_1(load_slb_esid, tl, tl)
+DEF_HELPER_1(load_slb_vsid, tl, tl)
 DEF_HELPER_FLAGS_0(slbia, TCG_CALL_CONST, void)
 DEF_HELPER_FLAGS_1(slbie, TCG_CALL_CONST, void, tl)
 #endif
diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index bf41627..bdb1f17 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -3753,6 +3753,26 @@ void helper_store_slb (target_ulong rb, target_ulong rs)
     }
 }
 
+target_ulong helper_load_slb_esid (target_ulong rb)
+{
+    target_ulong rt;
+
+    if (ppc_load_slb_esid(env, rb, &rt) < 0) {
+        helper_raise_exception_err(POWERPC_EXCP_PROGRAM, POWERPC_EXCP_INVAL);
+    }
+    return rt;
+}
+
+target_ulong helper_load_slb_vsid (target_ulong rb)
+{
+    target_ulong rt;
+
+    if (ppc_load_slb_vsid(env, rb, &rt) < 0) {
+        helper_raise_exception_err(POWERPC_EXCP_PROGRAM, POWERPC_EXCP_INVAL);
+    }
+    return rt;
+}
+
 void helper_slbia (void)
 {
     ppc_slb_invalidate_all(env);
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 89413c5..2b1a851 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -4227,6 +4227,31 @@ static void gen_slbmte(DisasContext *ctx)
 #endif
 }
 
+static void gen_slbmfee(DisasContext *ctx)
+{
+#if defined(CONFIG_USER_ONLY)
+    gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+#else
+    if (unlikely(!ctx->mem_idx)) {
+        gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+        return;
+    }
+    gen_helper_load_slb_esid(cpu_gpr[rS(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
+#endif
+}
+
+static void gen_slbmfev(DisasContext *ctx)
+{
+#if defined(CONFIG_USER_ONLY)
+    gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+#else
+    if (unlikely(!ctx->mem_idx)) {
+        gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
+        return;
+    }
+    gen_helper_load_slb_vsid(cpu_gpr[rS(ctx->opcode)], cpu_gpr[rB(ctx->opcode)]);
+#endif
+}
 #endif /* defined(TARGET_PPC64) */
 
 /***                      Lookaside buffer management                      ***/
@@ -8110,7 +8135,9 @@ GEN_HANDLER2(mfsrin_64b, "mfsrin", 0x1F, 0x13, 0x14, 0x001F0001,
 GEN_HANDLER2(mtsr_64b, "mtsr", 0x1F, 0x12, 0x06, 0x0010F801, PPC_SEGMENT_64B),
 GEN_HANDLER2(mtsrin_64b, "mtsrin", 0x1F, 0x12, 0x07, 0x001F0001,
              PPC_SEGMENT_64B),
-GEN_HANDLER2(slbmte, "slbmte", 0x1F, 0x12, 0x0C, 0x00000000, PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmte, "slbmte", 0x1F, 0x12, 0x0C, 0x001F0001, PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfee, "slbmfee", 0x1F, 0x13, 0x1C, 0x001F0001, PPC_SEGMENT_64B),
+GEN_HANDLER2(slbmfev, "slbmfev", 0x1F, 0x13, 0x1A, 0x001F0001, PPC_SEGMENT_64B),
 #endif
 GEN_HANDLER(tlbia, 0x1F, 0x12, 0x0B, 0x03FFFC01, PPC_MEM_TLBIA),
 GEN_HANDLER(tlbiel, 0x1F, 0x12, 0x08, 0x03FF0001, PPC_MEM_TLBIE),
-- 
1.7.1


* [Qemu-devel] [PATCH 06/15] Implement missing parts of the logic for the POWER PURR
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (4 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 05/15] Implement PowerPC slbmfee and slbmfev instructions David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:25   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 07/15] Correct ppc popcntb logic, implement popcntw and popcntd David Gibson
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

From: David Gibson <dwg@au1.ibm.com>

The PURR (Processor Utilization Resource Register) is a register found
on recent POWER CPUs.  The guts of implementing it, at least enough to
get by, are already present in qemu; however, some of the helper
functions needed to actually wire it up are missing.

This patch adds the necessary glue, so that the PURR can be wired up
when we implement newer POWER CPU targets which include it.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/helper.h         |    1 +
 target-ppc/op_helper.c      |    5 +++++
 target-ppc/translate_init.c |    6 ++++++
 3 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 1a69cf8..4227897 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -371,6 +371,7 @@ DEF_HELPER_0(load_tbl, tl)
 DEF_HELPER_0(load_tbu, tl)
 DEF_HELPER_0(load_atbl, tl)
 DEF_HELPER_0(load_atbu, tl)
+DEF_HELPER_0(load_purr, tl)
 DEF_HELPER_0(load_601_rtcl, tl)
 DEF_HELPER_0(load_601_rtcu, tl)
 #if !defined(CONFIG_USER_ONLY)
diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index bdb1f17..b9b5ae2 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -86,6 +86,11 @@ target_ulong helper_load_atbu (void)
     return cpu_ppc_load_atbu(env);
 }
 
+target_ulong helper_load_purr (void)
+{
+    return (target_ulong)cpu_ppc_load_purr(env);
+}
+
 target_ulong helper_load_601_rtcl (void)
 {
     return cpu_ppc601_load_rtcl(env);
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index dfcd949..c842330 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -209,6 +209,12 @@ static void spr_write_atbu (void *opaque, int sprn, int gprn)
 {
     gen_helper_store_atbu(cpu_gpr[gprn]);
 }
+
+__attribute__ (( unused ))
+static void spr_read_purr(void *opaque, int gprn, int sprn)
+{
+    gen_helper_load_purr(cpu_gpr[gprn]);
+}
 #endif
 
 #if !defined(CONFIG_USER_ONLY)
-- 
1.7.1


* [Qemu-devel] [PATCH 07/15] Correct ppc popcntb logic, implement popcntw and popcntd
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (5 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 06/15] Implement missing parts of the logic for the POWER PURR David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:27   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 08/15] Clean up slb_lookup() function David Gibson
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

From: David Gibson <dwg@au1.ibm.com>

qemu already includes support for the popcntb instruction introduced
in POWER5 (although it doesn't actually allow you to choose POWER5).

However, the logic is slightly incorrect: it will generate results
truncated to 32 bits when the CPU is in 32-bit mode.  This is not
normal for powerpc: generally, arithmetic instructions on a 64-bit
powerpc cpu generate full 64-bit results, and only the low 32 bits
are significant for condition codes.

This patch corrects this nit, which actually simplifies the code slightly.

In addition, this patch implements the popcntw and popcntd
instructions added in POWER7, in preparation for allowing POWER7 as an
emulated CPU.
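
For reference, the three variants differ only in how far the parallel
bit-summing is carried.  A standalone sketch using plain uint64_t and
hypothetical function names (not the helper_* functions in the diff
below):

```c
#include <stdint.h>

/* Per-byte count: each byte of the result holds the number of set bits
 * in the corresponding byte of the input. */
static uint64_t popcnt_bytes(uint64_t val)
{
    val = (val & 0x5555555555555555ULL) + ((val >> 1) & 0x5555555555555555ULL);
    val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >> 4) & 0x0f0f0f0f0f0f0f0fULL);
    return val;
}

/* Per-word count: each 32-bit half of the result holds the count for
 * the corresponding half of the input. */
static uint64_t popcnt_words(uint64_t val)
{
    val = popcnt_bytes(val);
    val = (val & 0x00ff00ff00ff00ffULL) + ((val >> 8) & 0x00ff00ff00ff00ffULL);
    val = (val & 0x0000ffff0000ffffULL) + ((val >> 16) & 0x0000ffff0000ffffULL);
    return val;
}

/* Doubleword count: a single count for all 64 bits. */
static uint64_t popcnt_dword(uint64_t val)
{
    val = popcnt_words(val);
    return (val & 0x00000000ffffffffULL) + (val >> 32);
}
```

Each stage adds adjacent fields of twice the previous width, so
stopping after three stages gives per-byte counts, after five gives
per-word counts, and after six gives the full 64-bit count.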

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h       |    2 +
 target-ppc/helper.h    |    3 +-
 target-ppc/op_helper.c |   55 +++++++++++++++++++++++++++++++++++++++++++----
 target-ppc/translate.c |   20 +++++++++++++----
 4 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 9a7495a..f9ad3b8 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1507,6 +1507,8 @@ enum {
     PPC_DCRX           = 0x2000000000000000ULL,
     /* user-mode DCR access, implemented in PowerPC 460                      */
     PPC_DCRUX          = 0x4000000000000000ULL,
+    /* popcntw and popcntd instructions                                      */
+    PPC_POPCNTWD       = 0x8000000000000000ULL,
 };
 
 /*****************************************************************************/
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 4227897..19c5ebe 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -38,10 +38,11 @@ DEF_HELPER_2(mulldo, i64, i64, i64)
 
 DEF_HELPER_FLAGS_1(cntlzw, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntb, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
+DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
 DEF_HELPER_2(sraw, tl, tl, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_1(cntlzd, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
-DEF_HELPER_FLAGS_1(popcntb_64, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
+DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_CONST | TCG_CALL_PURE, tl, tl)
 DEF_HELPER_2(srad, tl, tl, tl)
 #endif
 
diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
index b9b5ae2..9dd3217 100644
--- a/target-ppc/op_helper.c
+++ b/target-ppc/op_helper.c
@@ -497,6 +497,50 @@ target_ulong helper_srad (target_ulong value, target_ulong shift)
 }
 #endif
 
+#if defined(TARGET_PPC64)
+target_ulong helper_popcntb (target_ulong val)
+{
+    val = (val & 0x5555555555555555ULL) + ((val >>  1) &
+                                           0x5555555555555555ULL);
+    val = (val & 0x3333333333333333ULL) + ((val >>  2) &
+                                           0x3333333333333333ULL);
+    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) &
+                                           0x0f0f0f0f0f0f0f0fULL);
+    return val;
+}
+
+target_ulong helper_popcntw (target_ulong val)
+{
+    val = (val & 0x5555555555555555ULL) + ((val >>  1) &
+                                           0x5555555555555555ULL);
+    val = (val & 0x3333333333333333ULL) + ((val >>  2) &
+                                           0x3333333333333333ULL);
+    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) &
+                                           0x0f0f0f0f0f0f0f0fULL);
+    val = (val & 0x00ff00ff00ff00ffULL) + ((val >>  8) &
+                                           0x00ff00ff00ff00ffULL);
+    val = (val & 0x0000ffff0000ffffULL) + ((val >> 16) &
+                                           0x0000ffff0000ffffULL);
+    return val;
+}
+
+target_ulong helper_popcntd (target_ulong val)
+{
+    val = (val & 0x5555555555555555ULL) + ((val >>  1) &
+                                           0x5555555555555555ULL);
+    val = (val & 0x3333333333333333ULL) + ((val >>  2) &
+                                           0x3333333333333333ULL);
+    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) &
+                                           0x0f0f0f0f0f0f0f0fULL);
+    val = (val & 0x00ff00ff00ff00ffULL) + ((val >>  8) &
+                                           0x00ff00ff00ff00ffULL);
+    val = (val & 0x0000ffff0000ffffULL) + ((val >> 16) &
+                                           0x0000ffff0000ffffULL);
+    val = (val & 0x00000000ffffffffULL) + ((val >> 32) &
+                                           0x00000000ffffffffULL);
+    return val;
+}
+#else
 target_ulong helper_popcntb (target_ulong val)
 {
     val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
@@ -505,12 +549,13 @@ target_ulong helper_popcntb (target_ulong val)
     return val;
 }
 
-#if defined(TARGET_PPC64)
-target_ulong helper_popcntb_64 (target_ulong val)
+target_ulong helper_popcntw (target_ulong val)
 {
-    val = (val & 0x5555555555555555ULL) + ((val >>  1) & 0x5555555555555555ULL);
-    val = (val & 0x3333333333333333ULL) + ((val >>  2) & 0x3333333333333333ULL);
-    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) & 0x0f0f0f0f0f0f0f0fULL);
+    val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
+    val = (val & 0x33333333) + ((val >>  2) & 0x33333333);
+    val = (val & 0x0f0f0f0f) + ((val >>  4) & 0x0f0f0f0f);
+    val = (val & 0x00ff00ff) + ((val >>  8) & 0x00ff00ff);
+    val = (val & 0x0000ffff) + ((val >> 16) & 0x0000ffff);
     return val;
 }
 #endif
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 2b1a851..5c28ac3 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1483,13 +1483,21 @@ static void gen_xoris(DisasContext *ctx)
 /* popcntb : PowerPC 2.03 specification */
 static void gen_popcntb(DisasContext *ctx)
 {
+    gen_helper_popcntb(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+}
+
+static void gen_popcntw(DisasContext *ctx)
+{
+    gen_helper_popcntw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+}
+
 #if defined(TARGET_PPC64)
-    if (ctx->sf_mode)
-        gen_helper_popcntb_64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
-    else
-#endif
-        gen_helper_popcntb(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+/* popcntd: PowerPC 2.06 specification */
+static void gen_popcntd(DisasContext *ctx)
+{
+    gen_helper_popcntd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
 }
+#endif
 
 #if defined(TARGET_PPC64)
 /* extsw & extsw. */
@@ -8034,7 +8042,9 @@ GEN_HANDLER(oris, 0x19, 0xFF, 0xFF, 0x00000000, PPC_INTEGER),
 GEN_HANDLER(xori, 0x1A, 0xFF, 0xFF, 0x00000000, PPC_INTEGER),
 GEN_HANDLER(xoris, 0x1B, 0xFF, 0xFF, 0x00000000, PPC_INTEGER),
 GEN_HANDLER(popcntb, 0x1F, 0x03, 0x03, 0x0000F801, PPC_POPCNTB),
+GEN_HANDLER(popcntw, 0x1F, 0x1A, 0x0b, 0x0000F801, PPC_POPCNTWD),
 #if defined(TARGET_PPC64)
+GEN_HANDLER(popcntd, 0x1F, 0x1A, 0x0F, 0x0000F801, PPC_POPCNTWD),
 GEN_HANDLER(cntlzd, 0x1F, 0x1A, 0x01, 0x00000000, PPC_64B),
 #endif
 GEN_HANDLER(rlwimi, 0x14, 0xFF, 0xFF, 0x00000000, PPC_INTEGER),
-- 
1.7.1


* [Qemu-devel] [PATCH 08/15] Clean up slb_lookup() function
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (6 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 07/15] Correct ppc popcntb logic, implement popcntw and popcntd David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:30   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time David Gibson
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC (permalink / raw
  To: qemu-devel; +Cc: paulus, agraf, anton

The slb_lookup() function, used in the ppc translation path, returns a
number of slb entry fields in reference parameters.  However, only one
of the two callers of slb_lookup() actually wants this information.

This patch, therefore, makes slb_lookup() return a simple pointer to the
located SLB entry (or NULL), and the caller which needs the fields can
extract them itself.
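
The shape of the new interface can be sketched as below.  This is a
simplified stand-alone illustration, not the qemu code: sketch_slb_t and
sketch_slb_lookup() are hypothetical names, and the real function takes
the CPU state and derives the ESID from the effective address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down SLB entry with only the two fields used here. */
typedef struct {
    uint64_t esid;
    uint64_t vsid;
} sketch_slb_t;

/* Return a pointer to the matching entry, or NULL if there is none.
 * Callers that need vsid or attribute bits read them from the entry. */
static sketch_slb_t *sketch_slb_lookup(sketch_slb_t *slb, size_t nr,
                                       uint64_t esid)
{
    size_t n;

    assert(slb != NULL);
    for (n = 0; n < nr; n++) {
        if (slb[n].esid == esid) {
            return &slb[n];
        }
    }
    return NULL;
}
```

A caller that only needs to know whether an entry exists tests the
pointer against NULL; a caller that needs the fields dereferences it.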

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/helper.c |   45 ++++++++++++++++++---------------------------
 1 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 4830981..73d93ca 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -672,9 +672,7 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
 }
 
 #if defined(TARGET_PPC64)
-static inline int slb_lookup(CPUPPCState *env, target_ulong eaddr,
-                             target_ulong *vsid, target_ulong *page_mask,
-                             int *attr, int *target_page_bits)
+static inline ppc_slb_t *slb_lookup(CPUPPCState *env, target_ulong eaddr)
 {
     uint64_t esid;
     int n;
@@ -689,19 +687,11 @@ static inline int slb_lookup(CPUPPCState *env, target_ulong eaddr,
         LOG_SLB("%s: slot %d %016" PRIx64 " %016"
                     PRIx64 "\n", __func__, n, slb->esid, slb->vsid);
         if (slb->esid == esid) {
-            *vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
-            *page_mask = ~SEGMENT_MASK_256M;
-            *attr = slb->vsid & SLB_VSID_ATTR;
-            if (target_page_bits) {
-                *target_page_bits = (slb->vsid & SLB_VSID_L)
-                    ? TARGET_PAGE_BITS_16M
-                    : TARGET_PAGE_BITS;
-            }
-            return n;
+            return slb;
         }
     }
 
-    return -5;
+    return NULL;
 }
 
 void ppc_slb_invalidate_all (CPUPPCState *env)
@@ -728,18 +718,13 @@ void ppc_slb_invalidate_all (CPUPPCState *env)
 
 void ppc_slb_invalidate_one (CPUPPCState *env, uint64_t T0)
 {
-    target_ulong vsid, page_mask;
-    int attr;
-    int n;
     ppc_slb_t *slb;
 
-    n = slb_lookup(env, T0, &vsid, &page_mask, &attr, NULL);
-    if (n < 0) {
+    slb = slb_lookup(env, T0);
+    if (!slb) {
         return;
     }
 
-    slb = &env->slb[n];
-
     if (slb->esid & SLB_ESID_V) {
         slb->esid &= ~SLB_ESID_V;
 
@@ -818,16 +803,22 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
     pr = msr_pr;
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64) {
-        int attr;
+        ppc_slb_t *slb;
 
         LOG_MMU("Check SLBs\n");
-        ret = slb_lookup(env, eaddr, &vsid, &page_mask, &attr,
-                         &target_page_bits);
-        if (ret < 0)
-            return ret;
-        ctx->key = !!(pr ? (attr & SLB_VSID_KP) : (attr & SLB_VSID_KS));
+        slb = slb_lookup(env, eaddr);
+        if (!slb) {
+            return -5;
+        }
+
+        vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
+        page_mask = ~SEGMENT_MASK_256M;
+        target_page_bits = (slb->vsid & SLB_VSID_L)
+            ? TARGET_PAGE_BITS_16M : TARGET_PAGE_BITS;
+        ctx->key = !!(pr ? (slb->vsid & SLB_VSID_KP)
+                      : (slb->vsid & SLB_VSID_KS));
         ds = 0;
-        ctx->nx = !!(attr & SLB_VSID_N);
+        ctx->nx = !!(slb->vsid & SLB_VSID_N);
         ctx->eaddr = eaddr;
         vsid_mask = 0x00003FFFFFFFFF80ULL;
         vsid_sh = 7;
-- 
1.7.1


* [Qemu-devel] [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (7 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 08/15] Clean up slb_lookup() function David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:37   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 10/15] Use "hash" more consistently in ppc mmu code David Gibson
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC (permalink / raw
  To: qemu-devel; +Cc: paulus, agraf, anton

On ppc machines with hash table MMUs, the special purpose register SDR1
contains both the base address and the encoded size of the hashed page
table.

At present, we interpret the SDR1 value within the address translation
path.  But because the encodings of the size for 32-bit and 64-bit are
different, this makes for a confusing branch on the MMU type with a
bunch of curly shifts and masks in the middle of the translate path.

This patch cleans things up by moving the interpretation of SDR1 into the
helper function handling the write to the register.  This leaves a simple
pre-sanitized base address and mask for the hash table in the CPUState
structure which is easier to work with in the translation path.

This makes the translation path more readable.  It addresses the FIXME
comment currently in the mtsdr1 helper, by validating the SDR1 value during
interpretation.  Finally, it opens the way for emulating a pSeries-style
partition where the hash table used for translation is not mapped into
the guest's RAM.
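
For illustration, the decoding done at write time can be sketched as the
stand-alone C below.  The SDR_* masks and the arithmetic mirror what this
patch adds; the sketch_* names and the struct are hypothetical, and the
64-bit variant silently clamps an oversized HTABSIZE where the patch also
prints a warning.

```c
#include <assert.h>
#include <stdint.h>

#define SDR_HTABORG_32  0xFFFF0000UL
#define SDR_HTABMASK    0x000001FFUL
#define SDR_HTABORG_64  0xFFFFFFFFFFFC0000ULL
#define SDR_HTABSIZE    0x000000000000001FULL

/* Hypothetical holder for the pre-digested hash table location. */
typedef struct {
    uint64_t base;
    uint64_t mask;
} sketch_htab_t;

/* 64-bit encoding: the low 5 bits give log2(table size) - 18. */
static sketch_htab_t sketch_parse_sdr1_64(uint64_t sdr1)
{
    sketch_htab_t h;
    uint64_t htabsize = sdr1 & SDR_HTABSIZE;

    if (htabsize > 28) {
        htabsize = 28;          /* clamp invalid sizes */
    }
    h.mask = (1ULL << (htabsize + 18)) - 1;
    h.base = sdr1 & SDR_HTABORG_64;
    return h;
}

/* 32-bit encoding: the low 9 bits are a mask extending a 64kB table. */
static sketch_htab_t sketch_parse_sdr1_32(uint32_t sdr1)
{
    sketch_htab_t h;

    h.mask = ((uint64_t)(sdr1 & SDR_HTABMASK) << 16) | 0xFFFF;
    h.base = sdr1 & SDR_HTABORG_32;
    return h;
}

/* Usage example with made-up SDR1 values. */
static void sketch_sdr1_selfcheck(void)
{
    sketch_htab_t h = sketch_parse_sdr1_64(0x0000000001000003ULL);

    assert(h.base == 0x1000000ULL);
    assert(h.mask == 0x1FFFFFULL);      /* 2^(18+3) - 1 */
}
```

With both encodings reduced to a base and a mask, the translation path
no longer needs to know which MMU variant produced them.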

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 monitor.c                   |    2 +-
 target-ppc/cpu.h            |   11 +++++-
 target-ppc/helper.c         |   79 ++++++++++++++++++++++++-------------------
 target-ppc/machine.c        |    6 ++-
 target-ppc/translate.c      |    2 +-
 target-ppc/translate_init.c |    7 +---
 6 files changed, 61 insertions(+), 46 deletions(-)

diff --git a/monitor.c b/monitor.c
index 7fc311d..3f77ffc 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3457,7 +3457,7 @@ static const MonitorDef monitor_defs[] = {
     { "asr", offsetof(CPUState, asr) },
 #endif
     /* Segment registers */
-    { "sdr1", offsetof(CPUState, sdr1) },
+    { "sdr1", offsetof(CPUState, spr[SPR_SDR1]) },
     { "sr0", offsetof(CPUState, sr[0]) },
     { "sr1", offsetof(CPUState, sr[1]) },
     { "sr2", offsetof(CPUState, sr[2]) },
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index f9ad3b8..4d30352 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -359,6 +359,14 @@ union ppc_tlb_t {
 };
 #endif
 
+#define SDR_HTABORG_32         0xFFFF0000UL
+#define SDR_HTABMASK           0x000001FFUL
+
+#if defined(TARGET_PPC64)
+#define SDR_HTABORG_64         0xFFFFFFFFFFFC0000ULL
+#define SDR_HTABSIZE           0x000000000000001FULL
+#endif /* defined(TARGET_PPC64 */
+
 typedef struct ppc_slb_t ppc_slb_t;
 struct ppc_slb_t {
     uint64_t esid;
@@ -642,7 +650,8 @@ struct CPUPPCState {
     int slb_nr;
 #endif
     /* segment registers */
-    target_ulong sdr1;
+    target_phys_addr_t htab_base;
+    target_phys_addr_t htab_mask;
     target_ulong sr[32];
     /* BATs */
     int nb_BATs;
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 73d93ca..dcb336b 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -784,20 +784,19 @@ int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
 #endif /* defined(TARGET_PPC64) */
 
 /* Perform segment based translation */
-static inline target_phys_addr_t get_pgaddr(target_phys_addr_t sdr1,
-                                            int sdr_sh,
-                                            target_phys_addr_t hash,
-                                            target_phys_addr_t mask)
+static inline target_phys_addr_t get_pgaddr(target_phys_addr_t htab_base,
+                                            target_phys_addr_t htab_mask,
+                                            target_phys_addr_t hash)
 {
-    return (sdr1 & ((target_phys_addr_t)(-1ULL) << sdr_sh)) | (hash & mask);
+    return htab_base | (hash & htab_mask);
 }
 
 static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                               target_ulong eaddr, int rw, int type)
 {
-    target_phys_addr_t sdr, hash, mask, sdr_mask, htab_mask;
+    target_phys_addr_t hash;
     target_ulong sr, vsid, vsid_mask, pgidx, page_mask;
-    int ds, vsid_sh, sdr_sh, pr, target_page_bits;
+    int ds, vsid_sh, pr, target_page_bits;
     int ret, ret2;
 
     pr = msr_pr;
@@ -822,8 +821,6 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         ctx->eaddr = eaddr;
         vsid_mask = 0x00003FFFFFFFFF80ULL;
         vsid_sh = 7;
-        sdr_sh = 18;
-        sdr_mask = 0x3FF80;
     } else
 #endif /* defined(TARGET_PPC64) */
     {
@@ -836,8 +833,6 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         vsid = sr & 0x00FFFFFF;
         vsid_mask = 0x01FFFFC0;
         vsid_sh = 6;
-        sdr_sh = 16;
-        sdr_mask = 0xFFC0;
         target_page_bits = TARGET_PAGE_BITS;
         LOG_MMU("Check segment v=" TARGET_FMT_lx " %d " TARGET_FMT_lx " nip="
                 TARGET_FMT_lx " lr=" TARGET_FMT_lx
@@ -853,29 +848,26 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         if (type != ACCESS_CODE || ctx->nx == 0) {
             /* Page address translation */
             /* Primary table address */
-            sdr = env->sdr1;
             pgidx = (eaddr & page_mask) >> target_page_bits;
 #if defined(TARGET_PPC64)
             if (env->mmu_model & POWERPC_MMU_64) {
-                htab_mask = 0x0FFFFFFF >> (28 - (sdr & 0x1F));
                 /* XXX: this is false for 1 TB segments */
                 hash = ((vsid ^ pgidx) << vsid_sh) & vsid_mask;
             } else
 #endif
             {
-                htab_mask = sdr & 0x000001FF;
                 hash = ((vsid ^ pgidx) << vsid_sh) & vsid_mask;
             }
-            mask = (htab_mask << sdr_sh) | sdr_mask;
-            LOG_MMU("sdr " TARGET_FMT_plx " sh %d hash " TARGET_FMT_plx
-                    " mask " TARGET_FMT_plx " " TARGET_FMT_lx "\n",
-                    sdr, sdr_sh, hash, mask, page_mask);
-            ctx->pg_addr[0] = get_pgaddr(sdr, sdr_sh, hash, mask);
+            LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
+                    " hash " TARGET_FMT_plx "\n",
+                    env->htab_base, env->htab_mask, hash);
+            ctx->pg_addr[0] = get_pgaddr(env->htab_base, env->htab_mask, hash);
             /* Secondary table address */
             hash = (~hash) & vsid_mask;
-            LOG_MMU("sdr " TARGET_FMT_plx " sh %d hash " TARGET_FMT_plx
-                    " mask " TARGET_FMT_plx "\n", sdr, sdr_sh, hash, mask);
-            ctx->pg_addr[1] = get_pgaddr(sdr, sdr_sh, hash, mask);
+            LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
+                    " hash " TARGET_FMT_plx "\n",
+                    env->htab_base, env->htab_mask, hash);
+            ctx->pg_addr[1] = get_pgaddr(env->htab_base, env->htab_mask, hash);
 #if defined(TARGET_PPC64)
             if (env->mmu_model & POWERPC_MMU_64) {
                 /* Only 5 bits of the page index are used in the AVPN */
@@ -897,19 +889,21 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                 /* Software TLB search */
                 ret = ppc6xx_tlb_check(env, ctx, eaddr, rw, type);
             } else {
-                LOG_MMU("0 sdr1=" TARGET_FMT_plx " vsid=" TARGET_FMT_lx " "
-                        "api=" TARGET_FMT_lx " hash=" TARGET_FMT_plx
-                        " pg_addr=" TARGET_FMT_plx "\n",
-                        sdr, vsid, pgidx, hash, ctx->pg_addr[0]);
+                LOG_MMU("0 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
+                        " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
+                        " hash=" TARGET_FMT_plx " pg_addr=" TARGET_FMT_plx "\n",
+                        env->htab_base, env->htab_mask, vsid, pgidx, hash,
+                        ctx->pg_addr[0]);
                 /* Primary table lookup */
                 ret = find_pte(env, ctx, 0, rw, type, target_page_bits);
                 if (ret < 0) {
                     /* Secondary table lookup */
                     if (eaddr != 0xEFFFFFFF)
-                        LOG_MMU("1 sdr1=" TARGET_FMT_plx " vsid=" TARGET_FMT_lx " "
-                                "api=" TARGET_FMT_lx " hash=" TARGET_FMT_plx
-                                " pg_addr=" TARGET_FMT_plx "\n", sdr, vsid,
-                                pgidx, hash, ctx->pg_addr[1]);
+                        LOG_MMU("1 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
+                                " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
+                                " hash=" TARGET_FMT_plx " pg_addr=" TARGET_FMT_plx "\n",
+                                env->htab_base, env->htab_mask, vsid, pgidx, hash,
+                                ctx->pg_addr[1]);
                     ret2 = find_pte(env, ctx, 1, rw, type,
                                     target_page_bits);
                     if (ret2 != -1)
@@ -1915,11 +1909,26 @@ void ppc_store_asr (CPUPPCState *env, target_ulong value)
 void ppc_store_sdr1 (CPUPPCState *env, target_ulong value)
 {
     LOG_MMU("%s: " TARGET_FMT_lx "\n", __func__, value);
-    if (env->sdr1 != value) {
-        /* XXX: for PowerPC 64, should check that the HTABSIZE value
-         *      is <= 28
-         */
-        env->sdr1 = value;
+    if (env->spr[SPR_SDR1] != value) {
+        env->spr[SPR_SDR1] = value;
+#if defined(TARGET_PPC64)
+        if (env->mmu_model & POWERPC_MMU_64) {
+            target_ulong htabsize = value & SDR_HTABSIZE;
+
+            if (htabsize > 28) {
+                fprintf(stderr, "Invalid HTABSIZE 0x" TARGET_FMT_lx
+                        " stored in SDR1\n", htabsize);
+                htabsize = 28;
+            }
+            env->htab_mask = (1ULL << (htabsize + 18)) - 1;
+            env->htab_base = value & SDR_HTABORG_64;
+        } else
+#endif /* defined(TARGET_PPC64) */
+        {
+            /* FIXME: Should check for valid HTABMASK values */
+            env->htab_mask = ((value & SDR_HTABMASK) << 16) | 0xFFFF;
+            env->htab_base = value & SDR_HTABORG_32;
+        }
         tlb_flush(env, 1);
     }
 }
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 67de951..0c1986e 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -37,7 +37,7 @@ void cpu_save(QEMUFile *f, void *opaque)
     qemu_put_betls(f, &env->asr);
     qemu_put_sbe32s(f, &env->slb_nr);
 #endif
-    qemu_put_betls(f, &env->sdr1);
+    qemu_put_betls(f, &env->spr[SPR_SDR1]);
     for (i = 0; i < 32; i++)
         qemu_put_betls(f, &env->sr[i]);
     for (i = 0; i < 2; i++)
@@ -93,6 +93,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 {
     CPUState *env = (CPUState *)opaque;
     unsigned int i, j;
+    target_ulong sdr1;
 
     for (i = 0; i < 32; i++)
         qemu_get_betls(f, &env->gpr[i]);
@@ -124,7 +125,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     qemu_get_betls(f, &env->asr);
     qemu_get_sbe32s(f, &env->slb_nr);
 #endif
-    qemu_get_betls(f, &env->sdr1);
+    qemu_get_betls(f, &sdr1);
     for (i = 0; i < 32; i++)
         qemu_get_betls(f, &env->sr[i]);
     for (i = 0; i < 2; i++)
@@ -152,6 +153,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 #endif
     for (i = 0; i < 1024; i++)
         qemu_get_betls(f, &env->spr[i]);
+    ppc_store_sdr1(env, sdr1);
     qemu_get_be32s(f, &env->vscr);
     qemu_get_be64s(f, &env->spe_acc);
     qemu_get_be32s(f, &env->spe_fscr);
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 5c28ac3..561b756 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -8930,7 +8930,7 @@ void cpu_dump_state (CPUState *env, FILE *f, fprintf_function cpu_fprintf,
 #if !defined(CONFIG_USER_ONLY)
     cpu_fprintf(f, "SRR0 " TARGET_FMT_lx " SRR1 " TARGET_FMT_lx " SDR1 "
                 TARGET_FMT_lx "\n", env->spr[SPR_SRR0], env->spr[SPR_SRR1],
-                env->sdr1);
+                env->spr[SPR_SDR1]);
 #endif
 
 #undef RGPL
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index c842330..c84581e 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -299,11 +299,6 @@ static void spr_write_dbatl_h (void *opaque, int sprn, int gprn)
 }
 
 /* SDR1 */
-static void spr_read_sdr1 (void *opaque, int gprn, int sprn)
-{
-    tcg_gen_ld_tl(cpu_gpr[gprn], cpu_env, offsetof(CPUState, sdr1));
-}
-
 static void spr_write_sdr1 (void *opaque, int sprn, int gprn)
 {
     gen_helper_store_sdr1(cpu_gpr[gprn]);
@@ -627,7 +622,7 @@ static void gen_spr_ne_601 (CPUPPCState *env)
     /* Memory management */
     spr_register(env, SPR_SDR1, "SDR1",
                  SPR_NOACCESS, SPR_NOACCESS,
-                 &spr_read_sdr1, &spr_write_sdr1,
+                 &spr_read_generic, &spr_write_sdr1,
                  0x00000000);
 }
 
-- 
1.7.1


* [Qemu-devel] [PATCH 10/15] Use "hash" more consistently in ppc mmu code
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (8 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:47   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 11/15] Better factor the ppc hash translation path David Gibson
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC (permalink / raw
  To: qemu-devel; +Cc: paulus, agraf, anton

Currently, get_segment() has a variable called hash.  However, it doesn't
(quite) get the hash value for the ppc hashed page table.  Instead it
gets the hash shifted - effectively the offset of the hash bucket within
the hash page table.

As well as being different from the normal use of plain "hash" in the
architecture documentation, this usage necessitates some awkward 32/64
dependent masks and shifts which clutter up the path in get_segment().

This patch alters the code to use raw hash values through get_segment()
including storing raw hashes instead of pte group offsets in the ctx
structure.  This cleans up the path noticeably.

This does necessitate 32/64 dependent shifts when the hash values are
taken out of the ctx structure and used, but those paths already have
32/64 bit variants so this is less awkward than it was in get_segment().
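
As a rough stand-alone sketch of the idea (hypothetical sketch_* names,
not the qemu functions): the raw primary hash is vsid ^ pgidx, the
secondary hash is its complement, and the PTE group address is derived
only at the point of use, from the hash, the PTE size, and the
pre-parsed table base and mask.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_PTE_SIZE_32  8
#define HASH_PTE_SIZE_64  16

/* Raw hash as the architecture documents use the term. */
static uint64_t sketch_primary_hash(uint64_t vsid, uint64_t pgidx)
{
    return vsid ^ pgidx;
}

/* A PTE group holds 8 PTEs, so the byte offset is hash * pte_size * 8,
 * wrapped into the table by the mask. */
static uint64_t sketch_pteg_addr(uint64_t htab_base, uint64_t htab_mask,
                                 uint64_t hash, unsigned pte_size)
{
    return htab_base + ((hash * pte_size * 8) & htab_mask);
}

/* Usage example with made-up table parameters. */
static void sketch_pteg_selfcheck(void)
{
    uint64_t hash = sketch_primary_hash(0x4, 0x1);      /* 0x5 */

    assert(sketch_pteg_addr(0x1000000, 0x1FFFFF, hash,
                            HASH_PTE_SIZE_64) == 0x1000280);
    assert(sketch_pteg_addr(0x1000000, 0x1FFFFF, ~hash,
                            HASH_PTE_SIZE_64) == 0x11FFD00);
}
```

The only 32/64 difference left in this sketch is the PTE size passed in
by the caller.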

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h    |    5 ++-
 target-ppc/helper.c |   95 +++++++++++++++++++++++---------------------------
 2 files changed, 48 insertions(+), 52 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 4d30352..f69bcd2 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -359,6 +359,9 @@ union ppc_tlb_t {
 };
 #endif
 
+#define HASH_PTE_SIZE_32       8
+#define HASH_PTE_SIZE_64       16
+
 #define SDR_HTABORG_32         0xFFFF0000UL
 #define SDR_HTABMASK           0x000001FFUL
 
@@ -746,7 +749,7 @@ struct mmu_ctx_t {
     target_phys_addr_t raddr;      /* Real address              */
     target_phys_addr_t eaddr;      /* Effective address         */
     int prot;                      /* Protection bits           */
-    target_phys_addr_t pg_addr[2]; /* PTE tables base addresses */
+    target_phys_addr_t hash[2];    /* Pagetable hash values     */
     target_ulong ptem;             /* Virtual segment ID | API  */
     int key;                       /* Access key                */
     int nx;                        /* Non-execute area          */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index dcb336b..d83738b 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -563,21 +563,30 @@ static inline int get_bat(CPUState *env, mmu_ctx_t *ctx, target_ulong virtual,
     return ret;
 }
 
+static inline target_phys_addr_t get_pteg_addr(CPUState *env,
+                                               target_phys_addr_t hash,
+                                               int pte_size)
+{
+    return env->htab_base + ((hash * pte_size * 8) & env->htab_mask);
+}
+
 /* PTE table lookup */
-static inline int _find_pte(mmu_ctx_t *ctx, int is_64b, int h, int rw,
-                            int type, int target_page_bits)
+static inline int _find_pte(CPUState *env, mmu_ctx_t *ctx, int is_64b, int h,
+                            int rw, int type, int target_page_bits)
 {
-    target_ulong base, pte0, pte1;
+    target_phys_addr_t pteg_base;
+    target_ulong pte0, pte1;
     int i, good = -1;
     int ret, r;
 
     ret = -1; /* No entry found */
-    base = ctx->pg_addr[h];
+    pteg_base = get_pteg_addr(env, ctx->hash[h],
+                              is_64b ? HASH_PTE_SIZE_64 : HASH_PTE_SIZE_32);
     for (i = 0; i < 8; i++) {
 #if defined(TARGET_PPC64)
         if (is_64b) {
-            pte0 = ldq_phys(base + (i * 16));
-            pte1 = ldq_phys(base + (i * 16) + 8);
+            pte0 = ldq_phys(pteg_base + (i * 16));
+            pte1 = ldq_phys(pteg_base + (i * 16) + 8);
 
             /* We have a TLB that saves 4K pages, so let's
              * split a huge page to 4k chunks */
@@ -588,17 +597,17 @@ static inline int _find_pte(mmu_ctx_t *ctx, int is_64b, int h, int rw,
             r = pte64_check(ctx, pte0, pte1, h, rw, type);
             LOG_MMU("Load pte from " TARGET_FMT_lx " => " TARGET_FMT_lx " "
                     TARGET_FMT_lx " %d %d %d " TARGET_FMT_lx "\n",
-                    base + (i * 16), pte0, pte1, (int)(pte0 & 1), h,
+                    pteg_base + (i * 16), pte0, pte1, (int)(pte0 & 1), h,
                     (int)((pte0 >> 1) & 1), ctx->ptem);
         } else
 #endif
         {
-            pte0 = ldl_phys(base + (i * 8));
-            pte1 =  ldl_phys(base + (i * 8) + 4);
+            pte0 = ldl_phys(pteg_base + (i * 8));
+            pte1 =  ldl_phys(pteg_base + (i * 8) + 4);
             r = pte32_check(ctx, pte0, pte1, h, rw, type);
             LOG_MMU("Load pte from " TARGET_FMT_lx " => " TARGET_FMT_lx " "
                     TARGET_FMT_lx " %d %d %d " TARGET_FMT_lx "\n",
-                    base + (i * 8), pte0, pte1, (int)(pte0 >> 31), h,
+                    pteg_base + (i * 8), pte0, pte1, (int)(pte0 >> 31), h,
                     (int)((pte0 >> 6) & 1), ctx->ptem);
         }
         switch (r) {
@@ -634,11 +643,11 @@ static inline int _find_pte(mmu_ctx_t *ctx, int is_64b, int h, int rw,
         if (pte_update_flags(ctx, &pte1, ret, rw) == 1) {
 #if defined(TARGET_PPC64)
             if (is_64b) {
-                stq_phys_notdirty(base + (good * 16) + 8, pte1);
+                stq_phys_notdirty(pteg_base + (good * 16) + 8, pte1);
             } else
 #endif
             {
-                stl_phys_notdirty(base + (good * 8) + 4, pte1);
+                stl_phys_notdirty(pteg_base + (good * 8) + 4, pte1);
             }
         }
     }
@@ -646,17 +655,17 @@ static inline int _find_pte(mmu_ctx_t *ctx, int is_64b, int h, int rw,
     return ret;
 }
 
-static inline int find_pte32(mmu_ctx_t *ctx, int h, int rw, int type,
-                             int target_page_bits)
+static inline int find_pte32(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
+                             int type, int target_page_bits)
 {
-    return _find_pte(ctx, 0, h, rw, type, target_page_bits);
+    return _find_pte(env, ctx, 0, h, rw, type, target_page_bits);
 }
 
 #if defined(TARGET_PPC64)
-static inline int find_pte64(mmu_ctx_t *ctx, int h, int rw, int type,
-                             int target_page_bits)
+static inline int find_pte64(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
+                             int type, int target_page_bits)
 {
-    return _find_pte(ctx, 1, h, rw, type, target_page_bits);
+    return _find_pte(env, ctx, 1, h, rw, type, target_page_bits);
 }
 #endif
 
@@ -665,10 +674,10 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
 {
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64)
-        return find_pte64(ctx, h, rw, type, target_page_bits);
+        return find_pte64(env, ctx, h, rw, type, target_page_bits);
 #endif
 
-    return find_pte32(ctx, h, rw, type, target_page_bits);
+    return find_pte32(env, ctx, h, rw, type, target_page_bits);
 }
 
 #if defined(TARGET_PPC64)
@@ -784,19 +793,12 @@ int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
 #endif /* defined(TARGET_PPC64) */
 
 /* Perform segment based translation */
-static inline target_phys_addr_t get_pgaddr(target_phys_addr_t htab_base,
-                                            target_phys_addr_t htab_mask,
-                                            target_phys_addr_t hash)
-{
-    return htab_base | (hash & htab_mask);
-}
-
 static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                               target_ulong eaddr, int rw, int type)
 {
     target_phys_addr_t hash;
-    target_ulong sr, vsid, vsid_mask, pgidx, page_mask;
-    int ds, vsid_sh, pr, target_page_bits;
+    target_ulong sr, vsid, pgidx, page_mask;
+    int ds, pr, target_page_bits;
     int ret, ret2;
 
     pr = msr_pr;
@@ -819,8 +821,6 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         ds = 0;
         ctx->nx = !!(slb->vsid & SLB_VSID_N);
         ctx->eaddr = eaddr;
-        vsid_mask = 0x00003FFFFFFFFF80ULL;
-        vsid_sh = 7;
     } else
 #endif /* defined(TARGET_PPC64) */
     {
@@ -831,8 +831,6 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         ds = sr & 0x80000000 ? 1 : 0;
         ctx->nx = sr & 0x10000000 ? 1 : 0;
         vsid = sr & 0x00FFFFFF;
-        vsid_mask = 0x01FFFFC0;
-        vsid_sh = 6;
         target_page_bits = TARGET_PAGE_BITS;
         LOG_MMU("Check segment v=" TARGET_FMT_lx " %d " TARGET_FMT_lx " nip="
                 TARGET_FMT_lx " lr=" TARGET_FMT_lx
@@ -847,27 +845,22 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         /* Check if instruction fetch is allowed, if needed */
         if (type != ACCESS_CODE || ctx->nx == 0) {
             /* Page address translation */
-            /* Primary table address */
             pgidx = (eaddr & page_mask) >> target_page_bits;
 #if defined(TARGET_PPC64)
             if (env->mmu_model & POWERPC_MMU_64) {
                 /* XXX: this is false for 1 TB segments */
-                hash = ((vsid ^ pgidx) << vsid_sh) & vsid_mask;
+                hash = vsid ^ pgidx;
             } else
 #endif
             {
-                hash = ((vsid ^ pgidx) << vsid_sh) & vsid_mask;
+                hash = vsid ^ pgidx;
             }
             LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
                     " hash " TARGET_FMT_plx "\n",
                     env->htab_base, env->htab_mask, hash);
-            ctx->pg_addr[0] = get_pgaddr(env->htab_base, env->htab_mask, hash);
-            /* Secondary table address */
-            hash = (~hash) & vsid_mask;
-            LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
-                    " hash " TARGET_FMT_plx "\n",
-                    env->htab_base, env->htab_mask, hash);
-            ctx->pg_addr[1] = get_pgaddr(env->htab_base, env->htab_mask, hash);
+            ctx->hash[0] = hash;
+            ctx->hash[1] = ~hash;
+
 #if defined(TARGET_PPC64)
             if (env->mmu_model & POWERPC_MMU_64) {
                 /* Only 5 bits of the page index are used in the AVPN */
@@ -891,9 +884,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
             } else {
                 LOG_MMU("0 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
                         " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
-                        " hash=" TARGET_FMT_plx " pg_addr=" TARGET_FMT_plx "\n",
-                        env->htab_base, env->htab_mask, vsid, pgidx, hash,
-                        ctx->pg_addr[0]);
+                        " hash=" TARGET_FMT_plx "\n",
+                        env->htab_base, env->htab_mask, vsid, pgidx,
+                        ctx->hash[0]);
                 /* Primary table lookup */
                 ret = find_pte(env, ctx, 0, rw, type, target_page_bits);
                 if (ret < 0) {
@@ -901,9 +894,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                     if (eaddr != 0xEFFFFFFF)
                         LOG_MMU("1 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
                                 " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
-                                " hash=" TARGET_FMT_plx " pg_addr=" TARGET_FMT_plx "\n",
-                                env->htab_base, env->htab_mask, vsid, pgidx, hash,
-                                ctx->pg_addr[1]);
+                                " hash=" TARGET_FMT_plx "\n",
+                                env->htab_base, env->htab_mask, vsid, pgidx,
+                                ctx->hash[1]);
                     ret2 = find_pte(env, ctx, 1, rw, type,
                                     target_page_bits);
                     if (ret2 != -1)
@@ -1455,8 +1448,8 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
                     env->spr[SPR_DCMP] = 0x80000000 | ctx.ptem;
                 tlb_miss:
                     env->error_code |= ctx.key << 19;
-                    env->spr[SPR_HASH1] = ctx.pg_addr[0];
-                    env->spr[SPR_HASH2] = ctx.pg_addr[1];
+                    env->spr[SPR_HASH1] = get_pteg_addr(env, ctx.hash[0], HASH_PTE_SIZE_32);
+                    env->spr[SPR_HASH2] = get_pteg_addr(env, ctx.hash[1], HASH_PTE_SIZE_32);
                     break;
                 case POWERPC_MMU_SOFT_74xx:
                     if (rw == 1) {
-- 
1.7.1


* [Qemu-devel] [PATCH 11/15] Better factor the ppc hash translation path
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (9 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 10/15] Use "hash" more consistently in ppc mmu code David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:52   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 12/15] Support 1T segments on ppc David Gibson
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

Currently, the path handling hash page table translation in get_segment()
mixes common code with 32-bit and 64-bit specific code.  However, the
split is not done very cleanly, which results in messy code that flips
back and forth between common and divided paths.

This patch improves the organization, consolidating several divided paths
into one.  This in turn allows simplification of some code in
get_segment(), removing a number of ugly interim variables.

This new factorization will also make it easier to add support for the 1T
segments added in newer CPUs.
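
As a rough illustration of the factoring described above, the following
standalone C sketch computes the hash and the PTE match value for the
32-bit and the 64-bit (256MB segment) cases side by side.  The constants
and expressions are copied from this patch; the hash_key struct and the
key32() / key64_256m() helpers are hypothetical names used only for this
example and do not exist in qemu.

```c
#include <assert.h>
#include <stdint.h>

#define SLB_VSID_SHIFT  12
#define SLB_VSID_B      0xc000000000000000ULL
#define SLB_VSID_VSID   0x3FFFFFFFFFFFF000ULL
#define SLB_VSID_PTEM   (SLB_VSID_B | SLB_VSID_VSID)

/* Hypothetical container for the two values get_segment() derives */
struct hash_key {
    uint64_t hash;  /* primary hash; the secondary hash is ~hash */
    uint64_t ptem;  /* value compared against the PTE's first word */
};

/* 32-bit hash MMU: the VSID comes from a segment register */
static inline struct hash_key key32(uint32_t sr, uint32_t eaddr,
                                    int page_bits)
{
    struct hash_key k;
    uint64_t vsid = sr & 0x00FFFFFF;
    /* Offset within the 256MB segment, in units of pages */
    uint64_t pgidx = (eaddr & 0x0FFFFFFF) >> page_bits;

    k.hash = vsid ^ pgidx;
    k.ptem = (vsid << 7) | (pgidx >> 10);
    return k;
}

/* 64-bit hash MMU, 256MB segment: the VSID comes from an SLB entry */
static inline struct hash_key key64_256m(uint64_t slb_vsid, uint64_t eaddr,
                                         int page_bits)
{
    struct hash_key k;
    uint64_t vsid, pageaddr;

    assert(page_bits >= 12 && page_bits < 28);
    vsid = (slb_vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
    /* Keep the bits between the page offset and the segment boundary */
    pageaddr = eaddr & ((1ULL << 28) - (1ULL << page_bits));

    k.hash = vsid ^ (pageaddr >> page_bits);
    /* Only 5 bits of the page index are used in the AVPN */
    k.ptem = (slb_vsid & SLB_VSID_PTEM) | ((pageaddr >> 16) & 0x0F80);
    return k;
}
```

With a VSID of 0x123, an effective address of 0x1FA45678 and 4kB pages,
both helpers produce the same hash (0xFB66); only the layout of ptem
differs between the two MMU families.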

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 target-ppc/cpu.h    |    1 +
 target-ppc/helper.c |   68 +++++++++++++++------------------------------------
 2 files changed, 21 insertions(+), 48 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index f69bcd2..3df6758 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -386,6 +386,7 @@ struct ppc_slb_t {
 #define SLB_VSID_B              0xc000000000000000ULL
 #define SLB_VSID_B_256M         0x0000000000000000ULL
 #define SLB_VSID_VSID           0x3FFFFFFFFFFFF000ULL
+#define SLB_VSID_PTEM           (SLB_VSID_B | SLB_VSID_VSID)
 #define SLB_VSID_KS             0x0000000000000800ULL
 #define SLB_VSID_KP             0x0000000000000400ULL
 #define SLB_VSID_N              0x0000000000000200ULL /* no-execute */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index d83738b..6a1127f 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -655,29 +655,15 @@ static inline int _find_pte(CPUState *env, mmu_ctx_t *ctx, int is_64b, int h,
     return ret;
 }
 
-static inline int find_pte32(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
-                             int type, int target_page_bits)
-{
-    return _find_pte(env, ctx, 0, h, rw, type, target_page_bits);
-}
-
-#if defined(TARGET_PPC64)
-static inline int find_pte64(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
-                             int type, int target_page_bits)
-{
-    return _find_pte(env, ctx, 1, h, rw, type, target_page_bits);
-}
-#endif
-
 static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
                            int type, int target_page_bits)
 {
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64)
-        return find_pte64(env, ctx, h, rw, type, target_page_bits);
+        return _find_pte(env, ctx, 1, h, rw, type, target_page_bits);
 #endif
 
-    return find_pte32(env, ctx, h, rw, type, target_page_bits);
+    return _find_pte(env, ctx, 0, h, rw, type, target_page_bits);
 }
 
 #if defined(TARGET_PPC64)
@@ -797,14 +783,16 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                               target_ulong eaddr, int rw, int type)
 {
     target_phys_addr_t hash;
-    target_ulong sr, vsid, pgidx, page_mask;
+    target_ulong vsid;
     int ds, pr, target_page_bits;
     int ret, ret2;
 
     pr = msr_pr;
+    ctx->eaddr = eaddr;
 #if defined(TARGET_PPC64)
     if (env->mmu_model & POWERPC_MMU_64) {
         ppc_slb_t *slb;
+        target_ulong pageaddr;
 
         LOG_MMU("Check SLBs\n");
         slb = slb_lookup(env, eaddr);
@@ -813,19 +801,24 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         }
 
         vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
-        page_mask = ~SEGMENT_MASK_256M;
         target_page_bits = (slb->vsid & SLB_VSID_L)
             ? TARGET_PAGE_BITS_16M : TARGET_PAGE_BITS;
         ctx->key = !!(pr ? (slb->vsid & SLB_VSID_KP)
                       : (slb->vsid & SLB_VSID_KS));
         ds = 0;
         ctx->nx = !!(slb->vsid & SLB_VSID_N);
-        ctx->eaddr = eaddr;
+
+        pageaddr = eaddr & ((1ULL << 28) - (1ULL << target_page_bits));
+        /* XXX: this is false for 1 TB segments */
+        hash = vsid ^ (pageaddr >> target_page_bits);
+        /* Only 5 bits of the page index are used in the AVPN */
+        ctx->ptem = (slb->vsid & SLB_VSID_PTEM) | ((pageaddr >> 16) & 0x0F80);
     } else
 #endif /* defined(TARGET_PPC64) */
     {
+        target_ulong sr, pgidx;
+
         sr = env->sr[eaddr >> 28];
-        page_mask = 0x0FFFFFFF;
         ctx->key = (((sr & 0x20000000) && (pr != 0)) ||
                     ((sr & 0x40000000) && (pr == 0))) ? 1 : 0;
         ds = sr & 0x80000000 ? 1 : 0;
@@ -837,6 +830,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                 " ir=%d dr=%d pr=%d %d t=%d\n",
                 eaddr, (int)(eaddr >> 28), sr, env->nip, env->lr, (int)msr_ir,
                 (int)msr_dr, pr != 0 ? 1 : 0, rw, type);
+        pgidx = (eaddr & ~SEGMENT_MASK_256M) >> target_page_bits;
+        hash = vsid ^ pgidx;
+        ctx->ptem = (vsid << 7) | (pgidx >> 10);
     }
     LOG_MMU("pte segment: key=%d ds %d nx %d vsid " TARGET_FMT_lx "\n",
             ctx->key, ds, ctx->nx, vsid);
@@ -845,36 +841,12 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         /* Check if instruction fetch is allowed, if needed */
         if (type != ACCESS_CODE || ctx->nx == 0) {
             /* Page address translation */
-            pgidx = (eaddr & page_mask) >> target_page_bits;
-#if defined(TARGET_PPC64)
-            if (env->mmu_model & POWERPC_MMU_64) {
-                /* XXX: this is false for 1 TB segments */
-                hash = vsid ^ pgidx;
-            } else
-#endif
-            {
-                hash = vsid ^ pgidx;
-            }
             LOG_MMU("htab_base " TARGET_FMT_plx " htab_mask " TARGET_FMT_plx
                     " hash " TARGET_FMT_plx "\n",
                     env->htab_base, env->htab_mask, hash);
             ctx->hash[0] = hash;
             ctx->hash[1] = ~hash;
 
-#if defined(TARGET_PPC64)
-            if (env->mmu_model & POWERPC_MMU_64) {
-                /* Only 5 bits of the page index are used in the AVPN */
-                if (target_page_bits > 23) {
-                    ctx->ptem = (vsid << 12) |
-                                ((pgidx << (target_page_bits - 16)) & 0xF80);
-                } else {
-                    ctx->ptem = (vsid << 12) | ((pgidx >> 4) & 0x0F80);
-                }
-            } else
-#endif
-            {
-                ctx->ptem = (vsid << 7) | (pgidx >> 10);
-            }
             /* Initialize real address with an invalid value */
             ctx->raddr = (target_phys_addr_t)-1ULL;
             if (unlikely(env->mmu_model == POWERPC_MMU_SOFT_6xx ||
@@ -883,9 +855,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                 ret = ppc6xx_tlb_check(env, ctx, eaddr, rw, type);
             } else {
                 LOG_MMU("0 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
-                        " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
+                        " vsid=" TARGET_FMT_lx " ptem=" TARGET_FMT_lx
                         " hash=" TARGET_FMT_plx "\n",
-                        env->htab_base, env->htab_mask, vsid, pgidx,
+                        env->htab_base, env->htab_mask, vsid, ctx->ptem,
                         ctx->hash[0]);
                 /* Primary table lookup */
                 ret = find_pte(env, ctx, 0, rw, type, target_page_bits);
@@ -893,9 +865,9 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
                     /* Secondary table lookup */
                     if (eaddr != 0xEFFFFFFF)
                         LOG_MMU("1 htab=" TARGET_FMT_plx "/" TARGET_FMT_plx
-                                " vsid=" TARGET_FMT_lx " api=" TARGET_FMT_lx
+                                " vsid=" TARGET_FMT_lx " ptem=" TARGET_FMT_lx
                                 " hash=" TARGET_FMT_plx "\n",
-                                env->htab_base, env->htab_mask, vsid, pgidx,
+                                env->htab_base, env->htab_mask, vsid, ctx->ptem,
                                 ctx->hash[1]);
                     ret2 = find_pte(env, ctx, 1, rw, type,
                                     target_page_bits);
-- 
1.7.1


* [Qemu-devel] [PATCH 12/15] Support 1T segments on ppc
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (10 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 11/15] Better factor the ppc hash translation path David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 15:57   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 13/15] Add POWER7 support for ppc David Gibson
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

Traditionally, the "segments" used for the two-stage translation used on
powerpc MMUs were 256MB in size.  This was the only option on all hash
page table based 32-bit powerpc cpus, and on the earlier 64-bit hash page
table based cpus.  However, newer 64-bit cpus also permit 1TB segments.

This patch adds support for 1TB segment translation to the qemu code.
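
To make the difference between the two segment sizes concrete, here is a
minimal standalone sketch of the hash computation as this patch performs
it.  The constants and expressions are taken from the diff below;
segment_hash() is a hypothetical helper name for this example only, and
the final masking of the hash with the hash table mask is not shown.

```c
#include <stdint.h>

#define SLB_VSID_SHIFT      12
#define SLB_VSID_SHIFT_1T   24
#define SLB_VSID_B          0xc000000000000000ULL
#define SLB_VSID_B_1T       0x4000000000000000ULL
#define SLB_VSID_VSID       0x3FFFFFFFFFFFF000ULL

/* Primary hash for an effective address, given the SLB entry's VSID word */
static inline uint64_t segment_hash(uint64_t slb_vsid, uint64_t eaddr,
                                    int page_bits)
{
    int is_1t = (slb_vsid & SLB_VSID_B) != 0;
    int segment_bits = is_1t ? 40 : 28;
    uint64_t vsid = (slb_vsid & SLB_VSID_VSID)
        >> (is_1t ? SLB_VSID_SHIFT_1T : SLB_VSID_SHIFT);
    /* Bits between the page offset and the segment boundary */
    uint64_t pageaddr = eaddr & ((1ULL << segment_bits)
                                 - (1ULL << page_bits));
    uint64_t hash = vsid ^ (pageaddr >> page_bits);

    if (is_1t) {
        /* 1TB segments fold a shifted copy of the VSID into the hash */
        hash ^= vsid << 25;
    }
    return hash;
}
```

For a 256MB segment the result is identical to the pre-existing
calculation; only entries with a non-zero B field take the extra term.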

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
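The SLB handling in this patch relies on storing a copy of the B
(segment size) field in the low bits of the cached ESID word, so that
lookup is a plain equality test.  The sketch below models that idea in
isolation.  slb_store_key() and slb_match_key() are hypothetical names
for this example; the SLB_ESID_ESID and SLB_ESID_V values are assumed
from the SLB cleanup earlier in the series and are not shown in this
patch.

```c
#include <stdint.h>

#define SLB_ESID_ESID        0xFFFFFFFFF0000000ULL /* assumed value */
#define SLB_ESID_V           0x0000000008000000ULL /* assumed value */
#define SLB_VSID_SSIZE_SHIFT 62
#define SLB_VSID_B           0xc000000000000000ULL
#define SLB_VSID_B_1T        0x4000000000000000ULL
#define SEGMENT_MASK_256M    (~((1ULL << 28) - 1))
#define SEGMENT_MASK_1T      (~((1ULL << 40) - 1))

/* Validate an slbmte request; on success write the ESID word to cache.
 * Returns 0 on success, -1 if the request is rejected. */
static inline int slb_store_key(uint64_t rb, uint64_t rs, int slb_nr,
                                int has_1t, uint64_t *esid)
{
    if (rb & (uint64_t)(0x1000 - slb_nr)) {
        return -1; /* Reserved bits set or slot too high */
    }
    if (rs & (SLB_VSID_B & ~SLB_VSID_B_1T)) {
        return -1; /* Bad segment size */
    }
    if ((rs & SLB_VSID_B) && !has_1t) {
        return -1; /* 1T segment on MMU that doesn't support it */
    }
    /* Stuff a copy of the B field into the low bits of the ESID word */
    *esid = (rb & (SLB_ESID_ESID | SLB_ESID_V))
        | (rs >> SLB_VSID_SSIZE_SHIFT);
    return 0;
}

/* Value a cached ESID word must equal for eaddr to hit the entry */
static inline uint64_t slb_match_key(uint64_t eaddr, int is_1t)
{
    uint64_t mask = is_1t ? SEGMENT_MASK_1T : SEGMENT_MASK_256M;
    uint64_t b = is_1t ? SLB_VSID_B_1T : 0;

    return (eaddr & mask) | SLB_ESID_V | (b >> SLB_VSID_SSIZE_SHIFT);
}
```

Note that the slot check in this form only behaves as intended when the
number of SLB entries is a power of two.
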
 target-ppc/cpu.h    |    7 ++++++
 target-ppc/helper.c |   58 ++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 3df6758..53b788f 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -114,6 +114,7 @@ enum powerpc_mmu_t {
     POWERPC_MMU_601        = 0x0000000A,
 #if defined(TARGET_PPC64)
 #define POWERPC_MMU_64       0x00010000
+#define POWERPC_MMU_1TSEG    0x00020000
     /* 64 bits PowerPC MMU                                     */
     POWERPC_MMU_64B        = POWERPC_MMU_64 | 0x00000001,
     /* 620 variant (no segment exceptions)                     */
@@ -382,9 +383,11 @@ struct ppc_slb_t {
 
 /* Bits in the SLB VSID word */
 #define SLB_VSID_SHIFT          12
+#define SLB_VSID_SHIFT_1T       24
 #define SLB_VSID_SSIZE_SHIFT    62
 #define SLB_VSID_B              0xc000000000000000ULL
 #define SLB_VSID_B_256M         0x0000000000000000ULL
+#define SLB_VSID_B_1T           0x4000000000000000ULL
 #define SLB_VSID_VSID           0x3FFFFFFFFFFFF000ULL
 #define SLB_VSID_PTEM           (SLB_VSID_B | SLB_VSID_VSID)
 #define SLB_VSID_KS             0x0000000000000800ULL
@@ -398,6 +401,10 @@ struct ppc_slb_t {
 #define SEGMENT_SHIFT_256M      28
 #define SEGMENT_MASK_256M       ~((1ULL << SEGMENT_SHIFT_256M) - 1)
 
+#define SEGMENT_SHIFT_1T        40
+#define SEGMENT_MASK_1T         ~((1ULL << SEGMENT_SHIFT_1T) - 1)
+
+
 /*****************************************************************************/
 /* Machine state register bits definition                                    */
 #define MSR_SF   63 /* Sixty-four-bit mode                            hflags */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 6a1127f..158da09 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -669,19 +669,25 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
 #if defined(TARGET_PPC64)
 static inline ppc_slb_t *slb_lookup(CPUPPCState *env, target_ulong eaddr)
 {
-    uint64_t esid;
+    uint64_t match_256M, match_1T;
     int n;
 
     LOG_SLB("%s: eaddr " TARGET_FMT_lx "\n", __func__, eaddr);
 
-    esid = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;
+    match_256M = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V |
+        (SLB_VSID_B_256M >> SLB_VSID_SSIZE_SHIFT);
+    match_1T = (eaddr & SEGMENT_MASK_1T) | SLB_ESID_V |
+        (SLB_VSID_B_1T >> SLB_VSID_SSIZE_SHIFT);
 
     for (n = 0; n < env->slb_nr; n++) {
         ppc_slb_t *slb = &env->slb[n];
 
         LOG_SLB("%s: slot %d %016" PRIx64 " %016"
                     PRIx64 "\n", __func__, n, slb->esid, slb->vsid);
-        if (slb->esid == esid) {
+        /* We check for 1T matches on all MMUs here - if the MMU
+         * doesn't have 1T segment support, we will have prevented 1T
+         * entries from being inserted in the slbmte code. */
+        if ((slb->esid == match_256M) || (slb->esid == match_1T)) {
             return slb;
         }
     }
@@ -734,16 +740,21 @@ void ppc_slb_invalidate_one (CPUPPCState *env, uint64_t T0)
 int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
 {
     int slot = rb & 0xfff;
-    uint64_t esid = rb & ~0xfff;
     ppc_slb_t *slb = &env->slb[slot];
-
-    if (slot >= env->slb_nr) {
-        return -1;
-    }
-
-    slb->esid = esid;
+
+    if (rb & (0x1000 - env->slb_nr))
+        return -1; /* Reserved bits set or slot too high */
+    if (rs & (SLB_VSID_B & ~SLB_VSID_B_1T))
+        return -1; /* Bad segment size */
+    if ((rs & SLB_VSID_B) && !(env->mmu_model & POWERPC_MMU_1TSEG))
+        return -1; /* 1T segment on MMU that doesn't support it */
+
+    /* We stuff a copy of the B field into slb->esid to simplify
+     * lookup later */
+    slb->esid = (rb & (SLB_ESID_ESID | SLB_ESID_V)) |
+        (rs >> SLB_VSID_SSIZE_SHIFT);
     slb->vsid = rs;
-
+ 
     LOG_SLB("%s: %d " TARGET_FMT_lx " - " TARGET_FMT_lx " => %016" PRIx64
             " %016" PRIx64 "\n", __func__, slot, rb, rs,
             slb->esid, slb->vsid);
@@ -760,7 +771,8 @@ int ppc_load_slb_esid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
         return -1;
     }
 
-    *rt = slb->esid;
+    /* Mask out the extra copy of the B field inserted in store_slb */
+    *rt = slb->esid & ~0x3;
     return 0;
 }
 
@@ -793,6 +805,7 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
     if (env->mmu_model & POWERPC_MMU_64) {
         ppc_slb_t *slb;
         target_ulong pageaddr;
+        int segment_bits;
 
         LOG_MMU("Check SLBs\n");
         slb = slb_lookup(env, eaddr);
@@ -800,7 +813,14 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
             return -5;
         }
 
-        vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
+        if (slb->vsid & SLB_VSID_B) {
+            vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT_1T;
+            segment_bits = 40;
+        } else {
+            vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
+            segment_bits = 28;
+        }
+
         target_page_bits = (slb->vsid & SLB_VSID_L)
             ? TARGET_PAGE_BITS_16M : TARGET_PAGE_BITS;
         ctx->key = !!(pr ? (slb->vsid & SLB_VSID_KP)
@@ -808,11 +828,15 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
         ds = 0;
         ctx->nx = !!(slb->vsid & SLB_VSID_N);
 
-        pageaddr = eaddr & ((1ULL << 28) - (1ULL << target_page_bits));
-        /* XXX: this is false for 1 TB segments */
-        hash = vsid ^ (pageaddr >> target_page_bits);
+        pageaddr = eaddr & ((1ULL << segment_bits)
+                            - (1ULL << target_page_bits));
+        if (slb->vsid & SLB_VSID_B)
+            hash = vsid ^ (vsid << 25) ^ (pageaddr >> target_page_bits);
+        else
+            hash = vsid ^ (pageaddr >> target_page_bits);
         /* Only 5 bits of the page index are used in the AVPN */
-        ctx->ptem = (slb->vsid & SLB_VSID_PTEM) | ((pageaddr >> 16) & 0x0F80);
+        ctx->ptem = (slb->vsid & SLB_VSID_PTEM) | 
+            ((pageaddr >> 16) & ((1ULL << segment_bits) - 0x80));
     } else
 #endif /* defined(TARGET_PPC64) */
     {
-- 
1.7.1


* [Qemu-devel] [PATCH 13/15] Add POWER7 support for ppc
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (11 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 12/15] Support 1T segments on ppc David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 16:09   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 14/15] Start implementing pSeries logical partition machine David Gibson
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC
  To: qemu-devel; +Cc: paulus, agraf, anton

This adds emulation support for the recent POWER7 cpu to qemu.  It's far
from perfect - it's missing a number of POWER7 features so far, including
any support for VSX or decimal floating point instructions.  However, it's
close enough to boot a kernel with the POWER7 PVR.
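
The new MMU model value combines feature bits with a model number, which
is why the 1T segment check from the previous patch works as a simple
bit test while the switch statements in helper.c each need an explicit
new case label.  A small sketch of that encoding follows; the enum
values are copied from the cpu.h hunks in this series, and the two
helper functions are hypothetical names used only for illustration.

```c
#include <stdint.h>

enum {
    POWERPC_MMU_64    = 0x00010000,  /* feature bit: 64-bit hash MMU */
    POWERPC_MMU_1TSEG = 0x00020000,  /* feature bit: 1TB segments    */
    POWERPC_MMU_64B   = POWERPC_MMU_64 | 0x00000001,
    POWERPC_MMU_620   = POWERPC_MMU_64 | 0x00000002,
    POWERPC_MMU_2_06  = POWERPC_MMU_64 | POWERPC_MMU_1TSEG | 0x00000003,
};

/* True for every model built on the 64-bit hash MMU */
static inline int mmu_is_64bit(int model)
{
    return (model & POWERPC_MMU_64) != 0;
}

/* True only for models that may install 1TB SLB entries */
static inline int mmu_has_1t_segments(int model)
{
    return (model & POWERPC_MMU_1TSEG) != 0;
}
```

Feature tests such as these keep working when further models are added,
whereas code that switches on the exact model value has to be extended
by hand, as the helper.c changes in this patch do.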

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/ppc.c                    |   83 ++++++++++++++++++++++++++++++++++
 hw/ppc.h                    |    1 +
 target-ppc/cpu.h            |   19 ++++++++
 target-ppc/helper.c         |    6 +++
 target-ppc/translate_init.c |  103 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/hw/ppc.c b/hw/ppc.c
index 968aec1..6975636 100644
--- a/hw/ppc.c
+++ b/hw/ppc.c
@@ -246,6 +246,89 @@ void ppc970_irq_init (CPUState *env)
     env->irq_inputs = (void **)qemu_allocate_irqs(&ppc970_set_irq, env,
                                                   PPC970_INPUT_NB);
 }
+
+/* POWER7 internal IRQ controller */
+static void power7_set_irq (void *opaque, int pin, int level)
+{
+    CPUState *env = opaque;
+    int cur_level;
+
+    LOG_IRQ("%s: env %p pin %d level %d\n", __func__,
+                env, pin, level);
+    cur_level = (env->irq_input_state >> pin) & 1;
+    /* Don't generate spurious events */
+    if ((cur_level == 1 && level == 0) || (cur_level == 0 && level != 0)) {
+        switch (pin) {
+        case POWER7_INPUT_INT:
+            /* Level sensitive - active high */
+            LOG_IRQ("%s: set the external IRQ state to %d\n",
+                        __func__, level);
+            ppc_set_irq(env, PPC_INTERRUPT_EXT, level);
+            break;
+        case POWER7_INPUT_THINT:
+            /* Level sensitive - active high */
+            LOG_IRQ("%s: set the SMI IRQ state to %d\n", __func__,
+                        level);
+            ppc_set_irq(env, PPC_INTERRUPT_THERM, level);
+            break;
+        case POWER7_INPUT_MCP:
+            /* Negative edge sensitive */
+            /* XXX: TODO: actual reaction may depend on HID0 status
+             *            603/604/740/750: check HID0[EMCP]
+             */
+            if (cur_level == 1 && level == 0) {
+                LOG_IRQ("%s: raise machine check state\n",
+                            __func__);
+                ppc_set_irq(env, PPC_INTERRUPT_MCK, 1);
+            }
+            break;
+        case POWER7_INPUT_CKSTP:
+            /* Level sensitive - active low */
+            /* XXX: TODO: relay the signal to CKSTP_OUT pin */
+            if (level) {
+                LOG_IRQ("%s: stop the CPU\n", __func__);
+                env->halted = 1;
+            } else {
+                LOG_IRQ("%s: restart the CPU\n", __func__);
+                env->halted = 0;
+            }
+            break;
+        case POWER7_INPUT_HRESET:
+            /* Level sensitive - active low */
+            if (level) {
+#if 0 // XXX: TOFIX
+                LOG_IRQ("%s: reset the CPU\n", __func__);
+                cpu_reset(env);
+#endif
+            }
+            break;
+        case POWER7_INPUT_SRESET:
+            LOG_IRQ("%s: set the RESET IRQ state to %d\n",
+                        __func__, level);
+            ppc_set_irq(env, PPC_INTERRUPT_RESET, level);
+            break;
+        case POWER7_INPUT_TBEN:
+            LOG_IRQ("%s: set the TBEN state to %d\n", __func__,
+                        level);
+            /* XXX: TODO */
+            break;
+        default:
+            /* Unknown pin - do nothing */
+            LOG_IRQ("%s: unknown IRQ pin %d\n", __func__, pin);
+            return;
+        }
+        if (level)
+            env->irq_input_state |= 1 << pin;
+        else
+            env->irq_input_state &= ~(1 << pin);
+    }
+}
+
+void ppcPOWER7_irq_init (CPUState *env)
+{
+    env->irq_inputs = (void **)qemu_allocate_irqs(&power7_set_irq, env,
+                                                  POWER7_INPUT_NB);
+}
 #endif /* defined(TARGET_PPC64) */
 
 /* PowerPC 40x internal IRQ controller */
diff --git a/hw/ppc.h b/hw/ppc.h
index 34f54cf..3ccf134 100644
--- a/hw/ppc.h
+++ b/hw/ppc.h
@@ -36,6 +36,7 @@ void ppc40x_irq_init (CPUState *env);
 void ppce500_irq_init (CPUState *env);
 void ppc6xx_irq_init (CPUState *env);
 void ppc970_irq_init (CPUState *env);
+void ppcPOWER7_irq_init (CPUState *env);
 
 /* PPC machines for OpenBIOS */
 enum {
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 53b788f..fa3cd7f 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -119,6 +119,8 @@ enum powerpc_mmu_t {
     POWERPC_MMU_64B        = POWERPC_MMU_64 | 0x00000001,
     /* 620 variant (no segment exceptions)                     */
     POWERPC_MMU_620        = POWERPC_MMU_64 | 0x00000002,
+    /* Architecture 2.06 variant                               */
+    POWERPC_MMU_2_06       = POWERPC_MMU_64 | POWERPC_MMU_1TSEG | 0x00000003,
 #endif /* defined(TARGET_PPC64) */
 };
 
@@ -154,6 +156,8 @@ enum powerpc_excp_t {
 #if defined(TARGET_PPC64)
     /* PowerPC 970 exception model      */
     POWERPC_EXCP_970,
+    /* POWER7 exception model           */
+    POWERPC_EXCP_POWER7,
 #endif /* defined(TARGET_PPC64) */
 };
 
@@ -289,6 +293,8 @@ enum powerpc_input_t {
     PPC_FLAGS_INPUT_405,
     /* PowerPC 970 bus                  */
     PPC_FLAGS_INPUT_970,
+    /* PowerPC POWER7 bus               */
+    PPC_FLAGS_INPUT_POWER7,
     /* PowerPC 401 bus                  */
     PPC_FLAGS_INPUT_401,
     /* Freescale RCPU bus               */
@@ -1003,6 +1009,7 @@ static inline void cpu_clone_regs(CPUState *env, target_ulong newsp)
 #define SPR_HSPRG1            (0x131)
 #define SPR_HDSISR            (0x132)
 #define SPR_HDAR              (0x133)
+#define SPR_SPURR             (0x134)
 #define SPR_BOOKE_DBCR0       (0x134)
 #define SPR_IBCR              (0x135)
 #define SPR_PURR              (0x135)
@@ -1627,6 +1634,18 @@ enum {
     PPC970_INPUT_THINT      = 6,
     PPC970_INPUT_NB,
 };
+
+enum {
+    /* POWER7 input pins */
+    POWER7_INPUT_HRESET     = 0,
+    POWER7_INPUT_SRESET     = 1,
+    POWER7_INPUT_CKSTP      = 2,
+    POWER7_INPUT_TBEN       = 3,
+    POWER7_INPUT_MCP        = 4,
+    POWER7_INPUT_INT        = 5,
+    POWER7_INPUT_THINT      = 6,
+    POWER7_INPUT_NB,
+};
 #endif
 
 /* Hardware exceptions definitions */
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 158da09..a630148 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -1192,6 +1192,7 @@ static inline int check_physical(CPUState *env, mmu_ctx_t *ctx,
 #if defined(TARGET_PPC64)
     case POWERPC_MMU_620:
     case POWERPC_MMU_64B:
+    case POWERPC_MMU_2_06:
         /* Real address are 60 bits long */
         ctx->raddr &= 0x0FFFFFFFFFFFFFFFULL;
         ctx->prot |= PAGE_WRITE;
@@ -1269,6 +1270,7 @@ int get_physical_address (CPUState *env, mmu_ctx_t *ctx, target_ulong eaddr,
 #if defined(TARGET_PPC64)
         case POWERPC_MMU_620:
         case POWERPC_MMU_64B:
+        case POWERPC_MMU_2_06:
 #endif
             if (ret < 0) {
                 /* We didn't match any BAT entry or don't have BATs */
@@ -1368,6 +1370,7 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
 #if defined(TARGET_PPC64)
                 case POWERPC_MMU_620:
                 case POWERPC_MMU_64B:
+                case POWERPC_MMU_2_06:
 #endif
                     env->exception_index = POWERPC_EXCP_ISI;
                     env->error_code = 0x40000000;
@@ -1475,6 +1478,7 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
 #if defined(TARGET_PPC64)
                 case POWERPC_MMU_620:
                 case POWERPC_MMU_64B:
+                case POWERPC_MMU_2_06:
 #endif
                     env->exception_index = POWERPC_EXCP_DSI;
                     env->error_code = 0;
@@ -1798,6 +1802,7 @@ void ppc_tlb_invalidate_all (CPUPPCState *env)
 #if defined(TARGET_PPC64)
     case POWERPC_MMU_620:
     case POWERPC_MMU_64B:
+    case POWERPC_MMU_2_06:
 #endif /* defined(TARGET_PPC64) */
         tlb_flush(env, 1);
         break;
@@ -1865,6 +1870,7 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr)
 #if defined(TARGET_PPC64)
     case POWERPC_MMU_620:
     case POWERPC_MMU_64B:
+    case POWERPC_MMU_2_06:
         /* tlbie invalidate TLBs for all segments */
         /* XXX: given the fact that there are too many segments to invalidate,
          *      and we still don't have a tlb_flush_mask(env, n, mask) in Qemu,
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index c84581e..2faa591 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -61,6 +61,7 @@ void glue(glue(ppc, name),_irq_init) (CPUPPCState *env);
 PPC_IRQ_INIT_FN(40x);
 PPC_IRQ_INIT_FN(6xx);
 PPC_IRQ_INIT_FN(970);
+PPC_IRQ_INIT_FN(POWER7);
 PPC_IRQ_INIT_FN(e500);
 
 /* Generic callbacks:
@@ -3087,6 +3088,35 @@ static void init_excp_970 (CPUPPCState *env)
     env->hreset_vector = 0x0000000000000100ULL;
 #endif
 }
+
+static void init_excp_POWER7 (CPUPPCState *env)
+{
+#if !defined(CONFIG_USER_ONLY)
+    env->excp_vectors[POWERPC_EXCP_RESET]    = 0x00000100;
+    env->excp_vectors[POWERPC_EXCP_MCHECK]   = 0x00000200;
+    env->excp_vectors[POWERPC_EXCP_DSI]      = 0x00000300;
+    env->excp_vectors[POWERPC_EXCP_DSEG]     = 0x00000380;
+    env->excp_vectors[POWERPC_EXCP_ISI]      = 0x00000400;
+    env->excp_vectors[POWERPC_EXCP_ISEG]     = 0x00000480;
+    env->excp_vectors[POWERPC_EXCP_EXTERNAL] = 0x00000500;
+    env->excp_vectors[POWERPC_EXCP_ALIGN]    = 0x00000600;
+    env->excp_vectors[POWERPC_EXCP_PROGRAM]  = 0x00000700;
+    env->excp_vectors[POWERPC_EXCP_FPU]      = 0x00000800;
+    env->excp_vectors[POWERPC_EXCP_DECR]     = 0x00000900;
+    env->excp_vectors[POWERPC_EXCP_HDECR]    = 0x00000980;
+    env->excp_vectors[POWERPC_EXCP_SYSCALL]  = 0x00000C00;
+    env->excp_vectors[POWERPC_EXCP_TRACE]    = 0x00000D00;
+    env->excp_vectors[POWERPC_EXCP_PERFM]    = 0x00000F00;
+    env->excp_vectors[POWERPC_EXCP_VPU]      = 0x00000F20;
+    env->excp_vectors[POWERPC_EXCP_IABR]     = 0x00001300;
+    env->excp_vectors[POWERPC_EXCP_MAINT]    = 0x00001600;
+    env->excp_vectors[POWERPC_EXCP_VPUA]     = 0x00001700;
+    env->excp_vectors[POWERPC_EXCP_THERM]    = 0x00001800;
+    env->hreset_excp_prefix = 0x00000000FFF00000ULL;
+    /* Hardware reset vector */
+    env->hreset_vector = 0x0000000000000100ULL;
+#endif
+}
 #endif
 
 /*****************************************************************************/
@@ -6268,6 +6298,74 @@ static void init_proc_970MP (CPUPPCState *env)
     vscr_init(env, 0x00010000);
 }
 
+/* POWER7 (actually a somewhat hacked 970FX for now...) */
+#define POWERPC_INSNS_POWER7  (PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |        \
+                              PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |   \
+                              PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |           \
+                              PPC_FLOAT_STFIWX |                              \
+                              PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZT |  \
+                              PPC_MEM_SYNC | PPC_MEM_EIEIO |                  \
+                              PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |               \
+                              PPC_64B | PPC_ALTIVEC |                         \
+                              PPC_SEGMENT_64B | PPC_SLBI |                    \
+                              PPC_POPCNTB | PPC_POPCNTWD)
+#define POWERPC_MSRM_POWER7   (0x800000000204FF36ULL)
+#define POWERPC_MMU_POWER7    (POWERPC_MMU_2_06)
+#define POWERPC_EXCP_POWER7   (POWERPC_EXCP_POWER7)
+#define POWERPC_INPUT_POWER7  (PPC_FLAGS_INPUT_POWER7)
+#define POWERPC_BFDM_POWER7   (bfd_mach_ppc64)
+#define POWERPC_FLAG_POWER7   (POWERPC_FLAG_VRE | POWERPC_FLAG_SE |            \
+                              POWERPC_FLAG_BE | POWERPC_FLAG_PMM |            \
+                              POWERPC_FLAG_BUS_CLK)
+#define check_pow_POWER7    check_pow_nocheck
+
+static void init_proc_POWER7 (CPUPPCState *env)
+{
+    gen_spr_ne_601(env);
+    gen_spr_7xx(env);
+    /* Time base */
+    gen_tbl(env);
+    /* PURR & SPURR: Hack - treat these as aliases for the TB for now */
+    spr_register(env, SPR_PURR,   "PURR",
+                 &spr_read_purr, SPR_NOACCESS,
+                 &spr_read_purr, SPR_NOACCESS,
+                 0x00000000);
+    spr_register(env, SPR_SPURR,   "SPURR",
+                 &spr_read_purr, SPR_NOACCESS,
+                 &spr_read_purr, SPR_NOACCESS,
+                 0x00000000);
+    /* Memory management */
+    /* XXX : not implemented */
+    spr_register(env, SPR_MMUCFG, "MMUCFG",
+                 SPR_NOACCESS, SPR_NOACCESS,
+                 &spr_read_generic, SPR_NOACCESS,
+                 0x00000000); /* TOFIX */
+    /* XXX : not implemented */
+    spr_register(env, SPR_CTRL, "SPR_CTRLT",
+                 SPR_NOACCESS, SPR_NOACCESS,
+                 &spr_read_generic, &spr_write_generic,
+                 0x80800000);
+    spr_register(env, SPR_UCTRL, "SPR_CTRLF",
+                 SPR_NOACCESS, SPR_NOACCESS,
+                 &spr_read_generic, &spr_write_generic,
+                 0x80800000);
+    spr_register(env, SPR_VRSAVE, "SPR_VRSAVE",
+                 &spr_read_generic, &spr_write_generic,
+                 &spr_read_generic, &spr_write_generic,
+                 0x00000000);
+#if !defined(CONFIG_USER_ONLY)
+    env->slb_nr = 32;
+#endif
+    init_excp_POWER7(env);
+    env->dcache_line_size = 128;
+    env->icache_line_size = 128;
+    /* Allocate hardware IRQ controller */
+    ppcPOWER7_irq_init(env);
+    /* Can't find information on what this should be on reset.  This
+     * value is the one used by 74xx processors. */
+    vscr_init(env, 0x00010000);
+}
+
 /* PowerPC 620                                                               */
 #define POWERPC_INSNS_620    (PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |        \
                               PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |   \
@@ -6990,6 +7088,8 @@ enum {
     CPU_POWERPC_POWER6             = 0x003E0000,
     CPU_POWERPC_POWER6_5           = 0x0F000001, /* POWER6 in POWER5 mode */
     CPU_POWERPC_POWER6A            = 0x0F000002,
+#define CPU_POWERPC_POWER7           CPU_POWERPC_POWER7_v20
+    CPU_POWERPC_POWER7_v20         = 0x003F0200,
     CPU_POWERPC_970                = 0x00390202,
 #define CPU_POWERPC_970FX            CPU_POWERPC_970FX_v31
     CPU_POWERPC_970FX_v10          = 0x00391100,
@@ -8792,6 +8892,9 @@ static const ppc_def_t ppc_defs[] = {
     /* POWER6A                                                               */
     POWERPC_DEF("POWER6A",       CPU_POWERPC_POWER6A,                POWER6),
 #endif
+    /* POWER7                                                                */
+    POWERPC_DEF("POWER7",        CPU_POWERPC_POWER7,                 POWER7),
+    POWERPC_DEF("POWER7_v2.0",   CPU_POWERPC_POWER7_v20,             POWER7),
     /* PowerPC 970                                                           */
     POWERPC_DEF("970",           CPU_POWERPC_970,                    970),
     /* PowerPC 970FX (G5)                                                    */
-- 
1.7.1

* [Qemu-devel] [PATCH 14/15] Start implementing pSeries logical partition machine
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (12 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 13/15] Add POWER7 support for ppc David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 16:23   ` [Qemu-devel] " Alexander Graf
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 15/15] Implement the bus structure for PAPR virtual IO David Gibson
  2011-02-14  4:16   ` FUJITA Tomonori
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC (permalink / raw
  To: qemu-devel; +Cc: paulus, agraf, anton

This patch adds a "pseries" machine to qemu.  It aims to emulate a
logical partition on an IBM pSeries machine, compliant with the
"PowerPC Architecture Platform Requirements" (PAPR) document.

This initial version is quite limited: it implements a basic machine
and PAPR hypercall emulation.  So far only one hypercall is present -
H_PUT_TERM_CHAR - so that a (write-only) console is available.

So far the machine more closely resembles an old POWER4-style "full
system partition" than a modern LPAR, in that the guest manages the
page tables directly, rather than via hypercalls.

The machine requires qemu to be configured with --enable-fdt.  The
machine can (so far) only be booted with -kernel - i.e. no partition
firmware is provided.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
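Note on the H_PUT_TERM_CHAR argument layout: the hypercall passes up to
16 characters in two 64-bit registers (char0_7 and char8_15), and the
handler in hw/spapr_hcall.c stores each register big-endian into a byte
buffer before writing it to the character device.  The following is a
minimal standalone sketch of that packing, not code from this patch.
The helper names pack_term_chars and unpack_term_chars are hypothetical,
and shifts are used instead of cpu_to_be64() so the sketch does not
depend on host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest-side helper: pack up to 16 bytes into two registers.
 * Byte 0 lands in the most significant byte of regs[0], byte 8 in the
 * most significant byte of regs[1]; unused bytes are left as zero. */
static void pack_term_chars(const char *s, size_t len, uint64_t regs[2])
{
    assert(len <= 16);
    regs[0] = regs[1] = 0;
    for (size_t i = 0; i < len; i++) {
        regs[i / 8] |= (uint64_t)(uint8_t)s[i] << (56 - 8 * (i % 8));
    }
}

/* Hypothetical host-side helper: recover the byte stream.  This yields
 * the same bytes as storing cpu_to_be64(reg) into the buffer. */
static void unpack_term_chars(const uint64_t regs[2], uint8_t buf[16])
{
    for (size_t i = 0; i < 16; i++) {
        buf[i] = (uint8_t)(regs[i / 8] >> (56 - 8 * (i % 8)));
    }
}
```

A caller would pack a string, pass regs[0] and regs[1] as the third and
fourth hypercall arguments together with the length, and the host would
write only the first len bytes of the unpacked buffer.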
 Makefile.target  |    2 +
 hw/spapr.c       |  279 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr.h       |  240 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_hcall.c |   40 ++++++++
 4 files changed, 561 insertions(+), 0 deletions(-)
 create mode 100644 hw/spapr.c
 create mode 100644 hw/spapr.h
 create mode 100644 hw/spapr_hcall.c

diff --git a/Makefile.target b/Makefile.target
index 48e6c00..e0796ba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,6 +231,8 @@ obj-ppc-y += ppc_prep.o
 obj-ppc-y += ppc_oldworld.o
 # NewWorld PowerMac
 obj-ppc-y += ppc_newworld.o
+# IBM pSeries (sPAPR)
+obj-ppc-y += spapr.o spapr_hcall.o
 # PowerPC 4xx boards
 obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-ppc-y += ppc440.o ppc440_bamboo.o
diff --git a/hw/spapr.c b/hw/spapr.c
new file mode 100644
index 0000000..8aca4e0
--- /dev/null
+++ b/hw/spapr.c
@@ -0,0 +1,279 @@
+/*
+ * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator
+ *
+ * Copyright (c) 2004-2007 Fabrice Bellard
+ * Copyright (c) 2007 Jocelyn Mayer
+ * Copyright (c) 2010 David Gibson, IBM Corporation.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ */
+#include "hw.h"
+#include "ppc.h"
+#include "pc.h"
+#include "sysemu.h"
+#include "boards.h"
+#include "fw_cfg.h"
+#include "loader.h"
+#include "elf.h"
+#include "kvm.h"
+#include "kvm_ppc.h"
+#include "net.h"
+#include "blockdev.h"
+#include "hw/spapr.h"
+
+#include <libfdt.h>
+
+#define KERNEL_LOAD_ADDR        0x00000000
+#define INITRD_LOAD_ADDR        0x02800000
+#define FDT_ADDR                0x0f000000
+#define FDT_MAX_SIZE            0x10000
+
+#define TIMEBASE_FREQ           512000000ULL
+
+static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
+                              const char *cpu_model, CPUState *envs[],
+                              target_phys_addr_t initrd_base,
+                              target_phys_addr_t initrd_size,
+                              const char *kernel_cmdline)
+{
+    void *fdt;
+    uint64_t mem_reg_property[] = { 0, cpu_to_be64(ramsize) };
+    uint32_t start_prop = cpu_to_be32(initrd_base);
+    uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
+    int i;
+    char *modelname;
+
+#define _FDT(exp) \
+    do { \
+        int ret = (exp);                                           \
+        if (ret < 0) {                                             \
+            hw_error("qemu: error creating device tree: %s: %s\n", \
+                     #exp, fdt_strerror(ret));                     \
+            return NULL;                                           \
+        }                                                          \
+    } while (0)
+
+    fdt = qemu_mallocz(FDT_MAX_SIZE);
+    _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
+    
+    _FDT((fdt_finish_reservemap(fdt)));
+
+    /* Root node */
+    _FDT((fdt_begin_node(fdt, "")));
+    _FDT((fdt_property_string(fdt, "device_type", "chrp")));
+    _FDT((fdt_property_string(fdt, "model", "qemu,emulated-pSeries-LPAR")));
+
+    _FDT((fdt_property_cell(fdt, "#address-cells", 0x2)));
+    _FDT((fdt_property_cell(fdt, "#size-cells", 0x2)));
+
+    /* /chosen */
+    _FDT((fdt_begin_node(fdt, "chosen")));
+
+    _FDT((fdt_property_string(fdt, "bootargs", kernel_cmdline)));
+    _FDT((fdt_property(fdt, "linux,initrd-start", &start_prop, sizeof(start_prop))));
+    _FDT((fdt_property(fdt, "linux,initrd-end", &end_prop, sizeof(end_prop))));
+    
+    _FDT((fdt_end_node(fdt)));
+
+    /* memory node */
+    _FDT((fdt_begin_node(fdt, "memory@0")));
+
+    _FDT((fdt_property_string(fdt, "device_type", "memory")));
+    _FDT((fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property))));
+    
+    _FDT((fdt_end_node(fdt)));
+    
+    /* cpus */
+    _FDT((fdt_begin_node(fdt, "cpus")));
+
+    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
+    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
+
+    modelname = qemu_strdup(cpu_model);
+    
+    for (i = 0; i < strlen(modelname); i++)
+        modelname[i] = toupper(modelname[i]);
+
+    for (i = 0; i < smp_cpus; i++) {
+        CPUState *env = envs[i];
+        char *nodename;
+        uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
+                           0xffffffff, 0xffffffff};
+
+        if (asprintf(&nodename, "%s@%x", modelname, i) < 0) {
+            fprintf(stderr, "Allocation failure\n");
+            exit(1);
+        }
+
+        _FDT((fdt_begin_node(fdt, nodename)));
+
+        free(nodename);
+
+        _FDT((fdt_property_cell(fdt, "reg", i)));
+        _FDT((fdt_property_string(fdt, "device_type", "cpu")));
+
+        _FDT((fdt_property_cell(fdt, "cpu-version", env->spr[SPR_PVR])));
+        _FDT((fdt_property_cell(fdt, "dcache-block-size", env->dcache_line_size)));
+        _FDT((fdt_property_cell(fdt, "icache-block-size", env->icache_line_size)));
+        _FDT((fdt_property_cell(fdt, "timebase-frequency", TIMEBASE_FREQ)));
+        /* Hardcode CPU frequency for now.  It's kind of arbitrary on
+         * full emu, for kvm we should copy it from the host */
+        _FDT((fdt_property_cell(fdt, "clock-frequency", 1000000000)));
+        _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
+        _FDT((fdt_property_string(fdt, "status", "okay")));
+        _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
+
+        if (envs[i]->mmu_model & POWERPC_MMU_1TSEG)
+            _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
+                               segs, sizeof(segs))));
+
+        _FDT((fdt_end_node(fdt)));
+    }
+
+    qemu_free(modelname);
+
+    _FDT((fdt_end_node(fdt)));
+
+    _FDT((fdt_end_node(fdt))); /* close root node */
+    _FDT((fdt_finish(fdt)));
+
+    if (fdt_size)
+        *fdt_size = fdt_totalsize(fdt);
+
+    return fdt;
+}
+
+static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
+{
+    return (addr & 0x0fffffff) + KERNEL_LOAD_ADDR;
+}
+
+static void emulate_spapr_hypercall(CPUState *env, void *opaque)
+{
+    env->gpr[3] = spapr_hypercall(env, (sPAPREnvironment *)opaque,
+                                  env->gpr[3], &env->gpr[4]);
+}
+
+/* pSeries LPAR / sPAPR hardware init */
+static void ppc_spapr_init (ram_addr_t ram_size,
+                             const char *boot_device,
+                             const char *kernel_filename,
+                             const char *kernel_cmdline,
+                             const char *initrd_filename,
+                             const char *cpu_model)
+{
+    CPUState *env = NULL;
+    void *fdt;
+    int i;
+    ram_addr_t ram_offset;
+    uint32_t kernel_base, initrd_base;
+    long kernel_size, initrd_size;
+    int fdt_size;
+    sPAPREnvironment *spapr;
+
+    spapr = qemu_malloc(sizeof(*spapr));
+
+    /* init CPUs */
+    if (cpu_model == NULL)
+        cpu_model = "POWER7";
+    for (i = 0; i < smp_cpus; i++) {
+        env = cpu_init(cpu_model);
+        if (!env) {
+            fprintf(stderr, "Unable to find PowerPC CPU definition\n");
+            exit(1);
+        }
+        /* Set time-base frequency to 512 MHz */
+        cpu_ppc_tb_init(env, TIMEBASE_FREQ);
+        qemu_register_reset((QEMUResetHandler*)&cpu_reset, env);
+
+        env->emulate_hypercall = emulate_spapr_hypercall;
+        env->hcall_opaque = spapr;
+    }
+
+    /* allocate RAM */
+    ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
+    cpu_register_physical_memory(0, ram_size, ram_offset);
+
+    if (kernel_filename) {
+        uint64_t lowaddr = 0;
+
+        kernel_base = KERNEL_LOAD_ADDR;
+
+        kernel_size = load_elf(kernel_filename, translate_kernel_address, NULL,
+                               NULL, &lowaddr, NULL, 1, ELF_MACHINE, 0);
+        if (kernel_size < 0)
+            kernel_size = load_image_targphys(kernel_filename, kernel_base,
+                                              ram_size - kernel_base);
+        if (kernel_size < 0) {
+            hw_error("qemu: could not load kernel '%s'\n", kernel_filename);
+            exit(1);
+        }
+
+        /* load initrd */
+        if (initrd_filename) {
+            initrd_base = INITRD_LOAD_ADDR;
+            initrd_size = load_image_targphys(initrd_filename, initrd_base,
+                                              ram_size - initrd_base);
+            if (initrd_size < 0) {
+                hw_error("qemu: could not load initial ram disk '%s'\n",
+                         initrd_filename);
+                exit(1);
+            }
+        } else {
+            initrd_base = 0;
+            initrd_size = 0;
+        }
+
+        /* load fdt */
+        fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, &env,
+                               initrd_base, initrd_size,
+                               kernel_cmdline);
+        if (!fdt) {
+            hw_error("Couldn't create pSeries device tree\n");
+            exit(1);
+        }
+
+        cpu_physical_memory_write(FDT_ADDR, fdt, fdt_size);
+
+        env->gpr[3] = FDT_ADDR;
+        env->gpr[5] = 0;
+        env->hreset_vector = kernel_base;
+        env->hreset_excp_prefix = 0;
+    } else {
+        fprintf(stderr, "pSeries machine needs -kernel for now\n");
+        exit(1);
+    }
+}
+
+static QEMUMachine spapr_machine = {
+    .name = "pseries",
+    .desc = "pSeries Logical Partition (PAPR compliant)",
+    .init = ppc_spapr_init,
+    .max_cpus = 1,
+    .no_vga = 1,
+    .no_parallel = 1,
+};
+
+static void spapr_machine_init(void)
+{
+    qemu_register_machine(&spapr_machine);
+}
+
+machine_init(spapr_machine_init);
diff --git a/hw/spapr.h b/hw/spapr.h
new file mode 100644
index 0000000..dae9617
--- /dev/null
+++ b/hw/spapr.h
@@ -0,0 +1,240 @@
+#if !defined (__HW_SPAPR_H__)
+#define __HW_SPAPR_H__
+
+typedef struct sPAPREnvironment {
+} sPAPREnvironment;
+
+#define H_SUCCESS         0
+#define H_BUSY            1        /* Hardware busy -- retry later */
+#define H_CLOSED          2        /* Resource closed */
+#define H_NOT_AVAILABLE   3
+#define H_CONSTRAINED     4        /* Resource request constrained to max allowed */
+#define H_PARTIAL         5
+#define H_IN_PROGRESS     14       /* Kind of like busy */
+#define H_PAGE_REGISTERED 15
+#define H_PARTIAL_STORE   16
+#define H_PENDING         17       /* returned from H_POLL_PENDING */
+#define H_CONTINUE        18       /* Returned from H_Join on success */
+#define H_LONG_BUSY_START_RANGE         9900  /* Start of long busy range */
+#define H_LONG_BUSY_ORDER_1_MSEC        9900  /* Long busy, hint that 1msec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_MSEC       9901  /* Long busy, hint that 10msec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_MSEC      9902  /* Long busy, hint that 100msec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_1_SEC         9903  /* Long busy, hint that 1sec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_SEC        9904  /* Long busy, hint that 10sec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_SEC       9905  /* Long busy, hint that 100sec \
+                                                 is a good time to retry */
+#define H_LONG_BUSY_END_RANGE           9905  /* End of long busy range */
+#define H_HARDWARE        -1       /* Hardware error */
+#define H_FUNCTION        -2       /* Function not supported */
+#define H_PRIVILEGE       -3       /* Caller not privileged */
+#define H_PARAMETER       -4       /* Parameter invalid, out-of-range or conflicting */
+#define H_BAD_MODE        -5       /* Illegal msr value */
+#define H_PTEG_FULL       -6       /* PTEG is full */
+#define H_NOT_FOUND       -7       /* PTE was not found */
+#define H_RESERVED_DABR   -8       /* DABR address is reserved by the hypervisor on this processor */
+#define H_NO_MEM          -9
+#define H_AUTHORITY       -10
+#define H_PERMISSION      -11
+#define H_DROPPED         -12
+#define H_SOURCE_PARM     -13
+#define H_DEST_PARM       -14
+#define H_REMOTE_PARM     -15
+#define H_RESOURCE        -16
+#define H_ADAPTER_PARM    -17
+#define H_RH_PARM         -18
+#define H_RCQ_PARM        -19
+#define H_SCQ_PARM        -20
+#define H_EQ_PARM         -21
+#define H_RT_PARM         -22
+#define H_ST_PARM         -23
+#define H_SIGT_PARM       -24
+#define H_TOKEN_PARM      -25
+#define H_MLENGTH_PARM    -27
+#define H_MEM_PARM        -28
+#define H_MEM_ACCESS_PARM -29
+#define H_ATTR_PARM       -30
+#define H_PORT_PARM       -31
+#define H_MCG_PARM        -32
+#define H_VL_PARM         -33
+#define H_TSIZE_PARM      -34
+#define H_TRACE_PARM      -35
+
+#define H_MASK_PARM       -37
+#define H_MCG_FULL        -38
+#define H_ALIAS_EXIST     -39
+#define H_P_COUNTER       -40
+#define H_TABLE_FULL      -41
+#define H_ALT_TABLE       -42
+#define H_MR_CONDITION    -43
+#define H_NOT_ENOUGH_RESOURCES -44
+#define H_R_STATE         -45
+#define H_RESCINDEND      -46
+#define H_MULTI_THREADS_ACTIVE -9005
+
+
+/* Long Busy is a condition that can be returned by the firmware
+ * when a call cannot be completed now, but the identical call
+ * should be retried later.  This prevents calls blocking in the
+ * firmware for long periods of time.  Annoyingly the firmware can return
+ * a range of return codes, hinting at how long we should wait before
+ * retrying.  If you don't care for the hint, the macro below is a good
+ * way to check for the long_busy return codes
+ */
+#define H_IS_LONG_BUSY(x)  (((x) >= H_LONG_BUSY_START_RANGE) \
+                            && ((x) <= H_LONG_BUSY_END_RANGE))
+
+/* Flags */
+#define H_LARGE_PAGE      (1ULL<<(63-16))
+#define H_EXACT           (1ULL<<(63-24))       /* Use exact PTE or return H_PTEG_FULL */
+#define H_R_XLATE         (1ULL<<(63-25))       /* include a valid logical page num in the pte if the valid bit is set */
+#define H_READ_4          (1ULL<<(63-26))       /* Return 4 PTEs */
+#define H_PAGE_STATE_CHANGE (1ULL<<(63-28))
+#define H_PAGE_UNUSED     ((1ULL<<(63-29)) | (1ULL<<(63-30)))
+#define H_PAGE_SET_UNUSED (H_PAGE_STATE_CHANGE | H_PAGE_UNUSED)
+#define H_PAGE_SET_LOANED (H_PAGE_SET_UNUSED | (1ULL<<(63-31)))
+#define H_PAGE_SET_ACTIVE H_PAGE_STATE_CHANGE
+#define H_AVPN            (1ULL<<(63-32))       /* An avpn is provided as a sanity test */
+#define H_ANDCOND         (1ULL<<(63-33))
+#define H_ICACHE_INVALIDATE (1ULL<<(63-40))     /* icbi, etc.  (ignored for IO pages) */
+#define H_ICACHE_SYNCHRONIZE (1ULL<<(63-41))    /* dcbst, icbi, etc (ignored for IO pages) */
+#define H_ZERO_PAGE       (1ULL<<(63-48))       /* zero the page before mapping (ignored for IO pages) */
+#define H_COPY_PAGE       (1ULL<<(63-49))
+#define H_N               (1ULL<<(63-61))
+#define H_PP1             (1ULL<<(63-62))
+#define H_PP2             (1ULL<<(63-63))
+
+/* VASI States */
+#define H_VASI_INVALID    0
+#define H_VASI_ENABLED    1
+#define H_VASI_ABORTED    2
+#define H_VASI_SUSPENDING 3
+#define H_VASI_SUSPENDED  4
+#define H_VASI_RESUMED    5
+#define H_VASI_COMPLETED  6
+
+/* DABRX flags */
+#define H_DABRX_HYPERVISOR (1ULL<<(63-61))
+#define H_DABRX_KERNEL     (1ULL<<(63-62))
+#define H_DABRX_USER       (1ULL<<(63-63))
+
+/* Each control block has to be on a 4K boundary */
+#define H_CB_ALIGNMENT     4096
+
+/* pSeries hypervisor opcodes */
+#define H_REMOVE                0x04
+#define H_ENTER                 0x08
+#define H_READ                  0x0c
+#define H_CLEAR_MOD             0x10
+#define H_CLEAR_REF             0x14
+#define H_PROTECT               0x18
+#define H_GET_TCE               0x1c
+#define H_PUT_TCE               0x20
+#define H_SET_SPRG0             0x24
+#define H_SET_DABR              0x28
+#define H_PAGE_INIT             0x2c
+#define H_SET_ASR               0x30
+#define H_ASR_ON                0x34
+#define H_ASR_OFF               0x38
+#define H_LOGICAL_CI_LOAD       0x3c
+#define H_LOGICAL_CI_STORE      0x40
+#define H_LOGICAL_CACHE_LOAD    0x44
+#define H_LOGICAL_CACHE_STORE   0x48
+#define H_LOGICAL_ICBI          0x4c
+#define H_LOGICAL_DCBF          0x50
+#define H_GET_TERM_CHAR         0x54
+#define H_PUT_TERM_CHAR         0x58
+#define H_REAL_TO_LOGICAL       0x5c
+#define H_HYPERVISOR_DATA       0x60
+#define H_EOI                   0x64
+#define H_CPPR                  0x68
+#define H_IPI                   0x6c
+#define H_IPOLL                 0x70
+#define H_XIRR                  0x74
+#define H_PERFMON               0x7c
+#define H_MIGRATE_DMA           0x78
+#define H_REGISTER_VPA          0xDC
+#define H_CEDE                  0xE0
+#define H_CONFER                0xE4
+#define H_PROD                  0xE8
+#define H_GET_PPP               0xEC
+#define H_SET_PPP               0xF0
+#define H_PURR                  0xF4
+#define H_PIC                   0xF8
+#define H_REG_CRQ               0xFC
+#define H_FREE_CRQ              0x100
+#define H_VIO_SIGNAL            0x104
+#define H_SEND_CRQ              0x108
+#define H_COPY_RDMA             0x110
+#define H_REGISTER_LOGICAL_LAN  0x114
+#define H_FREE_LOGICAL_LAN      0x118
+#define H_ADD_LOGICAL_LAN_BUFFER 0x11C
+#define H_SEND_LOGICAL_LAN      0x120
+#define H_BULK_REMOVE           0x124
+#define H_MULTICAST_CTRL        0x130
+#define H_SET_XDABR             0x134
+#define H_STUFF_TCE             0x138
+#define H_PUT_TCE_INDIRECT      0x13C
+#define H_CHANGE_LOGICAL_LAN_MAC 0x14C
+#define H_VTERM_PARTNER_INFO    0x150
+#define H_REGISTER_VTERM        0x154
+#define H_FREE_VTERM            0x158
+#define H_RESET_EVENTS          0x15C
+#define H_ALLOC_RESOURCE        0x160
+#define H_FREE_RESOURCE         0x164
+#define H_MODIFY_QP             0x168
+#define H_QUERY_QP              0x16C
+#define H_REREGISTER_PMR        0x170
+#define H_REGISTER_SMR          0x174
+#define H_QUERY_MR              0x178
+#define H_QUERY_MW              0x17C
+#define H_QUERY_HCA             0x180
+#define H_QUERY_PORT            0x184
+#define H_MODIFY_PORT           0x188
+#define H_DEFINE_AQP1           0x18C
+#define H_GET_TRACE_BUFFER      0x190
+#define H_DEFINE_AQP0           0x194
+#define H_RESIZE_MR             0x198
+#define H_ATTACH_MCQP           0x19C
+#define H_DETACH_MCQP           0x1A0
+#define H_CREATE_RPT            0x1A4
+#define H_REMOVE_RPT            0x1A8
+#define H_REGISTER_RPAGES       0x1AC
+#define H_DISABLE_AND_GETC      0x1B0
+#define H_ERROR_DATA            0x1B4
+#define H_GET_HCA_INFO          0x1B8
+#define H_GET_PERF_COUNT        0x1BC
+#define H_MANAGE_TRACE          0x1C0
+#define H_FREE_LOGICAL_LAN_BUFFER 0x1D4
+#define H_QUERY_INT_STATE       0x1E4
+#define H_POLL_PENDING          0x1D8
+#define H_ILLAN_ATTRIBUTES      0x244
+#define H_MODIFY_HEA_QP         0x250
+#define H_QUERY_HEA_QP          0x254
+#define H_QUERY_HEA             0x258
+#define H_QUERY_HEA_PORT        0x25C
+#define H_MODIFY_HEA_PORT       0x260
+#define H_REG_BCMC              0x264
+#define H_DEREG_BCMC            0x268
+#define H_REGISTER_HEA_RPAGES   0x26C
+#define H_DISABLE_AND_GET_HEA   0x270
+#define H_GET_HEA_INFO          0x274
+#define H_ALLOC_HEA_RESOURCE    0x278
+#define H_ADD_CONN              0x284
+#define H_DEL_CONN              0x288
+#define H_JOIN                  0x298
+#define H_VASI_STATE            0x2A4
+#define H_ENABLE_CRQ            0x2B0
+#define H_GET_EM_PARMS          0x2B8
+#define H_SET_MPP               0x2D0
+#define H_GET_MPP               0x2D4
+#define MAX_HCALL_OPCODE        H_GET_MPP
+
+target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
+                             target_ulong token, target_ulong *args);
+
+#endif /* !defined (__HW_SPAPR_H__) */
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
new file mode 100644
index 0000000..c99c345
--- /dev/null
+++ b/hw/spapr_hcall.c
@@ -0,0 +1,40 @@
+#include "sysemu.h"
+#include "cpu.h"
+#include "qemu-char.h"
+#include "hw/spapr.h"
+
+static target_ulong h_put_term_char(target_ulong termno, target_ulong len,
+                                    target_ulong char0_7, target_ulong char8_15)
+{
+    uint8_t buf[16];
+
+    *((uint64_t *)buf) = cpu_to_be64(char0_7);
+    *((uint64_t *)buf + 1) = cpu_to_be64(char8_15);
+
+    qemu_chr_write(serial_hds[0], buf, len);
+
+    return 0;
+}
+
+target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
+                             target_ulong token, target_ulong *args)
+{
+    target_ulong r = H_FUNCTION;
+
+    if (msr_pr) {
+        fprintf(stderr, "Hypercall made with MSR=0x%016llx\n",
+                (unsigned long long)env->msr);
+        return H_PRIVILEGE;
+    }
+
+    switch (token) {
+    case H_PUT_TERM_CHAR:
+        r = h_put_term_char(args[0], args[1], args[2], args[3]);
+        break;
+
+    default:
+        fprintf(stderr, "Unimplemented hcall 0x%llx\n", (unsigned long long)token);
+    }
+
+    return r;
+}
-- 
1.7.1

* [Qemu-devel] [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
                   ` (13 preceding siblings ...)
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 14/15] Start implementing pSeries logical partition machine David Gibson
@ 2011-02-12 14:54 ` David Gibson
  2011-02-12 16:47   ` [Qemu-devel] " Alexander Graf
  2011-02-14  4:16   ` FUJITA Tomonori
  15 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-12 14:54 UTC (permalink / raw
  To: qemu-devel; +Cc: paulus, agraf, anton

This extends the "pseries" (PAPR) machine to include a virtual IO bus
supporting the PAPR-defined, hypercall-based virtual IO mechanisms.

So far only one VIO device is provided, the vty / vterm, providing
a full console (polled only, for now).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
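Note on the polled console: with no interrupt delivery, the guest has
to call H_GET_TERM_CHAR repeatedly and is handed whatever input has
arrived since the last call.  One way to model that is a small ring
buffer which the character backend fills and the hypercall drains.
The sketch below is an illustration under that assumption, not the
code in hw/spapr_vty.c; the vty_sketch names and the 16-byte buffer
size are hypothetical, and an array of two registers stands in for the
separate char0_7 / char8_15 output pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VTY_SKETCH_BUF_SIZE 16   /* hypothetical; must be a power of two */

struct vty_sketch {
    uint8_t  buf[VTY_SKETCH_BUF_SIZE];
    uint32_t in;    /* free-running count of bytes queued by the backend */
    uint32_t out;   /* free-running count of bytes consumed by the guest */
};

/* Backend side: queue one received byte.  Returns 1 on success, 0 when
 * the buffer is full and the byte was dropped. */
static int vty_sketch_put(struct vty_sketch *v, uint8_t c)
{
    assert(v != NULL);
    if (v->in - v->out >= VTY_SKETCH_BUF_SIZE) {
        return 0;
    }
    v->buf[v->in++ % VTY_SKETCH_BUF_SIZE] = c;
    return 1;
}

/* Guest-facing side: drain up to 16 pending bytes into two registers,
 * first byte in the most significant byte of regs[0].  Returns the
 * number of bytes delivered; 0 means "nothing pending, poll again". */
static size_t vty_sketch_get(struct vty_sketch *v, uint64_t regs[2])
{
    size_t n = 0;

    assert(v != NULL);
    regs[0] = regs[1] = 0;
    while (v->out != v->in && n < 16) {
        uint8_t c = v->buf[v->out++ % VTY_SKETCH_BUF_SIZE];
        regs[n / 8] |= (uint64_t)c << (56 - 8 * (n % 8));
        n++;
    }
    return n;
}
```

Free-running counters keep the full/empty test simple: the buffer is
empty when in == out and full when in - out equals the buffer size,
which stays correct across unsigned wraparound as long as the size is a
power of two.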
 Makefile.target  |    3 +-
 hw/spapr.c       |   31 +++++++++-
 hw/spapr.h       |   10 +++
 hw/spapr_hcall.c |   19 ++----
 hw/spapr_vio.c   |  191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_vio.h   |   49 ++++++++++++++
 hw/spapr_vty.c   |  132 +++++++++++++++++++++++++++++++++++++
 7 files changed, 419 insertions(+), 16 deletions(-)
 create mode 100644 hw/spapr_vio.c
 create mode 100644 hw/spapr_vio.h
 create mode 100644 hw/spapr_vty.c

diff --git a/Makefile.target b/Makefile.target
index e0796ba..fe232da 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -232,7 +232,8 @@ obj-ppc-y += ppc_oldworld.o
 # NewWorld PowerMac
 obj-ppc-y += ppc_newworld.o
 # IBM pSeries (sPAPR)
-obj-ppc-y += spapr.o spapr_hcall.o
+obj-ppc-y += spapr.o spapr_hcall.o spapr_vio.o
+obj-ppc-y += spapr_vty.o
 # PowerPC 4xx boards
 obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-ppc-y += ppc440.o ppc440_bamboo.o
diff --git a/hw/spapr.c b/hw/spapr.c
index 8aca4e0..da61061 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -37,6 +37,7 @@
 #include "net.h"
 #include "blockdev.h"
 #include "hw/spapr.h"
+#include "hw/spapr_vio.h"
 
 #include <libfdt.h>
 
@@ -49,6 +50,7 @@
 
 static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
                               const char *cpu_model, CPUState *envs[],
+                              sPAPREnvironment *spapr,
                               target_phys_addr_t initrd_base,
                               target_phys_addr_t initrd_size,
                               const char *kernel_cmdline)
@@ -59,6 +61,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     int i;
     char *modelname;
+    int ret;
 
 #define _FDT(exp) \
     do { \
@@ -151,9 +154,28 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
 
     _FDT((fdt_end_node(fdt)));
 
+    /* vdevice */
+    _FDT((fdt_begin_node(fdt, "vdevice")));
+
+    _FDT((fdt_property_string(fdt, "device_type", "vdevice")));
+    _FDT((fdt_property_string(fdt, "compatible", "IBM,vdevice")));
+    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
+    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
+    
+    _FDT((fdt_end_node(fdt)));
+
     _FDT((fdt_end_node(fdt))); /* close root node */
     _FDT((fdt_finish(fdt)));
 
+    /* re-expand to allow for further tweaks */
+    _FDT((fdt_open_into(fdt, fdt, FDT_MAX_SIZE)));
+
+    ret = spapr_populate_vdevice(spapr->vio_bus, fdt);
+    if (ret < 0)
+        fprintf(stderr, "couldn't setup vio devices in fdt\n");
+
+    _FDT((fdt_pack(fdt)));
+
     if (fdt_size)
         *fdt_size = fdt_totalsize(fdt);
 
@@ -211,6 +233,12 @@ static void ppc_spapr_init (ram_addr_t ram_size,
     ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
     cpu_register_physical_memory(0, ram_size, ram_offset);
 
+    spapr->vio_bus = spapr_vio_bus_init();
+
+    for (i = 0; i < MAX_SERIAL_PORTS; i++)
+        if (serial_hds[i])
+            spapr_vty_create(spapr->vio_bus, i, serial_hds[i]);
+
     if (kernel_filename) {
         uint64_t lowaddr = 0;
 
@@ -242,7 +270,7 @@ static void ppc_spapr_init (ram_addr_t ram_size,
         }
 
         /* load fdt */
-        fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, &env,
+        fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, &env, spapr,
                                initrd_base, initrd_size,
                                kernel_cmdline);
         if (!fdt) {
@@ -267,6 +295,7 @@ static QEMUMachine spapr_machine = {
     .desc = "pSeries Logical Partition (PAPR compliant)",
     .init = ppc_spapr_init,
     .max_cpus = 1,
+    .no_parallel = 1,
     .no_vga = 1,
     .no_parallel = 1,
 };
diff --git a/hw/spapr.h b/hw/spapr.h
index dae9617..168511f 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -1,7 +1,10 @@
 #if !defined (__HW_SPAPR_H__)
 #define __HW_SPAPR_H__
 
+struct VIOsPAPRBus;
+
 typedef struct sPAPREnvironment {
+    struct VIOsPAPRBus *vio_bus;
 } sPAPREnvironment;
 
 #define H_SUCCESS         0
@@ -237,4 +240,11 @@ typedef struct sPAPREnvironment {
 target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
                              target_ulong token, target_ulong *args);
 
+target_ulong h_put_term_char(sPAPREnvironment *spapr,
+                             target_ulong termno, target_ulong len,
+                             target_ulong char0_7, target_ulong char8_15);
+target_ulong h_get_term_char(sPAPREnvironment *spapr,
+                             target_ulong termno, target_ulong *len,
+                             target_ulong *char0_7, target_ulong *char8_15);
+
 #endif /* !defined (__HW_SPAPR_H__) */
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
index c99c345..e2ed9cf 100644
--- a/hw/spapr_hcall.c
+++ b/hw/spapr_hcall.c
@@ -3,19 +3,6 @@
 #include "qemu-char.h"
 #include "hw/spapr.h"
 
-static target_ulong h_put_term_char(target_ulong termno, target_ulong len,
-                                    target_ulong char0_7, target_ulong char8_15)
-{
-    uint8_t buf[16];
-
-    *((uint64_t *)buf) = cpu_to_be64(char0_7);
-    *((uint64_t *)buf + 1) = cpu_to_be64(char8_15);
-
-    qemu_chr_write(serial_hds[0], buf, len);
-
-    return 0;
-}
-
 target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
                              target_ulong token, target_ulong *args)
 {
@@ -29,7 +16,11 @@ target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
 
     switch (token) {
     case H_PUT_TERM_CHAR:
-        r = h_put_term_char(args[0], args[1], args[2], args[3]);
+        r = h_put_term_char(spapr, args[0], args[1], args[2], args[3]);
+        break;
+
+    case H_GET_TERM_CHAR:
+        r = h_get_term_char(spapr, args[0], &args[0], &args[1], &args[2]);
         break;
 
     default:
diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
new file mode 100644
index 0000000..d9c7292
--- /dev/null
+++ b/hw/spapr_vio.c
@@ -0,0 +1,191 @@
+/*
+ * QEMU sPAPR VIO code
+ *
+ * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
+ * Based on the s390 virtio bus code:
+ * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "hw.h"
+#include "sysemu.h"
+#include "boards.h"
+#include "monitor.h"
+#include "loader.h"
+#include "elf.h"
+#include "hw/sysbus.h"
+#include "kvm.h"
+#include "device_tree.h"
+
+#include "hw/spapr.h"
+#include "hw/spapr_vio.h"
+
+#ifdef CONFIG_FDT
+#include <libfdt.h>
+#endif /* CONFIG_FDT */
+
+/* #define DEBUG_SPAPR */
+
+#ifdef DEBUG_SPAPR
+#define dprintf(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define dprintf(fmt, ...) \
+    do { } while (0)
+#endif
+
+struct BusInfo spapr_vio_bus_info = {
+    .name       = "spapr-vio",
+    .size       = sizeof(VIOsPAPRBus),
+};
+
+VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg)
+{
+    DeviceState *qdev;
+    VIOsPAPRDevice *dev;
+
+    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
+        dev = (VIOsPAPRDevice *)qdev;
+        if (dev->reg == reg)
+            return dev;
+    }
+
+    return NULL;
+}
+
+VIOsPAPRBus *spapr_vio_bus_init(void)
+{
+    VIOsPAPRBus *bus;
+    BusState *_bus;
+    DeviceState *dev;
+
+    /* Create bridge device */
+    dev = qdev_create(NULL, "spapr-vio-bridge");
+    qdev_init_nofail(dev);
+
+    /* Create bus on bridge device */
+
+    _bus = qbus_create(&spapr_vio_bus_info, dev, "spapr-vio");
+    bus = DO_UPCAST(VIOsPAPRBus, bus, _bus);
+
+    return bus;
+}
+
+#ifdef CONFIG_FDT
+static int vio_make_devnode(VIOsPAPRDevice *dev,
+                            void *fdt)
+{
+    VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)dev->qdev.info;
+    int vdevice_off, node_off;
+    int ret;
+
+    vdevice_off = fdt_path_offset(fdt, "/vdevice");
+    if (vdevice_off < 0)
+        return vdevice_off;
+
+    node_off = fdt_add_subnode(fdt, vdevice_off, dev->qdev.id);
+    if (node_off < 0)
+        return node_off;
+
+    ret = fdt_setprop_cell(fdt, node_off, "reg", dev->reg);
+    if (ret < 0)
+        return ret;
+
+    if (info->dt_type) {
+        ret = fdt_setprop_string(fdt, node_off, "device_type",
+                                 info->dt_type);
+        if (ret < 0)
+            return ret;
+    }
+
+    if (info->dt_compatible) {
+        ret = fdt_setprop_string(fdt, node_off, "compatible",
+                                 info->dt_compatible);
+        if (ret < 0)
+            return ret;
+    }
+
+    if (info->devnode) {
+        ret = (info->devnode)(dev, fdt, node_off);
+        if (ret < 0)
+            return ret;
+    }
+
+    return node_off;
+}
+#endif /* CONFIG_FDT */
+
+static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
+{
+    VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
+    VIOsPAPRDevice *_dev = (VIOsPAPRDevice *)dev;
+    char *id;
+
+    if (asprintf(&id, "%s@%x", _info->dt_name, _dev->reg) < 0)
+        return -1;
+
+    _dev->qdev.id = id;
+
+    return _info->init(_dev);
+}
+
+void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info)
+{
+    info->qdev.init = spapr_vio_busdev_init;
+    info->qdev.bus_info = &spapr_vio_bus_info;
+
+    assert(info->qdev.size >= sizeof(VIOsPAPRDevice));
+    qdev_register(&info->qdev);
+}
+
+static int spapr_vio_bridge_init(SysBusDevice *dev)
+{
+    /* nothing */
+    return 0;
+}
+
+static SysBusDeviceInfo spapr_vio_bridge_info = {
+    .init = spapr_vio_bridge_init,
+    .qdev.name  = "spapr-vio-bridge",
+    .qdev.size  = sizeof(SysBusDevice),
+    .qdev.no_user = 1,
+};
+
+static void spapr_vio_register_devices(void)
+{
+    sysbus_register_withprop(&spapr_vio_bridge_info);
+}
+
+device_init(spapr_vio_register_devices)
+
+#ifdef CONFIG_FDT
+
+int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt)
+{
+    DeviceState *qdev;
+    int ret = 0;
+
+    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
+        VIOsPAPRDevice *dev = (VIOsPAPRDevice *)qdev;
+
+        ret = vio_make_devnode(dev, fdt);
+
+        if (ret < 0)
+            return ret;
+    }
+    
+    return 0;
+}
+#endif /* CONFIG_FDT */
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
new file mode 100644
index 0000000..fb5e301
--- /dev/null
+++ b/hw/spapr_vio.h
@@ -0,0 +1,49 @@
+#ifndef _HW_SPAPR_VIO_H
+#define _HW_SPAPR_VIO_H
+/*
+ * QEMU sPAPR VIO bus definitions
+ *
+ * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
+ * Based on the s390 virtio bus definitions:
+ * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+typedef struct VIOsPAPRDevice {
+    DeviceState qdev;
+    uint32_t reg;
+} VIOsPAPRDevice;
+
+typedef struct VIOsPAPRBus {
+    BusState bus;
+} VIOsPAPRBus;
+
+typedef struct {
+    DeviceInfo qdev;
+    const char *dt_name, *dt_type, *dt_compatible;
+    int (*init)(VIOsPAPRDevice *dev);
+    int (*devnode)(VIOsPAPRDevice *dev, void *fdt, int node_off);
+} VIOsPAPRDeviceInfo;
+
+extern VIOsPAPRBus *spapr_vio_bus_init(void);
+extern VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg);
+extern void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info);
+extern int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt);
+
+void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len);
+void spapr_vty_create(VIOsPAPRBus *bus,
+                      uint32_t reg, CharDriverState *chardev);
+
+#endif /* _HW_SPAPR_VIO_H */
diff --git a/hw/spapr_vty.c b/hw/spapr_vty.c
new file mode 100644
index 0000000..9a2dc0b
--- /dev/null
+++ b/hw/spapr_vty.c
@@ -0,0 +1,132 @@
+#include "qdev.h"
+#include "qemu-char.h"
+#include "hw/spapr.h"
+#include "hw/spapr_vio.h"
+
+#define VTERM_BUFSIZE   16
+
+typedef struct VIOsPAPRVTYDevice {
+    VIOsPAPRDevice sdev;
+    CharDriverState *chardev;
+    uint32_t in, out;
+    uint8_t buf[VTERM_BUFSIZE];
+} VIOsPAPRVTYDevice;
+
+static int vty_can_receive(void *opaque)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
+
+    return (dev->in - dev->out) < VTERM_BUFSIZE;
+}
+
+static void vty_receive(void *opaque, const uint8_t *buf, int size)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
+    int i;
+
+    for (i = 0; i < size; i++) {
+        assert((dev->in - dev->out) < VTERM_BUFSIZE);
+        dev->buf[dev->in++ % VTERM_BUFSIZE] = buf[i];
+    }
+}
+
+static int vty_getchars(VIOsPAPRDevice *sdev, uint8_t *buf, int max)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
+    int n = 0;
+
+    while ((n < max) && (dev->out != dev->in))
+        buf[n++] = dev->buf[dev->out++ % VTERM_BUFSIZE];
+
+    return n;
+}
+
+void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
+
+    /* FIXME: should check the qemu_chr_write() return value */
+    qemu_chr_write(dev->chardev, buf, len);
+}
+
+static int spapr_vty_init(VIOsPAPRDevice *sdev)
+{
+    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
+
+    qemu_chr_add_handlers(dev->chardev, vty_can_receive,
+                          vty_receive, NULL, dev);
+
+    return 0;
+}
+
+target_ulong h_put_term_char(sPAPREnvironment *spapr,
+                             target_ulong termno, target_ulong len,
+                             target_ulong char0_7, target_ulong char8_15)
+{
+    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, termno);
+    uint8_t buf[16];
+
+    if (!sdev)
+        return H_PARAMETER;
+
+    if (len > 16)
+        return H_PARAMETER;
+
+    *((uint64_t *)buf) = cpu_to_be64(char0_7);
+    *((uint64_t *)buf + 1) = cpu_to_be64(char8_15);
+
+    vty_putchars(sdev, buf, len);
+
+    return 0;
+}
+
+target_ulong h_get_term_char(sPAPREnvironment *spapr,
+                             target_ulong termno, target_ulong *len,
+                             target_ulong *char0_7, target_ulong *char8_15)
+{
+    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, termno);
+    uint8_t buf[16];
+
+    if (!sdev)
+        return H_PARAMETER;
+
+    *len = vty_getchars(sdev, buf, sizeof(buf));
+    if (*len < 16)
+        memset(buf + *len, 0, 16 - *len);
+
+    *char0_7 = be64_to_cpu(*((uint64_t *)buf));
+    *char8_15 = be64_to_cpu(*((uint64_t *)buf + 1));
+
+    return H_SUCCESS;
+}
+
+void spapr_vty_create(VIOsPAPRBus *bus,
+                      uint32_t reg, CharDriverState *chardev)
+{
+    DeviceState *dev;
+
+    dev = qdev_create(&bus->bus, "spapr-vty");
+    qdev_prop_set_uint32(dev, "reg", reg);
+    qdev_prop_set_chr(dev, "chardev", chardev);
+    qdev_init_nofail(dev);
+}
+
+static VIOsPAPRDeviceInfo spapr_vty = {
+    .init = spapr_vty_init,
+    .dt_name = "vty",
+    .dt_type = "serial",
+    .dt_compatible = "hvterm1",
+    .qdev.name = "spapr-vty",
+    .qdev.size = sizeof(VIOsPAPRVTYDevice),
+    .qdev.props = (Property[]) {
+        DEFINE_PROP_UINT32("reg", VIOsPAPRDevice, reg, 0),
+        DEFINE_PROP_CHR("chardev", VIOsPAPRVTYDevice, chardev),
+        DEFINE_PROP_END_OF_LIST(),
+    },
+};
+
+static void spapr_vty_register(void)
+{
+    spapr_vio_bus_register_withprop(&spapr_vty);
+}
+device_init(spapr_vty_register);
-- 
1.7.1
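
The VTY input path above uses two free-running 32-bit counters (in, out)
indexed modulo VTERM_BUFSIZE. A minimal stand-alone sketch of that
technique follows, assuming nothing from qemu: the names vterm_ring,
ring_space, ring_put and ring_get are hypothetical and only mirror the
in/out/buf fields of VIOsPAPRVTYDevice. Because 2^32 is a multiple of 16,
the counters may overflow without corrupting the index.

```c
#include <assert.h>
#include <stdint.h>

#define VTERM_BUFSIZE 16

/* Hypothetical stand-in for the in/out/buf fields of VIOsPAPRVTYDevice. */
typedef struct {
    uint32_t in, out;               /* free-running, never reset to 0 */
    uint8_t buf[VTERM_BUFSIZE];
} vterm_ring;

/* Number of bytes that can still be queued. */
static int ring_space(const vterm_ring *r)
{
    return VTERM_BUFSIZE - (int)(r->in - r->out);
}

/* Queue size bytes; the caller must not exceed ring_space(). */
static void ring_put(vterm_ring *r, const uint8_t *src, int size)
{
    int i;

    for (i = 0; i < size; i++) {
        assert((r->in - r->out) < VTERM_BUFSIZE);
        r->buf[r->in++ % VTERM_BUFSIZE] = src[i];
    }
}

/* Dequeue up to max bytes, returning how many were copied. */
static int ring_get(vterm_ring *r, uint8_t *dst, int max)
{
    int n = 0;

    while (n < max && r->out != r->in) {
        dst[n++] = r->buf[r->out++ % VTERM_BUFSIZE];
    }
    return n;
}
```

Note that vty_can_receive() in the patch returns only 0 or 1, so the
character backend would hand over at most one byte per call. Returning
the free-space count, as ring_space() does here, is a possible variation
and not what the patch implements.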

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 02/15] Clean up PowerPC SLB handling code
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 02/15] Clean up PowerPC SLB handling code David Gibson
@ 2011-02-12 15:17   ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:17 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> Currently the SLB information when emulating a PowerPC 970 is
> stored in a structure with the unhelpfully named fields 'tmp'
> and 'tmp64'.  While the layout in these fields does match the
> description of the SLB in the architecture document, it is not
> convenient either for looking up the SLB, or for emulating the
> slbmte instruction.
> 
> This patch, therefore, reorganizes the SLB entry structure to be
> divided into the "ESID related" and "VSID related" fields as
> they are divided in instructions accessing the SLB.
> 
> In addition to making the code smaller and more readable, this will
> make it easier to implement support for the 1TB segments used in more
> recent PowerPC chips.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>

Acked-by: Alexander Graf <agraf@suse.de>

> ---
> target-ppc/cpu.h       |   29 +++++++-
> target-ppc/helper.c    |  178 ++++++++++++++----------------------------------
> target-ppc/helper.h    |    1 -
> target-ppc/op_helper.c |    9 +--
> 4 files changed, 80 insertions(+), 137 deletions(-)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index deb8d7c..a20c132 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -43,6 +43,8 @@
> # define TARGET_VIRT_ADDR_SPACE_BITS 64
> #endif
> 
> +#define TARGET_PAGE_BITS_16M 24
> +
> #else /* defined (TARGET_PPC64) */
> /* PowerPC 32 definitions */
> #define TARGET_LONG_BITS 32
> @@ -359,10 +361,31 @@ union ppc_tlb_t {
> 
> typedef struct ppc_slb_t ppc_slb_t;
> struct ppc_slb_t {
> -    uint64_t tmp64;
> -    uint32_t tmp;
> +    uint64_t esid;
> +    uint64_t vsid;
> };
> 
> +/* Bits in the SLB ESID word */
> +#define SLB_ESID_ESID           0xFFFFFFFFF0000000ULL
> +#define SLB_ESID_V              0x0000000008000000ULL /* valid */
> +
> +/* Bits in the SLB VSID word */
> +#define SLB_VSID_SHIFT          12
> +#define SLB_VSID_SSIZE_SHIFT    62
> +#define SLB_VSID_B              0xc000000000000000ULL
> +#define SLB_VSID_B_256M         0x0000000000000000ULL
> +#define SLB_VSID_VSID           0x3FFFFFFFFFFFF000ULL
> +#define SLB_VSID_KS             0x0000000000000800ULL
> +#define SLB_VSID_KP             0x0000000000000400ULL
> +#define SLB_VSID_N              0x0000000000000200ULL /* no-execute */
> +#define SLB_VSID_L              0x0000000000000100ULL
> +#define SLB_VSID_C              0x0000000000000080ULL /* class */
> +#define SLB_VSID_LP             0x0000000000000030ULL
> +#define SLB_VSID_ATTR           0x0000000000000FFFULL
> +
> +#define SEGMENT_SHIFT_256M      28
> +#define SEGMENT_MASK_256M       ~((1ULL << SEGMENT_SHIFT_256M) - 1)
> +
> /*****************************************************************************/
> /* Machine state register bits definition                                    */
> #define MSR_SF   63 /* Sixty-four-bit mode                            hflags */
> @@ -755,7 +778,7 @@ void ppc_store_sdr1 (CPUPPCState *env, target_ulong value);
> void ppc_store_asr (CPUPPCState *env, target_ulong value);
> target_ulong ppc_load_slb (CPUPPCState *env, int slb_nr);
> target_ulong ppc_load_sr (CPUPPCState *env, int sr_nr);
> -void ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs);
> +int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs);
> #endif /* defined(TARGET_PPC64) */
> void ppc_store_sr (CPUPPCState *env, int srnum, target_ulong value);
> #endif /* !defined(CONFIG_USER_ONLY) */
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index 4b49101..2094ca3 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -672,85 +672,36 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
> }
> 
> #if defined(TARGET_PPC64)
> -static ppc_slb_t *slb_get_entry(CPUPPCState *env, int nr)
> -{
> -    ppc_slb_t *retval = &env->slb[nr];
> -
> -#if 0 // XXX implement bridge mode?
> -    if (env->spr[SPR_ASR] & 1) {
> -        target_phys_addr_t sr_base;
> -
> -        sr_base = env->spr[SPR_ASR] & 0xfffffffffffff000;
> -        sr_base += (12 * nr);
> -
> -        retval->tmp64 = ldq_phys(sr_base);
> -        retval->tmp = ldl_phys(sr_base + 8);
> -    }
> -#endif
> -
> -    return retval;
> -}
> -
> -static void slb_set_entry(CPUPPCState *env, int nr, ppc_slb_t *slb)
> -{
> -    ppc_slb_t *entry = &env->slb[nr];
> -
> -    if (slb == entry)
> -        return;
> -
> -    entry->tmp64 = slb->tmp64;
> -    entry->tmp = slb->tmp;
> -}
> -
> -static inline int slb_is_valid(ppc_slb_t *slb)
> -{
> -    return (int)(slb->tmp64 & 0x0000000008000000ULL);
> -}
> -
> -static inline void slb_invalidate(ppc_slb_t *slb)
> -{
> -    slb->tmp64 &= ~0x0000000008000000ULL;
> -}
> -
> static inline int slb_lookup(CPUPPCState *env, target_ulong eaddr,
>                              target_ulong *vsid, target_ulong *page_mask,
>                              int *attr, int *target_page_bits)
> {
> -    target_ulong mask;
> -    int n, ret;
> +    uint64_t esid;
> +    int n;
> 
> -    ret = -5;
>     LOG_SLB("%s: eaddr " TARGET_FMT_lx "\n", __func__, eaddr);
> -    mask = 0x0000000000000000ULL; /* Avoid gcc warning */
> +
> +    esid = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;
> +
>     for (n = 0; n < env->slb_nr; n++) {
> -        ppc_slb_t *slb = slb_get_entry(env, n);
> -
> -        LOG_SLB("%s: seg %d %016" PRIx64 " %08"
> -                    PRIx32 "\n", __func__, n, slb->tmp64, slb->tmp);
> -        if (slb_is_valid(slb)) {
> -            /* SLB entry is valid */
> -            mask = 0xFFFFFFFFF0000000ULL;
> -            if (slb->tmp & 0x8) {
> -                /* 16 MB PTEs */
> -                if (target_page_bits)
> -                    *target_page_bits = 24;
> -            } else {
> -                /* 4 KB PTEs */
> -                if (target_page_bits)
> -                    *target_page_bits = TARGET_PAGE_BITS;
> -            }
> -            if ((eaddr & mask) == (slb->tmp64 & mask)) {
> -                /* SLB match */
> -                *vsid = ((slb->tmp64 << 24) | (slb->tmp >> 8)) & 0x0003FFFFFFFFFFFFULL;
> -                *page_mask = ~mask;
> -                *attr = slb->tmp & 0xFF;
> -                ret = n;
> -                break;
> +        ppc_slb_t *slb = &env->slb[n];
> +
> +        LOG_SLB("%s: slot %d %016" PRIx64 " %016"
> +                    PRIx64 "\n", __func__, n, slb->esid, slb->vsid);
> +        if (slb->esid == esid) {
> +            *vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
> +            *page_mask = ~SEGMENT_MASK_256M;
> +            *attr = slb->vsid & SLB_VSID_ATTR;
> +            if (target_page_bits) {
> +                *target_page_bits = (slb->vsid & SLB_VSID_L)
> +                    ? TARGET_PAGE_BITS_16M
> +                    : TARGET_PAGE_BITS;
>             }
> +            return n;
>         }
>     }
> 
> -    return ret;
> +    return -5;

While at it, -5 really is not very verbose :). Doesn't have to be addressed in this patch though.


Alex
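
For readers following the new layout, here is a minimal sketch of how
the VSID word is decoded with the masks from the quoted cpu.h hunk. The
slb_fields struct and slb_decode_vsid_word() are hypothetical names used
only for illustration, and PAGE_BITS_4K assumes TARGET_PAGE_BITS is 12.

```c
#include <assert.h>
#include <stdint.h>

/* Masks copied from the quoted target-ppc/cpu.h hunk. */
#define SLB_VSID_SHIFT   12
#define SLB_VSID_VSID    0x3FFFFFFFFFFFF000ULL
#define SLB_VSID_L       0x0000000000000100ULL
#define SLB_VSID_ATTR    0x0000000000000FFFULL

#define PAGE_BITS_4K     12   /* assumed value of TARGET_PAGE_BITS */
#define PAGE_BITS_16M    24   /* TARGET_PAGE_BITS_16M in the patch */

/* Hypothetical container for the decoded fields. */
typedef struct {
    uint64_t vsid;      /* virtual segment id, shifted down */
    uint32_t attr;      /* low 12 attribute bits (KS, KP, N, L, C, LP) */
    int page_bits;      /* 24 for large pages, 12 otherwise */
} slb_fields;

static slb_fields slb_decode_vsid_word(uint64_t vsid_word)
{
    slb_fields f;

    f.vsid = (vsid_word & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
    f.attr = (uint32_t)(vsid_word & SLB_VSID_ATTR);
    f.page_bits = (vsid_word & SLB_VSID_L) ? PAGE_BITS_16M : PAGE_BITS_4K;
    return f;
}
```

This is the same arithmetic slb_lookup() performs on a match in the
patch, pulled out so it can be exercised without the rest of the MMU
code.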

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 03/15] Allow qemu_devtree_setprop() to take arbitrary values
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 03/15] Allow qemu_devtree_setprop() to take arbitrary values David Gibson
@ 2011-02-12 15:18   ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:18 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> From: David Gibson <dwg@au1.ibm.com>
> 
> Currently qemu_devtree_setprop() expects the new property value to be
> given as a uint32_t *.  While property values consisting of u32s are
> common, in general they can have any bytestring value.
> 
> Therefore, this patch alters the function to take a void * instead,
> allowing callers to easily give anything as the property value.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Acked-by: Alexander Graf <agraf@suse.de>


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 04/15] Add a hook to allow hypercalls to be emulated on PowerPC
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 04/15] Add a hook to allow hypercalls to be emulated on PowerPC David Gibson
@ 2011-02-12 15:19   ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:19 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> From: David Gibson <dwg@au1.ibm.com>
> 
> PowerPC and POWER chips since the POWER4 and 970 have a special
> hypervisor mode, and a corresponding form of the system call
> instruction which traps to the hypervisor.
> 
> qemu currently has stub implementations of hypervisor mode.  That
> is, the outline is there to allow qemu to run a PowerPC hypervisor
> under emulation.  There are a number of details missing so this
> won't actually work at present, but the idea is there.
> 
> What there is no provision for at all is qemu instead emulating
> the hypervisor itself; that is, having hypercalls trap into qemu
> and their results be emulated by qemu, rather than running
> hypervisor code within the emulated system.
> 
> Hypervisor-hardware-aware KVM implementations are in the works and
> it would be useful for debugging and development to also allow
> full emulation of the same para-virtualized guests as such a KVM.
> 
> Therefore, this patch adds a hook which will allow a machine to
> set up emulation of hypervisor calls.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>

Acked-by: Alexander Graf <agraf@suse.de>


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 05/15] Implement PowerPC slbmfee and slbmfev instructions
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 05/15] Implement PowerPC slbmfee and slbmfev instructions David Gibson
@ 2011-02-12 15:23   ` Alexander Graf
  2011-02-13 12:46     ` David Gibson
  0 siblings, 1 reply; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:23 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> From: David Gibson <dwg@au1.ibm.com>
> 
> For a 64-bit PowerPC target, qemu correctly implements translation
> through the segment lookaside buffer.  Likewise it supports the
> slbmte instruction which is used to load entries into the SLB.
> 
> However, it does not emulate the slbmfee and slbmfev instructions
> which read SLB entries back into registers.  Because these are
> only occasionally used in guests (mostly for debugging) we get
> away with it.
> 
> However, given the recent SLB cleanups, it becomes quite easy to
> implement these, and thereby allow, amongst other things, a guest
> Linux to use xmon's command to dump the SLB.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
> target-ppc/cpu.h       |    2 ++
> target-ppc/helper.c    |   26 ++++++++++++++++++++++++++
> target-ppc/helper.h    |    2 ++
> target-ppc/op_helper.c |   20 ++++++++++++++++++++
> target-ppc/translate.c |   29 ++++++++++++++++++++++++++++-
> 5 files changed, 78 insertions(+), 1 deletions(-)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index eaddc27..9a7495a 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -781,6 +781,8 @@ void ppc_store_asr (CPUPPCState *env, target_ulong value);
> target_ulong ppc_load_slb (CPUPPCState *env, int slb_nr);
> target_ulong ppc_load_sr (CPUPPCState *env, int sr_nr);
> int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs);
> +int ppc_load_slb_esid (CPUPPCState *env, target_ulong rb, target_ulong *rt);
> +int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt);
> #endif /* defined(TARGET_PPC64) */
> void ppc_store_sr (CPUPPCState *env, int srnum, target_ulong value);
> #endif /* !defined(CONFIG_USER_ONLY) */
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index 19aa067..4830981 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -770,6 +770,32 @@ int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
> 
>     return 0;
> }
> +
> +int ppc_load_slb_esid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
> +{
> +    int slot = rb & 0xfff;
> +    ppc_slb_t *slb = &env->slb[slot];
> +
> +    if (slot >= env->slb_nr) {
> +        return -1;
> +    }
> +
> +    *rt = slb->esid;
> +    return 0;
> +}
> +
> +int ppc_load_slb_vsid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
> +{
> +    int slot = rb & 0xfff;
> +    ppc_slb_t *slb = &env->slb[slot];
> +
> +    if (slot >= env->slb_nr) {
> +        return -1;
> +    }
> +
> +    *rt = slb->vsid;
> +    return 0;
> +}
> #endif /* defined(TARGET_PPC64) */
> 
> /* Perform segment based translation */
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index d512cb0..1a69cf8 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -341,6 +341,8 @@ DEF_HELPER_FLAGS_0(tlbia, TCG_CALL_CONST, void)
> DEF_HELPER_FLAGS_1(tlbie, TCG_CALL_CONST, void, tl)
> #if defined(TARGET_PPC64)
> DEF_HELPER_FLAGS_2(store_slb, TCG_CALL_CONST, void, tl, tl)
> +DEF_HELPER_1(load_slb_esid, tl, tl)
> +DEF_HELPER_1(load_slb_vsid, tl, tl)
> DEF_HELPER_FLAGS_0(slbia, TCG_CALL_CONST, void)
> DEF_HELPER_FLAGS_1(slbie, TCG_CALL_CONST, void, tl)
> #endif
> diff --git a/target-ppc/op_helper.c b/target-ppc/op_helper.c
> index bf41627..bdb1f17 100644
> --- a/target-ppc/op_helper.c
> +++ b/target-ppc/op_helper.c
> @@ -3753,6 +3753,26 @@ void helper_store_slb (target_ulong rb, target_ulong rs)
>     }
> }
> 
> +target_ulong helper_load_slb_esid (target_ulong rb)
> +{
> +    target_ulong rt;
> +
> +    if (ppc_load_slb_esid(env, rb, &rt) < 0) {
> +        helper_raise_exception_err(POWERPC_EXCP_PROGRAM, POWERPC_EXCP_INVAL);

The spec doesn't say what to do in this case. Have you checked what real hardware does?


Alex
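
Independent of what real hardware does here, the quoted helper forms
&env->slb[slot] before range-checking slot. A minimal sketch of the same
read with the check done first is below. The cpu_state and slb_entry
types and the SLB_SLOTS value are hypothetical stand-ins, not qemu
definitions, and the -1 return only signals "slot out of range" to the
caller; what the caller should then do is exactly the open question
above.

```c
#include <assert.h>
#include <stdint.h>

#define SLB_SLOTS 64            /* assumed array size, illustration only */

typedef struct {
    uint64_t esid;
    uint64_t vsid;
} slb_entry;                    /* stand-in for ppc_slb_t */

typedef struct {
    int slb_nr;                 /* number of implemented SLB slots */
    slb_entry slb[SLB_SLOTS];
} cpu_state;                    /* hypothetical stand-in for CPUPPCState */

/* Read the ESID word of the slot selected by the low 12 bits of rb. */
static int load_slb_esid(const cpu_state *env, uint64_t rb, uint64_t *rt)
{
    unsigned slot = rb & 0xfff;

    if (slot >= (unsigned)env->slb_nr) {
        return -1;              /* out of range: nothing is read */
    }

    *rt = env->slb[slot].esid;
    return 0;
}
```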

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 06/15] Implement missing parts of the logic for the POWER PURR
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 06/15] Implement missing parts of the logic for the POWER PURR David Gibson
@ 2011-02-12 15:25   ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:25 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> From: David Gibson <dwg@au1.ibm.com>
> 
> The PURR (Processor Utilization Resource Register) is a register found
> on recent POWER CPUs.  The guts of implementing it at least enough to
> get by are already present in qemu, however some of the helper
> functions needed to actually wire it up are missing.
> 
> This patch adds the necessary glue, so that the PURR can be wired up
> when we implement newer POWER CPU targets which include it.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>

Yay, so we can finally emulate POWER5 guests :). Please keep in mind that PURR is also missing in kvm code.

Acked-by: Alexander Graf <agraf@suse.de>


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 07/15] Correct ppc popcntb logic, implement popcntw and popcntd
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 07/15] Correct ppc popcntb logic, implement popcntw and popcntd David Gibson
@ 2011-02-12 15:27   ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:27 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> From: David Gibson <dwg@au1.ibm.com>
> 
> qemu already includes support for the popcntb instruction introduced
> in POWER5 (although it doesn't actually allow you to choose POWER5).
> 
> However, the logic is slightly incorrect: it will generate results
> truncated to 32 bits when the CPU is in 32-bit mode.  This is not
> normal for powerpc - generally arithmetic instructions on a 64-bit
> powerpc cpu will generate full 64 bit results, it's just that only the
> low 32 bits will be significant for condition codes.
> 
> This patch corrects this nit, which actually simplifies the code slightly.
> 
> In addition, this patch implements the popcntw and popcntd
> instructions added in POWER7, in preparation for allowing POWER7 as an
> emulated CPU.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>

I trust you on the implementation details of this one. Rest looks good.

Acked-by: Alexander Graf <agraf@suse.de>


Alex
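
As a reference for the semantics discussed in the commit message, here
is a minimal plain-C sketch of the three instructions operating on a
full 64-bit register value: popcntb counts bits per byte, popcntw per
32-bit word, popcntd over the whole doubleword. The function names are
hypothetical and this is not the TCG code from the patch, just the
arithmetic it has to reproduce.

```c
#include <assert.h>
#include <stdint.h>

/* popcntb: each result byte holds the bit count of the same source byte. */
static uint64_t popcntb64(uint64_t v)
{
    v = (v & 0x5555555555555555ULL) + ((v >> 1) & 0x5555555555555555ULL);
    v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
    v = (v & 0x0f0f0f0f0f0f0f0fULL) + ((v >> 4) & 0x0f0f0f0f0f0f0f0fULL);
    return v;
}

/* popcntw: each 32-bit half holds the bit count of the same source half. */
static uint64_t popcntw64(uint64_t v)
{
    v = popcntb64(v);
    v = (v & 0x00ff00ff00ff00ffULL) + ((v >> 8) & 0x00ff00ff00ff00ffULL);
    v = (v & 0x0000ffff0000ffffULL) + ((v >> 16) & 0x0000ffff0000ffffULL);
    return v;
}

/* popcntd: bit count of the whole 64-bit value. */
static uint64_t popcntd64(uint64_t v)
{
    v = popcntw64(v);
    return (v & 0xffffffffULL) + (v >> 32);
}
```

None of these truncate to 32 bits, which matches the point made in the
commit message about 64-bit results in 32-bit mode.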

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 08/15] Clean up slb_lookup() function
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 08/15] Clean up slb_lookup() function David Gibson
@ 2011-02-12 15:30   ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:30 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> The slb_lookup() function, used in the ppc translation path, returns a
> number of slb entry fields in reference parameters.  However, only one
> of the two callers of slb_lookup() actually wants this information.
> 
> This patch, therefore, makes slb_lookup() return a simple pointer to the
> located SLB entry (or NULL), and the caller which needs the fields can
> extract them itself.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
> target-ppc/helper.c |   45 ++++++++++++++++++---------------------------
> 1 files changed, 18 insertions(+), 27 deletions(-)
> 
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index 4830981..73d93ca 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -672,9 +672,7 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
> }
> 
> #if defined(TARGET_PPC64)
> -static inline int slb_lookup(CPUPPCState *env, target_ulong eaddr,
> -                             target_ulong *vsid, target_ulong *page_mask,
> -                             int *attr, int *target_page_bits)
> +static inline ppc_slb_t *slb_lookup(CPUPPCState *env, target_ulong eaddr)
> {
>     uint64_t esid;
>     int n;
> @@ -689,19 +687,11 @@ static inline int slb_lookup(CPUPPCState *env, target_ulong eaddr,
>         LOG_SLB("%s: slot %d %016" PRIx64 " %016"
>                     PRIx64 "\n", __func__, n, slb->esid, slb->vsid);
>         if (slb->esid == esid) {
> -            *vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
> -            *page_mask = ~SEGMENT_MASK_256M;
> -            *attr = slb->vsid & SLB_VSID_ATTR;
> -            if (target_page_bits) {
> -                *target_page_bits = (slb->vsid & SLB_VSID_L)
> -                    ? TARGET_PAGE_BITS_16M
> -                    : TARGET_PAGE_BITS;
> -            }
> -            return n;
> +            return slb;
>         }
>     }
> 
> -    return -5;
> +    return NULL;

Hah - there goes the -5 :).

> }
> 
> void ppc_slb_invalidate_all (CPUPPCState *env)
> @@ -728,18 +718,13 @@ void ppc_slb_invalidate_all (CPUPPCState *env)
> 
> void ppc_slb_invalidate_one (CPUPPCState *env, uint64_t T0)
> {
> -    target_ulong vsid, page_mask;
> -    int attr;
> -    int n;
>     ppc_slb_t *slb;
> 
> -    n = slb_lookup(env, T0, &vsid, &page_mask, &attr, NULL);
> -    if (n < 0) {
> +    slb = slb_lookup(env, T0);
> +    if (!slb) {
>         return;
>     }
> 
> -    slb = &env->slb[n];
> -
>     if (slb->esid & SLB_ESID_V) {
>         slb->esid &= ~SLB_ESID_V;
> 
> @@ -818,16 +803,22 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
>     pr = msr_pr;
> #if defined(TARGET_PPC64)
>     if (env->mmu_model & POWERPC_MMU_64) {
> -        int attr;
> +        ppc_slb_t *slb;
> 
>         LOG_MMU("Check SLBs\n");
> -        ret = slb_lookup(env, eaddr, &vsid, &page_mask, &attr,
> -                         &target_page_bits);
> -        if (ret < 0)
> -            return ret;
> -        ctx->key = !!(pr ? (attr & SLB_VSID_KP) : (attr & SLB_VSID_KS));
> +        slb = slb_lookup(env, eaddr);
> +        if (!slb) {
> +            return -5;

And here it comes again - sigh :). Well, can't be helped. This at least keeps the current logic as is.
Very nice patch.

Acked-by: Alexander Graf <agraf@suse.de>


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread
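
The refactoring reviewed above can be sketched in isolation. This is a minimal, self-contained illustration, not the QEMU code: slb_entry, cpu_state, lookup_vsid() and demo_vsid_for() are hypothetical stand-ins, and the value used for SLB_ESID_V is an assumption (the other constants are copied from the quoted cpu.h hunks). It shows the lookup returning a pointer or NULL, with the one caller that needs the fields extracting them itself and mapping NULL back to the old -5 return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLB_MAX_ENTRIES   64                     /* hypothetical array bound */
#define SLB_ESID_V        0x0000000008000000ULL  /* assumed valid-bit value */
#define SLB_VSID_SHIFT    12
#define SLB_VSID_VSID     0x3FFFFFFFFFFFF000ULL
#define SEGMENT_MASK_256M (~((1ULL << 28) - 1))

typedef struct {
    uint64_t esid;
    uint64_t vsid;
} slb_entry;

typedef struct {
    slb_entry slb[SLB_MAX_ENTRIES];
    int slb_nr;
} cpu_state;

/* Return the matching SLB entry, or NULL when no entry matches. */
static slb_entry *slb_lookup(cpu_state *env, uint64_t eaddr)
{
    uint64_t esid = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;

    assert(env->slb_nr <= SLB_MAX_ENTRIES);
    for (int n = 0; n < env->slb_nr; n++) {
        slb_entry *slb = &env->slb[n];
        if (slb->esid == esid) {
            return slb;
        }
    }
    return NULL;
}

/* A caller that needs a field pulls it out of the entry itself and
 * keeps the historical -5 "no segment" code at its own level. */
static int lookup_vsid(cpu_state *env, uint64_t eaddr, uint64_t *vsid)
{
    slb_entry *slb = slb_lookup(env, eaddr);

    if (!slb) {
        return -5;
    }
    *vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
    return 0;
}

/* Usage: one valid entry covering the 256MB segment at 0x10000000.
 * Returns the VSID on a hit, UINT64_MAX on a miss. */
uint64_t demo_vsid_for(uint64_t eaddr)
{
    cpu_state env = { .slb_nr = 1 };
    uint64_t vsid;

    env.slb[0].esid = 0x10000000ULL | SLB_ESID_V;
    env.slb[0].vsid = 0x1234ULL << SLB_VSID_SHIFT;
    return lookup_vsid(&env, eaddr, &vsid) == 0 ? vsid : UINT64_MAX;
}
```

Callers that only need to know whether an entry exists, such as the invalidate path, can test the pointer and skip the field extraction entirely.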

* [Qemu-devel] Re: [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time David Gibson
@ 2011-02-12 15:37   ` Alexander Graf
  2011-02-13  9:02     ` David Gibson
  0 siblings, 1 reply; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:37 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> On ppc machines with hash table MMUs, the special purpose register SDR1
> contains both the base address and the encoded size of the (hashed) page
> tables.
> 
> At present, we interpret the SDR1 value within the address translation
> path.  But because the encodings of the size for 32-bit and 64-bit are
> different this makes for a confusing branch on the MMU type with a bunch
> of curly shifts and masks in the middle of the translate path.
> 
> This patch cleans things up by moving the interpretation of SDR1 into the
> helper function handling the write to the register.  This leaves a simple
> pre-sanitized base address and mask for the hash table in the CPUState
> structure which is easier to work with in the translation path.
> 
> This makes the translation path more readable.  It addresses the FIXME
> comment currently in the mtsdr1 helper, by validating the SDR1 value during
> interpretation.  Finally it opens the way for emulating a pSeries-style
> partition where the hash table used for translation is not mapped into
> the guest's RAM.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
> monitor.c                   |    2 +-
> target-ppc/cpu.h            |   11 +++++-
> target-ppc/helper.c         |   79 ++++++++++++++++++++++++-------------------
> target-ppc/machine.c        |    6 ++-
> target-ppc/translate.c      |    2 +-
> target-ppc/translate_init.c |    7 +---
> 6 files changed, 61 insertions(+), 46 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 7fc311d..3f77ffc 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -3457,7 +3457,7 @@ static const MonitorDef monitor_defs[] = {
>     { "asr", offsetof(CPUState, asr) },
> #endif
>     /* Segment registers */
> -    { "sdr1", offsetof(CPUState, sdr1) },
> +    { "sdr1", offsetof(CPUState, spr[SPR_SDR1]) },
>     { "sr0", offsetof(CPUState, sr[0]) },
>     { "sr1", offsetof(CPUState, sr[1]) },
>     { "sr2", offsetof(CPUState, sr[2]) },
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index f9ad3b8..4d30352 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -359,6 +359,14 @@ union ppc_tlb_t {
> };
> #endif
> 
> +#define SDR_HTABORG_32         0xFFFF0000UL
> +#define SDR_HTABMASK           0x000001FFUL

Please mark this constant as ppc32

> +
> +#if defined(TARGET_PPC64)
> +#define SDR_HTABORG_64         0xFFFFFFFFFFFC0000ULL
> +#define SDR_HTABSIZE           0x000000000000001FULL

Please mark this constant as ppc64


The rest looks good :)


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread
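
For readers following the SDR1 discussion, here is a rough sketch of decoding the register once, at write time, into a base address and a byte mask. The four SDR_* masks are the ones quoted from the cpu.h hunk; everything else (htab_info, parse_sdr1_32(), parse_sdr1_64(), the validity rules) is an assumption made for illustration and is not taken from the patch. The size encodings follow the architecture as I understand it: on 64-bit the table is 2^(18 + HTABSIZE) bytes with HTABSIZE at most 28, on 32-bit HTABMASK extends a 64KB minimum table and should be a contiguous run of low-order ones.

```c
#include <assert.h>
#include <stdint.h>

#define SDR_HTABORG_32  0xFFFF0000UL
#define SDR_HTABMASK    0x000001FFUL
#define SDR_HTABORG_64  0xFFFFFFFFFFFC0000ULL
#define SDR_HTABSIZE    0x000000000000001FULL

/* Hypothetical pre-parsed form of SDR1. */
typedef struct {
    uint64_t base;   /* physical base address of the hash table */
    uint64_t mask;   /* byte-offset mask, i.e. table size - 1 */
    int valid;       /* 0 if the size field was out of range */
} htab_info;

htab_info parse_sdr1_64(uint64_t sdr1)
{
    htab_info h;
    uint64_t htabsize = sdr1 & SDR_HTABSIZE;

    h.valid = htabsize <= 28;
    if (!h.valid) {
        htabsize = 28;              /* clamp so the shift stays sane */
    }
    assert(htabsize <= 28);
    h.base = sdr1 & SDR_HTABORG_64;
    h.mask = (1ULL << (htabsize + 18)) - 1;
    return h;
}

htab_info parse_sdr1_32(uint32_t sdr1)
{
    htab_info h;
    uint64_t m = sdr1 & SDR_HTABMASK;

    /* A usable mask is of the form 2^n - 1. */
    h.valid = (m & (m + 1)) == 0;
    h.base = sdr1 & SDR_HTABORG_32;
    h.mask = (m << 16) | 0xFFFF;
    return h;
}
```

With the register parsed this way, the translation path only has to add a masked offset to the base, and no longer needs to branch on the MMU type for this step.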

* [Qemu-devel] Re: [PATCH 10/15] Use "hash" more consistently in ppc mmu code
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 10/15] Use "hash" more consistently in ppc mmu code David Gibson
@ 2011-02-12 15:47   ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:47 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> Currently, get_segment() has a variable called hash.  However it doesn't
> (quite) get the hash value for the ppc hashed page table.  Instead it
> gets the hash shifted - effectively the offset of the hash bucket within
> the hash page table.
> 
> As well as being different to the normal use of plain "hash" in the
> architecture documentation, this usage necessitates some awkward 32/64
> dependent masks and shifts which clutter up the path in get_segment().
> 
> This patch alters the code to use raw hash values through get_segment()
> including storing raw hashes instead of pte group offsets in the ctx
> structure.  This cleans up the path noticeably.
> 
> This does necessitate 32/64 dependent shifts when the hash values are
> taken out of the ctx structure and used, but those paths already have
> 32/64 bit variants so this is less awkward than it was in get_segment().
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>

Still complex, but a lot more readable than before now :)


Acked-by: Alexander Graf <agraf@suse.de>


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread
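
The distinction between a raw hash and a PTE group offset, as described in the commit message above, can be illustrated as follows. This is a sketch under stated assumptions, not the patch's code: pteg_offset() and the PTE size constants are hypothetical names, the mask is assumed to be a byte-offset mask (table size minus one), and the group geometry (8 PTEs per group, 8-byte PTEs on 32-bit, 16-byte PTEs on 64-bit) is taken from the architecture rather than from this thread.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_PTE_SIZE_32  8    /* bytes per PTE, 32-bit hash MMU */
#define HASH_PTE_SIZE_64  16   /* bytes per PTE, 64-bit hash MMU */
#define PTES_PER_GROUP    8

/* Convert a raw hash value into the byte offset of its PTE group.
 * Only this step depends on the 32/64-bit PTE size; the hash itself
 * can be carried around unshifted. */
uint64_t pteg_offset(uint64_t hash, uint64_t htab_mask, unsigned pte_size)
{
    assert(pte_size == HASH_PTE_SIZE_32 || pte_size == HASH_PTE_SIZE_64);
    return (hash * pte_size * PTES_PER_GROUP) & htab_mask;
}

/* The secondary group is found from the one's complement of the hash. */
uint64_t secondary_pteg_offset(uint64_t hash, uint64_t htab_mask,
                               unsigned pte_size)
{
    return pteg_offset(~hash, htab_mask, pte_size);
}
```

Keeping the raw hash in the context structure means the shift appears only where a PTE group is actually addressed, which is the trade-off the commit message describes.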

* [Qemu-devel] Re: [PATCH 11/15] Better factor the ppc hash translation path
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 11/15] Better factor the ppc hash translation path David Gibson
@ 2011-02-12 15:52   ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:52 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> Currently the path handling hash page table translation in get_segment()
> has a mix of common and 32 or 64 bit specific code.  However the
> division is not done terribly well which results in a lot of messy code
> flipping between common and divided paths.
> 
> This patch improves the organization, consolidating several divided paths
> into one.  This in turn allows simplification of some code in
> get_segment(), removing a number of ugly interim variables.
> 
> This new factorization will also make it easier to add support for the 1T
> segments added in newer CPUs.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>

Acked-by: Alexander Graf <agraf@suse.de>


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 12/15] Support 1T segments on ppc
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 12/15] Support 1T segments on ppc David Gibson
@ 2011-02-12 15:57   ` Alexander Graf
  2011-02-13  9:34     ` David Gibson
  0 siblings, 1 reply; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 15:57 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> Traditionally, the "segments" used for the two-stage translation used on
> powerpc MMUs were 256MB in size.  This was the only option on all hash
> page table based 32-bit powerpc cpus, and on the earlier 64-bit hash page
> table based cpus.  However, newer 64-bit cpus also permit 1TB segments.
> 
> This patch adds support for 1TB segment translation to the qemu code.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
> target-ppc/cpu.h    |    7 ++++++
> target-ppc/helper.c |   58 ++++++++++++++++++++++++++++++++++++---------------
> 2 files changed, 48 insertions(+), 17 deletions(-)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 3df6758..53b788f 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -114,6 +114,7 @@ enum powerpc_mmu_t {
>     POWERPC_MMU_601        = 0x0000000A,
> #if defined(TARGET_PPC64)
> #define POWERPC_MMU_64       0x00010000
> +#define POWERPC_MMU_1TSEG    0x00020000
>     /* 64 bits PowerPC MMU                                     */
>     POWERPC_MMU_64B        = POWERPC_MMU_64 | 0x00000001,
>     /* 620 variant (no segment exceptions)                     */
> @@ -382,9 +383,11 @@ struct ppc_slb_t {
> 
> /* Bits in the SLB VSID word */
> #define SLB_VSID_SHIFT          12
> +#define SLB_VSID_SHIFT_1T       24
> #define SLB_VSID_SSIZE_SHIFT    62
> #define SLB_VSID_B              0xc000000000000000ULL
> #define SLB_VSID_B_256M         0x0000000000000000ULL
> +#define SLB_VSID_B_1T           0x4000000000000000ULL
> #define SLB_VSID_VSID           0x3FFFFFFFFFFFF000ULL
> #define SLB_VSID_PTEM           (SLB_VSID_B | SLB_VSID_VSID)
> #define SLB_VSID_KS             0x0000000000000800ULL
> @@ -398,6 +401,10 @@ struct ppc_slb_t {
> #define SEGMENT_SHIFT_256M      28
> #define SEGMENT_MASK_256M       ~((1ULL << SEGMENT_SHIFT_256M) - 1)
> 
> +#define SEGMENT_SHIFT_1T        40
> +#define SEGMENT_MASK_1T         ~((1ULL << SEGMENT_SHIFT_1T) - 1)
> +
> +
> /*****************************************************************************/
> /* Machine state register bits definition                                    */
> #define MSR_SF   63 /* Sixty-four-bit mode                            hflags */
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index 6a1127f..158da09 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -669,19 +669,25 @@ static inline int find_pte(CPUState *env, mmu_ctx_t *ctx, int h, int rw,
> #if defined(TARGET_PPC64)
> static inline ppc_slb_t *slb_lookup(CPUPPCState *env, target_ulong eaddr)
> {
> -    uint64_t esid;
> +    uint64_t match_256M, match_1T;
>     int n;
> 
>     LOG_SLB("%s: eaddr " TARGET_FMT_lx "\n", __func__, eaddr);
> 
> -    esid = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V;
> +    match_256M = (eaddr & SEGMENT_MASK_256M) | SLB_ESID_V |
> +        (SLB_VSID_B_256M >> SLB_VSID_SSIZE_SHIFT);
> +    match_1T = (eaddr & SEGMENT_MASK_1T) | SLB_ESID_V |
> +        (SLB_VSID_B_1T >> SLB_VSID_SSIZE_SHIFT);
> 
>     for (n = 0; n < env->slb_nr; n++) {
>         ppc_slb_t *slb = &env->slb[n];
> 
>         LOG_SLB("%s: slot %d %016" PRIx64 " %016"
>                     PRIx64 "\n", __func__, n, slb->esid, slb->vsid);
> -        if (slb->esid == esid) {
> +        /* We check for 1T matches on all MMUs here - if the MMU
> +         * doesn't have 1T segment support, we will have prevented 1T
> +         * entries from being inserted in the slbmte code. */
> +        if ((slb->esid == match_256M) || (slb->esid == match_1T)) {
>             return slb;
>         }
>     }
> @@ -734,16 +740,21 @@ void ppc_slb_invalidate_one (CPUPPCState *env, uint64_t T0)
> int ppc_store_slb (CPUPPCState *env, target_ulong rb, target_ulong rs)
> {
>     int slot = rb & 0xfff;
> -    uint64_t esid = rb & ~0xfff;
>     ppc_slb_t *slb = &env->slb[slot];
> -
> -    if (slot >= env->slb_nr) {
> -        return -1;
> -    }
> -
> -    slb->esid = esid;
> + 
> +    if (rb & (0x1000 - env->slb_nr))

Braces...

> +	return -1; /* Reserved bits set or slot too high */
> +    if (rs & (SLB_VSID_B & ~SLB_VSID_B_1T))

here too

> +	return -1; /* Bad segment size */
> +    if ((rs & SLB_VSID_B) && !(env->mmu_model & POWERPC_MMU_1TSEG))

and here

> + 	return -1; /* 1T segment on MMU that doesn't support it */
> + 
> +    /* We stuff a copy of the B field into slb->esid to simplify
> +     * lookup later */
> +    slb->esid = (rb & (SLB_ESID_ESID | SLB_ESID_V)) |
> +        (rs >> SLB_VSID_SSIZE_SHIFT);

Wouldn't it be easier to add another field?

>     slb->vsid = rs;
> -
> + 
>     LOG_SLB("%s: %d " TARGET_FMT_lx " - " TARGET_FMT_lx " => %016" PRIx64
>             " %016" PRIx64 "\n", __func__, slot, rb, rs,
>             slb->esid, slb->vsid);
> @@ -760,7 +771,8 @@ int ppc_load_slb_esid (CPUPPCState *env, target_ulong rb, target_ulong *rt)
>         return -1;
>     }
> 
> -    *rt = slb->esid;
> +    /* Mask out the extra copy of the B field inserted in store_slb */
> +    *rt = slb->esid & ~0x3;
>     return 0;
> }
> 
> @@ -793,6 +805,7 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
>     if (env->mmu_model & POWERPC_MMU_64) {
>         ppc_slb_t *slb;
>         target_ulong pageaddr;
> +        int segment_bits;
> 
>         LOG_MMU("Check SLBs\n");
>         slb = slb_lookup(env, eaddr);
> @@ -800,7 +813,14 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
>             return -5;
>         }
> 
> -        vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
> +	if (slb->vsid & SLB_VSID_B) {
> +	    vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT_1T;
> +	    segment_bits = 40;
> +	} else {
> +	    vsid = (slb->vsid & SLB_VSID_VSID) >> SLB_VSID_SHIFT;
> +	    segment_bits = 28;
> +	}
> +
>         target_page_bits = (slb->vsid & SLB_VSID_L)
>             ? TARGET_PAGE_BITS_16M : TARGET_PAGE_BITS;
>         ctx->key = !!(pr ? (slb->vsid & SLB_VSID_KP)
> @@ -808,11 +828,15 @@ static inline int get_segment(CPUState *env, mmu_ctx_t *ctx,
>         ds = 0;
>         ctx->nx = !!(slb->vsid & SLB_VSID_N);
> 
> -        pageaddr = eaddr & ((1ULL << 28) - (1ULL << target_page_bits));
> -        /* XXX: this is false for 1 TB segments */
> -        hash = vsid ^ (pageaddr >> target_page_bits);
> +        pageaddr = eaddr & ((1ULL << segment_bits) 
> +                            - (1ULL << target_page_bits));
> +	if (slb->vsid & SLB_VSID_B)

Braces

> +	    hash = vsid ^ (vsid << 25) ^ (pageaddr >> target_page_bits);
> +	else

here too


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread
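
To make the 256MB versus 1TB difference concrete, the hash computation from the quoted get_segment() hunk can be pulled out into a standalone function. The constants and the arithmetic mirror the quoted diff; the function name slb_hash() and its flat parameter list are hypothetical, and the sketch deliberately ignores everything else get_segment() does.

```c
#include <assert.h>
#include <stdint.h>

#define SLB_VSID_SHIFT     12
#define SLB_VSID_SHIFT_1T  24
#define SLB_VSID_B         0xc000000000000000ULL
#define SLB_VSID_B_1T      0x4000000000000000ULL
#define SLB_VSID_VSID      0x3FFFFFFFFFFFF000ULL

/* Compute the raw page table hash for an address, given the VSID word
 * of the SLB entry that matched it and the page size in bits. */
uint64_t slb_hash(uint64_t slb_vsid, uint64_t eaddr, int page_bits)
{
    int is_1t = (slb_vsid & SLB_VSID_B) != 0;
    int segment_bits = is_1t ? 40 : 28;
    uint64_t vsid, pageaddr, hash;

    assert(page_bits > 0 && page_bits < segment_bits);

    vsid = (slb_vsid & SLB_VSID_VSID)
        >> (is_1t ? SLB_VSID_SHIFT_1T : SLB_VSID_SHIFT);

    /* Keep only the address bits between the page offset and the
     * top of the segment. */
    pageaddr = eaddr & ((1ULL << segment_bits) - (1ULL << page_bits));

    hash = vsid ^ (pageaddr >> page_bits);
    if (is_1t) {
        hash ^= vsid << 25;    /* extra term used for 1TB segments */
    }
    return hash;
}
```

The only inputs that change between the two segment sizes are the VSID shift, the number of segment bits, and the extra (vsid << 25) term, which is why a single code path with one conditional is enough.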

* [Qemu-devel] Re: [PATCH 13/15] Add POWER7 support for ppc
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 13/15] Add POWER7 support for ppc David Gibson
@ 2011-02-12 16:09   ` Alexander Graf
  2011-02-13  9:39     ` David Gibson
  0 siblings, 1 reply; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 16:09 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> This adds emulation support for the recent POWER7 cpu to qemu.  It's far
> from perfect - it's missing a number of POWER7 features so far, including
> any support for VSX or decimal floating point instructions.  However, it's
> close enough to boot a kernel with the POWER7 PVR.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
> hw/ppc.c                    |   83 ++++++++++++++++++++++++++++++++++
> hw/ppc.h                    |    1 +
> target-ppc/cpu.h            |   19 ++++++++
> target-ppc/helper.c         |    6 +++
> target-ppc/translate_init.c |  103 +++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 212 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/ppc.c b/hw/ppc.c
> index 968aec1..6975636 100644
> --- a/hw/ppc.c
> +++ b/hw/ppc.c
> @@ -246,6 +246,89 @@ void ppc970_irq_init (CPUState *env)
>     env->irq_inputs = (void **)qemu_allocate_irqs(&ppc970_set_irq, env,
>                                                   PPC970_INPUT_NB);
> }
> +
> +/* POWER7 internal IRQ controller */
> +static void power7_set_irq (void *opaque, int pin, int level)
> +{
> +    CPUState *env = opaque;
> +    int cur_level;
> +
> +    LOG_IRQ("%s: env %p pin %d level %d\n", __func__,
> +                env, pin, level);
> +    cur_level = (env->irq_input_state >> pin) & 1;
> +    /* Don't generate spurious events */
> +    if ((cur_level == 1 && level == 0) || (cur_level == 0 && level != 0)) {

Did you hit this? Qemu's irq framework should already ensure that property. I'm also not sure it's actually correct - if a level interrupt is on, the guest would get another interrupt injected, no? That would be cur_level == 1 && level == 1 IIUC.

> +        switch (pin) {
> +        case POWER7_INPUT_INT:
> +            /* Level sensitive - active high */
> +            LOG_IRQ("%s: set the external IRQ state to %d\n",
> +                        __func__, level);
> +            ppc_set_irq(env, PPC_INTERRUPT_EXT, level);
> +            break;
> +        case POWER7_INPUT_THINT:
> +            /* Level sensitive - active high */
> +            LOG_IRQ("%s: set the SMI IRQ state to %d\n", __func__,
> +                        level);
> +            ppc_set_irq(env, PPC_INTERRUPT_THERM, level);
> +            break;
> +        case POWER7_INPUT_MCP:
> +            /* Negative edge sensitive */
> +            /* XXX: TODO: actual reaction may depends on HID0 status
> +             *            603/604/740/750: check HID0[EMCP]
> +             */
> +            if (cur_level == 1 && level == 0) {
> +                LOG_IRQ("%s: raise machine check state\n",
> +                            __func__);
> +                ppc_set_irq(env, PPC_INTERRUPT_MCK, 1);
> +            }
> +            break;
> +        case POWER7_INPUT_CKSTP:

POWER7 has checkstop?

> +            /* Level sensitive - active low */
> +            /* XXX: TODO: relay the signal to CKSTP_OUT pin */
> +            if (level) {
> +                LOG_IRQ("%s: stop the CPU\n", __func__);
> +                env->halted = 1;
> +            } else {
> +                LOG_IRQ("%s: restart the CPU\n", __func__);
> +                env->halted = 0;
> +            }
> +            break;
> +        case POWER7_INPUT_HRESET:

Does this ever get triggered? POWER7 is run in lpar only, so there is no hreset, right?

> +            /* Level sensitive - active low */
> +            if (level) {
> +#if 0 // XXX: TOFIX
> +                LOG_IRQ("%s: reset the CPU\n", __func__);
> +                cpu_reset(env);
> +#endif
> +            }
> +            break;
> +        case POWER7_INPUT_SRESET:
> +            LOG_IRQ("%s: set the RESET IRQ state to %d\n",
> +                        __func__, level);
> +            ppc_set_irq(env, PPC_INTERRUPT_RESET, level);
> +            break;
> +        case POWER7_INPUT_TBEN:
> +            LOG_IRQ("%s: set the TBEN state to %d\n", __func__,
> +                        level);
> +            /* XXX: TODO */

Hrm - what is this?

> +            break;
> +        default:
> +            /* Unknown pin - do nothing */
> +            LOG_IRQ("%s: unknown IRQ pin %d\n", __func__, pin);
> +            return;
> +        }
> +        if (level)

Braces

> +            env->irq_input_state |= 1 << pin;
> +        else
> +            env->irq_input_state &= ~(1 << pin);
> +    }
> +}
> +
> +void ppcPOWER7_irq_init (CPUState *env)
> +{
> +    env->irq_inputs = (void **)qemu_allocate_irqs(&power7_set_irq, env,
> +                                                  POWER7_INPUT_NB);
> +}
> #endif /* defined(TARGET_PPC64) */
> 
> /* PowerPC 40x internal IRQ controller */
> diff --git a/hw/ppc.h b/hw/ppc.h
> index 34f54cf..3ccf134 100644
> --- a/hw/ppc.h
> +++ b/hw/ppc.h
> @@ -36,6 +36,7 @@ void ppc40x_irq_init (CPUState *env);
> void ppce500_irq_init (CPUState *env);
> void ppc6xx_irq_init (CPUState *env);
> void ppc970_irq_init (CPUState *env);
> +void ppcPOWER7_irq_init (CPUState *env);
> 
> /* PPC machines for OpenBIOS */
> enum {
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 53b788f..fa3cd7f 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -119,6 +119,8 @@ enum powerpc_mmu_t {
>     POWERPC_MMU_64B        = POWERPC_MMU_64 | 0x00000001,
>     /* 620 variant (no segment exceptions)                     */
>     POWERPC_MMU_620        = POWERPC_MMU_64 | 0x00000002,
> +    /* Architecture 2.06 variant                               */
> +    POWERPC_MMU_2_06       = POWERPC_MMU_64 | POWERPC_MMU_1TSEG | 0x00000003,
> #endif /* defined(TARGET_PPC64) */
> };
> 
> @@ -154,6 +156,8 @@ enum powerpc_excp_t {
> #if defined(TARGET_PPC64)
>     /* PowerPC 970 exception model      */
>     POWERPC_EXCP_970,
> +    /* POWER7 exception model           */
> +    POWERPC_EXCP_POWER7,
> #endif /* defined(TARGET_PPC64) */
> };
> 
> @@ -289,6 +293,8 @@ enum powerpc_input_t {
>     PPC_FLAGS_INPUT_405,
>     /* PowerPC 970 bus                  */
>     PPC_FLAGS_INPUT_970,
> +    /* PowerPC POWER7 bus               */
> +    PPC_FLAGS_INPUT_POWER7,
>     /* PowerPC 401 bus                  */
>     PPC_FLAGS_INPUT_401,
>     /* Freescale RCPU bus               */
> @@ -1003,6 +1009,7 @@ static inline void cpu_clone_regs(CPUState *env, target_ulong newsp)
> #define SPR_HSPRG1            (0x131)
> #define SPR_HDSISR            (0x132)
> #define SPR_HDAR              (0x133)
> +#define SPR_SPURR             (0x134)
> #define SPR_BOOKE_DBCR0       (0x134)
> #define SPR_IBCR              (0x135)
> #define SPR_PURR              (0x135)
> @@ -1627,6 +1634,18 @@ enum {
>     PPC970_INPUT_THINT      = 6,
>     PPC970_INPUT_NB,
> };
> +
> +enum {
> +    /* POWER7 input pins */
> +    POWER7_INPUT_HRESET     = 0,
> +    POWER7_INPUT_SRESET     = 1,
> +    POWER7_INPUT_CKSTP      = 2,
> +    POWER7_INPUT_TBEN       = 3,
> +    POWER7_INPUT_MCP        = 4,
> +    POWER7_INPUT_INT        = 5,
> +    POWER7_INPUT_THINT      = 6,
> +    POWER7_INPUT_NB,
> +};
> #endif
> 
> /* Hardware exceptions definitions */
> diff --git a/target-ppc/helper.c b/target-ppc/helper.c
> index 158da09..a630148 100644
> --- a/target-ppc/helper.c
> +++ b/target-ppc/helper.c
> @@ -1192,6 +1192,7 @@ static inline int check_physical(CPUState *env, mmu_ctx_t *ctx,
> #if defined(TARGET_PPC64)
>     case POWERPC_MMU_620:
>     case POWERPC_MMU_64B:
> +    case POWERPC_MMU_2_06:
>         /* Real address are 60 bits long */
>         ctx->raddr &= 0x0FFFFFFFFFFFFFFFULL;
>         ctx->prot |= PAGE_WRITE;
> @@ -1269,6 +1270,7 @@ int get_physical_address (CPUState *env, mmu_ctx_t *ctx, target_ulong eaddr,
> #if defined(TARGET_PPC64)
>         case POWERPC_MMU_620:
>         case POWERPC_MMU_64B:
> +        case POWERPC_MMU_2_06:
> #endif
>             if (ret < 0) {
>                 /* We didn't match any BAT entry or don't have BATs */
> @@ -1368,6 +1370,7 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
> #if defined(TARGET_PPC64)
>                 case POWERPC_MMU_620:
>                 case POWERPC_MMU_64B:
> +                case POWERPC_MMU_2_06:
> #endif
>                     env->exception_index = POWERPC_EXCP_ISI;
>                     env->error_code = 0x40000000;
> @@ -1475,6 +1478,7 @@ int cpu_ppc_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
> #if defined(TARGET_PPC64)
>                 case POWERPC_MMU_620:
>                 case POWERPC_MMU_64B:
> +                case POWERPC_MMU_2_06:
> #endif
>                     env->exception_index = POWERPC_EXCP_DSI;
>                     env->error_code = 0;
> @@ -1798,6 +1802,7 @@ void ppc_tlb_invalidate_all (CPUPPCState *env)
> #if defined(TARGET_PPC64)
>     case POWERPC_MMU_620:
>     case POWERPC_MMU_64B:
> +    case POWERPC_MMU_2_06:
> #endif /* defined(TARGET_PPC64) */
>         tlb_flush(env, 1);
>         break;
> @@ -1865,6 +1870,7 @@ void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr)
> #if defined(TARGET_PPC64)
>     case POWERPC_MMU_620:
>     case POWERPC_MMU_64B:
> +    case POWERPC_MMU_2_06:
>         /* tlbie invalidate TLBs for all segments */
>         /* XXX: given the fact that there are too many segments to invalidate,
>          *      and we still don't have a tlb_flush_mask(env, n, mask) in Qemu,
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index c84581e..2faa591 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -61,6 +61,7 @@ void glue(glue(ppc, name),_irq_init) (CPUPPCState *env);
> PPC_IRQ_INIT_FN(40x);
> PPC_IRQ_INIT_FN(6xx);
> PPC_IRQ_INIT_FN(970);
> +PPC_IRQ_INIT_FN(POWER7);
> PPC_IRQ_INIT_FN(e500);
> 
> /* Generic callbacks:
> @@ -3087,6 +3088,35 @@ static void init_excp_970 (CPUPPCState *env)
>     env->hreset_vector = 0x0000000000000100ULL;
> #endif
> }
> +
> +static void init_excp_POWER7 (CPUPPCState *env)
> +{
> +#if !defined(CONFIG_USER_ONLY)
> +    env->excp_vectors[POWERPC_EXCP_RESET]    = 0x00000100;
> +    env->excp_vectors[POWERPC_EXCP_MCHECK]   = 0x00000200;
> +    env->excp_vectors[POWERPC_EXCP_DSI]      = 0x00000300;
> +    env->excp_vectors[POWERPC_EXCP_DSEG]     = 0x00000380;
> +    env->excp_vectors[POWERPC_EXCP_ISI]      = 0x00000400;
> +    env->excp_vectors[POWERPC_EXCP_ISEG]     = 0x00000480;
> +    env->excp_vectors[POWERPC_EXCP_EXTERNAL] = 0x00000500;
> +    env->excp_vectors[POWERPC_EXCP_ALIGN]    = 0x00000600;
> +    env->excp_vectors[POWERPC_EXCP_PROGRAM]  = 0x00000700;
> +    env->excp_vectors[POWERPC_EXCP_FPU]      = 0x00000800;
> +    env->excp_vectors[POWERPC_EXCP_DECR]     = 0x00000900;
> +    env->excp_vectors[POWERPC_EXCP_HDECR]    = 0x00000980;
> +    env->excp_vectors[POWERPC_EXCP_SYSCALL]  = 0x00000C00;
> +    env->excp_vectors[POWERPC_EXCP_TRACE]    = 0x00000D00;
> +    env->excp_vectors[POWERPC_EXCP_PERFM]    = 0x00000F00;
> +    env->excp_vectors[POWERPC_EXCP_VPU]      = 0x00000F20;
> +    env->excp_vectors[POWERPC_EXCP_IABR]     = 0x00001300;
> +    env->excp_vectors[POWERPC_EXCP_MAINT]    = 0x00001600;
> +    env->excp_vectors[POWERPC_EXCP_VPUA]     = 0x00001700;
> +    env->excp_vectors[POWERPC_EXCP_THERM]    = 0x00001800;
> +    env->hreset_excp_prefix = 0x00000000FFF00000ULL;
> +    /* Hardware reset vector */
> +    env->hreset_vector = 0x0000000000000100ULL;
> +#endif
> +}
> #endif
> 
> /*****************************************************************************/
> @@ -6268,6 +6298,74 @@ static void init_proc_970MP (CPUPPCState *env)
>     vscr_init(env, 0x00010000);
> }
> 
> +/* POWER7 (actually a somewhat hacked 970FX for now...) */
> +#define POWERPC_INSNS_POWER7  (PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |        \
> +                              PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |   \
> +                              PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |           \
> +                              PPC_FLOAT_STFIWX |                              \
> +                              PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZT |  \
> +                              PPC_MEM_SYNC | PPC_MEM_EIEIO |                  \
> +                              PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |               \
> +                              PPC_64B | PPC_ALTIVEC |                         \
> +                              PPC_SEGMENT_64B | PPC_SLBI |                    \
> +                              PPC_POPCNTB | PPC_POPCNTWD)
> +#define POWERPC_MSRM_POWER7   (0x800000000204FF36ULL)
> +#define POWERPC_MMU_POWER7    (POWERPC_MMU_2_06)
> +#define POWERPC_EXCP_POWER7   (POWERPC_EXCP_POWER7)
> +#define POWERPC_INPUT_POWER7  (PPC_FLAGS_INPUT_POWER7)
> +#define POWERPC_BFDM_POWER7   (bfd_mach_ppc64)
> +#define POWERPC_FLAG_POWER7   (POWERPC_FLAG_VRE | POWERPC_FLAG_SE |            \
> +                              POWERPC_FLAG_BE | POWERPC_FLAG_PMM |            \
> +                              POWERPC_FLAG_BUS_CLK)
> +#define check_pow_POWER7    check_pow_nocheck
> +
> +static void init_proc_POWER7 (CPUPPCState *env)
> +{
> +    gen_spr_ne_601(env);
> +    gen_spr_7xx(env);
> +    /* Time base */
> +    gen_tbl(env);
> +    /* PURR & SPURR: Hack - treat these as aliases for the TB for now */
> +    spr_register(env, SPR_PURR,   "PURR",
> +                 &spr_read_purr, SPR_NOACCESS,
> +                 &spr_read_purr, SPR_NOACCESS,
> +                 0x00000000);
> +    spr_register(env, SPR_SPURR,   "SPURR",
> +                 &spr_read_purr, SPR_NOACCESS,
> +                 &spr_read_purr, SPR_NOACCESS,
> +                 0x00000000);
> +    /* Memory management */
> +    /* XXX : not implemented */
> +    spr_register(env, SPR_MMUCFG, "MMUCFG",
> +                 SPR_NOACCESS, SPR_NOACCESS,
> +                 &spr_read_generic, SPR_NOACCESS,
> +                 0x00000000); /* TOFIX */
> +    /* XXX : not implemented */
> +    spr_register(env, SPR_CTRL, "SPR_CTRLT",
> +                 SPR_NOACCESS, SPR_NOACCESS,
> +                 &spr_read_generic, &spr_write_generic,
> +                 0x80800000);
> +    spr_register(env, SPR_UCTRL, "SPR_CTRLF",
> +                 SPR_NOACCESS, SPR_NOACCESS,
> +                 &spr_read_generic, &spr_write_generic,
> +                 0x80800000);
> +    spr_register(env, SPR_VRSAVE, "SPR_VRSAVE",
> +                 &spr_read_generic, &spr_write_generic,
> +                 &spr_read_generic, &spr_write_generic,
> +                 0x00000000);
> +#if !defined(CONFIG_USER_ONLY)
> +    env->slb_nr = 32;

POWER7 has 64, no? Please check this :).

> +#endif
> +    init_excp_POWER7(env);
> +    env->dcache_line_size = 128;
> +    env->icache_line_size = 128;
> +    /* Allocate hardware IRQ controller */
> +    ppcPOWER7_irq_init(env);
> +    /* Can't find information on what this should be on reset.  This
> +     * value is the one used by 74xx processors. */
> +    vscr_init(env, 0x00010000);
> +}
> +
> /* PowerPC 620                                                               */
> #define POWERPC_INSNS_620    (PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |        \
>                               PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |   \
> @@ -6990,6 +7088,8 @@ enum {
>     CPU_POWERPC_POWER6             = 0x003E0000,
>     CPU_POWERPC_POWER6_5           = 0x0F000001, /* POWER6 in POWER5 mode */
>     CPU_POWERPC_POWER6A            = 0x0F000002,
> +#define CPU_POWERPC_POWER7           CPU_POWERPC_POWER7_v20
> +    CPU_POWERPC_POWER7_v20         = 0x003F0200,
>     CPU_POWERPC_970                = 0x00390202,
> #define CPU_POWERPC_970FX            CPU_POWERPC_970FX_v31
>     CPU_POWERPC_970FX_v10          = 0x00391100,
> @@ -8792,6 +8892,9 @@ static const ppc_def_t ppc_defs[] = {
>     /* POWER6A                                                               */
>     POWERPC_DEF("POWER6A",       CPU_POWERPC_POWER6A,                POWER6),
> #endif
> +    /* POWER7                                                                */
> +    POWERPC_DEF("POWER7",	 CPU_POWERPC_POWER7,		     POWER7),
> +    POWERPC_DEF("POWER7_v2.0",	 CPU_POWERPC_POWER7_v20,	     POWER7),
>     /* PowerPC 970                                                           */
>     POWERPC_DEF("970",           CPU_POWERPC_970,                    970),
>     /* PowerPC 970FX (G5)                                                    */

Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread
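
The "spurious event" filter questioned in the review above amounts to acting only when a pin's level changes. The sketch below isolates that logic so its behaviour is easy to see; irq_state, set_irq_pin() and demo_irq_events() are hypothetical names, and the per-pin handling from the patch is reduced to a counter. Note that with this filter, re-asserting a pin that is already high is ignored, which is the case the review comment asks about.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t irq_input_state;  /* one bit per input pin */
    int events;                /* number of level changes acted upon */
} irq_state;

/* Returns 1 if the pin changed level and was acted upon, 0 if the
 * call was filtered out as a repeat of the current level. */
int set_irq_pin(irq_state *s, int pin, int level)
{
    int cur_level, new_level;

    assert(pin >= 0 && pin < 32);
    cur_level = (s->irq_input_state >> pin) & 1;
    new_level = level != 0;

    if (cur_level == new_level) {
        return 0;              /* no edge: nothing to do */
    }
    if (new_level) {
        s->irq_input_state |= 1u << pin;
    } else {
        s->irq_input_state &= ~(1u << pin);
    }
    s->events++;
    return 1;
}

/* Usage: raise pin 5 twice, then lower it once. Only the first raise
 * and the lower count as events. */
int demo_irq_events(void)
{
    irq_state s = { 0, 0 };

    set_irq_pin(&s, 5, 1);
    set_irq_pin(&s, 5, 1);     /* filtered: already high */
    set_irq_pin(&s, 5, 0);
    return s.events;
}
```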

* [Qemu-devel] Re: [PATCH 14/15] Start implementing pSeries logical partition machine
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 14/15] Start implementing pSeries logical partition machine David Gibson
@ 2011-02-12 16:23   ` Alexander Graf
  2011-02-12 16:40     ` Blue Swirl
  0 siblings, 1 reply; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 16:23 UTC (permalink / raw
  To: David Gibson
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> This patch adds a "pseries" machine to qemu.  This aims to emulate a
> logical partition on an IBM pSeries machine, compliant to the
> "PowerPC Architecture Platform Requirements" (PAPR) document.
> 
> This initial version is quite limited: it implements a basic machine
> and PAPR hypercall emulation.  So far only one hypercall is present -
> H_PUT_TERM_CHAR - so that a (write-only) console is available.
> 
> The machine so far more resembles an old POWER4 style "full system
> partition" rather than a modern LPAR, in that the guest manages the
> page tables directly, rather than via hypercalls.
> 
> The machine requires qemu to be configured with --enable-fdt.  The
> machine can (so far) only be booted with -kernel - i.e. no partition
> firmware is provided.
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
> Makefile.target  |    2 +
> hw/spapr.c       |  279 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> hw/spapr.h       |  240 ++++++++++++++++++++++++++++++++++++++++++++++
> hw/spapr_hcall.c |   40 ++++++++
> 4 files changed, 561 insertions(+), 0 deletions(-)
> create mode 100644 hw/spapr.c
> create mode 100644 hw/spapr.h
> create mode 100644 hw/spapr_hcall.c
> 
> diff --git a/Makefile.target b/Makefile.target
> index 48e6c00..e0796ba 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -231,6 +231,8 @@ obj-ppc-y += ppc_prep.o
> obj-ppc-y += ppc_oldworld.o
> # NewWorld PowerMac
> obj-ppc-y += ppc_newworld.o
> +# IBM pSeries (sPAPR)
> +obj-ppc-y += spapr.o spapr_hcall.o
> # PowerPC 4xx boards
> obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
> obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> new file mode 100644
> index 0000000..8aca4e0
> --- /dev/null
> +++ b/hw/spapr.c
> @@ -0,0 +1,279 @@
> +/*
> + * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator
> + *
> + * Copyright (c) 2004-2007 Fabrice Bellard
> + * Copyright (c) 2007 Jocelyn Mayer
> + * Copyright (c) 2010 David Gibson, IBM Corporation.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + *
> + */
> +#include "hw.h"
> +#include "ppc.h"
> +#include "pc.h"
> +#include "sysemu.h"
> +#include "boards.h"
> +#include "fw_cfg.h"
> +#include "loader.h"
> +#include "elf.h"
> +#include "kvm.h"
> +#include "kvm_ppc.h"
> +#include "net.h"
> +#include "blockdev.h"
> +#include "hw/spapr.h"
> +
> +#include <libfdt.h>
> +
> +#define KERNEL_LOAD_ADDR        0x00000000
> +#define INITRD_LOAD_ADDR        0x02800000
> +#define FDT_ADDR                0x0f000000
> +#define FDT_MAX_SIZE            0x10000
> +
> +#define TIMEBASE_FREQ           512000000ULL
> +
> +static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
> +                              const char *cpu_model, CPUState *envs[],
> +                              target_phys_addr_t initrd_base,
> +                              target_phys_addr_t initrd_size,
> +                              const char *kernel_cmdline)
> +{
> +    void *fdt;
> +    uint64_t mem_reg_property[] = { 0, cpu_to_be64(ramsize) };
> +    uint32_t start_prop = cpu_to_be32(initrd_base);
> +    uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
> +    int i;
> +    char *modelname;
> +
> +#define _FDT(exp) \
> +    do { \
> +        int ret = (exp);                                           \
> +        if (ret < 0) {                                             \
> +            hw_error("qemu: error creating device tree: %s: %s\n", \
> +                     #exp, fdt_strerror(ret));                     \
> +            return NULL;                                           \
> +        }                                                          \
> +    } while (0)
> +
> +    fdt = qemu_mallocz(FDT_MAX_SIZE);
> +    _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
> +    
> +    _FDT((fdt_finish_reservemap(fdt)));
> +
> +    /* Root node */
> +    _FDT((fdt_begin_node(fdt, "")));
> +    _FDT((fdt_property_string(fdt, "device_type", "chrp")));
> +    _FDT((fdt_property_string(fdt, "model", "qemu,emulated-pSeries-LPAR")));
> +
> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x2)));
> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x2)));
> +
> +    /* /chosen */
> +    _FDT((fdt_begin_node(fdt, "chosen")));
> +
> +    _FDT((fdt_property_string(fdt, "bootargs", kernel_cmdline)));
> +    _FDT((fdt_property(fdt, "linux,initrd-start", &start_prop, sizeof(start_prop))));
> +    _FDT((fdt_property(fdt, "linux,initrd-end", &end_prop, sizeof(end_prop))));
> +    
> +    _FDT((fdt_end_node(fdt)));
> +
> +    /* memory node */
> +    _FDT((fdt_begin_node(fdt, "memory@0")));
> +
> +    _FDT((fdt_property_string(fdt, "device_type", "memory")));
> +    _FDT((fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property))));
> +    
> +    _FDT((fdt_end_node(fdt)));
> +    
> +    /* cpus */
> +    _FDT((fdt_begin_node(fdt, "cpus")));
> +
> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
> +
> +    modelname = qemu_strdup(cpu_model);
> +    
> +    for (i = 0; i < strlen(modelname); i++)

Braces
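
For reference, a minimal standalone sketch of the braced form that CODING_STYLE asks for. The helper name upcase_model() is made up for illustration and is not part of the patch. The sketch also evaluates strlen() once and casts to unsigned char before calling toupper(); both are optional extras, not something the review asked for.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not in the patch: upper-case a model name in place. */
static void upcase_model(char *modelname)
{
    size_t i;
    size_t len = strlen(modelname);  /* evaluate strlen() once, not per iteration */

    for (i = 0; i < len; i++) {
        /* toupper() expects an int holding an unsigned char value (or EOF) */
        modelname[i] = (char)toupper((unsigned char)modelname[i]);
    }
}
```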

> +        modelname[i] = toupper(modelname[i]);
> +
> +    for (i = 0; i < smp_cpus; i++) {
> +        CPUState *env = envs[i];
> +        char *nodename;
> +        uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
> +                           0xffffffff, 0xffffffff};
> +
> +        if (asprintf(&nodename, "%s@%x", modelname, i) < 0) {
> +            fprintf(stderr, "Allocation failure\n");
> +            exit(1);
> +        }
> +
> +        _FDT((fdt_begin_node(fdt, nodename)));
> +
> +        free(nodename);
> +
> +        _FDT((fdt_property_cell(fdt, "reg", i)));
> +        _FDT((fdt_property_string(fdt, "device_type", "cpu")));
> +
> +        _FDT((fdt_property_cell(fdt, "cpu-version", env->spr[SPR_PVR])));
> +        _FDT((fdt_property_cell(fdt, "dcache-block-size", env->dcache_line_size)));
> +        _FDT((fdt_property_cell(fdt, "icache-block-size", env->icache_line_size)));
> +        _FDT((fdt_property_cell(fdt, "timebase-frequency", TIMEBASE_FREQ)));
> +        /* Hardcode CPU frequency for now.  It's kind of arbitrary on
> +         * full emu, for kvm we should copy it from the host */
> +        _FDT((fdt_property_cell(fdt, "clock-frequency", 1000000000)));
> +        _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
> +        _FDT((fdt_property_string(fdt, "status", "okay")));
> +        _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
> +
> +        if (envs[i]->mmu_model & POWERPC_MMU_1TSEG)
> +            _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
> +                               segs, sizeof(segs))));
> +
> +        _FDT((fdt_end_node(fdt)));
> +    }
> +
> +    qemu_free(modelname);
> +
> +    _FDT((fdt_end_node(fdt)));
> +
> +    _FDT((fdt_end_node(fdt))); /* close root node */
> +    _FDT((fdt_finish(fdt)));
> +
> +    if (fdt_size)

Braces

> +        *fdt_size = fdt_totalsize(fdt);
> +
> +    return fdt;
> +}
> +
> +static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
> +{
> +    return (addr & 0x0fffffff) + KERNEL_LOAD_ADDR;
> +}
> +
> +static void emulate_spapr_hypercall(CPUState *env, void *opaque)
> +{
> +    env->gpr[3] = spapr_hypercall(env, (sPAPREnvironment *)opaque,
> +                                  env->gpr[3], &env->gpr[4]);
> +}
> +
> +/* pSeries LPAR / sPAPR hardware init */
> +static void ppc_spapr_init (ram_addr_t ram_size,
> +                             const char *boot_device,
> +                             const char *kernel_filename,
> +                             const char *kernel_cmdline,
> +                             const char *initrd_filename,
> +                             const char *cpu_model)
> +{
> +    CPUState *env = NULL;
> +    void *fdt;
> +    int i;
> +    ram_addr_t ram_offset;
> +    uint32_t kernel_base, initrd_base;
> +    long kernel_size, initrd_size;
> +    int fdt_size;
> +    sPAPREnvironment *spapr;

Not sure this complies with CODING_STYLE. I don't care, but Blue cares a lot, so we'd better ask him.

> +
> +    spapr = qemu_malloc(sizeof(*spapr));
> +
> +    /* init CPUs */
> +    if (cpu_model == NULL)

Braces

> +        cpu_model = "POWER7";
> +    for (i = 0; i < smp_cpus; i++) {
> +        env = cpu_init(cpu_model);
> +        if (!env) {
> +            fprintf(stderr, "Unable to find PowerPC CPU definition\n");
> +            exit(1);
> +        }
> +        /* Set time-base frequency to 512 MHz */
> +        cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> +        qemu_register_reset((QEMUResetHandler*)&cpu_reset, env);
> +
> +        env->emulate_hypercall = emulate_spapr_hypercall;
> +        env->hcall_opaque = spapr;
> +    }
> +
> +    /* allocate RAM */
> +    ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
> +    cpu_register_physical_memory(0, ram_size, ram_offset);
> +
> +    if (kernel_filename) {
> +        uint64_t lowaddr = 0;
> +
> +        kernel_base = KERNEL_LOAD_ADDR;
> +
> +        kernel_size = load_elf(kernel_filename, translate_kernel_address, NULL,
> +                               NULL, &lowaddr, NULL, 1, ELF_MACHINE, 0);
> +        if (kernel_size < 0)

Braces

> +            kernel_size = load_image_targphys(kernel_filename, kernel_base,
> +                                              ram_size - kernel_base);

Are you sure you want this? Are there any non-ELF kernels for this platform?

> +        if (kernel_size < 0) {
> +            hw_error("qemu: could not load kernel '%s'\n", kernel_filename);
> +            exit(1);
> +        }
> +
> +        /* load initrd */
> +        if (initrd_filename) {
> +            initrd_base = INITRD_LOAD_ADDR;
> +            initrd_size = load_image_targphys(initrd_filename, initrd_base,
> +                                              ram_size - initrd_base);
> +            if (initrd_size < 0) {
> +                hw_error("qemu: could not load initial ram disk '%s'\n",
> +                         initrd_filename);
> +                exit(1);
> +            }
> +        } else {
> +            initrd_base = 0;
> +            initrd_size = 0;
> +        }
> +
> +        /* load fdt */
> +        fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, &env,
> +                               initrd_base, initrd_size,
> +                               kernel_cmdline);
> +        if (!fdt) {
> +            hw_error("Couldn't create pSeries device tree\n");
> +            exit(1);
> +        }
> +
> +        cpu_physical_memory_write(FDT_ADDR, fdt, fdt_size);

Free the fdt stuff again?
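
A rough standalone sketch of the copy-then-free pattern this refers to. Everything here is a stand-in: guest_ram and install_blob() are hypothetical, and plain memcpy()/free() take the place of QEMU's own calls. In the patch itself this would presumably amount to a qemu_free(fdt) after the cpu_physical_memory_write() call, assuming nothing else uses the host-side buffer afterwards.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for guest RAM; QEMU would copy via cpu_physical_memory_write(). */
static unsigned char guest_ram[64];

/* Hypothetical helper: copy a built blob into "guest" memory, then release
 * the host-side buffer. Takes ownership of blob in every case. */
static int install_blob(size_t offset, void *blob, size_t size)
{
    if (offset > sizeof(guest_ram) || size > sizeof(guest_ram) - offset) {
        free(blob);   /* do not leak the buffer on the error path either */
        return -1;
    }
    memcpy(guest_ram + offset, blob, size);
    free(blob);       /* the guest copy is now the only one in use */
    return 0;
}
```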

> +
> +        env->gpr[3] = FDT_ADDR;
> +        env->gpr[5] = 0;
> +        env->hreset_vector = kernel_base;
> +        env->hreset_excp_prefix = 0;
> +    } else {
> +        fprintf(stderr, "pSeries machine needs -kernel for now");
> +        exit(1);
> +    }
> +}
> +
> +static QEMUMachine spapr_machine = {
> +    .name = "pseries",
> +    .desc = "pSeries Logical Partition (PAPR compliant)",
> +    .init = ppc_spapr_init,
> +    .max_cpus = 1,
> +    .no_vga = 1,
> +    .no_parallel = 1,
> +};
> +
> +static void spapr_machine_init(void)
> +{
> +    qemu_register_machine(&spapr_machine);
> +}
> +
> +machine_init(spapr_machine_init);
> diff --git a/hw/spapr.h b/hw/spapr.h
> new file mode 100644
> index 0000000..dae9617
> --- /dev/null
> +++ b/hw/spapr.h
> @@ -0,0 +1,240 @@
> +#if !defined (__HW_SPAPR_H__)
> +#define __HW_SPAPR_H__
> +
> +typedef struct sPAPREnvironment {
> +} sPAPREnvironment;
> +
> +#define H_SUCCESS         0
> +#define H_BUSY            1        /* Hardware busy -- retry later */
> +#define H_CLOSED          2        /* Resource closed */
> +#define H_NOT_AVAILABLE   3
> +#define H_CONSTRAINED     4        /* Resource request constrained to max allowed */
> +#define H_PARTIAL         5
> +#define H_IN_PROGRESS     14       /* Kind of like busy */
> +#define H_PAGE_REGISTERED 15
> +#define H_PARTIAL_STORE   16
> +#define H_PENDING         17       /* returned from H_POLL_PENDING */
> +#define H_CONTINUE        18       /* Returned from H_Join on success */
> +#define H_LONG_BUSY_START_RANGE         9900  /* Start of long busy range */
> +#define H_LONG_BUSY_ORDER_1_MSEC        9900  /* Long busy, hint that 1msec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_10_MSEC       9901  /* Long busy, hint that 10msec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_100_MSEC      9902  /* Long busy, hint that 100msec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_1_SEC         9903  /* Long busy, hint that 1sec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_10_SEC        9904  /* Long busy, hint that 10sec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_ORDER_100_SEC       9905  /* Long busy, hint that 100sec \
> +                                                 is a good time to retry */
> +#define H_LONG_BUSY_END_RANGE           9905  /* End of long busy range */
> +#define H_HARDWARE        -1       /* Hardware error */
> +#define H_FUNCTION        -2       /* Function not supported */
> +#define H_PRIVILEGE       -3       /* Caller not privileged */
> +#define H_PARAMETER       -4       /* Parameter invalid, out-of-range or conflicting */
> +#define H_BAD_MODE        -5       /* Illegal msr value */
> +#define H_PTEG_FULL       -6       /* PTEG is full */
> +#define H_NOT_FOUND       -7       /* PTE was not found */
> +#define H_RESERVED_DABR   -8       /* DABR address is reserved by the hypervisor on this processor */
> +#define H_NO_MEM          -9
> +#define H_AUTHORITY       -10
> +#define H_PERMISSION      -11
> +#define H_DROPPED         -12
> +#define H_SOURCE_PARM     -13
> +#define H_DEST_PARM       -14
> +#define H_REMOTE_PARM     -15
> +#define H_RESOURCE        -16
> +#define H_ADAPTER_PARM    -17
> +#define H_RH_PARM         -18
> +#define H_RCQ_PARM        -19
> +#define H_SCQ_PARM        -20
> +#define H_EQ_PARM         -21
> +#define H_RT_PARM         -22
> +#define H_ST_PARM         -23
> +#define H_SIGT_PARM       -24
> +#define H_TOKEN_PARM      -25
> +#define H_MLENGTH_PARM    -27
> +#define H_MEM_PARM        -28
> +#define H_MEM_ACCESS_PARM -29
> +#define H_ATTR_PARM       -30
> +#define H_PORT_PARM       -31
> +#define H_MCG_PARM        -32
> +#define H_VL_PARM         -33
> +#define H_TSIZE_PARM      -34
> +#define H_TRACE_PARM      -35
> +
> +#define H_MASK_PARM       -37
> +#define H_MCG_FULL        -38
> +#define H_ALIAS_EXIST     -39
> +#define H_P_COUNTER       -40
> +#define H_TABLE_FULL      -41
> +#define H_ALT_TABLE       -42
> +#define H_MR_CONDITION    -43
> +#define H_NOT_ENOUGH_RESOURCES -44
> +#define H_R_STATE         -45
> +#define H_RESCINDEND      -46
> +#define H_MULTI_THREADS_ACTIVE -9005
> +
> +
> +/* Long Busy is a condition that can be returned by the firmware
> + * when a call cannot be completed now, but the identical call
> + * should be retried later.  This prevents calls blocking in the
> + * firmware for long periods of time.  Annoyingly the firmware can return
> + * a range of return codes, hinting at how long we should wait before
> + * retrying.  If you don't care for the hint, the macro below is a good
> + * way to check for the long_busy return codes
> + */
> +#define H_IS_LONG_BUSY(x)  ((x >= H_LONG_BUSY_START_RANGE) \
> +                            && (x <= H_LONG_BUSY_END_RANGE))
> +
> +/* Flags */
> +#define H_LARGE_PAGE      (1ULL<<(63-16))
> +#define H_EXACT           (1ULL<<(63-24))       /* Use exact PTE or return H_PTEG_FULL */
> +#define H_R_XLATE         (1ULL<<(63-25))       /* include a valid logical page num in the pte if the valid bit is set */
> +#define H_READ_4          (1ULL<<(63-26))       /* Return 4 PTEs */
> +#define H_PAGE_STATE_CHANGE (1ULL<<(63-28))
> +#define H_PAGE_UNUSED     ((1ULL<<(63-29)) | (1ULL<<(63-30)))
> +#define H_PAGE_SET_UNUSED (H_PAGE_STATE_CHANGE | H_PAGE_UNUSED)
> +#define H_PAGE_SET_LOANED (H_PAGE_SET_UNUSED | (1ULL<<(63-31)))
> +#define H_PAGE_SET_ACTIVE H_PAGE_STATE_CHANGE
> +#define H_AVPN            (1ULL<<(63-32))       /* An avpn is provided as a sanity test */
> +#define H_ANDCOND         (1ULL<<(63-33))
> +#define H_ICACHE_INVALIDATE (1ULL<<(63-40))     /* icbi, etc.  (ignored for IO pages) */
> +#define H_ICACHE_SYNCHRONIZE (1ULL<<(63-41))    /* dcbst, icbi, etc (ignored for IO pages) */
> +#define H_ZERO_PAGE       (1ULL<<(63-48))       /* zero the page before mapping (ignored for IO pages) */
> +#define H_COPY_PAGE       (1ULL<<(63-49))
> +#define H_N               (1ULL<<(63-61))
> +#define H_PP1             (1ULL<<(63-62))
> +#define H_PP2             (1ULL<<(63-63))
> +
> +/* VASI States */
> +#define H_VASI_INVALID    0
> +#define H_VASI_ENABLED    1
> +#define H_VASI_ABORTED    2
> +#define H_VASI_SUSPENDING 3
> +#define H_VASI_SUSPENDED  4
> +#define H_VASI_RESUMED    5
> +#define H_VASI_COMPLETED  6
> +
> +/* DABRX flags */
> +#define H_DABRX_HYPERVISOR (1ULL<<(63-61))
> +#define H_DABRX_KERNEL     (1ULL<<(63-62))
> +#define H_DABRX_USER       (1ULL<<(63-63))
> +
> +/* Each control block has to be on a 4K boundary */
> +#define H_CB_ALIGNMENT     4096
> +
> +/* pSeries hypervisor opcodes */
> +#define H_REMOVE                0x04
> +#define H_ENTER                 0x08
> +#define H_READ                  0x0c
> +#define H_CLEAR_MOD             0x10
> +#define H_CLEAR_REF             0x14
> +#define H_PROTECT               0x18
> +#define H_GET_TCE               0x1c
> +#define H_PUT_TCE               0x20
> +#define H_SET_SPRG0             0x24
> +#define H_SET_DABR              0x28
> +#define H_PAGE_INIT             0x2c
> +#define H_SET_ASR               0x30
> +#define H_ASR_ON                0x34
> +#define H_ASR_OFF               0x38
> +#define H_LOGICAL_CI_LOAD       0x3c
> +#define H_LOGICAL_CI_STORE      0x40
> +#define H_LOGICAL_CACHE_LOAD    0x44
> +#define H_LOGICAL_CACHE_STORE   0x48
> +#define H_LOGICAL_ICBI          0x4c
> +#define H_LOGICAL_DCBF          0x50
> +#define H_GET_TERM_CHAR         0x54
> +#define H_PUT_TERM_CHAR         0x58
> +#define H_REAL_TO_LOGICAL       0x5c
> +#define H_HYPERVISOR_DATA       0x60
> +#define H_EOI                   0x64
> +#define H_CPPR                  0x68
> +#define H_IPI                   0x6c
> +#define H_IPOLL                 0x70
> +#define H_XIRR                  0x74
> +#define H_PERFMON               0x7c
> +#define H_MIGRATE_DMA           0x78
> +#define H_REGISTER_VPA          0xDC
> +#define H_CEDE                  0xE0
> +#define H_CONFER                0xE4
> +#define H_PROD                  0xE8
> +#define H_GET_PPP               0xEC
> +#define H_SET_PPP               0xF0
> +#define H_PURR                  0xF4
> +#define H_PIC                   0xF8
> +#define H_REG_CRQ               0xFC
> +#define H_FREE_CRQ              0x100
> +#define H_VIO_SIGNAL            0x104
> +#define H_SEND_CRQ              0x108
> +#define H_COPY_RDMA             0x110
> +#define H_REGISTER_LOGICAL_LAN  0x114
> +#define H_FREE_LOGICAL_LAN      0x118
> +#define H_ADD_LOGICAL_LAN_BUFFER 0x11C
> +#define H_SEND_LOGICAL_LAN      0x120
> +#define H_BULK_REMOVE           0x124
> +#define H_MULTICAST_CTRL        0x130
> +#define H_SET_XDABR             0x134
> +#define H_STUFF_TCE             0x138
> +#define H_PUT_TCE_INDIRECT      0x13C
> +#define H_CHANGE_LOGICAL_LAN_MAC 0x14C
> +#define H_VTERM_PARTNER_INFO    0x150
> +#define H_REGISTER_VTERM        0x154
> +#define H_FREE_VTERM            0x158
> +#define H_RESET_EVENTS          0x15C
> +#define H_ALLOC_RESOURCE        0x160
> +#define H_FREE_RESOURCE         0x164
> +#define H_MODIFY_QP             0x168
> +#define H_QUERY_QP              0x16C
> +#define H_REREGISTER_PMR        0x170
> +#define H_REGISTER_SMR          0x174
> +#define H_QUERY_MR              0x178
> +#define H_QUERY_MW              0x17C
> +#define H_QUERY_HCA             0x180
> +#define H_QUERY_PORT            0x184
> +#define H_MODIFY_PORT           0x188
> +#define H_DEFINE_AQP1           0x18C
> +#define H_GET_TRACE_BUFFER      0x190
> +#define H_DEFINE_AQP0           0x194
> +#define H_RESIZE_MR             0x198
> +#define H_ATTACH_MCQP           0x19C
> +#define H_DETACH_MCQP           0x1A0
> +#define H_CREATE_RPT            0x1A4
> +#define H_REMOVE_RPT            0x1A8
> +#define H_REGISTER_RPAGES       0x1AC
> +#define H_DISABLE_AND_GETC      0x1B0
> +#define H_ERROR_DATA            0x1B4
> +#define H_GET_HCA_INFO          0x1B8
> +#define H_GET_PERF_COUNT        0x1BC
> +#define H_MANAGE_TRACE          0x1C0
> +#define H_FREE_LOGICAL_LAN_BUFFER 0x1D4
> +#define H_QUERY_INT_STATE       0x1E4
> +#define H_POLL_PENDING          0x1D8
> +#define H_ILLAN_ATTRIBUTES      0x244
> +#define H_MODIFY_HEA_QP         0x250
> +#define H_QUERY_HEA_QP          0x254
> +#define H_QUERY_HEA             0x258
> +#define H_QUERY_HEA_PORT        0x25C
> +#define H_MODIFY_HEA_PORT       0x260
> +#define H_REG_BCMC              0x264
> +#define H_DEREG_BCMC            0x268
> +#define H_REGISTER_HEA_RPAGES   0x26C
> +#define H_DISABLE_AND_GET_HEA   0x270
> +#define H_GET_HEA_INFO          0x274
> +#define H_ALLOC_HEA_RESOURCE    0x278
> +#define H_ADD_CONN              0x284
> +#define H_DEL_CONN              0x288
> +#define H_JOIN                  0x298
> +#define H_VASI_STATE            0x2A4
> +#define H_ENABLE_CRQ            0x2B0
> +#define H_GET_EM_PARMS          0x2B8
> +#define H_SET_MPP               0x2D0
> +#define H_GET_MPP               0x2D4
> +#define MAX_HCALL_OPCODE        H_GET_MPP
> +
> +target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
> +                             target_ulong token, target_ulong *args);
> +
> +#endif /* !defined (__HW_SPAPR_H__) */
> diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
> new file mode 100644
> index 0000000..c99c345
> --- /dev/null
> +++ b/hw/spapr_hcall.c
> @@ -0,0 +1,40 @@
> +#include "sysemu.h"
> +#include "cpu.h"
> +#include "qemu-char.h"
> +#include "hw/spapr.h"
> +
> +static target_ulong h_put_term_char(target_ulong termno, target_ulong len,
> +                                    target_ulong char0_7, target_ulong char8_15)
> +{
> +    uint8_t buf[16];
> +
> +    *((uint64_t *)buf) = cpu_to_be64(char0_7);

stq_p

> +    *((uint64_t *)buf + 1) = cpu_to_be64(char8_15);

stq_p(buf + 8)

At least I hope those work O_o.
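
The idea behind the suggestion can be sketched standalone as follows. Note that store_be64() and pack_term_chars() are made-up names for illustration, not QEMU's stq_p implementation; the sketch only shows a byte-wise big-endian store that avoids casting a uint8_t buffer to uint64_t *, matching the cpu_to_be64() byte order used in the quoted code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a big-endian 64-bit store helper: writes v to p
 * most-significant byte first, with no alignment or aliasing assumptions. */
static void store_be64(uint8_t *p, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Pack the two hypercall argument registers into a 16-byte buffer. */
static void pack_term_chars(uint8_t buf[16], uint64_t char0_7,
                            uint64_t char8_15)
{
    store_be64(buf, char0_7);
    store_be64(buf + 8, char8_15);
}
```

Because the store is done one byte at a time, the result is the same on little-endian and big-endian hosts.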


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 14/15] Start implementing pSeries logical partition machine
  2011-02-12 16:23   ` [Qemu-devel] " Alexander Graf
@ 2011-02-12 16:40     ` Blue Swirl
  2011-02-12 20:54       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Blue Swirl @ 2011-02-12 16:40 UTC (permalink / raw
  To: Alexander Graf
  Cc: Paul Mackerras, qemu-devel@nongnu.org List, anton, David Gibson

On Sat, Feb 12, 2011 at 6:23 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 12.02.2011, at 15:54, David Gibson wrote:
>
>> This patch adds a "pseries" machine to qemu.  This aims to emulate a
>> logical partition on an IBM pSeries machine, compliant to the
>> "PowerPC Architecture Platform Requirements" (PAPR) document.
>>
>> This initial version is quite limited, it implements a basic machine
>> and PAPR hypercall emulation.  So far only one hypercall is present -
>> H_PUT_TERM_CHAR - so that a (write-only) console is available.
>>
>> The machine so far more resembles an old POWER4 style "full system
>> partition" rather than a modern LPAR, in that the guest manages the
>> page tables directly, rather than via hypercalls.
>>
>> The machine requires qemu to be configured with --enable-fdt.  The
>> machine can (so far) only be booted with -kernel - i.e. no partition
>> firmware is provided.
>>
>> Signed-off-by: David Gibson <dwg@au1.ibm.com>
>> ---
>> Makefile.target  |    2 +
>> hw/spapr.c       |  279 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> hw/spapr.h       |  240 ++++++++++++++++++++++++++++++++++++++++++++++
>> hw/spapr_hcall.c |   40 ++++++++
>> 4 files changed, 561 insertions(+), 0 deletions(-)
>> create mode 100644 hw/spapr.c
>> create mode 100644 hw/spapr.h
>> create mode 100644 hw/spapr_hcall.c
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 48e6c00..e0796ba 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -231,6 +231,8 @@ obj-ppc-y += ppc_prep.o
>> obj-ppc-y += ppc_oldworld.o
>> # NewWorld PowerMac
>> obj-ppc-y += ppc_newworld.o
>> +# IBM pSeries (sPAPR)
>> +obj-ppc-y += spapr.o spapr_hcall.o
>> # PowerPC 4xx boards
>> obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>> obj-ppc-y += ppc440.o ppc440_bamboo.o
>> diff --git a/hw/spapr.c b/hw/spapr.c
>> new file mode 100644
>> index 0000000..8aca4e0
>> --- /dev/null
>> +++ b/hw/spapr.c
>> @@ -0,0 +1,279 @@
>> +/*
>> + * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator
>> + *
>> + * Copyright (c) 2004-2007 Fabrice Bellard
>> + * Copyright (c) 2007 Jocelyn Mayer
>> + * Copyright (c) 2010 David Gibson, IBM Corporation.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + *
>> + */
>> +#include "hw.h"
>> +#include "ppc.h"
>> +#include "pc.h"
>> +#include "sysemu.h"
>> +#include "boards.h"
>> +#include "fw_cfg.h"
>> +#include "loader.h"
>> +#include "elf.h"
>> +#include "kvm.h"
>> +#include "kvm_ppc.h"
>> +#include "net.h"
>> +#include "blockdev.h"
>> +#include "hw/spapr.h"
>> +
>> +#include <libfdt.h>
>> +
>> +#define KERNEL_LOAD_ADDR        0x00000000
>> +#define INITRD_LOAD_ADDR        0x02800000
>> +#define FDT_ADDR                0x0f000000
>> +#define FDT_MAX_SIZE            0x10000
>> +
>> +#define TIMEBASE_FREQ           512000000ULL
>> +
>> +static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>> +                              const char *cpu_model, CPUState *envs[],
>> +                              target_phys_addr_t initrd_base,
>> +                              target_phys_addr_t initrd_size,
>> +                              const char *kernel_cmdline)
>> +{
>> +    void *fdt;
>> +    uint64_t mem_reg_property[] = { 0, cpu_to_be64(ramsize) };
>> +    uint32_t start_prop = cpu_to_be32(initrd_base);
>> +    uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>> +    int i;
>> +    char *modelname;
>> +
>> +#define _FDT(exp) \
>> +    do { \
>> +        int ret = (exp);                                           \
>> +        if (ret < 0) {                                             \
>> +            hw_error("qemu: error creating device tree: %s: %s\n", \
>> +                     #exp, fdt_strerror(ret));                     \
>> +            return NULL;                                           \
>> +        }                                                          \
>> +    } while (0)
>> +
>> +    fdt = qemu_mallocz(FDT_MAX_SIZE);
>> +    _FDT((fdt_create(fdt, FDT_MAX_SIZE)));
>> +
>> +    _FDT((fdt_finish_reservemap(fdt)));
>> +
>> +    /* Root node */
>> +    _FDT((fdt_begin_node(fdt, "")));
>> +    _FDT((fdt_property_string(fdt, "device_type", "chrp")));
>> +    _FDT((fdt_property_string(fdt, "model", "qemu,emulated-pSeries-LPAR")));
>> +
>> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x2)));
>> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x2)));
>> +
>> +    /* /chosen */
>> +    _FDT((fdt_begin_node(fdt, "chosen")));
>> +
>> +    _FDT((fdt_property_string(fdt, "bootargs", kernel_cmdline)));
>> +    _FDT((fdt_property(fdt, "linux,initrd-start", &start_prop, sizeof(start_prop))));
>> +    _FDT((fdt_property(fdt, "linux,initrd-end", &end_prop, sizeof(end_prop))));
>> +
>> +    _FDT((fdt_end_node(fdt)));
>> +
>> +    /* memory node */
>> +    _FDT((fdt_begin_node(fdt, "memory@0")));
>> +
>> +    _FDT((fdt_property_string(fdt, "device_type", "memory")));
>> +    _FDT((fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property))));
>> +
>> +    _FDT((fdt_end_node(fdt)));
>> +
>> +    /* cpus */
>> +    _FDT((fdt_begin_node(fdt, "cpus")));
>> +
>> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
>> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
>> +
>> +    modelname = qemu_strdup(cpu_model);
>> +
>> +    for (i = 0; i < strlen(modelname); i++)
>
> Braces
>
>> +        modelname[i] = toupper(modelname[i]);
>> +
>> +    for (i = 0; i < smp_cpus; i++) {
>> +        CPUState *env = envs[i];
>> +        char *nodename;
>> +        uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
>> +                           0xffffffff, 0xffffffff};
>> +
>> +        if (asprintf(&nodename, "%s@%x", modelname, i) < 0) {
>> +            fprintf(stderr, "Allocation failure\n");
>> +            exit(1);
>> +        }
>> +
>> +        _FDT((fdt_begin_node(fdt, nodename)));
>> +
>> +        free(nodename);
>> +
>> +        _FDT((fdt_property_cell(fdt, "reg", i)));
>> +        _FDT((fdt_property_string(fdt, "device_type", "cpu")));
>> +
>> +        _FDT((fdt_property_cell(fdt, "cpu-version", env->spr[SPR_PVR])));
>> +        _FDT((fdt_property_cell(fdt, "dcache-block-size", env->dcache_line_size)));
>> +        _FDT((fdt_property_cell(fdt, "icache-block-size", env->icache_line_size)));
>> +        _FDT((fdt_property_cell(fdt, "timebase-frequency", TIMEBASE_FREQ)));
>> +        /* Hardcode CPU frequency for now.  It's kind of arbitrary on
>> +         * full emu, for kvm we should copy it from the host */
>> +        _FDT((fdt_property_cell(fdt, "clock-frequency", 1000000000)));
>> +        _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
>> +        _FDT((fdt_property_string(fdt, "status", "okay")));
>> +        _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
>> +
>> +        if (envs[i]->mmu_model & POWERPC_MMU_1TSEG)
>> +            _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
>> +                               segs, sizeof(segs))));
>> +
>> +        _FDT((fdt_end_node(fdt)));
>> +    }
>> +
>> +    qemu_free(modelname);
>> +
>> +    _FDT((fdt_end_node(fdt)));
>> +
>> +    _FDT((fdt_end_node(fdt))); /* close root node */
>> +    _FDT((fdt_finish(fdt)));
>> +
>> +    if (fdt_size)
>
> Braces
>
>> +        *fdt_size = fdt_totalsize(fdt);
>> +
>> +    return fdt;
>> +}
>> +
>> +static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
>> +{
>> +    return (addr & 0x0fffffff) + KERNEL_LOAD_ADDR;
>> +}
>> +
>> +static void emulate_spapr_hypercall(CPUState *env, void *opaque)
>> +{
>> +    env->gpr[3] = spapr_hypercall(env, (sPAPREnvironment *)opaque,
>> +                                  env->gpr[3], &env->gpr[4]);
>> +}
>> +
>> +/* pSeries LPAR / sPAPR hardware init */
>> +static void ppc_spapr_init (ram_addr_t ram_size,
>> +                             const char *boot_device,
>> +                             const char *kernel_filename,
>> +                             const char *kernel_cmdline,
>> +                             const char *initrd_filename,
>> +                             const char *cpu_model)
>> +{
>> +    CPUState *env = NULL;
>> +    void *fdt;
>> +    int i;
>> +    ram_addr_t ram_offset;
>> +    uint32_t kernel_base, initrd_base;
>> +    long kernel_size, initrd_size;
>> +    int fdt_size;
>> +    sPAPREnvironment *spapr;
>
> Not sure this complies with CODING_STYLE. I don't care - but Blue does care a lot. So let's better ask him.

sPAPREnvironment has a certain aroma reminding of aHungarian
nNotation, but otherwise the bouquet is entirely passable.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 15/15] Implement the bus structure for PAPR virtual IO David Gibson
@ 2011-02-12 16:47   ` Alexander Graf
  2011-02-12 16:59     ` Blue Swirl
  2011-02-13 11:09     ` David Gibson
  0 siblings, 2 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-12 16:47 UTC (permalink / raw
  To: David Gibson
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton


On 12.02.2011, at 15:54, David Gibson wrote:

> This extends the "pseries" (PAPR) machine to include a virtual IO bus
> supporting the PAPR defined hypercall based virtual IO mechanisms.
> 
> So far only one VIO device is provided, the vty / vterm, providing
> a full console (polled only, for now).
> 
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
> Makefile.target  |    3 +-
> hw/spapr.c       |   31 +++++++++-
> hw/spapr.h       |   10 +++
> hw/spapr_hcall.c |   19 ++----
> hw/spapr_vio.c   |  191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> hw/spapr_vio.h   |   49 ++++++++++++++
> hw/spapr_vty.c   |  132 +++++++++++++++++++++++++++++++++++++
> 7 files changed, 419 insertions(+), 16 deletions(-)
> create mode 100644 hw/spapr_vio.c
> create mode 100644 hw/spapr_vio.h
> create mode 100644 hw/spapr_vty.c
> 
> diff --git a/Makefile.target b/Makefile.target
> index e0796ba..fe232da 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -232,7 +232,8 @@ obj-ppc-y += ppc_oldworld.o
> # NewWorld PowerMac
> obj-ppc-y += ppc_newworld.o
> # IBM pSeries (sPAPR)
> -obj-ppc-y += spapr.o spapr_hcall.o
> +obj-ppc-y += spapr.o spapr_hcall.o spapr_vio.o
> +obj-ppc-y += spapr_vty.o
> # PowerPC 4xx boards
> obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
> obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> index 8aca4e0..da61061 100644
> --- a/hw/spapr.c
> +++ b/hw/spapr.c
> @@ -37,6 +37,7 @@
> #include "net.h"
> #include "blockdev.h"
> #include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> 
> #include <libfdt.h>
> 
> @@ -49,6 +50,7 @@
> 
> static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>                               const char *cpu_model, CPUState *envs[],
> +                              sPAPREnvironment *spapr,
>                               target_phys_addr_t initrd_base,
>                               target_phys_addr_t initrd_size,
>                               const char *kernel_cmdline)
> @@ -59,6 +61,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>     int i;
>     char *modelname;
> +    int ret;
> 
> #define _FDT(exp) \
>     do { \
> @@ -151,9 +154,28 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
> 
>     _FDT((fdt_end_node(fdt)));
> 
> +    /* vdevice */
> +    _FDT((fdt_begin_node(fdt, "vdevice")));
> +
> +    _FDT((fdt_property_string(fdt, "device_type", "vdevice")));
> +    _FDT((fdt_property_string(fdt, "compatible", "IBM,vdevice")));
> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
> +    
> +    _FDT((fdt_end_node(fdt)));
> +
>     _FDT((fdt_end_node(fdt))); /* close root node */
>     _FDT((fdt_finish(fdt)));
> 
> +    /* re-expand to allow for further tweaks */
> +    _FDT((fdt_open_into(fdt, fdt, FDT_MAX_SIZE)));
> +
> +    ret = spapr_populate_vdevice(spapr->vio_bus, fdt);
> +    if (ret < 0)

Braces..

> +        fprintf(stderr, "couldn't setup vio devices in fdt\n");
> +
> +    _FDT((fdt_pack(fdt)));
> +
>     if (fdt_size)
>         *fdt_size = fdt_totalsize(fdt);
> 
> @@ -211,6 +233,12 @@ static void ppc_spapr_init (ram_addr_t ram_size,
>     ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
>     cpu_register_physical_memory(0, ram_size, ram_offset);
> 
> +    spapr->vio_bus = spapr_vio_bus_init();
> +
> +    for (i = 0; i < MAX_SERIAL_PORTS; i++)

Braces..

> +        if (serial_hds[i])

Braces..

> +            spapr_vty_create(spapr->vio_bus, i, serial_hds[i]);

There might be a qdev way to do this. Blue?

> +
>     if (kernel_filename) {
>         uint64_t lowaddr = 0;
> 
> @@ -242,7 +270,7 @@ static void ppc_spapr_init (ram_addr_t ram_size,
>         }
> 
>         /* load fdt */
> -        fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, &env,
> +        fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, &env, spapr,
>                                initrd_base, initrd_size,
>                                kernel_cmdline);
>         if (!fdt) {
> @@ -267,6 +295,7 @@ static QEMUMachine spapr_machine = {
>     .desc = "pSeries Logical Partition (PAPR compliant)",
>     .init = ppc_spapr_init,
>     .max_cpus = 1,
> +    .no_parallel = 1,

duplicate?

>     .no_vga = 1,
>     .no_parallel = 1,
> };
> diff --git a/hw/spapr.h b/hw/spapr.h
> index dae9617..168511f 100644
> --- a/hw/spapr.h
> +++ b/hw/spapr.h
> @@ -1,7 +1,10 @@
> #if !defined (__HW_SPAPR_H__)
> #define __HW_SPAPR_H__
> 
> +struct VIOsPAPRBus;
> +
> typedef struct sPAPREnvironment {
> +    struct VIOsPAPRBus *vio_bus;
> } sPAPREnvironment;
> 
> #define H_SUCCESS         0
> @@ -237,4 +240,11 @@ typedef struct sPAPREnvironment {
> target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
>                              target_ulong token, target_ulong *args);
> 
> +target_ulong h_put_term_char(sPAPREnvironment *spapr,
> +                             target_ulong termno, target_ulong len,
> +                             target_ulong char0_7, target_ulong char8_15);
> +target_ulong h_get_term_char(sPAPREnvironment *spapr,
> +                             target_ulong termno, target_ulong *len,
> +                             target_ulong *char0_7, target_ulong *char8_15);
> +
> #endif /* !defined (__HW_SPAPR_H__) */
> diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
> index c99c345..e2ed9cf 100644
> --- a/hw/spapr_hcall.c
> +++ b/hw/spapr_hcall.c
> @@ -3,19 +3,6 @@
> #include "qemu-char.h"
> #include "hw/spapr.h"
> 
> -static target_ulong h_put_term_char(target_ulong termno, target_ulong len,
> -                                    target_ulong char0_7, target_ulong char8_15)
> -{
> -    uint8_t buf[16];
> -
> -    *((uint64_t *)buf) = cpu_to_be64(char0_7);
> -    *((uint64_t *)buf + 1) = cpu_to_be64(char8_15);
> -
> -    qemu_chr_write(serial_hds[0], buf, len);
> -
> -    return 0;
> -}
> -
> target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
>                              target_ulong token, target_ulong *args)
> {
> @@ -29,7 +16,11 @@ target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
> 
>     switch (token) {
>     case H_PUT_TERM_CHAR:
> -        r = h_put_term_char(args[0], args[1], args[2], args[3]);
> +        r = h_put_term_char(spapr, args[0], args[1], args[2], args[3]);
> +        break;
> +
> +    case H_GET_TERM_CHAR:
> +        r = h_get_term_char(spapr, args[0], &args[0], &args[1], &args[2]);

Slick and simple. Blue, do you think there's some random abstraction layer necessary?
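
For reference, the register convention behind these two calls (up to 16 bytes carried in two 64-bit values, first byte in the most significant position) can be sketched without the pointer casts used in the patch. This is only an illustration; demo_pack_be64 and demo_unpack_be64 are made-up names, not anything in the series:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack 8 bytes into a 64-bit value,
 * with bytes[0] ending up in the most significant byte. */
static uint64_t demo_pack_be64(const uint8_t *bytes)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < 8; i++) {
        v = (v << 8) | bytes[i];
    }

    return v;
}

/* Hypothetical helper: the inverse, most significant byte to bytes[0]. */
static void demo_unpack_be64(uint64_t v, uint8_t *bytes)
{
    size_t i;

    for (i = 0; i < 8; i++) {
        bytes[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}
```

Written this way the result does not depend on host endianness or on the alignment of the byte buffer.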

>         break;
> 
>     default:
> diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
> new file mode 100644
> index 0000000..d9c7292
> --- /dev/null
> +++ b/hw/spapr_vio.c
> @@ -0,0 +1,191 @@
> +/*
> + * QEMU sPAPR VIO code
> + *
> + * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
> + * Based on the s390 virtio bus code:
> + * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "hw.h"
> +#include "sysemu.h"
> +#include "boards.h"
> +#include "monitor.h"
> +#include "loader.h"
> +#include "elf.h"
> +#include "hw/sysbus.h"
> +#include "kvm.h"
> +#include "device_tree.h"
> +
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#ifdef CONFIG_FDT
> +#include <libfdt.h>
> +#endif /* CONFIG_FDT */
> +
> +/* #define DEBUG_SPAPR */
> +
> +#ifdef DEBUG_SPAPR
> +#define dprintf(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define dprintf(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
> +struct BusInfo spapr_vio_bus_info = {
> +    .name       = "spapr-vio",
> +    .size       = sizeof(VIOsPAPRBus),
> +};
> +
> +VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg)
> +{
> +    DeviceState *qdev;
> +    VIOsPAPRDevice *dev = NULL;
> +
> +    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
> +        dev = (VIOsPAPRDevice *)qdev;
> +        if (dev->reg == reg)

Braces

> +            break;
> +    }
> +
> +    return dev;

What if the device doesn't exist?
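
As written, the loop leaves dev pointing at the last child when nothing matches, so the caller gets some unrelated device instead of NULL. A minimal sketch of a lookup that returns NULL on a miss, using a plain linked list as a stand-in for the qdev child list (struct demo_vio_dev and demo_find_by_reg are hypothetical names for illustration only):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VIO device on a bus; not the real qdev types. */
struct demo_vio_dev {
    uint32_t reg;
    struct demo_vio_dev *next;
};

/* Return the device whose reg matches, or NULL if no device matches. */
static struct demo_vio_dev *demo_find_by_reg(struct demo_vio_dev *head,
                                             uint32_t reg)
{
    struct demo_vio_dev *dev;

    for (dev = head; dev != NULL; dev = dev->next) {
        if (dev->reg == reg) {
            return dev;
        }
    }

    /* Fell off the end of the list: nothing matched. */
    return NULL;
}
```

Returning from inside the loop keeps the "not found" case explicit, which matters because the hcall handlers test the result with if (!sdev).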

> +}
> +
> +VIOsPAPRBus *spapr_vio_bus_init(void)
> +{
> +    VIOsPAPRBus *bus;
> +    BusState *_bus;
> +    DeviceState *dev;
> +
> +    /* Create bridge device */
> +    dev = qdev_create(NULL, "spapr-vio-bridge");
> +    qdev_init_nofail(dev);
> +
> +    /* Create bus on bridge device */
> +
> +    _bus = qbus_create(&spapr_vio_bus_info, dev, "spapr-vio");
> +    bus = DO_UPCAST(VIOsPAPRBus, bus, _bus);
> +
> +    return bus;
> +}
> +
> +#ifdef CONFIG_FDT
> +static int vio_make_devnode(VIOsPAPRDevice *dev,
> +                            void *fdt)
> +{
> +    VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)dev->qdev.info;
> +    int vdevice_off, node_off;
> +    int ret;
> +
> +    vdevice_off = fdt_path_offset(fdt, "/vdevice");
> +    if (vdevice_off < 0)

Braces

> +        return vdevice_off;
> +
> +    node_off = fdt_add_subnode(fdt, vdevice_off, dev->qdev.id);
> +    if (node_off < 0)

Braces

> +        return node_off;
> +
> +    ret = fdt_setprop_cell(fdt, node_off, "reg", dev->reg);
> +    if (ret < 0)

Braces

> +        return ret;
> +
> +    if (info->dt_type) {
> +        ret = fdt_setprop_string(fdt, node_off, "device_type",
> +                                 info->dt_type);
> +        if (ret < 0)

Braces

I'll stop complaining about braces now. Please go through the patch yourself and just fix them up :)

> +            return ret;
> +    }
> +
> +    if (info->dt_compatible) {
> +        ret = fdt_setprop_string(fdt, node_off, "compatible",
> +                                 info->dt_compatible);
> +        if (ret < 0)
> +            return ret;
> +    }
> +
> +    if (info->devnode) {
> +        ret = (info->devnode)(dev, fdt, node_off);
> +        if (ret < 0)
> +            return ret;
> +    }
> +
> +    return node_off;
> +}
> +#endif /* CONFIG_FDT */
> +
> +static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
> +{
> +    VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
> +    VIOsPAPRDevice *_dev = (VIOsPAPRDevice *)dev;
> +    char *id;
> +
> +    if (asprintf(&id, "%s@%x", _info->dt_name, _dev->reg) < 0)
> +        return -1;
> +
> +    _dev->qdev.id = id;
> +
> +    return _info->init(_dev);
> +}
> +
> +void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info)
> +{
> +    info->qdev.init = spapr_vio_busdev_init;
> +    info->qdev.bus_info = &spapr_vio_bus_info;
> +
> +    assert(info->qdev.size >= sizeof(VIOsPAPRDevice));
> +    qdev_register(&info->qdev);
> +}
> +
> +static int spapr_vio_bridge_init(SysBusDevice *dev)
> +{
> +    /* nothing */
> +    return 0;
> +}
> +
> +static SysBusDeviceInfo spapr_vio_bridge_info = {
> +    .init = spapr_vio_bridge_init,
> +    .qdev.name  = "spapr-vio-bridge",
> +    .qdev.size  = sizeof(SysBusDevice),
> +    .qdev.no_user = 1,
> +};
> +
> +static void spapr_vio_register_devices(void)
> +{
> +    sysbus_register_withprop(&spapr_vio_bridge_info);
> +}
> +
> +device_init(spapr_vio_register_devices)
> +
> +#ifdef CONFIG_FDT
> +
> +int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt)
> +{
> +    DeviceState *qdev;
> +    int ret = 0;
> +
> +    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
> +        VIOsPAPRDevice *dev = (VIOsPAPRDevice *)qdev;
> +
> +        ret = vio_make_devnode(dev, fdt);
> +
> +        if (ret < 0)
> +            return ret;
> +    }
> +    
> +    return 0;
> +}
> +#endif /* CONFIG_FDT */
> diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
> new file mode 100644
> index 0000000..fb5e301
> --- /dev/null
> +++ b/hw/spapr_vio.h
> @@ -0,0 +1,49 @@
> +#ifndef _HW_SPAPR_VIO_H
> +#define _HW_SPAPR_VIO_H
> +/*
> + * QEMU sPAPR VIO bus definitions
> + *
> + * Copyright (c) 2010 David Gibson, IBM Corporation <david@gibson.dropbear.id.au>
> + * Based on the s390 virtio bus definitions:
> + * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +typedef struct VIOsPAPRDevice {
> +    DeviceState qdev;
> +    uint32_t reg;
> +} VIOsPAPRDevice;
> +
> +typedef struct VIOsPAPRBus {
> +    BusState bus;
> +} VIOsPAPRBus;
> +
> +typedef struct {
> +    DeviceInfo qdev;
> +    const char *dt_name, *dt_type, *dt_compatible;
> +    int (*init)(VIOsPAPRDevice *dev);
> +    int (*devnode)(VIOsPAPRDevice *dev, void *fdt, int node_off);
> +} VIOsPAPRDeviceInfo;
> +
> +extern VIOsPAPRBus *spapr_vio_bus_init(void);
> +extern VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg);
> +extern void spapr_vio_bus_register_withprop(VIOsPAPRDeviceInfo *info);
> +extern int spapr_populate_vdevice(VIOsPAPRBus *bus, void *fdt);
> +
> +void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len);
> +void spapr_vty_create(VIOsPAPRBus *bus,
> +                      uint32_t reg, CharDriverState *chardev);
> +
> +#endif /* _HW_SPAPR_VIO_H */
> diff --git a/hw/spapr_vty.c b/hw/spapr_vty.c
> new file mode 100644
> index 0000000..9a2dc0b
> --- /dev/null
> +++ b/hw/spapr_vty.c
> @@ -0,0 +1,132 @@
> +#include "qdev.h"
> +#include "qemu-char.h"
> +#include "hw/spapr.h"
> +#include "hw/spapr_vio.h"
> +
> +#define VTERM_BUFSIZE   16
> +
> +typedef struct VIOsPAPRVTYDevice {
> +    VIOsPAPRDevice sdev;
> +    CharDriverState *chardev;
> +    uint32_t in, out;
> +    uint8_t buf[VTERM_BUFSIZE];
> +} VIOsPAPRVTYDevice;
> +
> +static int vty_can_receive(void *opaque)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
> +
> +    return (dev->in - dev->out) < VTERM_BUFSIZE;
> +}
> +
> +static void vty_receive(void *opaque, const uint8_t *buf, int size)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)opaque;
> +    int i;
> +
> +    for (i = 0; i < size; i++) {
> +        assert((dev->in - dev->out) < VTERM_BUFSIZE);
> +        dev->buf[dev->in++ % VTERM_BUFSIZE] = buf[i];
> +    }
> +}
> +
> +static int vty_getchars(VIOsPAPRDevice *sdev, uint8_t *buf, int max)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
> +    int n = 0;
> +
> +    while ((n < max) && (dev->out != dev->in))
> +        buf[n++] = dev->buf[dev->out++ % VTERM_BUFSIZE];
> +
> +    return n;
> +}
> +
> +void vty_putchars(VIOsPAPRDevice *sdev, uint8_t *buf, int len)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
> +
> +    /* FIXME: should check the qemu_chr_write() return value */
> +    qemu_chr_write(dev->chardev, buf, len);
> +}
> +
> +static int spapr_vty_init(VIOsPAPRDevice *sdev)
> +{
> +    VIOsPAPRVTYDevice *dev = (VIOsPAPRVTYDevice *)sdev;
> +
> +    qemu_chr_add_handlers(dev->chardev, vty_can_receive,
> +                          vty_receive, NULL, dev);
> +
> +    return 0;
> +}
> +
> +target_ulong h_put_term_char(sPAPREnvironment *spapr,
> +                             target_ulong termno, target_ulong len,
> +                             target_ulong char0_7, target_ulong char8_15)
> +{
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, termno);
> +    uint8_t buf[16];
> +
> +    if (!sdev)
> +        return H_PARAMETER;
> +
> +    if (len > 16)
> +        return H_PARAMETER;
> +
> +    *((uint64_t *)buf) = cpu_to_be64(char0_7);
> +    *((uint64_t *)buf + 1) = cpu_to_be64(char8_15);
> +
> +    vty_putchars(sdev, buf, len);
> +
> +    return 0;
> +}
> +
> +target_ulong h_get_term_char(sPAPREnvironment *spapr,
> +                             target_ulong termno, target_ulong *len,
> +                             target_ulong *char0_7, target_ulong *char8_15)
> +{
> +    VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, termno);
> +    uint8_t buf[16];
> +
> +    if (!sdev)
> +        return H_PARAMETER;
> +
> +    *len = vty_getchars(sdev, buf, sizeof(buf));
> +    if (*len < 16)
> +        memset(buf + *len, 0, 16 - *len);
> +
> +    *char0_7 = be64_to_cpu(*((uint64_t *)buf));
> +    *char8_15 = be64_to_cpu(*((uint64_t *)buf + 1));
> +
> +    return H_SUCCESS;
> +}
> +
> +void spapr_vty_create(VIOsPAPRBus *bus,
> +                      uint32_t reg, CharDriverState *chardev)
> +{
> +    DeviceState *dev;
> +
> +    dev = qdev_create(&bus->bus, "spapr-vty");
> +    qdev_prop_set_uint32(dev, "reg", reg);
> +    qdev_prop_set_chr(dev, "chardev", chardev);
> +    qdev_init_nofail(dev);
> +}
> +
> +static VIOsPAPRDeviceInfo spapr_vty = {
> +    .init = spapr_vty_init,
> +    .dt_name = "vty",
> +    .dt_type = "serial",
> +    .dt_compatible = "hvterm1",
> +    .qdev.name = "spapr-vty",
> +    .qdev.size = sizeof(VIOsPAPRVTYDevice),
> +    .qdev.props = (Property[]) {
> +        DEFINE_PROP_UINT32("reg", VIOsPAPRDevice, reg, 0),
> +        DEFINE_PROP_CHR("chardev", VIOsPAPRVTYDevice, chardev),
> +        DEFINE_PROP_END_OF_LIST(),
> +    },
> +};
> +
> +static void spapr_vty_register(void)
> +{
> +    spapr_vio_bus_register_withprop(&spapr_vty);
> +}
> +device_init(spapr_vty_register);
> -- 
> 1.7.1
> 

Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-12 16:47   ` [Qemu-devel] " Alexander Graf
@ 2011-02-12 16:59     ` Blue Swirl
  2011-02-12 21:00       ` Benjamin Herrenschmidt
  2011-02-13 11:09     ` David Gibson
  1 sibling, 1 reply; 73+ messages in thread
From: Blue Swirl @ 2011-02-12 16:59 UTC (permalink / raw
  To: Alexander Graf
  Cc: Paul Mackerras, qemu-devel@nongnu.org List, anton, David Gibson

On Sat, Feb 12, 2011 at 6:47 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 12.02.2011, at 15:54, David Gibson wrote:
>
>> This extends the "pseries" (PAPR) machine to include a virtual IO bus
>> supporting the PAPR defined hypercall based virtual IO mechanisms.
>>
>> So far only one VIO device is provided, the vty / vterm, providing
>> a full console (polled only, for now).
>>
>> Signed-off-by: David Gibson <dwg@au1.ibm.com>
>> ---
>> Makefile.target  |    3 +-
>> hw/spapr.c       |   31 +++++++++-
>> hw/spapr.h       |   10 +++
>> hw/spapr_hcall.c |   19 ++----
>> hw/spapr_vio.c   |  191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> hw/spapr_vio.h   |   49 ++++++++++++++
>> hw/spapr_vty.c   |  132 +++++++++++++++++++++++++++++++++++++
>> 7 files changed, 419 insertions(+), 16 deletions(-)
>> create mode 100644 hw/spapr_vio.c
>> create mode 100644 hw/spapr_vio.h
>> create mode 100644 hw/spapr_vty.c
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index e0796ba..fe232da 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -232,7 +232,8 @@ obj-ppc-y += ppc_oldworld.o
>> # NewWorld PowerMac
>> obj-ppc-y += ppc_newworld.o
>> # IBM pSeries (sPAPR)
>> -obj-ppc-y += spapr.o spapr_hcall.o
>> +obj-ppc-y += spapr.o spapr_hcall.o spapr_vio.o
>> +obj-ppc-y += spapr_vty.o
>> # PowerPC 4xx boards
>> obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>> obj-ppc-y += ppc440.o ppc440_bamboo.o
>> diff --git a/hw/spapr.c b/hw/spapr.c
>> index 8aca4e0..da61061 100644
>> --- a/hw/spapr.c
>> +++ b/hw/spapr.c
>> @@ -37,6 +37,7 @@
>> #include "net.h"
>> #include "blockdev.h"
>> #include "hw/spapr.h"
>> +#include "hw/spapr_vio.h"
>>
>> #include <libfdt.h>
>>
>> @@ -49,6 +50,7 @@
>>
>> static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>>                               const char *cpu_model, CPUState *envs[],
>> +                              sPAPREnvironment *spapr,
>>                               target_phys_addr_t initrd_base,
>>                               target_phys_addr_t initrd_size,
>>                               const char *kernel_cmdline)
>> @@ -59,6 +61,7 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>>     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
>>     int i;
>>     char *modelname;
>> +    int ret;
>>
>> #define _FDT(exp) \
>>     do { \
>> @@ -151,9 +154,28 @@ static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize,
>>
>>     _FDT((fdt_end_node(fdt)));
>>
>> +    /* vdevice */
>> +    _FDT((fdt_begin_node(fdt, "vdevice")));
>> +
>> +    _FDT((fdt_property_string(fdt, "device_type", "vdevice")));
>> +    _FDT((fdt_property_string(fdt, "compatible", "IBM,vdevice")));
>> +    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
>> +    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
>> +
>> +    _FDT((fdt_end_node(fdt)));
>> +
>>     _FDT((fdt_end_node(fdt))); /* close root node */
>>     _FDT((fdt_finish(fdt)));
>>
>> +    /* re-expand to allow for further tweaks */
>> +    _FDT((fdt_open_into(fdt, fdt, FDT_MAX_SIZE)));
>> +
>> +    ret = spapr_populate_vdevice(spapr->vio_bus, fdt);
>> +    if (ret < 0)
>
> Braces..
>
>> +        fprintf(stderr, "couldn't setup vio devices in fdt\n");
>> +
>> +    _FDT((fdt_pack(fdt)));
>> +
>>     if (fdt_size)
>>         *fdt_size = fdt_totalsize(fdt);
>>
>> @@ -211,6 +233,12 @@ static void ppc_spapr_init (ram_addr_t ram_size,
>>     ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
>>     cpu_register_physical_memory(0, ram_size, ram_offset);
>>
>> +    spapr->vio_bus = spapr_vio_bus_init();
>> +
>> +    for (i = 0; i < MAX_SERIAL_PORTS; i++)
>
> Braces..
>
>> +        if (serial_hds[i])
>
> Braces..
>
>> +            spapr_vty_create(spapr->vio_bus, i, serial_hds[i]);
>
> There might be a qdev way to do this. Blue?

Actually I don't quite understand the need for vty layer, why not use
the chardev here directly?

>
>> +
>>     if (kernel_filename) {
>>         uint64_t lowaddr = 0;
>>
>> @@ -242,7 +270,7 @@ static void ppc_spapr_init (ram_addr_t ram_size,
>>         }
>>
>>         /* load fdt */
>> -        fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, &env,
>> +        fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, &env, spapr,
>>                                initrd_base, initrd_size,
>>                                kernel_cmdline);
>>         if (!fdt) {
>> @@ -267,6 +295,7 @@ static QEMUMachine spapr_machine = {
>>     .desc = "pSeries Logical Partition (PAPR compliant)",
>>     .init = ppc_spapr_init,
>>     .max_cpus = 1,
>> +    .no_parallel = 1,
>
> duplicate?
>
>>     .no_vga = 1,
>>     .no_parallel = 1,
>> };
>> diff --git a/hw/spapr.h b/hw/spapr.h
>> index dae9617..168511f 100644
>> --- a/hw/spapr.h
>> +++ b/hw/spapr.h
>> @@ -1,7 +1,10 @@
>> #if !defined (__HW_SPAPR_H__)
>> #define __HW_SPAPR_H__
>>
>> +struct VIOsPAPRBus;
>> +
>> typedef struct sPAPREnvironment {
>> +    struct VIOsPAPRBus *vio_bus;
>> } sPAPREnvironment;
>>
>> #define H_SUCCESS         0
>> @@ -237,4 +240,11 @@ typedef struct sPAPREnvironment {
>> target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
>>                              target_ulong token, target_ulong *args);
>>
>> +target_ulong h_put_term_char(sPAPREnvironment *spapr,
>> +                             target_ulong termno, target_ulong len,
>> +                             target_ulong char0_7, target_ulong char8_15);
>> +target_ulong h_get_term_char(sPAPREnvironment *spapr,
>> +                             target_ulong termno, target_ulong *len,
>> +                             target_ulong *char0_7, target_ulong *char8_15);
>> +
>> #endif /* !defined (__HW_SPAPR_H__) */
>> diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
>> index c99c345..e2ed9cf 100644
>> --- a/hw/spapr_hcall.c
>> +++ b/hw/spapr_hcall.c
>> @@ -3,19 +3,6 @@
>> #include "qemu-char.h"
>> #include "hw/spapr.h"
>>
>> -static target_ulong h_put_term_char(target_ulong termno, target_ulong len,
>> -                                    target_ulong char0_7, target_ulong char8_15)
>> -{
>> -    uint8_t buf[16];
>> -
>> -    *((uint64_t *)buf) = cpu_to_be64(char0_7);
>> -    *((uint64_t *)buf + 1) = cpu_to_be64(char8_15);
>> -
>> -    qemu_chr_write(serial_hds[0], buf, len);
>> -
>> -    return 0;
>> -}
>> -
>> target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
>>                              target_ulong token, target_ulong *args)
>> {
>> @@ -29,7 +16,11 @@ target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
>>
>>     switch (token) {
>>     case H_PUT_TERM_CHAR:
>> -        r = h_put_term_char(args[0], args[1], args[2], args[3]);
>> +        r = h_put_term_char(spapr, args[0], args[1], args[2], args[3]);
>> +        break;
>> +
>> +    case H_GET_TERM_CHAR:
>> +        r = h_get_term_char(spapr, args[0], &args[0], &args[1], &args[2]);
>
> Slick and simple. Blue, do you think there's some random abstraction layer necessary?

Same here.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 14/15] Start implementing pSeries logical partition machine
  2011-02-12 16:40     ` Blue Swirl
@ 2011-02-12 20:54       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Benjamin Herrenschmidt @ 2011-02-12 20:54 UTC (permalink / raw
  To: Blue Swirl
  Cc: qemu-devel@nongnu.org List, Paul Mackerras, Alexander Graf, anton,
	David Gibson

On Sat, 2011-02-12 at 18:40 +0200, Blue Swirl wrote:
> 
> sPAPREnvironment has a certain aroma reminding of aHungarian
> nNotation, but otherwise the bouquet is entirely passable. 

It's just the smell, like a good french cheese :-)

sPAPR as "server Power Architecture® Platform Requirements",
which is really "Power Architecture® Platform Requirements+" with the
"server" bit added by us to differentiate from the newer ePAPR for
embedded :-) A bit messy ... but sPAPR is what we tend to call it
nowadays.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-12 16:59     ` Blue Swirl
@ 2011-02-12 21:00       ` Benjamin Herrenschmidt
  2011-02-12 22:52         ` Blue Swirl
  0 siblings, 1 reply; 73+ messages in thread
From: Benjamin Herrenschmidt @ 2011-02-12 21:00 UTC (permalink / raw
  To: Blue Swirl
  Cc: qemu-devel@nongnu.org List, Paul Mackerras, Alexander Graf, anton,
	David Gibson

On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
> 
> Actually I don't quite understand the need for vty layer, why not use
> the chardev here directly?

I'm not sure what you mean here...

Basically, the interface presented to guests is sPAPR compliant, so
virtual devices come with a bunch of stuff such as standard device-tree
properties, but also hcalls for interrupt control etc... which are
common to most of these guys including vty.

Some of it isn't present in David's current patch just yet, but I don't
see how using an existing chardev would provide the same semantics,
especially when we start adding interrupts etc...

Also eventually, VTYs will be hot-pluggable (when we get to do that)
and will use the same mechanisms as the other sPAPR VIO devices for
that.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-12 21:00       ` Benjamin Herrenschmidt
@ 2011-02-12 22:52         ` Blue Swirl
  2011-02-12 23:15           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Blue Swirl @ 2011-02-12 22:52 UTC (permalink / raw
  To: Benjamin Herrenschmidt
  Cc: qemu-devel@nongnu.org List, Paul Mackerras, Alexander Graf, anton,
	David Gibson

On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
>>
>> Actually I don't quite understand the need for vty layer, why not use
>> the chardev here directly?
>
> I'm not sure what you mean here...

Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
instead of moving those to a separate file.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-12 22:52         ` Blue Swirl
@ 2011-02-12 23:15           ` Benjamin Herrenschmidt
  2011-02-13  8:08             ` Blue Swirl
  2011-02-13 11:14             ` David Gibson
  0 siblings, 2 replies; 73+ messages in thread
From: Benjamin Herrenschmidt @ 2011-02-12 23:15 UTC (permalink / raw
  To: Blue Swirl
  Cc: qemu-devel@nongnu.org List, Paul Mackerras, Alexander Graf, anton,
	David Gibson

On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
> >>
> >> Actually I don't quite understand the need for vty layer, why not use
> >> the chardev here directly?
> >
> > I'm not sure what you mean here...
> 
> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
> instead of moving those to a separate file.

Well, the VIO device instance gives the chardev instance which is all
nicely encapsulated inside spapr-vty... Also VIO devices tend to have
dedicated hcalls, not only VTY, so it makes a lot of sense to keep them
close to the rest of the VIO driver they belong to, don't you think?

(Actually veth does, vscsi uses the more "generic" CRQ stuff which we've
added to the core VIO but you'll see that when we get those patches
ready, hopefully soon).

Actually, one thing I noticed is that the current patches David posted
still have a single function with a switch/case statement for hcalls.

I'm not 100% certain what David's long-term plans are here, but in our
internal "WIP" tree, we've subsequently turned that into a mechanism
where any module can call powerpc_register_hypercall() to add hcalls.

So if David intends to move the "upstream candidate" tree in that
direction, then naturally, the calls in spapr_hcall.c are going to
disappear in favor of a pair of powerpc_register_hypercall() locally in
the vty module.

Cheers,
Ben.


* [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-12 23:15           ` Benjamin Herrenschmidt
@ 2011-02-13  8:08             ` Blue Swirl
  2011-02-13 11:12               ` David Gibson
                                 ` (2 more replies)
  2011-02-13 11:14             ` David Gibson
  1 sibling, 3 replies; 73+ messages in thread
From: Blue Swirl @ 2011-02-13  8:08 UTC (permalink / raw
  To: Benjamin Herrenschmidt
  Cc: qemu-devel@nongnu.org List, Paul Mackerras, Alexander Graf, anton,
	David Gibson

On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
>> <benh@kernel.crashing.org> wrote:
>> > On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
>> >>
>> >> Actually I don't quite understand the need for vty layer, why not use
>> >> the chardev here directly?
>> >
>> > I'm not sure what you mean here...
>>
>> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
>> instead of moving those to a separate file.
>
> Well, the VIO device instance gives the chardev instance which is all
> nicely encapsulated inside spapr-vty... Also VIO devices tend to have
> dedicated hcalls, not only VTY, so it makes a lot of sense to keep them
> close to the rest of the VIO driver they belong to don't you think ?
>
> (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've
> added to the core VIO but you'll see that when we get those patches
> ready, hopefully soon).

This is a bit of a special case, much like semihosting modes for m68k
or ARM, or like MOL hacks which were removed recently. From QEMU point
of view, the most natural way of handling this would be hypervisor
implemented in the guest side (for example BIOS). Then the hypervisor
would use normal IO (or virtio) to communicate with the host. If
inside QEMU, the interface of the hypervisor to the devices needs some
thought. We'd like to avoid ugly interfaces like vmmouse where a
device probes CPU registers directly or spaghetti interfaces like
APIC.

> Actually, one thing I noticed is that the current patches David posted
> still have a single function with a switch/case statement for hcalls.
>
> I'm not 100% certain what David's long-term plans are here, but in our
> internal "WIP" tree, we've subsequently turned that into a mechanism
> where any module can call powerpc_register_hypercall() to add hcalls.
>
> So if David intends to move the "upstream candidate" tree in that
> direction, then naturally, the calls in spapr_hcall.c are going to
> disappear in favor of a pair of powerpc_register_hypercall() locally in
> the vty module.

Is the interface a new design, or are you implementing what is also used
on real HW?


* Re: [Qemu-devel] Re: [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time
  2011-02-12 15:37   ` [Qemu-devel] " Alexander Graf
@ 2011-02-13  9:02     ` David Gibson
  2011-02-13 12:33       ` Alexander Graf
  0 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-13  9:02 UTC (permalink / raw
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Sat, Feb 12, 2011 at 04:37:46PM +0100, Alexander Graf wrote:
> On 12.02.2011, at 15:54, David Gibson wrote:
[snip]
> > +#define SDR_HTABORG_32         0xFFFF0000UL
> > +#define SDR_HTABMASK           0x000001FFUL
> 
> Please mark this constant as ppc32
> 
> > +
> > +#if defined(TARGET_PPC64)
> > +#define SDR_HTABORG_64         0xFFFFFFFFFFFC0000ULL
> > +#define SDR_HTABSIZE           0x000000000000001FULL
> 
> Please mark this constant as ppc64

Um.. I'm not sure what you mean by this.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] Re: [PATCH 12/15] Support 1T segments on ppc
  2011-02-12 15:57   ` [Qemu-devel] " Alexander Graf
@ 2011-02-13  9:34     ` David Gibson
  2011-02-13 12:37       ` Alexander Graf
  0 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-13  9:34 UTC (permalink / raw
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Sat, Feb 12, 2011 at 04:57:39PM +0100, Alexander Graf wrote:
> On 12.02.2011, at 15:54, David Gibson wrote:
[snip]
> > +    if (rb & (0x1000 - env->slb_nr))
> 
> Braces...

Oops, yeah.  These later patches in the series I haven't really
audited for coding style adequately yet.  I'll fix these before the
next version.

[snip]
> > + 	return -1; /* 1T segment on MMU that doesn't support it */
> > + 
> > +    /* We stuff a copy of the B field into slb->esid to simplify
> > +     * lookup later */
> > +    slb->esid = (rb & (SLB_ESID_ESID | SLB_ESID_V)) |
> > +        (rs >> SLB_VSID_SSIZE_SHIFT);
> 
> Wouldn't it be easier to add another field?

Easier for what?  The reason I put these bits in here is that the rest
of the things slb_lookup() needs to scan for are all in the esid
field, so putting B in there means slb_lookup() needs only one
comparison per-slot, per segment size.
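
A minimal sketch of the single-comparison lookup described above, not
the code from the patch.  The DEMO_ constants and function names are
hypothetical and the bit positions are chosen for illustration only.
The point is that with the valid bit and the segment-size (B) bits
stored next to the ESID bits, each slot is checked against one
precomputed key per segment size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in the patch. */
#define DEMO_ESID_V        0x0000000008000000ULL /* entry-valid bit */
#define DEMO_SEG_MASK_256M 0xFFFFFFFFF0000000ULL /* bits naming a 256M segment */
#define DEMO_SEG_MASK_1T   0xFFFFFF0000000000ULL /* bits naming a 1T segment */
#define DEMO_B_256M        0x0ULL                /* segment size code: 256M */
#define DEMO_B_1T          0x1ULL                /* segment size code: 1T */

typedef struct {
    uint64_t esid; /* segment address bits | valid bit | B code */
    uint64_t vsid;
} demo_slb_t;

/* Fill a slot, folding the B code into the low bits of esid. */
static void demo_slb_store(demo_slb_t *slot, uint64_t ea, uint64_t b,
                           uint64_t vsid)
{
    uint64_t mask;

    assert(b == DEMO_B_256M || b == DEMO_B_1T);
    mask = (b == DEMO_B_1T) ? DEMO_SEG_MASK_1T : DEMO_SEG_MASK_256M;
    slot->esid = (ea & mask) | DEMO_ESID_V | b;
    slot->vsid = vsid;
}

/* One equality test per slot and per segment size; NULL on a miss. */
static demo_slb_t *demo_slb_lookup(demo_slb_t *slb, size_t n, uint64_t ea)
{
    uint64_t key_256m = (ea & DEMO_SEG_MASK_256M) | DEMO_ESID_V | DEMO_B_256M;
    uint64_t key_1t = (ea & DEMO_SEG_MASK_1T) | DEMO_ESID_V | DEMO_B_1T;
    size_t i;

    for (i = 0; i < n; i++) {
        if (slb[i].esid == key_256m || slb[i].esid == key_1t) {
            return &slb[i];
        }
    }
    return NULL;
}
```

An all-zero slot never matches, because both keys always carry the
valid bit.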

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] Re: [PATCH 13/15] Add POWER7 support for ppc
  2011-02-12 16:09   ` [Qemu-devel] " Alexander Graf
@ 2011-02-13  9:39     ` David Gibson
  2011-02-13 12:37       ` Alexander Graf
  0 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-13  9:39 UTC (permalink / raw
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Sat, Feb 12, 2011 at 05:09:39PM +0100, Alexander Graf wrote:
> On 12.02.2011, at 15:54, David Gibson wrote:
[snip]
> > +    /* Don't generate spurious events */
> > +    if ((cur_level == 1 && level == 0) || (cur_level == 0 && level != 0)) {
> 
> Did you hit this? Qemu's irq framework should already ensure that
> property. I'm also not sure it's actually correct - if a level
> interrupt is on, the guest would get another interrupt injected, no?
> That would be cur_level ==1 && level == 1 IIUC.

[snip]
> > +        case POWER7_INPUT_CKSTP:
> 
> POWER7 has checkstop?

[snip]
> > +        case POWER7_INPUT_HRESET:
> 
> Does this ever get triggered? POWER7 is run in lpar only, so there is no hreset, right?

[snip]
> > +        case POWER7_INPUT_TBEN:
> > +            LOG_IRQ("%s: set the TBEN state to %d\n", __func__,
> > +                        level);
> > +            /* XXX: TODO */
> 
> Hrm - what is this?

Ah, drat.  I forgot about this.  The POWER7 interrupt stuff I copied
from 970 and then modified minimally to get it working.  I meant to
get around to auditing this stuff to see what was actually relevant to
POWER7.  I'll address this for the next version.

[snip]
> > +#if !defined(CONFIG_USER_ONLY)
> > +    env->slb_nr = 32;
> 
> POWER7 has 64, no? Please check this :).

Nope.  POWER4 and POWER5 have 64, but POWER7 has 32.  This one I did
check and change.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-12 16:47   ` [Qemu-devel] " Alexander Graf
  2011-02-12 16:59     ` Blue Swirl
@ 2011-02-13 11:09     ` David Gibson
  2011-02-13 12:38       ` Alexander Graf
  1 sibling, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-13 11:09 UTC (permalink / raw
  To: Alexander Graf
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton

On Sat, Feb 12, 2011 at 05:47:53PM +0100, Alexander Graf wrote:
> On 12.02.2011, at 15:54, David Gibson wrote:
[snip] 
> > @@ -267,6 +295,7 @@ static QEMUMachine spapr_machine = {
> >     .desc = "pSeries Logical Partition (PAPR compliant)",
> >     .init = ppc_spapr_init,
> >     .max_cpus = 1,
> > +    .no_parallel = 1,
> 
> duplicate?

Oops, rebasing mistake.  Fixed now.

[snip]
> > +VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg)
> > +{
> > +    DeviceState *qdev;
> > +    VIOsPAPRDevice *dev = NULL;
> > +
> > +    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
> > +        dev = (VIOsPAPRDevice *)qdev;
> > +        if (dev->reg == reg)
> 
> Braces
> 
> > +            break;
> > +    }
> > +
> > +    return dev;
> 
> What if the device doesn't exist?

This returns NULL, the caller returns an error...
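
For the lookup to return NULL on a miss, the result must only be set
when a match is found.  As far as the quoted hunk shows, dev is
assigned on every iteration, so after a miss on a non-empty list it
would still point at the last child.  A minimal sketch of the
return-on-match form follows; it uses a plain singly linked list with
hypothetical demo_ names rather than the qdev/QLIST types from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device on the bus. */
typedef struct demo_dev {
    uint32_t reg;
    struct demo_dev *next;
} demo_dev_t;

/* Push a device onto the front of the list. */
static void demo_attach(demo_dev_t **head, demo_dev_t *dev)
{
    assert(head != NULL && dev != NULL);
    dev->next = *head;
    *head = dev;
}

/* Return the device whose reg matches, or NULL if there is none. */
static demo_dev_t *demo_find_by_reg(demo_dev_t *head, uint32_t reg)
{
    demo_dev_t *dev;

    for (dev = head; dev != NULL; dev = dev->next) {
        if (dev->reg == reg) {
            return dev;
        }
    }
    return NULL;
}
```

Returning from inside the loop means the fall-through path can only be
reached when nothing matched, so the caller's NULL check is reliable.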

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13  8:08             ` Blue Swirl
@ 2011-02-13 11:12               ` David Gibson
  2011-02-13 12:15                 ` Blue Swirl
  2011-02-13 15:08                 ` Anthony Liguori
  2011-02-13 12:31               ` Alexander Graf
  2011-02-13 16:07               ` Benjamin Herrenschmidt
  2 siblings, 2 replies; 73+ messages in thread
From: David Gibson @ 2011-02-13 11:12 UTC (permalink / raw
  To: Blue Swirl
  Cc: Paul Mackerras, qemu-devel@nongnu.org List, anton, Alexander Graf

On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote:
> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
> >> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
> >> <benh@kernel.crashing.org> wrote:
> >> > On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
> >> >>
> >> >> Actually I don't quite understand the need for vty layer, why not use
> >> >> the chardev here directly?
> >> >
> >> > I'm not sure what you mean here...
> >>
> >> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
> >> instead of moving those to a separate file.
> >
> > Well, the VIO device instance gives the chardev instance which is all
> > nicely encapsulated inside spapr-vty... Also VIO devices tend to have
> > dedicated hcalls, not only VTY, so it makes a lot of sense to keep them
> > close to the rest of the VIO driver they belong to don't you think ?
> >
> > (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've
> > added to the core VIO but you'll see that when we get those patches
> > ready, hopefully soon).
> 
> This is a bit of a special case, much like semihosting modes for m68k
> or ARM, or like MOL hacks which were removed recently. From QEMU point
> of view, the most natural way of handling this would be hypervisor
> implemented in the guest side (for example BIOS). Then the hypervisor
> would use normal IO (or virtio) to communicate with the host. If
> inside QEMU, the interface of the hypervisor to the devices needs some
> thought. We'd like to avoid ugly interfaces like vmmouse where a
> device probes CPU registers directly or spaghetti interfaces like
> APIC.

I really don't follow what you're saying here.  Running the hypervisor
in the guest, rather than emulating its effect in qemu seems like an
awful lot of complexity for no clear reason.

> > Actually, one thing I noticed is that the current patches David posted
> > still have a single function with a switch/case statement for hcalls.
> >
> > I'm not 100% certain what David's long-term plans are here, but in our
> > internal "WIP" tree, we've subsequently turned that into a mechanism
> > where any module can call powerpc_register_hypercall() to add hcalls.
> >
> > So if David intends to move the "upstream candidate" tree in that
> > direction, then naturally, the calls in spapr_hcall.c are going to
> > disappear in favor of a pair of powerpc_register_hypercall() locally in
> > the vty module.
> 
> Is the interface a new design, or are you implementing what is also used
> on real HW?

The interface already exists on real HW.  It's described in the PAPR
document we keep mentioning.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-12 23:15           ` Benjamin Herrenschmidt
  2011-02-13  8:08             ` Blue Swirl
@ 2011-02-13 11:14             ` David Gibson
  2011-02-13 12:40               ` Alexander Graf
  1 sibling, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-13 11:14 UTC (permalink / raw
  To: Benjamin Herrenschmidt
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton,
	Alexander Graf

On Sun, Feb 13, 2011 at 10:15:03AM +1100, Benjamin Herrenschmidt wrote:
> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
> > On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
[snip]
> Actually, one thing I noticed is that the current patches David posted
> still have a single function with a switch/case statement for hcalls.
> 
> I'm not 100% certain what David's long-term plans are here, but in our
> internal "WIP" tree, we've subsequently turned that into a mechanism
> where any module can call powerpc_register_hypercall() to add hcalls.
> 
> So if David intends to move the "upstream candidate" tree in that
> direction, then naturally, the calls in spapr_hcall.c are going to
> disappear in favor of a pair of powerpc_register_hypercall() locally in
> the vty module.

Ah, yeah.  I'm still not sure what to do about it.  I was going to
fold the dynamic hcall registration into the patch set before
upstreaming.  But then something paulus said made me rethink whether
the dynamic registration was a good idea.  Still need to sort this out
before the series is really ready.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 11:12               ` David Gibson
@ 2011-02-13 12:15                 ` Blue Swirl
  2011-02-13 16:12                   ` Benjamin Herrenschmidt
  2011-02-13 15:08                 ` Anthony Liguori
  1 sibling, 1 reply; 73+ messages in thread
From: Blue Swirl @ 2011-02-13 12:15 UTC (permalink / raw
  To: David Gibson
  Cc: Paul Mackerras, qemu-devel@nongnu.org List, anton, Alexander Graf

On Sun, Feb 13, 2011 at 1:12 PM, David Gibson
<david@gibson.dropbear.id.au> wrote:
> On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote:
>> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt
>> <benh@kernel.crashing.org> wrote:
>> > On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
>> >> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
>> >> <benh@kernel.crashing.org> wrote:
>> >> > On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
>> >> >>
>> >> >> Actually I don't quite understand the need for vty layer, why not use
>> >> >> the chardev here directly?
>> >> >
>> >> > I'm not sure what you mean here...
>> >>
>> >> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
>> >> instead of moving those to a separate file.
>> >
>> > Well, the VIO device instance gives the chardev instance which is all
>> > nicely encapsulated inside spapr-vty... Also VIO devices tend to have
>> > dedicated hcalls, not only VTY, so it makes a lot of sense to keep them
>> > close to the rest of the VIO driver they belong to don't you think ?
>> >
>> > (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've
>> > added to the core VIO but you'll see that when we get those patches
>> > ready, hopefully soon).
>>
>> This is a bit of a special case, much like semihosting modes for m68k
>> or ARM, or like MOL hacks which were removed recently. From QEMU point
>> of view, the most natural way of handling this would be hypervisor
>> implemented in the guest side (for example BIOS). Then the hypervisor
>> would use normal IO (or virtio) to communicate with the host. If
>> inside QEMU, the interface of the hypervisor to the devices needs some
>> thought. We'd like to avoid ugly interfaces like vmmouse where a
>> device probes CPU registers directly or spaghetti interfaces like
>> APIC.
>
> I really don't follow what you're saying here.  Running the hypervisor
> in the guest, rather than emulating its effect in qemu seems like an
> awful lot of complexity for no clear reason.

Maybe it would be more complex but also emulation accuracy would be
increased and the interfaces would be saner. We don't shortcut BIOS
and implement its services to OS in QEMU for other machines either.

I'd expect one problem with that approach though, the interface used
on real HW between the hypervisor and the underlying HW may be
undocumented, but then it could use for example existing virtio
devices.

One way to handle this could be to add the hypervisor interface now to
QEMU and switch to guest hypervisor when (if) it becomes available.
I'd just like to avoid duplication with virtio or messy interfaces
like vmport.


* [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13  8:08             ` Blue Swirl
  2011-02-13 11:12               ` David Gibson
@ 2011-02-13 12:31               ` Alexander Graf
  2011-02-13 12:59                 ` Blue Swirl
  2011-02-13 16:07               ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 73+ messages in thread
From: Alexander Graf @ 2011-02-13 12:31 UTC (permalink / raw
  To: Blue Swirl
  Cc: Paul Mackerras, qemu-devel@nongnu.org List, anton, David Gibson


On 13.02.2011, at 09:08, Blue Swirl wrote:

> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
>>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
>>> <benh@kernel.crashing.org> wrote:
>>>> On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
>>>>> 
>>>>> Actually I don't quite understand the need for vty layer, why not use
>>>>> the chardev here directly?
>>>> 
>>>> I'm not sure what you mean here...
>>> 
>>> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
>>> instead of moving those to a separate file.
>> 
>> Well, the VIO device instance gives the chardev instance which is all
>> nicely encapsulated inside spapr-vty... Also VIO devices tend to have
>> dedicated hcalls, not only VTY, so it makes a lot of sense to keep them
>> close to the rest of the VIO driver they belong to don't you think ?
>> 
>> (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've
>> added to the core VIO but you'll see that when we get those patches
>> ready, hopefully soon).
> 
> This is a bit of a special case, much like semihosting modes for m68k
> or ARM, or like MOL hacks which were removed recently. From QEMU point
> of view, the most natural way of handling this would be hypervisor
> implemented in the guest side (for example BIOS). Then the hypervisor
> would use normal IO (or virtio) to communicate with the host. If
> inside QEMU, the interface of the hypervisor to the devices needs some
> thought. We'd like to avoid ugly interfaces like vmmouse where a
> device probes CPU registers directly or spaghetti interfaces like
> APIC.

In this case I disagree. While the "emulate real hardware" case would be to have a full proprietary binary blob of firmware running in the guest that would handle all the hypervisor stuff, this is not feasible. Implementing PAPR in Qemu is hard (and slow) enough - doing it all emulated simply is overkill for what we're trying to achieve here.

The PAPR interfaces are well specified (in the PAPR spec - seems to be power.org member restricted) and are the only thing you ever get to see on recent POWER hardware. The real hardware interface is simply inaccessible to you.

It's basically the same as the S390 machine we have. On those machines we simply don't have access to real hw, so emulating it is moot. All interfaces that the OS sees are PV interfaces.

> 
>> Actually, one thing I noticed is that the current patches David posted
>> still have a single function with a switch/case statement for hcalls.
>> 
>> I'm not 100% certain what David's long-term plans are here, but in our
>> internal "WIP" tree, we've subsequently turned that into a mechanism
>> where any module can call powerpc_register_hypercall() to add hcalls.
>> 
>> So if David intends to move the "upstream candidate" tree in that
>> direction, then naturally, the calls in spapr_hcall.c are going to
>> disappear in favor of a pair of powerpc_register_hypercall() locally in
>> the vty module.
> 
> Is the interface a new design, or are you implementing what is also used
> on real HW?

PAPR is the spec that defines the PV interface in use on POWER. Outside of IBM, nobody can run Linux on POWER without going through phyp which is their hypervisor.

So this implements exactly what the OS sees on real HW :).


Alex


* Re: [Qemu-devel] Re: [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time
  2011-02-13  9:02     ` David Gibson
@ 2011-02-13 12:33       ` Alexander Graf
  2011-02-13 12:52         ` David Gibson
  0 siblings, 1 reply; 73+ messages in thread
From: Alexander Graf @ 2011-02-13 12:33 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 13.02.2011, at 10:02, David Gibson wrote:

> On Sat, Feb 12, 2011 at 04:37:46PM +0100, Alexander Graf wrote:
>> On 12.02.2011, at 15:54, David Gibson wrote:
> [snip]
>>> +#define SDR_HTABORG_32         0xFFFF0000UL
>>> +#define SDR_HTABMASK           0x000001FFUL
>> 
>> Please mark this constant as ppc32
>> 
>>> +
>>> +#if defined(TARGET_PPC64)
>>> +#define SDR_HTABORG_64         0xFFFFFFFFFFFC0000ULL
>>> +#define SDR_HTABSIZE           0x000000000000001FULL
>> 
>> Please mark this constant as ppc64
> 
> Um.. I'm not sure what you mean by this.

Well, while to you SDR_HTABMASK and SDR_HTABSIZE are "obviously" meant for ppc32/ppc64 respectively, the average code reader won't know the difference. What I'm proposing is:

#define SDR_32_HTABORG
#define SDR_32_HTABMASK

#define SDR_64_HTABORG
#define SDR_64_HTABSIZE

This way it's a lot more obvious that the two constants really belong to two completely different semantics :).
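
A short sketch of how the renamed constants might read in use.  The
mask values are the ones from the quoted hunk; the demo_ helper
functions are hypothetical and only show that each pair of masks
belongs to one SDR1 layout.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit SDR1 layout (values taken from the quoted patch hunk). */
#define SDR_32_HTABORG   0xFFFF0000UL
#define SDR_32_HTABMASK  0x000001FFUL

/* 64-bit SDR1 layout (values taken from the quoted patch hunk). */
#define SDR_64_HTABORG   0xFFFFFFFFFFFC0000ULL
#define SDR_64_HTABSIZE  0x000000000000001FULL

/* Hypothetical helpers: extract the raw fields from an SDR1 value. */
static uint32_t demo_sdr32_htaborg(uint32_t sdr1)
{
    return (uint32_t)(sdr1 & SDR_32_HTABORG);
}

static uint32_t demo_sdr32_htabmask(uint32_t sdr1)
{
    return (uint32_t)(sdr1 & SDR_32_HTABMASK);
}

static uint64_t demo_sdr64_htaborg(uint64_t sdr1)
{
    return sdr1 & SDR_64_HTABORG;
}

static unsigned demo_sdr64_htabsize(uint64_t sdr1)
{
    return (unsigned)(sdr1 & SDR_64_HTABSIZE);
}

/* Usage example: the low field is a mask on ppc32 but a size on ppc64. */
static void demo_sdr_selfcheck(void)
{
    assert(demo_sdr32_htaborg(0x12345678u) == 0x12340000u);
    assert(demo_sdr32_htabmask(0x12345678u) == 0x078u);
    assert(demo_sdr64_htaborg(0x0000000012345612ULL) == 0x12340000ULL);
    assert(demo_sdr64_htabsize(0x0000000012345612ULL) == 0x12u);
}
```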


Alex


* Re: [Qemu-devel] Re: [PATCH 12/15] Support 1T segments on ppc
  2011-02-13  9:34     ` David Gibson
@ 2011-02-13 12:37       ` Alexander Graf
  2011-02-13 13:38         ` David Gibson
  0 siblings, 1 reply; 73+ messages in thread
From: Alexander Graf @ 2011-02-13 12:37 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 13.02.2011, at 10:34, David Gibson wrote:

> On Sat, Feb 12, 2011 at 04:57:39PM +0100, Alexander Graf wrote:
>> On 12.02.2011, at 15:54, David Gibson wrote:
> [snip]
>>> +    if (rb & (0x1000 - env->slb_nr))
>> 
>> Braces...
> 
> Oops, yeah.  These later patches in the series I haven't really
> audited for coding style adequately yet.  I'll fix these before the
> next version.
> 
> [snip]
>>> + 	return -1; /* 1T segment on MMU that doesn't support it */
>>> + 
>>> +    /* We stuff a copy of the B field into slb->esid to simplify
>>> +     * lookup later */
>>> +    slb->esid = (rb & (SLB_ESID_ESID | SLB_ESID_V)) |
>>> +        (rs >> SLB_VSID_SSIZE_SHIFT);
>> 
>> Wouldn't it be easier to add another field?
> 
> Easier for what?  The reason I put these bits in here is that the rest
> of the things slb_lookup() needs to scan for are all in the esid
> field, so putting B in there means slb_lookup() needs only one
> comparison per-slot, per segment size.

Hrm - but it also needs random & ~3 masking in other code, which is very unpretty. Comparing two numbers really shouldn't hurt performance too much, and it makes the code more maintainable.

struct slb_entry {
    uint64_t esid;
    uint64_t vsid;
    int b;
};

or so :).

Alex


* Re: [Qemu-devel] Re: [PATCH 13/15] Add POWER7 support for ppc
  2011-02-13  9:39     ` David Gibson
@ 2011-02-13 12:37       ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-13 12:37 UTC (permalink / raw
  To: David Gibson; +Cc: paulus, qemu-devel, anton


On 13.02.2011, at 10:39, David Gibson wrote:

> On Sat, Feb 12, 2011 at 05:09:39PM +0100, Alexander Graf wrote:
>> On 12.02.2011, at 15:54, David Gibson wrote:
> [snip]
>>> +    /* Don't generate spurious events */
>>> +    if ((cur_level == 1 && level == 0) || (cur_level == 0 && level != 0)) {
>> 
>> Did you hit this? Qemu's irq framework should already ensure that
>> property. I'm also not sure it's actually correct - if a level
>> interrupt is on, the guest would get another interrupt injected, no?
>> That would be cur_level ==1 && level == 1 IIUC.
> 
> [snip]
>>> +        case POWER7_INPUT_CKSTP:
>> 
>> POWER7 has checkstop?
> 
> [snip]
>>> +        case POWER7_INPUT_HRESET:
>> 
>> Does this ever get triggered? POWER7 is run in lpar only, so there is no hreset, right?
> 
> [snip]
>>> +        case POWER7_INPUT_TBEN:
>>> +            LOG_IRQ("%s: set the TBEN state to %d\n", __func__,
>>> +                        level);
>>> +            /* XXX: TODO */
>> 
>> Hrm - what is this?
> 
> Ah, drat.  I forgot about this.  The POWER7 interrupt stuff I copied
> from 970 and then modified minimally to get it working.  I meant to
> get around to auditing this stuff to see what was actually relevant to
> POWER7.  I'll address this for the next version.
> 
> [snip]
>>> +#if !defined(CONFIG_USER_ONLY)
>>> +    env->slb_nr = 32;
>> 
>> POWER7 has 64, no? Please check this :).
> 
> Nope.  POWER4 and POWER5 have 64, but POWER7 has 32.  This one I did
> check and change.

Oh? Interesting. Good to know :)


Alex


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 11:09     ` David Gibson
@ 2011-02-13 12:38       ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-13 12:38 UTC (permalink / raw
  To: David Gibson
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton


On 13.02.2011, at 12:09, David Gibson wrote:

> On Sat, Feb 12, 2011 at 05:47:53PM +0100, Alexander Graf wrote:
>> On 12.02.2011, at 15:54, David Gibson wrote:
> [snip] 
>>> @@ -267,6 +295,7 @@ static QEMUMachine spapr_machine = {
>>>    .desc = "pSeries Logical Partition (PAPR compliant)",
>>>    .init = ppc_spapr_init,
>>>    .max_cpus = 1,
>>> +    .no_parallel = 1,
>> 
>> duplicate?
> 
> Oops, rebasing mistake.  Fixed now.
> 
> [snip]
>>> +VIOsPAPRDevice *spapr_vio_find_by_reg(VIOsPAPRBus *bus, uint32_t reg)
>>> +{
>>> +    DeviceState *qdev;
>>> +    VIOsPAPRDevice *dev = NULL;
>>> +
>>> +    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
>>> +        dev = (VIOsPAPRDevice *)qdev;
>>> +        if (dev->reg == reg)
>> 
>> Braces
>> 
>>> +            break;
>>> +    }
>>> +
>>> +    return dev;
>> 
>> What if the device doesn't exist?
> 
> This returns NULL, the caller returns an error...

Makes sense :).


Alex


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 11:14             ` David Gibson
@ 2011-02-13 12:40               ` Alexander Graf
  2011-02-13 12:44                 ` David Gibson
                                   ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-13 12:40 UTC (permalink / raw
  To: David Gibson
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton


On 13.02.2011, at 12:14, David Gibson wrote:

> On Sun, Feb 13, 2011 at 10:15:03AM +1100, Benjamin Herrenschmidt wrote:
>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
>>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
> [snip]
>> Actually, one thing I noticed is that the current patches David posted
>> still have a single function with a switch/case statement for hcalls.
>> 
>> I'm not 100% certain what David's long-term plans are here, but in our
>> internal "WIP" tree, we've subsequently turned that into a mechanism
>> where any module can call powerpc_register_hypercall() to add hcalls.
>> 
>> So if David intends to move the "upstream candidate" tree in that
>> direction, then naturally, the calls in spapr_hcall.c are going to
>> disappear in favor of a pair of powerpc_register_hypercall() locally in
>> the vty module.
> 
> Ah, yeah.  I'm still not sure what to do about it.  I was going to
> fold the dynamic hcall registration into the patch set before
> upstreaming.  But then something paulus said made me rethink whether
> the dynamic registration was a good idea.  Still need to sort this out
> before the series is really ready.

We can surely move it to dynamic later on. I think the "proper" way would be to populate a qdev bus and have the individual hypercall receivers register themselves through -device creations. But Blue really is the expert here :).


Alex


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 12:40               ` Alexander Graf
@ 2011-02-13 12:44                 ` David Gibson
  2011-02-13 13:09                   ` Alexander Graf
  2011-02-13 15:14                 ` Anthony Liguori
  2011-02-13 16:17                 ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 73+ messages in thread
From: David Gibson @ 2011-02-13 12:44 UTC (permalink / raw
  To: Alexander Graf
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton

On Sun, Feb 13, 2011 at 01:40:14PM +0100, Alexander Graf wrote:
> 
> On 13.02.2011, at 12:14, David Gibson wrote:
> 
> > On Sun, Feb 13, 2011 at 10:15:03AM +1100, Benjamin Herrenschmidt wrote:
> >> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
> >>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
> > [snip]
> >> Actually, one thing I noticed is that the current patches David posted
> >> still have a single function with a switch/case statement for hcalls.
> >> 
> >> I'm not 100% certain what David's long-term plans are here, but in our
> >> internal "WIP" tree, we've subsequently turned that into a mechanism
> >> where any module can call powerpc_register_hypercall() to add hcalls.
> >> 
> >> So if David intends to move the "upstream candidate" tree in that
> >> direction, then naturally, the calls in spapr_hcall.c are going to
> >> disappear in favor of a pair of powerpc_register_hypercall() locally in
> >> the vty module.
> > 
> > Ah, yeah.  I'm still not sure what to do about it.  I was going to
> > fold the dynamic hcall registration into the patch set before
> > upstreaming.  But then something paulus said made me rethink whether
> > the dynamic registration was a good idea.  Still need to sort this out
> > before the series is really ready.
> 
> We can surely move it to dynamic later on. I think the "proper" way
> would be to populate a qdev bus and have the individual hypercall
> receivers register themselves through -device creations. But Blue
> really is the expert here :).

Ok, not sure what you mean here.  I already have a qdev bus for the
VIO devices.  With my tentative dynamic model as devices are created
on the bus they may register hypercalls as well.

Is that what you mean, or do you mean have a separate "hypercall"
bus.  That sounds like serious overkill for a simple number->function
translation.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 05/15] Implement PowerPC slbmfee and slbmfev instructions
  2011-02-12 15:23   ` [Qemu-devel] " Alexander Graf
@ 2011-02-13 12:46     ` David Gibson
  0 siblings, 0 replies; 73+ messages in thread
From: David Gibson @ 2011-02-13 12:46 UTC (permalink / raw
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Sat, Feb 12, 2011 at 04:23:39PM +0100, Alexander Graf wrote:
> On 12.02.2011, at 15:54, David Gibson wrote:
[snip]
> > +target_ulong helper_load_slb_esid (target_ulong rb)
> > +{
> > +    target_ulong rt;
> > +
> > +    if (ppc_load_slb_esid(env, rb, &rt) < 0) {
> > +        helper_raise_exception_err(POWERPC_EXCP_PROGRAM, POWERPC_EXCP_INVAL);
> 
> The spec doesn't say what to do in this case. Have you checked what
> real hardware does?

Erm, I don't think I've checked this specific case, on this specific
CPU.  Generally I've found that invalid parameters to MMU management
instructions result in invalid instruction program checks, so I
assumed that's what would happen in this case.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time
  2011-02-13 12:33       ` Alexander Graf
@ 2011-02-13 12:52         ` David Gibson
  0 siblings, 0 replies; 73+ messages in thread
From: David Gibson @ 2011-02-13 12:52 UTC (permalink / raw
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Sun, Feb 13, 2011 at 01:33:44PM +0100, Alexander Graf wrote:
> 
> On 13.02.2011, at 10:02, David Gibson wrote:
> 
> > On Sat, Feb 12, 2011 at 04:37:46PM +0100, Alexander Graf wrote:
> >> On 12.02.2011, at 15:54, David Gibson wrote:
> > [snip]
> >>> +#define SDR_HTABORG_32         0xFFFF0000UL
> >>> +#define SDR_HTABMASK           0x000001FFUL
> >> 
> >> Please mark this constant as ppc32
> >> 
> >>> +
> >>> +#if defined(TARGET_PPC64)
> >>> +#define SDR_HTABORG_64         0xFFFFFFFFFFFC0000ULL
> >>> +#define SDR_HTABSIZE           0x000000000000001FULL
> >> 
> >> Please mark this constant as ppc64
> > 
> > Um.. I'm not sure what you mean by this.
> 
> Well, while to you SDR_HTABMASK and SDR_HTABSIZE are "obviously"
> meant for ppc32/ppc64 respectively, the average code reader won't
> know the difference. What I'm proposing is:
> 
> #define SDR_32_HTABORG
> #define SDR_32_HTABMASK
> 
> #define SDR_64_HTABORG
> #define SDR_64_HTABSIZE
> 
> This way it's a lot more obvious that the two constants really
> belong to two completely different semantics :).

Ah! I see.  Done, I'll have this in the next cut.
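
For illustration, the renamed constants could be used roughly as below
when SDR1 is parsed once at mtspr time.  This is only a sketch: the mask
values are the ones quoted from the patch, the SDR_32_* / SDR_64_* names
follow Alex's proposal, and the sdr1_fields struct and the two
sdr1_parse_* helpers are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values as quoted from the patch, names as proposed in review. */
#define SDR_32_HTABORG   0xFFFF0000UL
#define SDR_32_HTABMASK  0x000001FFUL
#define SDR_64_HTABORG   0xFFFFFFFFFFFC0000ULL
#define SDR_64_HTABSIZE  0x000000000000001FULL

/* Hypothetical container for the decoded register (not from the patch). */
typedef struct {
    uint64_t htab_base;   /* HTABORG bits, left in place (not shifted) */
    uint64_t htab_field;  /* HTABMASK on ppc32, HTABSIZE on ppc64 */
} sdr1_fields;

/* Split a 32-bit SDR1 value into HTABORG and HTABMASK. */
sdr1_fields sdr1_parse_32(uint32_t sdr1)
{
    sdr1_fields f;
    f.htab_base = sdr1 & SDR_32_HTABORG;
    f.htab_field = sdr1 & SDR_32_HTABMASK;
    return f;
}

/* Split a 64-bit SDR1 value into HTABORG and HTABSIZE. */
sdr1_fields sdr1_parse_64(uint64_t sdr1)
{
    sdr1_fields f;
    f.htab_base = sdr1 & SDR_64_HTABORG;
    f.htab_field = sdr1 & SDR_64_HTABSIZE;
    return f;
}
```

With the prefix in the name, a reader can tell at the call site that
HTABMASK and HTABSIZE are different fields with different meanings.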

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 12:31               ` Alexander Graf
@ 2011-02-13 12:59                 ` Blue Swirl
  0 siblings, 0 replies; 73+ messages in thread
From: Blue Swirl @ 2011-02-13 12:59 UTC (permalink / raw
  To: Alexander Graf
  Cc: Paul Mackerras, qemu-devel@nongnu.org List, anton, David Gibson

On Sun, Feb 13, 2011 at 2:31 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 13.02.2011, at 09:08, Blue Swirl wrote:
>
>> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt
>> <benh@kernel.crashing.org> wrote:
>>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
>>>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
>>>> <benh@kernel.crashing.org> wrote:
>>>>> On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
>>>>>>
>>>>>> Actually I don't quite understand the need for vty layer, why not use
>>>>>> the chardev here directly?
>>>>>
>>>>> I'm not sure what you mean here...
>>>>
>>>> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
>>>> instead of moving those to a separate file.
>>>
>>> Well, the VIO device instance gives the chardev instance which is all
>>> nicely encapsulated inside spapr-vty... Also VIO devices tend to have
>>> dedicated hcalls, not only VTY, so it makes a lot of sense to keep them
>>> close to the rest of the VIO driver they belong to don't you think ?
>>>
>>> (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've
>>> added to the core VIO but you'll see that when we get those patches
>>> ready, hopefully soon).
>>
>> This is a bit of a special case, much like semihosting modes for m68k
>> or ARM, or like MOL hacks which were removed recently. From QEMU point
>> of view, the most natural way of handling this would be hypervisor
>> implemented in the guest side (for example BIOS). Then the hypervisor
>> would use normal IO (or virtio) to communicate with the host. If
>> inside QEMU, the interface of the hypervisor to the devices needs some
>> thought. We'd like to avoid ugly interfaces like vmmouse where a
>> device probes CPU registers directly or spaghetti interfaces like
>> APIC.
>
> In this case I disagree. While the "emulate real hardware" case would be to have a full proprietary binary blob of firmware running in the guest that would handle all the hypervisor stuff, this is not feasible. Implementing PAPR in Qemu is hard (and slow) enough - doing it all emulated simply is overkill for what we're trying to achieve here.
>
> The PAPR interfaces are well specified (in the PAPR spec - seems to be power.org member restricted) and are the only thing you ever get to see on recent POWER hardware. The real hardware interface is simply inaccessible to you.

Hah, I'm sure that if a sufficiently motivated bunch of cool hackers got
access to a truckload (or several, aren't these big machines?) of
POWER hardware, which they could open and brick at leisure, as is
done in other firmware reverse engineering efforts, we'd have an open
hypervisor in no time. ;-)

It's also not impossible to imagine IBM open sourcing theirs.

> It's basically the same as the S390 machine we have. On those machine we simply don't have access to real hw, so emulating it is moot. All interfaces that the OS sees are PV interfaces.
>
>>
>>> Actually, one thing I noticed is that the current patches David posted
>>> still have a single function with a switch/case statement for hcalls.
>>>
>>> I'm not 100% certain what David long term plans are here, but in our
>>> internal "WIP" tree, we've subsequently turned that into a mechanism
>>> where any module can call powerpc_register_hypercall() to add hcalls.
>>>
>>> So if David intends to move the "upstream candidate" tree in that
>>> direction, then naturally, the calls in spapr_hcall.c are going to
>>> disappear in favor of a pair of powerpc_register_hypercall() locally in
>>> the vty module.
>>
>> Is the interface new design, or are you implementing what is used also
>> on real HW?
>
> PAPR is the spec that defines the PV interface in use on POWER. Outside of IBM, nobody can run Linux on POWER without going through phyp which is their hypervisor.
>
> So this implements exactly what the OS sees on real HW :).

Right. As long as the resulting spaghetti is well contained, I'm OK
with this approach. But should the millionaire hacker team (or IBM)
ever appear with their open hypervisor, this should make room for it.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 12:44                 ` David Gibson
@ 2011-02-13 13:09                   ` Alexander Graf
  0 siblings, 0 replies; 73+ messages in thread
From: Alexander Graf @ 2011-02-13 13:09 UTC (permalink / raw
  To: David Gibson
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton


On 13.02.2011, at 13:44, David Gibson wrote:

> On Sun, Feb 13, 2011 at 01:40:14PM +0100, Alexander Graf wrote:
>> 
>> On 13.02.2011, at 12:14, David Gibson wrote:
>> 
>>> On Sun, Feb 13, 2011 at 10:15:03AM +1100, Benjamin Herrenschmidt wrote:
>>>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
>>>>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
>>> [snip]
>>>> Actually, one thing I noticed is that the current patches David posted
>>>> still have a single function with a switch/case statement for hcalls.
>>>> 
>>>> I'm not 100% certain what David long term plans are here, but in our
>>>> internal "WIP" tree, we've subsequently turned that into a mechanism
>>>> where any module can call powerpc_register_hypercall() to add hcalls.
>>>> 
>>>> So if David intends to move the "upstream candidate" tree in that
>>>> direction, then naturally, the calls in spapr_hcall.c are going to
>>>> disappear in favor of a pair of powerpc_register_hypercall() locally in
>>>> the vty module.
>>> 
>>> Ah, yeah.  I'm still not sure what to do about it.  I was going to
>>> fold the dynamic hcall registration into the patch set before
>>> upstreaming.  But then something paulus said made me rethink whether
>>> the dynamic registration was a good idea.  Still need to sort this out
>>> before the series is really ready.
>> 
>> We can surely move it to dynamic later on. I think the "proper" way
>> would be to populate a qdev bus and have the individual hypercall
>> receivers register themselves through -device creations. But Blue
>> really is the expert here :).
> 
> Ok, not sure what you mean here.  I already have a qdev bus for the
> VIO devices.  With my tentative dynamic model as devices are created
> on the bus they may register hypercalls as well.

Oh, guess I just overlooked that then, sorry :).

> Is that what you mean, or do you mean have a separate "hypercall"
> bus.  That sounds like serious overkill for a simple number->function
> translation.

Yup.


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 12/15] Support 1T segments on ppc
  2011-02-13 12:37       ` Alexander Graf
@ 2011-02-13 13:38         ` David Gibson
  0 siblings, 0 replies; 73+ messages in thread
From: David Gibson @ 2011-02-13 13:38 UTC (permalink / raw
  To: Alexander Graf; +Cc: paulus, qemu-devel, anton

On Sun, Feb 13, 2011 at 01:37:12PM +0100, Alexander Graf wrote:
> On 13.02.2011, at 10:34, David Gibson wrote:
> > On Sat, Feb 12, 2011 at 04:57:39PM +0100, Alexander Graf wrote:
> >> On 12.02.2011, at 15:54, David Gibson wrote:
> > [snip]
> >>> +    if (rb & (0x1000 - env->slb_nr))
> >> 
> >> Braces...
> > 
> > Oops, yeah.  These later patches in the series I haven't really
> > audited for coding style adequately yet.  I'll fix these before the
> > next version.
> > 
> > [snip]
> >>> + 	return -1; /* 1T segment on MMU that doesn't support it */
> >>> + 
> >>> +    /* We stuff a copy of the B field into slb->esid to simplify
> >>> +     * lookup later */
> >>> +    slb->esid = (rb & (SLB_ESID_ESID | SLB_ESID_V)) |
> >>> +        (rs >> SLB_VSID_SSIZE_SHIFT);
> >> 
> >> Wouldn't it be easier to add another field?
> > 
> > Easier for what?  The reason I put these bits in here is that the rest
> > of the things slb_lookup() needs to scan for are all in the esid
> > field, so putting B in there means slb_lookup() needs only one
> > comparison per-slot, per segment size.
> 
> Hrm - but it also needs random & ~3 masking in other code which is
> very unpretty. Comparing two numbers really shouldn't hurt
> performance too much, but makes the code more maintainable.

Well, it's only one place.  But fair enough, I'll avoid this hack in
the next version.

> struct slb_entry {
>     uint64_t esid;
>     uint64_t vsid;
>     int b;
> }
> 
> or so :).

Actually, we don't even need that.  The B field is already in
slb->vsid.
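
A rough sketch of what a lookup could look like when B is read from
slb->vsid instead of being copied into slb->esid.  Everything here is
an assumption made for illustration: the SLB_* values and the 256M/1T
ESID masks are my recollection of the architected layout rather than
anything taken from this series, and ppc_slb_t / slb_lookup are
simplified stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field layout; not copied from the patch under review. */
#define SLB_ESID_V        0x0000000008000000ULL
#define SLB_VSID_B        0xC000000000000000ULL
#define SLB_VSID_B_256M   0x0000000000000000ULL
#define SLB_VSID_B_1T     0x4000000000000000ULL
#define ESID_MASK_256M    (~((1ULL << 28) - 1))   /* 256MB segments */
#define ESID_MASK_1T      (~((1ULL << 40) - 1))   /* 1TB segments */

typedef struct {
    uint64_t esid;
    uint64_t vsid;
} ppc_slb_t;

/* Return the valid entry covering eaddr, or NULL if there is none. */
const ppc_slb_t *slb_lookup(const ppc_slb_t *slb, int nr, uint64_t eaddr)
{
    for (int i = 0; i < nr; i++) {
        const ppc_slb_t *e = &slb[i];
        /* Segment size comes from the B field already held in vsid. */
        uint64_t mask = ((e->vsid & SLB_VSID_B) == SLB_VSID_B_1T)
                        ? ESID_MASK_1T : ESID_MASK_256M;

        if ((e->esid & SLB_ESID_V) && ((e->esid ^ eaddr) & mask) == 0) {
            return e;
        }
    }
    return NULL;
}
```

This costs one extra test of vsid per slot, but slb->esid keeps exactly
what the guest wrote, so no masking is needed elsewhere.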

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 11:12               ` David Gibson
  2011-02-13 12:15                 ` Blue Swirl
@ 2011-02-13 15:08                 ` Anthony Liguori
  2011-02-13 15:56                   ` Alexander Graf
                                     ` (2 more replies)
  1 sibling, 3 replies; 73+ messages in thread
From: Anthony Liguori @ 2011-02-13 15:08 UTC (permalink / raw
  To: David Gibson
  Cc: Blue Swirl, Alexander Graf, Paul Mackerras,
	qemu-devel@nongnu.org List, anton

On 02/13/2011 05:12 AM, David Gibson wrote:
> On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote:
>    
>> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt
>> <benh@kernel.crashing.org>  wrote:
>>      
>>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
>>>        
>>>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
>>>> <benh@kernel.crashing.org>  wrote:
>>>>          
>>>>> On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
>>>>>            
>>>>>> Actually I don't quite understand the need for vty layer, why not use
>>>>>> the chardev here directly?
>>>>>>              
>>>>> I'm not sure what you mean here...
>>>>>            
>>>> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
>>>> instead of moving those to a separate file.
>>>>          
>>> Well, the VIO device instance gives the chardev instance which is all
>>> nicely encapsulated inside spapr-vty... Also VIO devices tend to have
>>> dedicated hcalls, not only VTY, so it makes a lot of sense to keep them
>>> close to the rest of the VIO driver they belong to don't you think ?
>>>
>>> (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've
>>> added to the core VIO but you'll see that when we get those patches
>>> ready, hopefully soon).
>>>        
>> This is a bit of a special case, much like semihosting modes for m68k
>> or ARM, or like MOL hacks which were removed recently. From QEMU point
>> of view, the most natural way of handling this would be hypervisor
>> implemented in the guest side (for example BIOS). Then the hypervisor
>> would use normal IO (or virtio) to communicate with the host. If
>> inside QEMU, the interface of the hypervisor to the devices needs some
>> thought. We'd like to avoid ugly interfaces like vmmouse where a
>> device probes CPU registers directly or spaghetti interfaces like
>> APIC.
>>      
> I really don't follow what you're saying here.  Running the hypervisor
> in the guest, rather than emulating its effect in qemu seems like an
> awful lot of complexity for no clear reason.
>    

In KVM for x86, instead of using a secondary interface (like 
vmmcall/vmcall), we do all of our paravirtualization using native 
hardware interfaces that we can trap (PIO/MMIO).

IIUC, on Power, trapping MMIO is not possible due to the MMU mode that 
is currently used (PFs are delivered directly to the guest).

So it's not really possible to switch from using hypercalls to using MMIO.

What I would suggest is modelling hypercalls as another I/O address 
space for the processor.  So instead of having a function pointer in the 
CPUState, introduce a:

typedef void (HypercallFunc)(CPUState *env, void *opaque);

/* register a hypercall handler */
void register_hypercall(target_ulong index, HypercallFunc *handler, void 
*opaque);
void unregister_hypercall(target_ulong index);

/* dispatch a hypercall */
void cpu_hypercall(CPUState *env, target_ulong index);

This interface could also be used to implement hypercall based 
interfaces on s390 and x86.

The arguments will have to be extracted from the CPU state but I don't 
think we'll really ever have common hypercall implementations anyway so 
that's not a huge problem.
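
To make the proposal concrete, here is a minimal standalone sketch of
those three functions.  It is an illustration under assumptions, not
QEMU code: CPUState and target_ulong are reduced stand-ins, the fixed
HYPERCALL_SLOTS table size is arbitrary, and demo_hcall plus its use of
gpr[3]/gpr[4] are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t target_ulong;      /* stand-in for QEMU's target_ulong */

typedef struct CPUState {           /* stand-in: just a register file */
    target_ulong gpr[32];
} CPUState;

typedef void (HypercallFunc)(CPUState *env, void *opaque);

#define HYPERCALL_SLOTS 256         /* hypothetical table size */

static struct {
    HypercallFunc *handler;
    void *opaque;
} hypercall_slots[HYPERCALL_SLOTS];

/* register a hypercall handler */
void register_hypercall(target_ulong index, HypercallFunc *handler,
                        void *opaque)
{
    assert(index < HYPERCALL_SLOTS && handler != NULL);
    hypercall_slots[index].handler = handler;
    hypercall_slots[index].opaque = opaque;
}

void unregister_hypercall(target_ulong index)
{
    assert(index < HYPERCALL_SLOTS);
    hypercall_slots[index].handler = NULL;
    hypercall_slots[index].opaque = NULL;
}

/* dispatch a hypercall; an unregistered index does nothing here, the
 * "not implemented" policy is left to the machine code */
void cpu_hypercall(CPUState *env, target_ulong index)
{
    if (index < HYPERCALL_SLOTS && hypercall_slots[index].handler) {
        hypercall_slots[index].handler(env, hypercall_slots[index].opaque);
    }
}

/* Example handler: counts calls through opaque and pulls its argument
 * out of the CPU state, returning gpr[4] + 1 in gpr[3]. */
void demo_hcall(CPUState *env, void *opaque)
{
    unsigned *calls = opaque;
    (*calls)++;
    env->gpr[3] = env->gpr[4] + 1;
}
```

The handler gets only the CPU state and its opaque pointer, so argument
and return value conventions stay entirely platform specific.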

>> on real HW?
>>      
> The interface already exists on real HW.  It's described in the PAPR
> document we keep mentioning.
>    

Another thing to note is that the hypercall based I/O devices are the 
interfaces that the current Power hypervisor uses, so implementing this 
interface is necessary to support existing guests.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 12:40               ` Alexander Graf
  2011-02-13 12:44                 ` David Gibson
@ 2011-02-13 15:14                 ` Anthony Liguori
  2011-02-13 16:17                 ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 73+ messages in thread
From: Anthony Liguori @ 2011-02-13 15:14 UTC (permalink / raw
  To: Alexander Graf
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton,
	David Gibson

On 02/13/2011 06:40 AM, Alexander Graf wrote:
>
>> Ah, yeah.  I'm still not sure what to do about it.  I was going to
>> fold the dynamic hcall registration into the patch set before
>> upstreaming.  But then something paulus said made me rethink whether
>> the dynamic registration was a good idea.  Still need to sort this out
>> before the series is really ready.
>>      
> We can surely move it to dynamic later on. I think the "proper" way would be to populate a qdev bus and have the individual hypercall receivers register themselves through -device creations.
>    

Ignore the qdev bit for a moment.  Hypercalls could be plausibly 
implemented as another I/O space from a processor so the thing to model 
off of would be PIO dispatch (cpu_outb and friends).

From a qdev perspective, having a VIO bus makes sense.  The details of 
which I/O spaces are used are not as important from a device tree 
perspective.

Regards,

Anthony Liguori

> Alex
>
>
>    

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 15:08                 ` Anthony Liguori
@ 2011-02-13 15:56                   ` Alexander Graf
  2011-02-13 16:46                     ` Anthony Liguori
  2011-02-13 18:29                   ` Blue Swirl
  2011-02-13 23:30                   ` David Gibson
  2 siblings, 1 reply; 73+ messages in thread
From: Alexander Graf @ 2011-02-13 15:56 UTC (permalink / raw
  To: Anthony Liguori
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton,
	David Gibson


On 13.02.2011, at 16:08, Anthony Liguori wrote:

> On 02/13/2011 05:12 AM, David Gibson wrote:
>> On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote:
>>   
>>> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt
>>> <benh@kernel.crashing.org>  wrote:
>>>     
>>>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
>>>>       
>>>>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
>>>>> <benh@kernel.crashing.org>  wrote:
>>>>>         
>>>>>> On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
>>>>>>           
>>>>>>> Actually I don't quite understand the need for vty layer, why not use
>>>>>>> the chardev here directly?
>>>>>>>             
>>>>>> I'm not sure what you mean here...
>>>>>>           
>>>>> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
>>>>> instead of moving those to a separate file.
>>>>>         
>>>> Well, the VIO device instance gives the chardev instance which is all
>>>> nicely encapsulated inside spapr-vty... Also VIO devices tend to have
>>>> dedicated hcalls, not only VTY, so it makes a lot of sense to keep them
>>>> close to the rest of the VIO driver they belong to don't you think ?
>>>> 
>>>> (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've
>>>> added to the core VIO but you'll see that when we get those patches
>>>> ready, hopefully soon).
>>>>       
>>> This is a bit of a special case, much like semihosting modes for m68k
>>> or ARM, or like MOL hacks which were removed recently. From QEMU point
>>> of view, the most natural way of handling this would be hypervisor
>>> implemented in the guest side (for example BIOS). Then the hypervisor
>>> would use normal IO (or virtio) to communicate with the host. If
>>> inside QEMU, the interface of the hypervisor to the devices needs some
>>> thought. We'd like to avoid ugly interfaces like vmmouse where a
>>> device probes CPU registers directly or spaghetti interfaces like
>>> APIC.
>>>     
>> I really don't follow what you're saying here.  Running the hypervisor
>> in the guest, rather than emulating its effect in qemu seems like an
>> awful lot of complexity for no clear reason.
>>   
> 
> In KVM for x86, instead of using a secondary interface (like vmmcall/vmcall), we do all of our paravirtualization using native hardware interfaces that we can trap (PIO/MMIO).
> 
> IIUC, on Power, trapping MMIO is not possible due to the MMU mode that is currently used (PFs are delivered directly to the guest).
> 
> So it's not really possible to switch from using hypercalls to using MMIO.
> 
> What I would suggest is modelling hypercalls as another I/O address space for the processor.  So instead of having a function pointer in the CPUState, introduce a:
> 
> typedef void (HypercallFunc)(CPUState *env, void *opaque);
> 
> /* register a hypercall handler */
> void register_hypercall(target_ulong index, HypercallFunc *handler, void *opaque);
> void unregister_hypercall(target_ulong index);
> 
> /* dispatch a hypercall */
> void cpu_hypercall(CPUState *env, target_ulong index);
> 
> This interface could also be used to implement hypercall based interfaces on s390 and x86.
> 
> The arguments will have to be extracted from the CPU state but I don't think we'll really ever have common hypercall implementations anyway so that's not a huge problem.

Is there enough common ground between the hypercall interfaces that this makes sense? It sounds nice on paper, but in the end the "hypercall not implemented" return codes differ, the argument interpretation differs and even the number of return values differs.

So at the end of the day, all this interface could do is call a machine helper function to evaluate the hypercall id from its register state and dispatch to that, leaving the rest to the individual hypercall function implementation.

The implementations themselves also couldn't be reused. A PAPR hypercall simply doesn't work on x86. PIO and MMIO interfaces make sense to share - devices implemented using them could potentially be reused later by other platforms. For the hypercall devices, that doesn't work.

> 
>>> on real HW?
>>>     
>> The interface already exists on real HW.  It's described in the PAPR
>> document we keep mentioning.
>>   
> 
> Another thing to note is that the hypercall based I/O devices are the interfaces that the current Power hypervisor uses, so implementing this interface is necessary to support existing guests.

Yes, implementing the existing PAPR interfaces is crucial to run existing guests. Implementing it at the hypercall level is required if we ever want to run it with hardware accelerated KVM on ppc, as there hypercalls simply get forwarded to the hypervisor (kvm) which would pass them on to qemu. And since the interface is not nesting capable, running a hypervisor inside the guest doesn't work.


Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13  8:08             ` Blue Swirl
  2011-02-13 11:12               ` David Gibson
  2011-02-13 12:31               ` Alexander Graf
@ 2011-02-13 16:07               ` Benjamin Herrenschmidt
  2011-02-13 16:48                 ` Anthony Liguori
  2 siblings, 1 reply; 73+ messages in thread
From: Benjamin Herrenschmidt @ 2011-02-13 16:07 UTC (permalink / raw
  To: Blue Swirl
  Cc: qemu-devel@nongnu.org List, Paul Mackerras, Alexander Graf, anton,
	David Gibson

On Sun, 2011-02-13 at 10:08 +0200, Blue Swirl wrote:
> This is a bit of a special case, much like semihosting modes for m68k
> or ARM, or like MOL hacks which were removed recently. From QEMU point
> of view, the most natural way of handling this would be hypervisor
> implemented in the guest side (for example BIOS). Then the hypervisor
> would use normal IO (or virtio) to communicate with the host. If
> inside QEMU, the interface of the hypervisor to the devices needs some
> thought. We'd like to avoid ugly interfaces like vmmouse where a
> device probes CPU registers directly or spaghetti interfaces like
> APIC.

I'm not sure I understand your point. We are emulating a guest running
under pHyp, ie, qemu in that case is the hypervisor, and provides the
same interfaces as pHyp does (the IBM proprietary hypervisor, aka
sPAPR). This is how we enable booting existing kernels and distributions
inside qemu/kvm.

> Is the interface new design, or are you implementing what is used also
> on real HW?

We are implementing a subset of the interfaces implemented by pHyp today
on "real HW". In the long run, when you throw KVM into the picture, on
machines (hypothetical machines at this stage) where one can run KVM
as a hypervisor (currently you can only run pHyp on all released IBM
machines), this will give you an environment that is compatible with
pHyp and thus supports existing distributions and kernels.

We -will- add support for the "real" virtio on top of that, but those
will require modified guest kernels (and will provide better
performance than the sPAPR emulation). But the sPAPR emulation is a
necessary step to enable existing stuff to run.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 12:15                 ` Blue Swirl
@ 2011-02-13 16:12                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Benjamin Herrenschmidt @ 2011-02-13 16:12 UTC (permalink / raw
  To: Blue Swirl
  Cc: Graf, qemu-devel@nongnu.org List, Paul Mackerras, anton,
	David Gibson, Alexander

On Sun, 2011-02-13 at 14:15 +0200, Blue Swirl wrote:
> 
> Maybe it would be more complex but also emulation accuracy would be
> increased and the interfaces would be saner. We don't shortcut BIOS
> and implement its services to OS in QEMU for other machines either.

But that is not comparable. BIOS is comparable for example to Open
Firmware and we do not 'emulate' OF, we will provide an implementation
that runs inside the guest, just like you do for BIOS (SLOF based, tho
people are welcome to play with OpenBIOS if they want, but SLOF is what
we will provide and support).

In this case, we are talking about a hypervisor, which is a somewhat
different beast. Sure you -could- run it in the guest, I suppose, if
emulation accuracy was your ultimate goal. That would entail at least
the following:

 - Implement support for the complete "hypervisor" mode inside qemu
 - Re-implement a complete hypervisor compatible with pHyp

An enormous amount of work, for a result that would have poor
performance and about zero interest to anybody.

The goal here is to provide a runtime environment for kernels and
distributions that is -compatible- with sPAPR/pHyp to enable existing
distributions to operate in KVM.

> I'd expect one problem with that approach though, the interface used
> on real HW between the hypervisor and the underlying HW may be
> undocumented, but then it could use for example existing virtio
> devices.

But what would be the point ?

> One way to handle this could be to add the hypervisor interface now to
> QEMU and switch to guest hypervisor when (if) it becomes available.
> I'd just like to avoid duplication with virtio or messy interfaces
> like vmport. 

Again, what would be the point ? Eventually, KVM will be available as an
"alternate" hypervisor to pHyp which I suppose one could run entirely
inside qemu once we add support for the HV mode to it, and that would
somewhat do what you describe but that isn't what we are trying to get
at here.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 12:40               ` Alexander Graf
  2011-02-13 12:44                 ` David Gibson
  2011-02-13 15:14                 ` Anthony Liguori
@ 2011-02-13 16:17                 ` Benjamin Herrenschmidt
  2011-02-13 16:52                   ` Anthony Liguori
  2 siblings, 1 reply; 73+ messages in thread
From: Benjamin Herrenschmidt @ 2011-02-13 16:17 UTC (permalink / raw
  To: Alexander Graf
  Cc: Blue Swirl, Paul Mackerras, qemu-devel@nongnu.org List, anton,
	David Gibson

On Sun, 2011-02-13 at 13:40 +0100, Alexander Graf wrote:
> 
> We can surely move it to dynamic later on. I think the "proper" way
> would be to populate a qdev bus and have the individual hypercall
> receivers register themselves through -device creations. But Blue
> really is the expert here :).

Why would you want to go through a bus for all hcalls ? (ie including
the ones that aren't device related ?). That doesn't quite "tick" but
I'm sure I'm missing part of the picture here :-)

A simple dispatch table based approach is the most efficient and simple
way to do that (after a switch/case) in my opinion. This is more or less
what we have done internally, with a pair of calls for "modules" to
register hcalls if they need to. The hcall numbers are fixed and
specified in sPAPR.
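
As a rough sketch of such a table (not the code from our internal
tree): the name powerpc_register_hypercall comes from this thread, but
its signature, the spapr_hcall_fn type, spapr_hypercall(), the
MAX_HCALL_OPCODE bound and the demo handler are all made up for the
example.  The opcode and return code values are the ones I believe
sPAPR assigns, and the table relies on opcodes being multiples of 4;
treat both as assumptions to check against the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t target_ulong;

#define H_SUCCESS          0
#define H_FUNCTION         (-2)    /* hcall not implemented */
#define H_GET_TERM_CHAR    0x54
#define H_PUT_TERM_CHAR    0x58
#define MAX_HCALL_OPCODE   0x400   /* hypothetical upper bound */

typedef target_ulong (*spapr_hcall_fn)(target_ulong opcode,
                                       const target_ulong *args);

/* One slot per opcode, indexed by opcode / 4. */
static spapr_hcall_fn hcall_table[MAX_HCALL_OPCODE / 4 + 1];

void powerpc_register_hypercall(target_ulong opcode, spapr_hcall_fn fn)
{
    assert((opcode & 3) == 0 && opcode <= MAX_HCALL_OPCODE);
    hcall_table[opcode / 4] = fn;
}

/* Look the opcode up; anything unknown gets H_FUNCTION. */
target_ulong spapr_hypercall(target_ulong opcode, const target_ulong *args)
{
    if ((opcode & 3) || opcode > MAX_HCALL_OPCODE
        || !hcall_table[opcode / 4]) {
        return (target_ulong)H_FUNCTION;
    }
    return hcall_table[opcode / 4](opcode, args);
}

/* Demo handler standing in for the vty module's h_put_term_char. */
target_ulong demo_put_term_char(target_ulong opcode,
                                const target_ulong *args)
{
    (void)opcode;
    (void)args;
    return H_SUCCESS;
}
```

A module such as the vty driver would then make its own pair of
powerpc_register_hypercall() calls at init time instead of adding cases
to a central switch.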

BTW. We are still missing RTAS in this picture. I suppose David is still
cleaning up those patches. Basically, we use a 5 instruction trampoline
that calls a private h-call, the RTAS emulation is entirely in qemu.
This has to be done that way for various reasons, but essentially RTAS
under pHyp also more or less turns into private pHyp calls under the hood.
 
Cheers,
Ben.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 15:56                   ` Alexander Graf
@ 2011-02-13 16:46                     ` Anthony Liguori
  0 siblings, 0 replies; 73+ messages in thread
From: Anthony Liguori @ 2011-02-13 16:46 UTC (permalink / raw
  To: Alexander Graf
  Cc: Blue Swirl, David Gibson, Paul Mackerras,
	qemu-devel@nongnu.org List, anton

On 02/13/2011 09:56 AM, Alexander Graf wrote:
>
>> This interface could also be used to implement hypercall based interfaces on s390 and x86.
>>
>> The arguments will have to be extracted from the CPU state but I don't think we'll really ever have common hypercall implementations anyway so that's not a huge problem.
>>      
> Is there enough common ground between the hypercall interfaces that this makes sense?

Between the dispatch, registration, and tracing, yes.

>   It sounds nice on paper, but in the end the "hypercall not implemented" return codes differ, the argument interpretation differs and even the amount of return values differ.
>    

Yes, that's why we pass CPUState.  But keep in mind, CPUState needs to 
be sync'd before and after the invocation.

> So at the end of the day, all this interface could do is call a machine helper function to evaluate the hypercall id from its register state and dispatch to that, leaving the rest to the individual hypercall function implementation.
>
> The implementations themselves also couldn't be reused. A PAPR hypercall simply doesn't work on x86. PIO and MMIO interfaces make sense to share - devices implemented using them could potentially be reused later by other platforms. For the hypercall devices, that doesn't work.
>    

Yes, which is why I'm not advocating trying to unmarshal the calls or 
anything like that.  But the dispatch, registration, tracing, and CPU 
sync'ing bits can all be reused.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 16:07               ` Benjamin Herrenschmidt
@ 2011-02-13 16:48                 ` Anthony Liguori
  2011-02-13 18:19                   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Anthony Liguori @ 2011-02-13 16:48 UTC
  To: Benjamin Herrenschmidt
  Cc: qemu-devel@nongnu.org List, Alexander Graf, Blue Swirl,
	Paul Mackerras, anton, David Gibson

On 02/13/2011 10:07 AM, Benjamin Herrenschmidt wrote:
> On Sun, 2011-02-13 at 10:08 +0200, Blue Swirl wrote:
>    
>> This is a bit of a special case, much like semihosting modes for m68k
>> or ARM, or like MOL hacks which were removed recently. From QEMU point
>> of view, the most natural way of handling this would be hypervisor
>> implemented in the guest side (for example BIOS). Then the hypervisor
>> would use normal IO (or virtio) to communicate with the host. If
>> inside QEMU, the interface of the hypervisor to the devices needs some
>> thought. We'd like to avoid ugly interfaces like vmmouse where a
>> device probes CPU registers directly or spaghetti interfaces like
>> APIC.
>>      
> I'm not sure I understand your point. We are emulating a guest running
> under pHyp, ie, qemu in that case is the hypervisor, and provides the
> same interfaces as pHyp does (the IBM proprietary hypervisor, aka
> sPAPR). This is how we enable booting existing kernels and distributions
> inside qemu/kvm.
>
>    
>> Is the interface new design, or are you implementing what is used also
>> on real HW?
>>      
> We are implementing a subset of the interfaces implemented by pHyp today
> on "real HW". In the long run, when you throw KVM into the picture, on
> machines (hypothetical machines at this stage) where one can run a KVM
> as a hypervisor (currently you can only run pHyp on all released IBM
> machines), this will give you an environment that is compatible with
> pHyp and thus supports existing distributions and kernels.
>    

We try very, very hard to make our paravirtualization look like real 
hardware.

When the paravirtualization interfaces are already defined, we have no 
choice in supporting those interfaces although sometimes we try to push 
that to firmware (like with Xenner).  It's better from a security PoV.

But in this case, we have no choice but to implement the 
paravirtualization interfaces in QEMU because of the way the hardware 
works and because these interfaces are already well defined.

Regards,

Anthony Liguori

> We -will- add support for the "real" virtio on top of that, but those
> will require modified guest kernels (and will provide better
> performances than the sPAPR emulation). But the sPAPR emulation is a
> necessary step to enable existing stuff to run.
>
> Cheers,
> Ben.
>
>
>    


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 16:17                 ` Benjamin Herrenschmidt
@ 2011-02-13 16:52                   ` Anthony Liguori
  2011-02-13 18:21                     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Anthony Liguori @ 2011-02-13 16:52 UTC
  To: Benjamin Herrenschmidt
  Cc: Alexander Graf, qemu-devel@nongnu.org List, Blue Swirl,
	Paul Mackerras, anton, David Gibson

On 02/13/2011 10:17 AM, Benjamin Herrenschmidt wrote:
> On Sun, 2011-02-13 at 13:40 +0100, Alexander Graf wrote:
>    
>> We can surely move it to dynamic later on. I think the "proper" way
>> would be to populate a qdev bus and have the individual hypercall
>> receivers register themselves through -device creations. But Blue
>> really is the expert here :).
>>      
> Why would you want to go through a bus for all hcalls ? (ie including
> the ones that aren't device related ?). That doesn't quite "tick" but
> I'm sure I'm missing part of the picture here :-)
>    

A virtual bus is just an interface.  If all virtual devices that 
interact via hcalls would all reside on the same virtual bus, then 
having hypercalls registered through that interface makes sense because 
you can associate hypercalls with particular devices.  This means that 
you can automatically deregister on device removal and things like that.

But I don't think this will work out well.  I think treating the 
hypercalls as a simple dispatch table just like ioport would make sense.

Regards,

Anthony Liguori

> A simple dispatch table based approach is the most efficient and simple
> way to do that (after a switch/case) in my opinion, this is more/less
> what we have done internally, with a pair of calls for "modules" to
> register hcalls if they need to. The hcalls numbers are fixed and
> specified in sPAPR.
>
> BTW. We are still missing in this picture RTAS. I suppose David is still
> cleaning up those patches. Basically, we use a 5 instruction trampoline
> that calls a private h-call, the RTAS emulation is entirely in qemu.
> This has to be done that way for various reasons, but essentially RTAS
> under pHyp also more/less turns into private pHyp calls under the hood.
>
> Cheers,
> Ben.
>
>
>
>    


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 16:48                 ` Anthony Liguori
@ 2011-02-13 18:19                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Benjamin Herrenschmidt @ 2011-02-13 18:19 UTC
  To: Anthony Liguori
  Cc: qemu-devel@nongnu.org List, Alexander Graf, Blue Swirl,
	Paul Mackerras, anton, David Gibson

On Sun, 2011-02-13 at 10:48 -0600, Anthony Liguori wrote:
> 
> We try very, very hard to make our paravirtualization look like real 
> hardware.

Sure, that makes sense when you invent new paravirt interfaces, but that
isn't the case. Note also that our current processors do not have the
ability to emulate MMIOs in all cases, ie, when doing "real" KVM in HV
mode, we cannot trap MMIO unless we redirect all page faults to the
hypervisor, which comes at a cost.

> When the paravirtualization interfaces are already defined, we have no
> choice in supporting those interfaces although sometimes we try to
> push  that to firmware (like with Xenner).  It's better from a
> security PoV.
> 
> But in this case, we have no choice but to implement the 
> paravirtualization interfaces in QEMU because of the way the hardware 
> works and because these interfaces are already well defined.

Right.

Now, in the long run, we might decide to "reflect" some of these back
into some guest co-located FW or whatever of that kind, especially if we
get to a point where linux virt-io becomes more prevalent and the sPAPR
style IOs become purely legacy backward compat stubs, but we aren't
there yet.

Cheers,
Ben.


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 16:52                   ` Anthony Liguori
@ 2011-02-13 18:21                     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Benjamin Herrenschmidt @ 2011-02-13 18:21 UTC
  To: Anthony Liguori
  Cc: Alexander Graf, qemu-devel@nongnu.org List, Blue Swirl,
	Paul Mackerras, anton, David Gibson

On Sun, 2011-02-13 at 10:52 -0600, Anthony Liguori wrote:
> 
> A virtual bus is just an interface.  If all virtual devices that 
> interact via hcalls would all reside on the same virtual bus, then 
> having hypercalls registered through that interface makes sense
> because 
> you can associate hypercalls with particular devices.  This means
> that 
> you can automatically deregister on device removal and things like
> that.

I see. Well, VIO related h-calls are only part of the picture here, I
think we can live with having explicit de-registration if needed ;-)
Besides the h-call is still implemented even if no device -instance- is
currently plugged into the partition anyways. It just returns a (well
defined per-hcall) error code if the instance handle passed to it is
bogus.
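
To illustrate the point above, here is a minimal sketch of an h-call that always exists but rejects an unknown instance handle. The names (sketch_vty, sketch_vty_plug, sketch_h_put_chars) and the error value are assumptions made up for this example; the real per-hcall error codes are the ones defined in sPAPR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and placeholder values, for illustration only. */
#define SKETCH_H_SUCCESS    0
#define SKETCH_H_PARAMETER  (-4)  /* placeholder "bad instance handle" code */
#define SKETCH_MAX_VTY      4

struct sketch_vty {
    int plugged;
    uint64_t handle;
    uint64_t bytes_out;
};

static struct sketch_vty sketch_vtys[SKETCH_MAX_VTY];

/* Plug a device instance into the partition; returns 0 on success. */
int sketch_vty_plug(uint64_t handle)
{
    for (size_t i = 0; i < SKETCH_MAX_VTY; i++) {
        if (!sketch_vtys[i].plugged) {
            sketch_vtys[i].plugged = 1;
            sketch_vtys[i].handle = handle;
            sketch_vtys[i].bytes_out = 0;
            return 0;
        }
    }
    return -1;
}

static struct sketch_vty *sketch_vty_find(uint64_t handle)
{
    for (size_t i = 0; i < SKETCH_MAX_VTY; i++) {
        if (sketch_vtys[i].plugged && sketch_vtys[i].handle == handle) {
            return &sketch_vtys[i];
        }
    }
    return NULL;
}

/*
 * The handler is present whether or not any instance is plugged;
 * a bogus handle simply yields an error code.
 */
int64_t sketch_h_put_chars(uint64_t handle, uint64_t len)
{
    struct sketch_vty *dev = sketch_vty_find(handle);

    if (dev == NULL) {
        return SKETCH_H_PARAMETER;
    }
    dev->bytes_out += len;
    return SKETCH_H_SUCCESS;
}
```

With this shape the handler never needs to be deregistered when a device goes away; only the instance lookup changes.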

> But I don't think this will work out well.  I think treating the 
> hypercalls as a simple dispatch table just like ioport would make
> sense.

Yup.

Cheers,
Ben.


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 15:08                 ` Anthony Liguori
  2011-02-13 15:56                   ` Alexander Graf
@ 2011-02-13 18:29                   ` Blue Swirl
  2011-02-13 19:32                     ` Anthony Liguori
  2011-02-13 23:33                     ` David Gibson
  2011-02-13 23:30                   ` David Gibson
  2 siblings, 2 replies; 73+ messages in thread
From: Blue Swirl @ 2011-02-13 18:29 UTC
  To: Anthony Liguori
  Cc: Alexander Graf, Paul Mackerras, qemu-devel@nongnu.org List, anton,
	David Gibson

On Sun, Feb 13, 2011 at 5:08 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 02/13/2011 05:12 AM, David Gibson wrote:
>>
>> On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote:
>>
>>>
>>> On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt
>>> <benh@kernel.crashing.org>  wrote:
>>>
>>>>
>>>> On Sun, 2011-02-13 at 00:52 +0200, Blue Swirl wrote:
>>>>
>>>>>
>>>>> On Sat, Feb 12, 2011 at 11:00 PM, Benjamin Herrenschmidt
>>>>> <benh@kernel.crashing.org>  wrote:
>>>>>
>>>>>>
>>>>>> On Sat, 2011-02-12 at 18:59 +0200, Blue Swirl wrote:
>>>>>>
>>>>>>>
>>>>>>> Actually I don't quite understand the need for vty layer, why not use
>>>>>>> the chardev here directly?
>>>>>>>
>>>>>>
>>>>>> I'm not sure what you mean here...
>>>>>>
>>>>>
>>>>> Maybe it would be reasonable to leave h_put_term_char to spapr_hcall.c
>>>>> instead of moving those to a separate file.
>>>>>
>>>>
>>>> Well, the VIO device instance gives the chardev instance which is all
>>>> nicely encapsulated inside spapr-vty... Also VIO devices tend to have
>>>> dedicated hcalls, not only VTY, so it makes a lot of sense to keep them
>>>> close to the rest of the VIO driver they belong to don't you think ?
>>>>
>>>> (Actually veth does, vscsi uses the more "generic" CRQ stuff which we've
>>>> added to the core VIO but you'll see that when we get those patches
>>>> ready, hopefully soon).
>>>>
>>>
>>> This is a bit of a special case, much like semihosting modes for m68k
>>> or ARM, or like MOL hacks which were removed recently. From QEMU point
>>> of view, the most natural way of handling this would be hypervisor
>>> implemented in the guest side (for example BIOS). Then the hypervisor
>>> would use normal IO (or virtio) to communicate with the host. If
>>> inside QEMU, the interface of the hypervisor to the devices needs some
>>> thought. We'd like to avoid ugly interfaces like vmmouse where a
>>> device probes CPU registers directly or spaghetti interfaces like
>>> APIC.
>>>
>>
>> I really don't follow what you're saying here.  Running the hypervisor
>> in the guest, rather than emulating its effect in qemu seems like an
>> awful lot of complexity for no clear reason.
>>
>
> In KVM for x86, instead of using a secondary interface (like
> vmmcall/vmcall), we do all of our paravirtualization using native hardware
> interfaces that we can trap (PIO/MMIO).
>
> IIUC, on Power, trapping MMIO is not possible due to the MMU mode that is
> currently used (PFs are delivered directly to the guest).
>
> So it's not really possible to switch from using hypercalls to using MMIO.
>
> What I would suggest is modelling hypercalls as another I/O address space
> for the processor.  So instead of having a function pointer in the CPUState,
> introduce a:
>
> typedef void (HypercallFunc)(CPUState *env, void *opaque);
>
> /* register a hypercall handler */
> void register_hypercall(target_ulong index, HypercallFunc *handler, void
> *opaque);
> void unregister_hypercall(target_ulong index);
>
> /* dispatch a hypercall */
> void cpu_hypercall(CPUState *env, target_ulong index);
>
> This interface could also be used to implement hypercall based interfaces on
> s390 and x86.
>
> The arguments will have to be extracted from the CPU state but I don't think
> we'll really ever have common hypercall implementations anyway so that's not
> a huge problem.

Nice idea. Then the part handling CPUState probably should belong to
target-ppc/ rather than hw/.


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 18:29                   ` Blue Swirl
@ 2011-02-13 19:32                     ` Anthony Liguori
  2011-02-13 23:33                     ` David Gibson
  1 sibling, 0 replies; 73+ messages in thread
From: Anthony Liguori @ 2011-02-13 19:32 UTC
  To: Blue Swirl
  Cc: David Gibson, Paul Mackerras, Alexander Graf, anton,
	qemu-devel@nongnu.org List

On 02/13/2011 12:29 PM, Blue Swirl wrote:
> On Sun, Feb 13, 2011 at 5:08 PM, Anthony Liguori<anthony@codemonkey.ws>  wrote:
>    
>>
>> In KVM for x86, instead of using a secondary interface (like
>> vmmcall/vmcall), we do all of our paravirtualization using native hardware
>> interfaces that we can trap (PIO/MMIO).
>>
>> IIUC, on Power, trapping MMIO is not possible due to the MMU mode that is
>> currently used (PFs are delivered directly to the guest).
>>
>> So it's not really possible to switch from using hypercalls to using MMIO.
>>
>> What I would suggest is modelling hypercalls as another I/O address space
>> for the processor.  So instead of having a function pointer in the CPUState,
>> introduce a:
>>
>> typedef void (HypercallFunc)(CPUState *env, void *opaque);
>>
>> /* register a hypercall handler */
>> void register_hypercall(target_ulong index, HypercallFunc *handler, void
>> *opaque);
>> void unregister_hypercall(target_ulong index);
>>
>> /* dispatch a hypercall */
>> void cpu_hypercall(CPUState *env, target_ulong index);
>>
>> This interface could also be used to implement hypercall based interfaces on
>> s390 and x86.
>>
>> The arguments will have to be extracted from the CPU state but I don't think
>> we'll really ever have common hypercall implementations anyway so that's not
>> a huge problem.
>>      
> Nice idea. Then the part handling CPUState probably should belong to
> target-ppc/ rather than hw/.
>    

Would be nice to have the target-ppc/ hypercall handlers call into a 
higher level VIO interface that didn't deal directly with CPUState.  The 
actual hardware emulation would then be implemented in hw/ and would not 
be compiled for a specific target.  That helps avoid future sloppiness.
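
A minimal sketch of that layering, under stated assumptions: sketch_cpu, sketch_term and both functions are hypothetical names, and the register choice (length in gpr[4], result in gpr[3]) is only an illustrative convention, not something this thread specifies. The CPU-facing glue touches the registers; the device-level function never sees CPU state.

```c
#include <stdint.h>

/* Hypothetical stand-in for the target's CPU state. */
struct sketch_cpu {
    uint64_t gpr[32];
};

/* Device-level state and logic: knows nothing about registers. */
struct sketch_term {
    uint64_t bytes_out;
};

int64_t sketch_term_write(struct sketch_term *t, uint64_t len)
{
    if (len > 16) {
        return -1;              /* placeholder error code */
    }
    t->bytes_out += len;
    return 0;
}

/*
 * CPU-facing glue: unmarshal the argument from the registers, call the
 * device-level function, and write the result back.
 */
void sketch_hcall_term_write(struct sketch_cpu *cpu, void *opaque)
{
    struct sketch_term *t = opaque;
    uint64_t len = cpu->gpr[4];

    cpu->gpr[3] = (uint64_t)sketch_term_write(t, len);
}
```

Only the glue function would need to be built per target in this arrangement; the device-level function could be compiled once.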

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 15:08                 ` Anthony Liguori
  2011-02-13 15:56                   ` Alexander Graf
  2011-02-13 18:29                   ` Blue Swirl
@ 2011-02-13 23:30                   ` David Gibson
  2 siblings, 0 replies; 73+ messages in thread
From: David Gibson @ 2011-02-13 23:30 UTC
  To: Anthony Liguori
  Cc: Blue Swirl, Alexander Graf, Paul Mackerras,
	qemu-devel@nongnu.org List, anton

On Sun, Feb 13, 2011 at 09:08:22AM -0600, Anthony Liguori wrote:
> On 02/13/2011 05:12 AM, David Gibson wrote:
> >On Sun, Feb 13, 2011 at 10:08:23AM +0200, Blue Swirl wrote:
> >>On Sun, Feb 13, 2011 at 1:15 AM, Benjamin Herrenschmidt
[snip]
> In KVM for x86, instead of using a secondary interface (like
> vmmcall/vmcall), we do all of our paravirtualization using native
> hardware interfaces that we can trap (PIO/MMIO).
> 
> IIUC, on Power, trapping MMIO is not possible due to the MMU mode
> that is currently used (PFs are delivered directly to the guest).
> 
> So it's not really possible to switch from using hypercalls to using MMIO.

That's correct.

> What I would suggest is modelling hypercalls as another I/O address
> space for the processor.  So instead of having a function pointer in
> the CPUState, introduce a:
> 
> typedef void (HypercallFunc)(CPUState *env, void *opaque);
> 
> /* register a hypercall handler */
> void register_hypercall(target_ulong index, HypercallFunc *handler,
> void *opaque);
> void unregister_hypercall(target_ulong index);
> 
> /* dispatch a hypercall */
> void cpu_hypercall(CPUState *env, target_ulong index);

Well, I can certainly implement dynamic registration - in fact I've
done that, I just need to fold it into the earlier part of the patch
series.

But the only "address" we have for this hypercall address space is the
hypercall number, and it's not architected where that will be
supplied.  So we'd still need a per-platform hook to extract the
index from the CPUState.
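
A minimal sketch of such a per-platform hook, assuming hypothetical names throughout (sketch_machine, sketch_hypercall_index, and the two extractor functions). The two register conventions shown are invented purely to show that the choice is left to the platform.

```c
#include <stdint.h>

/* Hypothetical stand-in for the target's CPU state. */
struct sketch_cpu {
    uint64_t gpr[32];
};

/* Per-platform hook: where does this platform put the hypercall number? */
typedef uint64_t (*sketch_hcall_index_fn)(const struct sketch_cpu *cpu);

struct sketch_machine {
    sketch_hcall_index_fn hcall_index;
};

/* One illustrative convention: the number is passed in gpr[3]. */
uint64_t sketch_index_from_r3(const struct sketch_cpu *cpu)
{
    return cpu->gpr[3];
}

/* Another illustrative convention: the number is passed in gpr[0]. */
uint64_t sketch_index_from_r0(const struct sketch_cpu *cpu)
{
    return cpu->gpr[0];
}

/* Generic code asks the machine for the index before dispatching. */
uint64_t sketch_hypercall_index(const struct sketch_machine *m,
                                const struct sketch_cpu *cpu)
{
    return m->hcall_index(cpu);
}
```

The generic dispatcher would call sketch_hypercall_index() first and then look up the handler for the returned number.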

> This interface could also be used to implement hypercall based
> interfaces on s390 and x86.
> 
> The arguments will have to be extracted from the CPU state but I
> don't think we'll really ever have common hypercall implementations
> anyway so that's not a huge problem.
> 
> >>on real HW?
> >The interface already exists on real HW.  It's described in the PAPR
> >document we keep mentioning.
> 
> Another thing to note is that the hypercall based I/O devices are the
> interfaces that the current Power hypervisor uses, so implementing
> this interface is necessary to support existing guests.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] Re: [PATCH 15/15] Implement the bus structure for PAPR virtual IO
  2011-02-13 18:29                   ` Blue Swirl
  2011-02-13 19:32                     ` Anthony Liguori
@ 2011-02-13 23:33                     ` David Gibson
  1 sibling, 0 replies; 73+ messages in thread
From: David Gibson @ 2011-02-13 23:33 UTC
  To: Blue Swirl
  Cc: Alexander Graf, Paul Mackerras, qemu-devel@nongnu.org List, anton

On Sun, Feb 13, 2011 at 08:29:05PM +0200, Blue Swirl wrote:
> On Sun, Feb 13, 2011 at 5:08 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> > On 02/13/2011 05:12 AM, David Gibson wrote:
[snip]
> > The arguments will have to be extracted from the CPU state but I don't think
> > we'll really ever have common hypercall implementations anyway so that's not
> > a huge problem.
> 
> Nice idea. Then the part handling CPUState probably should belong to
> target-ppc/ rather than hw/.

Doesn't work.  Different hypervisors may have arguments - even the
hcall number itself - arranged differently in the registers.  My
earlier drafts had this in target-ppc/; I moved it to hw/ for a
reason.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] RFC: Implement emulation of pSeries logical partitions
  2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
@ 2011-02-14  4:16   ` FUJITA Tomonori
  2011-02-12 14:54 ` [Qemu-devel] [PATCH 02/15] Clean up PowerPC SLB handling code David Gibson
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 73+ messages in thread
From: FUJITA Tomonori @ 2011-02-14  4:16 UTC
  To: david; +Cc: linux-scsi, paulus, qemu-devel, anton, agraf

On Sun, 13 Feb 2011 01:54:12 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> This patch series adds a "pseries" machine to qemu, allowing it to
> emulate IBM pSeries logical partitions.  Along the way we add a bunch
> of support for more modern ppc CPUs than are currently supported.  It
> also makes some significant cleanups to the translation code for hash
> page table based ppc MMUs.
> 
> This is a first version of this series for review.  There are a number
> of additional patches adding features such as virtual IO devices to
> the emulated pSeries platform, which will be added to the series once
> they're a bit more polished.

Is communication between LPARs, which could be used for something like a
VIO server, supported (or will it be)?


Thread overview: 73+ messages
2011-02-12 14:54 [Qemu-devel] RFC: Implement emulation of pSeries logical partitions David Gibson
2011-02-12 14:54 ` [Qemu-devel] [PATCH 01/15] Add TAGS and *~ to .gitignore David Gibson
2011-02-12 14:54 ` [Qemu-devel] [PATCH 02/15] Clean up PowerPC SLB handling code David Gibson
2011-02-12 15:17   ` [Qemu-devel] " Alexander Graf
2011-02-12 14:54 ` [Qemu-devel] [PATCH 03/15] Allow qemu_devtree_setprop() to take arbitrary values David Gibson
2011-02-12 15:18   ` [Qemu-devel] " Alexander Graf
2011-02-12 14:54 ` [Qemu-devel] [PATCH 04/15] Add a hook to allow hypercalls to be emulated on PowerPC David Gibson
2011-02-12 15:19   ` [Qemu-devel] " Alexander Graf
2011-02-12 14:54 ` [Qemu-devel] [PATCH 05/15] Implement PowerPC slbmfee and slbmfev instructions David Gibson
2011-02-12 15:23   ` [Qemu-devel] " Alexander Graf
2011-02-13 12:46     ` David Gibson
2011-02-12 14:54 ` [Qemu-devel] [PATCH 06/15] Implement missing parts of the logic for the POWER PURR David Gibson
2011-02-12 15:25   ` [Qemu-devel] " Alexander Graf
2011-02-12 14:54 ` [Qemu-devel] [PATCH 07/15] Correct ppc popcntb logic, implement popcntw and popcntd David Gibson
2011-02-12 15:27   ` [Qemu-devel] " Alexander Graf
2011-02-12 14:54 ` [Qemu-devel] [PATCH 08/15] Clean up slb_lookup() function David Gibson
2011-02-12 15:30   ` [Qemu-devel] " Alexander Graf
2011-02-12 14:54 ` [Qemu-devel] [PATCH 09/15] Parse SDR1 on mtspr instead of at translate time David Gibson
2011-02-12 15:37   ` [Qemu-devel] " Alexander Graf
2011-02-13  9:02     ` David Gibson
2011-02-13 12:33       ` Alexander Graf
2011-02-13 12:52         ` David Gibson
2011-02-12 14:54 ` [Qemu-devel] [PATCH 10/15] Use "hash" more consistently in ppc mmu code David Gibson
2011-02-12 15:47   ` [Qemu-devel] " Alexander Graf
2011-02-12 14:54 ` [Qemu-devel] [PATCH 11/15] Better factor the ppc hash translation path David Gibson
2011-02-12 15:52   ` [Qemu-devel] " Alexander Graf
2011-02-12 14:54 ` [Qemu-devel] [PATCH 12/15] Support 1T segments on ppc David Gibson
2011-02-12 15:57   ` [Qemu-devel] " Alexander Graf
2011-02-13  9:34     ` David Gibson
2011-02-13 12:37       ` Alexander Graf
2011-02-13 13:38         ` David Gibson
2011-02-12 14:54 ` [Qemu-devel] [PATCH 13/15] Add POWER7 support for ppc David Gibson
2011-02-12 16:09   ` [Qemu-devel] " Alexander Graf
2011-02-13  9:39     ` David Gibson
2011-02-13 12:37       ` Alexander Graf
2011-02-12 14:54 ` [Qemu-devel] [PATCH 14/15] Start implementing pSeries logical partition machine David Gibson
2011-02-12 16:23   ` [Qemu-devel] " Alexander Graf
2011-02-12 16:40     ` Blue Swirl
2011-02-12 20:54       ` Benjamin Herrenschmidt
2011-02-12 14:54 ` [Qemu-devel] [PATCH 15/15] Implement the bus structure for PAPR virtual IO David Gibson
2011-02-12 16:47   ` [Qemu-devel] " Alexander Graf
2011-02-12 16:59     ` Blue Swirl
2011-02-12 21:00       ` Benjamin Herrenschmidt
2011-02-12 22:52         ` Blue Swirl
2011-02-12 23:15           ` Benjamin Herrenschmidt
2011-02-13  8:08             ` Blue Swirl
2011-02-13 11:12               ` David Gibson
2011-02-13 12:15                 ` Blue Swirl
2011-02-13 16:12                   ` Benjamin Herrenschmidt
2011-02-13 15:08                 ` Anthony Liguori
2011-02-13 15:56                   ` Alexander Graf
2011-02-13 16:46                     ` Anthony Liguori
2011-02-13 18:29                   ` Blue Swirl
2011-02-13 19:32                     ` Anthony Liguori
2011-02-13 23:33                     ` David Gibson
2011-02-13 23:30                   ` David Gibson
2011-02-13 12:31               ` Alexander Graf
2011-02-13 12:59                 ` Blue Swirl
2011-02-13 16:07               ` Benjamin Herrenschmidt
2011-02-13 16:48                 ` Anthony Liguori
2011-02-13 18:19                   ` Benjamin Herrenschmidt
2011-02-13 11:14             ` David Gibson
2011-02-13 12:40               ` Alexander Graf
2011-02-13 12:44                 ` David Gibson
2011-02-13 13:09                   ` Alexander Graf
2011-02-13 15:14                 ` Anthony Liguori
2011-02-13 16:17                 ` Benjamin Herrenschmidt
2011-02-13 16:52                   ` Anthony Liguori
2011-02-13 18:21                     ` Benjamin Herrenschmidt
2011-02-13 11:09     ` David Gibson
2011-02-13 12:38       ` Alexander Graf
2011-02-14  4:16 ` [Qemu-devel] RFC: Implement emulation of pSeries logical partitions FUJITA Tomonori
2011-02-14  4:16   ` FUJITA Tomonori
