spice-qemu-char: write to chardev whatever amount it can read
The current code waits until the chardev can read MIN(len, VMC_MAX)
But some chardev may never reach than amount, in fact some of them
will only ever accept write of 1. Fix the min computation and remove
the VMC_MAX constant.
Uri Lublin [Wed, 12 Dec 2012 16:30:47 +0000 (18:30 +0200)]
qxl+vnc: register a vm state change handler for dummy spice_server
When qxl + vnc are used, a dummy spice_server is initialized.
The spice_server has to be told when the VM runstate changes,
which is what this patch does.
Without it, from qxl_send_events(), the following error message is shown:
qxl_send_events: spice-server bug: guest stopped, ignoring
Replace a lot of formulaic multiplications (containing casts, no less)
with calls to a pair of functions. This encapsulates in a single
place the operations which require care relating to integer overflow.
There are lots of external users of pci_internals.h,
apparently making it an internal interface only didn't
work out. Let's stop pretending it's an internal header.
Include dependencies from pci core using the correct path.
This is required now that it's in the separate directory.
Need to check whether they can be minimized, for now,
keep the code as is.
Blue Swirl [Sat, 15 Dec 2012 09:05:26 +0000 (09:05 +0000)]
Merge branch 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf
* 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf: (40 commits)
pseries: Increase default NVRAM size
target-ppc: Don't use hwaddr to represent hardware state
PPC: e500: pci: Export slot2irq calculation
PPC: E500plat: Make a lot of PCI slots available
PPC: E500: Move PCI slot information into params
PPC: E500: Generate dt pci irq map dynamically
PPC: E500: PCI: Make IRQ calculation more generic
PPC: E500: PCI: Make first slot qdev settable
openpic: Accelerate pending irq search
openpic: fix minor coding style issues
MSI-X: Fix endianness
PPC: e500: Declare pci bridge as bridge
PPC: e500: Add MSI support
openpic: add Shared MSI support
openpic: make brr1 model specific
openpic: convert to qdev
openpic: remove irq_out
openpic: rename openpic_t to OpenPICState
openpic: convert simple reg operations to builtin bitops
openpic: remove unused type variable
...
With MMU option xtensa architecture has two TLBs: ITLB and DTLB. ITLB is
only used for code access, DTLB is only for data. However TLB entries in
both TLBs have attribute field controlling write and exec access. These
bits need to be properly masked off depending on TLB type before being
used as tlb_set_page prot argument. Otherwise the following happens:
(1) ITLB entry for some PFN gets invalidated
(2) DTLB entry for the same PFN gets updated, attributes allow code
execution
(3) code at the page with that PFN is executed (possible due to step 2),
entry for the TB is written into the jump cache
(4) QEMU TLB entry for the PFN gets replaced with an entry for some
other PFN
(5) code in the TB from step 3 is executed (possible due to jump cache)
and it accesses data, for which there's no DTLB entry, causing DTLB
miss exception
(6) re-translation of the TB from step 5 is attempted, but there's no
QEMU TLB entry nor xtensa ITLB entry for that PFN, which causes ITLB
miss exception at the TB start address
(7) ITLB miss exception is handled by the guest, but execution is
resumed from the beginning of the faulting TB (the point where ITLB
miss occured), not from the point where DTLB miss occured, which is
wrong.
With that fix the above scenario causes ITLB miss exception (that used
to be step 7) at step 3, right at the beginning of the TB.
Gerd Hoffmann [Fri, 14 Dec 2012 07:54:24 +0000 (07:54 +0000)]
pixman: fix vnc tight png/jpeg support
This patch adds an x argument to qemu_pixman_linebuf_fill so it can
also be used to convert a partial scanline. Then fix tight + png/jpeg
encoding by passing in the x+y offset, so the data is read from the
correct screen location instead of the upper left corner.
The only reason old pixman versions didn't work was the missing
PIXMAN_TYPE_BGRA, which is properly #ifdef'ed now. So we don't
have to require a minimum pixman version.
David Gibson [Mon, 3 Dec 2012 16:42:16 +0000 (16:42 +0000)]
pseries: Increase default NVRAM size
If no image file for NVRAM is specified, the pseries machine currently
creates a 16K non-persistent NVRAM by default. This basically works, but
is not large enough for current firmware and guest kernels to create all
the NVRAM partitions they would like to. Increasing the default size to
64K addresses this and stops the guest generating error messages.
David Gibson [Mon, 3 Dec 2012 16:42:14 +0000 (16:42 +0000)]
target-ppc: Don't use hwaddr to represent hardware state
The hwaddr type is somewhat vaguely defined as being able to contain bus
addresses on the widest possible bus in the system. For that reason it's
discouraged for representing specific pieces of persistent hardware state,
which should instead use an explicit width type that matches the bits
available in real hardware. In particular, because of the possibility that
the size of hwaddr might change if different buses are added to the target
in future, it's not suitable for use in vm state descriptions for savevm
and migration.
This patch purges such unwise uses of hwaddr from the ppc target code,
which turns out to be just one. The ppcemb_tlb_t struct, used on a number
of embedded ppc models to represent a TLB entry contains a hwaddr for the
real address field. This patch changes it to be a fixed uint64_t which is
suitable enough for all machine types which use this structure.
Other uses of hwaddr in CPUPPCState turn out not to be problematic:
htab_base and htab_mask are just used for the convenience of the TCG code;
the underlying machine state is the SDR1 register, which is stored with
a suitable type already. Likewise the mpic_cpu_base field is only used
internally and does not represent fundamental hardware state which needs to
be saved.
Alexander Graf [Thu, 13 Dec 2012 00:16:24 +0000 (01:16 +0100)]
PPC: e500: pci: Export slot2irq calculation
We need the calculation method to get from a PCI slot ID to its respective
interrupt line twice. Once in the internal map function and once when
assembling the device tree.
So let's extract the calculation to a separate function that can be called
by both users.
Alexander Graf [Wed, 12 Dec 2012 12:53:53 +0000 (13:53 +0100)]
PPC: E500: Move PCI slot information into params
We have a params struct that allows us to expose differences between
e500 machine models. Include PCI slot information there, so we can have
different machines with different PCI slot topology.
Alexander Graf [Wed, 12 Dec 2012 12:47:07 +0000 (13:47 +0100)]
PPC: E500: Generate dt pci irq map dynamically
Today we're hardcoding the PCI interrupt map in the e500 machine file.
Instead, let's write it dynamically so that different machine types
can have different slot properties.
Alexander Graf [Wed, 12 Dec 2012 11:58:12 +0000 (12:58 +0100)]
PPC: E500: PCI: Make IRQ calculation more generic
The IRQ line calculation is more or less hardcoded today. Instead, let's
write it as an algorithmic function that theoretically allows an arbitrary
number of PCI slots.
Alexander Graf [Wed, 12 Dec 2012 11:56:40 +0000 (12:56 +0100)]
PPC: E500: PCI: Make first slot qdev settable
Today the first slot id in our e500 pci implementation is hardcoded to
0x11. Keep it there as default, but allow users to change the default to
a different id.
Alexander Graf [Thu, 13 Dec 2012 11:48:14 +0000 (12:48 +0100)]
openpic: Accelerate pending irq search
When we're done with one interrupt, we need to search for the next pending
interrupt in the queue. This search has grown quite big now that we have
more than 256 possible irq lines.
So let's memorize how many interrupts we have pending in our bitmaps, so
that we can always bail out in the usual case - the one where we're all done.
Alexander Graf [Sat, 8 Dec 2012 04:17:14 +0000 (05:17 +0100)]
openpic: convert to qdev
This patch converts the OpenPIC device to qdev. Along the way it
renames the "openpic" target to "raven" and the "mpic" target to
"fsl_mpic_20", to better reflect the actual models they implement.
This way we have a generic OpenPIC device now that can handle
different flavors of the OpenPIC specification.
Alexander Graf [Sat, 8 Dec 2012 01:18:58 +0000 (02:18 +0100)]
openpic: remove irq_out
The current openpic emulation contains half-ready code for bypass mode.
Remove it, so that when someone wants to finish it they can start from a
clean state.
Alexander Graf [Sat, 8 Dec 2012 00:49:52 +0000 (01:49 +0100)]
openpic: convert simple reg operations to builtin bitops
The openpic code has its own bitmap code to access bits inside of a
bitmap. However, that is overkill when we simply want to check for a
bit inside of a uint32_t.
So instead, let's use normal bit masks and C builtin shifts and ands.
Alexander Graf [Sat, 8 Dec 2012 00:04:48 +0000 (01:04 +0100)]
openpic: unify memory api subregions
The only difference between the "openpic" and "mpic" memory api subregion
descriptors is the endianness. Unify them as openpic accessors with explicit
endianness markers in their names.
Alexander Graf [Fri, 7 Dec 2012 15:31:55 +0000 (16:31 +0100)]
openpic: update to proper memory api
The openpic code was still using the old mmio memory api. Convert it to
be a generic memory api user and clean up some code that becomes redundant
that way.
Alexander Graf [Fri, 7 Dec 2012 15:10:34 +0000 (16:10 +0100)]
mpic: Unify numbering scheme
MPIC interrupt numbers in Linux (device tree) and in QEMU are different,
because QEMU takes the sparseness of the IRQ number space into account.
Remove that cleverness and instead assume a flat number space. This makes
the code easier to understand, because we are actually aligned with Linux
on the view of our worlds.
David Gibson [Mon, 3 Dec 2012 16:42:13 +0000 (16:42 +0000)]
pseries: Don't allow TCE (iommu) tables to be registered with duplicate LIOBNs
The PAPR specification requires that every bus or device mediated by the
IOMMU have a unique Logical IO Bus Number (LIOBN). This patch adds a check
to enforce this, which will help catch errors in configuration earlier.
Bharat Bhushan [Wed, 10 Oct 2012 04:28:28 +0000 (04:28 +0000)]
Adding BAR0 for e500 PCI controller
PCI Root complex have TYPE-1 configuration header while PCI endpoint
have type-0 configuration header. The type-1 configuration header have
a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
address space to CCSR address space. This can used for 2 purposes: 1)
for MSI interrupt generation 2) Allow CCSR registers access when configured
as PCI endpoint, which I am not sure is a use case with QEMU-KVM guest.
What I observed is that when guest read the size of BAR0 of host controller
configuration header (TYPE1 header) then it always reads it as 0. When
looking into the QEMU hw/ppce500_pci.c, I do not find the PCI controller
device registering BAR0. I do not find any other controller also doing so
may they do not use BAR0.
There are two issues when BAR0 is not there (which I can think of):
1) There should be BAR0 emulated for PCI Root complex (TYPE1 header) and
when reading the size of BAR0, it should give size as per real h/w.
2) Do we need this BAR0 inbound address translation?
When BAR0 is of non-zero size then it will be configured for PCI
address space to local address(CCSR) space translation on inbound access.
The primary use case is for MSI interrupt generation. The device is
configured with an address offsets in PCI address space, which will be
translated to MSI interrupt generation MPIC registers. Currently I do
not understand the MSI interrupt generation mechanism in QEMU and also
IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
But this BAR0 will be used when using MSI on e500.
I can see one more issue, There are ATMUs emulated in hw/ppce500_pci.c,
but i do not see these being used for address translation.
So far that works because pci address space and local address space are 1:1
mapped. BAR0 inbound translation + ATMU translation will complete the address
translation of inbound traffic.
Bharat Bhushan [Wed, 10 Oct 2012 04:28:27 +0000 (04:28 +0000)]
e500: Adding CCSR memory region
All devices are also placed under CCSR memory region.
The CCSR memory region is exported to pci device. The MSI interrupt
generation is the main reason to export the CCSR region to PCI device.
This put the requirement to move mpic under CCSR region, but logically
all devices should be under CCSR. So this patch places all emulated
devices under ccsr region.
David Gibson [Mon, 12 Nov 2012 16:46:58 +0000 (16:46 +0000)]
pseries: Update SLOF for NVRAM support
Now that we have implemented PAPR compatible NVRAM interfaces in qemu, this
updates the SLOF firmware to actually initialize and use the NVRAM as a
PAPR guest firmware is expected to do.
This SLOF update also includes an ugly but useful workaround for a bug in
the SLES11 installer which caused it to fail under KVM.
David Gibson [Mon, 12 Nov 2012 16:46:57 +0000 (16:46 +0000)]
pseries: Implement PAPR NVRAM
The PAPR specification requires a certain amount of NVRAM, accessed via
RTAS, which we don't currently implement in qemu. This patch addresses
this deficiency, implementing the NVRAM as a VIO device, with some glue to
instantiate it automatically based on a machine option.
The machine option specifies a drive id, which is used to back the NVRAM,
making it persistent. If nothing is specified, the driver instead simply
allocates space for the NVRAM, which will not be persistent
David Gibson [Mon, 12 Nov 2012 16:46:55 +0000 (16:46 +0000)]
pseries: Split xics irq configuration from state information
Currently the XICS irq controller code has a per-irq state structure which
amongst other things includes whether the interrupt is level or message
triggered - this is configured by the platform code, and is not directly
visible to the guest. This leads to a slightly awkward construct at reset
time where we need to reset everything in the state structure _except_ the
lsi/msi flag, which needs to retain the information given at platform init
time.
More importantly this flag will make matching the qemu state to the KVM
state for the upcoming in-kernel XICS implementation more awkward. This
patch, therefore, removes this flag from the per-irq state structure,
instead adding a parallel array giving the lsi/msi configuration per irq.
Kernel-based RTAS calls will not have a qemu handler, but will
still be registered in qemu in order to be assigned a token
number and appear in the device-tree.
Let's test for the name being NULL rather than the handler
when deciding to skip an entry while building the device-tree
Michael Ellerman [Mon, 12 Nov 2012 16:46:52 +0000 (16:46 +0000)]
pseries: Return the token when we register an RTAS call
The kernel will soon be able to service some RTAS calls. However the
choice of tokens will still be up to userspace. To support this have
spapr_rtas_register() return the token that is allocated for an
RTAS call, that allows the calling code to tell the kernel what the
token value is.
Currently the lowest "real" irq number for the XICS irq controller (as
opposed to numbers reserved for IPIs and other special purposes) is
hard coded as 16 in two places - in xics_system_init() and in spapr.c.
As well as being generally bad practice, we're going to need to change this
number soon to fit in with the in-kernel XICS implementation. This patch
adds a #define for this number to avoid future breakage.
David Gibson [Mon, 12 Nov 2012 16:46:49 +0000 (16:46 +0000)]
pseries: Fix incorrect initialization of interrupt controller
Currently in the reset code for the XICS interrupt controller, we
initialize the pending_priority field to 0 (most favored, by XICS
convention). This is incorrect, since there is no pending interrupt, it
should be set to least favored - 0xff. At the moment our XICS
implementation doesn't get hurt by this edge case, but it does confuse the
upcoming kernel XICS implementation.
Anthony Liguori [Thu, 13 Dec 2012 17:41:57 +0000 (11:41 -0600)]
Merge remote-tracking branch 'pmaydell/arm-devs.next' into staging
* pmaydell/arm-devs.next:
hw/ds1338.c: Fix handling of DAY (wday) register.
hw/ds1338.c: Implement support for the control register.
hw/ds1338.c: Ensure state is properly initialized.
hw/ds1338.c: Fix handling of HOURS register.
hw/ds1338.c: Add definitions for various flags in the RTC registers.
hw/ds1338.c: Correct bug in conversion to BCD.
exynos4210/mct: Avoid infinite loop on non incremental timers
hw/arm_gic: fix target CPUs affected by set enable/pending ops
xilinx_zynq: Add one variable to avoid overwriting QSPI bus
hw/arm_gic_common: Correct GICC_PMR reset value for newer GICs
hw/arm_gic: Fix comparison with priority mask register
hw/arm_boot, exynos4210, highbank: Fix secondary boot GIC init
Kevin Wolf [Fri, 7 Dec 2012 17:08:48 +0000 (18:08 +0100)]
qcow2: Execute run_dependent_requests() without lock
There's no reason for run_dependent_requests() to hold s->lock, and a
later patch will require that in fact the lock is not held.
Also, before this patch, run_dependent_requests() not only does what its
name suggests, but also removes the l2meta from the list of in-flight
requests. When changing this, it becomes an one-liner, so just inline it
completely.
Kevin Wolf [Fri, 7 Dec 2012 17:08:47 +0000 (18:08 +0100)]
qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2
This is closer to where the dirty flag is really needed, and it avoids
having checks for special cases related to cluster allocation directly
in the writev loop.
Kevin Wolf [Fri, 7 Dec 2012 17:08:46 +0000 (18:08 +0100)]
qcow2: Allocate l2meta only for cluster allocations
Even for writes to already allocated clusters, an l2meta is allocated,
though it stays effectively unused. After this patch, only allocating
requests still have one. Each l2meta now describes an in-flight request
that writes to clusters that are not yet hooked up in the L2 table.
Kevin Wolf [Fri, 7 Dec 2012 17:08:45 +0000 (18:08 +0100)]
qcow2: Drop l2meta.cluster_offset
There's no real reason to have an l2meta for normal requests that don't
allocate anything. Before we can get rid of it, we must return the host
cluster offset in a different way.
Kevin Wolf [Fri, 7 Dec 2012 17:08:43 +0000 (18:08 +0100)]
qcow2: Introduce Qcow2COWRegion
This makes it easier to address the areas for which a COW must be
performed. As a nice side effect, the COW code in
qcow2_alloc_cluster_link_l2 becomes really trivial.
Amit Shah [Thu, 13 Dec 2012 10:24:43 +0000 (15:54 +0530)]
virtio-serial: delete timer if active during exit
The post_load timer was being freed, but not deleted. This could cause
problems when the timer is armed, but the device is hot-unplugged before
the callback is executed.
Amit Shah [Thu, 29 Nov 2012 19:24:44 +0000 (00:54 +0530)]
virtio-serial: allocate post_load only at load-time
This saves us a few bytes in the VirtIOSerial struct. Not a big
savings, but since the entire structure is used only during a short
while after migration, it's helpful to keep the struct cleaner and
smaller.
pci: prepare makefiles for pci code reorganization
To make it easier to move code around without breaking
build at intermedite steps, tweak makefiles
to look in pci/ and hw/ for include files, automatically.
This will be reverted at the end of the reorganization.
For tap, we currently assume the vnet header size is 10
(the default value) but that might not be the case
if tap is persistent and has been used by qemu previously.
To fix, set vnet header size correctly on open.
David Gibson [Tue, 4 Dec 2012 00:38:39 +0000 (11:38 +1100)]
migration: Fix madvise breakage if host and guest have different page sizes
madvise(DONTNEED) will throw away the contents of the whole page at the
given address, even if the given length is less than the page size. One
can argue about whether that's the correct behaviour, but that's what it's
done for a long time in Linux at least.
That means that the madvise() in ram_load(), on a setup where
TARGET_PAGE_SIZE is smaller than the host page size, can throw away data
in guest pages adjacent to the one it's actually processing right now,
leading to guest memory corruption on an incoming migration.
This patch therefore, disables the madvise() if the host page size is
larger than TARGET_PAGE_SIZE. This means we don't get the benefits of that
madvise() in this case, but a more complete fix is more difficult to
accomplish. This at least fixes the guest memory corruption.
David Gibson [Tue, 4 Dec 2012 00:38:38 +0000 (11:38 +1100)]
Fix off-by-1 error in RAM migration code
The code for migrating (or savevm-ing) memory pages starts off by creating
a dirty bitmap and filling it with 1s. Except, actually, because bit
addresses are 0-based it fills every bit except bit 0 with 1s and puts an
extra 1 beyond the end of the bitmap, potentially corrupting unrelated
memory. Oops. This patch fixes it.
Kevin Wolf [Thu, 6 Dec 2012 13:32:59 +0000 (14:32 +0100)]
qcow2: Move BLKDBG_EVENT out of the lock
We want to use these events to suspend requests for testing concurrent
AIO requests. Suspending requests while they are holding the CoMutex is
rather boring for this purpose.