Git Repo - qemu.git/log

target-i386: Set custom features/properties without intermediate x86_def_t

Move custom features parsing after built-in cpu_model defaults are set
and set custom features directly on CPU instance. That allows to make a
clear distinction between built-in cpu model defaults that eventually
should go into class_init() and extra property setting which is done
after defaults are set on CPU instance.

Impl. details:
* use object_property_parse() property setter so it would be a mechanical
   change to switch to global properties later.
* And after all current features/properties are converted into static
   properties, it will take a trivial patch to switch to global properties.
   Which will allow to:
   * get CPU instance initialized with all parameters passed on -cpu ...
     cmd. line from object_new() call.
   * call cpu_model/featurestr parsing only once before CPUs are created
   * open a road for removing CPUxxxState.cpu_model_str field, when other
     CPUs are similarly converted to subclasses and static properties.
- re-factor error handling, to use Error instead of fprintf()s, since
   it is anyway passed in for property setter.

Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

target-i386: Remove vendor_override field from CPUX86State

Commit 8935499831312 makes cpuid return to guest host's vendor value
instead of built-in one by default if kvm_enabled() == true and allows
to override this behavior if 'vendor' is specified on -cpu command line.

But every time guest calls cpuid to get 'vendor' value, host's value is
read again and again in default case.

It complicates semantics of vendor property and makes it harder to use.

Instead of reading 'vendor' value from host every time cpuid[vendor] is
called, override 'vendor' value only once in cpu_x86_find_by_name(), when
built-in CPU model is found and if(kvm_enabled() == true).

It provides the same default semantics
if (kvm_enabled() == true) vendor = host's vendor
else vendor = built-in vendor

and then later:
if (custom vendor) vendor = custom vendor

'vendor' value is overridden when user provides it on -cpu command line,
and there is no need for vendor_override field anymore, remove it.

Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

target-i386: Replace uint32_t vendor fields by vendor string in x86_def_t

Vendor property setter takes string as vendor value but cpudefs
use uint32_t vendor[123] fields to define vendor value. It makes it
difficult to unify and use property setter for values from cpudefs.

Simplify code by using vendor property setter, vendor[123] fields
are converted into vendor[13] array to keep its value. And vendor
property setter is used to access/set value on CPU.

- Make for() cycle reusable for the next patch by adding
x86_cpu_vendor_words2str()

Intel's CPUID spec[1] says:
"
5.1.1 ...
These registers contain the ASCII string: GenuineIntel
...
"

List[2] of known vendor values shows that they all are 12 ASCII
characters long, padded where necessary with space.

Current supported values are all ASCII characters packed in
ebx, edx, ecx. So lets state that QEMU supports 12 printable ASCII
characters packed in ebx, edx, ecx registers for cpuid(0) instruction.

*1 - http://www.intel.com/Assets/PDF/appnote/241618.pdf
*2 - http://en.wikipedia.org/wiki/CPUID#EAX.3D0:_Get_vendor_ID

Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

target-i386: Print deprecation warning if xlevel < 0x80000000

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

target-i386: Drop redundant list of CPU definitions

It is no longer needed since dropping cpudef config file support.
Cleaning this up removes knowledge about other models from x86_def_t,
in preparation for reusing x86_def_t as intermediate step towards pure
QOM X86CPU subclasses.

Signed-off-by: Andreas Färber <[email protected]>

target-i386: Simplify cpu_x86_find_by_name()

Catch NULL name argument early to avoid repeated checks.
Similarly, check for -cpu host early and untangle from iterating through
model definitions. This prepares for introducing X86CPU subclasses.

Signed-off-by: Andreas Färber <[email protected]>

pc: Generate APIC IDs according to CPU topology

This keeps compatibility on machine-types pc-1.2 and older, and prints a
warning in case the requested configuration won't get the correct
topology.

I couldn't think of a better way to warn about broken topology when in
compat mode other than using error_report(). The warning message will
probably be buried in a log file somewhere, but it's better than
nothing.

Signed-off-by: Eduardo Habkost <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

target-i386: Topology & APIC ID utility functions

This introduces utility functions for the APIC ID calculation, based on:
  Intel® 64 Architecture Processor Topology Enumeration
  http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/

The code should be compatible with AMD's "Extended Method" described at:
  AMD CPUID Specification (Publication #25481)
  Section 3: Multiple Core Calcuation
as long as:
- nr_threads is set to 1;
- OFFSET_IDX is assumed to be 0;
- CPUID Fn8000_0008_ECX[ApicIdCoreIdSize[3:0]] is set to
   apicid_core_width().

Unit tests included.

Signed-off-by: Eduardo Habkost <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

pc: Set fw_cfg data based on APIC ID calculation

This changes FW_CFG_MAX_CPUS and FW_CFG_NUMA to use apic_id_for_cpu(),
so the NUMA table can be based on the APIC IDs, instead of CPU index
(SeaBIOS knows nothing about CPU indexes, just APIC IDs).

Signed-off-by: Eduardo Habkost <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

cpus.h: Make constant smp_cores/smp_threads available on *-user

The code that calculates the APIC ID will use smp_cores/smp_threads, so
just define them as 1 on *-user to avoid #ifdefs in the code.

Signed-off-by: Eduardo Habkost <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

fw_cfg: Remove FW_CFG_MAX_CPUS from fw_cfg_init()

PC will not use max_cpus for that field, so move it outside the common
code so it can use a different value on PC.

Signed-off-by: Eduardo Habkost <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

target-i386: Introduce x86_cpu_apic_id_from_index() function

This function will be used by both the CPU initialization code and the
fw_cfg table initialization code.

Later this function will be updated to generate APIC IDs according to
the CPU topology.

Signed-off-by: Eduardo Habkost <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

target-i386: kvm: Set vcpu_id to APIC ID instead of CPU index

The CPU ID in KVM is supposed to be the APIC ID, so change the
KVM_CREATE_VCPU call to match it. The current behavior didn't break
anything yet because today the APIC ID is assumed to be equal to the CPU
index, but this won't be true in the future.

Signed-off-by: Eduardo Habkost <[email protected]>
Reviewed-by: Marcelo Tosatti <[email protected]>
Acked-by: Gleb Natapov <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

kvm: Create kvm_arch_vcpu_id() function

This will allow each architecture to define how the VCPU ID is set on
the KVM_CREATE_VCPU ioctl call.

Signed-off-by: Eduardo Habkost <[email protected]>
Acked-by: Gleb Natapov <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

pc: Reverse pc_init_pci() compatibility logic

Currently, the pc-1.4 machine init function enables PV EOI and then
calls the pc-1.2 machine init function. The problem with this approach
is that now we can't enable any additional compatibility code inside the
pc-1.2 init function because it would end up enabling the compatibility
behavior on pc-1.3 and pc-1.4 as well.

This reverses the logic so that the pc-1.2 machine init function will
disable PV EOI, and then call the pc-1.4 machine init function.

This way we can change older machine-types to enable compatibility
behavior, and the newer machine-types (pc-1.3, pc-q35-1.4 and
pc-i440fx-1.4) would just use the default behavior.

(This means that one nice side-effect of this change is that pc-q35-1.4
will get PV EOI enabled by default, too)

It would be interesting to eventually change pc_init_pci_no_kvmclock()
and pc_init_isa() to reuse pc_init_pci_1_2() as well (so we don't need
to duplicate compatibility code on those two functions). But this will
be probably much easier to do after we create a PCInitArgs struct for
the PC initialization arguments, and/or after we use global-properties
to implement the compatibility modes present in pc_init_pci_1_2().

Signed-off-by: Eduardo Habkost <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

target-i386: Don't set any KVM flag by default if KVM is disabled

This is a cleanup that tries to solve two small issues:

- We don't need a separate kvm_pv_eoi_features variable just to keep a
   constant calculated at compile-time, and this style would require
   adding a separate variable (that's declared twice because of the
   CONFIG_KVM ifdef) for each feature that's going to be
   enabled/disabled by machine-type compat code.
- The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
   even when KVM is disabled at runtime. This small inconsistency in
   the cpuid_kvm_features field isn't a problem today because
   cpuid_kvm_features is ignored by the TCG code, but it may cause
   unexpected problems later when refactoring the CPUID handling code.

This patch eliminates the kvm_pv_eoi_features variable and simply uses
kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it
enables kvm_pv_eoi only if KVM is enabled. I believe this makes the
behavior of enable_kvm_pv_eoi() clearer and easier to understand.

Signed-off-by: Eduardo Habkost <[email protected]>
Acked-by: Gleb Natapov <[email protected]>
Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

kvm: Add fake KVM_FEATURE_CLOCKSOURCE_STABLE_BIT for builds without KVM

Signed-off-by: Eduardo Habkost <[email protected]>
Acked-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Andreas Färber <[email protected]>

target-openrisc: Clean up triple QOM casts

Instead of calling openrisc_env_get_cpu(), casting to CPU() via the
ENV_GET_CPU() compatibility macro and casting back to OPENRISC_CPU(),
just call openrisc_env_get_cpu() directly.

ENV_GET_CPU() is meant as workaround for target-independent code only.

Signed-off-by: Andreas Färber <[email protected]>

target-openrisc: Drop OpenRISCCPUList

It was missed in 92a3136174f60ee45b113296cb2c2a5225b00369 (cpu:
Introduce CPUListState struct) because its naming did not match the
*CPUListState pattern. Use the generalized CPUListState instead.

Signed-off-by: Andreas Färber <[email protected]>

xilinx_ethlite: Avoid build warnings in debug code

Signed-off-by: Edgar E. Iglesias <[email protected]>

m25p80.c: Return state to IDLE after COLLECTING

Default to moving back to the IDLE state after the COLLECTING_DATA
state. For a well behaved guest this patch has no consequence, but
A bad guest could crash QEMU by using one of the erase commands
followed by a longer than 5 byte argument (undefined behaviour).

Signed-off-by: Peter Crosthwaite <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

xilinx_ethlite: Flush queued packets on SW service

Software services a received packet by clearing the CTRL_S bit in the RX_CTRLn
register. If this bit is cleared, flush any packets queued for the device.

Reported-by: John Williams <[email protected]>
Signed-off-by: Peter Crosthwaite <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

xilinx_ethlite: fix eth_can_rx() for ping-pong

The eth_can_rx() function only checks the first buffers status ("ping"). The
controller should be able to receive into "pong" when ping-pong is enabled.
Checks the active buffer (either "ping" or "pong") when determining can_rx()
rather than just testing "ping".

Signed-off-by: Peter Crosthwaite <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

Merge branch 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf

* 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf:
  PPC: e500: Select MPIC v4.2 on ppce500 platform
  PPC: e500: fix mpic_iack address
  openpic: add basic support for MPIC v4.2
  openpic: fix timer address decoding
  openpic: fix remaining issues from idr-to-destmask conversion
  pseries: Adjust default VIO address allocations to play better with libvirt
  pseries: Improve handling of multiple PCI host bridges
  target-ppc: Give a meaningful error if too many threads are specified
  cuda: Move ADB bus into CUDA state
  adb: QOM'ify ADB devices
  adb: QOM'ify Apple Desktop Bus
  cuda: QOM'ify CUDA
  ide/macio: QOM'ify MacIO IDE
  mac_nvram: QOM'ify MacIO NVRAM
  mac_nvram: Mark as Big Endian
  mac_nvram: Clean up public API
  macio: Split MacIO in two
  macio: Delay qdev init until all fields are initialized
  macio: QOM'ify some more
  ppc: Move Mac machines to hw/ppc/

tests: Add gcov support for x86_64 qtest

Since x86_64 is a superset of i386 and reuses all its test cases, adopt
all the i386 gcov source files as well, substituting their paths
appropriately.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

tests: Add gcov support for sparc64 qtest

m48t59-test is individually being executed for sparc and sparc64, so add
the gcov source file for sparc64 as well.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

tests: Fix gcov typo for tmp105-test

Commit 6e9989034b176a8e4cfdccd85892abfa73977ba7 introduced a new qtest
test case but misspelled gcov, leading to no coverage analysis. Fix it.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

vmware_vga: fix out of bounds and invalid rects updating

This is a follow up for several attempts to fix this issue.

Previous incarnations:

1. http://thread.gmane.org/gmane.linux.ubuntu.bugs.general/3156089
https://bugs.launchpad.net/bugs/918791
"qemu-kvm dies when using vmvga driver and unity in the guest" bug.
Fix by Serge Hallyn:
https://launchpadlibrarian.net/94916786/qemu-vmware.debdiff
This fix is incomplete, since it does not check width and height
for being negative.  Serge weren't sure if that's the right place
to fix it, maybe the fix should be up the stack somewhere.

2. http://thread.gmane.org/gmane.comp.emulators.qemu/166064
by Marek Vasut: "vmware_vga: Redraw only visible area"

This one adds the (incomplete) check to vmsvga_update_rect_delayed(),
the routine just queues the rect updating but does no interesting
stuff.  It is also incomplete in the same way as patch by Serge,
but also does not touch width&height at all after adjusting x&y,
which is wrong.

As far as I can see, when processing guest requests, the device
places them into a queue (vmsvga_update_rect_delayed()) and
processes this queue in different place/time, namely, in
vmsvga_update_rect().  Sometimes, vmsvga_update_rect() is
called directly, without placing the request to the gueue.
This is the place this patch changes, which is the last
(deepest) in the stack.  I'm not sure if this is the right
place still, since it is possible we have some queue optimization
(or may have in the future) which will be upset by negative/wrong
values here, so maybe we should check for validity of input
right when receiving request from the guest (and maybe even
use unsigned types there).  But I don't know the protocol
and implementation enough to have a definitive answer.

But since vmsvga_update_rect() has other sanity checks already,
I'm adding the missing ones there as well.

Cc'ing BALATON Zoltan and Andrzej Zaborowski who shows in `git blame'
output and may know something in this area.

If this patch is accepted, it should be applied to all active
stable branches (at least since 1.1, maybe even before), with
minor context change (ds_get_*(s->vga.ds) => s->*).  I'm not
Cc'ing -stable yet, will do it explicitly once the patch is
accepted.

BTW, these checks use fprintf(stderr) -- it should be converted
to something more appropriate, since stderr will most likely
disappear somewhere.

Cc: Marek Vasut <[email protected]>
CC: Serge Hallyn <[email protected]>
Cc: BALATON Zoltan <[email protected]>
Cc: Andrzej Zaborowski <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>
Reviewed-by: Marek Vasut <[email protected]>
Signed-off-by: Serge Hallyn <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

tests: add fuzzing to visitor tests

Perform input tests on random data.

Improvement to code coverage for qapi/string-input-visitor.c
is about 3 percentage points.

Signed-off-by: Blue Swirl <[email protected]>

build: remove *.lo, *.a, *.la files from all subdirectories on make clean

.lo files in stubs/, util/ and libcacard/ were not cleaned.
Fix this.

Cc: Blue Swirl <[email protected]>
Reported-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michal Privoznik <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

hw/arm_boot: Align device tree to 4KB boundary, not page

Align the device tree blob to a 4KB boundary, not to QEMU's
idea of a page boundary -- the latter is the smallest possible
page size for the architecture, which on ARM is 1KB.
The documentation for Linux does not impose separation
or alignment requirements on the device tree blob, but
in practice some kernels will happily trash the entire
page the initrd ends in after they have finished uncompressing
the initrd. So 4KB-align the DTB to ensure it does not get
trampled by these kernels.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

qemu-char: Avoid unused variable warning in some configs

Avoid unused variable warnings:
qemu-char.c: In function 'qmp_chardev_open_port':
qemu-char.c:3132: warning: unused variable 'fd'
qemu-char.c:3132: warning: unused variable 'flags'

in configurations with neither HAVE_CHARDEV_TTY nor
HAVE_CHARDEV_PARPORT set.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

make_device_config.sh: Fix target path in generated dependency file

config-devices.mak.d is included from Makefile.target, i.e. from inside
the *-softmmu/ directory. It included the directory path, so never
applied to the actual ./config-devices.mak. Symptoms were spurious
build failures due to missing dependency on default-configs/pci.mak.

Fix this by using `basename` to strip the directory path.

Reported-by: Gerhard Wiesinger <[email protected]>
Cc: [email protected]
Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

fw_cfg: Drop a few superfluous initializers

Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

fw_cfg: Splash image loader can overrun a stack variable, fix

read_splashfile() passes the address of an int variable as size_t *
parameter to g_file_get_contents(), with a cast to gag the compiler.

No problem on machines where sizeof(size_t) == sizeof(int).

Happens to work on my x86_64 box (64 bit little endian): the least
significant 32 bits of the file size end up in the right place
(caller's variable file_size), and the most significant 32 bits
clobber a place that gets assigned to before its next use (caller's
variable file_type).

I'd expect it to break on a 64 bit big-endian box.

Fix up the variable types and drop the problematic cast.

Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

softfloat: Handle float_muladd_negate_c when product is zero

Honour float_muladd_negate_c in the case where the product is zero and
c is nonzero. Previously we would fail to negate c.

Seen in (and tested against) the gfortran testsuite on MIPS.

Signed-off-by: Richard Sandiford <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

hw/pxa2xx_timer: Explicitly mark fallthroughs

Explicitly mark the fallthroughs as intentional in the code
pattern where we gradually increment an index before falling
into the code to read/write that array entry:
    case THINGY_3: idx++;
    case THINGY_2: idx++;
    case THINGY_1: idx++;
    case THINGY_0: return s->thingy[idx];

This makes static analysers happy.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

hw/smc91c111: Add explicit 'return' rather than relying on fallthrough

Add an explicit 'return' statement to a case in smc91c111_readb
rather than relying on fallthrough to the following case's
return statement, for code clarity and to placate static analysers.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

hw/pflash_cfi02.c: Mark deliberate fallthrough

Mark the deliberate fallthrough where we treat the case of
an attempt to read flash when it is an unknown command
state as if it were a normal read.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

hw/omap_dma, hw/omap_spi: Explicitly mark fallthroughs

Explicitly mark the fallthroughs as intentional in the code
pattern where we gradually increment an index before falling
into the code to read/write that array entry:
  case THINGY_3: idx++;
  case THINGY_2: idx++;
  case THINGY_1: idx++;
  case THINGY_0: return s->thingy[idx];

This makes static analysers happy.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

hw/omap1.c: Add fallthrough markers and breaks

Explicitly mark cases where we are deliberately falling
through to the following code. In one case we insert a
'break' instead of falling through to a 'break', as this
seems slightly clearer.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

hw/arm_sysctl.c: Add missing 'break' statements

Add some break statements that were accidentally omitted
from some cases of arm_sysctl_write(). The omission was
harmless because in both cases the following case did
an immediate break, but adding the breaks explicitly
placates static analysers and avoids weird behaviour if
the following register is ever implemented as something
other than a no-op.

Signed-off-by: Peter Maydell <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

link seccomp only with softmmu targets

Now, if seccomp is detected, it is linked into every executable,
but is used only by softmmu targets (from vl.c). So link it
only where it is actually needed.

Signed-off-by: Michael Tokarev <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

bsd-user: avoid conflict with qemu_vmalloc

Rename qemu_vmalloc() to bsd_vmalloc(), adjust the only user.

Remove #ifdeffery in oslib-posix.c.

Tested-by: Andreas Färber <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

build: remove extra-obj-y

extra-obj-y is somewhat complicated to understand. Replace it with a
special CONFIG_ALL symbol that is defined only at toplevel.
This limits the case of directories defining more than one
*-obj-y target.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

build: remove universal-obj-y

All of universal-obj-y, user-obj-y (right now unused) and common-obj-y can
be unified into common-obj-y if we take care of defining CONFIG_SOFTMMU
and CONFIG_USER_ONLY in the toplevel makefile. This is similar to how
we define symbols for hardware components.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

build: use -$(CONFIG_SECCOMP) instead of ifeq

Signed-off-by: Paolo Bonzini <[email protected]>
Acked-by: Andreas Färber <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

build: move around libcacard-y definition

It is also needed if !CONFIG_SOFTMMU, unlike everything that surrounds it.
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

tests: adjust gcov variables for directory movement

I had missed the introduction of the gcov-files-* variables.

Cc: Blue Swirl <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Blue Swirl <[email protected]>

PPC: e500: Select MPIC v4.2 on ppce500 platform

The compatible string is changed to fsl,mpic on all e500 platforms, to
advertise the existence of BRR1. This matches what the device tree will
have on real hardware.

With MPIC v4.2 max_cpu can be increased from 15 to 32.

Signed-off-by: Scott Wood <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

PPC: e500: fix mpic_iack address

MPIC+0xa0 is IACK for the current CPU. MPIC+0x200a0 is IACK for CPU 0.
This fix allows EPR to work with an SMP target.

Signed-off-by: Scott Wood <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

openpic: add basic support for MPIC v4.2

Besides the new value in the version register, this provides:
- ILR support, which includes:
  - IDR becoming a pure CPU bitmap, allowing 32 CPUs
  - machine check output support (though other parts of QEMU need to
    be fixed for it to do something other than immediately reboot the
    guest)
- dummy error interrupt support (EISR0/EIMR0 read as zero)
  - actually all FSL MPICs get all summary registers returning zero for now,
    which includes EISR0/EIMR0

Various refactoring is done to support these changes and to ease
new functionality (e.g. a more flexible way of declaring regions).

Just as the code was already not a full implementation of MPIC v2.0,
this is not a full implementation of MPIC v4.2 -- e.g. it still has only
one bank of MSIs.

Signed-off-by: Scott Wood <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

openpic: fix timer address decoding

The timer memory range begins at 0x10f0, so that address 0x1120 shows
up as 0x30, 0x1130 shows up as 0x40, etc. However, the address
decoding (other than TFRR) is not adjusted for this, causing the
wrong registers to be accessed.

Signed-off-by: Scott Wood <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

openpic: fix remaining issues from idr-to-destmask conversion

openpic_update_irq() was checking idr rather than destmask, treating
it as if it were a simple bitmap of cpus. Changed to use destmask.

IPI delivery was removing bits directly from .idr, without calling
write_IRQreg_idr so that the change could be conveyed to destmask.
Changed to use destmask directly.

Save/restore destmask when serializing, as due to the IPI change it
cannot be reproduced from idr.

Signed-off-by: Scott Wood <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

pseries: Adjust default VIO address allocations to play better with libvirt

Currently, if VIO devices for pseries don't have addresses explicitly
allocated, they get automatically numbered from 0x1000. This is in the
same general range that libvirt will typically assign VIO device addresses.

That means that if there is a device libvirt doesn't know about, and it
gets an address assigned before the libvirt assigned devices are processed,
we can end up with an address conflict (qemu will abort with an error).

While the real solution is to teach libvirt about the other devices, so it
can correctly manage the whole allocation, this patch reduces the interim
inconvenience by moving qemu allocations to a range that libvirt is less
likely to conflict with.

Because the guest gets the device addresses through the device tree, these
addresses are truly arbitrary and can be changed without breaking guests.

Signed-off-by: David Gibson <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

pseries: Improve handling of multiple PCI host bridges

Multiple - even many - PCI host bridges (i.e. PCI domains) are very
common on real PAPR compliant hardware.  For reasons related to the
PAPR specified IOMMU interfaces, PCI device assignment with VFIO will
generally require at least two (virtual) PHBs and possibly more
depending on which devices are assigned.

At the moment the qemu PAPR PCI code will not deal with this well,
leaving several crucial parameters of PHBs other than the default one
uninitialized.  This patch reworks the code to allow this.

Every PHB needs a unique BUID (Bus Unit Identifier, the id used for
the PAPR PCI related interfaces) and a unique LIOBN (Logical IO Bus
Number, the id used for the PAPR IOMMU related interfaces).  In
addition they need windows in CPU real address space to access PCI
memory space, PCI IO space and MSIs.  Properties are added to the PCI
host bridge qdevice to allow configuration of all these.

To simplify configuration of multiple PHBs for common cases, a
convenience "index" property is also added.  This can be set instead
of the low-level properties, and will generate suitable values for the
other parameters, different for each index value.

Signed-off-by: David Gibson <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Give a meaningful error if too many threads are specified

Currently the target-ppc tcg code only supports a single thread. You can
specify more, but they're treated identically to multiple cores. On KVM
we obviously can't support more threads than the hardware; if more are
specified it will cause strange and cryptic errors.

This patch clarifies the situation by giving a simple meaningful error if
more threads are specified than we can support.

Signed-off-by: Mike Qiu <[email protected]>
Signed-off-by: David Gibson <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

cuda: Move ADB bus into CUDA state

Replace the global adb_bus with a CUDA-internal one, accessed using
regular qdev child bus accessor.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

adb: QOM'ify ADB devices

They were not qdev'ified before. Derive ADBDevice from DeviceState and
convert reset callbacks to DeviceClass::reset, ADBDevice::opaque pointer
to ADBDevice subtypes for mouse and keyboard and adb_{kbd,mouse}_init()
to regular qdev functions.

Fixing Coding Style issues and splitting keyboard and mouse off into
their own files is left for a later point in time.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

adb: QOM'ify Apple Desktop Bus

It was not a qbus before, turn it into a first-class bus and initialize
it properly from CUDA. Leave it a global variable as long as devices are
not QOM'ified yet.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

cuda: QOM'ify CUDA

It was not qdev'ified before. Turn it into a SysBusDevice and embed it
in MacIO.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

ide/macio: QOM'ify MacIO IDE

It was not qdev'ified before. Turn it into a SysBusDevice.
Embed them into the MacIO devices.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

mac_nvram: QOM'ify MacIO NVRAM

It was not qdev'ified before. Turn it into a SysBusDevice and
initialize it via static properties.

Prepare Old World specific MacIO state and embed the NVRAM state there.

Drop macio_nvram_setup_bar() in favor of sysbus_mmio_map() or
direct use of Memory API.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

mac_nvram: Mark as Big Endian

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

mac_nvram: Clean up public API

The state data field is accessed in uint8_t quantities, so switch from
uint32_t argument and return value to uint8_t.

Fix debug format specifiers while at it.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

macio: Split MacIO in two

Let the machines create two different types. This prepares to move
knowledge about sub-devices from the machines into the devices.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

macio: Delay qdev init until all fields are initialized

This turns macio_bar_setup() into an implementation detail of the qdev
initfn, to be removed step by step.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

macio: QOM'ify some more

Move bar MemoryRegion initialization to an instance_init.

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

ppc: Move Mac machines to hw/ppc/

Signed-off-by: Andreas Färber <[email protected]>
[agraf: squash in MAINTAINERS fix]
Signed-off-by: Alexander Graf <[email protected]>

ide: Add fall through annotations

Add comments to help static analysers detect that these cases are
intentional, and clean up some whitespace in the environment of these
comments.

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>

block: Create proper size file for disk mirror

The qmp monitor command to mirror a disk was passing -1 for size
along with the disk's backing file. This size of the resulting disk
is the size of the backing file, which is incorrect if the disk
has been resized. Therefore we should always pass in the size of
the current disk.

Signed-off-by: Vishvananda Ishaya <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

ahci: Add migration support

Jason tested these patches by migrating Windows 7 and Fedora 17 guests
(while under I/O) on both piix with ahci attached and on q35 (which has
a built-in AHCI controller).

Signed-off-by: Andreas Färber <[email protected]>
Signed-off-by: Jason Baron <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

ahci: Change data types in preparation for migration

The size of an int depends on the host, so in order to be able to
migrate these fields, make them either int32_t or bool, depending on the
use.

Signed-off-by: Kevin Wolf <[email protected]>

ahci: Remove unused AHCIDevice fields

'dma_status' and 'dma_cb' are written to, but never read.
Remove these fields in preparation for AHCI migration bits.

Signed-off-by: Jason Baron <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

hbitmap: add assertion on hbitmap_iter_init

hbitmap_iter_init causes an out-of-bounds access when the "first"
argument is or greater than or equal to the size of the bitmap.
Forbid this with an assertion, and remove the failing testcase.

Reported-by: Kevin Wolf <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: do nothing on zero-sized disk

On a zero-sized disk we need to break out of the job successfully
before bdrv_dirty_iter_init is called, otherwise you will get an
assertion failure with the next patch.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/vdi: Check for bad signature

vdi_open did not check for a bad signature.
This check was only in vdi_probe.

Signed-off-by: Stefan Weil <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/vdi: Improved return values from vdi_open

vdi_open returned -1 in case of any error, but it should return an
error code (negative value of errno or -EMEDIUMTYPE).

Signed-off-by: Stefan Weil <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/vdi: Improve debug output for signature

The signature is a 32 bit value and needs up to 8 hex digits for printing.

Signed-off-by: Stefan Weil <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Use error code EMEDIUMTYPE for wrong format in some block drivers

This improves error reports for bochs, cow, qcow, qcow2, qed and vmdk
when a file with the wrong format is selected.

Signed-off-by: Stefan Weil <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Add special error code for wrong format

The block drivers need a special error code for "wrong format".
From the available error codes EMEDIUMTYPE fits best.
It is not available on all platforms, so a definition in
qemu-common.h and a specific error report are needed.

Signed-off-by: Stefan Weil <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: support arbitrarily-sized iterations

Yet another optimization is to extend the mirroring iteration to include more
adjacent dirty blocks. This limits the number of I/O operations and makes
mirroring efficient even with a small granularity. Most of the infrastructure
is already in place; we only need to put a loop around the computation of
the origin and sector count of the iteration.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: support more than one in-flight AIO operation

With AIO support in place, we can start copying more than one chunk
in parallel. This patch introduces the required infrastructure for
this: the buffer is split into multiple granularity-sized chunks,
and there is a free list to access them.

Because of copy-on-write, a single operation may already require
multiple chunks to be available on the free list.

In addition, two different iterations on the HBitmap may want to
copy the same cluster. We avoid this by keeping a bitmap of in-flight
I/O operations, and blocking until the previous iteration completes.
This should be a pretty rare occurrence, though; as long as there is
no overlap the next iteration can start before the previous one finishes.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: add buf-size argument to drive-mirror

This makes sense when the next commit starts using the extra buffer space
to perform many I/O operations asynchronously.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: switch mirror_iteration to AIO

There is really no change in the behavior of the job here, since
there is still a maximum of one in-flight I/O operation between
the source and the target. However, this patch already introduces
the AIO callbacks (which are unmodified in the next patch)
and some of the logic to count in-flight operations and only
complete the job when there is none.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: allow customizing the granularity

The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.

Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: allow customizing the granularity of the dirty bitmap

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: return count of dirty sectors, not chunks

Reviewed-by: Laszlo Ersek <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

mirror: perform COW if the cluster size is bigger than the granularity

When mirroring runs, the backing files for the target may not yet be
ready.  However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros.  Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying).  So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.

However, we want to lower the granularity for efficiency, so we need
a better solution.  The solution is to always copy a whole cluster the
first time it is touched.  The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: make round_to_clusters public

This is needed in the following patch.

Reviewed-by: Laszlo Ersek <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: implement dirty bitmap using HBitmap

This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.

Reviewed-by: Laszlo Ersek <[email protected]> (except block/mirror.c parts)
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

add hierarchical bitmap data type and test cases

HBitmaps provides an array of bits.  The bits are stored as usual in an
array of unsigned longs, but HBitmap is also optimized to provide fast
iteration over set bits; going from one bit to the next is O(logB n)
worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
that the number of levels is in fact fixed.

In order to do this, it stacks multiple bitmaps with progressively coarser
granularity; in all levels except the last, bit N is set iff the N-th
unsigned long is nonzero in the immediately next level.  When iteration
completes on the last level it can examine the 2nd-last level to quickly
skip entire words, and even do so recursively to skip blocks of 64 words or
powers thereof (32 on 32-bit machines).

Given an index in the bitmap, it can be split in group of bits like
this (for the 64-bit case):

     bits 0-57 => word in the last bitmap     | bits 58-63 => bit in the word
     bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
     bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word

So it is easy to move up simply by shifting the index right by
log2(BITS_PER_LONG) bits.  To move down, you shift the index left
similarly, and add the word index within the group.  Iteration uses
ffs (find first set bit) to find the next word to examine; this
operation can be done in constant time in most current architectures.

Setting or clearing a range of m bits on all levels, the work to perform
is O(m + m/W + m/W^2 + ...), which is O(m) like on a regular bitmap.

When iterating on a bitmap, each bit (on any level) is only visited
once.  Hence, The total cost of visiting a bitmap with m bits in it is
the number of bits that are set in all bitmaps.  Unless the bitmap is
extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
cost of advancing from one bit to the next is usually constant.

Reviewed-by: Laszlo Ersek <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

host-utils: add ffsl

We can provide fast versions based on the other functions defined
by host-utils.h. Some care is required on glibc, which provides
ffsl already.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

QAPI: Introduce memchar-read QMP command

Signed-off-by: Lei Li <[email protected]>
Signed-off-by: Luiz Capitulino <[email protected]>

QAPI: Introduce memchar-write QMP command

Signed-off-by: Lei Li <[email protected]>
Signed-off-by: Luiz Capitulino <[email protected]>

qemu-char: Add new char backend CirMemCharDriver

Signed-off-by: Lei Li <[email protected]>
Signed-off-by: Luiz Capitulino <[email protected]>

docs: document virtio-balloon stats

Signed-off-by: Luiz Capitulino <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

balloon: re-enable balloon stats

The statistics are now available through device properties via a
polling mechanism. First a client has to enable polling, then it
can query available stats.

Polling is enabled by setting an update interval (in seconds)
to a property named guest-stats-polling-interval, like this:

{ "execute": "qom-set",
  "arguments": { "path": "/machine/peripheral-anon/device[1]",
                 "property": "guest-stats-polling-interval", "value": 4 } }

Then the available stats can be retrieved by querying the
guest-stats property. The returned object is a dict containing
all available stats. Example:

{ "execute": "qom-get",
  "arguments": { "path": "/machine/peripheral-anon/device[1]",
  "property": "guest-stats" } }

{
    "return": {
        "stats": {
            "stat-swap-out": 0,
            "stat-free-memory": 844943360,
            "stat-minor-faults": 219028,
            "stat-major-faults": 235,
            "stat-total-memory": 1044406272,
            "stat-swap-in": 0
        },
        "last-update": 1358529861
    }
}

Please, check the next commit for full documentation.

Signed-off-by: Luiz Capitulino <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

balloon: drop old stats code & API

Next commit will re-enable balloon stats with a different interface, but
this old code conflicts with it. Let's drop it.

It's important to note that the QMP and HMP interfaces are also dropped
by this commit. That shouldn't be a problem though, because:

1. All QMP fields are optional
2. This feature has always been disabled

Signed-off-by: Luiz Capitulino <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

block: Monitor command commit neglects to report some errors

The non-live bdrv_commit() function may return one of the following
errors: -ENOTSUP, -EBUSY, -EACCES, -EIO. The only error that is
checked in the HMP handler is -EBUSY, so the monitor command 'commit'
silently fails for all error cases other than 'Device is in use'.

Report error using monitor_printf() and strerror(), and convert existing
qerror_report() calls in do_commit() to monitor_printf().

Signed-off-by: Jeff Cody <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Luiz Capitulino <[email protected]>