Git Repo - qemu.git/log

tcg: Adjust CODE_GEN_AVG_BLOCK_SIZE

At present, the "average" guestimate of TB size is way too small, leading
to many unused entries in the pre-allocated TB array. For a guest with 1GB
ram, we're currently allocating 256MB for the array.

Survey arm, alpha, aarch64, ppc, sparc, i686, x86_64 guests running on
x86_64 and ppc64 hosts and select a new average. The size of the array
drops to 81MB with no more flushing than before.

Reviewed-by: Aurelien Jarno <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Check for overflow via highwater mark

We currently pre-compute an worst case code size for any TB, which
works out to be 122kB. Since the average TB size is near 1kB, this
wastes quite a lot of storage.

Instead, check for overflow in between generating code for each opcode.
The overhead of the check isn't measurable and wastage is minimized.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Allocate a guard page after code_gen_buffer

This will catch any overflow of the buffer.

Add a native win32 alternative for alloc_code_gen_buffer;
remove the malloc alternative.

Signed-off-by: Richard Henderson <[email protected]>

tcg: Emit prologue to the beginning of code_gen_buffer

By putting the prologue at the end, we risk overwriting the
prologue should our estimate of maximum TB size. Given the
two different placements of the call to tcg_prologue_init,
move the high water mark computation into tcg_prologue_init.

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Remove tcg_gen_code_search_pc

It's no longer used, so tidy up everything reached by it.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Remove gen_intermediate_code_pc

It is no longer used, so tidy up everything reached by it.
This includes the gen_opc_* arrays, the search_pc parameter
and the inline gen_intermediate_code_internal functions.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Save insn data and use it in cpu_restore_state_from_tb

We can now restore state without retranslation.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Pass data argument to restore_state_to_opc

The gen_opc_* arrays are already redundant with the data stored in
the insn_start arguments. Transition restore_state_to_opc to use
data from the latter.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Add TCG_MAX_INSNS

Adjust all translators to respect it.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-*: Drop cpu_gen_code define

This symbol no longer exists.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Merge cpu_gen_code into tb_gen_code

As it's only caller, this tidies things a bit.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Add npc state to insn_start

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Remove gen_opc_jump_pc

Since jump_pc[1] is always npc + 4, we can infer after incrementing
that jump_pc[1] == pc + 4. Because of that, we can encode the branch
destination into a single word, and store that in npc.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Split out gen_branch_n

Unify three copies of this code from different
branch types. Fix the case when npc == DYNAMIC_PC,
i.e. a branch within a delay slot.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Tidy gen_branch_a interface

We always pass pc2 == dc->npc and r_cond == cpu_cond,
and always set is_br afterward. Infer all of that.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-cris: Mirror gen_opc_pc into insn_start

This perhaps isn't ideal in terms of (ab)using the "pc" field
to encode both pc and ppc + delay branch state, as one has to
be aware of this when examining opcode dumps.

But it preserves existing logic, which will be good for bisection,
and it certainly does save storage space.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sh4: Add flags state to insn_start

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-s390x: Add cc_op state to insn_start

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-mips: Add delayed branch state to insn_start

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-i386: Add cc_op state to insn_start

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-arm: Add condexec state to insn_start

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Allow extra data to be attached to insn_start

With an eye toward having this data replace the gen_opc_* arrays
that each target collects in order to enable restore_state_from_tb.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-*: Introduce and use cpu_breakpoint_test

Reduce the boilerplate required for each target. At the same time,
move the test for breakpoint after calling tcg_gen_insn_start.

Note that arm and aarch64 do not use cpu_breakpoint_test, but still
move the inline test down after tcg_gen_insn_start.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-*: Increment num_insns immediately after tcg_gen_insn_start

This does tidy the icount test common to all targets.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-*: Unconditionally emit tcg_gen_insn_start

While we're at it, emit the opcode adjacent to where we currently
record data for search_pc. This puts gen_io_start et al on the
"correct" side of the marker.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Rename debug_insn_start to insn_start

With an eye toward making it mandatory.

Reviewed-by: Aurelien Jarno <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Support iret instruction and related special registers

EX_CONTEXT_0_0 is used for jumping address, and EX_CONTEXT_0_1 is for
INTERRUPT_CRITICAL_SECTION, which should only be 0 or 1 in user mode, or
it will cause target SIGILL (and the patch doesn't support system mode).

Signed-off-by: Chen Gang <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Use TILEGX_EXCP_OPCODE_UNKNOWN and TILEGX_EXCP_OPCODE_UNIMPLEMENTED correctly

For some cases, they are for TILEGX_EXCP_OPCODE_UNKNOWN, not for
TILEGX_EXCP_OPCODE_UNIMPLEMENTED.

Also for some cases, they are for TILEGX_EXCP_OPCODE_UNIMPLEMENTED, not
for TILEGX_EXCP_OPCODE_UNKNOWN.

When analyzing issues, the correct printing information is necessary,
e.g. grep UIMP in gcc testsuite output log for finding qemu tilegx
umimplementation issues, grep UNKNOWN for finding unknown instructions.

Signed-off-by: Chen Gang <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Implement v2mults instruction

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1443956491 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Implement v?int_* instructions.

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1443956491 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Implement v2sh* instructions

It is just according to v1sh* instructions implementation.

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1443956491 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Handle nofault prefetch instructions

These are mapped onto some of the normal load instructions, when the
destination is the zero register. Other load insns do fault even
when targeting the zero register.

Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Fix a typo for mnemonic about "ld_add"

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1443562720 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Use TILEGX_EXCP_SIGNAL instead of TILEGX_EXCP_SEGV

Consolidate signal handling under a single exception.

Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Decode ill pseudo-instructions

Notice raise and bpt, decoding the constants embedded in the
nop addil instruction in the x0 slot.

[rth: Generalize TILEGX_EXCP_OPCODE_ILL to TILEGX_EXCP_SIGNAL.
Drop validation of signal values.]

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1443243635 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

linux-user/tilegx: Implement tilegx signal features

[rth: Remove the spreg[EX1] handling, as it's irrelevant to user-mode.]

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1443312618 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

linux-user/syscall_defs.h: Sync the latest si_code from Linux kernel

They content several new macro members, also contents TARGET_N*.

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1443240605 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Let x1 pipe process bpt instruction only

According to the related document, bpt can be only in x1 pipe.

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1443224574 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Implement complex multiply instructions

Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Implement table index instructions

Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Implement crc instructions

Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Implement v1multu instruction

Signed-off-by: Chen Gang <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <1442874414 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Implement v*add and v*sub instructions

[rth: Implement everything inline; handle v1addi and v2addi as well.]

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1442873918 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Implement v*shl, v*shru, and v*shrs instructions

v2sh* are implemented with helper functions; v4sh* are implmeneted
with inline code.

Signed-off-by: Chen Gang <[email protected]>
Message-Id: <1442872055 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-tilegx: Tidy simd_helper.c

Using the V1 macro when we want to replicate a byte across
the 8 elements of the word. Using deposit and extract for
manipulating specific elements.

Signed-off-by: Richard Henderson <[email protected]>

pc-dimm: Fail realization for invalid nodes in non-NUMA config

pc_dimm_realize() validates the NUMA node to which memory hotplug is
being performed only in case of NUMA configuration. Include a check to
fail for invalid nodes in case of non-NUMA configuration too.

Signed-off-by: Bharata B Rao <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20151006' into staging

s390: fixes

Some fixes all over the place:
- ccw bios and gcc 5.1 (avoid floating point ops)
- properly print vector registers
- sclp and sclp-event-facility no longer hang on object_unref(object_new(T))
- better name for io_subsystem_reset

One feature
- the gdb server now exposes several virtualization specific register

# gpg: Signature made Tue 06 Oct 2015 11:20:24 BST using RSA key ID B5A61C7C
# gpg: Good signature from "Christian Borntraeger (IBM) <[email protected]>"

* remotes/borntraeger/tags/s390x-20151006:
  s390x: rename io_subsystem_reset -> subsystem_reset
  s390x/info registers: print vector registers properly
  s390x: set missing parent for hotplug and quiesce events
  s390x/gdb: expose virtualization specific registers
  pc-bios/s390-ccw: avoid floating point operations

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging

X86 queue, 2015-10-05

# gpg: Signature made Mon 05 Oct 2015 17:04:38 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"

* remotes/ehabkost/tags/x86-pull-request:
  icc_bus: drop the unused files
  cpu/apic: drop icc bus/bridge
  x86: use new method to correct reset sequence
  apic: move APIC's MMIO region mapping into APIC
  Correctly re-init EFER state during INIT IPI
  target-i386: add ABM to Haswell* and Broadwell* CPU models
  target-i386: get/put MSR_TSC_AUX across reset and migration
  target-i386: Make check_hw_breakpoints static
  target-i386: Move breakpoint related functions to new file
  target-i386: Convert kvm_default_*features to property/value pairs
  vl: Add another sanity check to smp_parse() function
  cpu: Introduce X86CPUTopoInfo structure for argument simplification

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging

# gpg: Signature made Mon 05 Oct 2015 17:01:11 BST using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <[email protected]>"

* remotes/jnsnow/tags/ide-pull-request:
  qtest/ide-test: ppc64be correction for ATAPI tests
  MAINTAINERS: Small IDE/FDC touchup
  qtest/ahci: fix redundant assertion

Signed-off-by: Peter Maydell <[email protected]>

tests: vhost-user: disable unless CONFIG_VHOST_NET

vhost-user depends on vhost-net. We should probably fix that.
For now, let's disable the test otherwise.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

vfio: Allow hotplug of containers onto existing guest IOMMU mappings

At present the memory listener used by vfio to keep host IOMMU mappings
in sync with the guest memory image assumes that if a guest IOMMU
appears, then it has no existing mappings.

This may not be true if a VFIO device is hotplugged onto a guest bus
which didn't previously include a VFIO device, and which has existing
guest IOMMU mappings.

Therefore, use the memory_region_register_iommu_notifier_replay()
function in order to fix this case, replaying existing guest IOMMU
mappings, bringing the host IOMMU into sync with the guest IOMMU.

Signed-off-by: David Gibson <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

memory: Allow replay of IOMMU mapping notifications

When we have guest visible IOMMUs, we allow notifiers to be registered
which will be informed of all changes to IOMMU mappings.  This is used by
vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.

However, unlike with a memory region listener, an iommu notifier won't be
told about any mappings which already exist in the (guest) IOMMU at the
time it is registered.  This can cause problems if hotplugging a VFIO
device onto a guest bus which had existing guest IOMMU mappings, but didn't
previously have an VFIO devices (and hence no host IOMMU mappings).

This adds a memory_region_iommu_replay() function to handle this case.  It
replays any existing mappings in an IOMMU memory region to a specified
notifier.  Because the IOMMU memory region doesn't internally remember the
granularity of the guest IOMMU it has a small hack where the caller must
specify a granularity at which to replay mappings.

If there are finer mappings in the guest IOMMU these will be reported in
the iotlb structures passed to the notifier which it must handle (probably
causing it to flag an error).  This isn't new - the VFIO iommu notifier
must already handle notifications about guest IOMMU mappings too short
for it to represent in the host IOMMU.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

vfio: Record host IOMMU's available IO page sizes

Depending on the host IOMMU type we determine and record the available page
sizes for IOMMU translation. We'll need this for other validation in
future patches.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

vfio: Check guest IOVA ranges against host IOMMU capabilities

The current vfio core code assumes that the host IOMMU is capable of
mapping any IOVA the guest wants to use to where we need.  However, real
IOMMUs generally only support translating a certain range of IOVAs (the
"DMA window") not a full 64-bit address space.

The common x86 IOMMUs support a wide enough range that guests are very
unlikely to go beyond it in practice, however the IOMMU used on IBM Power
machines - in the default configuration - supports only a much more limited
IOVA range, usually 0..2GiB.

If the guest attempts to set up an IOVA range that the host IOMMU can't
map, qemu won't report an error until it actually attempts to map a bad
IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
visible IOMMU) then this will show up very quickly.  If there is a guest
visible IOMMU, however, the problem might not show up until much later when
the guest actually attempt to DMA with an IOVA the host can't handle.

This patch adds a test so that we will detect earlier if the guest is
attempting to use IOVA ranges that the host IOMMU won't be able to deal
with.

For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is
incorrect, but no worse than what we have already.  We can't do better for
now because the Type1 kernel interface doesn't tell us what IOVA range the
IOMMU actually supports.

For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
IOVA range and validate guest IOVA ranges against it, and this patch does
so.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

vfio: Generalize vfio_listener_region_add failure path

If a DMA mapping operation fails in vfio_listener_region_add() it
checks to see if we've already completed initial setup of the
container. If so it reports an error so the setup code can fail
gracefully, otherwise throws a hw_error().

There are other potential failure cases in vfio_listener_region_add()
which could benefit from the same logic, so move it to its own
fail: block. Later patches can use this to extend other failure cases
to fail as gracefully as possible under the circumstances.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

vfio: Remove unneeded union from VFIOContainer

Currently the VFIOContainer iommu_data field contains a union with
different information for different host iommu types.  However:
   * It only actually contains information for the x86-like "Type1" iommu
   * Because we have a common listener the Type1 fields are actually used
on all IOMMU types, including the SPAPR TCE type as well

In fact we now have a general structure for the listener which is unlikely
to ever need per-iommu-type information, so this patch removes the union.

In a similar way we can unify the setup of the vfio memory listener in
vfio_connect_container() that is currently split across a switch on iommu
type, but is effectively the same in both cases.

The iommu_data.release pointer was only needed as a cleanup function
which would handle potentially different data in the union.  With the
union gone, it too can be removed.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

hw/vfio/platform: do not set resamplefd for edge-sensitive IRQS

In irqfd mode, current code attempts to set a resamplefd whatever
the type of the IRQ. For an edge-sensitive IRQ this attempt fails
and as a consequence, the whole irqfd setup fails and we fall back
to the slow mode. This patch bypasses the resamplefd setting for
non level-sentive IRQs.

Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

hw/vfio/platform: change interrupt/unmask fields into pointer

unmask EventNotifier might not be initialized in case of edge
sensitive irq. Using EventNotifier pointers make life simpler to
handle the edge-sensitive irqfd setup.

Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

hw/vfio/platform: irqfd setup sequence update

With current implementation, eventfd VFIO signaling is first set up and
then irqfd is setup, if supported and allowed.

This start sequence causes several issues with IRQ forwarding setup
which, if supported, is transparently attempted on irqfd setup:
IRQ forwarding setup is likely to fail if the IRQ is detected as under
injection into the guest (active at irqchip level or VFIO masked).

This currently always happens because the current sequence explicitly
VFIO-masks the IRQ before setting irqfd.

Even if that masking were removed, we couldn't prevent the case where
the IRQ is under injection into the guest.

So the simpler solution is to remove this 2-step startup and directly
attempt irqfd setup. This is what this patch does.

Also in case the eventfd setup fails, there is no reason to go farther:
let's abort.

Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

qtest/ide-test: ppc64be correction for ATAPI tests

the 16bit ide data register is LE by definition.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Message-id: 1443461938 [email protected]

MAINTAINERS: Small IDE/FDC touchup

libqos/ahci and tests/fdc-test are under my purview also,
include them in the appropriate stanzas.

Signed-off-by: John Snow <[email protected]>
Message-id: 1443117055 [email protected]

qtest/ahci: fix redundant assertion

Fixes https://bugs.launchpad.net/qemu/+bug/1497711

(!ncq || (ncq && lba48)) is the same as
(!ncq || lba48).

The intention is simply: "If a command is NCQ,
it must also be LBA48."

Signed-off-by: John Snow <[email protected]>
Message-id: 1442868929 [email protected]

icc_bus: drop the unused files

ICC bus impl has been droped, so all icc related files are not useful
any more; delete them.

Signed-off-by: Zhu Guihua <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

cpu/apic: drop icc bus/bridge

After CPU hotplug has been converted to BUS-less hot-plug infrastructure,
the only function ICC bus performs is to propagate reset to LAPICs. However
LAPIC could be reset by registering its reset handler after all device are
initialized.
Do so and drop ~30LOC of not needed anymore ICCBus related code.

Signed-off-by: Chen Fan <[email protected]>
Signed-off-by: Zhu Guihua <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

x86: use new method to correct reset sequence

During reset some devices (such as hpet, rtc) might send IRQ to APIC
which changes APIC's state from default one it's supposed to have
at machine startup time.
Fix this by resetting APIC after devices have been reset to cancel
any changes that qemu_devices_reset() might have done to its state.

Signed-off-by: Zhu Guihua <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

apic: move APIC's MMIO region mapping into APIC

When ICC bus/bridge is removed, APIC MMIO will be left
unmapped since it was mapped into system's address space
indirectly by ICC bridge.
Fix it by moving mapping into APIC code, so it would be
possible to remove ICC bus/bridge code later.

Signed-off-by: Chen Fan <[email protected]>
Signed-off-by: Zhu Guihua <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

Correctly re-init EFER state during INIT IPI

When doing a re-initialization of a CPU core, the default state is to _not_
have 64-bit long mode enabled. This means the LME (long mode enable) and LMA
(long mode active) bits in the EFER model-specific register should be cleared.

However, the EFER state is part of the CPU environment which is
preserved by do_cpu_init(), so if EFER.LME and EFER.LMA were set at the
time an INIT IPI was received, they will remain set after the init completes.

This is contrary to what the Intel architecture manual describes and what
happens on real hardware, and it leaves the CPU in a weird state that the
guest can't clear.

To fix this, the 'efer' member of the CPUX86State structure has been moved
to an area outside the region preserved by do_cpu_init(), so that it can
be properly re-initialized by x86_cpu_reset().

Signed-off-by: Bill Paul <[email protected]>
CC: Paolo Bonzini <[email protected]>
CC: Richard Henderson <[email protected]>
CC: Eduardo Habkost <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

target-i386: add ABM to Haswell* and Broadwell* CPU models

ABM is only implemented as a single instruction set by AMD; all AMD
processors support both instructions or neither. Intel considers POPCNT
as part of SSE4.2, and LZCNT as part of BMI1, but Intel also uses AMD's
ABM flag to indicate support for both POPCNT and LZCNT. It has to be
added to Haswell and Broadwell because Haswell, by adding LZCNT, has
completed the ABM.

Tested with "qemu-kvm -cpu Haswell-noTSX,enforce" (and also with older
machine types) on an Haswell-EP machine.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

target-i386: get/put MSR_TSC_AUX across reset and migration

There's one report of migration breaking due to missing MSR_TSC_AUX
save/restore. Fix this by adding a new subsection that saves the state
of this MSR.

https://bugzilla.redhat.com/show_bug.cgi?id=1261797

Reported-by: Xiaoqing Wei <[email protected]>
Signed-off-by: Amit Shah <[email protected]>
CC: Paolo Bonzini <[email protected]>
CC: Juan Quintela <[email protected]>
CC: "Dr. David Alan Gilbert" <[email protected]>
CC: Marcelo Tosatti <[email protected]>
CC: Richard Henderson <[email protected]>
CC: Eduardo Habkost <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

target-i386: Make check_hw_breakpoints static

The function is now only used from within a single file.

Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

target-i386: Move breakpoint related functions to new file

Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

target-i386: Convert kvm_default_*features to property/value pairs

Convert the kvm_default_features and kvm_default_unset_features arrays
into a simple list of property/value pairs that will be applied to
X86CPU objects when using KVM.

Acked-by: Paolo Bonzini <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

vl: Add another sanity check to smp_parse() function

The code in smp_parse already checks the topology information for
sockets * cores * threads < cpus and bails out with an error in
that case. However, it is still possible to supply a bad configuration
the other way round, e.g. with:

qemu-system-xxx -smp 4,sockets=1,cores=4,threads=2

QEMU then still starts the guest, with topology configuration that
is rather incomprehensible and likely not what the user wanted.
So let's add another check to refuse such wrong configurations.

Signed-off-by: Thomas Huth <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
Acked-by: Bastian Koppelmann <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

cpu: Introduce X86CPUTopoInfo structure for argument simplification

In order to simplify arguments of function, introduce a new struct
named X86CPUTopoInfo.

Signed-off-by: Chen Fan <[email protected]>
Signed-off-by: Zhu Guihua <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

virtio,pc features, fixes

New features:
    guest RAM buffer overrun mitigation
    RAM physical address gaps for memory hotplug
    (except refactoring which got some review comments)

Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Fri 02 Oct 2015 15:04:56 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg:                 aka "Michael S. Tsirkin <[email protected]>"

* remotes/mst/tags/for_upstream:
  vhost-user-test: fix predictable filename on tmpfs
  vhost-user-test: use tmpfs by default
  pc: memhp: force gaps between DIMM's GPA
  memhp: extend address auto assignment to support gaps
  vhost-user: unit test for new messages
  vhost-user-test: do not reinvent glib-compat.h
  virtio: Notice when the system doesn't support MSIx at all
  pc: Add a comment explaining why pc_compat_2_4() doesn't exist
  exec: allocate PROT_NONE pages on top of RAM
  oslib: allocate PROT_NONE pages on top of RAM
  oslib: rework anonimous RAM allocation
  virtio-net: correctly drop truncated packets
  virtio: introduce virtqueue_discard()
  virtio: introduce virtqueue_unmap_sg()

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20151002' into staging

First set of Linux-user que patches for 2.5

# gpg: Signature made Fri 02 Oct 2015 13:38:00 BST using RSA key ID DE3C9BC0
# gpg: Good signature from "Riku Voipio <[email protected]>"
# gpg:                 aka "Riku Voipio <[email protected]>"

* remotes/riku/tags/pull-linux-user-20151002:
  linux-user: assert that target_mprotect cannot fail
  linux-user/signal.c: Use setup_rt_frame() instead of setup_frame() for target openrisc
  linux-user/syscall.c: Add EAGAIN to host_to_target_errno_table for
  linux-user: add name_to_handle_at/open_by_handle_at
  linux-user: Return target error number in do_fork()
  linux-user: fix cmsg conversion in case of multiple headers
  linux-user: remove MAX_ARG_PAGES limit
  linux-user: remove unused image_info members
  linux-user: Treat --foo options the same as -foo
  linux-user: use EXIT_SUCCESS and EXIT_FAILURE
  linux-user: Add proper error messages for bad options
  linux-user: Add -help
  linux-user: Exit 0 when -h is used

Signed-off-by: Peter Maydell <[email protected]>

vhost-user-test: fix predictable filename on tmpfs

vhost-user-test uses getpid to create a unique filename. This name is
predictable, and a security problem. Instead, use a tmp directory
created by mkdtemp, which is a suggested best practice.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

vhost-user-test: use tmpfs by default

Most people don't run make check by default, so they skip vhost-user
unit tests. Solve this by using tmpfs instead, unless hugetlbfs is
specified (using an environment variable).

Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

pc: memhp: force gaps between DIMM's GPA

mapping DIMMs non contiguously allows to workaround
virtio bug reported earlier:
http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00522.html
in this case guest kernel doesn't allocate buffers
that can cross DIMM boundary keeping each buffer
local to a DIMM.

Suggested-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Eduardo Habkost <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

memhp: extend address auto assignment to support gaps

setting gap to TRUE will make sparse DIMM
address auto allocation, leaving gaps between
a new DIMM address and preceeding existing DIMM.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-user: unit test for new messages

Data is empty for now, but do make sure master
sets the new feature bit flag.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>

vhost-user-test: do not reinvent glib-compat.h

glib-compat.h has the gunk to support both old-style and new-style
gthread functions. Use it instead of reinventing it.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Tested-by: Marc-André Lureau <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Fri 02 Oct 2015 12:49:13 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"

* remotes/kevin/tags/for-upstream:
  block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue.
  block: disable I/O limits at the beginning of bdrv_close()
  iotests: Fix test 128 for password-less sudo
  tests: Fix test 049 fallout from improved HMP error messages
  raw-win32: Fix write request error handling

Signed-off-by: Peter Maydell <[email protected]>

block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue.

  https://bugzilla.redhat.com/show_bug.cgi?id=1265196

The following command fails on an NFS mountpoint:

  $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144
  Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off cluster_size=65536 preallocation='falloc' lazy_refcounts=off
  qemu-img: disk.img: Could not preallocate data for the new file: Bad file descriptor

The reason turns out to be because NFS doesn't support the
posix_fallocate call.  glibc emulates it instead.  However glibc's
emulation involves using the pread(2) syscall.  The pread syscall
fails with EBADF if the file descriptor is opened without the read
open-flag (ie. open (..., O_WRONLY)).

I contacted glibc upstream about this, and their response is here:

  https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9

There are two possible fixes: Use Linux fallocate directly, or (this
fix) work around the problem in qemu by opening the file with O_RDWR
instead of O_WRONLY.

Signed-off-by: Richard W.M. Jones <[email protected]>
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1265196
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: disable I/O limits at the beginning of bdrv_close()

Disabling I/O limits from a BDS also drains all pending throttled
requests, so it should be done at the beginning of bdrv_close() with
the rest of the bdrv_drain() calls before the BlockDriver is closed.

Signed-off-by: Alberto Garcia <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

iotests: Fix test 128 for password-less sudo

As of 934659c460d46c948cf348822fda1d38556ed9a4, $QEMU_IO is generally no
longer a program name, and therefore "sudo -n $QEMU_IO" will no longer
work.

Fix this by copying the qemu-io invocation function from common.config,
making it use $sudo for invoking $QEMU_IO_PROG, and then use that
function instead of $QEMU_IO.

Reported-by: Fam Zheng <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

tests: Fix test 049 fallout from improved HMP error messages

Commit 50b7b000 improved HMP error messages, but forgot to update
qemu-iotests to match.

Reported-by: Kevin Wolf <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Alberto Garcia <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

raw-win32: Fix write request error handling

aio_worker() wrote the return code to the wrong variable.

Signed-off-by: Kevin Wolf <[email protected]>
Tested-by: Guangmu Zhu <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

s390x: rename io_subsystem_reset -> subsystem_reset

According to the Pop:
"Subsystem reset operates only on those elements in the configuration
which are not CPUs".

As this is what we actually do, let's simply rename the function.

Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Jens Freimann <[email protected]>
Message-Id: <1443689387 [email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

s390x/info registers: print vector registers properly

We want

F12=0000000000000000 F13=0000000000000000 F14=0000000000000000 F15=0000000000000000
V00=00000000000000000000000000000000 V01=00000000000000000000000000000000

instead of
F12=0000000000000000 F13=0000000000000000 F14=0000000000000000 F15=0000000000000000
V00=00000000000000000000000000000000
V01=00000000000000000000000000000000 V02=00000000000000000000000000000000

Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Jens Freimann <[email protected]>
Message-Id: <1443689387 [email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

s390x: set missing parent for hotplug and quiesce events

Existing code missed to set a parent for the quiesce and hotplug event.
While this didn't matter in practise, new introspection APIs basically now
do an object_unref(object_new(T)), which loops forever.

When trying to remove the event facility bus, the code tries to
unparent all childs on the bus, so they are properly deleted and therefore removed.
As object_unparent() on these child devices doesn't work, as there is no parent,
we loop forever.

Let's fix this by adding the event facility as a parent. Also switch from
object_initialize to object_new, so the only valid reference is in fact the
parent property. This makes it more obvious when the device (state) is actually
gone (and how the reference counting works).

Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Jens Freimann <[email protected]>
Message-Id: <1443689387 [email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

s390x/gdb: expose virtualization specific registers

Let's expose some virtual/fake registers as virtualization specific
registers.

Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Jens Freimann <[email protected]>
Message-Id: <1443689387 [email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

pc-bios/s390-ccw: avoid floating point operations

Some gcc versions (e.g. Fedora 22 gcc 5.1.1) seem to use floating
point registers for spilling and filling of general purpose registers.
As the BIOS does not activate the AFP register setting of CR0 this can
cause data exception program checks.
Disallow floating point in the BIOS as a simple solution.

Signed-off-by: Christian Borntraeger <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Jens Freimann <[email protected]>
Message-Id: <1443689387 [email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Thu 01 Oct 2015 20:02:33 BST using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <[email protected]>"
# gpg:                 aka "Jeffrey Cody <[email protected]>"
# gpg:                 aka "Jeffrey Cody <[email protected]>"

* remotes/cody/tags/block-pull-request:
  block: mirror - fix full sync mode when target does not support zero init

Signed-off-by: Peter Maydell <[email protected]>

target-microblaze: Set the PC in reset instead of realize

Set the Microblaze CPU PC in the reset instead of setting it
in the realize. This is required as the PC is zeroed in the
reset function and causes problems in some situations.

Signed-off-by: Alistair Francis <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

disas/cris: Fix typo in comment

Signed-off-by: Stefan Weil <[email protected]>
Reviewed-by: Edgar E. Iglesias <[email protected]>
Signed-off-by: Edgar E. Iglesias <[email protected]>

block: mirror - fix full sync mode when target does not support zero init

During mirror, if the target device does not support zero init, a
mirror may result in a corrupted image for sync="full" mode.

This is due to how the initial dirty bitmap is set up prior to copying
data - we did not mark sectors as dirty that are unallocated.  This
means those unallocated sectors are skipped over on the target, and for
a device without zero init, invalid data may reside in those holes.

If both of the following conditions are true, then we will explicitly
mark all sectors as dirty:

    1.) sync = "full"
    2.) bdrv_has_zero_init(target) == false

If the target does support zero init, but a target image is passed in
with data already present (i.e. an "existing" image), it is assumed the
data present in the existing image is valid data for those sectors.

Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 91ed4bc5bda7e2b09eb508b07c83f4071fe0b3c9.1443705220 [email protected]
Signed-off-by: Jeff Cody <[email protected]>

virtio: Notice when the system doesn't support MSIx at all

And do not issue an error_report in that case.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: Add a comment explaining why pc_compat_2_4() doesn't exist

pc_compat_2_4() doesn't exist, and we shouldn't create one. Add a
comment explaining why the function doesn't exist and why pc_compat_*()
functions are deprecated.

Signed-off-by: Eduardo Habkost <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

exec: allocate PROT_NONE pages on top of RAM

This inserts a read and write protected page between RAM and QEMU
memory, for file-backend RAM.
This makes it harder to exploit QEMU bugs resulting from buffer
overflows in devices using variants of cpu_physical_memory_map,
dma_memory_map etc.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>