Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Implement gic_update_virt() function
Add the gic_update_virt() function to update the vCPU interface states
and raise vIRQ and vFIQ as needed. This commit renames gic_update() to
gic_update_internal() and generalizes it to handle both cases, with a
`virt' parameter to track whether we are updating the CPU or vCPU
interfaces.
The main difference between CPU and vCPU is the way we select the best
IRQ. This part has been split into the gic_get_best_(v)irq functions.
For the virt case, the LRs are iterated to find the best candidate.
Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Wire the vCPU interface
Add the read/write functions to handle accesses to the vCPU interface.
Those accesses are forwarded to the real CPU interface, with the CPU id
being converted to the corresponding vCPU id (vCPU id = CPU id +
GIC_NCPU).
Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Implement virtualization extensions in gic_cpu_(read|write)
Implement virtualization extensions in the gic_cpu_read() and
gic_cpu_write() functions. Those are the last bits missing to fully
support virtualization extensions in the CPU interface path.
Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Implement virtualization extensions in gic_(deactivate|complete_irq)
Implement virtualization extensions in the gic_deactivate_irq() and
gic_complete_irq() functions.
When the guest writes an invalid vIRQ to V_EOIR or V_DIR, since the
GICv2 specification is not entirely clear here, we adopt the behaviour
observed on real hardware:
* When V_CTRL.EOIMode is false (EOI split is disabled):
- In case of an invalid vIRQ write to V_EOIR:
-> If some bits are set in H_APR, an invalid vIRQ write to V_EOIR
triggers a priority drop, and increments V_HCR.EOICount.
-> If V_APR is already cleared, nothing happen
- An invalid vIRQ write to V_DIR is ignored.
* When V_CTRL.EOIMode is true:
- In case of an invalid vIRQ write to V_EOIR:
-> If some bits are set in H_APR, an invalid vIRQ write to V_EOIR
triggers a priority drop.
-> If V_APR is already cleared, nothing happen
- An invalid vIRQ write to V_DIR increments V_HCR.EOICount.
Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Implement virtualization extensions in gic_acknowledge_irq
Implement virtualization extensions in the gic_acknowledge_irq()
function. This function changes the state of the highest priority IRQ
from pending to active.
When the current CPU is a vCPU, modifying the state of an IRQ modifies
the corresponding LR entry. However if we clear the pending flag before
setting the active one, we lose track of the LR entry as it becomes
invalid. The next call to gic_get_lr_entry() will fail.
To overcome this issue, we call gic_activate_irq() before
gic_clear_pending(). This does not change the general behaviour of
gic_acknowledge_irq.
We also move the SGI case in gic_clear_pending_sgi() to enhance
code readability as the virtualization extensions support adds a if-else
level.
Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Implement virtualization extensions in gic_(activate_irq|drop_prio)
Implement virtualization extensions in gic_activate_irq() and
gic_drop_prio() and in gic_get_prio_from_apr_bits() called by
gic_drop_prio().
When the current CPU is a vCPU:
- Use GIC_VIRT_MIN_BPR and GIC_VIRT_NR_APRS instead of their non-virt
counterparts,
- the vCPU APR is stored in the virtual interface, in h_apr.
Add some helper functions to gic_internal.h to get or change the state
of an IRQ. When the current CPU is not a vCPU, the call is forwarded to
the GIC distributor. Otherwise, it acts on the list register matching
the IRQ in the current CPU virtual interface.
gic_clear_active can have a side effect on the distributor, even in the
vCPU case, when the correponding LR has the HW field set.
Use those functions in the CPU interface code path to prepare for the
vCPU interface implementation.
Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Refactor secure/ns access check in the CPU interface
An access to the CPU interface is non-secure if the current GIC instance
implements the security extensions, and the memory access is actually
non-secure. Until then, it was checked with tests such as
if (s->security_extn && !attrs.secure) { ... }
in various places of the CPU interface code.
With the implementation of the virtualization extensions, those tests
must be updated to take into account whether we are in a vCPU interface
or not. This is because the exposed vCPU interface does not implement
security extensions.
This commits replaces all those tests with a call to the
gic_cpu_ns_access() function to check if the current access to the CPU
interface is non-secure. This function takes into account whether the
current CPU is a vCPU or not.
Note that this function is used only in the (v)CPU interface code path.
The distributor code path is left unchanged, as the distributor is not
exposed to vCPUs at all.
Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Add virtualization extensions helper macros and functions
Add some helper macros and functions related to the virtualization
extensions to gic_internal.h.
The GICH_LR_* macros help extracting specific fields of a list register
value. The only tricky one is the priority field as only the MSB are
stored. The value must be shifted accordingly to obtain the correct
priority value.
gic_is_vcpu() and gic_get_vcpu_real_id() help with (v)CPU id manipulation
to abstract the fact that vCPU id are in the range
[ GIC_NCPU; (GIC_NCPU + num_cpu) [.
gic_lr_* and gic_virq_is_valid() help with the list registers.
gic_get_lr_entry() returns the LR entry for a given (vCPU, irq) pair. It
is meant to be used in contexts where we know for sure that the entry
exists, so we assert that entry is actually found, and the caller can
avoid the NULL check on the returned pointer.
Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Add the virtualization extensions to the GIC state
Add the necessary parts of the virtualization extensions state to the
GIC state. We choose to increase the size of the CPU interfaces state to
add space for the vCPU interfaces (the GIC_NCPU_VCPU macro). This way,
we'll be able to reuse most of the CPU interface code for the vCPUs.
The only exception is the APR value, which is stored in h_apr in the
virtual interface state for vCPUs. This is due to some complications
with the GIC VMState, for which we don't want to break backward
compatibility. APRs being stored in 2D arrays, increasing the second
dimension would lead to some ugly VMState description. To avoid
that, we keep it in h_apr for vCPUs.
The vCPUs are numbered from GIC_NCPU to (GIC_NCPU * 2) - 1. The
`gic_is_vcpu` function help to determine if a given CPU id correspond to
a physical CPU or a virtual one.
For the in-kernel KVM VGIC, since the exposed VGIC does not implement
the virtualization extensions, we report an error if the corresponding
property is set to true.
Luc Michel [Tue, 14 Aug 2018 16:17:20 +0000 (17:17 +0100)]
intc/arm_gic: Remove some dead code and put some functions static
Some functions are now only used in arm_gic.c, put them static. Some of
them where only used by the NVIC implementation and are not used
anymore, so remove them.
Luc Michel [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
intc/arm_gic: Implement GICD_ISACTIVERn and GICD_ICACTIVERn registers
Implement GICD_ISACTIVERn and GICD_ICACTIVERn registers in the GICv2.
Those registers allow to set or clear the active state of an IRQ in the
distributor.
Luc Michel [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
intc/arm_gic: Refactor operations on the distributor
In preparation for the virtualization extensions implementation,
refactor the name of the functions and macros that act on the GIC
distributor to make that fact explicit. It will be useful to
differentiate them from the ones that will act on the virtual
interfaces.
Peter Maydell [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
accel/tcg: Check whether TLB entry is RAM consistently with how we set it up
We set up TLB entries in tlb_set_page_with_attrs(), where we have
some logic for determining whether the TLB entry is considered
to be RAM-backed, and thus has a valid addend field. When we
look at the TLB entry in get_page_addr_code(), we use different
logic for determining whether to treat the page as RAM-backed
and use the addend field. This is confusing, and in fact buggy,
because the code in tlb_set_page_with_attrs() correctly decides
that rom_device memory regions not in romd mode are not RAM-backed,
but the code in get_page_addr_code() thinks they are RAM-backed.
This typically results in "Bad ram pointer" assertion if the
guest tries to execute from such a memory region.
Fix this by making get_page_addr_code() just look at the
TLB_MMIO bit in the code_address field of the TLB, which
tlb_set_page_with_attrs() sets if and only if the addend
field is not valid for code execution.
Peter Maydell [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
target/arm: Allow execution from small regions
Now that we have full support for small regions, including execution,
we can remove the workarounds where we marked all small regions as
non-executable for the M-profile MPU and SAU.
Peter Maydell [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
accel/tcg: Return -1 for execution from MMIO regions in get_page_addr_code()
Now that all the callers can handle get_page_addr_code() returning -1,
remove all the code which tries to handle execution from MMIO regions
or small-MMU-region RAM areas. This will mean that we can correctly
execute from these areas, rather than ending up either aborting QEMU
or delivering an incorrect guest exception.
Peter Maydell [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
accel/tcg: tb_gen_code(): Create single-insn TB for execution from non-RAM
If get_page_addr_code() returns -1, this indicates that there is no RAM
page we can read a full TB from. Instead we must create a TB which
contains a single instruction and which we do not cache, so it is
executed only once.
Since this means we can now have TBs which are not in any page list,
we also need to make tb_phys_invalidate() handle them (by not trying
to remove them from a nonexistent page list).
Peter Maydell [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
accel/tcg: Handle get_page_addr_code() returning -1 in tb_check_watchpoint()
When we support execution from non-RAM MMIO regions, get_page_addr_code()
will return -1 to indicate that there is no RAM at the requested address.
Handle this in tb_check_watchpoint() -- if the exception happened for a
PC which doesn't correspond to RAM then there is no need to invalidate
any TBs, because the one-instruction TB will not have been cached.
Peter Maydell [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
accel/tcg: Handle get_page_addr_code() returning -1 in hashtable lookups
When we support execution from non-RAM MMIO regions, get_page_addr_code()
will return -1 to indicate that there is no RAM at the requested address.
Handle this in the cpu-exec TB hashtable lookup code, treating it as
"no match found".
Note that the call to get_page_addr_code() in tb_lookup_cmp() needs
no changes -- a return of -1 will already correctly result in the
function returning false.
Peter Maydell [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
accel/tcg: Pass read access type through to io_readx()
The io_readx() function needs to know whether the load it is
doing is an MMU_DATA_LOAD or an MMU_INST_FETCH, so that it
can pass the right value to the cpu_transaction_failed()
function. Plumb this information through from the softmmu
code.
This is currently not often going to give the wrong answer,
because usually instruction fetches go via get_page_addr_code().
However once we switch over to handling execution from non-RAM by
creating single-insn TBs, the path for an insn fetch to generate
a bus error will be through cpu_ld*_code() and io_readx(),
so without this change we will generate a d-side fault when we
should generate an i-side fault.
We also have to pass the access type via a CPU struct global
down to unassigned_mem_read(), for the benefit of the targets
which still use the cpu_unassigned_access() hook (m68k, mips,
sparc, xtensa).
Julia Suvorova [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
nvic: Change NVIC to support ARMv6-M
The differences from ARMv7-M NVIC are:
* ARMv6-M only supports up to 32 external interrupts
(configurable feature already). The ICTR is reserved.
* Active Bit Register is reserved.
* ARMv6-M supports 4 priority levels against 256 in ARMv7-M.
Julia Suvorova [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
arm: Add ARMv6-M programmer's model support
Forbid stack alignment change. (CCR)
Reserve FAULTMASK, BASEPRI registers.
Report any fault as a HardFault. Disable MemManage, BusFault and
UsageFault, so they always escalated to HardFault. (SHCSR)
Julia Suvorova [Tue, 14 Aug 2018 16:17:19 +0000 (17:17 +0100)]
nvic: Handle ARMv6-M SCS reserved registers
Handle SCS reserved registers listed in ARMv6-M ARM D3.6.1.
All reserved registers are RAZ/WI. ARM_FEATURE_M_MAIN is used for the
checks, because these registers are reserved in ARMv8-M Baseline too.
virtio-gpu: fix crashes upon warm reboot with vga mode
With vga=775 on the Linux command line a first boot of the VM running
Linux works fine. After a warm reboot it crashes during Linux boot.
Before that, valgrind points out bad memory write to console
surface. The VGA code is not aware that virtio-gpu got a message
surface scanout when the display is disabled. Let's reset VGA graphic
mode when it is the case, so that a new display surface is created
when doing further VGA operations.
Peter Maydell [Tue, 7 Aug 2018 11:45:01 +0000 (12:45 +0100)]
slirp: Correct size check in m_inc()
The data in an mbuf buffer is not necessarily at the start of the
allocated buffer. (For instance m_adj() allows data to be trimmed
from the start by just advancing the pointer and reducing the length.)
This means that the allocated buffer size (m->m_size) and the
amount of space from the m_data pointer to the end of the
buffer (M_ROOM(m)) are not necessarily the same.
Commit 864036e251f54c9 tried to change the m_inc() function from
taking the new allocated-buffer-size to taking the new room-size,
but forgot to change the initial "do we already have enough space"
check. This meant that if we were trying to extend a buffer which
had a leading gap between the buffer start and the data, we might
incorrectly decide it didn't need to be extended, and then
overrun the end of the buffer, causing memory corruption and
an eventual crash.
Change the "already big enough?" condition from checking the
argument against m->m_size to checking against M_ROOM().
This only makes a difference for the callsite in m_cat();
the other three callsites all start with a freshly allocated
mbuf from m_get(), which will have m->m_size == M_ROOM(m).
Thomas Huth [Thu, 19 Jul 2018 13:02:00 +0000 (15:02 +0200)]
target/xtensa/cpu: Set owner of memory region in xtensa_cpu_initfn
The instance_init function of the xtensa CPUs creates a memory region,
but does not set an owner, so the memory region is not destroyed
correctly when the CPU object is removed. This can happen when
introspecting the CPU devices, so introspecting the CPU device will
leave a dangling memory region object in the QOM tree. Make sure to
set the right owner here to fix this issue.
Peter Maydell [Mon, 6 Aug 2018 12:34:45 +0000 (13:34 +0100)]
hw/intc/arm_gicv3_common: Move gicd shift bug handling to gicv3_post_load
The code currently in gicv3_gicd_no_migration_shift_bug_post_load()
that handles migration from older QEMU versions with a particular
bug is misplaced. We need to run this after migration in all cases,
not just the cases where the "arm_gicv3/gicd_no_migration_shift_bug"
subsection is present, so it must go in a post_load hook for the
top level VMSD, not for the subsection. Move it.
Peter Maydell [Mon, 6 Aug 2018 12:34:44 +0000 (13:34 +0100)]
hw/intc/arm_gicv3_common: Move post_load hooks to top-level VMSD
Contrary to the the impression given in docs/devel/migration.rst,
the migration code does not run the pre_load hook for a
subsection unless the subsection appears on the wire, and so
this is not a place where you can set the default value for
state for the "subsection not present" case. Instead this needs
to be done in a pre_load hook for whatever is the parent VMSD
of the subsection.
We got this wrong in two of the subsection definitions in
the GICv3 migration structs; fix this.
Peter Maydell [Mon, 6 Aug 2018 12:34:43 +0000 (13:34 +0100)]
target/arm: Add dummy needed functions to M profile vmstate subsections
Currently the migration code incorrectly treats a subsection with
no .needed function pointer as if it was the subsection list
terminator -- it is ignored and so is everything after it.
Work around this by giving various M profile vmstate structs
a 'needed' function that always returns true.
We reuse m_needed() for this, since it's always true here.
Peter Maydell [Mon, 6 Aug 2018 12:34:42 +0000 (13:34 +0100)]
hw/intc/arm_gicv3_common: Combine duplicate .subsections in vmstate_gicv3_cpu
Commit 6692aac411199064 accidentally introduced a second initialization
of the .subsections field of vmstate_gicv3_cpu, instead of adding
the new subsection to the existing list. The effect of this was
probably that migration of GICv3 with virtualization enabled was
broken (or alternatively that migration of ICC_SRE_EL1 was broken,
depending on which of the two initializers the compiler used).
Combine the two into a single list.
Peter Maydell [Mon, 6 Aug 2018 12:34:41 +0000 (13:34 +0100)]
hw/intc/arm_gicv3_common: Give no-migration-shift-bug subsection a needed function
Currently the migration code incorrectly treats a subsection with
no .needed function pointer as if it was the subsection list
terminator -- it is ignored and so is everything after it.
Work around this by giving vmstate_gicv3_gicd_no_migration_shift_bug
a 'needed' function that always returns true.
Peter Maydell [Mon, 6 Aug 2018 08:59:05 +0000 (09:59 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc, virtio: fixes
A couple of last minute fixes.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Fri 03 Aug 2018 09:35:54 BST
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
tests/acpi: update tables after memory hotplug changes
pc: acpi: fix memory hotplug regression by reducing stub SRAT entry size
tests/acpi-test: update ACPI tables test blobs
hw/acpi-build: Add a check for memory-less NUMA nodes
vhost: check region type before casting
Check region type first before casting the memory region
to IOMMUMemoryRegion. Otherwise QEMU will abort with below
error message when casting non-IOMMU memory region:
vhost_iommu_region_add: Object 0x561f28bce4f0 is not an
instance of type qemu:iommu-memory-region
sam460ex: Fix PCI interrupts with multiple devices
The four interrupts of the PCI bus are connected to the same UIC pin
on the real Sam460ex. Evidence for this can be found in the UBoot
source for the Sam460ex in the Sam460ex.c file where
PCI_INTERRUPT_LINE is written. Change the ppc440_pcix model to behave
more like this.
This fixes the problem that can be observed when adding further PCI
cards that got their interrupt rotated to other interrupts than PCI
INT A. In particular, the bug was observed with an additional OHCI PCI
card or an ES1370 sound device.
Use the new function sysbus_init_child_obj() to initialize the objects
here, to get the reference counting of the objects right, so that they
are cleaned up correctly when the parent gets removed.
monitor: temporary fix for dead-lock on event recursion
With a Spice port chardev, it is possible to reenter
monitor_qapi_event_queue() (when the client disconnects for
example). This will dead-lock on monitor_lock.
Instead, use some TLS variables to check for recursion and queue the
events.
Fixes:
(gdb) bt
#0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
#1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
#3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
#4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
#5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
#6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
#7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
#8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
#9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
#10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
#11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
#12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
#13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
#14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
#15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
#16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
#17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
#18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
#19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
#20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
#21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
#22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
#23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
#24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
#25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
#26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
#27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
#28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
#29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
#30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644
Note that error report is now moved to the first caller, which may
receive an error for a recursed event. This is probably fine (95% of
callers use &error_abort, the rest have NULL error and ignore it)
* remotes/bonzini/tags/for-upstream:
backends/cryptodev: remove dead code
timer: remove replay clock probe in deadline calculation
i386: implement MSR_SMI_COUNT for TCG
i386: do not migrate MSR_SMI_COUNT on machine types <2.12
linux-user: ppc64: don't use volatile register during safe_syscall
r11 is a volatile register on PPC as per calling conventions.
The safe_syscall code uses it to check if the signal_pending
is set during the safe_syscall. When a syscall is interrupted
on return from signal handling, the r11 might be corrupted
before we retry the syscall leading to a crash. The registers
r0-r13 are not to be used here as they have
volatile/designated/reserved usages.
Change the code to use r14 which is non-volatile.
Use SP+16 which is a slot for LR, for save/restore of previous value
of r14. SP+16 can be used, as LR is preserved across the syscall.
Steps to reproduce:
On PPC host, issue `qemu-x86_64 /usr/bin/cc -E -`
Attempt Ctrl-C, the issue is reproduced.
Alex Bennée [Mon, 30 Jul 2018 13:43:20 +0000 (14:43 +0100)]
linux-user/mmap.c: handle invalid len maps correctly
I've slightly re-organised the check to more closely match the
sequence that the kernel uses in do_mmap(). We check for both the zero
case (EINVAL) and the overflow length case (ENOMEM).
Peter Maydell [Mon, 30 Jul 2018 18:11:57 +0000 (19:11 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- qemu-img convert -C is now required to enable copy offloading
- file-posix: Fix write_zeroes with unmap on block devices (would fall
back to explicit writes on recent kernels)
- Fix query-blockstats interface for use with -blockdev
- Minor fixes and documentation updates
# gpg: Signature made Mon 30 Jul 2018 16:08:14 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
qemu-iotests: Test query-blockstats with -drive and -blockdev
block/qapi: Include anonymous BBs in query-blockstats
block/qapi: Add 'qdev' field to query-blockstats result
file-posix: Fix write_zeroes with unmap on block devices
block: Fix documentation for BDRV_REQ_MAY_UNMAP
iotests: Add test for 'qemu-img convert -C' compatibility
qemu-img: Add -C option for convert with copy offloading
Revert "qemu-img: Document copy offloading implications with -S and -c"
iotests: Don't lock /dev/null in 226
docs: Describe using images in writing iotests
file-posix: Handle EINTR in preallocation=full write
qcow2: A grammar fix in conflicting cache sizing error message
qcow: fix a reference leak
Peter Maydell [Mon, 30 Jul 2018 16:27:54 +0000 (17:27 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180730' into staging
target-arm queue:
* arm/smmuv3: Fix broken VM state migration
* armv7m_nvic: Fix broken VM state migration
* hw/arm/sysbus-fdt: Fix assertion in copy_properties_from_host()
* hw/arm/iotkit: Fix IRQ number for timer1
* hw/misc/tz-mpc: Zero the LUT on initialization, not just reset
* target/arm: Remove duplicate 'host' entry in '-cpu ?' output
* remotes/pmaydell/tags/pull-target-arm-20180730:
target/arm: Remove duplicate 'host' entry in '-cpu ?' output
hw/misc/tz-mpc: Zero the LUT on initialization, not just reset
hw/arm/iotkit: Fix IRQ number for timer1
armv7m_nvic: Fix m-security subsection name
hw/arm/sysbus-fdt: Fix assertion in copy_properties_from_host()
arm/smmuv3: Fix missing VMSD terminator
We clamp down ram_size to match the sclp increment size. We do
not do the same for maxram_size, which means for large guests
with some sizes (e.g. -m 50000) maxram_size differs from ram_size.
This can break other code (e.g. CMMA migration) which uses maxram_size
to calculate the number of pages and then throws some errors.
Peter Maydell [Tue, 24 Jul 2018 15:36:16 +0000 (16:36 +0100)]
hw/misc/tz-mpc: Zero the LUT on initialization, not just reset
In the tz-mpc device we allocate a data block for the LUT,
which we then clear to zero in the device's reset method.
This is conceptually fine, but unfortunately results in a
valgrind complaint about use of uninitialized data on startup:
==30906== Conditional jump or move depends on uninitialised value(s)
==30906== at 0x503609: tz_mpc_translate (tz-mpc.c:439)
==30906== by 0x3F3D90: address_space_translate_iommu (exec.c:511)
==30906== by 0x3F3FF8: flatview_do_translate (exec.c:584)
==30906== by 0x3F4292: flatview_translate (exec.c:644)
==30906== by 0x3F2120: address_space_translate (memory.h:1962)
==30906== by 0x3FB753: address_space_ldl_internal (memory_ldst.inc.c:36)
==30906== by 0x3FB8A6: address_space_ldl (memory_ldst.inc.c:80)
==30906== by 0x619037: ldl_phys (memory_ldst_phys.inc.h:25)
==30906== by 0x61985D: arm_cpu_reset (cpu.c:255)
==30906== by 0x98791B: cpu_reset (cpu.c:249)
==30906== by 0x57FFDB: armv7m_reset (armv7m.c:265)
==30906== by 0x7B1775: qemu_devices_reset (reset.c:69)
This is because of a reset ordering problem -- the TZ MPC
resets after the CPU, but an M-profile CPU's reset function
includes memory loads to get the initial PC and SP, which
then go through an MPC that hasn't yet been reset.
The simplest fix for this is to zero the LUT when we
initialize the data, which will result in the MPC's
translate function giving the right answers for these
early memory accesses.
Peter Maydell [Fri, 27 Jul 2018 11:38:54 +0000 (12:38 +0100)]
hw/arm/iotkit: Fix IRQ number for timer1
A cut-and-paste error meant we were incorrectly wiring up the timer1
IRQ to IRQ3. IRQ3 is the interrupt for timer0 -- move timer0 to
IRQ4 where it belongs.
Peter Maydell [Fri, 27 Jul 2018 11:38:53 +0000 (12:38 +0100)]
armv7m_nvic: Fix m-security subsection name
The vmstate save/load code insists that subsections of a VMState must
have names which include their parent VMState's name as a leading
substring. Unfortunately it neither documents this nor checks it on
device init or state save, but instead fails state load with a
confusing error message ("Missing section footer for armv7m_nvic").
Fix the name of the m-security subsection of the NVIC, so that
state save/load works correctly for the security-enabled NVIC.
Kevin Wolf [Fri, 27 Jul 2018 14:09:25 +0000 (16:09 +0200)]
block/qapi: Include anonymous BBs in query-blockstats
Consistent with query-block, query-blockstats should not only include
named BlockBackends, but also those that are anonymous, but belong to a
device model.
Kevin Wolf [Fri, 27 Jul 2018 14:07:07 +0000 (16:07 +0200)]
block/qapi: Add 'qdev' field to query-blockstats result
Like for query-block, the client needs to identify which BlockBackend
the returned data is for. Anonymous BlockBackends are identified by the
device model they are attached to. Add a 'qdev' field that contains the
qdev ID or QOM path of the attached device model.
Kevin Wolf [Thu, 26 Jul 2018 09:28:30 +0000 (11:28 +0200)]
file-posix: Fix write_zeroes with unmap on block devices
The BLKDISCARD ioctl doesn't guarantee that the discarded blocks read as
all-zero afterwards, so don't try to abuse it for zero writing. We try
to only use this if BLKDISCARDZEROES tells us that it is safe, but this
is unreliable on older kernels and a constant 0 in newer kernels. In
other words, this code path is never actually used with newer kernels,
so we don't even try to unmap while writing zeros.
This patch removes the abuse of discard for writing zeroes from
file-posix and instead adds a new function that uses interfaces that are
actually meant to deallocate and zero out at the same time. Only if
those fail, it falls back to zeroing out without unmap. We never fall
back to a discard operation any more that may or may not result in
zeros.
Kevin Wolf [Wed, 25 Jul 2018 11:20:32 +0000 (13:20 +0200)]
block: Fix documentation for BDRV_REQ_MAY_UNMAP
BDRV_REQ_MAY_UNMAP in a write_zeroes request does not only allow the
driver to unmap the blocks, but it actively requests that the blocks be
unmapped afterwards if at all possible.
On my system (Fedora 28), this script reports a 'failed to get
"consistent read" lock' error. Following docs/devel/testing.rst, it's
better to add locking=off here.
This is because we reference bs twice in qcow_co_create(..) one time in
bdrv_open_blockdev_ref(..) and in blk_insert_bs(..) but we unref it only once
in blk_unref which leads to the reference leak.
Note that I didn't tested much QCOW after this change as I don't use it much.
Pavel Dovgalyuk [Wed, 25 Jul 2018 12:15:26 +0000 (15:15 +0300)]
timer: remove replay clock probe in deadline calculation
Ciro Santilli reported that commit a5ed352596a8b7eb2f9acce34371b944ac3056c4
breaks the execution replay. It happens due to the probing the clock
for the new instances of iothread.
However, this probing was made in replay mode for the timer lists that
are empty.
This patch removes clock probing in replay mode.
It is an artifact of the old version with another thread model.
Paolo Bonzini [Tue, 24 Jul 2018 11:59:21 +0000 (13:59 +0200)]
i386: do not migrate MSR_SMI_COUNT on machine types <2.12
MSR_SMI_COUNT started being migrated in QEMU 2.12. Do not migrate it
on older machine types, or the subsection causes a load failure for
guests that use SMM.
Peter Maydell [Mon, 30 Jul 2018 08:55:47 +0000 (09:55 +0100)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-qobject-2018-07-27-v2' into staging
QObject patches for 2018-07-27 (3.0.0-rc3)
# gpg: Signature made Sat 28 Jul 2018 08:10:39 BST
# gpg: using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg: aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qobject-2018-07-27-v2:
qstring: Move qstring_from_substr()'s @end one to the right
qstring: Assert size calculations don't overflow
qstring: Fix qstring_from_substr() not to provoke int overflow
qstring: Move qstring_from_substr()'s @end one to the right
qstring_from_substr() takes the index of the substring's first and
last character. qstring_from_substr(s, 0, SIZE_MAX) denotes an empty
substring. Awkward.
Shift the end index one to the right. This simplifies both
qstring_from_substr() and its callers.
qstring: Fix qstring_from_substr() not to provoke int overflow
qstring_from_substr() parameters @start and @end are of type int.
blkdebug_parse_filename(), blkverify_parse_filename(), nbd_parse_uri(),
and qstring_from_str() pass @end values of type size_t or ptrdiff_t.
Values exceeding INT_MAX get truncated, with possibly disastrous
results.
Such huge substrings seem unlikely, but we found one in a core dump,
where "info tlb" executed via QMP's human-monitor-command apparently
produced 35 GiB of output.
Peter Maydell [Tue, 24 Jul 2018 19:16:31 +0000 (20:16 +0100)]
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180724a' into staging
Migration pull for 3.0
Fixes only
# gpg: Signature made Tue 24 Jul 2018 19:31:39 BST
# gpg: using RSA key 0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <[email protected]>"
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20180724a:
migration: fix duplicate initialization for expected_downtime and cleanup_bh
tests: only update last_byte when at the edge
migration: disallow recovery for release-ram
migration: update recv bitmap only on dest vm
audio/hda: Fix migration
migrate: Fix cancelling state warning
migration: fix potential overflow in multifd send
When gnutls negotiates TLS 1.3 instead of 1.2, the order of messages
sent by the handshake changes. This exposed a logic bug in the test
suite which caused us to wait for the server to see handshake
completion, but not wait for the client to see completion. The result
was the client didn't receive the certificate for verification and the
test failed.
This is exposed in Fedora 29 rawhide which has just enabled TLS 1.3 in
its GNUTLS builds.
Most of the TLS related tests are passing an in a "Error" object to
methods that are expected to fail, but then ignoring any error that is
set and instead asserting on a return value. This means that when an
error is unexpectedly raised, no information about it is printed out,
making failures hard to diagnose. Changing these tests to pass in
&error_abort will make unexpected failures print messages to stderr.
tests: don't silence error reporting for all tests
The test-vmstate test is a bit chatty because it triggers various
expected failure scenarios and the code in question uses error_report
instead of accepting 'Error **errp' parameters. To silence this test the
stubs for error_vprintf() were changed to send errors via
g_test_message() instead of stderr:
Unfortunately this change has global impact across the entire test suite
and means that when tests fail for unexpected reasons, the message is
not displayed on stderr. eg when using &error_abort in a call the test
merely prints
Unexpected error in qcrypto_tls_session_check_certificate() at crypto/tlssession.c:280:
and the actual error message is hidden, making it impossible to diagnose
the failure. This is especially problematic in CI or build systems where
it isn't possible to easily pass the --debug-log flag to tests and
re-run with the test log visible.
This change makes the previous big hammer much more nuanced, providing a
flag in the stub error_vprintf() that can used on a per-test basis to
silence the errors. Only the test-vmstate silences errors initially.
Peter Xu [Mon, 23 Jul 2018 12:33:04 +0000 (20:33 +0800)]
tests: only update last_byte when at the edge
The only possible change of last_byte is when it reaches the edge.
Setting it every time might let last_byte contain an invalid data when
memory corruption is detected, then the check of the next byte will be
incorrect. For example, a single page corruption at address 0x14ad000
will also lead to a "fake" corruption at 0x14ae000:
Memory content inconsistency at 14ad000 first_byte = 44 last_byte = 44 current = ef hit_edge = 0
Memory content inconsistency at 14ae000 first_byte = 44 last_byte = ef current = 44 hit_edge = 0
After the patch, it'll only report the corrputed page:
Memory content inconsistency at 14ad000 first_byte = 44 last_byte = 44 current = ef hit_edge = 0
Peter Xu [Mon, 23 Jul 2018 12:33:03 +0000 (20:33 +0800)]
migration: disallow recovery for release-ram
Postcopy recovery won't work well with release-ram capability since
release-ram will drop the page buffer as long as the page is put into
the send buffer. So if there is a network failure happened, any page
buffers that have not yet reached the destination VM but have already
been sent from the source VM will be lost forever. Let's refuse the
client from resuming such a postcopy migration. Luckily release-ram was
designed to only be used when src and destination VMs are on the same
host, so it should be fine.
Fix outgoing migration which was crashing in
vmstate_hda_audio_stream_buf_needed, I think the problem
is that we have room for upto 4 streams in the array but only
use 2, when we come to try and save the state of the unused
streams we hit st->state == NULL.
migration_iteration_finish: Unknown ending state 2
on a cancel.
I think that's originally due to 39b9e17905c; although
I've only seen the warning, I think that in some cases
that we could find the VM stays paused after a cancel where
it should restart.
Peter Xu [Fri, 20 Jul 2018 03:47:13 +0000 (11:47 +0800)]
migration: fix potential overflow in multifd send
I would guess it won't happen normally, but this should ease Coverity.
>>> CID 1394385: Integer handling issues (OVERFLOW_BEFORE_WIDEN)
>>> Potentially overflowing expression "pages->used * 8192U" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
854 transferred = pages->used * TARGET_PAGE_SIZE + p->packet_len;