Peter Maydell [Thu, 21 Jan 2016 14:15:09 +0000 (14:15 +0000)]
target-arm: Implement FPEXC32_EL2 system register
The AArch64 FPEXC32_EL2 system register is visible at EL2 and EL3,
and allows those exception levels to read and write the FPEXC
register for a lower exception level that is using AArch32.
Peter Maydell [Thu, 21 Jan 2016 14:15:09 +0000 (14:15 +0000)]
target-arm: ignore ELR_ELx[1] for exception return to 32-bit ARM mode
The architecture requires that for an exception return to AArch32 the
low bits of ELR_ELx are ignored when the PC is set from them:
* if returning to Thumb mode, ignore ELR_ELx[0]
* if returning to ARM mode, ignore ELR_ELx[1:0]
We were only squashing bit 0; also squash bit 1 if the SPSR T bit
indicates this is a return to ARM code.
We already implement almost all the checks for the illegal
return events from AArch64 state described in the ARM ARM section
D1.11.2. Add the two missing ones:
* return to EL2 when EL3 is implemented and SCR_EL3.NS is 0
* return to Non-secure EL1 when EL2 is implemented and HCR_EL2.TGE is 1
(We don't implement external debug, so the case of "debug state exit
from EL0 using AArch64 state to EL0 using AArch32 state" doesn't apply
for QEMU.)
Peter Maydell [Thu, 21 Jan 2016 14:15:09 +0000 (14:15 +0000)]
target-arm: Handle exception return from AArch64 to non-EL0 AArch32
Remove the assumptions that the AArch64 exception return code was
making about a return to AArch32 always being a return to EL0.
This includes pulling out the illegal-SPSR checks so we can apply
them for return to 32 bit as well as return to 64-bit.
Peter Maydell [Thu, 21 Jan 2016 14:15:08 +0000 (14:15 +0000)]
target-arm: Fix wrong AArch64 entry offset for EL2/EL3 target
The entry offset when taking an exception to AArch64 from a lower
exception level may be 0x400 or 0x600. 0x400 is used if the
implemented exception level immediately lower than the target level
is using AArch64, and 0x600 if it is using AArch32. We were
incorrectly implementing this as checking the exception level
that the exception was taken from. (The two can be different if
for example we take an exception from EL0 to AArch64 EL3; we should
in this case be checking EL2 if EL2 is implemented, and EL1 if
EL2 is not implemented.)
Peter Maydell [Thu, 21 Jan 2016 14:15:08 +0000 (14:15 +0000)]
target-arm: Pull semihosting handling out to arm_cpu_do_interrupt()
Handling of semihosting calls should depend on the register width
of the calling code, not on that of any higher exception level,
so we need to identify and handle semihosting calls before we
decide whether to deliver the exception as an entry to AArch32
or AArch64. (EXCP_SEMIHOST is also an "internal exception" so
it has no target exception level in the first place.)
This will allow AArch32 EL1 code to use semihosting calls when
running under an AArch64 EL3.
Peter Maydell [Thu, 21 Jan 2016 14:15:08 +0000 (14:15 +0000)]
target-arm: Use a single entry point for AArch64 and AArch32 exceptions
If EL2 or EL3 is present on an AArch64 CPU, then exceptions can be
taken to an exception level which is running AArch32 (if only EL0
and EL1 are present then EL1 must be AArch64 and all exceptions are
taken to AArch64). To support this we need to have a single
implementation of the CPU do_interrupt() method which can handle both
32 and 64 bit exception entry.
Pull the common parts of aarch64_cpu_do_interrupt() and
arm_cpu_do_interrupt() out into a new function which calls
either the AArch32 or AArch64 specific entry code once it has
worked out which one is needed.
We temporarily special-case the handling of EXCP_SEMIHOST to
avoid an assertion in arm_el_is_aa64(); the next patch will
pull all the semihosting handling out to the arm_cpu_do_interrupt()
level (since semihosting semantics depend on the register width
of the calling code, not on that of any higher EL).
Peter Maydell [Thu, 21 Jan 2016 14:15:08 +0000 (14:15 +0000)]
target-arm: Move aarch64_cpu_do_interrupt() to helper.c
Move the aarch64_cpu_do_interrupt() function to helper.c. We want
to be able to call this from code that isn't AArch64-only, and
the move allows us to avoid awkward #ifdeffery at the callsite.
Peter Maydell [Thu, 21 Jan 2016 14:15:08 +0000 (14:15 +0000)]
target-arm: Properly support EL2 and EL3 in arm_el_is_aa64()
Support EL2 and EL3 in arm_el_is_aa64() by implementing the
logic for checking the SCR_EL3 and HCR_EL2 register-width bits
as appropriate to determine the register width of lower exception
levels.
Christoffer Dall [Thu, 21 Jan 2016 14:15:07 +0000 (14:15 +0000)]
hw/arm/virt: Add always-on property to the virt board timer
The virt board has an arch timer, which is always on. Emit the
"always-on" property to indicate to Linux that it can switch off the
periodic timer and reduces the amount of interrupts injected into a
guest.
Peter Maydell [Thu, 21 Jan 2016 14:15:07 +0000 (14:15 +0000)]
hw/arm/virt: add secure memory region and UART
Add a secure memory region to the virt board, which is the
same as the nonsecure memory region except that it also has
a secure-only UART in it. This is only created if the
board is started with the '-machine secure=on' property.
Peter Maydell [Thu, 21 Jan 2016 14:15:07 +0000 (14:15 +0000)]
hw/arm/virt: Wire up memory region to CPUs explicitly
Wire up the system memory region to the CPUs explicitly
by setting the QOM property. This doesn't change anything
over letting it default, but will be needed for adding
a secure memory region later.
Peter Maydell [Thu, 21 Jan 2016 14:15:07 +0000 (14:15 +0000)]
target-arm: Support multiple address spaces in page table walks
If we have a secure address space, use it in page table walks:
when doing the physical accesses to read descriptors, make them
through the correct address space.
(The descriptor reads are the only direct physical accesses
made in target-arm/ for CPUs which might have TrustZone.)
Peter Maydell [Thu, 21 Jan 2016 14:15:06 +0000 (14:15 +0000)]
target-arm: Add QOM property for Secure memory region
Add QOM property to the ARM CPU which boards can use to tell us what
memory region to use for secure accesses. Nonsecure accesses
go via the memory region specified with the base CPU class 'memory'
property.
By default, if no secure region is specified it is the same as the
nonsecure region, and if no nonsecure region is specified we will use
address_space_memory.
Add a MemoryRegion property, which if set is used to construct
the CPU's initial (default) AddressSpace.
Signed-off-by: Peter Crosthwaite <[email protected]>
[PMM: code is moved from qom/cpu.c to exec.c to avoid having to
make qom/cpu.o be a non-common object file; code to use the
MemoryRegion and to default it to system_memory added.] Signed-off-by: Peter Maydell <[email protected]> Acked-by: Edgar E. Iglesias <[email protected]>
This will either create a new AS or return a pointer to an
already existing equivalent one, if we have already created
an AS for the specified root memory region.
The motivation is to reuse address spaces as much as possible.
It's going to be quite common that bus masters out in device land
have pointers to the same memory region for their mastering yet
each will need to create its own address space. Let the memory
API implement sharing for them.
Aside from the perf optimisations, this should reduce the amount
of redundant output on info mtree as well.
Thee returned value will be malloced, but the malloc will be
automatically freed when the AS runs out of refs.
Signed-off-by: Peter Crosthwaite <[email protected]>
[PMM: dropped check for NULL root as unused; added doc-comment;
squashed Peter C's reference-counting patch into this one;
don't compare name string when deciding if we can share ASes;
read as->malloced before the unref of as->root to avoid possible
read-after-free if as->root was the owner of as] Signed-off-by: Peter Maydell <[email protected]> Acked-by: Edgar E. Iglesias <[email protected]>
Peter Maydell [Thu, 21 Jan 2016 14:15:06 +0000 (14:15 +0000)]
exec.c: Use correct AddressSpace in watch_mem_read and watch_mem_write
In the watchpoint access routines watch_mem_read and watch_mem_write,
find the correct AddressSpace to use from current_cpu and the memory
transaction attributes, rather than always assuming address_space_memory.
Peter Maydell [Thu, 21 Jan 2016 14:15:06 +0000 (14:15 +0000)]
exec.c: Use cpu_get_phys_page_attrs_debug
Use cpu_get_phys_page_attrs_debug() when doing virtual-to-physical
conversions in debug related code, so that we can obtain the right
address space index and thus select the correct AddressSpace,
rather than always using cpu->as.
Peter Maydell [Thu, 21 Jan 2016 14:15:05 +0000 (14:15 +0000)]
exec.c: Add cpu_get_address_space()
Add a function to return the AddressSpace for a CPU based on
its numerical index. (Callers outside exec.c don't have access
to the CPUAddressSpace struct so can't just fish it out of the
CPUState struct directly.)
Peter Maydell [Thu, 21 Jan 2016 14:15:05 +0000 (14:15 +0000)]
cputlb.c: Use correct address space when looking up MemoryRegionSection
When looking up the MemoryRegionSection for the new TLB entry in
tlb_set_page_with_attrs(), use cpu_asidx_from_attrs() to determine
the correct address space index for the lookup, and pass it into
address_space_translate_for_iotlb().
Peter Maydell [Thu, 21 Jan 2016 14:15:05 +0000 (14:15 +0000)]
cpu: Add new asidx_from_attrs() method
Add a new method to CPUClass which the memory system core can
use to obtain the correct address space index to use for a memory
access with a given set of transaction attributes, together
with the wrapper function cpu_asidx_from_attrs() which implements
the default behaviour ("always use asidx 0") for CPU classes
which don't provide the method.
Peter Maydell [Thu, 21 Jan 2016 14:15:05 +0000 (14:15 +0000)]
cpu: Add new get_phys_page_attrs_debug() method
Add a new optional method get_phys_page_attrs_debug() to CPUClass.
This is like the existing get_phys_page_debug(), but also returns
the memory transaction attributes to use for the access.
This will be necessary for CPUs which have multiple address
spaces and use the attributes to select the correct address
space.
We provide a wrapper function cpu_get_phys_page_attrs_debug()
which falls back to the existing get_phys_page_debug(), so we
don't need to change every target CPU.
Peter Maydell [Thu, 21 Jan 2016 14:15:04 +0000 (14:15 +0000)]
exec.c: Allow target CPUs to define multiple AddressSpaces
Allow multiple calls to cpu_address_space_init(); each
call adds an entry to the cpu->ases array at the specified
index. It is up to the target-specific CPU code to actually use
these extra address spaces.
Since this multiple AddressSpace support won't work with
KVM, add an assertion to avoid confusing failures.
Peter Maydell [Thu, 21 Jan 2016 14:15:04 +0000 (14:15 +0000)]
exec.c: Don't set cpu->as until cpu_address_space_init
Rather than setting cpu->as unconditionally in cpu_exec_init
(and then having target-i386 override this later), don't set
it until the first call to cpu_address_space_init.
This requires us to initialise the address space for
both TCG and KVM (KVM doesn't need the AS listener but
it does require cpu->as to be set).
For target CPUs which don't set up any address spaces (currently
everything except i386), add the default address_space_memory
in qemu_init_vcpu().
Alistair Francis [Thu, 21 Jan 2016 14:15:03 +0000 (14:15 +0000)]
xlnx-zynqmp: Connect the SPI devices
Connect the Xilinx SPI devices to the ZynqMP model.
Signed-off-by: Alistair Francis <[email protected]> Reviewed-by: Peter Crosthwaite <[email protected]>
[ PC changes
* Use QOM alias for bus connectivity on SoC level
] Signed-off-by: Peter Crosthwaite <[email protected]>
[PMM: free the g_strdup_printf() string when finished with it] Signed-off-by: Peter Maydell <[email protected]>
qdev: get_child_bus(): Use QOM lookup if available
qbus_realize() adds busses as a QOM child of the device in addition to
adding it to the qdev bus list. Change get_child_bus() to use the QOM
child if it is available. This takes priority over the bus-list, but
the child object is checked for type correctness.
This prepares support for aliasing of buses. The use case is SoCs,
where a SoC container needs to present buses to the board level, but
the buses are implemented by controller IP we already model as self
contained qbus-containing devices.
Peter Maydell [Thu, 21 Jan 2016 13:09:47 +0000 (13:09 +0000)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Wed 20 Jan 2016 15:37:57 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
* remotes/kevin/tags/for-upstream:
iotests: Test that throttle values ranges
blockdev: Error out on negative throttling option values
vmdk: Create streamOptimized as version 3
qcow2: Make image inaccessible after failed qcow2_invalidate_cache()
qcow2: Fix BDRV_O_INACTIVE handling in qcow2_invalidate_cache()
qcow2: Implement .bdrv_inactivate
block: Inactivate BDS when migration completes
block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE
block: Fix error path in bdrv_invalidate_cache()
block: Assert no write requests under BDRV_O_INCOMING
qcow2: Write full header on image creation
qcow2: Write feature table only for v3 images
block: Clean up includes
qemu-iotests: Reduce racy output in 028
qemu-img: Speed up comparing empty/zero images
block/raw-posix: avoid bogus fixup for cylinders on DASD disks
block: Fix .bdrv_open flags
Peter Maydell [Thu, 21 Jan 2016 12:42:17 +0000 (12:42 +0000)]
Merge remote-tracking branch 'remotes/berrange/tags/pull-io-next-2016-01-20-1' into staging
I/O channels fixes 2016/01/20 v1
# gpg: Signature made Wed 20 Jan 2016 11:31:47 GMT using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>"
# gpg: aka "Daniel P. Berrange <[email protected]>"
* remotes/berrange/tags/pull-io-next-2016-01-20-1:
io: use memset instead of { 0 } for initializing array
io: fix description of @errp parameter initialization
io: some fixes to handling of /dev/null when running commands
io: increment counter when killing off subcommand
io: fix sign of errno value passed to error report
Peter Maydell [Thu, 21 Jan 2016 12:09:41 +0000 (12:09 +0000)]
Merge remote-tracking branch 'remotes/kraxel/tags/pull-socket-20160120-1' into staging
Convert qemu-socket to use QAPI exclusively, update MAINTAINERS.
# gpg: Signature made Wed 20 Jan 2016 06:49:07 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg: aka "Gerd Hoffmann <[email protected]>"
# gpg: aka "Gerd Hoffmann (private) <[email protected]>"
* remotes/kraxel/tags/pull-socket-20160120-1:
vnc: distiguish between ipv4/ipv6 omitted vs set to off
sockets: remove use of QemuOpts from socket_dgram
sockets: remove use of QemuOpts from socket_connect
sockets: remove use of QemuOpts from socket_listen
sockets: remove use of QemuOpts from header file
add MAINTAINERS entry for qemu socket code
Fam Zheng [Wed, 20 Jan 2016 04:21:20 +0000 (12:21 +0800)]
blockdev: Error out on negative throttling option values
extract_common_blockdev_options() uses qemu_opt_get_number() to parse
the bps/iops numbers to uint64_t, then converts to double and stores in
ThrottleConfig. The actual parsing is done by strtoull() in
parse_option_number(). Negative numbers are wrapped to large positive
ones, and stored.
We used to reject negative numbers since 7d81c1413c9, but this regressed
when the option parsing code was changed later. Now fix this again.
This time, define an arbitrary large upper limit (1e15), and check the
values so both negative and impractically big numbers are caught and
reported.
Kevin Wolf [Tue, 22 Dec 2015 15:14:10 +0000 (16:14 +0100)]
qcow2: Make image inaccessible after failed qcow2_invalidate_cache()
If qcow2_invalidate_cache() fails, we are in a state where qcow2_close()
has already been completed, but the image hasn't been reopened yet.
Calling into any qcow2 function for an image in this state will cause
crashes.
The real solution would be to get rid of the close/open pair and instead
do an atomic reset of the involved data structures, but this isn't
trivial, so let's just make the image inaccessible for now.
Kevin Wolf [Tue, 22 Dec 2015 15:10:32 +0000 (16:10 +0100)]
qcow2: Fix BDRV_O_INACTIVE handling in qcow2_invalidate_cache()
What qcow2_invalidate_cache() should do is close the image with
BDRV_O_INACTIVE set and reopen it with the flag cleared. In fact, it
used to do exactly the opposite: qcow2_close() relied on bs->open_flags,
which is already updated to have cleared BDRV_O_INACTIVE at this point,
whereas qcow2_open() was called with s->flags, which has the flag still
set. Fix this.
Kevin Wolf [Tue, 22 Dec 2015 15:04:57 +0000 (16:04 +0100)]
qcow2: Implement .bdrv_inactivate
The callback has to ensure that closing or flushing the image afterwards
wouldn't cause a write access to the image files. This means that just
the caches have to be written out, which is part of the existing
.bdrv_close implementation.
Kevin Wolf [Tue, 22 Dec 2015 13:07:08 +0000 (14:07 +0100)]
block: Inactivate BDS when migration completes
So far, live migration with shared storage meant that the image is in a
not-really-ready don't-touch-me state on the destination while the
source is still actively using it, but after completing the migration,
the image was fully opened on both sides. This is bad.
This patch adds a block driver callback to inactivate images on the
source before completing the migration. Inactivation means that it goes
to a state as if it was just live migrated to the qemu instance on the
source (i.e. BDRV_O_INACTIVE is set). You're then supposed to continue
either on the source or on the destination, which takes ownership of the
image.
A typical migration looks like this now with respect to disk images:
1. Destination qemu is started, the image is opened with
BDRV_O_INACTIVE. The image is fully opened on the source.
2. Migration is about to complete. The source flushes the image and
inactivates it. Now both sides have the image opened with
BDRV_O_INACTIVE and are expecting the other side to still modify it.
3. One side (the destination on success) continues and calls
bdrv_invalidate_all() in order to take ownership of the image again.
This removes BDRV_O_INACTIVE on the resuming side; the flag remains
set on the other side.
This ensures that the same image isn't written to by both instances
(unless both are resumed, but then you get what you deserve). This is
important because .bdrv_close for non-BDRV_O_INACTIVE images could write
to the image file, which is definitely forbidden while another host is
using the image.
Kevin Wolf [Wed, 13 Jan 2016 14:56:06 +0000 (15:56 +0100)]
block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE
Instead of covering only the state of images on the migration
destination before the migration is completed, the flag will also cover
the state of images on the migration source after completion. This
common state implies that the image is technically still open, but no
writes will happen and any cached contents will be reloaded from disk if
and when the image leaves this state.
Kevin Wolf [Wed, 16 Dec 2015 13:00:36 +0000 (14:00 +0100)]
block: Assert no write requests under BDRV_O_INCOMING
As long as BDRV_O_INCOMING is set, the image file is only opened so we
have a file descriptor for it. We're definitely not supposed to modify
the image, it's still owned by the migration source.
Kevin Wolf [Wed, 2 Dec 2015 17:34:39 +0000 (18:34 +0100)]
qcow2: Write full header on image creation
When creating a qcow2 image, we didn't necessarily call
qcow2_update_header(), but could end up with the basic header that
qcow2_create2() created manually. One thing that this basic header
lacks is the feature table. Let's make sure that it's always present.
This requires a few updates to test cases as well.
Eric Blake [Fri, 11 Dec 2015 03:27:17 +0000 (20:27 -0700)]
qemu-iotests: Reduce racy output in 028
On my machine, './check -qcow2 028' was failing about 80% of the
time, due to a race in how many times the repeated attempts
to run 'info block-jobs' could occur before the job was done,
showing up as a failure of fewer '(qemu) ' prompts than in the
expected output. Silence the output during the repetitions, then
add a final clean command to keep the expected output useful;
once patched, I was finally able to run the test 20 times in a
row with no failures.
Fam Zheng [Wed, 13 Jan 2016 08:37:41 +0000 (16:37 +0800)]
qemu-img: Speed up comparing empty/zero images
Two empty raw files are always compared by actually reading data even if
there is no data, because BDRV_BLOCK_ZERO is considered "allocated" in
bdrv_is_allocated_above(). That is inefficient.
Use bdrv_get_block_status_above() for more information, and skip the
consecutive zero sectors.
This brings a huge speed up in comparing sparse/empty raw images:
$ qemu-img create a 1G
$ time ~/build/master/bin/qemu-img compare a a
Images are identical.
io: use memset instead of { 0 } for initializing array
Some versions of GCC on OS-X complain about CMSG_SPACE
not being constant size, which prevents use of { 0 }
io/channel-socket.c: In function 'qio_channel_socket_writev':
io/channel-socket.c:497:18: error: variable-sized object may not be initialized
char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)] = { 0 };
The compiler is at fault here, but it is nicer to avoid
tickling this compiler bug by using memset instead.
io: some fixes to handling of /dev/null when running commands
The /dev/null file handle was leaked in a couple of places.
There is also the possibility that both readfd and writefd
point to the same /dev/null file handle, so care must be
taken not to close the same file handle twice.
Alex Williamson [Tue, 19 Jan 2016 18:33:42 +0000 (11:33 -0700)]
vfio/pci: Lazy PBA emulation
The PCI spec recommends devices use additional alignment for MSI-X
data structures to allow software to map them to separate processor
pages. One advantage of doing this is that we can emulate those data
structures without a significant performance impact to the operation
of the device. Some devices fail to implement that suggestion and
assigned device performance suffers.
One such case of this is a Mellanox MT27500 series, ConnectX-3 VF,
where the MSI-X vector table and PBA are aligned on separate 4K
pages. If PBA emulation is enabled, performance suffers. It's not
clear how much value we get from PBA emulation, but the solution here
is to only lazily enable the emulated PBA when a masked MSI-X vector
fires. We then attempt to more aggresively disable the PBA memory
region any time a vector is unmasked. The expectation is then that
a typical VM will run entirely with PBA emulation disabled, and only
when used is that emulation re-enabled.
Alex Williamson [Tue, 19 Jan 2016 18:33:41 +0000 (11:33 -0700)]
vfio/pci-quirks: Only quirk to size of PCI config space
For quirks that support the full PCIe extended config space, limit the
quirk to only the size of config space available through vfio. This
allows host systems with broken MMCONFIG regions to still make use of
these quirks without generating bad address faults trying to access
beyond the end of config space exposed through vfio. This may expose
direct access to the mirror of extended config space, only trapping
the sub-range of standard config space, but allowing this makes the
quirk, and thus the device, functional. We expect that only device
specific accesses make use of the mirror, not general extended PCI
capability accesses, so any virtualization in this space is likely
unnecessary anyway, and the device is still IOMMU isolated, so it
should only be able to hurt itself through any bogus configurations
enabled by this space.
block/raw-posix: avoid bogus fixup for cylinders on DASD disks
large volume DASD that have > 64k cylinders do claim to have
0xFFFE cylinders as special value in the old 16 bit field. We
want to pass this "token" along to the guest, instead of
calculating the real number. Otherwise qemu might fail with
"cyls must be between 1 and 65535"
Kevin Wolf [Mon, 11 Jan 2016 18:07:50 +0000 (19:07 +0100)]
block: Fix .bdrv_open flags
bdrv_common_open() modified bs->open_flags after inferring the set of
options to pass to the driver's .bdrv_open callback. This means that the
cache options were correctly set in bs->open_flags (and therefore
correctly displayed in 'info block'), but the image would actually be
opened with the default cache mode instead.
This patch removes the flags parameter to bdrv_common_open() (except for
BDRV_O_NO_BACKING it's the same as bs->open_flags anyway, and having two
names for the same thing is confusing), and moves the assignment of
open_flags down to immediately before calling into the block drivers. In
all other places, bs->open_flags is now used consistently.
vnc: distiguish between ipv4/ipv6 omitted vs set to off
The VNC code for interpreting QemuOpts does not currently
distinguish between ipv4/ipv6 being omitted, and being
set to 'off', because historically the 'ipv4' and 'ipv6'
options were just flags which did not accept a value.
The upshot is that if someone runs
$QEMU -vnc localhost:1,ipv6=off
QEMU still uses PF_UNSPEC and thus may still bind to IPv6,
when it should use PF_INET.
This is another instance of the problem previously fixed
for chardevs in
The socket_dgram method accepts a QAPI SocketAddress object
which it then turns into QemuOpts before calling the
inet_dgram_opts helper method. By converting the latter to
use QAPI SocketAddress directly, the QemuOpts conversion
step can be eliminated.
This removes the very last use of QemuOpts from the
sockets code, so the socket_optslist[] array is also
removed.
sockets: remove use of QemuOpts from socket_connect
The socket_connect method accepts a QAPI SocketAddress object
which it then turns into QemuOpts before calling the
inet_connect_opts/unix_connect_opts helper methods. By
converting the latter to use QAPI SocketAddress directly,
the QemuOpts conversion step can be eliminated
sockets: remove use of QemuOpts from socket_listen
The socket_listen method accepts a QAPI SocketAddress object
which it then turns into QemuOpts before calling the
inet_listen_opts/unix_listen_opts helper methods. By
converting the latter to use QAPI SocketAddress directly,
the QemuOpts conversion step can be eliminated
There are no callers of the sockets methods which accept
QemuOpts any more. Make all the QemuOpts related functions
static to avoid new callers being added, in preparation
for removal of all QemuOpts usage, in favour of QAPI
SocketAddress.
When killing the subcommand, it is intended to first send
SIGTERM, then SIGKILL and only report an error if it still
doesn't die after SIGKILL. The 'step' counter was not
being incremented though, so the code never got past the
SIGTERM stage.
Andreas Färber [Mon, 18 Jan 2016 17:19:35 +0000 (18:19 +0100)]
MAINTAINERS: Fix sPAPR entry heading
get_maintainers.pl does not handle parenthesis in maintenance areas well
in connection with list emails (here: [email protected]).
Resolve a recurring CC issue breaking git-send-email by reverting part
of commit 085eb217dfb3ee12e7985c11f71f8a038394735a ("Add David Gibson
for sPAPR in MAINTAINERS file").
Paolo Bonzini [Mon, 19 Oct 2015 11:11:39 +0000 (13:11 +0200)]
qdev: Free QemuOpts when the QOM path goes away
Otherwise there is a race where the DEVICE_DELETED event has been sent but
attempts to reuse the ID will fail.
Note that similar races exist for other QemuOpts, which this patch
does not attempt to fix.
For example, if the device is a block device, then unplugging it also
deletes its backend. However, this backend's get deleted in
drive_info_del(), which is only called when properties are
destroyed. Just like device_finalize(), drive_info_del() is called
some time after DEVICE_DELETED is sent. A separate patch series has
been sent to plug this other bug. Character devices also have yet to
be fixed.
Currently the ObjectProperty iterator API works as follows:
ObjectPropertyIterator *iter;
iter = object_property_iter_init(obj);
while ((prop = object_property_iter_next(iter))) {
...
}
object_property_iter_free(iter);
This has the benefit that the ObjectPropertyIterator struct
can be opaque, but has the downside that callers need to
explicitly call a free function. It is also not in keeping
with iterator style used elsewhere in QEMU/GLib2.
This patch changes the API to use stack allocation instead:
ObjectPropertyIterator iter;
object_property_iter_init(&iter, obj);
while ((prop = object_property_iter_next(&iter))) {
...
}
qom: Allow properties to be registered against classes
When there are many instances of a given class, registering
properties against the instance is wasteful of resources. The
majority of objects have a statically defined list of possible
properties, so most of the properties are easily registerable
against the class. Only those properties which are conditionally
registered at runtime need be recorded against the klass.
Registering properties against classes also makes it possible
to provide static introspection of QOM - currently introspection
is only possible after creating an instance of a class, which
severely limits its usefulness.
This impl only supports simple scalar properties. It does not
attempt to allow child object / link object properties against
the class. There are ways to support those too, but it would
make this patch more complicated, so it is left as an exercise
for the future.
There is no equivalent to object_property_del() provided, since
classes must be immutable once they are defined.
Peter Maydell [Mon, 7 Dec 2015 16:23:43 +0000 (16:23 +0000)]
scripts: Add new clean-includes script to fix C include directives
Add a new scripts/clean-includes, which can be used to automatically
ensure that a C source file includes qemu/osdep.h first and doesn't
then include any headers which osdep.h provides already.
Similarly to the commit 764eb39d1b6 fixing VNC+SASL+QXL, when starting
QEMU with SPICE but no SASL, and at the same time VNC with SASL, then
spice_server_init() will get called without a previous call to
spice_server_set_sasl_appname(), which will cause cyrus-sasl to
try to use /etc/sasl2/spice.conf (spice-server uses "spice" as its
default appname) rather than the expected /etc/sasl2/qemu.conf.
This commit unconditionally calls spice_server_set_sasl_appname()
before calling spice_server_init() in order to use the correct appname
even if SPICE without SASL was requested on qemu command line.
This pointer should be cleared in vnc_display_close()
otherwise a use-after-free can happen when when using the
old style 'x509' and 'tls' options rather than a persistent
tls-creds -object, by issuing monitor commands to change
the vnc server like so:
Start with: -vnc unix:test.socket,x509,tls
Then use the following monitor command:
change vnc unix:test.socket
After this the pointer is still set but invalid and a crash
can be triggered for instance by issuing the same command a
second time which will try to object_unparent() the same
pointer again.
Paolo Bonzini [Thu, 17 Dec 2015 12:47:02 +0000 (13:47 +0100)]
gtk: implement set_echo
Even without line editing, this makes -qmp vc more pleasant with the
GTK+ backend. The only issue is that set_echo is invoked very early,
long before a vc is actually associated with a VirtualConsole. To work
around this, create a temporary VirtualConsole until then.
Peter Maydell [Mon, 18 Jan 2016 09:33:36 +0000 (09:33 +0000)]
Merge remote-tracking branch 'remotes/mcayland/tags/qemu-sparc-signed' into staging
qemu-sparc update
# gpg: Signature made Sat 16 Jan 2016 12:32:06 GMT using RSA key ID AE0F321F
# gpg: Good signature from "Mark Cave-Ayland <[email protected]>"
* remotes/mcayland/tags/qemu-sparc-signed:
target-sparc: Migrate CWP and PIL for SPARC64
target-sparc: Use VMState arrays for SPARC64 TLB/MMU state
target-sparc: Convert to VMStateDescription
target-sparc: Don't flush TLB in cpu_load function
target-sparc: Split cpu_put_psr into side-effect and no-side-effect parts
vmstate: define vmstate_info_uinttl
vmstate: Introduce VMSTATE_VARRAY_MULTPLY
vmstate: introduce CPU_DoubleU arrays
Peter Maydell [Mon, 11 Jan 2016 12:40:28 +0000 (12:40 +0000)]
target-sparc: Migrate CWP and PIL for SPARC64
In SPARC32 the env->cwp and env->psrpil state is part of the PSR
register, and gets migrated as part of that register.
In SPARC64 this state is in separate CWP and PIL registers, but we
were not doing anything to migrate those.
Add the missing fields to the migration vmstate (which is a
migration break, but without these fields migration is completely
broken anyway).
This change means that trying a save/load of a SPARC64 target at
the boot rom prompt now produces a system which at least responds
to keyboard input after the restore.
Peter Maydell [Mon, 11 Jan 2016 12:40:27 +0000 (12:40 +0000)]
target-sparc: Use VMState arrays for SPARC64 TLB/MMU state
Use VMState arrays for SPARC64 TLB/MMU state. This is
a migration-break for SPARC64 (but not for SPARC32),
which is acceptable because currently migration does not
work for any SPARC64 machines due to the lack of any migration
of interrupt controller state.
Juan Quintela [Mon, 11 Jan 2016 12:40:26 +0000 (12:40 +0000)]
target-sparc: Convert to VMStateDescription
Convert the SPARC CPU from cpu_load/save functions to VMStateDescription.
We preserve migration compatibility with the previous version
(required for SPARC32 but not necessarily for SPARC64).
Signed-off-by: Juan Quintela <[email protected]>
[PMM:
* Rebase and update to apply to master
* VMSTATE_STRUCT_POINTER now takes type, not pointer-to-type
* QEMUTimer* are migrated via VMSTATE_TIMER_PTR
* Put CPUTimer vmstate struct inside TARGET_SPARC64 ifdef
* Convert handling of PSR to use a vmstate_psr, like Alpha and ARM
] Signed-off-by: Peter Maydell <[email protected]> Signed-off-by: Mark Cave-Ayland <[email protected]>
Peter Maydell [Mon, 11 Jan 2016 12:40:24 +0000 (12:40 +0000)]
target-sparc: Split cpu_put_psr into side-effect and no-side-effect parts
For inbound migration we really want to be able to set the PSR without
having any side effects, but cpu_put_psr() calls cpu_check_irqs() which
might try to deliver CPU interrupts. Split cpu_put_psr() into the
no-side-effect and side-effect parts.
This includes reordering the cpu_check_irqs() to the end of cpu_put_psr(),
because that function may actually end up calling cpu_interrupt(), which
does not seem like a good thing to happen in the middle of updating the PSR.
Juan Quintela [Mon, 11 Jan 2016 12:40:23 +0000 (12:40 +0000)]
vmstate: define vmstate_info_uinttl
We are going to define arrays of this type, so we need the integer type.
Signed-off-by: Juan Quintela <[email protected]>
[PMM: updated to apply on current QEMU; renamed to 'uinttl'
rather than 'uinttls' to match other vmstate naming] Signed-off-by: Peter Maydell <[email protected]> Signed-off-by: Mark Cave-Ayland <[email protected]>
Juan Quintela [Mon, 11 Jan 2016 12:40:21 +0000 (12:40 +0000)]
vmstate: introduce CPU_DoubleU arrays
Add vmstate support for migrating arrays of CPU_DoubleU via
VMSTATE_CPUDOUBLE_ARRAY.
Signed-off-by: Juan Quintela <[email protected]>
[PMM: rebased, since files have all moved since 2012;
added VMSTATE_CPUDOUBLE_ARRAY_V for consistency with FLOAT64] Signed-off-by: Peter Maydell <[email protected]> Signed-off-by: Mark Cave-Ayland <[email protected]>
# gpg: Signature made Fri 15 Jan 2016 17:58:28 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg: aka "Paolo Bonzini <[email protected]>"
* remotes/bonzini/tags/for-upstream:
qemu-char: do not leak QemuMutex when freeing a character device
qemu-char: add logfile facility to all chardev backends
nbd-server: do not exit on failed memory allocation
nbd-server: do not check request length except for reads and writes
nbd-server: Coroutine based negotiation
nbd: Split nbd.c
nbd: Always call "close_fn" in nbd_client_new
SCSI device: fix to incomplete QOMify
iscsi: send readcapacity10 when readcapacity16 failed
qemu-char: delete send_all/recv_all helper methods
vmw_pvscsi: x-disable-pcie, x-old-pci-configuration back-compat props are 2.5 specific
scsi: initialise info object with appropriate size
i386: avoid null pointer dereference
target-i386: do not duplicate page protection checks
scsi: revert change to scsi_req_cancel_async and add assertions
qemu-char: add logfile facility to all chardev backends
Typically a UNIX guest OS will log boot messages to a serial
port in addition to any graphical console. An admin user
may also wish to use the serial port for an interactive
console. A virtualization management system may wish to
collect system boot messages by logging the serial port,
but also wish to allow admins interactive access.
Currently providing such a feature forces the mgmt app
to either provide 2 separate serial ports, one for
logging boot messages and one for interactive console
login, or to proxy all output via a separate service
that can multiplex the two needs onto one serial port.
While both are valid approaches, they each have their
own downsides. The former causes confusion and extra
setup work for VM admins creating disk images. The latter
places an extra burden to re-implement much of the QEMU
chardev backends logic in libvirt or even higher level
mgmt apps and adds extra hops in the data transfer path.
A simpler approach that is satisfactory for many use
cases is to allow the QEMU chardev backends to have a
"logfile" property associated with them.
This patch introduces a 'ChardevCommon' struct which
is setup as a base for all the ChardevBackend types.
Ideally this would be registered directly as a base
against ChardevBackend, rather than each type, but
the QAPI generator doesn't allow that since the
ChardevBackend is a non-discriminated union. The
ChardevCommon struct provides the optional 'logfile'
parameter, as well as 'logappend' which controls
whether QEMU truncates or appends (default truncate).
Paolo Bonzini [Thu, 7 Jan 2016 13:34:13 +0000 (14:34 +0100)]
nbd-server: do not exit on failed memory allocation
The amount of memory allocated in nbd_co_receive_request is driven by the
NBD client (possibly a virtual machine). Parallel I/O can cause the
server to allocate a large amount of memory; check for failures and
return ENOMEM in that case.
Paolo Bonzini [Thu, 7 Jan 2016 13:32:42 +0000 (14:32 +0100)]
nbd-server: do not check request length except for reads and writes
Only reads and writes need to allocate memory correspondent to the
request length. Other requests can be sent to the storage without
allocating any memory, and thus any request length is acceptable.
Fam Zheng [Thu, 14 Jan 2016 08:41:03 +0000 (16:41 +0800)]
nbd-server: Coroutine based negotiation
Create a coroutine in nbd_client_new, so that nbd_send_negotiate doesn't
need qemu_set_block().
Handlers need to be set temporarily for csock fd in case the coroutine
yields during I/O.
With this, if the other end disappears in the middle of the negotiation,
we don't block the whole event loop.
To make the code clearer, unify all function names that belong to
negotiate, so they are less likely to be misused. This is important
because we rely on negotiation staying in main loop, as commented in
nbd_negotiate_read/write().
Fam Zheng [Thu, 14 Jan 2016 08:41:01 +0000 (16:41 +0800)]
nbd: Always call "close_fn" in nbd_client_new
Rename the parameter "close" to "close_fn" to disambiguous with
close(2).
This unifies error handling paths of NBDClient allocation:
nbd_client_new will shutdown the socket and call the "close_fn" callback
if negotiation failed, so the caller don't need a different path than
the normal close.
The returned pointer is never used, make it void in preparation for the
next patch.