Kevin Wolf [Thu, 2 Mar 2017 14:26:18 +0000 (15:26 +0100)]
block: Fix blockdev-snapshot error handling
For blockdev-snapshot, external_snapshot_prepare() accepts an arbitrary
node reference at first and only checks later whether it already has a
backing file. Between those places, other errors can occur.
Therefore checking in external_snapshot_abort() whether state->new_bs
has a backing file is not sufficient to tell whether bdrv_append() was
already completed or not. Trying to undo the bdrv_append() when it
wasn't even executed is wrong.
Introduce a new boolean flag in the state to fix this.
Kevin Wolf [Mon, 6 Mar 2017 15:03:00 +0000 (16:03 +0100)]
mirror: Fix permissions for removing mirror_top_bs
mirror_top_bs takes write permissions on its backing file, which can
make it impossible to attach that backing file node to another parent.
However, this is exactly what needs to be done in order to remove
mirror_top_bs from the backing chain. So give up the write permission
first.
Kevin Wolf [Thu, 2 Mar 2017 16:48:14 +0000 (17:48 +0100)]
mirror: Fix permission problem with 'replaces'
The 'replaces' option of drive-mirror can be used to mirror a Quorum
node to a new image and then let the target image replace one of the
Quorum children. In order for this graph modification to succeed, the
mirror job needs to lift its restrictions on the target node first
before actually replacing the child.
Kevin Wolf [Fri, 3 Mar 2017 15:54:21 +0000 (16:54 +0100)]
commit: Fix error handling
Apparently some kind of mismerge happened in commit 8dfba279, which
broke the error handling without any real reason by removing the
assignment of the return value to ret in a blk_insert_bs() call.
Peter Maydell [Tue, 7 Mar 2017 09:09:53 +0000 (09:09 +0000)]
Merge remote-tracking branch 'remotes/gkurz/tags/fixes-for-2.9' into staging
Fixes issues that got merged with the latest pull request:
- missing O_NOFOLLOW flag for CVE-2016-960
- build break with older glibc that don't have O_PATH and AT_EMPTY_PATH
- various bugs reported by Coverity
# gpg: Signature made Mon 06 Mar 2017 17:51:29 GMT
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Gregory Kurz (Groug) <[email protected]>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/fixes-for-2.9:
9pfs: fix vulnerability in openat_dir() and local_unlinkat_common()
9pfs: fix O_PATH build break with older glibc versions
9pfs: don't use AT_EMPTY_PATH in local_set_cred_passthrough()
9pfs: fail local_statfs() earlier
9pfs: fix fd leak in local_opendir()
9pfs: fix bogus fd check in local_remove()
* remotes/mdroth/tags/qga-pull-2017-03-06-tag:
tests: check path to avoid a failing qga/get-vcpus test
qga: ignore EBUSY when freezing a filesystem
qga: add systemd socket activation support
Greg Kurz [Mon, 6 Mar 2017 16:34:01 +0000 (17:34 +0100)]
9pfs: fix O_PATH build break with older glibc versions
When O_PATH is used with O_DIRECTORY, it only acts as an optimization: the
openat() syscall simply finds the name in the VFS, and doesn't trigger the
underlying filesystem.
On systems that don't define O_PATH, because they have glibc version 2.13
or older for example, we can safely omit it. We don't want to deactivate
O_PATH globally though, in case it is used without O_DIRECTORY. The is done
with a dedicated macro.
Systems without O_PATH may thus fail to resolve names that involve
unreadable directories, compared to newer systems succeeding, but such
corner case failure is our only option on those older systems to avoid
the security hole of chasing symlinks inappropriately.
Greg Kurz [Mon, 6 Mar 2017 16:34:01 +0000 (17:34 +0100)]
9pfs: don't use AT_EMPTY_PATH in local_set_cred_passthrough()
The name argument can never be an empty string, and dirfd always point to
the containing directory of the file name. AT_EMPTY_PATH is hence useless
here. Also it breaks build with glibc version 2.13 and older.
It is actually an oversight of a previous tentative patch to implement this
function. We can safely drop it.
Greg Kurz [Mon, 6 Mar 2017 16:34:01 +0000 (17:34 +0100)]
9pfs: fix bogus fd check in local_remove()
This was spotted by Coverity as a fd leak. This is certainly true, but also
local_remove() would always return without doing anything, unless the fd is
zero, which is very unlikely.
Peter Maydell [Mon, 6 Mar 2017 15:13:23 +0000 (15:13 +0000)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Mon 06 Mar 2017 04:15:17 GMT
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
net/filter-mirror: Follow CODING_STYLE
COLO-compare: Fix icmp and udp compare different packet always dump bug
COLO-compare: Optimize compare_common and compare_tcp
COLO-compare: Rename compare function and remove duplicate codes
filter-rewriter: skip net_checksum_calculate() while offset = 0
net/colo: fix memory double free error
vmxnet3: VMStatify rx/tx q_descr and int_state
vmxnet3: Convert ring values to uint32_t's
net/colo-compare: Fix memory free error
colo-compare: Fix removing fds been watched incorrectly in finalization
char: remove the right fd been watched in qemu_chr_fe_set_handlers()
colo-compare: kick compare thread to exit after some cleanup in finalization
colo-compare: use g_timeout_source_new() to process the stale packets
NetRxPkt: Remove code duplication in net_rx_pkt_pull_data()
NetRxPkt: Account buffer with ETH header in IOV length
NetRxPkt: Do not try to pull more data than present
NetRxPkt: Fix memory corruption on VLAN header stripping
eth: Extend vlan stripping functions
net: Remove useless local var pkt
Peter Maydell [Mon, 6 Mar 2017 13:06:30 +0000 (13:06 +0000)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170306' into staging
ppc patch queue for 2017-03-06
Looks like my previous batch wasn't quite the last before hard freeze.
This has a handful of bugfixes to go in. They're all genuine
bugfixes, though not regressions in some cases.
* remotes/dgibson/tags/ppc-for-2.9-20170306:
target/ppc: use helper for excp handling
target/ppc: fmadd: add macro for updating flags
target/ppc: fmadd check for excp independently
spapr: ensure that all threads within core are on the same NUMA node
ppc/xics: register reset handlers for the ICP and ICS objects
Bruce Rogers [Thu, 2 Mar 2017 19:44:37 +0000 (12:44 -0700)]
tests: check path to avoid a failing qga/get-vcpus test
The qga/get-vcpus test fails in a simple chroot environment, as
used in an openSUSE Build Service local build, so first check
that the sysfs based path exists in order to avoid calling this
test in an environment where it won't work right.
Peter Lieven [Tue, 31 Jan 2017 15:36:34 +0000 (16:36 +0100)]
qga: ignore EBUSY when freezing a filesystem
the current implementation fails if we try to freeze an
already frozen filesystem. This can happen if a filesystem
is mounted more than once (e.g. with a bind mount).
Stefan Hajnoczi [Fri, 6 Jan 2017 15:29:30 +0000 (15:29 +0000)]
qga: add systemd socket activation support
AF_UNIX and AF_VSOCK listen sockets can be passed in by systemd on
startup. This allows systemd to manage the listen socket until the
first client connects and between restarts. Advantages of socket
activation are that parallel startup of network services becomes
possible and that unused daemons do not consume memory.
The key to achieving this is the LISTEN_FDS environment variable, which
is a stable ABI as shown here:
https://www.freedesktop.org/wiki/Software/systemd/InterfacePortabilityAndStabilityChart/
We could link against libsystemd and use sd_listen_fds(3) but it's easy
to implement the tiny LISTEN_FDS ABI so that qemu-ga does not depend on
libsystemd. Some systems may not have systemd installed and wish to
avoid the dependency. Other init systems or socket activation servers
may implement the same ABI without systemd involvement.
Zhang Chen [Thu, 2 Mar 2017 09:54:17 +0000 (17:54 +0800)]
COLO-compare: Optimize compare_common and compare_tcp
Add offset args for colo_packet_compare_common, optimize
colo_packet_compare_icmp() and colo_packet_compare_udp()
just compare the IP payload. Before compare all tcp packet,
we compare tcp checksum firstly, this function can get
better performance.
Zhang Chen [Thu, 2 Mar 2017 09:54:16 +0000 (17:54 +0800)]
COLO-compare: Rename compare function and remove duplicate codes
Rename colo_packet_compare() to colo_packet_compare_common() that
make tcp_compare udp_compare icmp_compare reuse this function.
Remove minimum packet size check in icmp_compare, because we have
check this in parse_packet_early().
zhanghailiang [Tue, 28 Feb 2017 03:54:19 +0000 (11:54 +0800)]
filter-rewriter: skip net_checksum_calculate() while offset = 0
While the offset of packets's sequence for primary side and
secondary side is zero, it is unnecessary to call net_checksum_calculate()
to recalculate the checksume value of packets.
zhanghailiang [Tue, 28 Feb 2017 03:54:18 +0000 (11:54 +0800)]
net/colo: fix memory double free error
The 'primary_list' and 'secondary_list' members of struct Connection
is not allocated through dynamically g_queue_new(), but we free it by using
g_queue_free(), which will lead to a double-free bug.
The index's in the Vmxnet3Ring were migrated as 32bit ints
yet are declared as size_t's. They appear to be derived
from 32bit values loaded from guest memory, so actually
store them as that.
zhanghailiang [Fri, 17 Feb 2017 02:53:14 +0000 (10:53 +0800)]
colo-compare: Fix removing fds been watched incorrectly in finalization
We will catch the bellow error report while try to delete compare object
by qmp command:
chardev/char-io.c:91: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed.
This is caused by failing to remove the right fd been watched while
call qemu_chr_fe_set_handlers();
Fix it by pass the worker_context parameter to qemu_chr_fe_set_handlers().
zhanghailiang [Fri, 17 Feb 2017 02:53:13 +0000 (10:53 +0800)]
char: remove the right fd been watched in qemu_chr_fe_set_handlers()
We can call qemu_chr_fe_set_handlers() to add/remove fd been watched
in 'context' which can be either default main context or other explicit
context. But the original logic is not correct, we didn't remove
the right fd because we call g_main_context_find_source_by_id(NULL, tag)
which always try to find the Gsource from default context.
Fix it by passing the right context to g_main_context_find_source_by_id().
zhanghailiang [Fri, 17 Feb 2017 02:53:12 +0000 (10:53 +0800)]
colo-compare: kick compare thread to exit after some cleanup in finalization
We should call g_main_loop_quit() to notify colo compare thread to
exit, Or it will run in g_main_loop_run() forever.
Besides, the finalizing process can't happen in context of colo thread,
it is reasonable to remove the 'if (qemu_thread_is_self(&s->thread))'
branch.
Before compare thead exits, some cleanup works need to be
done, All unhandled packets need to be released and connection_track_table
needs to be freed, or there will be memory leak.
zhanghailiang [Fri, 17 Feb 2017 02:53:11 +0000 (10:53 +0800)]
colo-compare: use g_timeout_source_new() to process the stale packets
Instead of using qemu timer to process the stale packets,
We re-use the colo compare thread to process these packets
by creating a new timeout coroutine.
Besides, since we process all the same vNIC's net connection/packets
in one thread, it is safe to remove the timer_check_lock.
Dmitry Fleytman [Thu, 16 Feb 2017 12:29:33 +0000 (14:29 +0200)]
NetRxPkt: Fix memory corruption on VLAN header stripping
This patch fixed a problem that was introduced in commit eb700029.
When net_rx_pkt_attach_iovec() calls eth_strip_vlan()
this can result in pkt->ehdr_buf being overflowed, because
ehdr_buf is only sizeof(struct eth_header) bytes large
but eth_strip_vlan() can write
sizeof(struct eth_header) + sizeof(struct vlan_header)
bytes into it.
Igor Mammedov [Fri, 24 Feb 2017 09:26:56 +0000 (10:26 +0100)]
spapr: ensure that all threads within core are on the same NUMA node
Threads within a core shouldn't be on different
NUMA nodes, so if user has misconfgured command
line, fail QEMU at start up to force user fix it.
For now use the first thread on the core as source
of core's node-id. Later when cpu-numa refactoring
lands it will be switched to core's node-id from
possible_cpus[].
This prevents the same problems as commit 20bb648d
"spapr: Fix default NUMA node allocation for threads",
but for the case of manually configured NUMA node
mappings, instead of just the default case.
ppc/xics: register reset handlers for the ICP and ICS objects
The recent changes on the XICS layer removed the XICSState object to
let the sPAPR machine handle the ICP and ICS directly. The reset of
these objects was previously handled by XICSState, which was a SysBus
device, and to keep the same behavior, the ICP and ICS were assigned
to SysbBus.
But that broke the 'info qtree' command in the monitor. 'qtree'
performs a loop on the children of a bus to print their properties and
SysBus devices are expected to be found under SysBus, which is not the
case anymore.
The fix for this problem is to register reset handlers for the ICP and
ICS objects and stop using SysBus for such devices.
When you try to visit beyond the end of a list, the qobject input
visitor crashes, and the string visitor screws returns garbage. The
generated list visits never go beyond the list end, but manual visits
could.
qapi: Make input visitors detect unvisited list tails
Fix the design flaw demonstrated in the previous commit: new method
check_list() lets input visitors report that unvisited input remains
for a list, exactly like check_struct() lets them report that
unvisited input remains for a struct or union.
Implement the method for the qobject input visitor (straightforward),
and the string input visitor (less so, due to the magic list syntax
there). The opts visitor's list magic is even more impenetrable, and
all I can do there today is a stub with a FIXME comment. No worse
than before.
Demonstrates a design flaw: there is no way to for input visitors to
report that a list visit didn't visit the complete input list. The
generated list visits always do, but manual visits needn't.
qapi: Drop unused non-strict qobject input visitor
The split between tests/test-qobject-input-visitor.c and
tests/test-qobject-input-strict.c now makes less sense than ever. The
next commit will take care of that.
The qobject input visitor comes in a strict and a non-strict variant.
This test is the non-strict variant's last user. Turns out it relies
on non-strict only in test_visitor_in_null(), and just out of
laziness. We don't actually test the non-strict behavior.
Clean up test_visitor_in_null(), and switch to the strict variant.
The next commit will drop the non-strict variant.
qom: Make object_property_set_qobject()'s input visitor strict
Commit 240f64b made all qobject input visitors created outside tests
strict, except for the one in object_property_set_qobject(). That one
was left behind only because Eric couldn't spare the time to figure
out whether making it strict would break anything, with a TODO
comment. Time to resolve it.
Strict makes a difference only for otherwise successful visits of QAPI
structs or unions. Let's examine what the callers of
object_property_set_qobject() visit:
* qmp_qom_set visits its @value argument. Comes straight from QMP and
can be anything ('any' in the QAPI schema). Strictness matters when
the property's set() method visits a struct or union QAPI type.
No such methods exist, thus switching to strict can't break
anything.
If we acquire such methods in the future, we'll *want* the visitor
to be strict, so that unexpected members get rejected as they should
be.
qapi: Make string input and opts visitor require non-null input
The string input visitor tries to cope with null input. Null input
isn't used anywhere, and isn't covered by tests. Unsurprisingly, it
doesn't fully work: start_list() crashes because it passes the input
via parse_str() to strtoll() unchecked.
Make string_input_visitor_new() assert its argument isn't null, and
drop the code trying to deal with null input.
The opts visitor crashes when you try to actually visit something with
null input. Make opts_visitor_new() assert its argument isn't null,
mostly for clarity.
qobject_input_visitor_new() already asserts its argument isn't null.
visit_optional() is to be called only between visit_start_struct() and
visit_end_struct(). Visitors that don't support struct visits,
i.e. don't implement start_struct(), end_struct(), have no use for it.
Clarify documentation.
The string input visitor doesn't support struct visits. Its
parse_optional() is therefore useless. Drop it.
Error messages refer to nodes of the QObject being visited by name.
Trouble is the names are sometimes less than helpful:
* The name of the root QObject is whatever @name argument got passed
to the visitor, except NULL gets mapped to "null". We commonly pass
NULL. Not good.
Avoiding errors "at the root" mitigates. For instance,
visit_start_struct() can only fail when the visited object is not a
dictionary, and we commonly ensure it is beforehand.
* The name of a QDict's member is the member key. Good enough only
when this happens to be unique.
* The name of a QList's member is "null". Not good.
Improve error messages by referring to nodes by path instead, as
follows:
* The path of the root QObject is whatever @name argument got passed
to the visitor, except NULL gets mapped to "<anonymous>".
* The path of a root QDict's member is the member key.
* The path of a root QList's member is "[%u]", where %u is the list
index, starting at zero.
* The path of a non-root QDict's member is the path of the QDict
concatenated with "." and the member key.
* The path of a non-root QList's member is the path of the QList
concatenated with "[%u]", where %u is the list index.
{"error": {"class": "GenericError", "desc": "Invalid parameter type for 'events[0]', expected: object"}}
instead of
{"error": {"class": "GenericError", "desc": "Invalid parameter type for 'null', expected: QDict"}}
Aside: calling the thing "parameter" is suboptimal for QMP, because
the root object is "arguments" there.
The qobject output visitor doesn't have this problem because it should
not fail. Same for dealloc and clone visitors.
The string visitors don't have this problem because they visit just
one value, whose name needs to be passed to the visitor as @name. The
string output visitor shouldn't fail anyway.
The options visitor uses QemuOpts names. Their name space is flat, so
the use of QDict member keys as names is fine. NULL names used with
roots and lists could conceivably result in bad error messages. Left
for another day.
qapi: Improve a QObject input visitor error message
The QObject input visitor has three error message formats:
* Parameter '%s' is missing
* "Invalid parameter type for '%s', expected: %s"
* "QMP input object member '%s' is unexpected"
The '%s' are member names (or "null", but I'll fix that later).
The last error message calls the thing "QMP input object member"
instead of "parameter". Misleading when the visitor is used on
QObjects that don't come from QMP. Change it to "Parameter '%s' is
unexpected".
The QERR_ macros are leftovers from the days of "rich" error objects.
QERR_QMP_BAD_INPUT_OBJECT, QERR_QMP_BAD_INPUT_OBJECT_MEMBER,
QERR_QMP_EXTRA_MEMBER are used in just one place now, except for one
use that has crept into qobject-input-visitor.c.
Drop these macros, to make the (bad) error messages more visible.
qmp_check_input_obj() duplicates qmp_dispatch_check_obj(), except the
latter screws up an error message. handle_qmp_command() runs first
the former, then the latter via qmp_dispatch(), masking the screwup.
qemu-ga also masks the screwup, because it also duplicates checks,
just differently.
qmp_check_input_obj() exists because handle_qmp_command() needs to
examine the command before dispatching it. The previous commit got
rid of this need, except for a tracepoint, and a bit of "id" code that
relies on qdict not being null.
Fix up the error message in qmp_dispatch_check_obj(), drop
qmp_check_input_obj() and the tracepoint. Protect the "id" code with
a conditional.
qmp: Clean up how we enforce capability negotiation
To enforce capability negotiation before normal operation,
handle_qmp_command() inspects every command before it's handed off to
qmp_dispatch(). This is a bit of a layering violation, and results in
duplicated code.
Before capability negotiation (!cur_mon->in_command_mode), we fail
commands other than "qmp_capabilities". This is what enforces
capability negotiation.
Afterwards, we fail command "qmp_capabilities".
Clean this up as follows.
The obvious place to fail a command is the command itself, so move the
"afterwards" check to qmp_qmp_capabilities().
We do the "before" check in every other command, but that would be
bothersome. Instead, start with an alternate list of commands that
contains only "qmp_capabilities". Switch to the full list in
qmp_qmp_capabilities().
Additionally, replace the generic human-readable error message for
CommandNotFound by one that reminds the user to run qmp_capabilities.
Without that, we'd regress commit 2d5a834.
qapi: Support multiple command registries per program
The command registry encapsulates a single command list. Give the
functions using it a parameter instead. Define suitable command lists
in monitor, guest agent and test-qmp-commands.
qmp: Dumb down how we run QMP command registration
The way we get QMP commands registered is high tech:
* qapi-commands.py generates qmp_init_marshal() that does the actual work
* it also generates the magic to register it as a MODULE_INIT_QAPI
function, so it runs when someone calls
module_call_init(MODULE_INIT_QAPI)
* main() calls module_call_init()
QEMU needs to register a few non-qapified commands. Same high tech
works: monitor.c has its own qmp_init_marshal() along with the magic
to make it run in module_call_init(MODULE_INIT_QAPI).
QEMU also needs to unregister commands that are not wanted in this
build's configuration (commit 5032a16). Simple enough:
qmp_unregister_commands_hack(). The difficulty is to make it run
after the generated qmp_init_marshal(). We can't simply run it in
monitor.c's qmp_init_marshal(), because the order in which the
registered functions run is indeterminate. So qmp_init_marshal()
registers qmp_unregister_commands_hack() separately. Since
registering *appends* to the list of registered functions, this will
make it run after all the functions that have been registered already.
I suspect it takes a long and expensive computer science education to
not find this silly.
Dumb it down as follows:
* Drop MODULE_INIT_QAPI entirely
* Give the generated qmp_init_marshal() external linkage.
* Call it instead of module_call_init(MODULE_INIT_QAPI)
* Except in QEMU proper, call new monitor_init_qmp_commands() that in
turn calls the generated qmp_init_marshal(), registers the
additional commands and unregisters the unwanted ones.
The next commit is going to add a test that calls qmp("null").
Curiously, this hangs. Here's why.
qmp_fd_sendv() doesn't send newlines. Not even when @fmt contains
some. At first glance, the QMP parser seems to be fine with that.
However, it turns out that it fails to react to input until it sees
either a newline, an object or an array. To reproduce, feed to a QMP
monitor like this:
The value of key 'arguments' must be a JSON object. qemu-ga neglects
to check, and crashes. To reproduce, send
{ 'execute': 'guest-sync', 'arguments': [] }
to qemu-ga.
do_qmp_dispatch() uses qdict_get_qdict() to get the arguments. When
not a JSON object, this gets a null pointer, which flows through the
generated marshalling function to qobject_input_visitor_new(), where
it fails the assertion. qmp_dispatch_check_obj() needs to catch this
error.
QEMU isn't affected, because it runs qmp_check_input_obj() first,
which basically duplicates qmp_dispatch_check_obj()'s checks, plus the
missing one.
Fix by copying the missing one from qmp_check_input_obj() to
qmp_dispatch_check_obj().
Peter Maydell [Sat, 4 Mar 2017 16:31:14 +0000 (16:31 +0000)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170303' into staging
ppc patch queuye for 2017-03-03
This will probably be my last pull request before the hard freeze. It
has some new work, but that has all been posted in draft before the
soft freeze, so I think it's reasonable to include in qemu-2.9.
This batch has:
* A substantial amount of POWER9 work
* Implements the legacy (hash) MMU for POWER9
* Some more preliminaries for implementing the POWER9 radix
MMU
* POWER9 has_work
* Basic POWER9 compatibility mode handling
* Removal of some premature tests
* Some cleanups and fixes to the existing MMU code to make the
POWER9 work simpler
* A bugfix for TCG multiply adds on power
* Allow pseries guests to access PCIe extended config space
This also includes a code-motion not strictly in ppc code - moving
getrampagesize() from ppc code to exec.c. This will make some future
VFIO improvements easier, Paolo said it was ok to merge via my tree.
* remotes/dgibson/tags/ppc-for-2.9-20170303:
target/ppc: rewrite f[n]m[add,sub] using float64_muladd
spapr: Small cleanup of PPC MMU enums
spapr_pci: Advertise access to PCIe extended config space
target/ppc: Rework hash mmu page fault code and add defines for clarity
target/ppc: Move no-execute and guarded page checking into new function
target/ppc: Add execute permission checking to access authority check
target/ppc: Add Instruction Authority Mask Register Check
hw/ppc/spapr: Add POWER9 to pseries cpu models
target/ppc/POWER9: Add cpu_has_work function for POWER9
target/ppc/POWER9: Add POWER9 pa-features definition
target/ppc/POWER9: Add POWER9 mmu fault handler
target/ppc: Don't gen an SDR1 on POWER9 and rework register creation
target/ppc: Add patb_entry to sPAPRMachineState
target/ppc/POWER9: Add POWERPC_MMU_V3 bit
powernv: Don't test POWER9 CPU yet
exec, kvm, target-ppc: Move getrampagesize() to common code
target/ppc: Add POWER9/ISAv3.00 to compat_table
* remotes/bonzini/tags/for-upstream: (21 commits)
iscsi: fix missing unlock
memory: show region offset and ROM/RAM type in "info mtree -f"
x86: Work around SMI migration breakages
spice-char: fix segfault in char_spice_finalize
vl: disable default cdrom when using explicitely scsi-hd
memory: Introduce DEVICE_HOST_ENDIAN for ram device
qmp-events: fix GUEST_PANICKED description formatting
qapi: flatten GuestPanicInformation union
vmxcap: update for September 2016 SDM
vmxcap: port to Python 3
KVM: use KVM_CAP_IMMEDIATE_EXIT
kvm: use atomic_read/atomic_set to access cpu->exit_request
KVM: move SIG_IPI handling to kvm-all.c
KVM: do not use sigtimedwait to catch SIGBUS
KVM: remove kvm_arch_on_sigbus
cpus: reorganize signal handling code
KVM: x86: cleanup SIGBUS handlers
cpus: remove ugly cast on sigbus_handler
cpu-exec: remove unnecessary check of cpu->exit_request
replay: check icount in cpu exec loop
...
Paolo Bonzini [Thu, 2 Mar 2017 21:49:41 +0000 (22:49 +0100)]
memory: show region offset and ROM/RAM type in "info mtree -f"
"info mtree -f" output is currently hard to use for large RAM regions, because
there is no hint as to what part of the region is being mapped. Add the offset
if it is nonzero.
Secondly, FlatView has a readonly field, that can override the MemoryRegion
in the presence of aliases. Take it into account.
Migration from a 2.3.0 qemu results in a reboot on the receiving QEMU
due to a disagreement about SM (System management) interrupts.
2.3.0 didn't have much SMI support, but it did set CPU_INTERRUPT_SMI
and this gets into the migration stream, but on 2.3.0 it
never got delivered.
~2.4.0 SMI interrupt support was added but was broken - so
that when a 2.3.0 stream was received it cleared the CPU_INTERRUPT_SMI
but never actually caused an interrupt.
The SMI delivery was recently fixed by 68c6efe07a, but the
effect now is that an incoming 2.3.0 stream takes the interrupt it
had flagged but it's bios can't actually handle it(I think
partly due to the original interrupt not being taken during boot?).
The consequence is a triple(?) fault and a reboot.
Li Qiang [Tue, 21 Feb 2017 08:18:27 +0000 (00:18 -0800)]
spice-char: fix segfault in char_spice_finalize
In 'qemu_chr_open_spice_vmc' if the 'psubtype' is NULL, it will
call 'char_spice_finalize'. But as the SpiceChardev is not inserted
in the 'spice_chars' list, the 'QLIST_REMOVE' will cause a segfault.
Add a detect to avoid it.
Hervé Poussineau [Mon, 20 Feb 2017 20:41:19 +0000 (21:41 +0100)]
vl: disable default cdrom when using explicitely scsi-hd
In commit af6bf1328ef90fae617857c02697e0174b84d596 (May 2011),
ide-hd, ide-cd and scsi-cd have been added to disable default cdrom,
"or else you can't put one on secondary master without -nodefaults".
Make it the same for scsi-hd, so you can put one on scsi-id 2 without
using -nodefaults.
scsi-hd has probably been forgotten, as it has been added in the
preceding commit (b443ae67130d32ad06b06fc9aa6d04d05ccd93ce).
Affected users are the ones using a machine with SCSI devices and start QEMU
with -device scsi-hd but without -device scsi-cd or -cdrom
In that case, the default cdrom device will disappear instead of being empty.
Yongji Xie [Mon, 27 Feb 2017 04:52:44 +0000 (12:52 +0800)]
memory: Introduce DEVICE_HOST_ENDIAN for ram device
At the moment ram device's memory regions are DEVICE_NATIVE_ENDIAN. It's
incorrect. This memory region is backed by a MMIO area in host, so the
uint64_t data that MemoryRegionOps read from/write to this area should be
host-endian rather than target-endian. Hence, current code does not work
when target and host endianness are different which is the most common case
on PPC64. To fix it, this introduces DEVICE_HOST_ENDIAN for the ram device.
This has been tested on PPC64 BE/LE host/guest in all possible combinations
including TCG.
Paolo Bonzini [Wed, 8 Feb 2017 12:52:50 +0000 (13:52 +0100)]
KVM: use KVM_CAP_IMMEDIATE_EXIT
The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
a VCPU out of KVM_RUN through a POSIX signal. A signal is attached
to a dummy signal handler; by blocking the signal outside KVM_RUN and
unblocking it inside, this possible race is closed:
VCPU thread service thread
--------------------------------------------------------------
check flag
set flag
raise signal
(signal handler does nothing)
KVM_RUN
However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
tsk->sighand->siglock on every KVM_RUN. This lock is often on a
remote NUMA node, because it is on the node of a thread's creator.
Taking this lock can be very expensive if there are many userspace
exits (as is the case for SMP Windows VMs without Hyper-V reference
time counter).
KVM_CAP_IMMEDIATE_EXIT provides an alternative, where the flag is
placed directly in kvm_run so that KVM can see it:
VCPU thread service thread
--------------------------------------------------------------
raise signal
signal handler
set run->immediate_exit
KVM_RUN
check run->immediate_exit
The previous patches changed QEMU so that the only blocked signal is
SIG_IPI, so we can now stop using KVM_SET_SIGNAL_MASK and sigtimedwait
if KVM_CAP_IMMEDIATE_EXIT is available.
On a 14-VCPU guest, an "inl" operation goes down from 30k to 6k on
an unlocked (no BQL) MemoryRegion, or from 30k to 15k if the BQL
is involved.
Paolo Bonzini [Wed, 8 Feb 2017 11:48:54 +0000 (12:48 +0100)]
KVM: do not use sigtimedwait to catch SIGBUS
Call kvm_on_sigbus_vcpu asynchronously from the VCPU thread.
Information for the SIGBUS can be stored in thread-local variables
and processed later in kvm_cpu_exec.
Paolo Bonzini [Thu, 9 Feb 2017 09:04:34 +0000 (10:04 +0100)]
KVM: remove kvm_arch_on_sigbus
Build it on kvm_arch_on_sigbus_vcpu instead. They do the same
for "action optional" SIGBUSes, and the main thread should never get
"action required" SIGBUSes because it blocks the signal.
Paolo Bonzini [Thu, 9 Feb 2017 08:50:02 +0000 (09:50 +0100)]
cpus: reorganize signal handling code
Move the KVM "eat signals" code under CONFIG_LINUX, in preparation
for moving it to kvm-all.c; reraise non-MCE SIGBUS immediately,
without passing it to KVM.
Paolo Bonzini [Wed, 8 Feb 2017 12:22:12 +0000 (13:22 +0100)]
cpus: remove ugly cast on sigbus_handler
The cast is there because sigbus_handler is invoked via sigfd_handler.
But it feels just wrong to use struct qemu_signalfd_siginfo in the
prototype of a function that is passed to sigaction.
Instead, do a simple-minded conversion of qemu_signalfd_siginfo to
siginfo_t.
Peter Maydell [Fri, 3 Mar 2017 14:04:27 +0000 (14:04 +0000)]
Merge remote-tracking branch 'remotes/dgibson/tags/submodule-update-20170303' into staging
submodule updates (SLOF & dtc) 2017-03-03
This set of patches updates the SLOF and dtc submodules for qemu-2.9.
The SLOF update could have gone in my ppc pull request earlier today,
but I forgot it. It should be safe to apply in either order with that
set though.
The dtc (and libfdt) update brings us up to dtc 1.4.3 which includes
some things that will be useful in future.