Git Repo - qemu.git/log

nbd: Implement NBD_CMD_WRITE_ZEROES on server

Upstream NBD protocol recently added the ability to efficiently
write zeroes without having to send the zeroes over the wire,
along with a flag to control whether the client wants to allow
a hole.

Note that when it comes to requiring full allocation, vs.
permitting optimizations, the NBD spec intentionally picked a
different sense for the flag; the rules in qemu are:
MAY_UNMAP == 0: must write zeroes
MAY_UNMAP == 1: may use holes if reads will see zeroes

while in NBD, the rules are:
FLAG_NO_HOLE == 1: must write zeroes
FLAG_NO_HOLE == 0: may use holes if reads will see zeroes

In all cases, the 'may use holes' scenario is optional (the
server need not use a hole, and must not use a hole if
subsequent reads would not see zeroes).

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Improve server handling of shutdown requests

NBD commit 6d34500b clarified how clients and servers are supposed
to behave before closing a connection. It added NBD_REP_ERR_SHUTDOWN
(for the server to announce it is about to go away during option
haggling, so the client should quit sending NBD_OPT_* other than
NBD_OPT_ABORT) and ESHUTDOWN (for the server to announce it is about
to go away during transmission, so the client should quit sending
NBD_CMD_* other than NBD_CMD_DISC). It also clarified that
NBD_OPT_ABORT gets a reply, while NBD_CMD_DISC does not.

This patch merely adds the missing reply to NBD_OPT_ABORT and teaches
the client to recognize server errors. Actually teaching the server
to send NBD_REP_ERR_SHUTDOWN or ESHUTDOWN would require knowing that
the server has been requested to shut down soon (maybe we could do
that by installing a SIGINT handler in qemu-nbd, which transitions
from RUNNING to a new state that waits for the client to react,
rather than just out-right quitting - but that's a bigger task for
another day).

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
[Move dummy ESHUTDOWN to include/qemu/osdep.h. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Refactor conversion to errno to silence checkpatch

Checkpatch complains that 'return EINVAL' is usually wrong
(since we tend to favor 'return -EINVAL'). But it is a
false positive for nbd_errno_to_system_errno(). Since NBD
may add future defined wire values, refactor the code to
keep checkpatch happy.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Support shorter handshake

The NBD Protocol allows the server and client to mutually agree
on a shorter handshake (omit the 124 bytes of reserved 0), via
the server advertising NBD_FLAG_NO_ZEROES and the client
acknowledging with NBD_FLAG_C_NO_ZEROES (only possible in
newstyle, whether or not it is fixed newstyle). It doesn't
shave much off the wire, but we might as well implement it.

Signed-off-by: Eric Blake <[email protected]>
Reviewed-by: Alex Bligh <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Less allocation during NBD_OPT_LIST

Since we know that the maximum name we are willing to accept
is small enough to stack-allocate, rework the iteration over
NBD_OPT_LIST responses to reuse a stack buffer rather than
allocating every time. Furthermore, we don't even have to
allocate if we know the server's length doesn't match what
we are searching for.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Let client skip portions of server reply

The server has a nice helper function nbd_negotiate_drop_sync()
which lets it easily ignore fluff from the client (such as the
payload to an unknown option request). We can't quite make it
common, since it depends on nbd_negotiate_read() which handles
coroutine magic, but we can copy the idea into the client where
we have places where we want to ignore data (such as the
description tacked on the end of NBD_REP_SERVER).

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Let server know when client gives up negotiation

The NBD spec says that a client should send NBD_OPT_ABORT
rather than just dropping the connection, if the client doesn't
like something the server sent during option negotiation. This
is a best-effort attempt only, and can only be done in places
where we know the server is still in sync with what we've sent,
whether or not we've read everything the server has sent.
Technically, the server then has to reply with NBD_REP_ACK, but
it's not worth complicating the client to wait around for that
reply.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Share common option-sending code in client

Rather than open-coding each option request, it's easier to
have common helper functions do the work. That in turn requires
having convenient packed types for handling option requests
and replies.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Send message along with server NBD_REP_ERR errors

The NBD Protocol allows us to send human-readable messages
along with any NBD_REP_ERR error during option negotiation;
make use of this fact for clients that know what to do with
our message.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Share common reply-sending code in server

Rather than open-coding NBD_REP_SERVER, reuse the code we
already have by adding a length parameter. Additionally,
the refactoring will make adding NBD_OPT_GO in a later patch
easier.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Rename struct nbd_request and nbd_reply

Our coding convention prefers CamelCase names, and we already
have other existing structs with NBDFoo naming. Let's be
consistent, before later patches add even more structs.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Rename NbdClientSession to NBDClientSession

It's better to use consistent capitalization of the namespace
used for NBD functions; we have more instances of NBD* than
Nbd*.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Rename NBDRequest to NBDRequestData

We have both 'struct NBDRequest' and 'struct nbd_request'; making
it confusing to see which does what. Furthermore, we want to
rename nbd_request to align with our normal CamelCase naming
conventions. So, rename the struct which is used to associate
the data received during request callbacks, while leaving the
shorter name for the description of the request sent over the
wire in the NBD protocol.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Treat flags vs. command type as separate fields

Current upstream NBD documents that requests have a 16-bit flags,
followed by a 16-bit type integer; although older versions mentioned
only a 32-bit field with masking to find flags. Since the protocol
is in network order (big-endian over the wire), the ABI is unchanged;
but dealing with the flags as a separate field rather than masking
will make it easier to add support for upcoming NBD extensions that
increase the number of both flags and commands.

Improve some comments in nbd.h based on the current upstream
NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md),
and touch some nearby code to keep checkpatch.pl happy.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nbd: Add qemu-nbd -D for human-readable description

The NBD protocol allows servers to advertise a human-readable
description alongside an export name during NBD_OPT_LIST. Add
an option to pass through the user's string to the NBD client.

Doing this also makes it easier to test commit 200650d4, which
is the client counterpart of receiving the description.

Signed-off-by: Eric Blake <[email protected]>
Message-Id: <1476469998 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

exec.c: check memory backend file size with 'size' option

If the memory backend file is not large enough to hold the required 'size',
Qemu will report error and exit.

Signed-off-by: Haozhong Zhang <[email protected]>
Message-Id: <20161027042300 [email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Message-Id: <20161102010551 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

exec.c: do not truncate non-empty memory backend file

For '-object memory-backend-file,mem-path=foo,size=xyz', if the size of
file 'foo' does not match the given size 'xyz', the current QEMU will
truncate the file to the given size, which may corrupt the existing data
in that file. To avoid such data corruption, this patch disables
truncating non-empty backend files.

Signed-off-by: Haozhong Zhang <[email protected]>
Message-Id: <20161027042300 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

exec.c: ensure all AddressSpaceDispatch updates under RCU

The memory_dispatch field is meant to be protected by RCU so we should
use the correct primitives when accessing it. This race was flagged up
by the ThreadSanitizer.

Signed-off-by: Alex BennÃ©e <[email protected]>
Message-Id: <20161021153418 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

tests: send error_report to test log

Implement error_vprintf to send the output of error_report to
the test log. This silences test-vmstate.

Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <1477326663 [email protected]>

qemu-error: remove dependency of stubs on monitor

Leave the implementation of error_vprintf and error_vprintf_unless_qmp
(the latter now trivially wrapped by error_printf_unless_qmp) to
libqemustub.a and monitor.c. This has two advantages: it lets us
remove the monitor_printf and monitor_vprintf stubs, and it lets
tests provide a different implementation of the functions that uses
g_test_message.

Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <1477326663 [email protected]>

nbd: Use CoQueue for free_sema instead of CoMutex

NBD is using the CoMutex in a way that wasn't anticipated. For example, if there are
N(N=26, MAX_NBD_REQUESTS=16) nbd write requests, so we will invoke nbd_client_co_pwritev
N times.
----------------------------------------------------------------------------------------
time request Actions
1    1       in_flight=1, Coroutine=C1
2    2       in_flight=2, Coroutine=C2
...
15   15      in_flight=15, Coroutine=C15
16   16      in_flight=16, Coroutine=C16, free_sema->holder=C16, mutex->locked=true
17   17      in_flight=16, Coroutine=C17, queue C17 into free_sema->queue
18   18      in_flight=16, Coroutine=C18, queue C18 into free_sema->queue
...
26   N       in_flight=16, Coroutine=C26, queue C26 into free_sema->queue
----------------------------------------------------------------------------------------

Once nbd client recieves request No.16' reply, we will re-enter C16. It's ok, because
it's equal to 'free_sema->holder'.
----------------------------------------------------------------------------------------
time request Actions
27   16      in_flight=15, Coroutine=C16, free_sema->holder=C16, mutex->locked=false
----------------------------------------------------------------------------------------

Then nbd_coroutine_end invokes qemu_co_mutex_unlock what will pop coroutines from
free_sema->queue's head and enter C17. More free_sema->holder is C17 now.
----------------------------------------------------------------------------------------
time request Actions
28   17      in_flight=16, Coroutine=C17, free_sema->holder=C17, mutex->locked=true
----------------------------------------------------------------------------------------

In above scenario, we only recieves request No.16' reply. As time goes by, nbd client will
almostly recieves replies from requests 1 to 15 rather than request 17 who owns C17. In this
case, we will encounter assert "mutex->holder == self" failed since Kevin's commit 0e438cdc
"coroutine: Let CoMutex remember who holds it". For example, if nbd client recieves request
No.15' reply, qemu will stop unexpectedly:
----------------------------------------------------------------------------------------
time request       Actions
29   15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false
----------------------------------------------------------------------------------------

Per Paolo's suggestion "The simplest fix is to change it to CoQueue, which is like a condition
variable", this patch replaces CoMutex with CoQueue.

Cc: Wen Congyang <[email protected]>
Reported-by: zhanghailiang <[email protected]>
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Changlong Xie <[email protected]>
Message-Id: <1476267508 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

checkpatch: tweak "struct should normally be const" warning

Avoid triggering on

typedef struct BlockJobDriver BlockJobDriver;

or

struct BlockJobDriver {

Cc: John Snow <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging

This pull request mostly contains some more fixes to prevent buggy guests from
breaking QEMU.

# gpg: Signature made Tue 01 Nov 2016 11:26:42 GMT
# gpg:                using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <[email protected]>"
# gpg:                 aka "Greg Kurz <[email protected]>"
# gpg:                 aka "Greg Kurz <[email protected]>"
# gpg:                 aka "Greg Kurz <[email protected]>"
# gpg:                 aka "Gregory Kurz (Groug) <[email protected]>"
# gpg:                 aka "Gregory Kurz (Cimai Technology) <[email protected]>"
# gpg:                 aka "Gregory Kurz (Meiosys Technology) <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894  DBA2 02FC 3AEB 0101 DBC2

* remotes/gkurz/tags/for-upstream:
  9pfs: drop excessive error message from virtfs_reset()
  9pfs: don't BUG_ON() if fid is already opened
  9pfs: xattrcreate requires non-opened fids
  9pfs: limit xattr size in xattrcreate
  9pfs: fix integer overflow issue in xattr read/write
  9pfs: convert 'len/copied_len' field in V9fsXattr to the type of uint64_t
  9pfs: add xattrwalk_fid field in V9fsXattr struct

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2016-10-31-tag' into staging

qemu-ga patch queue for 2.8

* add guest-fstrim support for w32
* add support for using virtio-vsock as the communication channel

# gpg: Signature made Tue 01 Nov 2016 00:55:40 GMT
# gpg:                using RSA key 0x3353C9CEF108B584
# gpg: Good signature from "Michael Roth <[email protected]>"
# gpg:                 aka "Michael Roth <[email protected]>"
# gpg:                 aka "Michael Roth <[email protected]>"
# Primary key fingerprint: CEAC C9E1 5534 EBAB B82D  3FA0 3353 C9CE F108 B584

* remotes/mdroth/tags/qga-pull-2016-10-31-tag:
  qga: add vsock-listen method
  sockets: add AF_VSOCK support
  qga: drop unnecessary GA_CHANNEL_UNIX_LISTEN checks
  qga: drop unused sockaddr in accept(2) call
  qga: minimal support for fstrim for Windows guests

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/rth/tags/pull-sparc-20161031-2' into staging

target-sparc updates for atomics and alignment

# gpg: Signature made Mon 31 Oct 2016 20:47:57 GMT
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# gpg:                 aka "Richard Henderson <[email protected]>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-sparc-20161031-2:
  target-sparc: Use tcg_gen_atomic_cmpxchg_tl
  target-sparc: Use tcg_gen_atomic_xchg_tl
  target-sparc: Remove MMU_MODE*_SUFFIX
  target-sparc: Allow 4-byte alignment on fp mem ops
  target-sparc: Implement ldqf and stqf inline
  target-sparc: Remove asi helper code handled inline
  target-sparc: Implement BCOPY/BFILL inline
  target-sparc: Implement cas_asi/casx_asi inline
  target-sparc: Implement ldstub_asi inline
  target-sparc: Implement swap_asi inline
  target-sparc: Handle more twinx asis
  target-sparc: Use MMU_PHYS_IDX for bypass asis
  target-sparc: Add MMU_PHYS_IDX
  target-sparc: Introduce cpu_raise_exception_ra
  target-sparc: Use overalignment flags for twinx and block asis

Signed-off-by: Peter Maydell <[email protected]>

9pfs: drop excessive error message from virtfs_reset()

The virtfs_reset() function is called either when the virtio-9p device
gets reset, or when the client starts a new 9P session. In both cases,
if it finds fids from a previous session, the following is printed in
the monitor:

9pfs:virtfs_reset: One or more uncluncked fids found during reset

For example, if a linux guest with a mounted 9P share is reset from the
monitor with system_reset, the message will be printed. This is excessive
since these fids are now clunked and the state is clean.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

9pfs: don't BUG_ON() if fid is already opened

A buggy or malicious guest could pass the id of an already opened fid and
cause QEMU to abort. Let's return EINVAL to the guest instead.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>

9pfs: xattrcreate requires non-opened fids

The xattrcreate operation only makes sense on a freshly cloned fid
actually, since any open state would be leaked because of the fid_type
change. This is indeed what the linux kernel client does:

fid = clone_fid(fid);
[...]
retval = p9_client_xattrcreate(fid, name, value_len, flags);

This patch also reverts commit ff55e94d23ae since we are sure that a fid
with type P9_FID_NONE doesn't have a previously allocated xattr.

Signed-off-by: Greg Kurz <[email protected]>

9pfs: limit xattr size in xattrcreate

We shouldn't allow guests to create extended attribute with arbitrary sizes.
On linux hosts, the limit is XATTR_SIZE_MAX. Let's use it.

Signed-off-by: Greg Kurz <[email protected]>

9pfs: fix integer overflow issue in xattr read/write

The v9fs_xattr_read() and v9fs_xattr_write() are passed a guest
originated offset: they must ensure this offset does not go beyond
the size of the extended attribute that was set in v9fs_xattrcreate().
Unfortunately, the current code implement these checks with unsafe
calculations on 32 and 64 bit values, which may allow a malicious
guest to cause OOB access anyway.

Fix this by comparing the offset and the xattr size, which are
both uint64_t, before trying to compute the effective number of bytes
to read or write.

Suggested-by: Greg Kurz <[email protected]>
Signed-off-by: Li Qiang <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Reviewed-By: Guido Günther <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>

9pfs: convert 'len/copied_len' field in V9fsXattr to the type of uint64_t

The 'len' in V9fsXattr comes from the 'size' argument in setxattr()
function in guest. The setxattr() function's declaration is this:

int setxattr(const char *path, const char *name,
const void *value, size_t size, int flags);

and 'size' is treated as u64 in linux kernel client code:

int p9_client_xattrcreate(struct p9_fid *fid, const char *name,
u64 attr_size, int flags)

So the 'len' should have an type of 'uint64_t'.
The 'copied_len' in V9fsXattr is used to account for copied bytes, it
should also have an type of 'uint64_t'.

Suggested-by: Greg Kurz <[email protected]>
Signed-off-by: Li Qiang <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>

9pfs: add xattrwalk_fid field in V9fsXattr struct

Currently, 9pfs sets the 'copied_len' field in V9fsXattr
to -1 to tag xattr walk fid. As the 'copied_len' is also
used to account for copied bytes, this may make confusion. This patch
add a bool 'xattrwalk_fid' to tag the xattr walk fid.

Suggested-by: Greg Kurz <[email protected]>
Signed-off-by: Li Qiang <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>

Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging

Update OpenBIOS images

# gpg: Signature made Mon 31 Oct 2016 20:19:53 GMT
# gpg:                using RSA key 0x5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <[email protected]>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C  C9C4 5BC2 C56F AE0F 321F

* remotes/mcayland/tags/qemu-openbios-signed:
  Update OpenBIOS images to 1dc4f16 built from submodule.

Signed-off-by: Peter Maydell <[email protected]>

migration: fix compiler warning on uninitialized variable

Some older GCC versions (e.g. 4.4.7) report a warning on an
uninitialized variable for 'request', even though all possible code
paths that reference 'request' will be initialized. To appease
these versions, initialize the variable to 0.

Reported-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>
Reviewed-by: zhanghailiang <[email protected]>
Message-id: 259818682e41b95ae60f1423b87954a3fe377639.1477950393 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

qga: add vsock-listen method

Add AF_VSOCK (virtio-vsock) support as an alternative to virtio-serial.

$ qemu-system-x86_64 -device vhost-vsock-pci,guest-cid=3 ...
(guest)# qemu-ga -m vsock-listen -p 3:1234

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>

sockets: add AF_VSOCK support

Add the AF_VSOCK address family so that qemu-ga will be able to use
virtio-vsock.

The AF_VSOCK address family uses <cid, port> address tuples.  The cid is
the unique identifier comparable to an IP address.  AF_VSOCK does not
use name resolution so it's easy to convert between struct sockaddr_vm
and strings.

This patch defines a VsockSocketAddress instead of trying to piggy-back
on InetSocketAddress.  This is cleaner in the long run since it avoids
lots of IPv4 vs IPv6 vs vsock special casing.

Signed-off-by: Stefan Hajnoczi <[email protected]>
* treat trailing commas as garbage when parsing (Eric Blake)
* add configure check instead of checking AF_VSOCK directly
Signed-off-by: Michael Roth <[email protected]>

qga: drop unnecessary GA_CHANNEL_UNIX_LISTEN checks

Throughout the code there are c->listen_channel checks which manage the
listen socket file descriptor (waiting for accept(2), closing the file
descriptor, etc). These checks are currently preceded by explicit
c->method == GA_CHANNEL_UNIX_LISTEN checks.

Explicit GA_CHANNEL_UNIX_LISTEN checks are not necessary since serial
channel types do not create the listen channel (c->listen_channel).

As more listen channel types are added, explicitly checking all of them
becomes messy. Rely on c->listen_channel to determine whether or not a
listen socket file descriptor is used.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>

qga: drop unused sockaddr in accept(2) call

ga_channel_listen_accept() is currently hard-coded to support only
AF_UNIX because the struct sockaddr_un type is used. This function
should work with any address family.

Drop the sockaddr since the client address is unused and is an optional
argument to accept(2).

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael Roth <[email protected]>
Signed-off-by: Michael Roth <[email protected]>

qga: minimal support for fstrim for Windows guests

Unfortunately, there is no public Windows API to start trimming the
filesystem. The only viable way here is to call 'defrag.exe /L' for
each volume.

This is working since Win8 and Win2k12.

Signed-off-by: Denis V. Lunev <[email protected]>
Signed-off-by: Denis Plotnikov <[email protected]>
CC: Michael Roth <[email protected]>
CC: Stefan Weil <[email protected]>
CC: Marc-André Lureau <[email protected]>
* check g_utf16_to_utf8() return value for GError handling instead
of GError directly (Marc-André)
Signed-off-by: Michael Roth <[email protected]>

target-sparc: Use tcg_gen_atomic_cmpxchg_tl

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Use tcg_gen_atomic_xchg_tl

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Remove MMU_MODE*_SUFFIX

The functions that these generate are no longer used.

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Allow 4-byte alignment on fp mem ops

The cpu is allowed to require stricter alignment on these 8- and 16-byte
operations, and the OS is required to fix up the accesses as necessary,
so the previous code was not wrong.

However, we can easily handle this misalignment for all direct 8-byte
operations and for direct 16-byte loads.

We must retain 16-byte alignment for 16-byte stores, so that we don't have
to probe for writability of a second page before performing the first of
two 8-byte stores. We also retain 8-byte alignment for no-fault loads,
since they are rare and it's not worth extending the helpers for this.

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Implement ldqf and stqf inline

At the same time, fix a problem with stqf_asi, when
a write might access two pages.

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Remove asi helper code handled inline

Now that we never call out to helpers when direct accesses can
handle an asi, remove the corresponding code in those helpers.
For ldda, this removes the entire helper.

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Implement BCOPY/BFILL inline

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

Update OpenBIOS images to 1dc4f16 built from submodule.

Signed-off-by: Mark Cave-Ayland <[email protected]>

Merge remote-tracking branch 'remotes/ehabkost/tags/x86-and-machine-pull-request' into staging

x86 and machine queue, 2016-10-31

# gpg: Signature made Mon 31 Oct 2016 18:29:18 GMT
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <[email protected]>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-and-machine-pull-request:
  target-i386: Print warning when mixing [+-]foo and foo=(on|off)
  tests: Remove unneeded "-vnc none" option

Signed-off-by: Peter Maydell <[email protected]>

target-i386: Print warning when mixing [+-]foo and foo=(on|off)

Print a warning when mixing [+-]foo and foo=(on|off) in the -cpu
argument in a way that will break in the future.

Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20161031.0' into staging

VFIO updates 2016-10-31

- Replace skip_dump with ram_device to denote device memory and mark
   as non-direct to avoid memcpy to MMIO - fixes RTL (Alex Williamson)
- Skip zero-length sparse mmaps - avoids unnecessary warning
   (Alex Williamson)
- Clear BARs on reset so guest doesn't assume programming on return
   from S3 (Ido Yariv)
- Enable sub-page MMIO mmaps - performance improvement for devices
   with smaller BARs, iff both host and guest map them to full,
   aligned pages (Yongji Xie)

# gpg: Signature made Mon 31 Oct 2016 17:26:47 GMT
# gpg:                using RSA key 0x239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# gpg:                 aka "Alex Williamson <[email protected]>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-updates-20161031.0:
  vfio: Add support for mmapping sub-page MMIO BARs
  vfio/pci: fix out-of-sync BAR information on reset
  vfio: Handle zero-length sparse mmap ranges
  memory: Don't use memcpy for ram_device regions
  memory: Replace skip_dump flag with "ram_device"

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Mon 31 Oct 2016 16:10:07 GMT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (29 commits)
  qapi: allow blockdev-add for NFS
  block/nfs: Introduce runtime_opts in NFS
  block: Mention replication in BlockdevDriver enum docs
  qemu-iotests: test 'offset' and 'size' options in raw driver
  raw_bsd: add offset and size options
  qemu-iotests: Test the 'base-node' parameter of 'block-stream'
  block: Add 'base-node' parameter to the 'block-stream' command
  qemu-iotests: Test streaming to a Quorum child
  qemu-iotests: Add iotests.supports_quorum()
  qemu-iotests: Test block-stream and block-commit in parallel
  qemu-iotests: Test overlapping stream and commit operations
  qemu-iotests: Test block-stream operations in parallel
  qemu-iotests: Test streaming to an intermediate layer
  docs: Document how to stream to an intermediate layer
  block: Add QMP support for streaming to an intermediate layer
  block: Support streaming to an intermediate layer
  block: Block all intermediate nodes in commit_active_start()
  block: Block all nodes involved in the block-commit operation
  block: Check blockers in all nodes involved in a block-commit job
  block: Use block_job_add_bdrv() in backup_start()
  ...

Signed-off-by: Peter Maydell <[email protected]>

tests: Remove unneeded "-vnc none" option

Some tests use the "-vnc none" option without any clear reason,
making those tests break when --disable-vnc is specified on
./configure. Remove the unnecessary option.

Reviewed-by: John Snow <[email protected]>
Tested-by: Corey Minyard <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Acked-by: Kevin Wolf <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>

vfio: Add support for mmapping sub-page MMIO BARs

Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
sub-page MMIO BARs if the mmio page is exclusive") allows VFIO
to mmap sub-page BARs. This is the corresponding QEMU patch.
With those patches applied, we could passthrough sub-page BARs
to guest, which can help to improve IO performance for some devices.

In this patch, we expand MemoryRegions of these sub-page
MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION
with a valid size. The expanding size will be recovered when
the base address of sub-page BAR is changed and not page aligned
any more in guest. And we also set the priority of these BARs'
memory regions to zero in case of overlap with BARs which share
the same page with sub-page BARs in guest.

Signed-off-by: Yongji Xie <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

vfio/pci: fix out-of-sync BAR information on reset

When a PCI device is reset, pci_do_device_reset resets all BAR addresses
in the relevant PCIDevice's config buffer.

The VFIO configuration space stays untouched, so the guest OS may choose
to skip restoring the BAR addresses as they would seem intact. The PCI
device may be left non-operational.
One example of such a scenario is when the guest exits S3.

Fix this by resetting the BAR addresses in the VFIO configuration space
as well.

Signed-off-by: Ido Yariv <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

vfio: Handle zero-length sparse mmap ranges

As reported in the link below, user has a PCI device with a 4KB BAR
which contains the MSI-X table.  This seems to hit a corner case in
the kernel where the region reports being mmap capable, but the sparse
mmap information reports a zero sized range.  It's not entirely clear
that the kernel is incorrect in doing this, but regardless, we need
to handle it.  To do this, fill our mmap array only with non-zero
sized sparse mmap entries and add an error return from the function
so we can tell the difference between nr_mmaps being zero based on
sparse mmap info vs lack of sparse mmap info.

NB, this doesn't actually change the behavior of the device, it only
removes the scary "Failed to mmap ... Performance may be slow" error
message.  We cannot currently create an mmap over the MSI-X table.

Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html
Signed-off-by: Alex Williamson <[email protected]>

memory: Don't use memcpy for ram_device regions

With a vfio assigned device we lay down a base MemoryRegion registered
as an IO region, giving us read & write accessors.  If the region
supports mmap, we lay down a higher priority sub-region MemoryRegion
on top of the base layer initialized as a RAM device pointer to the
mmap.  Finally, if we have any quirks for the device (ie. address
ranges that need additional virtualization support), we put another IO
sub-region on top of the mmap MemoryRegion.  When this is flattened,
we now potentially have sub-page mmap MemoryRegions exposed which
cannot be directly mapped through KVM.

This is as expected, but a subtle detail of this is that we end up
with two different access mechanisms through QEMU.  If we disable the
mmap MemoryRegion, we make use of the IO MemoryRegion and service
accesses using pread and pwrite to the vfio device file descriptor.
If the mmap MemoryRegion is enabled and results in one of these
sub-page gaps, QEMU handles the access as RAM, using memcpy to the
mmap.  Using either pread/pwrite or the mmap directly should be
correct, but using memcpy causes us problems.  I expect that not only
does memcpy not necessarily honor the original width and alignment in
performing a copy, but it potentially also uses processor instructions
not intended for MMIO spaces.  It turns out that this has been a
problem for Realtek NIC assignment, which has such a quirk that
creates a sub-page mmap MemoryRegion access.

To resolve this, we disable memory_access_is_direct() for ram_device
regions since QEMU assumes that it can use memcpy for those regions.
Instead we access through MemoryRegionOps, which replaces the memcpy
with simple de-references of standard sizes to the host memory.

With this patch we attempt to provide unrestricted access to the RAM
device, allowing byte through qword access as well as unaligned
access.  The assumption here is that accesses initiated by the VM are
driven by a device specific driver, which knows the device
capabilities.  If unaligned accesses are not supported by the device,
we don't want them to work in a VM by performing multiple aligned
accesses to compose the unaligned access.  A down-side of this
philosophy is that the xp command from the monitor attempts to use
the largest available access weidth, unaware of the underlying
device.  Using memcpy had this same restriction, but at least now an
operator can dump individual registers, even if blocks of device
memory may result in access widths beyond the capabilities of a
given device (RTL NICs only support up to dword).

Reported-by: Thorsten Kohfeldt <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>

memory: Replace skip_dump flag with "ram_device"

Setting skip_dump on a MemoryRegion allows us to modify one specific
code path, but the restriction we're trying to address encompasses
more than that. If we have a RAM MemoryRegion backed by a physical
device, it not only restricts our ability to dump that region, but
also affects how we should manipulate it. Here we recognize that
MemoryRegions do not change to sometimes allow dumps and other times
not, so we replace setting the skip_dump flag with a new initializer
so that we know exactly the type of region to which we're applying
this behavior.

Signed-off-by: Alex Williamson <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>

qapi: allow blockdev-add for NFS

Introduce new object 'BlockdevOptionsNFS' in qapi/block-core.json to
support blockdev-add for NFS network protocol driver. Also make a new
struct NFSServer to support tcp connection.

Signed-off-by: Ashijeet Acharya <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/nfs: Introduce runtime_opts in NFS

Make NFS block driver use various fine grained runtime_opts.
Set .bdrv_parse_filename() to nfs_parse_filename() and introduce two
new functions nfs_parse_filename() and nfs_parse_uri() to help parsing
the URI.
Add a new option "server" which then accepts a new struct NFSServer.

Signed-off-by: Ashijeet Acharya <[email protected]>
[ kwolf: Fixed client->path ]
Signed-off-by: Kevin Wolf <[email protected]>

block: Mention replication in BlockdevDriver enum docs

Missed in commit 82ac554.

Signed-off-by: Eric Blake <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: test 'offset' and 'size' options in raw driver

Signed-off-by: Tomáš Golembiovský <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

raw_bsd: add offset and size options

Added two new options 'offset' and 'size'. This makes it possible to use
only part of the file as a device. This can be used e.g. to limit the
access only to single partition in a disk image or use a disk inside a
tar archive (like OVA).

When 'size' is specified we do our best to honour it.

Signed-off-by: Tomáš Golembiovský <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Test the 'base-node' parameter of 'block-stream'

The block-stream command has traditionally used the 'base' parameter
to indicate the image to copy the data from. This test checks that the
'base-node' parameter can also be used for the same purpose.

Signed-off-by: Alberto Garcia <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Add 'base-node' parameter to the 'block-stream' command

The way to specify the node from which to copy data in the
block-stream operation is by using the 'base' parameter. This
parameter however takes a file name, not a node name.

Since we want to be able to perform this operation using only node
names, this patch adds a new 'base-node' parameter.

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Test streaming to a Quorum child

Quorum children are special in the sense that they're not directly
attached to a block backend but they're not used as backing images
either. However the intermediate block streaming code supports
streaming to them. This is a test case for that scenario.

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Add iotests.supports_quorum()

There's many tests that need Quorum support in order to run. At the
moment each test implements its own check to see if Quorum is
enabled. This patch centralizes all those checks in a new function
called iotests.supports_quorum().

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Test block-stream and block-commit in parallel

As with test_stream_parallel(), we allow mixing block-stream and
block-commit operations in the same backing chain as long as there's
no overlap among the involved nodes.

Signed-off-by: Alberto Garcia <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Test overlapping stream and commit operations

These test cases check that it's not possible to perform two
block-stream or block-commit operations if there are nodes involved in
both.

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Test block-stream operations in parallel

This test case checks that it's possible to launch several stream
operations in parallel in the same snapshot chain, each one involving
a different set of nodes.

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qemu-iotests: Test streaming to an intermediate layer

This adds test_stream_intermediate(), similar to test_stream() but
streams to the intermediate image instead.

Signed-off-by: Alberto Garcia <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

docs: Document how to stream to an intermediate layer

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Add QMP support for streaming to an intermediate layer

This patch makes the 'device' parameter of the 'block-stream' command
accept a node name that is not a root node. The presence of this
feature can't be directly tested with introspection; soon we'll
introduce a 'base-node' parameter whose presence can be checked for
this purpose.

In addition to that, operation blockers will be checked in all
intermediate nodes between the top and the base node.

Signed-off-by: Alberto Garcia <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Support streaming to an intermediate layer

This makes sure that the image we are streaming into is open in
read-write mode during the operation.

Operation blockers are also set in all intermediate nodes, since they
will be removed from the chain afterwards.

Finally, this also unblocks the stream operation in backing files.

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Block all intermediate nodes in commit_active_start()

When block-commit is launched without the top parameter, it uses
internally a mirror block job. In that case all intermediate nodes
between the active and base nodes must be blocked as well.

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Block all nodes involved in the block-commit operation

After a successful block-commit operation all nodes between top and
base are removed from the backing chain, and top's overlay needs to
be updated to point to base. Because of that we should prevent other
block jobs from messing with them.

This patch blocks all operations in these nodes in commit_start().

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Check blockers in all nodes involved in a block-commit job

qmp_block_commit() checks for op blockers in the active and
destination (base) images. However all nodes between top_bs and base
are also involved, and they are removed from the chain afterwards.

In addition to that, if top_bs is not the active layer then top_bs's
overlay also needs to be checked because it's involved in the job (its
backing image string needs to be updated to point to 'base').

This patch checks that none of those nodes are blocked.

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Use block_job_add_bdrv() in backup_start()

Use block_job_add_bdrv() instead of blocking all operations in
backup_start() and unblocking them in backup_run().

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Use block_job_add_bdrv() in mirror_start_job()

Use block_job_add_bdrv() instead of blocking all operations in
mirror_start_job() and unblocking them in mirror_exit().

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Add block_job_add_bdrv()

When a block job is created on a certain BlockDriverState, operations
are blocked there while the job exists. However, some block jobs may
involve additional BDSs, which must be blocked separately when the job
is created and unblocked manually afterwards.

This patch adds block_job_add_bdrv(), that simplifies this process by
keeping a list of BDSs that are involved in the specified block job.

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Pause all jobs during bdrv_reopen_multiple()

When a BlockDriverState is about to be reopened it can trigger certain
operations that need to write to disk. During this process a different
block job can be woken up. If that block job completes and also needs
to call bdrv_reopen() it can happen that it needs to do it on the same
BlockDriverState that is still in the process of being reopened.

This can have fatal consequences, like in this example:

  1) Block job A starts and sleeps after a while.
  2) Block job B starts and tries to reopen node1 (a qcow2 file).
  3) Reopening node1 means flushing and replacing its qcow2 cache.
  4) While the qcow2 cache is being flushed, job A wakes up.
  5) Job A completes and reopens node1, replacing its cache.
  6) Job B resumes, but the cache that was being flushed no longer
     exists.

This patch splits the bdrv_drain_all() call to keep all block jobs
paused during bdrv_reopen_multiple(), so that step 4 can never happen
and the operation is safe.

Note that this scenario can only happen if both bdrv_reopen() calls
are made by block jobs on the same backing chain. Otherwise there's no
chance that the same BlockDriverState appears in both reopen queues.

Signed-off-by: Alberto Garcia <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block: Add bdrv_drain_all_{begin,end}()

bdrv_drain_all() doesn't allow the caller to do anything after all
pending requests have been completed but before block jobs are
resumed.

This patch splits bdrv_drain_all() into _begin() and _end() for that
purpose. It also adds aio_{disable,enable}_external() calls to disable
external clients in the meantime.

An important restriction of this split is that no new block jobs or
BlockDriverStates can be created between the bdrv_drain_all_begin()
and bdrv_drain_all_end() calls. This is not a concern now because
we'll only be using this in bdrv_reopen_multiple(), but it must be
dealt with if we ever have other uses cases in the future.

Signed-off-by: Alberto Garcia <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

qapi: allow blockdev-add for ssh

Introduce new object 'BlockdevOptionsSsh' in qapi/block-core.json to
support blockdev-add for SSH network protocol driver. Use only 'struct
InetSocketAddress' since SSH only supports connection over TCP.

Signed-off-by: Ashijeet Acharya <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
[ kwolf: Removed host_key_check option, we want to expose this later in
a structured way rather than as a string that must be parsed ]
Signed-off-by: Kevin Wolf <[email protected]>

block/ssh: Use InetSocketAddress options

Drop the use of legacy options in favour of the InetSocketAddress
options.

Signed-off-by: Ashijeet Acharya <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/ssh: Add InetSocketAddress and accept it

Add InetSocketAddress compatibility to SSH driver.

Add a new option "server" to the SSH block driver which then accepts
a InetSocketAddress.

"host" and "port" are supported as legacy options and are mapped to
their InetSocketAddress representation.

Signed-off-by: Ashijeet Acharya <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

util/qemu-sockets: Make inet_connect_saddr() public

Make inet_connect_saddr() in util/qemu-sockets.c public in order to be
able to use it with InetSocketAddress sockets outside of
util/qemu-sockets.c independently.

Signed-off-by: Ashijeet Acharya <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/ssh: Add ssh_has_filename_options_conflict()

We have 5 options plus ("server") option which is added in the next
patch that conflict with specifying a SSH filename. We need to iterate
over all the options to check whether its key has an "server." prefix.

This iteration will help us adding the new option "server" easily.

Signed-off-by: Ashijeet Acharya <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

target-sparc: Implement cas_asi/casx_asi inline

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Implement ldstub_asi inline

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Implement swap_asi inline

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Handle more twinx asis

As used by HelenOS, presumably for ultra 2 and 3,
prior to the sun4v platform and the current twinx names.

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Use MMU_PHYS_IDX for bypass asis

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Add MMU_PHYS_IDX

It's handy to have a mmu idx for physical addresses, so
that mmu disabled and physical access asis can use the
same path as normal accesses.

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Introduce cpu_raise_exception_ra

Several helpers call helper_raise_exception directly, which requires
in turn that their callers have performed save_state. The new function
allows a TCG return address to be passed in so that we can restore
PC + NPC + flags data from that.

This fixes a bug in the usage of helper_check_align, whose callers had
not been calling save_state. It fixes another bug in which the divide
helpers used GETPC at a level other than the direct callee from TCG.

This allows the translator to avoid save_state prior to SAVE, RESTORE,
and FLUSHW instructions.

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

target-sparc: Use overalignment flags for twinx and block asis

This allows us to enforce 16 and 64-byte alignment
without any extra overhead.

Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <1466744068 [email protected]>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-mttcg' into staging

Base patches for MTTCG enablement.

# gpg: Signature made Mon 31 Oct 2016 14:01:41 GMT
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream-mttcg:
  tcg: move locking for tb_invalidate_phys_page_range up
  *_run_on_cpu: introduce run_on_cpu_data type
  cpus: re-factor out handle_icount_deadline
  tcg: cpus rm tcg_exec_all()
  tcg: move tcg_exec_all and helpers above thread fn
  target-arm/arm-powerctl: wake up sleeping CPUs
  tcg: protect translation related stuff with tb_lock.
  translate-all: Add assert_(memory|tb)_lock annotations
  linux-user/elfload: ensure mmap_lock() held while setting up
  tcg: comment on which functions have to be called with tb_lock held
  cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  translate-all: add DEBUG_LOCKING asserts
  translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  cpus: make all_vcpus_paused() return bool

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20161031' into staging

Two PCI fixes/improvements for s390x.

# gpg: Signature made Mon 31 Oct 2016 10:09:24 GMT
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20161031:
  s390x/pci: Check memory region dispatching callbacks
  s390x/pci: use generic interface to inject interrupt

Signed-off-by: Peter Maydell <[email protected]>

tcg: move locking for tb_invalidate_phys_page_range up

In the linux-user case all things that involve ''l1_map' and PageDesc
tweaks are protected by the memory lock (mmpa_lock). For SoftMMU mode
we previously relied on single threaded behaviour, with MTTCG we now use
the tb_lock().

As a result we need to do a little re-factoring and push the taking of
this lock up the call tree. This requires a slightly different entry for
the SoftMMU and user-mode cases from tb_invalidate_phys_range.

This also means user-mode breakpoint insertion needs to take two locks
but it hadn't taken any previously so this is an improvement.

Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <20161027151030 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

*_run_on_cpu: introduce run_on_cpu_data type

This changes the *_run_on_cpu APIs (and helpers) to pass data in a
run_on_cpu_data type instead of a plain void *. This is because we
sometimes want to pass a target address (target_ulong) and this fails on
32 bit hosts emulating 64 bit guests.

Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <20161027151030 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/amit-migration/tags/migration-for-2.8' into staging

Migration bits from the COLO project

# gpg: Signature made Sun 30 Oct 2016 10:39:55 GMT
# gpg:                using RSA key 0xEB0B4DFC657EF670
# gpg: Good signature from "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"
# Primary key fingerprint: 48CA 3722 5FE7 F4A8 B337  2735 1E9A 3B5F 8540 83B6
#      Subkey fingerprint: CC63 D332 AB8F 4617 4529  6534 EB0B 4DFC 657E F670

* remotes/amit-migration/tags/migration-for-2.8:
  MAINTAINERS: Add maintainer for COLO framework related files
  configure: Support enable/disable COLO feature
  docs: Add documentation for COLO feature
  COLO: Implement failover work for secondary VM
  COLO: Implement the process of failover for primary VM
  COLO: Introduce state to record failover process
  COLO: Add 'x-colo-lost-heartbeat' command to trigger failover
  COLO: Synchronize PVM's state to SVM periodically
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: Load VMState into QIOChannelBuffer before restore it
  COLO: Send PVM state to secondary side when do checkpoint
  COLO: Add a new RunState RUN_STATE_COLO
  COLO: Introduce checkpointing protocol
  COLO: Establish a new communicating path for COLO
  migration: Switch to COLO process after finishing loadvm
  migration: Enter into COLO mode after migration if COLO is enabled
  COLO: migrate COLO related info to secondary node
  migration: Introduce capability 'x-colo' to migration

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20161028-tag' into staging

Xen 2016/10/28

# gpg: Signature made Sat 29 Oct 2016 02:03:42 BST
# gpg:                using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <[email protected]>"
# gpg:                 aka "Stefano Stabellini <[email protected]>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3  0AEA 894F 8F48 70E1 AE90

* remotes/sstabellini/tags/xen-20161028-tag:
  xen: Rename xen_be_del_xendev
  xen: Rename xen_be_find_xendev
  xen: Rename xen_be_evtchn_event
  xen: Rename xen_be_send_notify
  xen: Rename xen_be_unbind_evtchn
  xen: Rename xen_be_printf to xen_pv_printf
  xen: Move xenstore cleanup and mkdir functions
  xen: Prepare xendev qtail to be shared with frontends
  xen: Move evtchn functions to xen_pvdev.c
  xen: Move xenstore_update to xen_pvdev.c
  xen: Create a new file xen_pvdev.c
  xen: Fix coding style warnings
  xen: Fix coding style errors

Signed-off-by: Peter Maydell <[email protected]>