Eric Blake [Wed, 8 Mar 2017 20:02:16 +0000 (14:02 -0600)]
block: Drop unmaintained 'archipelago' driver
The driver has failed to build since commit da34e65, in qemu 2.6,
due to a missing include of qapi/error.h for error_setg().
Since no one has complained in three releases, it is easier to
remove the dead code than to keep it around, especially since it
is not being built by default and therefore prone to bitrot.
Fam Zheng [Wed, 8 Mar 2017 12:08:14 +0000 (20:08 +0800)]
file-posix: Consider max_segments for BlockLimits.max_transfer
Without this fix, BlockLimits.max_transfer can be too high: the guest
will then encounter I/O errors, or even get paused with werror=stop or
rerror=stop. The cause is explained below.
Linux has a separate limit, /sys/block/.../queue/max_segments, which in
the worst case can be more restrictive than the BLKSECTGET which we
already consider (note that they are two different things). So, the
failure scenario before this patch is:
1) host device has max_sectors_kb = 4096 and max_segments = 64;
2) guest learns max_sectors_kb limit from QEMU, but doesn't know
max_segments;
3) guest issues e.g. a 512KB request thinking it's okay, but actually
it's not, because it will be passed through to host device as an
SG_IO req that has niov > 64;
4) host kernel doesn't like the segmenting of the request, and returns
-EINVAL;
This patch checks the max_segments sysfs entry for the host device and
calculates a "conservative" bytes limit using the page size, which is
then merged into the existing max_transfer limit. Guest will discover
this from the usual virtual block device interfaces. (In the case of
scsi-generic, it will be done in the INQUIRY reply interception in
device model.)
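The limit calculation described above can be sketched as follows. This is a minimal illustration, not the file-posix code: the function names are hypothetical, 0 is assumed to mean "no limit", and the worst case is assumed to be one page per segment.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two limits, where 0 means "no limit" (hypothetical helper). */
static uint32_t min_non_zero(uint32_t a, uint32_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/*
 * Merge a segment-count limit into a byte limit.  Conservative assumption:
 * in the worst case every segment covers only a single page.
 */
static uint32_t merge_max_segments(uint32_t max_transfer,
                                   uint32_t max_segments,
                                   uint32_t page_size)
{
    uint64_t seg_bytes;

    assert(page_size > 0);
    seg_bytes = (uint64_t)max_segments * page_size;
    if (seg_bytes > UINT32_MAX) {
        seg_bytes = UINT32_MAX;     /* clamp instead of overflowing */
    }
    return min_non_zero(max_transfer, (uint32_t)seg_bytes);
}
```

With the numbers from the failure scenario (max_sectors_kb = 4096, max_segments = 64) and an assumed 4 KiB page size, the merged limit is 256 KiB, so the guest would no longer issue the 512KB request.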
The other possibility is to actually propagate it as a separate limit,
but it's not better. On the one hand, there is a big complication: the
limit is per-LUN in QEMU PoV (because we can attach LUNs from different
host HBAs to the same virtio-scsi bus), but the channel to communicate
it in a per-LUN manner is missing down the stack; on the other hand,
two limits versus one doesn't change much about the valid size of I/O
(because guest has no control over host segmenting).
Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL,
was explored. Unfortunately there is no neat way to ensure the bounce
buffer is less segmented (in terms of DMA addr) than the guest buffer.
In practice, this bug is not very common. It has only been reported on
an Emulex (lpfc), so it's okay to get it fixed in the easier way.
Gerd Hoffmann [Mon, 6 Mar 2017 08:31:51 +0000 (09:31 +0100)]
qxl: clear guest_cursor on QXL_CURSOR_HIDE
Make sure we don't leave guest_cursor pointing into nowhere. This might
lead to (rare) live migration failures, due to target trying to restore
the cursor from the stale pointer.
Peter Maydell [Wed, 8 Mar 2017 09:47:52 +0000 (09:47 +0000)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer fixes for 2.9.0-rc0
# gpg: Signature made Tue 07 Mar 2017 14:59:18 GMT
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (27 commits)
commit: Don't use error_abort in commit_start
block: Don't use error_abort in blk_new_open
sheepdog: Support blockdev-add
qapi-schema: Rename SocketAddressFlat's variant tcp to inet
qapi-schema: Rename GlusterServer to SocketAddressFlat
gluster: Plug memory leaks in qemu_gluster_parse_json()
gluster: Don't duplicate qapi-util.c's qapi_enum_parse()
gluster: Drop assumptions on SocketTransport names
sheepdog: Implement bdrv_parse_filename()
sheepdog: Use SocketAddress and socket_connect()
sheepdog: Report errors in pseudo-filename more usefully
sheepdog: Don't truncate long VDI name in _open(), _create()
sheepdog: Fix snapshot ID parsing in _open(), _create, _goto()
sheepdog: Mark sd_snapshot_delete() lossage FIXME
sheepdog: Fix error handling sd_create()
sheepdog: Fix error handling in sd_snapshot_delete()
sheepdog: Defuse time bomb in sd_open() error handling
block: Fix error handling in bdrv_replace_in_backing_chain()
block: Handle permission errors in change_parent_backing_link()
block: Ignore multiple children in bdrv_check_update_perm()
...
Peter Maydell [Tue, 7 Mar 2017 17:06:48 +0000 (17:06 +0000)]
Merge remote-tracking branch 'remotes/armbru/tags/pull-block-2017-02-28-v4' into staging
block: Command line option -blockdev
# gpg: Signature made Tue 07 Mar 2017 15:07:59 GMT
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <[email protected]>"
# gpg: aka "Markus Armbruster <[email protected]>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-block-2017-02-28-v4: (24 commits)
keyval: Support lists
docs/qapi-code-gen.txt: Clarify naming rules
qapi: Improve how keyval input visitor reports unexpected dicts
block: Initial implementation of -blockdev
qapi: New qobject_input_visitor_new_str() for convenience
keyval: Restrict key components to valid QAPI names
qapi: New parse_qapi_name()
test-qapi-util: New, covering qapi/qapi-util.c
monitor: Assert qmp_schema_json[] is sane
test-visitor-serialization: Pass &error_abort to qobject_from_json()
check-qjson: Test errors from qobject_from_json()
block: More detailed syntax error reporting for JSON filenames
qobject: Propagate parse errors through qobject_from_json()
test-qobject-input-visitor: Abort earlier on bad test input
qjson: Abort earlier on qobject_from_jsonf() misuse
libqtest: Fix qmp() & friends to abort on JSON parse errors
qobject: Propagate parse errors through qobject_from_jsonv()
qapi: Factor out common qobject_input_get_keyval()
qapi: Factor out common part of qobject input visitor creation
test-keyval: Cover use with qobject input visitor
...
Additionally permit non-negative integers as key components. A
dictionary's keys must either be all integers or none. If all keys
are integers, convert the dictionary to a list. The set of keys must
be [0,N].
Examples:
* list.1=goner,list.0=null,list.1=eins,list.2=zwei
is equivalent to JSON [ "null", "eins", "zwei" ]
* a.b.c=1,a.b.0=2
is inconsistent: a.b.c clashes with a.b.0
* list.0=null,list.2=eins,list.2=zwei
has a hole: list.1 is missing
Similar design flaw as for objects: there is no way to denote an empty
list. While interpreting "key absent" as empty list seems natural
(removing a list member from the input string works when there are
multiple ones, so why not when there's just one), it doesn't work:
"key absent" already means "optional list absent", which isn't the
same as "empty list present".
Update the keyval object visitor to use this a.0 syntax in error
messages rather than the usual a[0].
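The "no holes" rule for integer keys can be sketched like this. It is only an illustration under assumptions: the function name is hypothetical, the keys are assumed to be already parsed into integers, and the real keyval code works on a QDict rather than on an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Return true if the n integer keys are exactly 0, 1, ..., n-1 in any
 * order, i.e. the dictionary can be converted to a list without holes.
 */
static bool keys_form_list(const size_t *keys, size_t n)
{
    bool *seen;
    bool ok;
    size_t i;

    assert(keys != NULL || n == 0);
    seen = calloc(n ? n : 1, sizeof(*seen));
    ok = seen != NULL;
    for (i = 0; ok && i < n; i++) {
        if (keys[i] >= n || seen[keys[i]]) {
            ok = false;             /* a hole (key too large) or a duplicate */
        } else {
            seen[keys[i]] = true;
        }
    }
    free(seen);
    return ok;
}
```

For the examples above, the keys {1, 0, 2} form a list, while {0, 2} are rejected because index 1 is missing.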
is rejected with "Invalid parameter type for 'aio', expected: string".
To make sense of this, you almost have to translate it into the
equivalent QMP command
Improve the error message to "Parameters 'aio.*' are unexpected".
Take care not to confuse the case "unexpected nested parameters"
(i.e. the object is a QDict or QList) with the case "non-string scalar
parameter". The latter is a misuse of the visitor, and should perhaps
be an assertion. Note that test-qobject-input-visitor exercises this
misuse in test_visitor_in_int_keyval(), test_visitor_in_bool_keyval()
and test_visitor_in_number_keyval().
The JSON argument doesn't exactly blend into the existing option
syntax, so the traditional KEY=VALUE,... syntax is also supported,
using dotted keys to do the nesting:
This does not yet support lists, but that will be addressed shortly.
Note that calling qmp_blockdev_add() (say via qmp_marshal_block_add())
right away would crash. We need to stash the configuration for later
instead. This is crudely done, and bypasses QemuOpts, even though
storing configuration is what QemuOpts is for. Need to revamp option
infrastructure to support QAPI types like BlockdevOptions.
keyval: Restrict key components to valid QAPI names
Until now, key components are separated by '.'. This leaves little
room for evolving the syntax, and is incompatible with the __RFQDN_
prefix convention for downstream extensions.
Since key components will be commonly used as QAPI member names by the
QObject input visitor, we can just as well borrow the QAPI naming
rules here: letters, digits, hyphen and period starting with a letter,
with an optional __RFQDN_ prefix for downstream extensions.
test-visitor-serialization: Pass &error_abort to qobject_from_json()
qmp_deserialize() calls qobject_from_json() ignoring errors. It
passes the result to qobject_input_visitor_new(), which asserts it's
not null. Therefore, we can just as well pass &error_abort to
qobject_from_json().
qjson: Abort earlier on qobject_from_jsonf() misuse
Ignoring errors first, then asserting success is suboptimal. Pass
&error_abort instead, so we abort earlier, and hopefully get more
useful clues on what's wrong.
qapi: qobject input visitor variant for use with keyval_parse()
Currently the QObjectInputVisitor assumes that all scalar values are
directly represented as the final types declared by the thing being
visited. i.e. it assumes an 'int' is using QInt, and a 'bool' is using
QBool, etc. This is good when QObjectInputVisitor is fed a QObject
that came from a JSON document on the QMP monitor, as it will strictly
validate correctness.
To allow QObjectInputVisitor to be reused for visiting a QObject
originating from keyval_parse(), an alternative mode is needed where
all the scalars types are represented as QString and converted on the
fly to the final desired type.
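A rough sketch of such on-the-fly conversion, assuming plain C library parsing. The helper names are hypothetical and this is not the visitor's actual code; it only shows the idea of turning a string scalar into the declared type with explicit error checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Convert a string to int64_t.  Returns 0, -EINVAL or -ERANGE. */
static int str_to_int64(const char *s, int64_t *out)
{
    char *end;
    long long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtoll(s, &end, 0);
    if (errno == ERANGE) {
        return -ERANGE;             /* value does not fit */
    }
    if (end == s || *end != '\0') {
        return -EINVAL;             /* empty input or trailing garbage */
    }
    *out = v;
    return 0;
}

/* Convert "on"/"off" to bool.  Returns 0 or -EINVAL. */
static int str_to_bool(const char *s, bool *out)
{
    assert(s != NULL && out != NULL);
    if (strcmp(s, "on") == 0) {
        *out = true;
        return 0;
    }
    if (strcmp(s, "off") == 0) {
        *out = false;
        return 0;
    }
    return -EINVAL;
}
```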
Rebased, conflicts resolved, commit message updated to refer to
keyval_parse(). autocast replaced by keyval in identifiers,
noautocast replaced by fail in tests.
Fix qobject_input_type_uint64_keyval() not to reject '-', for QemuOpts
compatibility: replace parse_uint_full() by open-coded
parse_option_number(). The next commit will add suitable tests.
Leave out the fancy ERANGE error reporting for now, but add a TODO
comment. Add it to qobject_input_type_int64_keyval() and
qobject_input_type_number_keyval(), too.
Open code parse_option_bool() and parse_option_size() so we have to
call qobject_input_get_name() only when actually needed. Again, leave
out ERANGE error reporting for now.
QAPI/QMP downstream extension prefixes __RFQDN_ don't work, because
keyval_parse() splits them at '.'. This will be addressed later in
the series.
qobject_input_type_int64_keyval(), qobject_input_type_uint64_keyval(),
qobject_input_type_number_keyval() tweaked for style.
keyval_parse() parses KEY=VALUE,... into a QDict. Works like
qemu_opts_parse(), except:
* Returns a QDict instead of a QemuOpts (d'oh).
* Supports nesting, unlike QemuOpts: a KEY is split into key
fragments at '.' (dotted key convention; the block layer does
something similar on top of QemuOpts). The key fragments are QDict
keys, and the last one's value is updated to VALUE.
* Each key fragment may be up to 127 bytes long. qemu_opts_parse()
limits the entire key to 127 bytes.
* Overlong key fragments are rejected. qemu_opts_parse() silently
truncates them.
* It does not store the returned value. qemu_opts_parse() stores it
in the QemuOptsList.
* It does not treat parameter "id" specially. qemu_opts_parse()
ignores all but the first "id", and fails when its value isn't
id_wellformed(), or duplicate (a QemuOpts with the same ID is
already stored). It also screws up when a value contains ",id=".
* Implied value is not supported. qemu_opts_parse() desugars "foo" to
"foo=on", and "nofoo" to "foo=off".
* An implied key's value can't be empty, and can't contain ','.
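The key-splitting rules in the list above can be sketched as follows. This is a simplified illustration with a hypothetical function name; it only counts fragments, and rejecting empty fragments is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAGMENT_LEN 127    /* per-fragment limit, from the text above */

/*
 * Split a dotted key at '.' and count its fragments.
 * Returns the number of fragments, or -1 if a fragment is empty
 * (assumption) or overlong (rejected rather than silently truncated).
 */
static int count_key_fragments(const char *key)
{
    int count = 0;

    assert(key != NULL);
    for (;;) {
        size_t len = strcspn(key, ".");

        if (len == 0 || len > MAX_FRAGMENT_LEN) {
            return -1;
        }
        count++;
        if (key[len] == '\0') {
            return count;
        }
        key += len + 1;             /* skip the fragment and the '.' */
    }
}
```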
I intend to grow this into a saner replacement for QemuOpts. It'll
take time, though.
Note: keyval_parse() provides no way to do lists, and its key syntax
is incompatible with the __RFQDN_ prefix convention for downstream
extensions, because it blindly splits at '.', even in __RFQDN_. Both
issues will be addressed later in the series.
Peter Maydell [Fri, 3 Mar 2017 15:50:33 +0000 (15:50 +0000)]
disas/arm: Avoid unintended sign extension
When assembling 'given' from the instruction bytes, C's integer
promotion rules mean we may promote an unsigned char to a signed
integer before shifting it, and then sign extend to a 64-bit long,
which can set the high bits of the long. The code doesn't in fact
care about the high bits if the long is 64 bits, but this is
surprising, so don't do it.
Peter Maydell [Fri, 3 Mar 2017 15:50:32 +0000 (15:50 +0000)]
disas/cris: Avoid unintended sign extension
In the cris disassembler we were using 'unsigned long' to calculate
addresses which are supposed to be 32 bits. This meant that we might
accidentally sign extend or calculate a value that was outside the 32
bit range of the guest CPU. Use 'uint32_t' instead so we give the
right answers on 64-bit hosts.
Peter Maydell [Fri, 3 Mar 2017 15:50:31 +0000 (15:50 +0000)]
disas/microblaze: Avoid unintended sign extension
In read_insn_microblaze() we assemble 4 bytes into an 'unsigned
long'. If 'unsigned long' is 64 bits and the high byte has its top
bit set, then C's implicit conversion from 'unsigned char' to 'int'
for the shift will result in an unintended sign extension which sets
the top 32 bits in 'inst'. Add casts to prevent this. (Spotted by
Coverity, CID 1005401.)
Peter Maydell [Fri, 3 Mar 2017 15:50:30 +0000 (15:50 +0000)]
disas/m68k: Avoid unintended sign extension in get_field()
In get_field(), we take an 'unsigned char' value and shift it left,
which implicitly promotes it to 'signed int', before ORing it into an
'unsigned long' type. If 'unsigned long' is 64 bits then this will
result in a sign extension and the top 32 bits of the result will be
1s. Add explicit casts to unsigned long before shifting to prevent
this.
Peter Maydell [Fri, 3 Mar 2017 15:50:29 +0000 (15:50 +0000)]
disas/i386: Avoid NULL pointer dereference in error case
In a code path where we hit an internal disassembler error, execution
would subsequently attempt to dereference a NULL pointer. This
should never happen, but avoid the crash.
Peter Maydell [Fri, 3 Mar 2017 15:50:28 +0000 (15:50 +0000)]
disas/hppa: Remove dead code
Coverity complains (CID 1302705) that the "fr0" part of the ?: in
fput_fp_reg_r() is dead. This looks like cut-n-paste error from
fput_fp_reg(); delete the dead code.
qapi-schema: Rename SocketAddressFlat's variant tcp to inet
QAPI type SocketAddressFlat differs from SocketAddress pointlessly:
the discriminator value for variant InetSocketAddress is 'tcp' instead
of 'inet'. Rename.
The type is so far only used by the Gluster block drivers. Take care
to keep 'tcp' working in things like -drive's file.server.0.type=tcp.
The "gluster+tcp" URI scheme in pseudo-filenames stays the same.
blockdev-add changes, but it has changed incompatibly since 2.8
already.
gluster: Drop assumptions on SocketTransport names
qemu_gluster_glfs_init() passes the names of QAPI enumeration type
SocketTransport to glfs_set_volfile_server(). Works, because they
were chosen to match. But the coupling is artificial. Use the
appropriate literal strings instead.
sd_parse_uri() builds a string from host and port parts for
inet_connect(). inet_connect() parses it into host, port and options.
Whether this gets exactly the same host, port and no options for all
inputs is not obvious.
Cut out the string middleman and build a SocketAddress for
socket_connect() instead.
sheepdog: Report errors in pseudo-filename more usefully
Errors in the pseudo-filename are all reported with the same laconic
"Can't parse filename" message.
Add real error reporting, such as:
$ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepdog:///
qemu-system-x86_64: --drive driver=sheepdog,filename=sheepdog:///: missing file path in URI
$ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepgod:///vdi
qemu-system-x86_64: --drive driver=sheepdog,filename=sheepgod:///vdi: URI scheme must be 'sheepdog', 'sheepdog+tcp', or 'sheepdog+unix'
$ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepdog+unix:///vdi?socke=sheepdog.sock
qemu-system-x86_64: --drive driver=sheepdog,filename=sheepdog+unix:///vdi?socke=sheepdog.sock: unexpected query parameters
The code to translate legacy syntax to URI fails to escape URI
meta-characters. The new error messages are misleading then. Replace
them by the old "Can't parse filename" message. "Internal error"
would be more honest. Anyway, no worse than before. Also add a FIXME
comment.
sheepdog: Fix snapshot ID parsing in _open(), _create, _goto()
sd_parse_uri() and sd_snapshot_goto() screw up error checking after
strtoul(), and truncate long tag names silently. Fix by replacing
those parts by new sd_parse_snapid_or_tag(), which checks more
carefully.
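Careful parsing of this kind might look roughly like the sketch below. The function names, the MAX_TAG_LEN value and the exact acceptance rules are assumptions for illustration; they are not taken from the Sheepdog driver.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TAG_LEN 255     /* hypothetical limit for this sketch */

/* Return the snapshot ID, or 0 if str is not a valid nonzero 32-bit ID. */
static uint32_t parse_snapid(const char *str)
{
    char *end;
    unsigned long ul;

    assert(str != NULL);
    if (!isdigit((unsigned char)str[0])) {
        return 0;                   /* rejects "", "-1", "+1", " 1" */
    }
    errno = 0;
    ul = strtoul(str, &end, 10);
    if (errno == ERANGE || *end != '\0' || ul > UINT32_MAX) {
        return 0;                   /* overflow or trailing garbage */
    }
    return (uint32_t)ul;
}

/*
 * Interpret str as a snapshot ID if possible, else as a tag.
 * Returns 0 on success, -1 if str is empty or too long for a tag
 * (an error is reported instead of truncating silently).
 */
static int parse_snapid_or_tag(const char *str, uint32_t *snapid,
                               char tag[MAX_TAG_LEN + 1])
{
    assert(snapid != NULL && tag != NULL);
    *snapid = parse_snapid(str);
    tag[0] = '\0';
    if (*snapid) {
        return 0;
    }
    if (str[0] == '\0' || strlen(str) > MAX_TAG_LEN) {
        return -1;
    }
    strcpy(tag, str);
    return 0;
}
```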
sd_snapshot_delete() also parses snapshot IDs, but is currently too
broken for me to touch. Mark TODO.
Two calls of strtol() without error checking remain in
parse_redundancy(). Mark them FIXME.
More silent truncation of configuration strings remains elsewhere.
Not marked.
sd_snapshot_delete() should delete the snapshot whose ID matches
@snapshot_id and whose name matches @name. But that's not what it
does. If @snapshot_id is a valid ID, it deletes the snapshot with
that ID, else it deletes the snapshot with that name. It doesn't use
@name at all. Add suitable FIXME comments, so someone who actually
knows Sheepdog can fix it.
As a bdrv_create() method, sd_create() must set an error and return
negative errno on failure. It prints the error instead of setting it
when connect_to_sdog() fails. Fix that.
While there, return the value of connect_to_sdog() like we do
elsewhere, instead of -EIO. No functional change, as
connect_to_sdog() returns no other error code.
Many more suspicious uses of error_report() and error_report_err()
remain in other functions. Left for another day.
sheepdog: Fix error handling in sd_snapshot_delete()
As a bdrv_snapshot_delete() method, sd_snapshot_delete() must set an
error and return negative errno on failure. It sometimes returns -1,
and sometimes neglects to set an error. It also prints error messages
with error_report(). Fix all that.
Moreover, its handling of an attempt to delete a nonexistent snapshot
is wrong: it error_report()s and succeeds. Fix it to set an error and
return -ENOENT instead.
sheepdog: Defuse time bomb in sd_open() error handling
When qemu_opts_absorb_qdict() fails, sd_open() closes stdin, because
sd->fd is still zero. Fortunately, qemu_opts_absorb_qdict() can't
fail, because:
1. it only fails when qemu_opt_parse() fails, and
2. the only member of runtime_opts.desc[] is a QEMU_OPT_STRING, and
3. qemu_opt_parse() can't fail for QEMU_OPT_STRING.
Defuse this ticking time bomb by jumping behind the file descriptor
cleanup on error.
Also do that for the error paths where sd->fd is still -1. The file
descriptor cleanup happens to do nothing then, but let's not rely on
that here.
While there, rename label out to err, because it's on the error path,
not the normal path out of the function.
Kevin Wolf [Mon, 6 Mar 2017 15:20:51 +0000 (16:20 +0100)]
block: Fix error handling in bdrv_replace_in_backing_chain()
When adding an Error parameter, bdrv_replace_in_backing_chain() would
become nothing more than a wrapper around change_parent_backing_link().
So make the latter public, renamed as bdrv_replace_node(), and remove
bdrv_replace_in_backing_chain().
Most of the callers just remove a node from the graph that they just
inserted, so they can use &error_abort, but completion of a mirror job
with 'replaces' set can actually fail.
Kevin Wolf [Thu, 2 Mar 2017 17:43:00 +0000 (18:43 +0100)]
block: Handle permission errors in change_parent_backing_link()
Instead of just trying to switch the parents one by one over to
reference @to instead of @from, and abort()ing whenever the permissions
don't allow this, do proper permission checking beforehand and pass any
error to the callers.
Kevin Wolf [Mon, 6 Mar 2017 14:00:13 +0000 (15:00 +0100)]
block: Ignore multiple children in bdrv_check_update_perm()
change_parent_backing_link() will need to update multiple BdrvChild
objects at once. Checking permissions reference by reference doesn't
work because permissions need to be consistent only with all parents
moved to the new child.
Kevin Wolf [Thu, 2 Mar 2017 14:26:18 +0000 (15:26 +0100)]
block: Fix blockdev-snapshot error handling
For blockdev-snapshot, external_snapshot_prepare() accepts an arbitrary
node reference at first and only checks later whether it already has a
backing file. Between those places, other errors can occur.
Therefore checking in external_snapshot_abort() whether state->new_bs
has a backing file is not sufficient to tell whether bdrv_append() was
already completed or not. Trying to undo the bdrv_append() when it
wasn't even executed is wrong.
Introduce a new boolean flag in the state to fix this.
Kevin Wolf [Mon, 6 Mar 2017 15:03:00 +0000 (16:03 +0100)]
mirror: Fix permissions for removing mirror_top_bs
mirror_top_bs takes write permissions on its backing file, which can
make it impossible to attach that backing file node to another parent.
However, this is exactly what needs to be done in order to remove
mirror_top_bs from the backing chain. So give up the write permission
first.
Kevin Wolf [Thu, 2 Mar 2017 16:48:14 +0000 (17:48 +0100)]
mirror: Fix permission problem with 'replaces'
The 'replaces' option of drive-mirror can be used to mirror a Quorum
node to a new image and then let the target image replace one of the
Quorum children. In order for this graph modification to succeed, the
mirror job needs to lift its restrictions on the target node first
before actually replacing the child.
Kevin Wolf [Fri, 3 Mar 2017 15:54:21 +0000 (16:54 +0100)]
commit: Fix error handling
Apparently some kind of mismerge happened in commit 8dfba279, which
broke the error handling without any real reason by removing the
assignment of the return value to ret in a blk_insert_bs() call.
Peter Maydell [Tue, 7 Mar 2017 09:09:53 +0000 (09:09 +0000)]
Merge remote-tracking branch 'remotes/gkurz/tags/fixes-for-2.9' into staging
Fixes issues that got merged with the latest pull request:
- missing O_NOFOLLOW flag for CVE-2016-960
- build break with older glibc that don't have O_PATH and AT_EMPTY_PATH
- various bugs reported by Coverity
# gpg: Signature made Mon 06 Mar 2017 17:51:29 GMT
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Gregory Kurz (Groug) <[email protected]>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/fixes-for-2.9:
9pfs: fix vulnerability in openat_dir() and local_unlinkat_common()
9pfs: fix O_PATH build break with older glibc versions
9pfs: don't use AT_EMPTY_PATH in local_set_cred_passthrough()
9pfs: fail local_statfs() earlier
9pfs: fix fd leak in local_opendir()
9pfs: fix bogus fd check in local_remove()
* remotes/mdroth/tags/qga-pull-2017-03-06-tag:
tests: check path to avoid a failing qga/get-vcpus test
qga: ignore EBUSY when freezing a filesystem
qga: add systemd socket activation support
Greg Kurz [Mon, 6 Mar 2017 16:34:01 +0000 (17:34 +0100)]
9pfs: fix O_PATH build break with older glibc versions
When O_PATH is used with O_DIRECTORY, it only acts as an optimization: the
openat() syscall simply finds the name in the VFS, and doesn't trigger the
underlying filesystem.
On systems that don't define O_PATH, because they have glibc version 2.13
or older for example, we can safely omit it. We don't want to deactivate
O_PATH globally though, in case it is used without O_DIRECTORY. This is
done with a dedicated macro.
Systems without O_PATH may thus fail to resolve names that involve
unreadable directories, compared to newer systems succeeding, but such
corner case failure is our only option on those older systems to avoid
the security hole of chasing symlinks inappropriately.
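A dedicated macro of this kind could look like the sketch below. The macro and function names are hypothetical, chosen for illustration; the idea is simply that the flag expands to 0 where O_PATH is not defined.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* O_PATH is a Linux extension */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Use O_PATH only where the C library defines it; otherwise add nothing. */
#ifdef O_PATH
#define O_PATH_IF_AVAILABLE O_PATH
#else
#define O_PATH_IF_AVAILABLE 0
#endif

/*
 * Open a directory relative to dirfd without following a symlink in the
 * last path component.  O_PATH, when available, is only an optimization
 * here because O_DIRECTORY is always passed as well.
 */
static int open_dir_nofollow(int dirfd, const char *name)
{
    assert(name != NULL);
    return openat(dirfd, name,
                  O_DIRECTORY | O_RDONLY | O_NOFOLLOW | O_PATH_IF_AVAILABLE);
}
```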
Greg Kurz [Mon, 6 Mar 2017 16:34:01 +0000 (17:34 +0100)]
9pfs: don't use AT_EMPTY_PATH in local_set_cred_passthrough()
The name argument can never be an empty string, and dirfd always points to
the containing directory of the file name. AT_EMPTY_PATH is hence useless
here. Also it breaks build with glibc version 2.13 and older.
It is actually an oversight of a previous tentative patch to implement this
function. We can safely drop it.
Greg Kurz [Mon, 6 Mar 2017 16:34:01 +0000 (17:34 +0100)]
9pfs: fix bogus fd check in local_remove()
This was spotted by Coverity as a fd leak. This is certainly true, but also
local_remove() would always return without doing anything, unless the fd is
zero, which is very unlikely.
Peter Maydell [Mon, 6 Mar 2017 15:13:23 +0000 (15:13 +0000)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Mon 06 Mar 2017 04:15:17 GMT
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
net/filter-mirror: Follow CODING_STYLE
COLO-compare: Fix icmp and udp compare different packet always dump bug
COLO-compare: Optimize compare_common and compare_tcp
COLO-compare: Rename compare function and remove duplicate codes
filter-rewriter: skip net_checksum_calculate() while offset = 0
net/colo: fix memory double free error
vmxnet3: VMStatify rx/tx q_descr and int_state
vmxnet3: Convert ring values to uint32_t's
net/colo-compare: Fix memory free error
colo-compare: Fix removing fds been watched incorrectly in finalization
char: remove the right fd been watched in qemu_chr_fe_set_handlers()
colo-compare: kick compare thread to exit after some cleanup in finalization
colo-compare: use g_timeout_source_new() to process the stale packets
NetRxPkt: Remove code duplication in net_rx_pkt_pull_data()
NetRxPkt: Account buffer with ETH header in IOV length
NetRxPkt: Do not try to pull more data than present
NetRxPkt: Fix memory corruption on VLAN header stripping
eth: Extend vlan stripping functions
net: Remove useless local var pkt
Peter Maydell [Mon, 6 Mar 2017 13:06:30 +0000 (13:06 +0000)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170306' into staging
ppc patch queue for 2017-03-06
Looks like my previous batch wasn't quite the last before hard freeze.
This has a handful of bugfixes to go in. They're all genuine
bugfixes, though not regressions in some cases.
* remotes/dgibson/tags/ppc-for-2.9-20170306:
target/ppc: use helper for excp handling
target/ppc: fmadd: add macro for updating flags
target/ppc: fmadd check for excp independently
spapr: ensure that all threads within core are on the same NUMA node
ppc/xics: register reset handlers for the ICP and ICS objects
Bruce Rogers [Thu, 2 Mar 2017 19:44:37 +0000 (12:44 -0700)]
tests: check path to avoid a failing qga/get-vcpus test
The qga/get-vcpus test fails in a simple chroot environment, as
used in an openSUSE Build Service local build, so first check
that the sysfs based path exists in order to avoid calling this
test in an environment where it won't work right.
Peter Lieven [Tue, 31 Jan 2017 15:36:34 +0000 (16:36 +0100)]
qga: ignore EBUSY when freezing a filesystem
The current implementation fails if we try to freeze an
already frozen filesystem. This can happen if a filesystem
is mounted more than once (e.g. with a bind mount).
Stefan Hajnoczi [Fri, 6 Jan 2017 15:29:30 +0000 (15:29 +0000)]
qga: add systemd socket activation support
AF_UNIX and AF_VSOCK listen sockets can be passed in by systemd on
startup. This allows systemd to manage the listen socket until the
first client connects and between restarts. Advantages of socket
activation are that parallel startup of network services becomes
possible and that unused daemons do not consume memory.
The key to achieving this is the LISTEN_FDS environment variable, which
is a stable ABI as shown here:
https://www.freedesktop.org/wiki/Software/systemd/InterfacePortabilityAndStabilityChart/
We could link against libsystemd and use sd_listen_fds(3) but it's easy
to implement the tiny LISTEN_FDS ABI so that qemu-ga does not depend on
libsystemd. Some systems may not have systemd installed and wish to
avoid the dependency. Other init systems or socket activation servers
may implement the same ABI without systemd involvement.
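The LISTEN_FDS ABI is small enough to sketch without libsystemd. The function names below are hypothetical and this is not the qemu-ga implementation; it assumes the documented convention that LISTEN_PID must match the receiving process and that passed descriptors start at fd 3.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

#define LISTEN_FDS_START 3      /* first passed fd, as in sd_listen_fds(3) */

/*
 * Validate the LISTEN_PID and LISTEN_FDS strings against our own PID.
 * Returns the number of passed descriptors, or 0 if none are for us.
 */
static int parse_listen_fds(const char *pid_str, const char *fds_str,
                            long my_pid)
{
    char *end;
    unsigned long v;

    assert(my_pid > 0);
    if (!pid_str || !fds_str) {
        return 0;
    }
    errno = 0;
    v = strtoul(pid_str, &end, 10);
    if (errno || end == pid_str || *end != '\0' ||
        v != (unsigned long)my_pid) {
        return 0;                   /* meant for another process */
    }
    errno = 0;
    v = strtoul(fds_str, &end, 10);
    if (errno || end == fds_str || *end != '\0' ||
        v > (unsigned long)(INT_MAX - LISTEN_FDS_START)) {
        return 0;
    }
    return (int)v;
}

/* Read the two environment variables for the current process. */
static int get_listen_fds(void)
{
    return parse_listen_fds(getenv("LISTEN_PID"), getenv("LISTEN_FDS"),
                            (long)getpid());
}
```

If get_listen_fds() returns n > 0, the listen sockets are the descriptors LISTEN_FDS_START through LISTEN_FDS_START + n - 1.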
Zhang Chen [Thu, 2 Mar 2017 09:54:17 +0000 (17:54 +0800)]
COLO-compare: Optimize compare_common and compare_tcp
Add an offset argument to colo_packet_compare_common(), and optimize
colo_packet_compare_icmp() and colo_packet_compare_udp() to compare
just the IP payload. Before comparing a whole TCP packet, compare
the TCP checksum first, which gives better performance.
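Comparing from an offset can be sketched as below. The function name, signature and return convention are assumptions for illustration and do not reproduce colo_packet_compare_common().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two packets, skipping the first 'offset' bytes of each.
 * Returns 0 if the remaining bytes are identical, -1 otherwise.
 */
static int packet_compare_from(const uint8_t *a, size_t a_len,
                               const uint8_t *b, size_t b_len,
                               size_t offset)
{
    assert(a != NULL && b != NULL);
    if (a_len != b_len || offset > a_len) {
        return -1;
    }
    return memcmp(a + offset, b + offset, a_len - offset) == 0 ? 0 : -1;
}
```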
Zhang Chen [Thu, 2 Mar 2017 09:54:16 +0000 (17:54 +0800)]
COLO-compare: Rename compare function and remove duplicate codes
Rename colo_packet_compare() to colo_packet_compare_common() so that
tcp_compare, udp_compare and icmp_compare can reuse this function.
Remove the minimum packet size check in icmp_compare, because it is
already done in parse_packet_early().
zhanghailiang [Tue, 28 Feb 2017 03:54:19 +0000 (11:54 +0800)]
filter-rewriter: skip net_checksum_calculate() while offset = 0
When the offset between the packet sequence numbers of the primary side
and the secondary side is zero, it is unnecessary to call
net_checksum_calculate() to recalculate the checksum of the packets.
zhanghailiang [Tue, 28 Feb 2017 03:54:18 +0000 (11:54 +0800)]
net/colo: fix memory double free error
The 'primary_list' and 'secondary_list' members of struct Connection
are not dynamically allocated through g_queue_new(), but we free them
with g_queue_free(), which leads to a double-free bug.
The indexes in the Vmxnet3Ring were migrated as 32-bit ints
yet are declared as size_t. They appear to be derived
from 32-bit values loaded from guest memory, so actually
store them as that.
zhanghailiang [Fri, 17 Feb 2017 02:53:14 +0000 (10:53 +0800)]
colo-compare: Fix removing fds been watched incorrectly in finalization
We hit the error below when trying to delete a compare object
with a QMP command:
chardev/char-io.c:91: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)' failed.
This is caused by failing to remove the right watched fd when
calling qemu_chr_fe_set_handlers().
Fix it by passing the worker_context parameter to qemu_chr_fe_set_handlers().
zhanghailiang [Fri, 17 Feb 2017 02:53:13 +0000 (10:53 +0800)]
char: remove the right fd been watched in qemu_chr_fe_set_handlers()
We can call qemu_chr_fe_set_handlers() to add or remove a watched fd
in 'context', which can be either the default main context or another
explicit context. But the original logic is not correct: we didn't remove
the right fd, because we called g_main_context_find_source_by_id(NULL, tag),
which always tries to find the GSource in the default context.
Fix it by passing the right context to g_main_context_find_source_by_id().
zhanghailiang [Fri, 17 Feb 2017 02:53:12 +0000 (10:53 +0800)]
colo-compare: kick compare thread to exit after some cleanup in finalization
We should call g_main_loop_quit() to notify the colo compare thread to
exit, or it will run in g_main_loop_run() forever.
Besides, the finalizing process can't happen in the context of the colo
thread, so it is reasonable to remove the
'if (qemu_thread_is_self(&s->thread))' branch.
Before the compare thread exits, some cleanup needs to be done: all
unhandled packets need to be released and connection_track_table
needs to be freed, or there will be a memory leak.
zhanghailiang [Fri, 17 Feb 2017 02:53:11 +0000 (10:53 +0800)]
colo-compare: use g_timeout_source_new() to process the stale packets
Instead of using a QEMU timer to process the stale packets,
re-use the colo compare thread to process these packets
by creating a new timeout source.
Besides, since we process all of the same vNIC's net connections/packets
in one thread, it is safe to remove the timer_check_lock.
Dmitry Fleytman [Thu, 16 Feb 2017 12:29:33 +0000 (14:29 +0200)]
NetRxPkt: Fix memory corruption on VLAN header stripping
This patch fixes a problem that was introduced in commit eb700029.
When net_rx_pkt_attach_iovec() calls eth_strip_vlan()
this can result in pkt->ehdr_buf being overflowed, because
ehdr_buf is only sizeof(struct eth_header) bytes large
but eth_strip_vlan() can write
sizeof(struct eth_header) + sizeof(struct vlan_header)
bytes into it.
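The size mismatch can be illustrated with the sketch below. The struct layouts follow the standard Ethernet and 802.1Q header formats, but the field names, the EHDR_BUF_SIZE macro and the helper are hypothetical. Sizing the buffer for the worst case is one plausible remedy, shown here as an assumption rather than as the patch's actual change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct eth_header {             /* 14 bytes */
    uint8_t  h_dest[6];
    uint8_t  h_source[6];
    uint16_t h_proto;
};

struct vlan_header {            /* 4 bytes */
    uint16_t h_tci;
    uint16_t h_proto;
};

/* Worst case: an Ethernet header followed by one VLAN header. */
#define EHDR_BUF_SIZE (sizeof(struct eth_header) + sizeof(struct vlan_header))

/* Copy header bytes into dst, refusing to write past the end of dst. */
static int copy_headers(uint8_t *dst, size_t dst_len,
                        const uint8_t *src, size_t hdr_len)
{
    assert(dst != NULL && src != NULL);
    if (hdr_len > dst_len) {
        return -1;              /* would overflow the header buffer */
    }
    memcpy(dst, src, hdr_len);
    return 0;
}
```

A buffer of only sizeof(struct eth_header) bytes is 4 bytes too small for a header that still carries its VLAN tag.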