Kevin Wolf [Thu, 3 Apr 2014 10:09:34 +0000 (12:09 +0200)]
block: Fix snapshot=on for protocol parsed from filename
Since commit 9fd3171a, BDRV_O_SNAPSHOT uses an option QDict to specify
the originally requested image as the backing file of the newly created
temporary snapshot. This means that the filename is stored in
"file.filename", which is an option that is not parsed for protocol
names. Therefore things like -drive file=nbd:localhost:10809 were
broken because it looked for a local file with the literal name
'nbd:localhost:10809'.
This patch changes the way BDRV_O_SNAPSHOT works once again. We now open
the originally requested image as normal, and then do a similar
operation as for live snapshots to put the temporary snapshot on top.
This way, both driver specific options and parsed filenames work.
As a nice side effect, this results in code movement to factor
bdrv_append_temp_snapshot() out. This is a good preparation for moving
its call to drive_init() and friends eventually.
Kevin Wolf [Thu, 3 Apr 2014 10:48:38 +0000 (12:48 +0200)]
qemu-iotests: Remove CR line endings in reference output
qemu doesn't print these CRs any more. The test still didn't fail
because the output comparison ignores line endings, but the change turns
up each time when you want to update the output.
Kevin Wolf [Thu, 3 Apr 2014 10:45:51 +0000 (12:45 +0200)]
block: Don't parse 'filename' option
When using the QDict option 'filename', it is supposed to be interpreted
literally. The code did correctly avoid guessing the protocol from any
string before the first colon, but it still called bdrv_parse_filename()
which would, for example, incorrectly remove a 'file:' prefix in the
raw-posix driver.
Kevin Wolf [Sat, 8 Feb 2014 16:44:59 +0000 (17:44 +0100)]
qcow2: Put cache reference in error case
When qcow2_get_cluster_offset() sees a zero cluster in a version 2
image, it (rightfully) returns an error. But in doing so it shouldn't
leak an L2 table cache reference.
Kevin Wolf [Thu, 3 Apr 2014 11:47:50 +0000 (13:47 +0200)]
qcow2: Flush metadata during read-only reopen
If lazy refcounts are enabled for a backing file, committing to this
backing file may leave it in a dirty state even if the commit succeeds.
The reason is that the bdrv_flush() call in bdrv_commit() doesn't flush
refcount updates with lazy refcounts enabled, and qcow2_reopen_prepare()
doesn't take care to flush metadata.
In order to fix this, this patch also fixes qcow2_mark_clean(), which
contains another ineffective bdrv_flush() call beause lazy refcounts are
disabled only afterwards. All existing callers of qcow2_mark_clean()
either don't modify refcounts or already flush manually, so that this
fixes only a latent, but not yet actually triggerable bug.
Another instance of the same problem is live snapshots. Again, a real
corruption is prevented by an explicit flush for non-read-only images in
external_snapshot_prepare(), but images using lazy refcounts stay dirty.
Paolo Bonzini [Wed, 2 Apr 2014 10:12:50 +0000 (12:12 +0200)]
iscsi: recognize "invalid field" ASCQ from WRITE SAME command
Some targets may return "invalid field" as the ASCQ from WRITE SAME
if they support the command only without the UNMAP field. Recognize
that, and return ENOTSUP just like for "invalid operation code".
Frank Ch. Eigler [Tue, 25 Mar 2014 12:08:30 +0000 (13:08 +0100)]
trace: add workaround for SystemTap PR13296
SystemTap sdt.h sometimes results in compiled probes without sufficient
information to extract arguments. This can be solved in a slightly
hacky way by encouraging the compiler to place arguments into registers.
This patch fixes the apic_reset_irq_delivered() trace event on Fedora 20
with gcc-4.8.2-7.fc20 and systemtap-sdt-devel-2.4-2.fc20 on x86_64.
Stefan Hajnoczi [Tue, 1 Apr 2014 09:12:57 +0000 (11:12 +0200)]
qcow2: link all L2 meta updates in preallocate()
preallocate() only links the first QCowL2Meta's data clusters into the
L2 table and ignores any chained QCowL2Metas in the linked list.
Chains of QCowL2Meta structs are built up when contiguous clusters span
L2 tables. Each QCowL2Meta describes one L2 table update. This is a
rare case in preallocate() but can happen.
This patch fixes preallocate() by iterating over the whole list of
QCowL2Metas. Compare with the qcow2_co_writev() function's
implementation, which is similar but also also handles request
dependencies. preallocate() only performs one allocation at a time so
there can be no dependencies.
Kevin Wolf [Wed, 26 Mar 2014 12:06:09 +0000 (13:06 +0100)]
parallels: Sanity check for s->tracks (CVE-2014-0142)
This avoids a possible division by zero.
Convert s->tracks to unsigned as well because it feels better than
surviving just because the results of calculations with s->tracks are
converted to unsigned anyway.
The first test case would cause a huge memory allocation, leading to a
qemu abort; the second one to a too small malloc() for the catalog
(smaller than s->catalog_size), which causes a read-only out-of-bounds
array access and on big endian hosts an endianess conversion for an
undefined memory area.
The sample image used here is not an original Parallels image. It was
created using an hexeditor on the basis of the struct that qemu uses.
Good enough for trying to crash the driver, but not for ensuring
compatibility.
Kevin Wolf [Wed, 26 Mar 2014 12:06:07 +0000 (13:06 +0100)]
qcow2: Limit snapshot table size
Even with a limit of 64k snapshots, each snapshot could have a filename
and an ID with up to 64k, which would still lead to pretty large
allocations, which could potentially lead to qemu aborting. Limit the
total size of the snapshot table to an average of 1k per entry when
the limit of 64k snapshots is fully used. This should be plenty for any
reasonable user.
This also fixes potential integer overflows of s->snapshot_size.
Kevin Wolf [Wed, 26 Mar 2014 12:06:05 +0000 (13:06 +0100)]
qcow2: Fix L1 allocation size in qcow2_snapshot_load_tmp() (CVE-2014-0145)
For the L1 table to loaded for an internal snapshot, the code allocated
only enough memory to hold the currently active L1 table. If the
snapshot's L1 table is actually larger than the current one, this leads
to a buffer overflow.
Kevin Wolf [Wed, 26 Mar 2014 12:06:04 +0000 (13:06 +0100)]
qcow2: Fix NULL dereference in qcow2_open() error path (CVE-2014-0146)
The qcow2 code assumes that s->snapshots is non-NULL if s->nb_snapshots
!= 0. By having the initialisation of both fields separated in
qcow2_open(), any error occuring in between would cause the error path
to dereference NULL in qcow2_free_snapshots() if the image had any
snapshots.
Kevin Wolf [Wed, 26 Mar 2014 12:06:03 +0000 (13:06 +0100)]
qcow2: Fix copy_sectors() with VM state
bs->total_sectors is not the highest possible sector number that could
be involved in a copy on write operation: VM state is after the end of
the virtual disk. This resulted in wrong values for the number of
sectors to be copied (n).
The code that checks for the end of the image isn't required any more
because the code hasn't been calling the block layer's bdrv_read() for a
long time; instead, it directly calls qcow2_readv(), which doesn't error
out on VM state sector numbers.
Kevin Wolf [Wed, 26 Mar 2014 12:06:02 +0000 (13:06 +0100)]
block: Limit request size (CVE-2014-0143)
Limiting the size of a single request to INT_MAX not only fixes a
direct integer overflow in bdrv_check_request() (which would only
trigger bad behaviour with ridiculously huge images, as in close to
2^64 bytes), but can also prevent overflows in all block drivers.
Both compressed and uncompressed I/O is buffered. dmg_open() calculates
the maximum buffer size needed from the metadata in the image file.
There is currently a buffer overflow since ->lengths[] is accounted
against the maximum compressed buffer size but actually uses the
uncompressed buffer:
switch (s->types[chunk]) {
case 1: /* copy */
ret = bdrv_pread(bs->file, s->offsets[chunk],
s->uncompressed_chunk, s->lengths[chunk]);
We must account against the maximum uncompressed buffer size for type=1
chunks.
This patch fixes the maximum buffer size calculation to take into
account the chunk type. It is critical that we update the correct
maximum since there are two buffers ->compressed_chunk and
->uncompressed_chunk.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:59 +0000 (13:05 +0100)]
dmg: use uint64_t consistently for sectors and lengths
The DMG metadata is stored as uint64_t, so use the same type for
sector_num. int was a particularly poor choice since it is only 32-bit
and would truncate large values.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:58 +0000 (13:05 +0100)]
dmg: sanitize chunk length and sectorcount (CVE-2014-0145)
Chunk length and sectorcount are used for decompression buffers as well
as the bdrv_pread() count argument. Ensure that they have reasonable
values so neither memory allocation nor conversion from uint64_t to int
will cause problems.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:56 +0000 (13:05 +0100)]
dmg: drop broken bdrv_pread() loop
It is not necessary to check errno for EINTR and the block layer does
not produce short reads. Therefore we can drop the loop that attempts
to read a compressed chunk.
The loop is buggy because it incorrectly adds the transferred bytes
twice:
do {
ret = bdrv_pread(...);
i += ret;
} while (ret >= 0 && ret + i < s->lengths[chunk]);
Luckily we can drop the loop completely and perform a single
bdrv_pread().
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:54 +0000 (13:05 +0100)]
dmg: coding style and indentation cleanup
Clean up the mix of tabs and spaces, as well as the coding style
violations in block/dmg.c. There are no semantic changes since this
patch simply reformats the code.
This patch is necessary before we can make meaningful changes to this
file, due to the inconsistent formatting and confusing indentation.
Kevin Wolf [Fri, 28 Mar 2014 17:06:31 +0000 (18:06 +0100)]
qcow2: Don't rely on free_cluster_index in alloc_refcount_block() (CVE-2014-0147)
free_cluster_index is only correct if update_refcount() was called from
an allocation function, and even there it's brittle because it's used to
protect unfinished allocations which still have a refcount of 0 - if it
moves in the wrong place, the unfinished allocation can be corrupted.
So not using it any more seems to be a good idea. Instead, use the
first requested cluster to do the calculations. Return -EAGAIN if
unfinished allocations could become invalid and let the caller restart
its search for some free clusters.
The context of creating a snapsnot is one situation where
update_refcount() is called outside of a cluster allocation. For this
case, the change fixes a buffer overflow if a cluster is referenced in
an L2 table that cannot be represented by an existing refcount block.
(new_table[refcount_table_index] was out of bounds)
[Bump the qemu-iotests 026 refblock_alloc.write leak count from 10 to
11.
--Stefan]
Kevin Wolf [Wed, 26 Mar 2014 12:05:47 +0000 (13:05 +0100)]
qcow2: Fix backing file name length check
len could become negative and would pass the check then. Nothing bad
happened because bdrv_pread() happens to return an error for negative
length values, but make variables for sizes unsigned anyway.
This patch also changes the behaviour to error out on invalid lengths
instead of silently truncating it to 1023.
Kevin Wolf [Wed, 26 Mar 2014 12:05:43 +0000 (13:05 +0100)]
qcow2: Check refcount table size (CVE-2014-0144)
Limit the in-memory reference count table size to 8 MB, it's enough in
practice. This fixes an unbounded allocation as well as a buffer
overflow in qcow2_refcount_init().
Kevin Wolf [Wed, 26 Mar 2014 12:05:42 +0000 (13:05 +0100)]
qcow2: Check backing_file_offset (CVE-2014-0144)
Header, header extension and the backing file name must all be stored in
the first cluster. Setting the backing file to a much higher value
allowed header extensions to become much bigger than we want them to be
(unbounded allocation).
Fam Zheng [Wed, 26 Mar 2014 12:05:40 +0000 (13:05 +0100)]
curl: check data size before memcpy to local buffer. (CVE-2014-0144)
curl_read_cb is callback function for libcurl when data arrives. The
data size passed in here is not guaranteed to be within the range of
request we submitted, so we may overflow the guest IO buffer. Check the
real size we have before memcpy to buffer to avoid overflow.
Jeff Cody [Wed, 26 Mar 2014 12:05:39 +0000 (13:05 +0100)]
vhdx: Bounds checking for block_size and logical_sector_size (CVE-2014-0148)
Other variables (e.g. sectors_per_block) are calculated using these
variables, and if not range-checked illegal values could be obtained
causing infinite loops and other potential issues when calculating
BAT entries.
The 1.00 VHDX spec requires BlockSize to be min 1MB, max 256MB.
LogicalSectorSize is required to be either 512 or 4096 bytes.
Jeff Cody [Fri, 28 Mar 2014 15:42:24 +0000 (11:42 -0400)]
vdi: add bounds checks for blocks_in_image and disk_size header fields (CVE-2014-0144)
The maximum blocks_in_image is 0xffffffff / 4, which also limits the
maximum disk_size for a VDI image to 1024TB. Note that this is the maximum
size that QEMU will currently support with this driver, not necessarily the
maximum size allowed by the image format.
Jeff Cody [Wed, 26 Mar 2014 12:05:36 +0000 (13:05 +0100)]
vpc/vhd: add bounds check for max_table_entries and block_size (CVE-2014-0144)
This adds checks to make sure that max_table_entries and block_size
are in sane ranges. Memory is allocated based on max_table_entries,
and block_size is used to calculate indices into that allocated
memory, so if these values are incorrect that can lead to potential
unbounded memory allocation, or invalid memory accesses.
Also, the allocation of the pagetable is changed from g_malloc0()
to qemu_blockalign().
Kevin Wolf [Wed, 26 Mar 2014 12:05:33 +0000 (13:05 +0100)]
bochs: Check catalog_size header field (CVE-2014-0143)
It should neither become negative nor allow unbounded memory
allocations. This fixes aborts in g_malloc() and an s->catalog_bitmap
buffer overflow on big endian hosts.
Kevin Wolf [Wed, 26 Mar 2014 12:05:31 +0000 (13:05 +0100)]
bochs: Unify header structs and make them QEMU_PACKED
This is an on-disk structure, so offsets must be accurate.
Before this patch, sizeof(bochs) != sizeof(header_v1), which makes the
memcpy() between both invalid. We're lucky enough that the destination
buffer happened to be the larger one, and the memcpy size to be taken
from the smaller one, so we didn't get a buffer overflow in practice.
This patch unifies the both structures, eliminating the need to do a
memcpy in the first place. The common fields are extracted to the top
level of the struct and the actually differing part gets a union of the
two versions.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:29 +0000 (13:05 +0100)]
block/cloop: fix offsets[] size off-by-one
cloop stores the number of compressed blocks in the n_blocks header
field. The file actually contains n_blocks + 1 offsets, where the extra
offset is the end-of-file offset.
The following line in cloop_read_block() results in an out-of-bounds
offsets[] access:
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:28 +0000 (13:05 +0100)]
block/cloop: refuse images with bogus offsets (CVE-2014-0144)
The offsets[] array allows efficient seeking and tells us the maximum
compressed data size. If the offsets are bogus the maximum compressed
data size will be unrealistic.
This could cause g_malloc() to abort and bogus offsets mean the image is
broken anyway. Therefore we should refuse such images.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:25 +0000 (13:05 +0100)]
block/cloop: validate block_size header field (CVE-2014-0144)
Avoid unbounded s->uncompressed_block memory allocation by checking that
the block_size header field has a reasonable value. Also enforce the
assumption that the value is a non-zero multiple of 512.
These constraints conform to cloop 2.639's code so we accept existing
image files.
Peter Maydell [Mon, 31 Mar 2014 21:11:29 +0000 (22:11 +0100)]
Merge remote-tracking branch 'remotes/afaerber/tags/qom-devices-for-2.0' into staging
QOM/QTest infrastructure fixes
* Revised QTest SIGABRT fix
* Test cleanups for non-POSIX hosts
* QTest test cases for NVMe, virtio-9p, pvpanic, i82801b11
* QTest API addition for reading events
* TMP105 fix and regression test
# gpg: Signature made Mon 31 Mar 2014 22:08:10 BST using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <[email protected]>"
# gpg: aka "Andreas Färber <[email protected]>"
* remotes/afaerber/tags/qom-devices-for-2.0:
tmp105-test: Test QOM property and precision
tmp105-test: Add a second sensor and test that one
tmp105-test: Wrap simple building blocks for testing
tmp105: Read temperature in milli-celsius
tests: Add i82801b11 qtest
pvpanic-test: Assert pause event
qtest: Factor out qtest_qmp_receive()
tests: Add pvpanic qtest
tests: Add virtio-9p qtest
tests: Add nvme qtest
nvme: Permit zero-length block devices
tests: Correctly skip qtest on non-POSIX hosts
tests: Skip POSIX-only tests on Windows
tests: Remove unsupported tests for MinGW
qtest: Keep list of qtest instances for SIGABRT handler
Revert "qtest: Fix crash if SIGABRT during qtest_init()"
Paolo Bonzini [Mon, 31 Mar 2014 16:26:32 +0000 (18:26 +0200)]
tmp105: Read temperature in milli-celsius
Right now, the temperature property must be written in milli-celsius,
but it reads back the value in 8.8 fixed point. Fix this by letting the
property read back the original value (possibly rounded). Also simplify
the code that does the conversion.
But the QTEST_TARGETS definition earlier in the Makefile fails to check
CONFIG_POSIX. This causes make targets to be generated for qtest test
cases even though we don't know how to build the binaries.
The following error message is printed when trying to run gtester on a
binary that was never built:
GLib-WARNING **: Failed to execute test binary: tests/endianness-test.exe: Failed to execute child process "tests/endianness-test.exe" (No such file or directory)
This patch makes QTEST_TARGETS empty on non-POSIX hosts. This prevents
the targets from being generated.
Stefan Weil [Fri, 28 Mar 2014 09:55:52 +0000 (10:55 +0100)]
tests: Remove unsupported tests for MinGW
test_timer_schedule and test_source_timer_schedule don't compile for MinGW
because some functions are not implemented for MinGW (qemu_pipe,
aio_set_fd_handler).
Steven Noonan [Fri, 28 Mar 2014 16:19:02 +0000 (17:19 +0100)]
configure: add option to disable -fstack-protector flags
The -fstack-protector flag family is useful for ensuring safety and for
debugging, but has a performance impact. Here are some boot time comparisons of
the various versions of -fstack-protector using qemu-system-arm on an x86_64
host:
This patch introduces a configure option to disable the stack protector
entirely, and conditional stack protector flag selection (in order,
based on availability): -fstack-protector-strong, -fstack-protector-all,
no stack protector.
Cole Robinson [Mon, 31 Mar 2014 18:31:44 +0000 (14:31 -0400)]
pci: Fix clearing IRQs on reset
irq_state is cleared before calling pci_device_deassert_intx, but the
latter misbehaves if the former isn't accurate. In this case, any raised
IRQs are not cleared, which hits an assertion in pcibus_reset:
Andreas Färber [Fri, 28 Mar 2014 15:25:07 +0000 (16:25 +0100)]
cpu: Avoid QOM casts for CPU()
CPU address spaces touching load and store helpers as well as the
movement of (almost) all fields from CPU_COMMON to CPUState have led to
a noticeable increase of CPU() usage in "hot" paths for both TCG and KVM.
While CPU()'s OBJECT_CHECK() might help detect development errors, i.e.
in form of crashes due to QOM vs. non-QOM mismatches rather than QOM
type mismatches, it is not really needed at runtime since mostly used in
CPU-specific paths, coming from a target-specific CPU subtype. If that
pointer is damaged, other errors are highly likely to occur elsewhere
anyway.
Keep the CPU() macro for a consistent developer experience and for
flexibility to exchange its implementation, but turn it into a pure,
unchecked C cast for now.
Luiz Capitulino [Wed, 19 Mar 2014 21:03:53 +0000 (17:03 -0400)]
target-i386: x86_cpu_get_phys_page_debug(): support 1GB page translation
Linux guests, when using more than 4GB of RAM, may end up using 1GB pages
to store (kernel) data. When this happens, we're unable to debug a running
Linux kernel with GDB:
(gdb) p node_data[0]->node_id
Cannot access memory at address 0xffff88013fffd3a0
(gdb)
GDB returns this error because x86_cpu_get_phys_page_debug() doesn't support
translating 1GB pages in IA-32e paging mode and returns an error to GDB.
This commit adds support for 1GB page translation for IA32e paging.
Peter Maydell [Fri, 28 Mar 2014 13:46:28 +0000 (13:46 +0000)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
acpi,pc,build bug fixes
Here are some bugfixes for 2.0.
A bugfix for acpi for pci bridges, and a build fix for
old systems without pthread_setname_np: both fix regressions
so we definitely want to include them.
HPET fix is not for a regression but looks very safe,
fixes a nasty bug and has been on list for a while.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Fri 28 Mar 2014 12:00:12 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
* remotes/mst/tags/for_upstream:
acpi: fix ACPI generation for pci bridges
Don't enable a HPET timer if HPET is disabled
Detect pthread_setname_np at configure time