Kevin Wolf [Wed, 26 Mar 2014 12:06:09 +0000 (13:06 +0100)]
parallels: Sanity check for s->tracks (CVE-2014-0142)
This avoids a possible division by zero.
Convert s->tracks to unsigned as well because it feels better than
surviving just because the results of calculations with s->tracks are
converted to unsigned anyway.
The first test case would cause a huge memory allocation, leading to a
qemu abort; the second one to a too small malloc() for the catalog
(smaller than s->catalog_size), which causes a read-only out-of-bounds
array access and on big endian hosts an endianess conversion for an
undefined memory area.
The sample image used here is not an original Parallels image. It was
created using an hexeditor on the basis of the struct that qemu uses.
Good enough for trying to crash the driver, but not for ensuring
compatibility.
Kevin Wolf [Wed, 26 Mar 2014 12:06:07 +0000 (13:06 +0100)]
qcow2: Limit snapshot table size
Even with a limit of 64k snapshots, each snapshot could have a filename
and an ID with up to 64k, which would still lead to pretty large
allocations, which could potentially lead to qemu aborting. Limit the
total size of the snapshot table to an average of 1k per entry when
the limit of 64k snapshots is fully used. This should be plenty for any
reasonable user.
This also fixes potential integer overflows of s->snapshot_size.
Kevin Wolf [Wed, 26 Mar 2014 12:06:05 +0000 (13:06 +0100)]
qcow2: Fix L1 allocation size in qcow2_snapshot_load_tmp() (CVE-2014-0145)
For the L1 table to loaded for an internal snapshot, the code allocated
only enough memory to hold the currently active L1 table. If the
snapshot's L1 table is actually larger than the current one, this leads
to a buffer overflow.
Kevin Wolf [Wed, 26 Mar 2014 12:06:04 +0000 (13:06 +0100)]
qcow2: Fix NULL dereference in qcow2_open() error path (CVE-2014-0146)
The qcow2 code assumes that s->snapshots is non-NULL if s->nb_snapshots
!= 0. By having the initialisation of both fields separated in
qcow2_open(), any error occuring in between would cause the error path
to dereference NULL in qcow2_free_snapshots() if the image had any
snapshots.
Kevin Wolf [Wed, 26 Mar 2014 12:06:03 +0000 (13:06 +0100)]
qcow2: Fix copy_sectors() with VM state
bs->total_sectors is not the highest possible sector number that could
be involved in a copy on write operation: VM state is after the end of
the virtual disk. This resulted in wrong values for the number of
sectors to be copied (n).
The code that checks for the end of the image isn't required any more
because the code hasn't been calling the block layer's bdrv_read() for a
long time; instead, it directly calls qcow2_readv(), which doesn't error
out on VM state sector numbers.
Kevin Wolf [Wed, 26 Mar 2014 12:06:02 +0000 (13:06 +0100)]
block: Limit request size (CVE-2014-0143)
Limiting the size of a single request to INT_MAX not only fixes a
direct integer overflow in bdrv_check_request() (which would only
trigger bad behaviour with ridiculously huge images, as in close to
2^64 bytes), but can also prevent overflows in all block drivers.
Both compressed and uncompressed I/O is buffered. dmg_open() calculates
the maximum buffer size needed from the metadata in the image file.
There is currently a buffer overflow since ->lengths[] is accounted
against the maximum compressed buffer size but actually uses the
uncompressed buffer:
switch (s->types[chunk]) {
case 1: /* copy */
ret = bdrv_pread(bs->file, s->offsets[chunk],
s->uncompressed_chunk, s->lengths[chunk]);
We must account against the maximum uncompressed buffer size for type=1
chunks.
This patch fixes the maximum buffer size calculation to take into
account the chunk type. It is critical that we update the correct
maximum since there are two buffers ->compressed_chunk and
->uncompressed_chunk.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:59 +0000 (13:05 +0100)]
dmg: use uint64_t consistently for sectors and lengths
The DMG metadata is stored as uint64_t, so use the same type for
sector_num. int was a particularly poor choice since it is only 32-bit
and would truncate large values.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:58 +0000 (13:05 +0100)]
dmg: sanitize chunk length and sectorcount (CVE-2014-0145)
Chunk length and sectorcount are used for decompression buffers as well
as the bdrv_pread() count argument. Ensure that they have reasonable
values so neither memory allocation nor conversion from uint64_t to int
will cause problems.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:56 +0000 (13:05 +0100)]
dmg: drop broken bdrv_pread() loop
It is not necessary to check errno for EINTR and the block layer does
not produce short reads. Therefore we can drop the loop that attempts
to read a compressed chunk.
The loop is buggy because it incorrectly adds the transferred bytes
twice:
do {
ret = bdrv_pread(...);
i += ret;
} while (ret >= 0 && ret + i < s->lengths[chunk]);
Luckily we can drop the loop completely and perform a single
bdrv_pread().
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:54 +0000 (13:05 +0100)]
dmg: coding style and indentation cleanup
Clean up the mix of tabs and spaces, as well as the coding style
violations in block/dmg.c. There are no semantic changes since this
patch simply reformats the code.
This patch is necessary before we can make meaningful changes to this
file, due to the inconsistent formatting and confusing indentation.
Kevin Wolf [Fri, 28 Mar 2014 17:06:31 +0000 (18:06 +0100)]
qcow2: Don't rely on free_cluster_index in alloc_refcount_block() (CVE-2014-0147)
free_cluster_index is only correct if update_refcount() was called from
an allocation function, and even there it's brittle because it's used to
protect unfinished allocations which still have a refcount of 0 - if it
moves in the wrong place, the unfinished allocation can be corrupted.
So not using it any more seems to be a good idea. Instead, use the
first requested cluster to do the calculations. Return -EAGAIN if
unfinished allocations could become invalid and let the caller restart
its search for some free clusters.
The context of creating a snapsnot is one situation where
update_refcount() is called outside of a cluster allocation. For this
case, the change fixes a buffer overflow if a cluster is referenced in
an L2 table that cannot be represented by an existing refcount block.
(new_table[refcount_table_index] was out of bounds)
[Bump the qemu-iotests 026 refblock_alloc.write leak count from 10 to
11.
--Stefan]
Kevin Wolf [Wed, 26 Mar 2014 12:05:47 +0000 (13:05 +0100)]
qcow2: Fix backing file name length check
len could become negative and would pass the check then. Nothing bad
happened because bdrv_pread() happens to return an error for negative
length values, but make variables for sizes unsigned anyway.
This patch also changes the behaviour to error out on invalid lengths
instead of silently truncating it to 1023.
Kevin Wolf [Wed, 26 Mar 2014 12:05:43 +0000 (13:05 +0100)]
qcow2: Check refcount table size (CVE-2014-0144)
Limit the in-memory reference count table size to 8 MB, it's enough in
practice. This fixes an unbounded allocation as well as a buffer
overflow in qcow2_refcount_init().
Kevin Wolf [Wed, 26 Mar 2014 12:05:42 +0000 (13:05 +0100)]
qcow2: Check backing_file_offset (CVE-2014-0144)
Header, header extension and the backing file name must all be stored in
the first cluster. Setting the backing file to a much higher value
allowed header extensions to become much bigger than we want them to be
(unbounded allocation).
Fam Zheng [Wed, 26 Mar 2014 12:05:40 +0000 (13:05 +0100)]
curl: check data size before memcpy to local buffer. (CVE-2014-0144)
curl_read_cb is callback function for libcurl when data arrives. The
data size passed in here is not guaranteed to be within the range of
request we submitted, so we may overflow the guest IO buffer. Check the
real size we have before memcpy to buffer to avoid overflow.
Jeff Cody [Wed, 26 Mar 2014 12:05:39 +0000 (13:05 +0100)]
vhdx: Bounds checking for block_size and logical_sector_size (CVE-2014-0148)
Other variables (e.g. sectors_per_block) are calculated using these
variables, and if not range-checked illegal values could be obtained
causing infinite loops and other potential issues when calculating
BAT entries.
The 1.00 VHDX spec requires BlockSize to be min 1MB, max 256MB.
LogicalSectorSize is required to be either 512 or 4096 bytes.
Jeff Cody [Fri, 28 Mar 2014 15:42:24 +0000 (11:42 -0400)]
vdi: add bounds checks for blocks_in_image and disk_size header fields (CVE-2014-0144)
The maximum blocks_in_image is 0xffffffff / 4, which also limits the
maximum disk_size for a VDI image to 1024TB. Note that this is the maximum
size that QEMU will currently support with this driver, not necessarily the
maximum size allowed by the image format.
Jeff Cody [Wed, 26 Mar 2014 12:05:36 +0000 (13:05 +0100)]
vpc/vhd: add bounds check for max_table_entries and block_size (CVE-2014-0144)
This adds checks to make sure that max_table_entries and block_size
are in sane ranges. Memory is allocated based on max_table_entries,
and block_size is used to calculate indices into that allocated
memory, so if these values are incorrect that can lead to potential
unbounded memory allocation, or invalid memory accesses.
Also, the allocation of the pagetable is changed from g_malloc0()
to qemu_blockalign().
Kevin Wolf [Wed, 26 Mar 2014 12:05:33 +0000 (13:05 +0100)]
bochs: Check catalog_size header field (CVE-2014-0143)
It should neither become negative nor allow unbounded memory
allocations. This fixes aborts in g_malloc() and an s->catalog_bitmap
buffer overflow on big endian hosts.
Kevin Wolf [Wed, 26 Mar 2014 12:05:31 +0000 (13:05 +0100)]
bochs: Unify header structs and make them QEMU_PACKED
This is an on-disk structure, so offsets must be accurate.
Before this patch, sizeof(bochs) != sizeof(header_v1), which makes the
memcpy() between both invalid. We're lucky enough that the destination
buffer happened to be the larger one, and the memcpy size to be taken
from the smaller one, so we didn't get a buffer overflow in practice.
This patch unifies the both structures, eliminating the need to do a
memcpy in the first place. The common fields are extracted to the top
level of the struct and the actually differing part gets a union of the
two versions.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:29 +0000 (13:05 +0100)]
block/cloop: fix offsets[] size off-by-one
cloop stores the number of compressed blocks in the n_blocks header
field. The file actually contains n_blocks + 1 offsets, where the extra
offset is the end-of-file offset.
The following line in cloop_read_block() results in an out-of-bounds
offsets[] access:
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:28 +0000 (13:05 +0100)]
block/cloop: refuse images with bogus offsets (CVE-2014-0144)
The offsets[] array allows efficient seeking and tells us the maximum
compressed data size. If the offsets are bogus the maximum compressed
data size will be unrealistic.
This could cause g_malloc() to abort and bogus offsets mean the image is
broken anyway. Therefore we should refuse such images.
Stefan Hajnoczi [Wed, 26 Mar 2014 12:05:25 +0000 (13:05 +0100)]
block/cloop: validate block_size header field (CVE-2014-0144)
Avoid unbounded s->uncompressed_block memory allocation by checking that
the block_size header field has a reasonable value. Also enforce the
assumption that the value is a non-zero multiple of 512.
These constraints conform to cloop 2.639's code so we accept existing
image files.
Steven Noonan [Fri, 28 Mar 2014 16:19:02 +0000 (17:19 +0100)]
configure: add option to disable -fstack-protector flags
The -fstack-protector flag family is useful for ensuring safety and for
debugging, but has a performance impact. Here are some boot time comparisons of
the various versions of -fstack-protector using qemu-system-arm on an x86_64
host:
This patch introduces a configure option to disable the stack protector
entirely, and conditional stack protector flag selection (in order,
based on availability): -fstack-protector-strong, -fstack-protector-all,
no stack protector.
Cole Robinson [Mon, 31 Mar 2014 18:31:44 +0000 (14:31 -0400)]
pci: Fix clearing IRQs on reset
irq_state is cleared before calling pci_device_deassert_intx, but the
latter misbehaves if the former isn't accurate. In this case, any raised
IRQs are not cleared, which hits an assertion in pcibus_reset:
Andreas Färber [Fri, 28 Mar 2014 15:25:07 +0000 (16:25 +0100)]
cpu: Avoid QOM casts for CPU()
CPU address spaces touching load and store helpers as well as the
movement of (almost) all fields from CPU_COMMON to CPUState have led to
a noticeable increase of CPU() usage in "hot" paths for both TCG and KVM.
While CPU()'s OBJECT_CHECK() might help detect development errors, i.e.
in form of crashes due to QOM vs. non-QOM mismatches rather than QOM
type mismatches, it is not really needed at runtime since mostly used in
CPU-specific paths, coming from a target-specific CPU subtype. If that
pointer is damaged, other errors are highly likely to occur elsewhere
anyway.
Keep the CPU() macro for a consistent developer experience and for
flexibility to exchange its implementation, but turn it into a pure,
unchecked C cast for now.
Luiz Capitulino [Wed, 19 Mar 2014 21:03:53 +0000 (17:03 -0400)]
target-i386: x86_cpu_get_phys_page_debug(): support 1GB page translation
Linux guests, when using more than 4GB of RAM, may end up using 1GB pages
to store (kernel) data. When this happens, we're unable to debug a running
Linux kernel with GDB:
(gdb) p node_data[0]->node_id
Cannot access memory at address 0xffff88013fffd3a0
(gdb)
GDB returns this error because x86_cpu_get_phys_page_debug() doesn't support
translating 1GB pages in IA-32e paging mode and returns an error to GDB.
This commit adds support for 1GB page translation for IA32e paging.
Peter Maydell [Fri, 28 Mar 2014 13:46:28 +0000 (13:46 +0000)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
acpi,pc,build bug fixes
Here are some bugfixes for 2.0.
A bugfix for acpi for pci bridges, and a build fix for
old systems without pthread_setname_np: both fix regressions
so we definitely want to include them.
HPET fix is not for a regression but looks very safe,
fixes a nasty bug and has been on list for a while.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Fri 28 Mar 2014 12:00:12 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
* remotes/mst/tags/for_upstream:
acpi: fix ACPI generation for pci bridges
Don't enable a HPET timer if HPET is disabled
Detect pthread_setname_np at configure time
Marcel Apfelbaum [Thu, 27 Mar 2014 15:35:36 +0000 (17:35 +0200)]
acpi: fix ACPI generation for pci bridges
Commit 8dcf525abc5dff785251a881f9764dd961065c0d
acpi-build: append description for non-hotplug
appended description for all occupied non hotpluggable PCI slots.
However the bridge devices are already added to SSDT,
adding them again will create an incorrect SSDT table.
Fixed by skipping the pci bridge devices, marking them as 'system'.
Peter Maydell [Thu, 27 Mar 2014 17:08:30 +0000 (17:08 +0000)]
Merge remote-tracking branch 'remotes/afaerber/tags/ppc-for-2.0' into staging
PowerPC queue for 2.0
* OpenPIC fix
* MSR fixes for POWER7 upwards
* TCG instruction set support fix for POWER8
# gpg: Signature made Thu 27 Mar 2014 16:12:12 GMT using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <[email protected]>"
# gpg: aka "Andreas Färber <[email protected]>"
* remotes/afaerber/tags/ppc-for-2.0:
target-ppc: MSR_POW not supported on POWER7/7+/8
target-ppc: POWER7+ supports the MSR_VSX bit
target-ppc: POWER8 supports isel
target-ppc: POWER8 supports the MSR_LE bit
intc/openpic_kvm: Fix MemListener delete region callback function
Peter Maydell [Thu, 27 Mar 2014 16:38:58 +0000 (16:38 +0000)]
Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-2014-03-27' into staging
trivial patches for 2014-03-27
# gpg: Signature made Thu 27 Mar 2014 15:23:53 GMT using RSA key ID 74F0C838
# gpg: Good signature from "Michael Tokarev <[email protected]>"
# gpg: aka "Michael Tokarev <[email protected]>"
# gpg: aka "Michael Tokarev <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5
# Subkey fingerprint: E190 8639 3B10 B51B AC2C 8B73 5253 C5AD 74F0 C838
* remotes/mjt/tags/trivial-patches-2014-03-27: (23 commits)
linux-user: remove duplicate statement
hw/timer/grlib_gptimer: remove unnecessary assignment
hw/pci-host/apb.c: Avoid shifting left into sign bit
hw/intc/xilinx_intc: Avoid shifting left into sign bit
hw/intc/slavio_intctl: Avoid shifting left into sign bit
tests/libqos/pci-pc: Avoid shifting left into sign bit
hw/ppc: Avoid shifting left into sign bit
hw/intc/openpic: Avoid shifting left into sign bit
hw/usb/hcd-ohci.c: Avoid shifting left into sign bit
target-mips: Avoid shifting left into sign bit
hw/i386/acpi_build.c: Avoid shifting left into sign bit
hw/pci/pci_host.c: Avoid shifting left into sign bit
hw/intc/apic.c: Use uint32_t for mask word in foreach_apic
target-i386: Avoid shifting left into sign bit
CODING_STYLE: Section about mixed declarations
doc: update default PowerPC framebuffer settings
doc: update sun4m documentation
fix return check for KVM_GET_DIRTY_LOG ioctl
target-i386: Add missing 'static' and 'const' attributes
util: Add 'static' attribute to function implementation
...
Matt Lupfer [Sat, 22 Feb 2014 04:37:23 +0000 (21:37 -0700)]
Don't enable a HPET timer if HPET is disabled
A HPET timer can be started when HPET is not yet
enabled. This will not generate an interrupt
to the guest, but causes problems when HPET is later
enabled.
A timer that is created and expires at least once before
HPET is enabled will have an initialized comparator based
on a hpet_offset of 0 (uninitialized). When HPET is
enabled, hpet_set_timer() is called a second time, which
modifies the timer expiry to a time based on the
difference between current ticks (measured with the
newly initialized hpet_offset) and the timer's
comparator (which was generated before hpet_offset was
initialized). This results in a long period of no HPET
timer ticks.
When this occurs with a CentOS 5.x guest, the guest
may not receive timer interrupts during its narrow
timer check window and panic on boot.
Peter Maydell [Thu, 27 Mar 2014 15:29:33 +0000 (15:29 +0000)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20140327' into staging
target-arm queue:
* Don't default to integratorcp board if no machine specified
# gpg: Signature made Thu 27 Mar 2014 14:09:12 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"
* remotes/pmaydell/tags/pull-target-arm-20140327:
vl.c: Improve message when no default machine is found
hw/arm: Stop specifying integratorcp as the default board
Peter Maydell [Mon, 17 Mar 2014 16:00:37 +0000 (16:00 +0000)]
hw/ppc: Avoid shifting left into sign bit
Add U suffix to various places where we were doing "1 << 31",
which is undefined behaviour, and also to other constant
definitions in the same groups, for consistency.
Peter Maydell [Mon, 17 Mar 2014 16:00:36 +0000 (16:00 +0000)]
hw/intc/openpic: Avoid shifting left into sign bit
Add U suffix to avoid undefined behaviour. This is only strictly
necessary for the 1 << 31 cases; for consistency we extend it
to other constants in the same group.
Peter Maydell [Mon, 17 Mar 2014 16:00:35 +0000 (16:00 +0000)]
hw/usb/hcd-ohci.c: Avoid shifting left into sign bit
Add U suffix to avoid undefined behaviour. This is only
strictly necessary for the 1<<31 cases, but we add it for the
other constants in these groups for consistency.
Peter Maydell [Mon, 17 Mar 2014 16:00:31 +0000 (16:00 +0000)]
hw/intc/apic.c: Use uint32_t for mask word in foreach_apic
Use unsigned arithmetic for operations on the mask word
in the foreach_apic() macro, to avoid relying on undefined
behaviour when shifting into the sign bit.
Peter Maydell [Mon, 17 Mar 2014 16:00:30 +0000 (16:00 +0000)]
target-i386: Avoid shifting left into sign bit
Add 'U' suffixes where necessary to avoid (1 << 31) which
shifts left into the sign bit, which is undefined behaviour.
Add the suffix also for other constants in the same groupings
even if they don't shift into bit 31, for consistency.
Mario Smarduch [Wed, 19 Mar 2014 17:24:26 +0000 (10:24 -0700)]
fix return check for KVM_GET_DIRTY_LOG ioctl
Fix return condition check from kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) to
handle internal failures or no support for memory slot dirty bitmap.
Otherwise the ioctl succeeds and continues with migration.
Addresses BUG# 1294227
Peter Maydell [Thu, 27 Mar 2014 14:00:52 +0000 (14:00 +0000)]
hw/arm: Stop specifying integratorcp as the default board
Currently for both qemu-system-arm and qemu-system-aarch64
the default board model if the user doesn't specify one
is the 'integratorcp'. This is a totally arbitrary historical
accident since it was the first board to be modelled.
That board is now just one target among many for us, and
is a very poor choice of default:
* it's an ancient board that is now only found in the
junkpiles of longtime ARM/Linux hackers, if at all
* it's an ARMv5 CPU, when most distros are now assuming
ARMv7
* it's pretty much unmaintained in QEMU
* it doesn't even have versatilepb's advantage of
supporting PCI
Making it or any other board the default serves only
to confuse people new to ARM who expect something more
like the x86 monoculture. Remove the is_default marker
from integratorcp, and don't set it for any other board,
to give users a nudge that they need to think about
which board they want a QEMU model of. (QEMU will produce
the admittedly slightly cryptic error "No machine found.")