Kevin Wolf [Thu, 12 Apr 2012 11:20:41 +0000 (13:20 +0200)]
Specification for qcow2 version 3
This updates the qcow2 specification to cover version 3. It contains the
following changes:
- Added compatible/incompatible/auto-clear feature bits plus an optional
feature name table to allow useful error messages even if an older
version doesn't know some feature at all.
- Configurable refcount width. If you don't want to use internal
snapshots, make refcounts one bit and save cache space and I/O.
- Zero cluster flags. This allows discard even with a backing file that
doesn't contain zeros. It is also useful for copy-on-read/image
streaming, as you'll want to keep sparseness without accessing the
remote image for an unallocated cluster all the time.
- Fixed internal snapshot metadata to use 64 bit VM state size. You
can't save a snapshot of a VM with >= 4 GB RAM today.
- Extended internal snapshot metadata to contain the disk size, so that
resizing images that have snapshots can be allowed in the future.
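As a rough illustration of how a reader might treat the three feature bit classes, here is a minimal sketch. The struct, the KNOWN_* masks and the function names are hypothetical and not taken from the specification text; the assumed semantics are: refuse to open on unknown incompatible bits, ignore unknown compatible bits, and clear unknown auto-clear bits before modifying the image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical masks of the feature bits this reader understands. */
#define KNOWN_INCOMPATIBLE UINT64_C(0x1)
#define KNOWN_COMPATIBLE   UINT64_C(0x0)
#define KNOWN_AUTOCLEAR    UINT64_C(0x0)

/* Hypothetical in-memory copy of the three header fields. */
typedef struct {
    uint64_t incompatible;
    uint64_t compatible;
    uint64_t autoclear;
} FeatureBits;

/* An unknown incompatible bit means the image must not be opened. */
static bool features_can_open(const FeatureBits *f)
{
    assert(f != NULL);
    return (f->incompatible & ~KNOWN_INCOMPATIBLE) == 0;
}

/*
 * Before the first modification, drop every auto-clear bit that this
 * reader does not understand. Compatible bits are left untouched.
 */
static void features_prepare_write(FeatureBits *f)
{
    assert(f != NULL);
    f->autoclear &= KNOWN_AUTOCLEAR;
}
```

The optional feature name table would then only be consulted to print a readable name for the bits that made features_can_open() fail.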
Kevin Wolf [Fri, 20 Apr 2012 13:50:39 +0000 (15:50 +0200)]
qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at()
Refcount block allocation and refcount table growth rely on
s->free_cluster_index pointing to somewhere after the current
allocation. Change qcow2_alloc_cluster_at() to fulfill this
assumption.
Without this change it could happen that a newly allocated refcount
block and the allocated data block point to the same area in the image
file, causing data corruption in the long run.
This fixes a bug that first became visible after commit 250196f1.
David Gibson [Fri, 20 Apr 2012 01:40:24 +0000 (11:40 +1000)]
Add .gitignore for tests/
The new autotests in tests/ generate a number of files, both
executable and source, which are not caught by the existing .gitignore
files. This patch adds a new .gitignore in tests/ which covers these.
[Changed 'rtc-test' to '*-test' so future tests do not need to be added
to .gitignore on a case-by-case basis. Stefan]
Amos Kong [Mon, 16 Apr 2012 07:32:49 +0000 (15:32 +0800)]
error.c: don't return value for void function
It is invalid to return a value from a function
returning void.
[C99 6.8.6.4 says "A return statement with an expression shall not
appear in a function whose return type is void" but gcc 4.6.3 with QEMU
compile flags does not complain. It's still worth fixing this. Stefan]
Liu Yuan [Fri, 20 Apr 2012 09:10:56 +0000 (17:10 +0800)]
qemu-img: let 'qemu-img convert' flush data
The 'qemu-img convert -h' output advertises that the default cache mode is
'writeback', while in fact it is 'unsafe'.
This patch 1) fixes the help text and 2) lets bdrv_close() call bdrv_flush().
Item 2) is needed because some storage backends don't have a self-flush
mechanism (e.g., sheepdog), so we need to call bdrv_flush() to make
sure the image is really written to the storage instead of lingering in the
writeback cache forever.
Michael Roth [Sat, 14 Apr 2012 02:07:36 +0000 (21:07 -0500)]
qemu-ga: generate missing stubs for fsfreeze
When linux-specific commands (including guest-fsfreeze-*) were consolidated
under defined(__linux__), we forgot to account for the case where
defined(__linux__) && !defined(FIFREEZE). As a result stubs are no longer
being generated on linux hosts that don't have FIFREEZE support. Fix
this.
Paolo Bonzini [Thu, 12 Apr 2012 12:00:55 +0000 (14:00 +0200)]
aio: return "AIO in progress" state from qemu_aio_wait
The definition of when qemu_aio_flush should loop is much simpler
than it looks. It just has to call qemu_aio_wait until it makes
no progress and all flush callbacks return false. qemu_aio_wait
is the logical place to tell the caller about this.
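The loop described above can be sketched as follows. This is a simplified model, not the QEMU code: aio_wait_sketch() is a hypothetical stand-in for qemu_aio_wait(), and a single counter replaces both the "made progress" and "flush callbacks return true" conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the event loop state: requests in flight. */
static int pending_requests;

/*
 * Stand-in for qemu_aio_wait(): process one event and tell the caller
 * whether AIO is still in progress.
 */
static bool aio_wait_sketch(void)
{
    assert(pending_requests >= 0);
    if (pending_requests == 0) {
        return false;       /* nothing left to wait for */
    }
    pending_requests--;     /* pretend one request completed */
    return true;            /* progress was made, call again */
}

/* The flush loop: keep waiting until the wait function reports idle. */
static int aio_flush_sketch(void)
{
    int iterations = 0;

    while (aio_wait_sketch()) {
        iterations++;
    }
    return iterations;
}
```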
Paolo Bonzini [Mon, 5 Mar 2012 08:10:35 +0000 (09:10 +0100)]
nbd: do not block in nbd_wr_sync if no data at all is available
Right now, nbd_wr_sync will hang if no data at all is available on the
socket and the other side is not going to provide any. Relax this by
making it loop only for writes or partial reads. This fixes a race
where one thread is executing qemu_aio_wait() and another is executing
main_loop_wait(). Then, the select() call in main_loop_wait() can return
stale data and call the "readable" callback with no data in the socket.
Paolo Bonzini [Mon, 5 Mar 2012 07:56:10 +0000 (08:56 +0100)]
nbd: consistently return negative errno values
In the next patch we need to look at the return code of nbd_wr_sync.
To avoid percolating the socket_error() ugliness all around, let's
handle errors by returning negative errno values.
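A minimal sketch of the negative-errno convention, assuming a POSIX host. The helper name is hypothetical, and it deliberately leaves out the Windows socket_error() handling that the commit message refers to.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: perform one write() and return either the number
 * of bytes written or a negative errno value, so callers never have to
 * inspect errno themselves.
 */
static ssize_t write_or_neg_errno(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);

    ssize_t ret = write(fd, buf, len);
    if (ret < 0) {
        return -errno;
    }
    return ret;
}
```

Callers can then test `ret < 0` and pass the value straight up the call chain.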
Paolo Bonzini [Wed, 7 Mar 2012 10:25:01 +0000 (11:25 +0100)]
nbd: consistently use ssize_t
GCC (pedantically, but correctly) considers that a negative ssize_t may
become positive when cast to int. This may cause uninitialized variable
warnings when a function returns such a negative ssize_t and is inlined.
Propagate ssize_t return types to avoid this.
Paolo Bonzini [Thu, 12 Apr 2012 12:00:52 +0000 (14:00 +0200)]
qemu-tool: map vm_clock to rt_clock
QED uses vm_clock timers so that images are not touched during and after
migration. This however does not apply to qemu-io and qemu-img.
Treat vm_clock as a synonym for rt_clock there, and enable it.
Ronnie Sahlberg [Thu, 19 Apr 2012 10:41:17 +0000 (20:41 +1000)]
SCSI emulation: should tell the guest that we actually support thin provisioning
Signed-off-by: Ronnie Sahlberg <[email protected]>
[Actually, we should report it only if discard_granularity is nonzero.
Older SBC drafts assigned 0 to thin provisioning and 1 to thick
(resource-provisioned, they call it). Newer drafts assign respectively
1 and 2 - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
Kevin Wolf [Wed, 18 Apr 2012 14:27:06 +0000 (16:27 +0200)]
qcow2: Fix return value of alloc_refcount_block
Someone forgot something in commit 29c1a730... Documenting the right
return value is not enough, you also need to actually return it in the
code.
This bug sometimes causes error return values even when everything has
succeeded: The new offset of the refcount block is truncated to 32 bits
and interpreted as signed. At least with small cluster sizes it's easy
to get a negative return value this way.
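The truncation can be reproduced in isolation. The sketch below uses hypothetical function names and assumes a platform with a 32-bit int where out-of-range conversions wrap, as they do with GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy pattern: a 64-bit image offset is passed back through an int.
 * Any offset with bit 31 set turns into a negative "error" value.
 */
static int offset_as_int(int64_t offset)
{
    assert(offset >= 0);
    return (int)offset;
}

/* Keeping the full width preserves the sign of a valid offset. */
static int64_t offset_as_int64(int64_t offset)
{
    assert(offset >= 0);
    return offset;
}
```

With these definitions, an offset of exactly 2 GiB (0x80000000) comes back negative from offset_as_int() and positive from offset_as_int64().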
Kevin Wolf [Wed, 18 Apr 2012 14:18:14 +0000 (16:18 +0200)]
qcow2: Fix error handling in qcow2_alloc_cluster_offset
If do_alloc_cluster_offset() fails, the error handling code tried to
remove the request from the in-flight queue, to which it wasn't added
yet, resulting in a NULL pointer dereference.
m->nb_clusters really only becomes != 0 when the request is in the list.
Stefan Hajnoczi [Thu, 29 Mar 2012 09:31:31 +0000 (10:31 +0100)]
ide: convert ide_sector_write() to asynchronous I/O
The IDE PIO write sector code path uses bdrv_write() and hence can make
the guest unresponsive while the I/O request is in progress. This patch
converts ide_sector_write() to use bdrv_aio_writev() by using the
BUSY_STAT bit to tell the guest that the request is in progress.
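The BUSY_STAT handshake can be modelled in a few lines. This is only a sketch: the struct and function names are hypothetical, and storing a callback pointer stands in for submitting the request with bdrv_aio_writev().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUSY_STAT  0x80   /* IDE status register: device busy */
#define READY_STAT 0x40   /* IDE status register: device ready */

typedef struct IDEStateSketch IDEStateSketch;
typedef void CompletionFunc(IDEStateSketch *s, int ret);

struct IDEStateSketch {
    uint8_t status;
    CompletionFunc *pending_cb;   /* hypothetical: in-flight request */
};

/* Completion callback: the request is done, so clear the busy bit. */
static void write_done(IDEStateSketch *s, int ret)
{
    (void)ret;   /* error handling omitted in this sketch */
    s->status = (uint8_t)((s->status & ~BUSY_STAT) | READY_STAT);
}

/* Start the write: mark the device busy and return immediately. */
static void sector_write_start(IDEStateSketch *s)
{
    assert(s != NULL && s->pending_cb == NULL);
    s->status = BUSY_STAT;
    s->pending_cb = write_done;   /* stand-in for bdrv_aio_writev() */
}

/* Called by the (simulated) block layer when the I/O finishes. */
static void complete_pending(IDEStateSketch *s, int ret)
{
    assert(s != NULL && s->pending_cb != NULL);
    CompletionFunc *cb = s->pending_cb;
    s->pending_cb = NULL;
    cb(s, ret);
}
```

Between sector_write_start() and complete_pending() the guest keeps running and sees BUSY_STAT when it polls the status register.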
Stefan Hajnoczi [Thu, 29 Mar 2012 09:31:30 +0000 (10:31 +0100)]
ide: convert ide_sector_read() to asynchronous I/O
The IDE PIO interface currently uses bdrv_read() to perform reads
synchronously. Synchronous I/O in the vcpu thread is bad because it
prevents the guest from executing code - it makes the guest
unresponsive.
This patch converts IDE PIO to use bdrv_aio_readv(). We simply need to
use the BUSY_STAT status so the guest knows to wait while we are busy.
The only external user of ide_sector_read() is restart behavior on I/O
errors and it is not affected by this change. We still need to restart
I/O in the same way.
Migration is also unaffected if I understand the code correctly. We
continue to use the same transfer function and the BUSY_STAT status
should never be migrated since we flush I/O before migrating device
state.
Kevin Wolf [Wed, 11 Apr 2012 09:06:37 +0000 (11:06 +0200)]
block: Drain requests in bdrv_close
If an AIO request is in flight that refers to a BlockDriverState that
has been closed and possibly even freed, more or less anything could
happen. I have seen segfaults and -EBADF return values, and qcow2 sometimes
actually catches the situation in bdrv_close() and abort()s.
Kevin Wolf [Wed, 11 Apr 2012 08:45:51 +0000 (10:45 +0200)]
qemu-iotests: Test bdrv_close while AIO is in flight
If the BlockDriverState is closed/freed without draining the AIO
requests first, the request coroutines may work on invalid data and file
descriptors or have some dangling pointers that cause segfaults.
Kevin Wolf [Wed, 11 Apr 2012 09:21:25 +0000 (11:21 +0200)]
qemu-iotests: Always filter cluster_size out in _make_test_img
Some image formats do have a cluster size, others don't, but there are
tests that work with both sets of images and currently we get failures
because the qemu-img create output doesn't mention the cluster size for
some formats.
Paolo Bonzini [Thu, 19 Apr 2012 08:10:54 +0000 (10:10 +0200)]
scsi: add support for FUA on writes
To force unit access, add a flush operation after the actual write.
WRITE AND VERIFY commands always flush according to SBC, so do it
even though we do not perform the reread.
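The decision of when to flush could look roughly like this. The function name is hypothetical; the opcodes and the FUA bit position are those of the 10-byte WRITE and WRITE AND VERIFY commands, and other CDB sizes are left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_10        0x2a
#define WRITE_VERIFY_10 0x2e
#define CDB_FUA_BIT     0x08   /* bit 3 of CDB byte 1 */

/*
 * Returns true if a flush must follow the write: either the command is
 * WRITE AND VERIFY, or it is a WRITE with the FUA bit set.
 */
static bool write_needs_flush(const uint8_t *cdb)
{
    assert(cdb != NULL);

    if (cdb[0] == WRITE_VERIFY_10) {
        return true;
    }
    return cdb[0] == WRITE_10 && (cdb[1] & CDB_FUA_BIT) != 0;
}
```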
Anthony Liguori [Wed, 18 Apr 2012 15:05:58 +0000 (10:05 -0500)]
Merge remote-tracking branch 'origin/master' into staging
* origin/master:
Allow controlling volume with PulseAudio backend
configure: pa_simple is not needed anymore
Do not use pa_simple PulseAudio API
audio/spice: add support for volume control
hw/ac97: add support for volume control
hw/ac97: the volume mask is not only 0x1f
hw/ac97: remove USE_MIXER code
audio: don't apply volume effect if backend has VOICE_VOLUME_CAP
audio: add VOICE_VOLUME ctl
Language keywords cannot be used as argument names. The DTrace backend
appends an underscore to the argument name in order to make the argument
name legal.
This patch adds 'in', 'next', and 'self' keywords to dtrace.py.
Also drop the unnecessary argument name lstrip() call. The
Arguments.build() method already ensures there is no space around
argument names. Furthermore it is misleading to do the lstrip() *after*
checking against keywords because the keyword check would not match if
spaces were in the name.
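The renaming rule can be illustrated with a short sketch. It is written in C for consistency with the rest of the code base rather than in Python like dtrace.py itself, the function name is hypothetical, and the keyword list contains only the three keywords named above, not the full list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Only the keywords mentioned in the text; the real list is longer. */
static const char *const keywords[] = { "in", "next", "self" };

/*
 * Copy an argument name to out, appending an underscore if the name
 * collides with a keyword. The comparison is exact, so a name with
 * surrounding whitespace would not match.
 */
static void dtrace_arg_name(const char *name, char *out, size_t out_len)
{
    assert(name != NULL && out != NULL && out_len > 0);

    const char *suffix = "";
    for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
        if (strcmp(name, keywords[i]) == 0) {
            suffix = "_";
            break;
        }
    }
    snprintf(out, out_len, "%s%s", name, suffix);
}
```

The exact comparison is why stripping whitespace only after the keyword check would be misleading: a name such as " next" never matches.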
tracetool: Rewrite infrastructure as python modules
The tracetool script is written in shell and has hit several portability
problems due to shell quirks or external tools across host platforms.
Additionally the amount of string processing and lack of real data
structures makes it tough to implement code generator backends for
tracers that are more complex.
This patch replaces the shell version of tracetool with a Python
version. The new tracetool design is:
scripts/tracetool.py - top-level script
scripts/tracetool/backend/ - tracer backends live here (simple, ust)
scripts/tracetool/format/ - output formats live here (.c, .h)
There is common code for trace-events definition parsing so that
backends can focus on generating code rather than parsing input.
Support for all existing backends (nop, stderr, simple, ust,
and dtrace) is added back in follow-up patches.
Anthony Liguori [Wed, 18 Apr 2012 12:55:56 +0000 (07:55 -0500)]
Merge remote-tracking branch 'kraxel/usb.46' into staging
* kraxel/usb.46: (21 commits)
usb-ehci: drop assert()
usb-redir: Notify our peer when we reject a device due to a speed mismatch
usb-ehci: Drop unused sofv value
usb-host: rewrite usb_linux_update_endp_table
usb: use USBDescriptor for endpoint descriptors.
usb: use USBDescriptor for interface descriptors.
usb: use USBDescriptor for config descriptors.
usb: use USBDescriptor for device qualifier descriptors.
usb: add USBDescriptor, use for device descriptors.
usb-ehci: frindex always is a 14 bits counter
usb-ehci: fix ehci_child_detach
usb-hub: add tracepoints
usb_packet_set_state: handle p->ep == NULL
usb-host: add property to turn off pipelining
usb-host: add usb packet to request tracepoints
usb-host: trace canceled requests
usb-host: trace emulated requests
Add bootindex support to usb-host and usb-redir
usb-uhci: queuing fix
usb-uhci: stop queue filling when we find a in-flight td
...
Alon Levy [Wed, 18 Apr 2012 09:27:00 +0000 (12:27 +0300)]
qxl-render: fix broken vnc+spice since commit f934493
Notify any listeners such as vnc that the displaysurface has been
changed, otherwise they will segfault when first accessing the freed old
displaysurface data.
Alon Levy [Thu, 29 Mar 2012 20:24:38 +0000 (22:24 +0200)]
qxl: set default values of vram*_size_mb to -1
The addition of those values caused a regression where not specifying
any value for the vram bar size would result in a 4096 _byte_ surface
area. This is ok for the windows driver but causes the X driver to be
unusable. Also, it's a regression. This patch returns the default
behavior of having a 64 megabyte vram BAR.
audio: don't apply volume effect if backend has VOICE_VOLUME_CAP
If the audio backend is capable of volume control, don't apply
software volume (mixeng_volume ()), but instead, rely on backend
volume control. This will allow guest to have full range volume
control.
Gerd Hoffmann [Fri, 30 Mar 2012 11:20:21 +0000 (13:20 +0200)]
usb-ehci: drop assert()
Not sure what the purpose of the assert() was, in any case it is bogus.
We can arrive there if transfer descriptors passed to us from the guest
failed to pass sanity checks, i.e. it is guest-triggerable. We deal
with that case by resetting the host controller. Everything is ok, no
need to throw a core dump here.
Gerd Hoffmann [Thu, 29 Mar 2012 14:06:28 +0000 (16:06 +0200)]
usb-host: rewrite usb_linux_update_endp_table
This patch carries a complete rewrite of the usb descriptor parser.
Changes / improvements:
* We are using the USBDescriptor struct instead of hard-coded offsets
now to access descriptor data.
* (debug) printfs are all gone, tracepoints have been added instead.
* We don't try (and fail) to skip over unneeded descriptors. We parse
them all one by one. We keep track of which configuration, interface
and altsetting we are looking at and use this information to figure
which descriptors are in use and which we can ignore.
* On parse errors we clear all endpoint information, which will
disallow any communication with the device, except control endpoint
messages. This makes sure we don't end up with a silly device state
where half of the endpoints got enabled and the other half was left
disabled.
* Some sanity checks have been added.
The new parser is more robust and also leaves complete device
information in the trace log if you enable the usb_host_parse_*
tracepoints.
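Walking the descriptors one by one can be sketched as below. This is not the code from the patch: the function name is hypothetical, it works on raw bytes instead of the USBDescriptor struct, and it only counts endpoint descriptors. It relies on every USB descriptor starting with a bLength byte followed by a bDescriptorType byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USB_DT_ENDPOINT 0x05

/*
 * Visit every descriptor in buf. Returns the number of endpoint
 * descriptors found, or -1 if a descriptor is malformed, in which case
 * the caller would discard all endpoint information.
 */
static int count_endpoints(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t pos = 0;
    int endpoints = 0;

    while (pos < len) {
        if (len - pos < 2) {
            return -1;              /* header does not fit */
        }
        uint8_t b_length = buf[pos];
        uint8_t b_type = buf[pos + 1];

        if (b_length < 2 || b_length > len - pos) {
            return -1;              /* bogus or truncated descriptor */
        }
        if (b_type == USB_DT_ENDPOINT) {
            endpoints++;
        }
        pos += b_length;            /* never skip, always step by bLength */
    }
    return endpoints;
}
```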
Gerd Hoffmann [Thu, 29 Mar 2012 10:04:54 +0000 (12:04 +0200)]
usb: add USBDescriptor, use for device descriptors.
This patch adds a new type for the binary representation of usb
descriptors. It is put into use for the descriptor generator code
where the struct replaces the hard-coded offsets.
Hans de Goede [Wed, 28 Mar 2012 18:47:51 +0000 (20:47 +0200)]
usb-ehci: frindex always is a 14 bits counter
frindex always is a 14 bits counter, and not a 13 bits one as we were
emulating. There are some subtle hints to this in the spec, first of all
"Table 2-12. FRINDEX - Frame Index Register" says:
"Bit 13:0 Frame Index. The value in this register increments at the end of
each time frame (e.g. micro-frame). Bits [N:3] are used for the Frame List
current index. This means that each location of the frame list is accessed
8 times (frames or micro-frames) before moving to the next index. The
following illustrates values of N based on the value of the Frame List
Size field in the USBCMD register.
USBCMD[Frame List Size]  Number Elements  N
00b                      1024             12
01b                      512              11
10b                      256              10
11b                      Reserved"
Notice how the text says that "Bits [N:3]" are used ...; it does
NOT say that when N == 12 (our case) the counter will wrap from 8191 to 0,
or in other words that it is a 13 bits counter (bits 0 - 12).
The other hint is in "Table 2-10. USBSTS USB Status Register Bit Definitions":
"Bit 3 Frame List Rollover - R/WC. The Host Controller sets this bit to a one
when the Frame List Index (see Section 2.3.4) rolls over from its maximum value
to zero. The exact value at which the rollover occurs depends on the frame
list size. For example, if the frame list size (as programmed in the Frame
List Size field of the USBCMD register) is 1024, the Frame Index Register
rolls over every time FRINDEX[13] toggles. Similarly, if the size is 512,
the Host Controller sets this bit to a one every time FRINDEX[12] toggles."
Notice how this text talks about setting bit 3 when bit 13 of frindex toggles
(when there are 1024 entries, so our case), so this indicates that frindex
has a bit 13 making it a 14 bit counter.
Besides these clear hints, the proof of the pudding is in the eating. Before
this patch I could not stream data from a USB2 webcam under Windows XP; after
this patch, using a USB2 webcam under Windows XP works fine, and no regressions
with other operating systems were seen.
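The arithmetic from the two quoted tables can be written down as a small sketch for the 1024-entry case (N == 12). The function names are hypothetical and this is not the emulation code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRINDEX_MASK 0x3fff   /* 14 bits: FRINDEX[13:0] */

/*
 * Advance FRINDEX by one micro-frame. Returns true when bit 13 toggles,
 * which is when Frame List Rollover is set for a 1024-entry list.
 */
static bool frindex_tick(uint32_t *frindex)
{
    assert(frindex != NULL);

    uint32_t old = *frindex;
    *frindex = (old + 1) & FRINDEX_MASK;
    return ((old ^ *frindex) & (1u << 13)) != 0;
}

/* Frame list index for 1024 entries: bits [12:3] of FRINDEX. */
static uint32_t frame_list_index(uint32_t frindex)
{
    return (frindex >> 3) & 0x3ff;
}
```

Going from 8191 to 8192 toggles bit 13 and the frame list index returns to 0, while the counter itself keeps counting up to 16383 before it wraps.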
Gerd Hoffmann [Fri, 23 Mar 2012 14:43:45 +0000 (15:43 +0100)]
usb-ehci: fix ehci_child_detach
Looks like a cut+paste bug from ehci_detach. When the device itself is
detached from an ehci port (ehci_detach op) we have to clear the
device pointer for the companion port too. When a device gets removed
from a downstream port of a usb hub (ehci_child_detach op) the ehci port
where the usb hub is plugged in is not affected.
Gerd Hoffmann [Fri, 23 Mar 2012 12:34:50 +0000 (13:34 +0100)]
usb_packet_set_state: handle p->ep == NULL
usb_packet_set_state can be called with p->ep == NULL. The tracepoint
there tries to log endpoint information, which leads to a segfault.
This patch makes usb_packet_set_state handle the NULL pointer properly.
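A guarded version of such a logging path might look like this. The types and the function are hypothetical simplifications, and using -1 as the endpoint number for a packet without an endpoint is an assumption made for the sketch, not necessarily what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced versions of the endpoint and packet. */
typedef struct {
    int nr;
} EndpointSketch;

typedef struct {
    EndpointSketch *ep;   /* may be NULL */
    int state;
} PacketSketch;

/*
 * Log the state change and apply it. The endpoint is only dereferenced
 * when it is present; -1 is logged otherwise. Returns the endpoint
 * number that was logged.
 */
static int packet_set_state(PacketSketch *p, int state,
                            char *log, size_t log_len)
{
    assert(p != NULL && log != NULL && log_len > 0);

    int ep_nr = p->ep ? p->ep->nr : -1;
    snprintf(log, log_len, "ep %d: state %d -> %d", ep_nr, p->state, state);
    p->state = state;
    return ep_nr;
}
```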