Paolo Bonzini [Tue, 22 Jan 2013 08:03:12 +0000 (09:03 +0100)]
mirror: switch mirror_iteration to AIO
There is really no change in the behavior of the job here, since
there is still a maximum of one in-flight I/O operation between
the source and the target. However, this patch already introduces
the AIO callbacks (which are unmodified in the next patch)
and some of the logic to count in-flight operations and only
complete the job when there is none.
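As a rough illustration of the in-flight counting described above, here is a minimal sketch. It assumes a hypothetical `ToyMirrorJob` struct and invented function names, not QEMU's actual mirror code. Each submitted AIO request increments a counter, its completion callback decrements it, and the job may only complete once the counter is back to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical job state; QEMU's real mirror job struct differs. */
typedef struct {
    int in_flight;      /* AIO operations submitted but not yet completed */
    bool source_done;   /* no dirty data left to copy */
} ToyMirrorJob;

/* Called whenever an asynchronous read or write is submitted. */
static void toy_mirror_submit(ToyMirrorJob *s)
{
    s->in_flight++;
}

/* AIO completion callback: one fewer operation in flight. */
static void toy_mirror_complete_cb(ToyMirrorJob *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
}

/* The job may finish only when all data is copied AND nothing is in flight. */
static bool toy_mirror_can_complete(const ToyMirrorJob *s)
{
    return s->source_done && s->in_flight == 0;
}
```

With at most one request outstanding, as in this patch, the counter only ever toggles between 0 and 1; the same check keeps working once more requests are allowed in flight.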
Paolo Bonzini [Mon, 21 Jan 2013 16:09:46 +0000 (17:09 +0100)]
mirror: allow customizing the granularity
The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.
Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.
Paolo Bonzini [Mon, 21 Jan 2013 16:09:43 +0000 (17:09 +0100)]
mirror: perform COW if the cluster size is bigger than the granularity
When mirroring runs, the backing files for the target may not yet be
ready. However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros. Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying). So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.
However, we want to lower the granularity for efficiency, so we need
a better solution. The solution is to always copy a whole cluster the
first time it is touched. The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.
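A minimal sketch of the "copy a whole cluster the first time it is touched" rule follows. The cluster size, granularity, and all function names are hypothetical values chosen for illustration, not taken from the patch. For simplicity the bitmap bit is set when the range is widened; a real job would more likely mark the cluster only once the copy has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLUSTER_SIZE 65536u     /* hypothetical target cluster size (bytes) */
#define GRANULARITY  4096u      /* hypothetical dirty-bitmap granularity */
#define NUM_CLUSTERS 64u

/* One flag per target cluster: set once the job has copied the whole cluster. */
static bool cow_bitmap[NUM_CLUSTERS];

/*
 * Given a dirty chunk [*start, *end) in bytes, widen it to full cluster
 * boundaries if any cluster it touches has not been copied yet, so the
 * target never has to fill the rest of that cluster with zeros.
 */
static void toy_cow_adjust(uint64_t *start, uint64_t *end)
{
    assert(*start % GRANULARITY == 0 && *end > *start);
    uint64_t first = *start / CLUSTER_SIZE;
    uint64_t last = (*end - 1) / CLUSTER_SIZE;
    bool need_cow = false;

    assert(last < NUM_CLUSTERS);
    for (uint64_t c = first; c <= last; c++) {
        if (!cow_bitmap[c]) {
            need_cow = true;
        }
    }
    if (need_cow) {
        *start = first * CLUSTER_SIZE;
        *end = (last + 1) * CLUSTER_SIZE;
        for (uint64_t c = first; c <= last; c++) {
            cow_bitmap[c] = true;   /* later copies can stay at granularity */
        }
    }
}
```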
Paolo Bonzini [Mon, 21 Jan 2013 16:09:40 +0000 (17:09 +0100)]
add hierarchical bitmap data type and test cases
HBitmap provides an array of bits. The bits are stored as usual in an
array of unsigned longs, but HBitmap is also optimized to provide fast
iteration over set bits; going from one bit to the next is O(log_B n)
worst case, with B = sizeof(long) * CHAR_BIT: this is low enough
that the number of levels is in fact fixed.
In order to do this, it stacks multiple bitmaps with progressively coarser
granularity; in all levels except the last, bit N is set iff the N-th
unsigned long is nonzero in the immediately next level. When iteration
completes on the last level it can examine the 2nd-last level to quickly
skip entire words, and even do so recursively to skip blocks of 64 words or
powers thereof (32 on 32-bit machines).
Given an index in the bitmap, it can be split into groups of bits like
this (for the 64-bit case):
bits 0-57 => word in the last bitmap | bits 58-63 => bit in the word
bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word
So it is easy to move up simply by shifting the index right by
log2(BITS_PER_LONG) bits. To move down, you shift the index left
similarly, and add the word index within the group. Iteration uses
ffs (find first set bit) to find the next word to examine; this
operation can be done in constant time on most current architectures.
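The move-up/move-down scheme can be illustrated with a toy two-level bitmap: a single top word summarizes 64 leaf words. This is a minimal sketch with invented names, not QEMU's `HBitmap` API, and it assumes GCC/Clang's `__builtin_ctzll` as the ffs-style "find first set bit" primitive.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_WORD 64
#define LOG_BPW 6                      /* log2(BITS_PER_WORD) */
#define NWORDS 64                      /* last level: 64 words = 4096 bits */

/* Two-level toy: bit N of top is set iff leaf[N] is nonzero. */
typedef struct {
    uint64_t top;                      /* 2nd-last level: one word */
    uint64_t leaf[NWORDS];             /* last level */
} ToyHBitmap;

static void toy_set(ToyHBitmap *hb, uint64_t idx)
{
    uint64_t word = idx >> LOG_BPW;    /* move up: shift right by log2(64) */
    assert(word < NWORDS);
    hb->leaf[word] |= 1ULL << (idx & (BITS_PER_WORD - 1));
    hb->top |= 1ULL << word;           /* this leaf word is now nonzero */
}

/* Return the first set bit >= idx, or -1 if there is none. */
static int64_t toy_next(const ToyHBitmap *hb, uint64_t idx)
{
    if (idx >= (uint64_t)NWORDS * BITS_PER_WORD) {
        return -1;
    }
    uint64_t word = idx >> LOG_BPW;
    /* First look at the remaining bits of the current leaf word. */
    uint64_t cur = hb->leaf[word] & (~0ULL << (idx & (BITS_PER_WORD - 1)));
    if (!cur) {
        /* Use the top level to skip every all-zero leaf word at once. */
        uint64_t rest = (word + 1 < NWORDS) ? hb->top & (~0ULL << (word + 1)) : 0;
        if (!rest) {
            return -1;
        }
        word = (uint64_t)__builtin_ctzll(rest);   /* move down to that word */
        cur = hb->leaf[word];
    }
    return (int64_t)((word << LOG_BPW) + (uint64_t)__builtin_ctzll(cur));
}
```

Finding bit 3000 after bit 5 touches only three words (leaf 0, the top word, leaf 46) instead of scanning the 45 empty words in between.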
When setting or clearing a range of m bits on all levels, the work to perform
is O(m + m/W + m/W^2 + ...), where W = BITS_PER_LONG; this is O(m), like on a regular bitmap.
When iterating on a bitmap, each bit (on any level) is only visited
once. Hence, the total cost of visiting a bitmap with m bits in it is
the number of bits that are set in all bitmaps. Unless the bitmap is
extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
cost of advancing from one bit to the next is usually constant.
Anthony Liguori [Thu, 24 Jan 2013 18:56:02 +0000 (12:56 -0600)]
Merge remote-tracking branch 'bonzini/scsi-next' into staging
# By Paolo Bonzini (1) and Peter Lieven (1)
# Via Paolo Bonzini
* bonzini/scsi-next:
iscsi: add support for iovectors
iscsi: do not leak acb->buf when commands are aborted
I'm not sure if the retry logic has ever worked when not using FIFO mode. I
found this while writing a test case, and code inspection confirms it is
definitely broken.
The TSR retry logic will never actually happen because it is guarded by an
'if (s->tsr_retry > 0)', but this is the only place that can ever make the
variable greater than zero. That effectively makes the retry logic an 'if (0)'.
I believe this is a typo and the intention was >= 0. Once this is fixed, though,
I see double transmits with my test case. This is because in the non-FIFO
case, serial_xmit may get invoked while LSR.THRE is still high because the
character was processed but the retransmit timer was still active.
We can handle this by simply checking for LSR.THRE and returning early. It's
possible that the FIFO paths also need some attention.
Cc: Stefano Stabellini
Signed-off-by: Anthony Liguori
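To see why a `> 0` guard turns the retry branch into dead code, here is a toy model. It assumes a retry counter that starts at zero and is only incremented inside the guarded branch; `count_retries()` and its parameters are hypothetical, not code from hw/serial.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the TSR retry guard; returns how many retries happen. */
static int count_retries(bool use_fixed_guard, int max_retries, int failures)
{
    int tsr_retry = 0;          /* counter starts at zero */
    int retries = 0;

    for (int i = 0; i < failures; i++) {
        bool guard = use_fixed_guard ? tsr_retry >= 0 : tsr_retry > 0;
        if (guard && tsr_retry < max_retries) {
            tsr_retry++;        /* the only place the counter ever grows */
            retries++;
        } else {
            break;              /* give up retrying */
        }
    }
    return retries;
}
```

With the `> 0` guard the first check sees 0, so the counter can never leave 0 and no retry ever happens; with `>= 0` the retries run up to the limit.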
Even if the previous logic never worked, the new logic breaks things,
namely,
Paolo Bonzini [Tue, 22 Jan 2013 16:34:29 +0000 (17:34 +0100)]
iscsi: do not leak acb->buf when commands are aborted
acb->buf is freed in the WRITE(16) callback, but this may not
get called at all when commands are aborted. Add another
free in the ABORT TASK callback, which requires setting acb->buf
to NULL everywhere.
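The pattern described above (free in both callbacks, and reset the pointer so a second free is harmless) can be sketched as follows. `ToyAIOCB` and the callback names are hypothetical stand-ins, not the real iscsi AIOCB or libiscsi callbacks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical AIO control block; the real one has many more fields. */
typedef struct {
    unsigned char *buf;   /* WRITE(16) bounce buffer, or NULL */
} ToyAIOCB;

/* Free at most once: clearing the pointer makes any later call a no-op. */
static void toy_release_buf(ToyAIOCB *acb)
{
    assert(acb != NULL);
    free(acb->buf);       /* free(NULL) does nothing */
    acb->buf = NULL;
}

/* Normal path: WRITE(16) completed. */
static void toy_write16_cb(ToyAIOCB *acb)
{
    toy_release_buf(acb);
}

/* Abort path: the WRITE(16) callback may never run, so free here too. */
static void toy_abort_task_cb(ToyAIOCB *acb)
{
    toy_release_buf(acb);
}
```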
Grant Likely [Wed, 23 Jan 2013 16:15:25 +0000 (16:15 +0000)]
trivial: etraxfs_eth: Eliminate checkpatch errors
This is a trivial patch to harmonize the coding style on
hw/etraxfs_eth.c. This is in preparation to split off the bitbang mdio
code into a separate file.
Anthony Liguori [Wed, 23 Jan 2013 15:08:54 +0000 (09:08 -0600)]
Merge remote-tracking branch 'bonzini/scsi-next' into staging
# By Peter Lieven (3) and others
# Via Paolo Bonzini
* bonzini/scsi-next:
scsi: Drop useless null test in scsi_unit_attention()
lsi: use qbus_reset_all to reset SCSI bus
scsi: fix segfault with 0-byte disk
iscsi: add support for iSCSI NOPs [v2]
iscsi: partly avoid iovec linearization in iscsi_aio_writev
iscsi: add iscsi_create support
scsi: Drop useless null test in scsi_unit_attention()
req was created by scsi_req_alloc(), which initializes req->dev to a
value it dereferences. req->dev isn't changed anywhere else.
Therefore, req->dev can't be null.
Paolo Bonzini [Thu, 10 Jan 2013 14:08:05 +0000 (15:08 +0100)]
scsi: fix segfault with 0-byte disk
When a 0-sized disk is found, READ CAPACITY will return a
LUN NOT READY error. However, because it returns -1 instead
of zero, the HBA will call scsi_req_continue. This will
typically cause a segmentation fault or an assertion failure.
Peter Lieven [Thu, 6 Dec 2012 09:46:47 +0000 (10:46 +0100)]
iscsi: add support for iSCSI NOPs [v2]
This patch will send NOP-Out PDUs every 5 seconds to the iSCSI target.
If several consecutive NOP-In replies are missing, a reconnect is initiated.
iSCSI NOPs help to ensure that the connection to the target is still operational.
A live TCP connection should already guarantee this, but in reality it may not
if there are bugs in either the target or the initiator implementation.
v2:
- track the NOPs inside libiscsi so libiscsi can reset the counter
in case it initiates a reconnect.
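A simplified model of the keepalive logic, assuming a failure threshold of 3 (the text does not state the number) and hypothetical names; per the v2 note the real counter lives inside libiscsi, while this sketch keeps it in a local struct.

```c
#include <assert.h>
#include <stdbool.h>

#define NOP_INTERVAL_MS   5000  /* from the text: one NOP-Out every 5 seconds */
#define MAX_NOP_FAILURES  3     /* assumption: threshold not given above */

typedef struct {
    int nops_in_flight;   /* NOP-Outs sent without a matching NOP-In */
    bool reconnecting;
} ToyIscsiConn;

/* Timer callback, run every NOP_INTERVAL_MS: send a NOP-Out or reconnect. */
static void toy_nop_timer(ToyIscsiConn *c)
{
    if (c->nops_in_flight >= MAX_NOP_FAILURES) {
        c->reconnecting = true;     /* target has been silent too long */
        c->nops_in_flight = 0;      /* counter reset on reconnect */
        return;
    }
    c->nops_in_flight++;            /* a NOP-Out PDU would be sent here */
}

/* NOP-In received: the connection works, so the count starts over. */
static void toy_nop_in(ToyIscsiConn *c)
{
    c->nops_in_flight = 0;
}
```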
Peter Lieven [Mon, 19 Nov 2012 14:58:31 +0000 (15:58 +0100)]
iscsi: partly avoid iovec linearization in iscsi_aio_writev
libiscsi expects all write16 data in a linear buffer. If the
iovec only contains one buffer we can skip the linearization
step as well as the additional malloc/free and pass the
buffer directly.
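The single-buffer shortcut can be sketched with POSIX `struct iovec`. `toy_linearize()` is a hypothetical helper, not the function from the patch: with one element it hands back the caller's buffer, otherwise it allocates and fills a bounce buffer that the caller must free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Return a linear buffer holding all iovec data. For iovcnt == 1 the
 * caller's buffer is returned as-is (*owned = 0); otherwise a copy is
 * allocated (*owned = 1) and must be freed after the write completes.
 */
static void *toy_linearize(const struct iovec *iov, int iovcnt,
                           size_t total, int *owned)
{
    if (iovcnt == 1) {
        *owned = 0;
        return iov[0].iov_base;         /* no malloc, no memcpy */
    }
    unsigned char *buf = malloc(total);
    if (buf == NULL) {
        *owned = 0;
        return NULL;
    }
    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        assert(off + iov[i].iov_len <= total);
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    *owned = 1;
    return buf;
}
```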
Peter Lieven [Sat, 17 Nov 2012 15:13:24 +0000 (16:13 +0100)]
iscsi: add iscsi_create support
This patch adds support for bdrv_create. This allows, e.g.,
qemu-img to convert from any supported device to
iSCSI-backed storage as the destination.
Alon Levy [Mon, 21 Jan 2013 12:48:07 +0000 (14:48 +0200)]
qxl: change rom size to 8192
This is a simpler solution to 869981, where migration breaks since qxl's
rom bar size has changed. Instead of ignoring fields in QXLRom, which is what has
actually changed, we remove some of the modes, a mechanism already
accounted for by the guest. The remaining modes cover only the portrait
and landscape orientations, 0 and 1; orientations 2 and 3 are dropped.
An assert was added to make sure the ROM size can accommodate future
QXLRom growth from spice-protocol changes.
This patch has been tested with 6.1.0.10015. With the newer 6.1.0.10016
there are problems with both "(flipped)" modes prior to the patch, and
the patch loses the ability to set "Portrait" modes. But this is a
separate bug to be fixed in the driver, and besides the patch doesn't
affect the new arbitrary mode setting functionality.
The test isn't useless. scsi_req_enqueue() may finish the request (this
actually happens for requests that don't trigger any I/O, such as
INQUIRY), then call usb_msd_command_complete(), which in turn will
set s->req to NULL after unref'ing it.
Tim Hardeck [Mon, 21 Jan 2013 10:04:45 +0000 (11:04 +0100)]
vnc: fix possible uninitialized removals
Some VncState values are not initialized before the Websocket handshake.
If the handshake fails, QEMU segfaults during cleanup. To prevent this,
initialization checks are added.
Tim Hardeck [Mon, 21 Jan 2013 10:04:44 +0000 (11:04 +0100)]
vnc: added initial websocket protocol support
This patch adds basic Websocket Protocol version 13 - RFC 6455 - support
to QEMU VNC. Binary encoding support on the client side is mandatory.
Because of the GnuTLS requirement, the Websockets implementation is
optional (--enable-vnc-ws).
To activate Websocket support, use the VNC option "websocket", for
example "-vnc :0,websocket".
The listen port for Websocket connections is (5700 + display) so if
QEMU VNC is started with :0 the Websocket port would be 5700.
Alternatively, the Websocket port can be specified manually with
",websocket=<port>".
Parts of the implementation are based on Anthony Liguori's QEMU Websocket
patch from 2010 and on Joel Martin's LibVNC Websocket implementation.
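The port rule above is simple arithmetic; here is a minimal sketch, assuming a hypothetical helper where `explicit_port > 0` stands for ",websocket=<port>" and 0 for a bare ",websocket".

```c
#include <assert.h>

#define VNC_WS_BASE_PORT 5700   /* from the text: port = 5700 + display */

/* Hypothetical helper: pick the websocket listen port for a VNC display. */
static int toy_vnc_ws_port(int display, int explicit_port)
{
    assert(display >= 0);
    return explicit_port > 0 ? explicit_port : VNC_WS_BASE_PORT + display;
}
```

So "-vnc :0,websocket" listens on 5700, display :3 on 5703, and an explicit port overrides the default.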
Tim Hardeck [Mon, 21 Jan 2013 10:04:43 +0000 (11:04 +0100)]
vnc: added buffer_advance function
Following Anthony Liguori's Websocket implementation I have added the
buffer_advance function to VNC and replaced all related buffer memmove
operations with it.
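What a `buffer_advance` helper does can be sketched as below, assuming a simplified buffer with `capacity`, `offset` (bytes in use) and `buffer` fields; `ToyBuffer` and `toy_buffer_advance()` are illustrative names, not the VNC code itself. It drops already-consumed bytes from the front and shifts the rest down, replacing the open-coded `memmove` calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the VNC Buffer type. */
typedef struct {
    size_t capacity;
    size_t offset;          /* number of valid bytes */
    unsigned char *buffer;
} ToyBuffer;

/* Discard the first len bytes and move the remainder to the front. */
static void toy_buffer_advance(ToyBuffer *buf, size_t len)
{
    assert(len <= buf->offset);
    memmove(buf->buffer, buf->buffer + len, buf->offset - len);
    buf->offset -= len;
}
```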
Anthony Liguori [Mon, 21 Jan 2013 19:22:43 +0000 (13:22 -0600)]
Merge remote-tracking branch 'quintela/thread.next' into staging
# By Juan Quintela (7) and Paolo Bonzini (6)
# Via Juan Quintela
* quintela/thread.next:
migration: remove argument to qemu_savevm_state_cancel
migration: Only go to the iterate stage if there is anything to send
migration: unfold rest of migrate_fd_put_ready() into thread
migration: move exit condition to migration thread
migration: Add buffered_flush error handling
migration: move beginning stage to the migration thread
qemu-file: Only set last_error if it is not already set
migration: fix off-by-one in buffered_rate_limit
migration: remove double call to migrate_fd_close
migration: make function static
use XFER_LIMIT_RATIO consistently
Protect migration_bitmap_sync() with the ramlist lock
Unlock ramlist lock also in error case
Anthony Liguori [Mon, 21 Jan 2013 13:32:22 +0000 (07:32 -0600)]
Merge remote-tracking branch 'stefanha/trivial-patches' into staging
# By Stefan Weil (2) and others
# Via Stefan Hajnoczi
* stefanha/trivial-patches:
hw/tpci200: Fix compiler warning (redefined symbol with MinGW)
configure: silence pkg-config's check for curses
acpitable: open the data file in binary mode
hw: Spelling fix in log message
Stefan Weil [Mon, 21 Jan 2013 06:49:51 +0000 (07:49 +0100)]
hw/tpci200: Fix compiler warning (redefined symbol with MinGW)
STATUS_TIMEOUT is defined in winnt.h:
CC hw/tpci200.o
hw/tpci200.c:34:0:
warning: "STATUS_TIMEOUT" redefined [enabled by default]
/usr/lib/gcc/x86_64-w64-mingw32/4.6/../../../../x86_64-w64-mingw32/include/winnt.h:1036:0:
note: this is the location of the previous definition
Use STATUS_TIME instead of STATUS_TIMEOUT as suggested by Alberto Garcia.
Michael Tokarev [Thu, 17 Jan 2013 10:53:52 +0000 (14:53 +0400)]
acpitable: open the data file in binary mode
-acpitable {file|data}=file reads the content of a file, but that content
is binary, so the file should be opened using the O_BINARY flag.
On *nix this is a no-op, but on Windows and other platforms that
distinguish text from binary files it is really needed.
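The usual portable idiom is to define `O_BINARY` as 0 where the platform lacks it, so the same `open()` call works everywhere. A minimal sketch follows; `toy_read_table()` is a hypothetical helper. On Windows, text mode would translate CR LF and may treat 0x1A as end of file, which would corrupt a binary ACPI table.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_BINARY
#define O_BINARY 0      /* POSIX systems make no text/binary distinction */
#endif

/* Read up to len raw bytes from path with no newline translation. */
static ssize_t toy_read_table(const char *path, void *dst, size_t len)
{
    int fd = open(path, O_RDONLY | O_BINARY);
    if (fd < 0) {
        return -1;
    }
    ssize_t n = read(fd, dst, len);
    close(fd);
    return n;
}
```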
Anthony Liguori [Sun, 20 Jan 2013 17:01:10 +0000 (11:01 -0600)]
Merge remote-tracking branch 'stefanha/block' into staging
# By Kevin Wolf (4) and others
# Via Stefan Hajnoczi
* stefanha/block:
dataplane: support viostor virtio-pci status bit setting
dataplane: avoid reentrancy during virtio_blk_data_plane_stop()
win32-aio: use iov utility functions instead of open-coding them
win32-aio: Fix memory leak
win32-aio: Fix vectored reads
aio: Fix return value of aio_poll()
ide: Remove wrong assertion
block: fix null-pointer bug on error case in block commit
Stefan Weil [Sat, 19 Jan 2013 19:23:51 +0000 (20:23 +0100)]
tci: Fix broken build (regression)
s390x-linux-user now also uses GETPC. Instead of adding it to the list of
targets which use GETPC, the macro is now defined unconditionally.
This avoids future build regressions like this one:
CC s390x-linux-user/target-s390x/int_helper.o
cc1: warnings being treated as errors
qemu/target-s390x/int_helper.c: In function ‘helper_divs32’:
qemu/target-s390x/int_helper.c:47: error: implicit declaration of function ‘GETPC’
qemu/target-s390x/int_helper.c:47: error: nested extern declaration of ‘GETPC’
Andreas Färber [Fri, 18 Jan 2013 18:30:13 +0000 (19:30 +0100)]
cpu-defs.h: Drop qemu_work_item prototype
Commit c64ca8140e9c21cd0d44c10fbe1247cb4ade8e6e (cpu: Move
queued_work_{first,last} to CPUState) moved the qemu_work_item fields
away. Clean up the now unused prototype.
Peter Maydell [Thu, 17 Jan 2013 20:04:16 +0000 (20:04 +0000)]
tcg/target-arm: Add missing parens to assertions
Silence a (legitimate) complaint about missing parentheses:
tcg/arm/tcg-target.c: In function ‘tcg_out_qemu_ld’:
tcg/arm/tcg-target.c:1148:5: error: suggest parentheses around
comparison in operand of ‘&’ [-Werror=parentheses]
tcg/arm/tcg-target.c: In function ‘tcg_out_qemu_st’:
tcg/arm/tcg-target.c:1357:5: error: suggest parentheses around
comparison in operand of ‘&’ [-Werror=parentheses]
which meant that we would mistakenly always assert if running
a QEMU built with debug enabled on ARM.
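The warning is about C precedence: `==` binds tighter than `&`, so `x & mask == 0` means `x & (mask == 0)`. A minimal sketch with a hypothetical alignment check (not the actual tcg/arm assertions) shows why such an assertion always fires; the buggy variant is written with explicit parentheses so it compiles cleanly under `-Werror=parentheses`.

```c
#include <assert.h>
#include <stdbool.h>

/* How "addr & mask == 0" is actually parsed: mask == 0 is evaluated first. */
static bool aligned_as_parsed(unsigned addr, unsigned mask)
{
    return addr & (mask == 0);
}

/* What was intended: test the low bits of addr. */
static bool aligned_intended(unsigned addr, unsigned mask)
{
    return (addr & mask) == 0;
}
```

For any nonzero mask, `mask == 0` is 0, so the parsed form is always false and an `assert()` built on it fails for every input.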
fw_cfg: Use void *, size_t instead of uint8_t *, uint32_t for blobs
Many callers pass size_t, which gets silently truncated to uint32_t.
Harmless, because all practical sizes are well below 4GiB. Clean it
up anyway. Size overflow now fails assertions.
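A minimal sketch of the new convention, assuming a hypothetical entry struct and helper name: callers pass `void *` and `size_t`, and a length that does not fit the 32-bit field trips an assertion instead of being silently truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blob entry whose length field is only 32 bits wide. */
typedef struct {
    void *data;
    uint32_t len;
} ToyFWCfgEntry;

/* Accept size_t from callers, but refuse sizes that do not fit in 32 bits. */
static void toy_fw_cfg_add_bytes(ToyFWCfgEntry *e, void *data, size_t len)
{
    assert(len <= UINT32_MAX);    /* overflow fails loudly, no truncation */
    e->data = data;
    e->len = (uint32_t)len;
}
```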
Paolo Bonzini [Fri, 11 Jan 2013 23:42:53 +0000 (15:42 -0800)]
optimize: optimize using nonzero bits
This adds two optimizations using the non-zero bit mask. First, in some
cases involving shifts or ANDs the value can become zero, and the operation
can thus be optimized to a move of zero. Second, a useless zero-extension
or AND with a constant, which would only zero bits that are already zero,
can be detected.
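Both optimizations come down to two mask tests on `dst = src & imm`; a zero-extension such as ext8u is the special case `imm = 0xff`. This is a minimal sketch with hypothetical names, not the TCG optimizer's code. A zero bit in `src_mask` means that bit of `src` is known to be zero.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOY_KEEP, TOY_MOVI_ZERO, TOY_MOV_INPUT } ToyRewrite;

/* Decide how "dst = src & imm" can be rewritten, given src's nonzero mask. */
static ToyRewrite toy_fold_and(uint64_t src_mask, uint64_t imm)
{
    if ((src_mask & imm) == 0) {
        return TOY_MOVI_ZERO;   /* every bit kept by imm is known zero */
    }
    if ((src_mask & ~imm) == 0) {
        return TOY_MOV_INPUT;   /* AND only clears bits already zero: a move */
    }
    return TOY_KEEP;
}
```

For instance, zero-extending a value already known to fit in 8 bits becomes a plain move, which is what then enables the copy propagation mentioned below.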
The main advantage of this optimization is that it turns zero-extensions
into moves, thus enabling much better copy propagation (around 1% code
reduction). Here is for example a "test $0xff0000,%ecx + je" before
optimization:
In some cases TCG even outsmarts GCC. :) Here the input code has
"and $0x2,%eax + movslq %eax,%rbx + test %rbx, %rbx" and the optimizer,
thanks to copy propagation, does the following:
Paolo Bonzini [Fri, 11 Jan 2013 23:42:52 +0000 (15:42 -0800)]
optimize: track nonzero bits of registers
Add a "mask" field to the tcg_temp_info struct. A bit that is zero
in "mask" will always be zero in the corresponding temporary.
Zero bits in the mask can be produced from moves of immediates,
zero-extensions, ANDs with constants, and shifts; they can then be
propagated by logical operations, shifts, sign-extensions,
negations, deposit operations, and conditional moves. Other
operations will just reset the mask to all-ones, i.e. unknown.
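A few of these propagation rules, written as a minimal sketch with hypothetical helper names (not the `tcg_temp_info` code). A 0 bit in a mask means "always zero", and shift counts are assumed to be below 64.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones: nothing is known about the value. */
#define TOY_MASK_UNKNOWN UINT64_MAX

/* movi: the immediate's own zero bits are known. */
static uint64_t toy_mask_movi(uint64_t imm)           { return imm; }
/* ext8u: everything above bit 7 becomes zero. */
static uint64_t toy_mask_ext8u(uint64_t m)            { return m & 0xff; }
/* AND: a bit is zero if it is zero in either operand. */
static uint64_t toy_mask_and(uint64_t a, uint64_t b)  { return a & b; }
/* OR: a bit is zero only if it is zero in both operands. */
static uint64_t toy_mask_or(uint64_t a, uint64_t b)   { return a | b; }
/* Logical shifts move known-zero bits and shift in zeros. */
static uint64_t toy_mask_shl(uint64_t m, unsigned sh) { return m << sh; }
static uint64_t toy_mask_shr(uint64_t m, unsigned sh) { return m >> sh; }
```

Anything without a rule, such as a multiply, would simply get `TOY_MASK_UNKNOWN`.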
Paolo Bonzini [Fri, 11 Jan 2013 23:42:51 +0000 (15:42 -0800)]
optimize: only write to state when clearing optimizer data
The next patch will add to the TCG optimizer a field that should be
non-zero in the default case. Thus, replace the memset of the
temps array with a loop. Only the state field has to be up-to-date,
because others are not used except if the state is TCG_TEMP_COPY
or TCG_TEMP_CONST.
Blue Swirl [Sat, 19 Jan 2013 09:56:41 +0000 (09:56 +0000)]
Merge branch 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf
* 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf:
PPC: KVM: Add support for EPR with KVM
openpic: export e500 epr enable into a ppc.c function
Update Linux kernel headers
PPC: e500: Change in-memory order of load blobs
PPC: Provide zero SVR for -cpu e500mc and e5500
PPC: E500: Calculate loading blob offsets properly
openpic: set mixed mode as supported
openpic: unify gcr mode mask updates
openpic: move gcr write into a function
Blue Swirl [Sat, 19 Jan 2013 09:55:46 +0000 (09:55 +0000)]
Merge branch 's390-for-upstream' of git://repo.or.cz/qemu/agraf
* 's390-for-upstream' of git://repo.or.cz/qemu/agraf:
s390: Add a hypercall registration interface.
target-s390x: Unregister reset callback on finalization
s390x: fix indentation
s390: Add CPU reset handler
s390x: Remove inline function ebcdic_put and related data from cpu.h
S390: Enable -cpu help and QMP query-cpu-definitions
s390: Move IPL code into a separate device
s390: new contributions GPLv2 or later
Stefan Weil [Tue, 1 Jan 2013 08:24:55 +0000 (08:24 +0000)]
s390x: Remove inline function ebcdic_put and related data from cpu.h
The function is only used in misc_helper.c, so move it to that file.
This reduces the size of debug executables (compiled without optimization),
because otherwise every compilation unit that includes cpu.h gets its own
copy of the unused code and data.
Executables with optimization don't change their size.
ebcdic2ascii is currently unused and could be removed (not done here).
The array ascii2ebcdic must be accessed with an unsigned index, therefore
(int)ascii[i] was replaced by (uint8_t)ascii[i]. The old code would have
failed for a signed char less than 0. The current code only converts
"QEMU" and spaces to EBCDIC, so there is no problem today.
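Why the cast matters: where `char` is signed, a byte such as 0xE9 converts to -23 through `(int)`, and indexing the table with it reads before the array. Casting through `uint8_t` keeps the index in 0..255. A minimal sketch with hypothetical names; only a few table entries are filled in, and the 0xE9 entry is an arbitrary placeholder.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t toy_ascii2ebcdic[256];

/* Fill just the entries needed for "QEMU" and space (EBCDIC code page 037). */
static void toy_init_table(void)
{
    toy_ascii2ebcdic[' '] = 0x40;
    toy_ascii2ebcdic['Q'] = 0xD8;
    toy_ascii2ebcdic['E'] = 0xC5;
    toy_ascii2ebcdic['M'] = 0xD4;
    toy_ascii2ebcdic['U'] = 0xE4;
    toy_ascii2ebcdic[0xE9] = 0x51;  /* placeholder high-half entry */
}

static void toy_ebcdic_put(uint8_t *ebcdic, const char *ascii, int len)
{
    for (int i = 0; i < len; i++) {
        /* (uint8_t) keeps the index in 0..255 even if char is signed */
        ebcdic[i] = toy_ascii2ebcdic[(uint8_t)ascii[i]];
    }
}
```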
S390: Enable -cpu help and QMP query-cpu-definitions
This enables qemu -cpu help to return a list of supported CPU models
on s390 and also to query for cpu definitions in the monitor.
Initially only cpu model = host is returned. This needs to be reworked
into a full-fledged CPU model handling later on.
This change is needed to allow libvirt exploiters (like OpenStack)
to specify a CPU model.
Let's move the code that sets up IPL for an external kernel or via
the zipl ROM into a separate file. This allows us to:
- define a reboot handler, setting up the PSW appropriately
- enhance the boot code to IPL disks that contain a bootmap that
was created with zipl under LPAR or z/VM (future patch)
- reuse that code for several machines (e.g. virtio-ccw and virtio-s390)
- allow different machines to provide different defaults
Signed-off-by: Christian Borntraeger
Signed-off-by: Jens Freimann
[agraf: symbolify initial psw, adjust header file location, fix for QOM]
Signed-off-by: Alexander Graf
IBM's s390 contributions were meant to be GPLv2 or later (since
we were contributing to QEMU). Several of the s390-specific files
link to GPL code anyway, so let's clarify the licence statement for
new contributions to those files that we have touched multiple
times or will likely touch again.
This patch does not touch files that mostly deal with tcg.
Alexander Graf [Thu, 17 Jan 2013 10:32:21 +0000 (11:32 +0100)]
openpic: export e500 epr enable into a ppc.c function
Enabling and disabling the EPR capability (mpic_proxy) is a system
wide operation. As such, it belongs in the ppc.c file, since that's
where PPC-specific machine-wide logic happens.
Alexander Graf [Wed, 16 Jan 2013 00:43:43 +0000 (01:43 +0100)]
PPC: Provide zero SVR for -cpu e500mc and e5500
Even though our -cpu types for e500mc and e5500 are not real CPUs with
actual version registers, a guest might still want to access that
version register, and the access has to succeed to keep the guest happy.
So let's expose a zero SVR value on E500_SVR SPR reads.
We have 3 blobs we need to load when booting the system:
- kernel
- initrd
- dtb
We place them in physical memory in that order. At least we should.
This patch fixes the location calculation up to take any module into
account, fixing the dtb offset along the way.
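The intended layout (kernel, then initrd, then dtb, each placed after the end of the previous one) can be sketched as follows. The 64 KiB alignment and all names are assumptions for illustration; the text does not give the actual alignment.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BLOB_ALIGN 0x10000ULL   /* assumption: 64 KiB between blobs */

/* Round x up to a multiple of a, where a is a power of two. */
static uint64_t toy_align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/*
 * Place kernel, initrd and dtb back to back, in that order. An absent
 * initrd (size 0) takes no space, so the dtb directly follows the kernel.
 */
static void toy_place_blobs(uint64_t kernel_base, uint64_t kernel_size,
                            uint64_t initrd_size, uint64_t *initrd_base,
                            uint64_t *dtb_base)
{
    *initrd_base = toy_align_up(kernel_base + kernel_size, TOY_BLOB_ALIGN);
    *dtb_base = toy_align_up(*initrd_base + initrd_size, TOY_BLOB_ALIGN);
}
```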
Alexander Graf [Mon, 7 Jan 2013 19:17:24 +0000 (20:17 +0100)]
openpic: set mixed mode as supported
The Raven MPIC implementation supports the "Mixed" mode to work with
an i8259. While we don't implement mixed mode, we should mark it as
a supported mode in the mode bitmap.
Stefan Hajnoczi [Thu, 17 Jan 2013 15:46:54 +0000 (16:46 +0100)]
dataplane: support viostor virtio-pci status bit setting
The viostor virtio-blk driver for Windows does not use the
VIRTIO_CONFIG_S_DRIVER bit. It only sets the VIRTIO_CONFIG_S_DRIVER_OK
bit.
The viostor driver sometimes refreshes the virtio-pci status byte while
the guest is running. We misinterpret 0x4 (VIRTIO_CONFIG_S_DRIVER_OK)
as an indication that virtio-blk-data-plane should be stopped since 0x2
(VIRTIO_CONFIG_S_DRIVER) is missing. The result is that the device
becomes unresponsive.
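Using the virtio status bit values quoted above (DRIVER = 0x2, DRIVER_OK = 0x4), here is a hypothetical before/after of the stop check. The function names are invented, and the exact condition used in the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER     0x2
#define VIRTIO_CONFIG_S_DRIVER_OK  0x4

/* Fragile check: stop whenever DRIVER is missing, even if DRIVER_OK is set. */
static bool toy_should_stop_fragile(uint8_t status)
{
    return !(status & VIRTIO_CONFIG_S_DRIVER);
}

/* Tolerant check: either bit counts as "a driver is present". */
static bool toy_should_stop(uint8_t status)
{
    return !(status & (VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK));
}
```

With viostor writing 0x4, the fragile check stops dataplane while the tolerant one keeps it running.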
Stefan Hajnoczi [Tue, 15 Jan 2013 16:19:38 +0000 (17:19 +0100)]
dataplane: avoid reentrancy during virtio_blk_data_plane_stop()
When dataplane is stopping, the s->vdev->binding->set_host_notifier(...,
false) call can invoke the virtqueue handler if an ioeventfd
notification is pending. This causes hw/virtio-blk.c to invoke
virtio_blk_data_plane_start() before virtio_blk_data_plane_stop()
returns!
The result is that we try to restart dataplane while trying to stop it
and the following assertion is raised:
Although the code was intended to prevent this scenario, the s->started
boolean isn't enough. Add s->stopping so that we can postpone clearing
s->started until we've completely stopped dataplane.
This way, virtqueue handler calls during virtio_blk_data_plane_stop()
are ignored. When dataplane is legitimately started again later we
already self-kick ourselves to resume processing.
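The `started`/`stopping` interplay can be modeled with a toy state machine. This is a sketch with hypothetical names: `toy_disable_notifier()` stands in for `set_host_notifier(..., false)` running a pending virtqueue handler that tries to restart dataplane.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool started;
    bool stopping;
    int starts;          /* counts real starts, for illustration */
} ToyDataPlane;

static void toy_start(ToyDataPlane *s)
{
    if (s->started || s->stopping) {
        return;          /* ignore start requests made while stopping */
    }
    s->started = true;
    s->starts++;
}

/* Stand-in for disabling the host notifier, which may run the
 * virtqueue handler and thus call toy_start() reentrantly. */
static void toy_disable_notifier(ToyDataPlane *s)
{
    toy_start(s);
}

static void toy_stop(ToyDataPlane *s)
{
    if (!s->started || s->stopping) {
        return;
    }
    s->stopping = true;
    toy_disable_notifier(s);   /* a pending ioeventfd kick fires here */
    s->started = false;        /* cleared only once fully stopped */
    s->stopping = false;
}
```

The reentrant start during `toy_stop()` is ignored, and a later legitimate `toy_start()` still works.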
Anthony Liguori [Thu, 17 Jan 2013 19:09:57 +0000 (13:09 -0600)]
Merge remote-tracking branch 'luiz/queue/qmp' into staging
# By Wenchao Xia
# Via Luiz Capitulino
* luiz/queue/qmp:
HMP: add sub command table to info
HMP: move define of mon_cmds
HMP: add infrastructure for sub command
HMP: delete info handler
HMP: add QDict to info callback handler
Stefan Hajnoczi [Tue, 15 Jan 2013 07:47:26 +0000 (08:47 +0100)]
Makefile: drop recursive libcacard clean
Commit eb8eb53e5846a957cf333f2e1ec8cb6e0c04 ("libcacard: rewrite
Makefile in non-recursive style") refactored libcacard/Makefile so it
can be included by the top-level Makefile.
The top-level clean target still loops over subdirectories, including
libcacard/, to invoke recursive clean. Remove libcacard from the
recursive clean since its files are already included at the top level.