Peter Maydell [Fri, 29 Aug 2014 17:40:04 +0000 (18:40 +0100)]
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Block pull request
# gpg: Signature made Fri 29 Aug 2014 17:25:58 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"
* remotes/stefanha/tags/block-pull-request: (35 commits)
quorum: Fix leak of opts in quorum_open
blkverify: Fix leak of opts in blkverify_open
nfs: Fix leak of opts in nfs_file_open
curl: Don't deref NULL pointer in call to aio_poll.
curl: Allow a cookie or cookies to be sent with http/https requests.
virtio-blk: allow drive_del with dataplane
block: acquire AioContext in do_drive_del()
linux-aio: avoid deadlock in nested aio_poll() calls
qemu-iotests: add multiwrite test cases
block: fix overlapping multiwrite requests
nbd: Follow the BDS' AIO context
block: Add AIO context notifiers
nbd: Drop nbd_can_read()
sheepdog: fix a core dump while do auto-reconnecting
aio-win32: add support for sockets
qemu-coroutine-io: fix for Win32
AioContext: introduce aio_prepare
aio-win32: add aio_set_dispatching optimization
test-aio: test timers on Windows too
AioContext: export and use aio_dispatch
...
if (!state) {
- qemu_aio_wait();
+ aio_poll(state->s->aio_context, true);
}
The new code now checks if state is NULL and then dereferences it
('state->s') which is obviously incorrect.
This commit replaces state->s->aio_context with
bdrv_get_aio_context(bs), fixing this problem. The two other hunks
are concerned with getting the BlockDriverState pointer bs to where it
is needed.
The original bug causes a segfault when using libguestfs to access a
VMware vCenter Server and doing any kind of complex read-heavy
operations. With this commit the segfault goes away.
curl: Allow a cookie or cookies to be sent with http/https requests.
In order to access VMware ESX efficiently, we need to send a session
cookie. This patch is very simple and just allows you to send that
session cookie. It punts on the question of how you get the session
cookie in the first place, but in practice you can just run a `curl'
command against the server and extract the cookie that way.
To use it, add file.cookie to the curl URL. For example:
$ qemu-img info 'json: {
"file.driver":"https",
"file.url":"https://vcenter/folder/Windows%202003/Windows%202003-flat.vmdk?dcPath=Datacenter&dsName=datastore1",
"file.sslverify":"off",
"file.cookie":"vmware_soap_session=\"52a01262-bf93-ccce-d379-8dabb3e55560\""}'
image: [...]
file format: raw
virtual size: 8.0G (8589934592 bytes)
disk size: unavailable
Stefan Hajnoczi [Mon, 18 Aug 2014 15:07:13 +0000 (16:07 +0100)]
virtio-blk: allow drive_del with dataplane
Now that drive_del acquires the AioContext we can safely allow deleting
the drive. As with non-dataplane mode, all I/Os submitted by the guest
after drive_del will return EIO.
This patch makes hot unplug work with virtio-blk dataplane. Previously
drive_del reported an error because the device was busy.
Stefan Hajnoczi [Mon, 18 Aug 2014 15:07:12 +0000 (16:07 +0100)]
block: acquire AioContext in do_drive_del()
Make drive_del safe for dataplane where another thread may be running
the BlockDriverState's AioContext.
Note the assumption that AioContext's lifetime exceeds DriveInfo and
BlockDriverState. We release AioContext after DriveInfo and
BlockDriverState are potentially freed.
This is clearly safe with the global AioContext but also with -object
iothread and implicit iothreads created by -device
virtio-blk-pci,x-data-plane=on (their lifetime is tied to DeviceState,
not BlockDriverState).
Stefan Hajnoczi [Mon, 4 Aug 2014 15:56:33 +0000 (16:56 +0100)]
linux-aio: avoid deadlock in nested aio_poll() calls
If two Linux AIO request completions are fetched in the same
io_getevents() call, QEMU will deadlock if request A's callback waits
for request B to complete using an aio_poll() loop. This was reported
to happen with the mirror blockjob.
This patch moves completion processing into a BH and makes it resumable.
Nested event loops can resume completion processing so that request B
will complete and the deadlock will not occur.
Peter Maydell [Fri, 29 Aug 2014 14:48:15 +0000 (15:48 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20140829' into staging
target-arm queue:
* support PMCCNTR in ARMv8
* various GIC fixes and cleanups
* Correct Cortex-A57 ISAR5 and AA64ISAR0 ID register values
* Fix regression that disabled VFP for ARMv5 CPUs
* Update to upstream VIXL 1.5
# gpg: Signature made Fri 29 Aug 2014 15:34:47 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <[email protected]>"
* remotes/pmaydell/tags/pull-target-arm-20140829:
target-arm: Implement pmccfiltr_write function
target-arm: Remove old code and replace with new functions
target-arm: Implement pmccntr_sync function
target-arm: Add arm_ccnt_enabled function
target-arm: Implement PMCCNTR_EL0 and related registers
arm: Implement PMCCNTR 32b read-modify-write
target-arm: Make the ARM PMCCNTR register 64-bit
hw/intc/arm_gic: honor target mask in gic_update()
aarch64: raise max_cpus to 8
arm_gic: Use GIC_NR_SGIS constant
arm_gic: Do not force PPIs to edge-triggered mode
arm_gic: GICD_ICFGR: Write model only for pre v1 GICs
arm_gic: Fix read of GICD_ICFGR
target-arm: Correct Cortex-A57 ISAR5 and AA64ISAR0 ID register values
target-arm: Fix regression that disabled VFP for ARMv5 CPUs
disas/libvixl: Update to upstream VIXL 1.5
Alistair Francis [Fri, 29 Aug 2014 14:00:29 +0000 (15:00 +0100)]
target-arm: Implement pmccntr_sync function
This is used to synchronise the PMCCNTR counter and swap its
state between enabled and disabled if required. It must always
be called twice, both before and after any logic that could
change the state of the PMCCNTR counter.
The register is now 64bit, however a 32 bit write to the register
should leave the higher bits unchanged. The open coded write handler
does not implement this, so we need to read-modify-write accordingly.
Peter Maydell [Fri, 29 Aug 2014 14:00:28 +0000 (15:00 +0100)]
target-arm: Correct Cortex-A57 ISAR5 and AA64ISAR0 ID register values
We implement the crypto extensions but were incorrectly reporting
ID register values for the Cortex-A57 which did not advertise
crypto. Use the correct values as described in the TRM.
With this fix Linux correctly detects presence of the crypto
features and advertises them in /proc/cpuinfo.
Peter Maydell [Fri, 29 Aug 2014 14:00:28 +0000 (15:00 +0100)]
target-arm: Fix regression that disabled VFP for ARMv5 CPUs
Commit 2c7ffc414 added support for honouring the CPACR coprocessor
access control register bits which may disable access to VFP
and Neon instructions. However it failed to account for the
fact that the CPACR is only present starting from the ARMv6
architecture version, so it accidentally disabled VFP completely
for ARMv5 CPUs like the ARM926. Linux would detect this as
"no VFP present" and probably fall back to its own emulation,
but other guest OSes might crash or misbehave.
This patch fixes data loss but this code path is probably rare. Since
guests cannot assume ordering between in-flight requests, few
applications submit overlapping write requests.
Jack Un [Sat, 9 Aug 2014 20:34:36 +0000 (23:34 +0300)]
Fix OHCI ISO TD state never being written back.
There appears to be typo in OHCI with isochronous transfers
resulting in isoch. transfer descriptor state never being written back.
The'put_words' function is in a OR statement hence it is never called.
We identify devices by their Open Firmware device paths. The encoding
of the host controller and hub port numbers is incorrect:
usb_get_fw_dev_path() formats them in decimal, while SeaBIOS uses
hexadecimal. When some port number > 9, SeaBIOS will miss the
bootindex (lucky case), or apply it to another device (unlucky case).
The relevant spec[*] agrees with SeaBIOS (and OVMF, for that matter).
Change %d to %x.
Bug can bite only with host controllers or hubs sporting more than ten
ports. I'm not aware of any.
[*] Open Firmware Recommended Practice: Universal Serial Bus,
Version 1, Section 3.2.1 Device Node Address Representation
http://www.openfirmware.org/1275/bindings/usb/usb-1_0.ps
Signed-off-by: Markus Armbruster <[email protected]> Reviewed-by: Laszlo Ersek <[email protected]>
Note: xhci can be configured with up to 15 ports (default is 4 ports).
Max Reitz [Fri, 20 Jun 2014 19:57:34 +0000 (21:57 +0200)]
nbd: Follow the BDS' AIO context
Keep the NBD server always in the same AIO context as the exported BDS
by calling bdrv_add_aio_context_notifier() and implementing the required
callbacks.
Max Reitz [Fri, 20 Jun 2014 19:57:33 +0000 (21:57 +0200)]
block: Add AIO context notifiers
If a long-running operation on a BDS wants to always remain in the same
AIO context, it somehow needs to keep track of the BDS changing its
context. This adds a function for registering callbacks on a BDS which
are called whenever the BDS is attached or detached from an AIO context.
Max Reitz [Fri, 20 Jun 2014 19:57:32 +0000 (21:57 +0200)]
nbd: Drop nbd_can_read()
There is no variant of aio_set_fd_handler() like qemu_set_fd_handler2(),
so we cannot give a can_read() callback function. Instead, unregister
the nbd_read() function whenever we cannot read and re-register it as
soon as we can read again.
All this is hidden behind the functions nbd_set_handlers() (which
registers all handlers for the AIO context and file descriptor belonging
to the given client), nbd_unset_handlers() (which unregisters them) and
nbd_update_can_read() (which checks whether NBD can read for the given
client and acts accordingly).
Liu Yuan [Thu, 28 Aug 2014 10:27:55 +0000 (18:27 +0800)]
sheepdog: fix a core dump while do auto-reconnecting
We should reinit local_err as NULL inside the while loop or g_free() will report
corrupption and abort the QEMU when sheepdog driver tries reconnecting.
qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: (null)
[xcb] Unknown sequence number while awaiting reply
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
qemu-system-x86_64: ../../src/xcb_io.c:298: poll_for_response: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Aborted (core dumped)
Paolo Bonzini [Wed, 9 Jul 2014 09:53:10 +0000 (11:53 +0200)]
aio-win32: add support for sockets
Uses the same select/WSAEventSelect scheme as main-loop.c.
WSAEventSelect() is edge-triggered, so it cannot be used
directly, but it is still used as a way to exit from a
blocking g_poll().
Before g_poll() is called, we poll sockets with a non-blocking
select() to achieve the level-triggered semantics we require:
if a socket is ready, the g_poll() is made non-blocking too.
Paolo Bonzini [Wed, 9 Jul 2014 09:53:08 +0000 (11:53 +0200)]
AioContext: introduce aio_prepare
This will be used to implement socket polling on Windows.
On Windows, select() and g_poll() are completely different;
sockets are polled with select() before calling g_poll,
and the g_poll must be nonblocking if select() says a
socket is ready.
Paolo Bonzini [Wed, 9 Jul 2014 09:53:05 +0000 (11:53 +0200)]
AioContext: export and use aio_dispatch
So far, aio_poll's scheme was dispatch/poll/dispatch, where
the first dispatch phase was used only in the GSource case in
order to avoid a blocking poll. Earlier patches changed it to
dispatch/prepare/poll/dispatch, where prepare is aio_compute_timeout.
By making aio_dispatch public, we can remove the first dispatch
phase altogether, so that both aio_poll and the GSource use the same
prepare/poll/dispatch scheme.
This patch breaks the invariant that aio_poll(..., true) will not block
the first time it returns false. This used to be fundamental for
qemu_aio_flush's implementation as "while (qemu_aio_wait()) {}" but
no code in QEMU relies on this invariant anymore. The return value
of aio_poll() is now comparable with that of g_main_context_iteration.
Paolo Bonzini [Wed, 9 Jul 2014 09:53:04 +0000 (11:53 +0200)]
AioContext: run bottom halves after polling
Make the dispatching phase the same before blocking and afterwards.
The next patch will make aio_dispatch public and use it directly
for the GSource case, instead of aio_poll. aio_poll can then be
simplified heavily.
Paolo Bonzini [Wed, 9 Jul 2014 09:53:03 +0000 (11:53 +0200)]
aio-win32: Factor out duplicate code into aio_dispatch_handlers
Later, the call to aio_dispatch will move int the GSource wrapper, while the
standalone case will still be call the component functions aio_bh_poll,
aio_dispatch_handlers and timerlistgroup_run_timers.
Paolo Bonzini [Wed, 9 Jul 2014 09:53:01 +0000 (11:53 +0200)]
AioContext: take bottom halves into account when computing aio_poll timeout
Right now, QEMU invokes aio_bh_poll before the "poll" phase
of aio_poll. It is simpler to do it afterwards and skip the
"poll" phase altogether when the OS-dependent parts of AioContext
are invoked from GSource. This way, AioContext behaves more
similarly when used as a GSource vs. when used as stand-alone.
As a start, take bottom halves into account when computing the
poll timeout. If a bottom half is ready, do a non-blocking
poll. As a side effect, this makes idle bottom halves work
with aio_poll; an improvement, but not really an important
one since they are deprecated.
Liu Yuan [Mon, 18 Aug 2014 09:41:05 +0000 (17:41 +0800)]
block/quorum: add simple read pattern support
This patch adds single read pattern to quorum driver and quorum vote is default
pattern.
For now we do a quorum vote on all the reads, it is designed for unreliable
underlying storage such as non-redundant NFS to make sure data integrity at the
cost of the read performance.
For some use cases as following:
VM
--------------
| |
v v
A B
Both A and B has hardware raid storage to justify the data integrity on its own.
So it would help performance if we do a single read instead of on all the nodes.
Further, if we run VM on either of the storage node, we can make a local read
request for better performance.
This patch generalize the above 2 nodes case in the N nodes. That is,
vm -> write to all the N nodes, read just one of them. If single read fails, we
try to read next node in FIFO order specified by the startup command.
The 2 nodes case is very similar to DRBD[1] though lack of auto-sync
functionality in the single device/node failure for now. But compared with DRBD
we still have some advantages over it:
- Suppose we have 20 VMs running on one(assume A) of 2 nodes' DRBD backed
storage. And if A crashes, we need to restart all the VMs on node B. But for
practice case, we can't because B might not have enough resources to setup 20 VMs
at once. So if we run our 20 VMs with quorum driver, and scatter the replicated
images over the data center, we can very likely restart 20 VMs without any
resource problem.
After all, I think we can build a more powerful replicated image functionality
on quorum and block jobs(block mirror) to meet various High Availibility needs.
Hitoshi Mitake [Mon, 11 Aug 2014 05:43:46 +0000 (14:43 +0900)]
sheepdog: improve error handling for a case of failed lock
Recently, sheepdog revived its VDI locking functionality. This patch
updates sheepdog driver of QEMU for this feature. It changes an error
code for a case of failed locking. -EBUSY is a suitable one.
Hitoshi Mitake [Mon, 11 Aug 2014 05:43:45 +0000 (14:43 +0900)]
sheepdog: adopting protocol update for VDI locking
The update is required for supporting iSCSI multipath. It doesn't
affect behavior of QEMU driver but adding a new field to vdi request
struct is required.
Stefan Hajnoczi [Tue, 26 Aug 2014 18:17:56 +0000 (19:17 +0100)]
qemu-img: always goto out in img_snapshot() error paths
The out label has the qemu_progress_end() and other cleanup calls.
Always goto out in error paths so the cleanup happens. These error
paths now return 1 instead of -1.
Note that bdrv_unref(NULL) is safe. We just need to initialize bs to
NULL at the top of the function.
We can now remove the obsolete bs_old_backing = NULL and bs_new_backing
= NULL for safe mode. Originally it was necessary in commit 3e85c6fd
("qemu-img rebase") but became useless in commit c2abcce ("qemu-img:
avoid calling exit(1) to release resources properly") because the
variables are already initialized during declaration.
Stefan Hajnoczi [Tue, 26 Aug 2014 18:17:55 +0000 (19:17 +0100)]
qemu-img: fix img_compare() flags error path
If img_compare() fails to parse the cache flags the goto out3 code path
will call qemu_progress_end(). Make sure we actually call
qemu_progress_init() first.
The curl hardcoded timeout (5 seconds) sometimes is not long
enough depending on the remote server configuration and network
traffic. The user should be able to set how much long he is
willing to wait for the connection.
Adding a new option to set this timeout gives the user this
flexibility. The previous default timeout of 5 seconds will be
used if this option is not present.
We identify devices by their Open Firmware device paths. The encoding
of bus numbers is incorrect: idebus_get_fw_dev_path() formats them in
decimal, while SeaBIOS uses hexadecimal. With bus number > 9, SeaBIOS
will miss the bootindex (lucky case), or apply it to another device
(unlucky case).
Bug can't bite right now: ich9-ahci has six ports, and the sysbus-ahci
created by Calxeda Highbank has just one.
Fix it anyway, by changing %d to %x.
I couldn't find an Open Firmware spec covering this. For what it's
worth, OVMF agrees with SeaBIOS.
Peter Maydell [Thu, 28 Aug 2014 16:08:13 +0000 (17:08 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
SCSI patches include bug fixes from Fam and Peter, improved error
reporting from Fam and a fix for DPRINTF bitrot. Memory patches try
again to initialize name from the QOM name.
# gpg: Signature made Thu 28 Aug 2014 15:10:31 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg: aka "Paolo Bonzini <[email protected]>"
* remotes/bonzini/tags/for-upstream:
memory: Lazy init name from QOM name as needed
xen: hvm: Abstract away memory region name ref
xen-hvm: Constify string
virtio-scsi: Report error if num_queues is 0 or too large
scsi-generic: remove superfluous DPRINTF avoid to break compiling
block/iscsi: fix memory corruption on iscsi resize
scsi-bus: Convert DeviceClass init to realize
block: Pass errp in blkconf_geometry
Peter Maydell [Thu, 28 Aug 2014 15:07:23 +0000 (16:07 +0100)]
Merge remote-tracking branch 'remotes/kvm/tags/for-upstream' into staging
Mostly bugfixes + Alexey's interface-based implementation
of the NMI monitor command.
# gpg: Signature made Thu 28 Aug 2014 15:07:22 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg: aka "Paolo Bonzini <[email protected]>"
* remotes/kvm/tags/for-upstream:
mc146818rtc: reinitialize irq_reinject_on_ack_count on reset
target-i386: Add "tsc_adjust" CPU feature name
target-i386: Add "mpx" CPU feature name
vl: process -object after other backend options
checkpatch.pl: adjust typedef definition to QEMU coding style
x86: Clear MTRRs on vCPU reset
x86: kvm: Add MTRR support for kvm_get|put_msrs()
x86: Use common variable range MTRR counts
target-i386: Don't forbid NX bit on PAE PDEs and PTEs
spapr: Add support for new NMI interface
s390x: Migrate to new NMI interface
s390x: Convert QEMUMachine to MachineClass
cpus: Define callback for QEMU "nmi" command
kvm: run cpu state synchronization on target vcpu thread
To support name retrieval of MemoryRegions that were created
dynamically (that is, not via memory_region_init and friends). We
cache the name in MemoryRegion's state as
object_get_canonical_path_component mallocs the returned value
so it's not suitable for direct return to callers. Memory already
frees the name field, so this will be garbage collected along with
the MR object.
Stefan Hajnoczi [Wed, 27 Aug 2014 11:08:51 +0000 (12:08 +0100)]
qapi.py: avoid Python 2.5+ any() function
There is one instance of any() in qapi.py that breaks builds on older
distros that ship Python 2.4 (like RHEL5):
GEN qmp-commands.h
Traceback (most recent call last):
File "build/scripts/qapi-commands.py", line 445, in ?
exprs = parse_schema(input_file)
File "build/scripts/qapi.py", line 329, in parse_schema
schema = QAPISchema(open(input_file, "r"))
File "build/scripts/qapi.py", line 110, in __init__
if any(include_path == elem[1]
NameError: global name 'any' is not defined
Paolo Bonzini [Tue, 10 Jun 2014 08:52:02 +0000 (10:52 +0200)]
checkpatch.pl: adjust typedef definition to QEMU coding style
Most QEMU typedefs are camelcase, starting with one uppercase letter
and containing at least one lowercase letter. There are a few
all-uppercase types, add the most common too.
Gonglei [Fri, 22 Aug 2014 02:01:50 +0000 (10:01 +0800)]
scsi-generic: remove superfluous DPRINTF avoid to break compiling
variables lun and tag had been eliminated, break compiling
when enable debug switch. Meanwhile traces provide the same
information with this DPRINTF, so remove it.
Peter Lieven [Fri, 22 Aug 2014 08:08:49 +0000 (10:08 +0200)]
block/iscsi: fix memory corruption on iscsi resize
bs->total_sectors is not yet updated at this point. resulting
in memory corruption if the volume has grown and data is written
to the newly availble areas.
Fam Zheng [Tue, 12 Aug 2014 02:12:55 +0000 (10:12 +0800)]
scsi-bus: Convert DeviceClass init to realize
Replace "init/destroy" with "realize/unrealize" in SCSIDeviceClass,
which has errp as a parameter. So all the implementations now use
error_setg instead of error_report for reporting error.
Also in scsi_bus_legacy_handle_cmdline, report the error when
initializing the if=scsi devices, before returning it, because in the
callee, error_report is changed to error_setg. And the callers don't
have the right locations (e.g. "-drive if=scsi").
Alex Williamson [Mon, 25 Aug 2014 18:10:15 +0000 (12:10 -0600)]
vfio: Enable NVIDIA 88000 region quirk regardless of VGA
If we make use of OVMF for the BIOS then we can use GPUs without VGA
space access, but we still need this quirk. Disassociate it from the
x-vga option and enable it on all NVIDIA VGA display class devices.
Peter Maydell [Mon, 25 Aug 2014 17:49:25 +0000 (18:49 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc fixes, features
A bunch of bugfixes - these will make sense for 2.1.1
ACPI support for TPM and partial ARI support for PCIE.
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Sun 24 Aug 2014 23:16:35 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg: aka "Michael S. Tsirkin <[email protected]>"
* remotes/mst/tags/for_upstream:
pcie: fix trailing whitespace
ioh3420: Enable ARI forwarding
ioh3420: Remove obsoleted, unused ioh3420_init function
pcie: Rename the pcie_cap_ari_* functions to pcie_cap_arifwd_*
pcie: Fix incorrect write to the ari capability next function field
ssdt-tpm: add generated hex file to git
Add ACPI tables for TPM
pc: reserve more memory for ACPI for new machine types
pcihp: fix possible array out of bounds
pci_bridge: manually destroy memory regions within PCIBridgeWindows
hostmem: set MPOL_MF_MOVE
Alex Williamson [Thu, 14 Aug 2014 21:39:39 +0000 (15:39 -0600)]
x86: Clear MTRRs on vCPU reset
The SDM specifies (June 2014 Vol3 11.11.5):
On a hardware reset, the P6 and more recent processors clear the
valid flags in variable-range MTRRs and clear the E flag in the
IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the
MTRRs are undefined.
We currently do none of that, so whatever MTRR settings you had prior
to reset is what you have after reset. Usually this doesn't matter
because KVM often ignores the guest mappings and uses write-back
anyway. However, if you have an assigned device and an IOMMU that
allows NoSnoop for that device, KVM defers to the guest memory
mappings which are now stale after reset. The result is that OVMF
rebooting on such a configuration takes a full minute to LZMA
decompress the firmware volume, a process that is nearly instant on
the initial boot.
Alex Williamson [Thu, 14 Aug 2014 21:39:33 +0000 (15:39 -0600)]
x86: kvm: Add MTRR support for kvm_get|put_msrs()
The MTRR state in KVM currently runs completely independent of the
QEMU state in CPUX86State.mtrr_*. This means that on migration, the
target loses MTRR state from the source. Generally that's ok though
because KVM ignores it and maps everything as write-back anyway. The
exception to this rule is when we have an assigned device and an IOMMU
that doesn't promote NoSnoop transactions from that device to be cache
coherent. In that case KVM trusts the guest mapping of memory as
configured in the MTRR.
This patch updates kvm_get|put_msrs() so that we retrieve the actual
vCPU MTRR settings and therefore keep CPUX86State synchronized for
migration. kvm_put_msrs() is also used on vCPU reset and therefore
allows future modificaitons of MTRR state at reset to be realized.
Note that the entries array used by both functions was already
slightly undersized for holding every possible MSR, so this patch
increases it beyond the 28 new entries necessary for MTRR state.