Multiple -machine options with the same ID are merged. All but the
one without an ID are to be silently ignored.
In most places, we query these options with a null ID. This is
correct.
In some places, we instead query whatever options come first in the
list. This is wrong. When the -machine processed first happens to
have an ID, options are taken from that ID, and the ones specified
without ID are silently ignored.
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine accel=kvm,usb=on
QEMU 1.5.50 monitor - type 'help' for more information
(qemu) info kvm
kvm support: enabled
(qemu) info usb
(qemu) q
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine id=foo -machine accel=kvm,usb=on
QEMU 1.5.50 monitor - type 'help' for more information
(qemu) info kvm
kvm support: disabled
(qemu) info usb
(qemu) q
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine id=foo,accel=kvm,usb=on -machine accel=xen
QEMU 1.5.50 monitor - type 'help' for more information
(qemu) info kvm
kvm support: enabled
(qemu) info usb
USB support not enabled
(qemu) q
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -machine accel=xen -machine id=foo,accel=kvm,usb=on
xc: error: Could not obtain handle on privileged command interface (2 = No such file or directory): Internal error
xen be core: can't open xen interface
failed to initialize Xen: Operation not permitted
Option usb is queried correctly, and the one without an ID wins,
regardless of option order.
Option accel is queried incorrectly, and which one wins depends on
option order and ID.
Affected options are accel (and its sugared forms -enable-kvm and
-no-kvm), kernel_irqchip, kvm_shadow_mem.
Additionally, option kernel_irqchip is normally on by default, except
it's off when no -machine options are given. Bug can't bite, because
kernel_irqchip is used only when KVM is enabled, KVM is off by
default, and enabling always creates -machine options. Downstreams
that enable KVM by default do get bitten, though.
If defaults is true, and params has no ID, and options exist, we use
the first assignment. It sets opts to null if all options have an ID.
opts_parse() then returns null. qemu_opts_set_defaults() asserts the
value is non-null. It's the only caller that passes true for
defaults.
To reproduce, try "-M xenpv -machine id=foo" (yes, "id=foo" is silly,
but it shouldn't crash).
I believe the function attempts to do the following:
If options don't yet exist, create new options
Else, if defaults, modify the existing options
Else, if list->merge_lists, modify the existing options
Else, fail
A straightforward call of qemu_opts_create() does exactly that.
Extend support of SMBUS(module pm_smbus.c) HST_STS register.
Previous realization doesn't consider flags in the status register.
Add DS and INTR bits of HST_STS register set after transaction execution.
Update bits resetting in HST_STS register. Update error processing:
if DEV_ERR bit set transaction isn't execution.
Paolo Bonzini [Wed, 3 Jul 2013 16:29:45 +0000 (20:29 +0400)]
trap signals for "-serial mon:stdio"
With mon:stdio you can exit the VM by switching to the monitor and
sending the "quit" command. It is then useful to pass Ctrl-C to the
VM instead of exiting.
This in turn lets us stop tying the default signal handling behavior
to -nographic, removing gratuitous differences between "-display none"
and "-nographic".
This patch changes behavior for "-display none -serial mon:stdio", as
expected, but not for "-display none -serial stdio".
For bsd-user and linux-user emulation modes QEMU needs to be linked at an
alternate .text segment address, so that it's out of the way of the guest
executable. Instead of including modified linker scripts for each arch,
just set the address with -Ttext-segment if supported, or by using sed to
edit the default linker script.
Anthony Liguori [Mon, 8 Jul 2013 13:00:23 +0000 (08:00 -0500)]
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
pci,misc enhancements
This includes some pci enhancements:
Better support for systems with multiple PCI root buses
FW cfg interface for more robust pci programming in BIOS
Minor fixes/cleanups for fw cfg and cross-version migration -
because of dependencies with other patches
Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Sun 07 Jul 2013 03:11:18 PM CDT using RSA key ID D28D5469
# gpg: Can't check signature: public key not found
# By David Gibson (10) and others
# Via Michael S. Tsirkin
* mst/tags/for_anthony:
pci: Fold host_buses list into PCIHostState functionality
pci: Remove domain from PCIHostBus
pci: Simpler implementation of primary PCI bus
pci: Add root bus parameter to pci_nic_init()
pci: Add root bus argument to pci_get_bus_devfn()
pci: Replace pci_find_domain() with more general pci_root_bus_path()
pci: Use helper to find device's root bus in pci_find_domain()
pci: Abolish pci_find_root_bus()
pci: Move pci_read_devaddr to pci-hotplug-old.c
pci: Cleanup configuration for pci-hotplug.c
pvpanic: fix fwcfg for big endian hosts
pvpanic: initialization cleanup
MAINTAINERS: s/Marcelo/Paolo/
e1000: cleanup process_tx_desc
pc_piix: cleanup init compat handling
pc: pass PCI hole ranges to Guests
pci: store PCI hole ranges in guestinfo structure
range: add Range structure
David Gibson [Thu, 6 Jun 2013 08:48:54 +0000 (18:48 +1000)]
pci: Fold host_buses list into PCIHostState functionality
The host_buses list is an odd structure - a list of pointers to PCI root
buses existing in parallel to the normal qdev tree structure. This patch
removes it, instead putting the link pointers into the PCIHostState
structure, which have a 1:1 relationship to PCIHostBus structures anyway.
David Gibson [Thu, 6 Jun 2013 08:48:53 +0000 (18:48 +1000)]
pci: Remove domain from PCIHostBus
There are now no users of the domain field of PCIHostBus, so remove it
from the structure, and as a parameter from the pci_host_bus_register()
function which sets it.
David Gibson [Thu, 6 Jun 2013 08:48:52 +0000 (18:48 +1000)]
pci: Simpler implementation of primary PCI bus
Currently pci_find_primary_bus() searches the list of root buses for one
with domain 0. But since host buses are always registered with domain 0,
this just amounts to finding the only PCI host bus. The only remaining
users of pci_find_primary_bus() are in pci-hotplug-old.c, which implements
the old style pci_add/pci_del commands.
Therefore, this patch redefines pci_find_primary_bus() to find the only
PCI root bus, returning an error if there are multiple roots. The callers
in pci-hotplug-old.c are updated correspondingly, to produce sensible
error messages.
David Gibson [Thu, 6 Jun 2013 08:48:51 +0000 (18:48 +1000)]
pci: Add root bus parameter to pci_nic_init()
At present, pci_nic_init() and pci_nic_init_nofail() assume that they will
only create a NIC under the primary PCI root. As we add support for
multiple PCI roots, that may no longer be the case. This patch adds a root
bus parameter to pci_nic_init() (and updates callers accordingly) to allow
the machine init code using it to specify the right PCI root for NICs
created by old-style -net nic parameters. NICs created new-style, with
-device can of course be put anywhere.
David Gibson [Thu, 6 Jun 2013 08:48:50 +0000 (18:48 +1000)]
pci: Add root bus argument to pci_get_bus_devfn()
pci_get_bus_devfn() interprets a full PCI address string to give a PCIBus *
and device/function number within that bus. Currently it assumes it is
working on an address under the primary PCI root bus. This patch extends
it to allow the caller to specify a root bus. This might seem a little odd
since the supplied address can (theoretically) include a PCI domain number.
However, attempting to use a non-zero domain number there is currently an
error, so that shouldn't really cause problems.
David Gibson [Thu, 6 Jun 2013 08:48:49 +0000 (18:48 +1000)]
pci: Replace pci_find_domain() with more general pci_root_bus_path()
pci_find_domain() is used in a number of places where we want an id for a
whole PCI domain (i.e. the subtree under a PCI root bus). The trouble is
that many platforms may support multiple independent host bridges with no
hardware supplied notion of domain number.
This patch, therefore, replaces calls to pci_find_domain() with calls to
a new pci_root_bus_path() returning a string. The new call is implemented
in terms of a new callback in the host bridge class, so it can be defined
in some way that's well defined for the platform. When no callback is
available we fall back on the qbus name.
Most current uses of pci_find_domain() are for error or informational
messages, so the change in identifiers should be harmless. The exception
is pci_get_dev_path(), whose results form part of migration streams. To
maintain compatibility with old migration streams, the PIIX PCI host is
altered to always supply "0000" for this path, which matches the old domain
number (since the code didn't actually support domains other than 0).
For the pseries (spapr) PCI bridge we use a different platform-unique
identifier (pseries machines can routinely have dozens of PCI host
bridges). Theoretically that breaks migration streams, but given that we
don't yet have migration support for pseries, it doesn't matter.
Any other machines that have working migration support including PCI
devices will need to be updated to maintain migration stream compatibility.
David Gibson [Thu, 6 Jun 2013 08:48:48 +0000 (18:48 +1000)]
pci: Use helper to find device's root bus in pci_find_domain()
Currently pci_find_domain() performs two functions - it locates the PCI
root bus above the given bus, then looks up that root bus's domain number.
This patch adds a helper function to perform the first task, finding the
root bus for a given PCI device. This is then used in pci_find_domain().
This changes pci_find_domain()'s signature slightly, taking a PCIDevice
instead of a PCIBus - since all callers passed something of the form
dev->bus, this simplifies things slightly.
David Gibson [Thu, 6 Jun 2013 08:48:47 +0000 (18:48 +1000)]
pci: Abolish pci_find_root_bus()
pci_find_root_bus() takes a domain parameter. Currently PCI root buses
with domain other than 0 can't be created, so this is more or less a long
winded way of retrieving the main PCI root bus. Numbered domains don't
actually properly cover the (non x86) possibilities for multiple PCI root
buses, so this patch for now enforces the domain == 0 restriction in other
places to replace pci_find_root_bus() with an explicit
pci_find_primary_bus().
Peter Maydell [Mon, 24 Jun 2013 10:49:32 +0000 (11:49 +0100)]
MAINTAINERS: fix bad F: patterns
This patch fixes a number of incorrect F: patterns which didn't
match any files in the source tree. This was caused by a mix
of minor typos (- for _ and the like) and a few entries which
hadn't been correctly updated following the rearrangement of hw/.
Offending entries were located with the following shell rune:
for pattern in $(sed -ne 's/^F: //p' MAINTAINERS); do
if ! stat --printf='' $pattern 2>/dev/null; then
echo bad pattern: $pattern
fi
done
Anthony Liguori [Sun, 7 Jul 2013 16:28:01 +0000 (11:28 -0500)]
Merge remote-tracking branch 'stefanha/block' into staging
# By Fam Zheng (2) and Stefan Hajnoczi (1)
# Via Stefan Hajnoczi
* stefanha/block:
block: fix bdrv_flush() ordering in bdrv_close()
curl: refuse to open URL from HTTP server without range support
vmdk: Implement .bdrv_has_zero_init
Anthony Liguori [Sun, 7 Jul 2013 16:19:27 +0000 (11:19 -0500)]
Merge remote-tracking branch 'bonzini/iommu-for-anthony' into staging
# By Paolo Bonzini (50) and others
# Via Paolo Bonzini
* bonzini/iommu-for-anthony: (66 commits)
exec: change some APIs to take AddressSpaceDispatch
exec: remove cur_map
exec: put memory map in AddressSpaceDispatch
exec: separate current radix tree from the one being built
exec: move listener from AddressSpaceDispatch to AddressSpace
memory: move MemoryListener declaration earlier
exec: separate current memory map from the one being built
exec: change well-known physical sections to macros
qom: Use atomics for object refcounting
memory: add reference counting to FlatView
memory: use a new FlatView pointer on every topology update
memory: access FlatView from a local variable
add a header file for atomic operations
hw/[u-x]*: pass owner to memory_region_init* functions
hw/t*: pass owner to memory_region_init* functions
hw/s*: pass owner to memory_region_init* functions
hw/p*: pass owner to memory_region_init* functions
hw/n*: pass owner to memory_region_init* functions
hw/m*: pass owner to memory_region_init* functions
hw/i*: pass owner to memory_region_init* functions
...
Stefan Hajnoczi [Tue, 2 Jul 2013 13:36:25 +0000 (15:36 +0200)]
block: fix bdrv_flush() ordering in bdrv_close()
Since 80ccf93b we flush the block device during close. The
bdrv_drain_all() call should come before bdrv_flush() to ensure guest
write requests have completed. Otherwise we may miss pending writes
when flushing.
Call bdrv_drain_all() again for safety as the final step after
bdrv_flush(). This should not be necessary but we can be paranoid here
in case bdrv_flush() left I/O pending.
curl: refuse to open URL from HTTP server without range support
CURL driver requests partial data from server on guest IO req. For HTTP
and HTTPS, it uses "Range: ***" in requests, and this will not work if
server not accepting range. This patch does this check when open.
* Removed curl_size_cb, which is not used: On one hand it's registered to
libcurl as CURLOPT_WRITEFUNCTION, instead of CURLOPT_HEADERFUNCTION,
which will get called with *data*, not *header*. On the other hand the
s->len is assigned unconditionally later.
In this gone function, the sscanf for "Content-Length: %zd", on
(void *)ptr, which is not guaranteed to be zero-terminated, is
potentially a security bug. So this patch fixes it as a side-effect. The
bug is reported as: https://bugs.launchpad.net/qemu/+bug/1188943
(Note the bug is marked "private" so you might not be able to see it)
* Introduced curl_header_cb, which is used to parse header and mark the
server as accepting range if "Accept-Ranges: bytes" line is seen from
response header. If protocol is HTTP or HTTPS, but server response has
no not this support, refuse to open this URL.
Note that python builtin module SimpleHTTPServer is an example of not
supporting range, if you need to test this driver, get a better server
or use internet URLs.
Depending on the subformat, has_zero_init queries underlying storage for
flat extent. If it has a flat extent and its underlying storage doesn't
have zero init, return 0. Otherwise return 1.
Paolo Bonzini [Wed, 29 May 2013 10:30:26 +0000 (12:30 +0200)]
exec: remove cur_map
cur_map is not used anymore; instead, each AddressSpaceDispatch
has its own nodes/sections pair. The priorities of the
MemoryListeners, and in the future RCU, guarantee that the
nodes/sections are not freed while they are still in use.
(In fact, next_map itself is not needed except to free the data on the
next update).
To avoid incorrect use, replace cur_map with a temporary copy that
is only valid while the topology is being updated. If you use it,
the name prev_map makes it clear that you're doing something weird.
Paolo Bonzini [Wed, 29 May 2013 10:28:21 +0000 (12:28 +0200)]
exec: put memory map in AddressSpaceDispatch
After this patch, AddressSpaceDispatch holds a constistent tuple of
(phys_map, nodes, sections). This will be important when updates
of the topology will run concurrently with reads.
cur_map is not used anymore except for freeing it at the end of the
topology update.
Paolo Bonzini [Wed, 29 May 2013 10:13:54 +0000 (12:13 +0200)]
exec: separate current radix tree from the one being built
This same treatment previously done to phys_node_map and phys_sections
is now applied to the dispatch field of AddressSpace. Topology updates
use as->next_dispatch while accesses use as->dispatch.
Paolo Bonzini [Sun, 2 Jun 2013 08:39:07 +0000 (10:39 +0200)]
exec: move listener from AddressSpaceDispatch to AddressSpace
This will help having two copies of AddressSpaceDispatch during the
recreation of the radix tree (one being built, and one that is complete
and will be protected by RCU). We do not want to have to unregister and
re-register the listener.
Paolo Bonzini [Wed, 29 May 2013 10:09:47 +0000 (12:09 +0200)]
exec: separate current memory map from the one being built
Currently, phys_node_map and phys_sections are shared by all
of the AddressSpaceDispatch. When updating mem topology, all
AddressSpaceDispatch will rebuild dispatch tables sequentially
on them. In order to prepare for RCU access, leave the old
memory map alive while the next one is being accessed.
When rebuilding, the new dispatch tables will build and lookup
next_map; after all dispatch tables are rebuilt, we can switch
to next_* and free the previous table.
Liu Ping Fan [Wed, 29 May 2013 09:09:17 +0000 (11:09 +0200)]
exec: change well-known physical sections to macros
Sections like phys_section_unassigned always have fixed address
in phys_sections. Declared as macro, so we can use them
when having more than one phys_sections array.
Paolo Bonzini [Mon, 6 May 2013 09:57:21 +0000 (11:57 +0200)]
memory: add reference counting to FlatView
With this change, a FlatView can be used even after a concurrent
update has replaced it. Because we do not yet have RCU, we use a
mutex to protect the small critical sections that read/write the
as->current_map pointer. Accesses to the FlatView can be done
outside the mutex.
If a MemoryRegion will be used after the FlatView is unref-ed (or after
a MemoryListener callback is returned), a reference has to be added to
that MemoryRegion. memory_region_find already does it for the region
that it returns. The same will be done for address_space_translate
as soon as the dispatch tree is also converted to RCU-style.
Paolo Bonzini [Mon, 6 May 2013 08:29:07 +0000 (10:29 +0200)]
memory: use a new FlatView pointer on every topology update
This is the first step towards converting as->current_map to
RCU-style updates, where the FlatView updates run concurrently
with uses of an old FlatView.
Paolo Bonzini [Mon, 6 May 2013 08:26:13 +0000 (10:26 +0200)]
memory: access FlatView from a local variable
We will soon require accesses to as->current_map to be placed under
a lock (with reference counting so as to keep the critical section
small). To simplify this change, always fetch as->current_map into
a local variable and access it through that variable.
Paolo Bonzini [Fri, 28 Jun 2013 15:29:27 +0000 (17:29 +0200)]
exec: reorganize address_space_map
First of all, rename "todo" to "done".
Second, clearly separate the case of done == 0 with the case of done != 0.
This will help handling reference counting in the next patch.
Third, this test:
if (memory_region_get_ram_addr(mr) + xlat != raddr + todo) {
does not guarantee that the memory region is the same across two iterations
of the while loop. For example, you could have two blocks:
A) size 640 K, mapped at physical address 0, ram_addr_t 0
B) size 64 K, mapped at physical address 0xa0000, ram_addr_t 0xa0000
then mapping 1 M starting at physical address zero will erroneously treat
B as the continuation of block A. qemu_ram_ptr_length ensures that no
invalid memory is accessed, but it is still a pointless complication of
the algorithm. The patch makes the logic clearer with an explicit test
that the memory region is the same.
Paolo Bonzini [Mon, 3 Jun 2013 10:44:02 +0000 (12:44 +0200)]
exec: move qemu_ram_addr_from_host_nofail to cputlb.c
After the next patch it would not be used elsewhere anyway. Also,
the _nofail and the standard versions of this function return different
things, which is confusing. Removing the function from the public headers
limits the confusion.
Paolo Bonzini [Tue, 7 May 2013 04:59:09 +0000 (06:59 +0200)]
memory: add getter for owner
Whenever memory regions are accessed outside the BQL, they need to be
preserved against hot-unplug. MemoryRegions actually do not have their
own reference count; they piggyback on a QOM object, their "owner".
The owner is set at creation time, and there is a function to retrieve
the owner.
Paolo Bonzini [Wed, 29 May 2013 10:07:03 +0000 (12:07 +0200)]
exec: simplify destruction of the phys map
Do not bother visiting the radix tree when an address space is destroyed.
After the previous patch, this has become a pointless exercise. When
called from address_space_destroy_dispatch, all you're doing is zeroing
out a structure that will be freed as soon as you come back. When called
from mem_begin, when phys_page_set_level will call phys_map_node_alloc the
radix tree's array will be zeroed too.
Paolo Bonzini [Tue, 25 Jun 2013 07:30:48 +0000 (09:30 +0200)]
memory: destroy phys_sections one by one
phys_sections_clear is invoked after the dispatch tree has been
destroyed. This leaves a window where phys_sections_nb > 0 but the
subpages are not valid anymore, which is a recipe for use-after-free
bugs.
Move the destruction of subpages in phys_sections_clear. We will
still destroy the subpages when an address space is cleaned up,
because address_space_destroy will clear as->root and commit the
change before it calls address_space_destroy_dispatch.
Jan Kiszka [Mon, 24 Jun 2013 08:45:09 +0000 (10:45 +0200)]
ioport: Switch dispatching to memory core layer
The current ioport dispatcher is a complex beast, mostly due to the
need to deal with old portio interface users. But we can overcome it
without converting all portio users by embedding the required base
address of a MemoryRegionPortio access into that data structure. That
removes the need to have the additional MemoryRegionIORange structure
in the loop on every access.
To handle old portio memory ops, we simply install dispatching handlers
for portio memory regions when registering them with the memory core.
This removes the need for the old_portio field.
We can drop the additional aliasing of ioport regions and also the
special address space listener. cpu_in and cpu_out now simply call
address_space_read/write. And we can concentrate portio handling in a
single source file.
Jan Kiszka [Sat, 22 Jun 2013 06:07:01 +0000 (08:07 +0200)]
isa: implement isa_is_ioport_assigned via memory_region_find
Open-code isa_is_ioport_assigned via a memory region lookup. As all IO
ports are now directly or indirectly registered via the memory API, this
becomes possible and will finally allow us to drop the ioport tables.
David Gibson [Thu, 6 Jun 2013 08:48:46 +0000 (18:48 +1000)]
pci: Move pci_read_devaddr to pci-hotplug-old.c
pci_read_devaddr() is only used by the legacy functions for the old PCI
hotplug interface in pci-hotplug-old.c. So we move the function there,
and make it static.
David Gibson [Thu, 6 Jun 2013 08:48:45 +0000 (18:48 +1000)]
pci: Cleanup configuration for pci-hotplug.c
pci-hotplug.c and the CONFIG_PCI_HOTPLUG variable which controls its
compilation are misnamed. They're not about PCI hotplug in general, but
rather about the pci_add/pci_del interface which are now deprecated in
favour of the more general device_add/device_del interface. This patch
therefore renames them to pci-hotplug-old.c and CONFIG_PCI_HOTPLUG_OLD.
CONFIG_PCI_HOTPLUG=y was listed twice in {i386,x86_64}-softmmu.make for no
particular reason, so we clean that up too. In addition it was included in
ppc64-softmmu.mak for which the old hotplug interface was never used and is
unsuitable, so we remove that too.
Most of pci-hotplug.c was additionaly protected by #ifdef TARGET_I386. The
small piece which wasn't is only called from the pci_add and pci_del hooks
in hmp-commands.hx, which themselves were protected by #ifdef TARGET_I386.
This patch therefore also removes the #ifdef from pci-hotplug-old.c,
and changes the ifdefs in hmp-commands.hx to use CONFIG_PCI_HOTPLUG_OLD.
Avoid use of static variables: PC systems
initialize pvpanic device through pvpanic_init,
so we can simply create the fw_cfg file at that point.
This also makes it possible to skip device
creation completely if fw_cfg is not there, e.g. for xen -
so the ports it reserves are not discoverable by guests.
Also, make pvpanic_init void since callers ignore return
status anyway.
Andrew Jones [Tue, 4 Jun 2013 08:49:48 +0000 (10:49 +0200)]
e1000: cleanup process_tx_desc
Coverity complains about two overruns in process_tx_desc(). The
complaints are false positives, but we might as well eliminate
them. The problem is that "hdr" is defined as an unsigned int,
but then used to offset an array of size 65536, and another of
size 256 bytes. hdr will actually never be greater than 255
though, as it's assigned only once and to the value of
tp->hdr_len, which is an uint8_t. This patch simply gets rid of
hdr, replacing it with tp->hdr_len, which makes it consistent
with all other tp member use in the function.
v2:
- also cleanup coding style issues in the touched lines
Guest currently has to jump through lots of hoops to guess the PCI hole
ranges. It's fragile, and makes us change BIOS each time we add a new
chipset. Let's report the window in a ROM file, to make BIOS do exactly
what QEMU intends.