Isaku Yamahata [Wed, 23 Jun 2010 07:15:27 +0000 (16:15 +0900)]
pci: don't overwrite multi functio bit in pci header type.
Don't overwrite pci header type.
Otherwise, multi function bit which pci_init_header_type() sets
appropriately is lost.
Anyway PCI_HEADER_TYPE_NORMAL is zero, so it is unnecessary to zero
which is already zero cleared.
how to test:
run qemu and issue info pci to see whether a device in question is
normal device, not pci-to-pci bridge.
This is handy because guest os isn't required.
tested changes:
The following files are covered by using following commands.
sparc64-softmmu
apb_pci.c, vga-pci.c, cmd646.c, ne2k_pci.c, sun4u.c
ppc-softmmu
grackle_pci.c, cmd646.c, ne2k_pci.c, vga-pci.c, macio.c
ppc-softmmu -M mac99
unin_pci.c(uni-north, uni-north-agp)
ppc64-softmmu
pci-ohci, ne2k_pci, vga-pci, unin_pci.c(u3-agp)
x86_64-softmmu
acpi_piix4.c, ide/piix.c, piix_pci.c
-vga vmware vmware_vga.c
-watchdog i6300esb wdt_i6300esb.c
-usb usb-uhci.c
-sound ac97 ac97.c
-nic model=rtl8139 rtl8139.c
-nic model=pcnet pcnet.c
-balloon virtio virtio-pci.c:
untested changes:
The following changes aren't tested.
prep_pci.c: ppc-softmmu -M prep should cover, but core dumped.
unin_pci.c(uni-north-pci): the caller is commented out.
openpic.c: the caller is commented out in ppc_prep.c
Isaku Yamahata [Wed, 23 Jun 2010 07:15:26 +0000 (16:15 +0900)]
pci: insert assert that auto-assigned-address function is single function device.
Auto-assigned-address pci function (passing devfn = -1) is always
single function.
This patch adds assert() to guarantee that auto-assigned-address function
is always single function device at function = 0.
Isaku Yamahata [Wed, 23 Jun 2010 07:15:25 +0000 (16:15 +0900)]
pci: use PCI_DEVFN() where appropriate.
Use PCI_DEVFN() and PCI_FUNC_MAX where appropriate.
This patch make it clear that func = 0.
test:
The following object files with/without this patch are stripped and compared.
They remains same.
arm-softmmu/versatile_pci.o
libhw32/ppce500_pci.o
libhw32/unin_pci.o
libhw64/ppce500_pci.o
libhw64/unin_pci.o
mips-softmmu/gt64xxx.o
mips64-softmmu/gt64xxx.o
mips64el-softmmu/gt64xxx.o
mipsel-softmmu/gt64xxx.o
Blue Swirl [Wed, 7 Jul 2010 19:37:53 +0000 (19:37 +0000)]
Fix warning about uninitialized variable
With gcc 4.2.1-sjlj (mingw32-2) I get this warning:
/src/qemu/exec.c: In function 'qemu_ram_alloc':
/src/qemu/exec.c:2777: warning: 'offset' may be used uninitialized in this function
Alex Williamson [Fri, 2 Jul 2010 17:13:29 +0000 (11:13 -0600)]
ramblocks: No more being lazy about duplicate names
Now that we have a working qemu_ram_free() and the primary runtime
user of it has been updated, don't be lenient about duplicate id strings.
We also shouldn't need to create them ondemand at the target.
Alex Williamson [Fri, 25 Jun 2010 17:10:05 +0000 (11:10 -0600)]
savevm: Create a new continue flag to avoid resending block name
Allows us to compress the protocol a bit by setting a flag on the
offset which indicates we're still working within the same block
as last time. That way we can avoid sending the block name for
every page. Suggested by Anthony Liguori.
Alex Williamson [Fri, 25 Jun 2010 17:09:57 +0000 (11:09 -0600)]
savevm: Use RAM blocks for basis of migration
We don't want to assume a contiguous address space, so migrate based
on RAM blocks instead of a fixed linear address map. This will allow
us to have holes in the ram_addr_t namespace, so we can implement
qemu_ram_free().
Alex Williamson [Fri, 25 Jun 2010 17:09:50 +0000 (11:09 -0600)]
savevm: Migrate RAM based on name/offset
Synchronize RAM blocks with the target and migrate using name/offset
pairs. This ensures both source and target have the same view of
RAM and that we get the right bits into the right slot.
Alex Williamson [Fri, 25 Jun 2010 17:09:43 +0000 (11:09 -0600)]
ramblocks: Make use of DeviceState pointer and BusInfo.get_dev_path
With these two pieces in place, we can start naming ramblocks. When
the device is present and it lives on a bus that provides a device
path, we concatenate the path and the provided name. Otherwise we
just use name. The resulting id string must be unique. For now we
assume an allocation for the same name and size is a device that has
been removed and reinserted and return the same block. This will go
away once qemu_ram_free() is implemented.
Alex Williamson [Fri, 25 Jun 2010 17:09:35 +0000 (11:09 -0600)]
qemu_ram_alloc: Add DeviceState and name parameters
These will be used to generate unique id strings for ramblocks. The name
field is required, the device pointer is optional as most callers don't
have a device. When there's no device or the device isn't a child of
a bus implementing BusInfo.get_dev_path, the name should be unique for
the platform.
Alex Williamson [Fri, 25 Jun 2010 17:09:28 +0000 (11:09 -0600)]
virtio-net: Incorporate a DeviceState pointer and let savevm track instances
Stuff a pointer to the DeviceState into the VirtIONet structure so that
we can easily remove the vmstate entry later. Also, let vmstate track
the instance number (it should always be zero internally since the
device path should now be unique).
Alex Williamson [Fri, 25 Jun 2010 17:09:14 +0000 (11:09 -0600)]
savevm: Make use of DeviceState
For callers that pass a device we can traverse up the qdev tree and
make use of the BusInfo.get_dev_path information for creating unique
savevm id strings. This avoids needing to rely on the instance number,
which can cause problems with device initialization order and hotplug.
For compatibility, we also store away the old id string and instance
so we can accept migrations from VMs as we add new get_dev_path
implementations.
Alex Williamson [Fri, 25 Jun 2010 17:09:07 +0000 (11:09 -0600)]
savevm: Add DeviceState param
When available, we'd like to be able to access the DeviceState
when registering a savevm. For buses with a get_dev_path()
function, this will allow us to create more unique savevm
id strings.
Alex Williamson [Fri, 25 Jun 2010 17:08:59 +0000 (11:08 -0600)]
pci: Implement BusInfo.get_dev_path()
This works great for PCI since a <segment>:<bus>:<dev>.<fn> uniquely
describes a global address. No need to traverse up the qdev tree.
PCI segment support is a placeholder for compatibility once we
support multiple segments.
Alex Williamson [Fri, 25 Jun 2010 17:08:52 +0000 (11:08 -0600)]
qdev: Add a get_dev_path() function to BusInfo
This function is meant to provide a stable device path for buses
which are able to implement it. If a bus has a globally unique
addresses scheme, one address level may be sufficient to provide
a path. Other buses may need to recursively traverse up the
qdev tree.
Alex Williamson [Fri, 25 Jun 2010 17:08:38 +0000 (11:08 -0600)]
Remove uses of ram.last_offset (aka last_ram_offset)
We currently need this either to allocate the next ram_addr_t for a
new block, or for total memory to be migrated. Both of which we can
calculate without need of this to keep us in a contiguous address space.
Jan Kiszka [Tue, 6 Jul 2010 08:58:03 +0000 (10:58 +0200)]
scsi: Fix SCSI bus reset
When the controller raises the SCSI reset line, we have to perform the
requested reset on all disks attached to the controller's bus. Moreover,
reset is edge triggered, so avoid repeating it if the line was already
high.
MORITA Kazutaka [Sun, 20 Jun 2010 20:01:00 +0000 (05:01 +0900)]
block: add sheepdog driver for distributed storage support
Sheepdog is a distributed storage system for QEMU. It provides highly
available block level storage volumes to VMs like Amazon EBS. This
patch adds a qemu block driver for Sheepdog.
Sheepdog features are:
- No node in the cluster is special (no metadata node, no control
node, etc)
- Linear scalability in performance and capacity
- No single point of failure
- Autonomous management (zero configuration)
- Useful volume management support such as snapshot and cloning
- Thin provisioning
- Autonomous load balancing
The more details are available at the project site:
http://www.osrg.net/sheepdog/
drive_init() doesn't permit rerror for if=scsi, but that's worthless:
we get it via if=none and -device.
Moreover, scsi-generic doesn't support werror. Since drive_init()
doesn't catch that, option werror was silently ignored even with
if=scsi.
Wart: unlike drive_init(), we don't reject the default action when
it's explicitly specified. That's because we can't distinguish "no
rerror option" from "rerror=report", or "no werror" from
"rerror=enospc". Left for another day.
Some of the failures are internal errors, and hw_error() is okay then.
But the common way to fail is bad user input, e.g. -global
isa-fdc.driveA=foo where drive foo has an unsupported rerror value.
drive_init() doesn't permit them for if=floppy, but that's worthless:
we get them via if=none and -global.
This can make device initialization fail. Since all callers of
fdctrl_init_isa() ignore its value, change it to die instead of
returning failure. Without this, some callers would ignore the
failure, and others would crash.
Wart: unlike drive_init(), we don't reject the default action when
it's explicitly specified. That's because we can't distinguish "no
rerror option" from "rerror=report", or "no werror" from
"rerror=enospc". Left for another day.
raw_pread_aligned() retries up to two times if the block device backs
a virtual CD-ROM (a drive with media=cdrom and if=ide, scsi, xen or
none). This makes no sense. Whether retrying reads can correct read
errors can only depend on what we're reading, not on how the result
gets used. We need to check what whether we're reading from a
physical CD-ROM or floppy here.
I doubt retrying is useful even then. Left for another day.
Impact:
* Virtual CD-ROM backed by host_cdrom behaves the same.
* Virtual CD-ROM backed by file or host_device no longer retries.
* A drive backed by host_cdrom now retries even if it's not a virtual
CD-ROM.
init_blk_migration_it() skips drives with type hint BDRV_TYPE_CDROM.
The intention is to skip read-only drives. However, BDRV_TYPE_CDROM
is only a hint. It is currently sufficent for read-only. But it's
not necessary, and it may not remain sufficient.
blockdev: Clean up how readonly persists across virtual media change
Since commit cb4e5f8e, monitor command change makes the new media
readonly iff the type hint is BDRV_TYPE_CDROM, i.e. the drive was
created with media=cdrom. The intention is to avoid changing a block
device's read-only-ness. However, BDRV_TYPE_CDROM is only a hint. It
is currently sufficent for read-only. But it's not necessary, and it
may not remain sufficient.
john cooper [Fri, 2 Jul 2010 17:44:25 +0000 (13:44 -0400)]
Add virtio disk identification support
This patch adds the final missing bits for support of
passing a serial/id string to a virtio-blk guest driver.
The guest-side component already exists in the virtio
driver, and has recently been reworked by Ryan to export
a /sys interface for retrieval of the id from guest userland.
Kevin Wolf [Tue, 29 Jun 2010 09:43:13 +0000 (11:43 +0200)]
qemu-img check: Distinguish different kinds of errors
People think that their images are corrupted when in fact there are just some
leaked clusters. Differentiating several error cases should make the messages
more comprehensible.
The result of parsing qemu-options.def depends on whehter or not
MAP_POPULATE is defined, so make sure to include sys/mman.h before
including qemu-options.h.
Kevin Wolf [Thu, 1 Jul 2010 14:08:51 +0000 (16:08 +0200)]
block: Handle multiwrite errors only when all requests have completed
Don't try to be clever by freeing all temporary data and calling all callbacks
when the return value (an error) is certain. Doing so has at least two
important problems:
* The temporary data that is freed (qiov, possibly zero buffer) is still used
by the requests that have not yet completed.
* Calling the callbacks for all requests in the multiwrite means for the caller
that it may free buffers etc. which are still in use.
Just remember the error value and do the cleanup when all requests have
completed.
Kevin Wolf [Fri, 2 Jul 2010 12:01:21 +0000 (14:01 +0200)]
block: Fix early failure in multiwrite
bdrv_aio_writev may call the callback immediately (and it will commonly do so
in error cases). Current code doesn't consider this. For details see the
comment added by this patch.
MORITA Kazutaka [Sun, 20 Jun 2010 19:26:35 +0000 (04:26 +0900)]
qemu-img: avoid calling exit(1) to release resources properly
This patch removes exit(1) from error(), and properly releases
resources such as a block driver and an allocated memory.
For testing the Sheepdog block driver with qemu-iotests, it is
necessary to call bdrv_delete() before the program exits. Because the
driver releases the lock of VM images in the close handler.
Drives defined with -drive if=ide get get created along with the IDE
controller, inside machine->init(). That's before cmos_init().
Drives defined with -device get created during generic device init.
That's after cmos_init(). Because of that, CMOS has no information on
them (type, geometry, translation). Older versions of Windows such as
XP reportedly choke on that.
Split off the part of CMOS initialization that needs to know about
-device devices, and turn it into a reset handler, so it runs after
device creation.
BlockDriverState member removable controls whether virtual media
change (monitor commands change, eject) is allowed. It is set when
the "type hint" is BDRV_TYPE_CDROM or BDRV_TYPE_FLOPPY.
The type hint is only set by drive_init(). It sets BDRV_TYPE_FLOPPY
for if=floppy. It sets BDRV_TYPE_CDROM for media=cdrom and if=ide,
scsi, xen, or none.
if=ide and if=scsi work, because the type hint makes it a CD-ROM.
if=xen likewise, I think.
For the same reason, if=none works when it's used by ide-drive or
scsi-disk. For other guest devices, there are problems:
* fdc: you can't change virtual media
$ qemu [...] -drive if=none,id=foo,... -global isa-fdc.driveA=foo
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) eject foo
Device 'foo' is not removable
unless you add media=cdrom, but that makes it readonly.
* virtio: if you add media=cdrom, you can change virtual media. If
you eject, the guest gets I/O errors. If you change, the guest sees
the drive's contents suddenly change.
* scsi-generic: if you add media=cdrom, you can change virtual media.
I didn't test what that does to the guest or the physical device,
but it can't be pretty.
savevm.c keeps a pointer to the snapshot block device. If you manage
to get that device deleted, the pointer dangles, and the next snapshot
operation will crash & burn. Unplugging a guest device that uses it
does the trick:
$ MALLOC_PERTURB_=234 qemu-system-x86_64 [...]
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) info snapshots
No available block device supports snapshots
(qemu) drive_add auto if=none,file=tmp.qcow2
OK
(qemu) device_add usb-storage,id=foo,drive=none1
(qemu) info snapshots
Snapshot devices: none1
Snapshot list (from none1):
ID TAG VM SIZE DATE VM CLOCK
(qemu) device_del foo
(qemu) info snapshots
Snapshot devices:
Segmentation fault (core dumped)
Move management of that pointer to block.c, and zap it when the device
it points becomes unusable.
Kevin Wolf [Wed, 30 Jun 2010 15:43:40 +0000 (17:43 +0200)]
blkdebug: Initialize state as 1
state = 0 in rules means that the rule is valid for any state. Therefore it's
impossible to have a rule that works only in the initial state. This changes
the initial state from 0 to 1 to make this possible.
block: Catch attempt to attach multiple devices to a blockdev
For instance, -device scsi-disk,drive=foo -device scsi-disk,drive=foo
happily creates two SCSI disks connected to the same block device.
It's all downhill from there.
Device usb-storage deliberately attaches twice to the same blockdev,
which fails with the fix in place. Detach before the second attach
there.
Also catch attempt to delete while a guest device model is attached.
Make the property point to BlockDriverState, cutting out the DriveInfo
middleman. This prepares the ground for block devices that don't have
a DriveInfo.
Currently all user-defined ones have a DriveInfo, because the only way
to define one is -drive & friends (they go through drive_init()).
DriveInfo is closely tied to -drive, and like -drive, it mixes
information about host and guest part of the block device. I'm
working towards a new way to define block devices, with clean
host/guest separation, and I need to get DriveInfo out of the way for
that.
Fortunately, the device models are perfectly happy with
BlockDriverState, except for two places: ide_drive_initfn() and
scsi_disk_initfn() need to check the DriveInfo for a serial number set
with legacy -drive serial=... Use drive_get_by_blockdev() there.
Device model code should now use DriveInfo only when explicitly
dealing with drives defined the old way, i.e. without -device.
We automatically delete blockdev host parts on unplug of the guest
device. Too much magic, but we can't change that now.
The delete happens early in the guest device teardown, before the
connection to the host part is severed. Thus, the guest part's
pointer to the host part dangles for a brief time. No actual harm
comes from this, but we'll catch such dangling pointers a few commits
down the road. Clean up the dangling pointers by delaying the
automatic deletion until the guest part's pointer is gone.
Device usb-storage deliberately makes two qdev properties refer to the
same drive, because it automatically creates a second device. Again,
too much magic we can't change now. Multiple references worked okay
before, but now free_drive() dies for the second one. Zap the extra
reference.
Ryan Harper [Mon, 28 Jun 2010 14:38:33 +0000 (09:38 -0500)]
Don't reset bs->is_temporary in bdrv_open_common
To fix https://bugs.launchpad.net/qemu/+bug/597402 where qemu fails to
call unlink() on temporary snapshots due to bs->is_temporary getting clobbered
in bdrv_open_common() after being set in bdrv_open() which calls the former.
We don't need to initialize bs->is_temporary in bdrv_open_common().
ide: Make it explicit that ide_create_drive() can't fail
All callers of ide_create_drive() ignore its value. Currently
harmless, because it fails only when qdev_init() fails, which fails
only when ide_drive_initfn() fails, which never fails.
Brittle. Change it to die instead of silently ignoring failure.
block: allow filenames with colons again for host devices
Before the raw/file split we used to allow filenames with colons for host
device only. While this was more by accident than by design people rely
on it, so we need to bring it back.
So move the host device probing to be before the protocol detection
again.
Andi Kleen [Sat, 26 Jun 2010 22:06:11 +0000 (00:06 +0200)]
Add more boundary checking to sse3/4 parsing
ssse3 uses tables with only two entries per op, but it is indexed
with b1 which can contain variables upto 3. This happens when ssse3
or sse4 are used with REP* prefixes.
Jan Kiszka [Mon, 28 Jun 2010 16:27:47 +0000 (18:27 +0200)]
monitor: Allow to exclude commands from QMP
Ported commands that are marked 'user_only' will not be considered for
QMP monitor sessions. This allows to implement new commands that do not
(yet) provide a sufficiently stable interface for QMP use.
Luiz Capitulino [Mon, 31 May 2010 20:28:01 +0000 (17:28 -0300)]
QMP: Introduce qmp_check_input_obj()
This is similar to qmp_check_client_args(), but it checks if
the input object follows the specification (QMP/qmp-spec.txt
section 2.3).
As we're limited to three keys, the work here is quite simple:
we iterate over the input object, checking each time if the
current argument complies to the specification.
Luiz Capitulino [Fri, 28 May 2010 20:24:49 +0000 (17:24 -0300)]
QMP: Drop old client argument checker
Previous two commits added qmp_check_client_args(), which
fully replaces this code and is way better.
It's important to note that the new checker doesn't support
the '/' arg type. As we don't have any of those handlers
converted to QMP, this is just dead code.
Luiz Capitulino [Tue, 22 Jun 2010 14:44:05 +0000 (11:44 -0300)]
QMP: New argument checker (second part)
This commit introduces the second (and last) part of QMP's new
argument checker.
The job is done by check_client_args_type(), it iterates over
the client's argument qdict and for for each argument it checks
if it exists and if its type is valid.
It's important to observe the following changes from the existing
argument checker:
- If the handler accepts an O-type argument, unknown arguments
are passed down to it. It's up to O-type handlers to validate
their arguments
- Boolean types (eg. 'b' and '-') don't accept integers anymore,
only json-bool
- Argument types '/' and '.' are currently unsupported under QMP,
thus they're not handled
Luiz Capitulino [Wed, 26 May 2010 19:13:09 +0000 (16:13 -0300)]
QMP: New argument checker (first part)
Current QMP's argument checker is more complex than it should be
and has (at least) one serious bug: it ignores unknown arguments.
To solve both problems we introduce a new argument checker. It's
added on top of the existing one, so that there are no regressions
during the transition.
This commit introduces the first part of the new checker, which
is run by qmp_check_client_args() and does the following:
1. Check if all mandatory arguments were provided
2. Set flags for argument validation
In order to do that, we transform the args_type string (from
qemu-montor.hx) into a qdict and iterate over it.
Next commit adds the new checker's second part: type checking and
invalid argument detection.
Luiz Capitulino [Fri, 28 May 2010 18:25:24 +0000 (15:25 -0300)]
Monitor: handle optional '-' arg as a bool
Historically, user monitor arguments beginning with '-' (eg. '-f')
were passed as integers down to handlers.
I've maintained this behavior in the new monitor because we didn't
have a boolean type at the very beginning of QMP. Today we have it
and this behavior is causing trouble to QMP's argument checker.
This commit fixes the problem by doing the following changes:
1. User Monitor
Before: the optional arg was represented as a QInt, we'd pass 1
down to handlers if the user specified the argument or
0 otherwise
This commit: the optional arg is represented as a QBool, we pass
true down to handlers if the user specified the
argument, otherwise _nothing_ is passed
2. QMP
Before: the client was required to pass the arg as QBool, but we'd
convert it to QInt internally. If the argument wasn't passed,
we'd pass 0 down
This commit: still require a QBool, but doesn't do any conversion and
doesn't pass any default value
3. Convert existing handlers (do_eject()/do_migrate()) to the new way
Before: Both handlers would expect a QInt value, either 0 or 1
This commit: Change the handlers to accept a QBool, they handle the
following cases:
A) true is passed: the option is enabled
B) false is passed: the option is disabled
C) nothing is passed: option not specified, use
default behavior
Luiz Capitulino [Mon, 7 Jun 2010 19:53:51 +0000 (16:53 -0300)]
QDict: Introduce functions to retrieve QDictEntry values
Next commit will introduce a new QDict iteration API which
returns QDictEntry entries, but we don't want users to directly
access its members since QDictEntry should be private to QDict.
In the near future this kind of data type will be turned into a
forward reference.