Alex Williamson [Thu, 2 Sep 2010 15:00:57 +0000 (09:00 -0600)]
virtio-net: Limit number of packets sent per TX flush
If virtio_net_flush_tx() is called with notification disabled, we can
race with the guest, processing packets at the same rate as they
get produced. The trouble is that this means we have no guaranteed
exit condition from the function and can spend minutes in there.
Currently flush_tx is only called with notification on, which seems
to limit us to one pass through the queue per call. An upcoming
patch changes this.
Also add an option to set this value on the command line as different
workloads may wish to use different values. We can't necessarily
support any random value, so this is a developer option: x-txburst=
Usage:
-device virtio-net-pci,x-txburst=64 # 64 packets per tx flush
One pass through the queue (256) seems to be a good default value
for this, balancing latency with throughput. We use a signed int
for x-txburst because 2^31 packets in a burst would take many, many
minutes to process and it allows us to easily return a negative
value value from virtio_net_flush_tx() to indicate a back-off
or error condition.
Alex Williamson [Thu, 2 Sep 2010 15:00:50 +0000 (09:00 -0600)]
virtio-net: Make tx_timer timeout configurable
Add an option to make the TX mitigation timer adjustable as a device
option. The 150us hard coded default used currently is reasonable,
but may not be suitable for all workloads, this gives us a way to
adjust it using a single binary. We can't support any random option
though, so use the "x-" prefix to indicate this is a developer
option. Usage:
Patch b0b900070c7cb29bbefb732ec00397abe5de6d73 made
TOR valuer incorrect: the spec says it should always
include the CRC field.
No one seems to use this field, but better to stick to spec.
Blue Swirl [Tue, 31 Aug 2010 20:16:59 +0000 (20:16 +0000)]
Fix OpenBSD linker warning
Fix a warning from OpenBSD linker:
../libhw32/vl.o(.text+0x5c3c): In function `main':
/src/qemu/vl.c:2335: warning: sprintf() is often misused, please use snprintf()
acpi table file can be modified during load so file size check
should be more strict.
pointer calculation should be after qemu_realloc(). not before realloc().
Isaku Yamahata [Wed, 4 Aug 2010 08:43:20 +0000 (17:43 +0900)]
isapc: fix segfault.
https://bugs.launchpad.net/bugs/611646
reports that ./i386-softmmu/qemu -M isapc segfaults.
This patch fixes the segfault introduced by f885f1eaa8711c06033ceb1599e3750fb37c306f
It's because i440fx_state in pc_init1() isn't initialized.
> Core was generated by `./i386-softmmu/qemu -M isapc'.
> Program terminated with signal 11, Segmentation fault.
> [New process 19686]
> at qemu/hw/piix_pci.c:136
> (gdb) where
> at qemu/hw/piix_pci.c:136
> boot_device=0x7fffe1f5b040 "cad", kernel_filename=0x0,
> kernel_cmdline=0x6469bf "", initrd_filename=0x0,
> cpu_model=0x654d10 "486", pci_enabled=0)
> at qemu/hw/pc_piix.c:178
> boot_device=0x7fffe1f5b040 "cad", kernel_filename=0x0,
> kernel_cmdline=0x6469bf "", initrd_filename=0x0, cpu_model=0x654d10 "486")
> at qemu/hw/pc_piix.c:207
> envp=0x7fffe1f5b188)
> at qemu/vl.c:2871
It is possible that subpage mmio is registered over existing memory
page. When this happens "memory" will have real memory address and not
index into io_mem array so next access to the page will generate
segfault. It is uncommon to have some part of a page to be accessed as
memory and some as mmio, but qemu shouldn't crash even when guest does
stupid things. So lets just pretend that the rest of the page is
unassigned if guest configure part of the memory page as mmio.
ppc4xx: load Bamboo kernel, initrd, and fdt at fixed addresses
We can't use the return value of load_uimage() for the kernel because it
can't account for BSS size, and the PowerPC kernel does not relocate
blobs before zeroing BSS.
Instead, we now load at the fixed addresses chosen by u-boot (the normal
firmware for the board).
The PowerPC 4xx SDRAM controller emulation unregisters RAM in its reset
callback. However, qemu_system_reset() is now called at initialization
time, so all RAM is unregistered before starting the guest (!).
The message "Truncating memory to %d MiB to fit SDRAM controller limits"
should be displayed only when a user chooses an amount of RAM which
can't be represented by the PPC 4xx SDRAM controller (e.g. 129MB, which
would only be valid if the controller supports a bank size of 1MB).
We must be able to use a non-native strip executable, but not all
versions of 'install' support the --strip-program option (e.g.
OpenBSD). Accordingly, we can't use 'install -s', and we must run strip
separately.
Alexander Graf [Tue, 3 Aug 2010 13:22:42 +0000 (15:22 +0200)]
PPC: Add PV hypercall transport through fw_cfg
On KVM for PPC we need to tell the guest which instructions to use when
doing a hypercall. The clean way to do this is to go through an ioctl
from userspace and passing it on to the guest using the device tree.
So let's do the qemu part here: read out the hypercall and pass it on
to the guest's fw_cfg so openBIOS can read it out and expose it again.
Alex Williamson [Fri, 20 Aug 2010 21:34:16 +0000 (15:34 -0600)]
VGA: Don't register deprecated VBE range
Old versions of the BOCHs VGA BIOS (cira 2003) made use of VBE
registers at 0xff80/81. In VBE API version 0xb0c2 these were
moved to 0x1ce/cf. Unfortunately, QEMU still registers handlers
for the old range. If a guest attempts to assign an I/O device
overlapping this region, QEMU exits with a hw_error. Windows
guests seem to like to assign I/O devices to the high end of
the address space, so it's pretty easy to hot add an rtl8139
to a Win2k8 guest and trigger the bug. I can't find any reason
to register these handlers, so let's remove the cruft.
Yoshiaki Tamura [Wed, 18 Aug 2010 06:41:49 +0000 (15:41 +0900)]
exec: remove code duplication in qemu_ram_alloc() and qemu_ram_alloc_from_ptr()
Since most of the code in qemu_ram_alloc() and
qemu_ram_alloc_from_ptr() are duplicated, let
qemu_ram_alloc_from_ptr() to switch by checking void *host, and change
qemu_ram_alloc() to a wrapper.
Bernhard Kohl [Thu, 19 Aug 2010 12:52:12 +0000 (14:52 +0200)]
pckbd: support for commands 0xf0-0xff: Pulse output bit
I have a guest OS which sends the command 0xfd to the keyboard
controller during initialization. To get rid of the message
"qemu: unsupported keyboard cmd=0x%02x\n" I added support for
the pulse output bit commands.
I found the following explanation here:
http://www.win.tue.nl/~aeb/linux/kbd/scancodes-11.html#ss11.3
Command 0xf0-0xff: Pulse output bit
Bits 3-0 of the output port P2 of the keyboard controller may
be pulsed low for approximately 6 µseconds. Bits 3-0 of this
command specify the output port bits to be pulsed. 0: Bit should
be pulsed. 1: Bit should not be modified. The only useful version
of this command is Command 0xfe.
(For MCA, replace 3-0 by 1-0 in the above.)
Command 0xfe: System reset
Pulse bit 0 of the output port P2 of the keyboard controller.
This will reset the CPU.
Alex Williamson [Thu, 19 Aug 2010 13:18:42 +0000 (10:18 -0300)]
savevm: Reset last block info at beginning of each save
If we save more than once we need to reset the last block info or else
only the first save has the actual block info and each subsequent save
will only use continue flags, making them unloadable independently.
Amit Shah [Wed, 23 Jun 2010 14:44:04 +0000 (20:14 +0530)]
rtc: Remove TARGET_I386 from qemu-config.c, enables driftfix
qemu-config.c doesn't contain any target-specific code, and the
TARGET_I386 conditional code didn't get compiled as a result. Removing
this enables the driftfix parameter for rtc.
Avi Kivity [Wed, 7 Jul 2010 16:44:22 +0000 (19:44 +0300)]
QEMUFileBuffered: indicate that we're ready when the underlying file is ready
QEMUFileBuffered stops writing when the underlying QEMUFile is not ready,
and tells its producer so. However, when the underlying QEMUFile becomes
ready, it neglects to pass that information along, resulting in stoppage
of all data until the next tick (a tenths of a second).
Usually this doesn't matter, because most QEMUFiles used with QEMUFileBuffered
are almost always ready, but in the case of exec: migration this is not true,
due to the small pipe buffers used to connect to the target process. The
result is very slow migration.
Fix by detecting the readiness notification and propagating it. The detection
is a little ugly since QEMUFile overloads put_buffer() to send it, but that's
the suject for a different patch.
Artyom Tarasenko [Sun, 15 Aug 2010 14:04:41 +0000 (16:04 +0200)]
sparc escc IUS improvements (SunOS 4.1.4 fix)
According to scc_escc_um.pdf:
- Reset Highest IUS must update irq status to allow processing
of the next priority interrupt.
- rx interrupt has always higher priority than tx on same channel
The documentation only explicitly says that Reset Highest IUS
command (0x38) clears IUS bits, not that it clears the corresponding
interrupt too, so don't clear interrupts on this command.
The patch allows SunOS 4.1.4 to use the serial ports
configure adds the macro WIN32_LEAN_AND_MEAN to
QEMU_CFLAGS, and SDL_syswm.h defines it, too.
This results in a compiler warning (redefinition of
WIN32_LEAN_AND_MEAN in SDL_syswm.h. That warning prevents
compilations for win32 with warning = error).
Fix this by removing the definition of WIN32_LEAN_AND_MEAN
before including SDL_syswm.h.
5da79c86a3744e3a901c7986c109dd06951befd2 broke compilation on Mac OS X v10.5 ppc.
Apple's GCC 4.0.1 does not define _CALL_DARWIN. Recognize __APPLE__ again as well.
Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest. This patch also supports interrupts between guest by
communicating over a unix domain socket. This patch applies to the qemu-kvm
repository.
-device ivshmem,size=<size in format accepted by -m>[,shm=<shm name>]
Interrupts are supported between multiple VMs by using a shared memory server
by using a chardev socket.
-device ivshmem,size=<size in format accepted by -m>[,shm=<shm name>]
[,chardev=<id>][,msi=on][,ioeventfd=on][,vectors=n][,role=peer|master]
-chardev socket,path=<path>,id=<id>
The shared memory server, sample programs and init scripts are in a git repo here:
Alex Williamson [Wed, 14 Jul 2010 19:36:49 +0000 (13:36 -0600)]
kvm: Don't walk memory_size == 0 slots in kvm_client_migration_log
If we've unregistered a memory area, we should avoid calling
qemu_get_ram_ptr() on the left over phys_offset cruft in the
slot array. Now that we support removing ramblocks, the
phys_offset ram_addr_t can go away and cause a lookup fault
and abort.
Andrea Arcangeli [Tue, 27 Jul 2010 19:04:36 +0000 (21:04 +0200)]
ide: Avoid canceling IDE DMA
The reason for not actually canceling the I/O is because with
virtualization and lots of VM running, a guest fs may mistake a
overload of the host, as an IDE timeout. So rather than canceling the
I/O, it's safer to wait I/O completion and simulate that the I/O has
completed just before the io cancellation was requested by the
guest. This way if ntfs or an app writes data without checking for
-EIO retval, and it thinks the write has succeeded, it's less likely
to run into troubles. Similar issues for reads.
Furthermore because the DMA operation is splitted into many synchronous
aio_read/write if there's more than one entry in the SG table, without this
patch the DMA would be cancelled in the middle, something we've no idea if it
happens on real hardware too or not. Overall this seems a great risk for zero
gain.
This approach is sure safer than previous code given we can't pretend all guest
fs code out there to check for errors and reply the DMA if it was completed
partially, given a timeout would never materialize on a real harddisk unless
there are defective blocks (and defective blocks are practically only an issue
for reads never for writes in any recent hardware as writing to blocks is the
way to fix them) or the harddisk breaks as a whole.
bdrv_eject() gets called when a device model opens or closes the tray.
If the block driver implements method bdrv_eject(), that method gets
called. Drivers host_cdrom implements it, and it opens and closes the
physical tray, and nothing else. When a device model opens, then
closes the tray, media changes only if the user actively changes the
physical media while the tray is open. This is matches how physical
hardware behaves.
If the block driver doesn't implement method bdrv_eject(), we do
something quite different: opening the tray severs the connection to
the image by calling bdrv_close(), and closing the tray does nothing.
When the device model opens, then closes the tray, media is gone,
unless the user actively inserts another one while the tray is open,
with a suitable change command in the monitor. This isn't how
physical hardware behaves. Rather inconvenient when programs
"helpfully" eject media to give you a chance to change it. The way
bdrv_eject() behaves here turns that chance into a must, which is not
what these programs or their users expect.
Change the default action not to call bdrv_close(). Instead, note the
tray status in new BlockDriverState member tray_open. Use it in
bdrv_is_inserted().
Arguably, the device models should keep track of tray status
themselves. But this is less invasive.
Kevin Wolf [Wed, 28 Jul 2010 09:26:29 +0000 (11:26 +0200)]
block: Fix bdrv_has_zero_init
Assuming that any image on a block device is not properly zero-initialized is
actually wrong: Only raw images have this problem. Any other image format
shouldn't care about it, they initialize everything properly themselves.
This patch improves the resilience of the load_vmstate() function, doing
further and better ordered tests.
In load_vmstate(), if there is any error on bdrv_snapshot_goto(), except if the
error is on VM state device, load_vmstate() will return zero and the VM will be
started with major corruption chances.
The current process:
- test if there is any writable device without snapshot support
- if exists return -error
- get the device that saves the VM state, possible return -error but unlikely
because it was tested earlier
- flush I/O
- run bdrv_snapshot_goto() on devices
- if fails, give an warning and goes to the next (not good!)
- if fails on the VM state device, return zero (not good!)
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
- if fails return -error
- open the file with the VM state
- if fails return -error
- load the VM state
- if fails return -error
- return zero
New behavior:
- get the device that saves the VM state
- if fails return -error
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
- if fails return -error
- test if there is any writable device without snapshot support
- if exists return -error
- test if the devices with snapshot support have the requested snapshot
- if anyone fails, return -error
- flush I/O
- run snapshot_goto() on devices
- if anyone fails, return -error
- open the file with the VM state
- if fails return -error
- load the VM state
- if fails return -error
- return zero
do_loadvm must not call vm_start if any error has occurred in load_vmstate.
Kevin Wolf [Fri, 16 Jul 2010 15:17:01 +0000 (17:17 +0200)]
block: Change bdrv_commit to handle multiple sectors at once
bdrv_commit copies the image to its backing file sector by sector, which
is (surprise!) relatively slow. Let's take a larger buffer and handle more
sectors at once if possible.
With a 1G qcow2 file, this brought the time bdrv_commit takes down from
5:06 min to 1:14 min for me.
Blue Swirl [Sat, 31 Jul 2010 19:40:17 +0000 (19:40 +0000)]
Fix uint8_t comparison with negative value
Commit 7bccf57383cca60a778d5c543ac80c9f62d89ef2 missed this one:
/src/qemu/ui/vnc-enc-tight.c: In function 'send_sub_rect':
/src/qemu/ui/vnc-enc-tight.c:1527: warning: comparison is always true due to limited range of data type
Blue Swirl [Sat, 31 Jul 2010 19:40:13 +0000 (19:40 +0000)]
Fix a warning on OpenSolaris
Add a missing #include statement to avoid a warning:
/src/qemu/net/tap-solaris.c: In function 'tap_open':
/src/qemu/net/tap-solaris.c:189: warning: implicit declaration of function 'error_report'
Amit Shah [Tue, 27 Jul 2010 10:19:19 +0000 (15:49 +0530)]
migration: Accept 'cont' only after successful incoming migration
When a 'cont' is issued on a VM that's just waiting for an incoming
migration, the VM reboots and boots into the guest, possibly corrupting
its storage since it could be shared with another VM running elsewhere.
Ensure that a VM started with '-incoming' is only run when an incoming
migration successfully completes.
A new qerror, QERR_MIGRATION_EXPECTED, is added to signal that 'cont'
failed due to no incoming migration has been attempted yet.
Joel Schopp [Wed, 21 Jul 2010 20:05:16 +0000 (15:05 -0500)]
fix variable type in qemu-io.c
The variable len can get a negative return value from cvtnum,
which we check for, but which is impossible with the current
unsigned variable type. Currently the if(len < 0) check is
pointless. This patch fixes that.
Merge branch 'for-anthony' of git://repo.or.cz/qemu/kevin
* 'for-anthony' of git://repo.or.cz/qemu/kevin:
Fix -snapshot deleting images on disk change
block: Use error codes from lower levels for error message
block: default to 0 minimal / optiomal I/O size
move 'unsafe' to end of caching modes in help
virtio-blk: Create exit function to unregister savevm
block migration: propagate return value when bdrv_write() returns < 0
ide/atapi: add support for GET EVENT STATUS NOTIFICATION