mips: Correct MIPS interrupt glue logic for icount
When hw interrupt pending bits in CP0_Cause are set, the CPU should
see the hw interrupt line as active. The CPU may or may not take the
interrupt based on internal state (global irq mask etc) but the glue
logic shouldn't care.
This fixes MIPS external hw interrupts in combination with -icount.
Jan Kiszka [Tue, 13 Jul 2010 12:13:45 +0000 (14:13 +0200)]
scsi: Dequeue requests before invoking completion callback
The request completion callback of the LSI controller may start the next
request that can use the same tag as the completed one. As the latter is
still enqueued at that point, scsi_send_command will complain about the
tag reuse and cancel the completed request. That will cause a double
free later on when the completion path cleans up as well.
Fix this by dequeuing the request before invoking the callback.
e1000: Fix wrong microwire EEPROM state initialization
This change fixes initialization of e1000's microwire EEPROM internal
state values so that qemu's e1000 emulation works on NetBSD,
which doesn't use Intel's em driver but has its own wm driver
for the Intel i8254x Gigabit Ethernet.
Previously set_eecd() function in e1000.c clears EEPROM internal state
values on SK rising edge during CS==L, but according to FM93C06 EEPROM
(which is MicroWire compatible) data sheet, EEPROM internal status
should be cleared on CS rise edge regardless of SK input:
"... a rising edge on this (CS) signal is required to reset the internal
state-machine to accept a new cycle .."
and nothing should be changed during CS (chip select) is inactive.
Intel's em driver seems to explicitly raise SK output after CS is negated
in em_standby_eeprom() so many other OSes that use Intel's driver
don't have this problem even on the previous e1000.c implementation,
but I can't find any articles that say the MICROWIRE or EEPROM spec
requires such sequence, and actually hardware works fine without it
(i.e. real i82540EM has been working on NetBSD).
This fix also changes initialization to clear each state value in
struct eecd_state individually rather than using memset() against
the whole structre. The old_eecd member stores the last SK and CS
signal levels and it should be preserved even after reset of internal
EEPROM state to detect next signal edges for proper EEPROM emulation.
Jan Kiszka [Fri, 25 Jun 2010 14:56:56 +0000 (16:56 +0200)]
Rework debug exception processing for gdb use
Guest debugging is currently broken under CONFIG_IOTHREAD. The reason is
inconsistent or even lacking signaling the debug events from the source
VCPU to the main loop and the gdbstub.
This patch addresses the issue by pushing this signaling into a
CPUDebugExcpHandler: cpu_debug_handler is registered as first handler,
thus will be executed last after potential breakpoint emulation
handlers. It sets informs the gdbstub about the debug event source,
requests a debug exit of the main loop and stops the current VCPU. This
mechanism works both for TCG and KVM, with and without IO-thread.
Jan Kiszka [Fri, 25 Jun 2010 14:56:53 +0000 (16:56 +0200)]
Fix qemu_wait_io_event processing in io-thread mode
When checking for I/O events in the tcg CPU loop, make sure that we
call qemu_wait_io_event_common for all CPUs, not only the current one.
Otherwise pause_all_vcpus may lock up or run_on_cpu requests may starve.
Rename qemu_wait_io_event to qemu_tcg_wait_io_event at this chance and
purge its argument list as it has no use for it.
Jan Kiszka [Fri, 25 Jun 2010 14:56:52 +0000 (16:56 +0200)]
Fix cpu_exit for tcp_cpu_exec
If a cpu_exit request is pending, ensure that we leave the CPU loop
quickly. For this purpose, keep the global exit_request pending until
we are about to leave tcg_cpu_exec. Also, immediately break out of the
SMP loop if the request is set, do not run till the end of the chain.
This preserves the VCPU scheduling order in SMP mode.
Jan Kiszka [Fri, 25 Jun 2010 14:56:50 +0000 (16:56 +0200)]
Fix cpu_unlink_tb race
If a signal hit after the env->exit_request check but before cpu_exec
updated env->current_tb, cpu_unlink_tb called from the signal hander
will not unlink the current TB. This may leave us stuck in a guest loop
if no further unlink is invoked.
Fix this by reordering current_tb update and exit_request check,
additionally enforcing the correct order via a compiler barrier.
Bob Breuer [Tue, 13 Jul 2010 16:05:24 +0000 (11:05 -0500)]
Sparc32: reserve addresses for unimplemented devices on SS-20
Use empty_slot to reserve addresses for several unimplemented devices so they won't fault.
- BPP (parallel port), DBRI (audio), SX (pixel processor), and vsimms (framebuffer)
OBP for SS-20 either assumes these devices exist or probes without expecting faults.
Anthony Liguori [Wed, 14 Jul 2010 15:58:00 +0000 (10:58 -0500)]
Make default invocation of block drivers safer (v3)
CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest. To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.
Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.
Most users want block probing so we should try to make it safer.
This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device. If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros. If a user specifies an explicit format parameter, this
behavior is disabled.
I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.
I've tested this pretty extensively both in the positive and negative case. I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.
Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.
Move the check from virtio_blk_init_pci(), where it protects only
virtio-blk-pci, to virtio_blk_init(). Without that, virtio-blk-s390
initializes without a drive. I figure that can lead to null pointer
dereferences.
Amit Shah [Thu, 1 Jul 2010 09:28:17 +0000 (14:58 +0530)]
virtio-serial: Assert for virtio queue ready before virtqueue operations
In addition to the previous fix for calling do_flush_queued_data() only
when the virtqueue is ready, ensure do_flush_queued_data() gets a vq
that's suitably initialised.
Sripathi Kodi [Wed, 30 Jun 2010 11:02:00 +0000 (16:32 +0530)]
virtio-9p: Avoid SEGV when log file couldn't be opened
While running in debug mode if 9P server is unable to open the log file
it results in a SEGV deep down in glibc:
Program received signal SIGSEGV, Segmentation fault.
0x008fca8c in fwrite () from /lib/libc.so.6
(gdb) bt
#0 0x008fca8c in fwrite () from /lib/libc.so.6
#1 0x081eb87e in pprint_pdu (pdu=0x89a52e1c)
at /data/sripathi/code/qemu/new/qemu-next-upstream/hw/virtio-9p-debug.c:380
#2 0x0806dad8 in submit_pdu (s=0x897dc008, pdu=0x89a52e1c)
at /data/sripathi/code/qemu/new/qemu-next-upstream/hw/virtio-9p.c:3092
#3 0x0806dc63 in handle_9p_output (vdev=0x897dc008, vq=0x86d8218)
at /data/sripathi/code/qemu/new/qemu-next-upstream/hw/virtio-9p.c:3122
#4 0x081ac728 in virtio_queue_notify (vdev=0x897dc008, n=0)
at /data/sripathi/code/qemu/new/qemu-next-upstream/hw/virtio.c:563
#5 0x08063876 in virtio_ioport_write (opaque=0x86d7b98, addr=16, val=0)
at /data/sripathi/code/qemu/new/qemu-next-upstream/hw/virtio-pci.c:222
#6 0x08063e26 in virtio_pci_config_writew (opaque=0x86d7b98, addr=16, val=0)
at /data/sripathi/code/qemu/new/qemu-next-upstream/hw/virtio-pci.c:357
#7 0x080c881a in ioport_write (index=1, address=49296, data=0) at ioport.c:80
#8 0x080c8d4c in cpu_outw (addr=49296, val=0) at ioport.c:204
#9 0x08073010 in kvm_handle_io (port=49296, data=0xab393000, direction=1, size=2, count=1)
at /data/sripathi/code/qemu/new/qemu-next-upstream/kvm-all.c:735
...
...
This is ugly and misleading. The following patch adds a BUG_ON to catch this
error. With this patch we get an abort message like the following, which makes
it easier to analyze:
Anthony Liguori [Tue, 13 Jul 2010 13:55:04 +0000 (08:55 -0500)]
Update SeaBIOS
- 17d3e46 smbios: Allow all fields to be set via qemu_cfg_smbios_load_field()
- 0d6b8d5 seabios: pciinit: use pci device initializer helper function.
- 968d3a8 seabios: pci: introduce helper function to initialize a given device.
- 4e0daae virtio: Clear interrupt status register in virtio-blk
- af0963d seabios: pciinit: initialize pci bridge filtering registers.
- f441666 seabios: pciinit: pci bridge bus initialization.
- 5d0de15 seabios: pciinit: make bar offset calculation pci bridge aware.
- a65821d seabios: pciinit: factor out bar offset calculation.
- 0a8eada seabios: pciinit: make pci bar assigner preferchable memory aware.
- dfd94fa seabios: pciinit: make pci memory space assignment 64bit aware.
- b9e4721 seabios: pciinit: factor out pci bar region allocation logic.
- edd9911 seabios: pci: introduce foreachpci_in_bus() helper macro.
- f79a462 Add romfile_size() wrapper for accessing cbfs/qemu_cfg files.
- afbed1b Initial bootsplash support.
- 83d6ed6 Update TODO
- 1d7d893 Fix bvprintf() to respect padding for hex printing.
- e230426 Unify optionrom cbfs/qemu_cfg rom pulling code.
- 8cb8ba5 SeaBIOS VGA hooks
- 203f6f3 SeaBIOS CD/DVD abbreviations
- 12cbb43 seabios: remove iasl output file when error.
- d5d02b6 Allocate cdemu buffer in low mem instead of ebda.
- 8f59aa3 Introduce memcpy_fl - a memcpy on "flat" pointers.
- 42a1d4c Rework malloc to use a "first fit" algorithm.
- 34e9cc5 Minor mptable changes.
- 0f3783b virtio: clean up memory barrier usage
- bfe4d60 virtio: remove NO_NOTIFY optimization
- bb68591 Don't use RTC to time boot menu delay.
- b5cc2ca Generalize timer based delay code.
- 144817b Rename check_time() to check_tsc().
- 9c447c3 Allow wait_irq to be called in 32bit code.
- 49cc72b Improve optionrom debugging statements.
- c65a4a6 Minor - compile out usb-msc code if CONFIG_USB_MSC not set.
- 456479e Minor ata cleanups.
- 2515a72 Make sure virtio-blk is fully compiled out if not wanted.
- c4fe135 Minor - split up virtio_blk_setup().
- 4030db0 fix two issues with virtio-blk
- ea8ac63 Minor improvements to virtio (allow irqs, allocate page aligned).
target-sh4: Split the LDST macro into 2 sub-macros
The LDST macro is used to generate ldc and stc instructions that work with a
specific register. However, the SGR register only supports stc up to SH4A,
which supports both stc and ldc. This patch creates two sub-macros named LD
and ST that handle generating ldc and stc instructions separately, and
redeclares LDST to use these sub-macro.
Alexander Graf [Wed, 30 Jun 2010 08:41:12 +0000 (10:41 +0200)]
AppleSMC device emulation
Intel Macs have a chip called the "AppleSMC" which they use to control
certain Apple specific parts of the hardware, like the keyboard background
light.
That chip is also used to store a key that Mac OS X uses to decrypt binaries.
This patch adds emulation for that chip, so we're getting one step further
to having Mac OS X run natively on Qemu.
Isaku Yamahata [Wed, 23 Jun 2010 07:15:33 +0000 (16:15 +0900)]
pci: set PCI multi-function bit appropriately.
Set PCI multi-function bit according to multifunction property.
PCI address, devfn ,is exported to users as addr property,
so users can populate pci function(PCIDevice in qemu)
at arbitrary devfn.
It means each function(PCIDevice) don't know whether pci device
(PCIDevice[8]) is multi function or not.
So this patch allows user to set multifunction bit via property
and checks whether multifunction bit is set correctly.
Isaku Yamahata [Wed, 23 Jun 2010 07:15:31 +0000 (16:15 +0900)]
pci: set multifunction property for normal device.
use pci_create_simple_multifunction() for normal device which sets
multifunction bit.
At the moment, only pc_piix.c and mips_malta.c uses multifunction
devices with piix3/4 pci-isa bridge.
And other boards don't populate those devices.
Isaku Yamahata [Wed, 23 Jun 2010 07:15:30 +0000 (16:15 +0900)]
pci: introduce multifunction property.
introduce multifunction property.
Also introduce new convenient device creation function which
will be used later.
For bisectability this patch doesn't do anything, but sets the property
resulting in no functional changes.
Actual changes will be introduced by later patch.
Isaku Yamahata [Wed, 23 Jun 2010 07:15:27 +0000 (16:15 +0900)]
pci: don't overwrite multi functio bit in pci header type.
Don't overwrite pci header type.
Otherwise, multi function bit which pci_init_header_type() sets
appropriately is lost.
Anyway PCI_HEADER_TYPE_NORMAL is zero, so it is unnecessary to zero
which is already zero cleared.
how to test:
run qemu and issue info pci to see whether a device in question is
normal device, not pci-to-pci bridge.
This is handy because guest os isn't required.
tested changes:
The following files are covered by using following commands.
sparc64-softmmu
apb_pci.c, vga-pci.c, cmd646.c, ne2k_pci.c, sun4u.c
ppc-softmmu
grackle_pci.c, cmd646.c, ne2k_pci.c, vga-pci.c, macio.c
ppc-softmmu -M mac99
unin_pci.c(uni-north, uni-north-agp)
ppc64-softmmu
pci-ohci, ne2k_pci, vga-pci, unin_pci.c(u3-agp)
x86_64-softmmu
acpi_piix4.c, ide/piix.c, piix_pci.c
-vga vmware vmware_vga.c
-watchdog i6300esb wdt_i6300esb.c
-usb usb-uhci.c
-sound ac97 ac97.c
-nic model=rtl8139 rtl8139.c
-nic model=pcnet pcnet.c
-balloon virtio virtio-pci.c:
untested changes:
The following changes aren't tested.
prep_pci.c: ppc-softmmu -M prep should cover, but core dumped.
unin_pci.c(uni-north-pci): the caller is commented out.
openpic.c: the caller is commented out in ppc_prep.c
Isaku Yamahata [Wed, 23 Jun 2010 07:15:26 +0000 (16:15 +0900)]
pci: insert assert that auto-assigned-address function is single function device.
Auto-assigned-address pci function (passing devfn = -1) is always
single function.
This patch adds assert() to guarantee that auto-assigned-address function
is always single function device at function = 0.
Isaku Yamahata [Wed, 23 Jun 2010 07:15:25 +0000 (16:15 +0900)]
pci: use PCI_DEVFN() where appropriate.
Use PCI_DEVFN() and PCI_FUNC_MAX where appropriate.
This patch make it clear that func = 0.
test:
The following object files with/without this patch are stripped and compared.
They remains same.
arm-softmmu/versatile_pci.o
libhw32/ppce500_pci.o
libhw32/unin_pci.o
libhw64/ppce500_pci.o
libhw64/unin_pci.o
mips-softmmu/gt64xxx.o
mips64-softmmu/gt64xxx.o
mips64el-softmmu/gt64xxx.o
mipsel-softmmu/gt64xxx.o
Blue Swirl [Wed, 7 Jul 2010 19:37:53 +0000 (19:37 +0000)]
Fix warning about uninitialized variable
With gcc 4.2.1-sjlj (mingw32-2) I get this warning:
/src/qemu/exec.c: In function 'qemu_ram_alloc':
/src/qemu/exec.c:2777: warning: 'offset' may be used uninitialized in this function
Alex Williamson [Fri, 2 Jul 2010 17:13:29 +0000 (11:13 -0600)]
ramblocks: No more being lazy about duplicate names
Now that we have a working qemu_ram_free() and the primary runtime
user of it has been updated, don't be lenient about duplicate id strings.
We also shouldn't need to create them ondemand at the target.
Alex Williamson [Fri, 25 Jun 2010 17:10:05 +0000 (11:10 -0600)]
savevm: Create a new continue flag to avoid resending block name
Allows us to compress the protocol a bit by setting a flag on the
offset which indicates we're still working within the same block
as last time. That way we can avoid sending the block name for
every page. Suggested by Anthony Liguori.
Alex Williamson [Fri, 25 Jun 2010 17:09:57 +0000 (11:09 -0600)]
savevm: Use RAM blocks for basis of migration
We don't want to assume a contiguous address space, so migrate based
on RAM blocks instead of a fixed linear address map. This will allow
us to have holes in the ram_addr_t namespace, so we can implement
qemu_ram_free().
Alex Williamson [Fri, 25 Jun 2010 17:09:50 +0000 (11:09 -0600)]
savevm: Migrate RAM based on name/offset
Synchronize RAM blocks with the target and migrate using name/offset
pairs. This ensures both source and target have the same view of
RAM and that we get the right bits into the right slot.
Alex Williamson [Fri, 25 Jun 2010 17:09:43 +0000 (11:09 -0600)]
ramblocks: Make use of DeviceState pointer and BusInfo.get_dev_path
With these two pieces in place, we can start naming ramblocks. When
the device is present and it lives on a bus that provides a device
path, we concatenate the path and the provided name. Otherwise we
just use name. The resulting id string must be unique. For now we
assume an allocation for the same name and size is a device that has
been removed and reinserted and return the same block. This will go
away once qemu_ram_free() is implemented.
Alex Williamson [Fri, 25 Jun 2010 17:09:35 +0000 (11:09 -0600)]
qemu_ram_alloc: Add DeviceState and name parameters
These will be used to generate unique id strings for ramblocks. The name
field is required, the device pointer is optional as most callers don't
have a device. When there's no device or the device isn't a child of
a bus implementing BusInfo.get_dev_path, the name should be unique for
the platform.
Alex Williamson [Fri, 25 Jun 2010 17:09:28 +0000 (11:09 -0600)]
virtio-net: Incorporate a DeviceState pointer and let savevm track instances
Stuff a pointer to the DeviceState into the VirtIONet structure so that
we can easily remove the vmstate entry later. Also, let vmstate track
the instance number (it should always be zero internally since the
device path should now be unique).
Alex Williamson [Fri, 25 Jun 2010 17:09:14 +0000 (11:09 -0600)]
savevm: Make use of DeviceState
For callers that pass a device we can traverse up the qdev tree and
make use of the BusInfo.get_dev_path information for creating unique
savevm id strings. This avoids needing to rely on the instance number,
which can cause problems with device initialization order and hotplug.
For compatibility, we also store away the old id string and instance
so we can accept migrations from VMs as we add new get_dev_path
implementations.
Alex Williamson [Fri, 25 Jun 2010 17:09:07 +0000 (11:09 -0600)]
savevm: Add DeviceState param
When available, we'd like to be able to access the DeviceState
when registering a savevm. For buses with a get_dev_path()
function, this will allow us to create more unique savevm
id strings.
Alex Williamson [Fri, 25 Jun 2010 17:08:59 +0000 (11:08 -0600)]
pci: Implement BusInfo.get_dev_path()
This works great for PCI since a <segment>:<bus>:<dev>.<fn> uniquely
describes a global address. No need to traverse up the qdev tree.
PCI segment support is a placeholder for compatibility once we
support multiple segments.
Alex Williamson [Fri, 25 Jun 2010 17:08:52 +0000 (11:08 -0600)]
qdev: Add a get_dev_path() function to BusInfo
This function is meant to provide a stable device path for buses
which are able to implement it. If a bus has a globally unique
addresses scheme, one address level may be sufficient to provide
a path. Other buses may need to recursively traverse up the
qdev tree.
Alex Williamson [Fri, 25 Jun 2010 17:08:38 +0000 (11:08 -0600)]
Remove uses of ram.last_offset (aka last_ram_offset)
We currently need this either to allocate the next ram_addr_t for a
new block, or for total memory to be migrated. Both of which we can
calculate without need of this to keep us in a contiguous address space.
Jan Kiszka [Tue, 6 Jul 2010 08:58:03 +0000 (10:58 +0200)]
scsi: Fix SCSI bus reset
When the controller raises the SCSI reset line, we have to perform the
requested reset on all disks attached to the controller's bus. Moreover,
reset is edge triggered, so avoid repeating it if the line was already
high.
MORITA Kazutaka [Sun, 20 Jun 2010 20:01:00 +0000 (05:01 +0900)]
block: add sheepdog driver for distributed storage support
Sheepdog is a distributed storage system for QEMU. It provides highly
available block level storage volumes to VMs like Amazon EBS. This
patch adds a qemu block driver for Sheepdog.
Sheepdog features are:
- No node in the cluster is special (no metadata node, no control
node, etc)
- Linear scalability in performance and capacity
- No single point of failure
- Autonomous management (zero configuration)
- Useful volume management support such as snapshot and cloning
- Thin provisioning
- Autonomous load balancing
The more details are available at the project site:
http://www.osrg.net/sheepdog/