Anthony Liguori [Mon, 17 Sep 2012 15:20:48 +0000 (10:20 -0500)]
Merge remote-tracking branch 'stefanha/net' into staging
* stefanha/net:
net: EAGAIN handling for net/socket.c TCP
net: EAGAIN handling for net/socket.c UDP
net: asynchronous send/receive infrastructure for net/socket.c
net: broadcast hub packets if at least one port can receive
net: fix usbnet_receive() packet drops
net: clean up usbnet_receive()
net: add -netdev options to man page
net: do not report queued packets as sent
net: add receive_disabled logic to iov delivery path
eepro100: Fix network hang when rx buffers run out
xen: flush queue when getting an event
e1000: flush queue whenever can_receive can go from false to true
net: notify iothread after flushing queue
Anthony Liguori [Mon, 17 Sep 2012 15:20:27 +0000 (10:20 -0500)]
Merge remote-tracking branch 'qemu-kvm/uq/master' into staging
* qemu-kvm/uq/master:
kvm: Rename irqchip_inject_ioctl to irq_set_ioctl
kvm: Stop flushing coalesced MMIO on vmexit
VGA: Flush coalesced MMIO on related MMIO/PIO accesses
memory: Flush coalesced MMIO on mapping and state changes
memory: Fold memory_region_update_topology into memory_region_transaction_commit
memory: Use transaction_begin/commit also for single-step operations
memory: Flush coalesced MMIO on selected region access
kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
update-linux-headers.sh: Don't hard code list of architectures
David Gibson [Mon, 10 Sep 2012 02:30:57 +0000 (12:30 +1000)]
cpu_physical_memory_write_rom() needs to do TB invalidates
cpu_physical_memory_write_rom(), despite the name, can also be used to
write images into RAM - and will often be used that way if the machine
uses load_image_targphys() into RAM addresses.
However, cpu_physical_memory_write_rom(), unlike cpu_physical_memory_rw()
doesn't invalidate any cached TBs which might be affected by the region
written.
This was breaking reset (under full emulation) on the pseries machine: we
loaded our firmware image into RAM, and while executing, it rewrote the
code at the entry point (correctly causing a TB invalidate/refresh). When
we reset, the firmware image was reloaded, but the TB from the rewrite was
still active and caused us to get an illegal instruction trap.
This patch fixes the bug by duplicating the tb invalidate code from
cpu_physical_memory_rw() in cpu_physical_memory_write_rom().
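The failure mode can be illustrated with a toy model. This is a minimal
Python sketch, not QEMU's C code: ToyMachine, translate(), write_rom() and
invalidate_range() are hypothetical names, and a "TB" here is just a cached
copy of the bytes at an address.

```python
class ToyMachine:
    """Toy model: guest RAM plus a cache of 'translated blocks' (TBs)."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.tb_cache = {}  # start address -> (length, cached copy of the code)

    def translate(self, addr, length):
        # Cache a "translation" of the bytes currently at addr.
        if addr not in self.tb_cache:
            self.tb_cache[addr] = (length, bytes(self.ram[addr:addr + length]))
        return self.tb_cache[addr][1]

    def invalidate_range(self, start, length):
        # Drop every cached TB that overlaps [start, start + length).
        end = start + length
        for tb_start in list(self.tb_cache):
            tb_len = self.tb_cache[tb_start][0]
            if tb_start < end and start < tb_start + tb_len:
                del self.tb_cache[tb_start]

    def write_rom(self, addr, data, invalidate=True):
        # Image-loading path; invalidate=False models the old, buggy behaviour.
        self.ram[addr:addr + len(data)] = data
        if invalidate:
            self.invalidate_range(addr, len(data))


m = ToyMachine(64)
m.write_rom(0, b"\x01\x02\x03\x04")
m.translate(0, 4)                                      # TB caches the entry point
m.write_rom(0, b"\xaa\xbb\xcc\xdd", invalidate=False)  # reload without invalidate
stale = m.translate(0, 4)                              # old bytes come back
m.write_rom(0, b"\xaa\xbb\xcc\xdd")                    # reload with invalidate
fresh = m.translate(0, 4)                              # TB is rebuilt
```

With invalidate=False the cached block survives the reload, which is the
stale-TB situation described above.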
David Gibson [Mon, 10 Sep 2012 02:30:56 +0000 (12:30 +1000)]
qemu-char: BUGFIX, don't call FD_ISSET with negative fd
tcp_chr_connect(), unlike for example udp_chr_update_read_handler() does
not check if the fd it is using is valid (>= 0) before passing it to
qemu_set_fd_handler2(). If using e.g. a TCP serial port, which is not
initially connected, this can result in -1 being passed to FD_ISSET, which
has undefined behaviour. On x86 it seems to harmlessly return 0, but on
PowerPC, it causes a fortify buffer overflow error to be thrown.
This patch fixes this by putting an extra test in tcp_chr_connect(), and
also adds an assert to qemu_set_fd_handler2() to catch other such errors
on all platforms, rather than just some.
Add an explicit CPUCRISState parameter instead of relying on AREG0, and
use cpu_ld* in translation and interrupt handling. Remove AREG0 swapping
in tlb_fill(). Switch to AREG0 free mode.
Natanael Copa [Wed, 12 Sep 2012 09:06:51 +0000 (09:06 +0000)]
configure: properly check if -lrt and -lm is needed
Fixes build against uClibc.
uClibc provides 2 versions of clock_gettime(), one with realtime
support and one without (this is so you can avoid linking in -lrt
unless actually needed). This means that clock_gettime() doesn't
need -lrt. We still need it for timer_create(), so we check for this
function in addition.
Stefan Hajnoczi [Mon, 20 Aug 2012 09:14:35 +0000 (10:14 +0100)]
net: EAGAIN handling for net/socket.c TCP
Replace spinning send_all() with a proper non-blocking send. When the
socket write buffer limit is reached, we should stop trying to send and
wait for the socket to become writable again.
Non-blocking TCP sockets can return in two different ways when the write
buffer limit is reached:
1. ret = -1 and errno = EAGAIN/EWOULDBLOCK. No data has been written.
2. ret < total_size. Short write, only part of the message was
transmitted.
Handle both cases and keep track of how many bytes have been written in
s->send_index. (This includes the 'length' header before the actual
payload buffer.)
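The two cases can be sketched as follows. This is a minimal Python model of
the idea, not the net/socket.c code: PacketSender and FakeSocket are
hypothetical names, the 4-byte big-endian length header is an assumption,
and only send_index comes from the commit message.

```python
class PacketSender:
    """Tracks progress of one framed packet across non-blocking send() calls."""

    def __init__(self, sock):
        self.sock = sock
        self.buf = b""
        self.send_index = 0  # bytes of header + payload already written

    def start(self, payload):
        # Assumed framing: 4-byte big-endian length header, then the payload.
        self.buf = len(payload).to_bytes(4, "big") + payload
        self.send_index = 0
        return self.resume()

    def resume(self):
        """Return True when the whole packet is out, False if we must wait."""
        while self.send_index < len(self.buf):
            try:
                n = self.sock.send(self.buf[self.send_index:])
            except BlockingIOError:
                return False      # case 1: EAGAIN/EWOULDBLOCK, nothing written
            self.send_index += n  # case 2: possibly a short write
        return True


class FakeSocket:
    """Stand-in for a non-blocking socket; each script entry is one send()."""

    def __init__(self, script):
        self.script = list(script)  # int = bytes accepted, None = EAGAIN
        self.wire = b""

    def send(self, data):
        step = self.script.pop(0)
        if step is None:
            raise BlockingIOError()
        accepted = data[:step]
        self.wire += accepted
        return len(accepted)


sock = FakeSocket([3, None, 100])
sender = PacketSender(sock)
first = sender.start(b"hello")        # short write, then EAGAIN: must wait
index_after_first = sender.send_index
second = sender.resume()              # socket writable again: the rest goes out
```

In a real event loop, resume() would be called from the handler that fires
when the socket becomes writable, instead of spinning.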
Stefan Hajnoczi [Mon, 20 Aug 2012 09:21:54 +0000 (10:21 +0100)]
net: asynchronous send/receive infrastructure for net/socket.c
The net/socket.c net client is not truly asynchronous. This patch
borrows the qemu_set_fd_handler2() code from net/tap.c as the basis for
proper asynchronous send/receive.
Only read packets from the socket when the peer is able to receive.
This avoids needless queuing.
Stefan Hajnoczi [Fri, 24 Aug 2012 12:50:30 +0000 (13:50 +0100)]
net: broadcast hub packets if at least one port can receive
In commit 60c07d933c66c4b30a83b7ccbc8a0cb3df1b2d0e ("net: fix
qemu_can_send_packet logic") the "VLAN" broadcast behavior was changed
to queue packets if any net client cannot receive. It turns out that
this was not actually the right fix and just hides the real bug that
hw/usb/dev-network.c:usbnet_receive() clobbers its receive buffer when
called multiple times in a row. The commit also introduced a new bug
that "VLAN" packets would not be sent if one of multiple net clients was
down.
The hw/usb/dev-network.c bug has since been fixed, so this patch reverts
broadcast behavior to send packets as long as one net client can
receive. Packets simply get queued for the net clients that are
temporarily unable to receive.
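The restored rule ("send if at least one port can receive, queue for the
rest") can be sketched with a toy hub. This is a simplified Python model:
Port, Hub and the per-port queue are hypothetical, and in QEMU the queuing
is done by the net queue layer rather than by the hub itself.

```python
from collections import deque


class Port:
    def __init__(self, name, can_receive=True):
        self.name = name
        self.can_receive = can_receive
        self.received = []
        self.queue = deque()  # packets waiting for this port


class Hub:
    def __init__(self, ports):
        self.ports = ports

    def can_send(self, src):
        # One ready peer is enough to accept the packet.
        return any(p.can_receive for p in self.ports if p is not src)

    def broadcast(self, src, packet):
        if not self.can_send(src):
            return False
        for p in self.ports:
            if p is src:
                continue
            if p.can_receive:
                p.received.append(packet)
            else:
                p.queue.append(packet)  # delivered later, when p recovers
        return True


a, b, c = Port("a"), Port("b"), Port("c", can_receive=False)
hub = Hub([a, b, c])
sent = hub.broadcast(a, b"ping")
```

Here port c being down no longer prevents b from seeing the packet.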
Stefan Hajnoczi [Fri, 24 Aug 2012 12:37:29 +0000 (13:37 +0100)]
net: fix usbnet_receive() packet drops
The USB network interface has a single buffer which the guest reads
from. This patch prevents multiple calls to usbnet_receive() from
clobbering the input buffer. Instead we queue packets until buffer
space becomes available again.
This is inspired by virtio-net and e1000 rxbuf handling.
Stefan Hajnoczi [Fri, 24 Aug 2012 12:32:16 +0000 (13:32 +0100)]
net: clean up usbnet_receive()
The USB network interface has two code paths depending on whether or not
RNDIS mode is enabled. Refactor usbnet_receive() so that there is a
common path throughout the function instead of duplicating everything
across if (is_rndis(s)) ... else ... code paths.
Clean up coding style and 80 character line wrap along the way.
Stefan Hajnoczi [Mon, 20 Aug 2012 12:35:23 +0000 (13:35 +0100)]
net: do not report queued packets as sent
Net send functions have a return value where 0 means the packet has not
been sent and will be queued. A non-zero value means the packet was
sent or an error caused the packet to be dropped.
This patch fixes two instances where packets are queued but we return
their size. This causes callers to believe the packets were sent. When
the caller uses the async send interface this creates a real problem
because the callback will be invoked for a packet that the caller
believed to be already sent. This bug can cause double-frees in the
caller.
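The return-value convention and the resulting double completion can be
shown with a small model. This is a Python sketch under assumptions: Peer,
send_async(), flush() and transmit() are hypothetical stand-ins for the
async send interface, not QEMU functions.

```python
class Peer:
    def __init__(self):
        self.busy = True
        self.pending = []  # (packet, sent_cb) pairs waiting for the peer

    def send_async(self, packet, sent_cb):
        """Return len(packet) if sent now, 0 if queued (sent_cb fires later)."""
        if self.busy:
            self.pending.append((packet, sent_cb))
            return 0  # the bug was the equivalent of returning len(packet) here
        return len(packet)

    def flush(self):
        self.busy = False
        while self.pending:
            packet, sent_cb = self.pending.pop(0)
            sent_cb(len(packet))


completions = []


def transmit(peer, packet):
    # The caller completes each packet exactly once: either right away
    # (non-zero return) or later from the callback (zero return).
    if peer.send_async(packet, completions.append) != 0:
        completions.append(len(packet))


peer = Peer()
transmit(peer, b"abc")
peer.flush()
```

If send_async() returned the size for a queued packet, transmit() would
complete it immediately and the callback would complete it again, which is
the double-free pattern described above.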
Stefan Hajnoczi [Fri, 17 Aug 2012 20:16:42 +0000 (21:16 +0100)]
net: add receive_disabled logic to iov delivery path
This patch adds the missing NetClient->receive_disabled logic in the
sendv delivery code path. It seems that commit
893379efd0e1b84ceb0c42a713293f3dbd27b1bd ("net: disable receiving if
client returns zero") only added the logic to qemu_deliver_packet() and
not qemu_deliver_packet_iov().
The receive_disabled flag should be automatically set when .receive(),
.receive_raw(), or .receive_iov() return 0. No further packets will be
delivered to the NetClient until the receive_disabled flag is cleared
again by calling qemu_flush_queued_packets().
Typically the NetClient will wait until its file descriptor becomes
writable and then invoke qemu_flush_queued_packets() to resume
transmission.
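The flag's life cycle can be sketched as below. This is a simplified Python
model: NetClient, deliver() and flush_queued_packets() are illustrative
stand-ins, and unlike QEMU the model keeps the queue on the receiving
client for brevity.

```python
from collections import deque


class NetClient:
    def __init__(self, receive):
        self.receive = receive        # returns bytes consumed, 0 means "busy"
        self.receive_disabled = False
        self.queue = deque()


def deliver(nc, packet):
    """Return bytes delivered, or 0 if the packet was queued instead."""
    if nc.receive_disabled:
        nc.queue.append(packet)       # do not even call receive()
        return 0
    ret = nc.receive(packet)
    if ret == 0:
        nc.receive_disabled = True    # stop delivering until flushed
        nc.queue.append(packet)
    return ret


def flush_queued_packets(nc):
    nc.receive_disabled = False
    while nc.queue:
        if nc.receive(nc.queue[0]) == 0:
            nc.receive_disabled = True
            return False              # still busy; keep the rest queued
        nc.queue.popleft()
    return True


state = {"busy": True, "calls": 0}
seen = []


def rx(packet):
    state["calls"] += 1
    if state["busy"]:
        return 0
    seen.append(packet)
    return len(packet)


nc = NetClient(rx)
r1 = deliver(nc, b"a")     # receive() returns 0: flag set, packet queued
r2 = deliver(nc, b"bb")    # flag already set: queued without calling rx
calls_while_busy = state["calls"]
state["busy"] = False
flushed = flush_queued_packets(nc)
```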
Bo Yang [Wed, 29 Aug 2012 11:26:11 +0000 (19:26 +0800)]
eepro100: Fix network hang when rx buffers run out
This was reported by QA. When installing an OS with PXE, after the initial
kernel and initrd are loaded, the installer tries to copy files from the
install server to the local hard disk, and the network stalls because the
device has run out of receive descriptors.
[Whitespace fixes and removed qemu_notify_event() because Paolo's
earlier net patches have moved it into qemu_flush_queued_packets().
Additional info:
I can reproduce the network hang with a tap device doing an iPXE HTTP
boot as follows:
I needed a vanilla iPXE ROM to get to the iPXE prompt. I think the boot
prompt has been disabled in the ROMs that ship with QEMU to reduce boot
time.
During the vmlinuz HTTP download there is a network hang. hw/eepro100.c
has reached the end of the rx descriptor list. When the iPXE driver
replenishes the rx descriptor list we don't kick the QEMU net subsystem
and event loop, thereby leaving the tap netdev without its file
descriptor in select(2).]
Paolo Bonzini [Thu, 9 Aug 2012 14:45:57 +0000 (16:45 +0200)]
xen: flush queue when getting an event
xen does not have a register that, when written, will cause can_receive
to go from false to true. However, flushing the queue can be attempted
whenever the front-end raises its side of the Xen event channel. There
is a single event channel for tx and rx.
Paolo Bonzini [Thu, 9 Aug 2012 14:45:56 +0000 (16:45 +0200)]
e1000: flush queue whenever can_receive can go from false to true
When the guests replenish the receive ring buffer, the network device
should flush its queue of pending packets. This is done with
qemu_flush_queued_packets.
e1000's can_receive can go from false to true when RCTL or RDT are
modified.
Paolo Bonzini [Thu, 9 Aug 2012 14:45:55 +0000 (16:45 +0200)]
net: notify iothread after flushing queue
virtio-net has code to flush the queue and notify the iothread
whenever new receive buffers are added by the guest. That is
fine, and indeed we need to do the same in all other drivers.
However, notifying the iothread should be work for the network
subsystem. And since we are at it we can add a little smartness:
if some of the queued packets already could not be delivered,
there is no need to notify the iothread.
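The "notify only if everything was delivered" rule can be sketched as
below. This is a Python illustration with hypothetical names
(flush_and_maybe_notify, try_deliver, notify), not the QEMU functions.

```python
def flush_and_maybe_notify(queue, try_deliver, notify):
    """Deliver queued packets in order; call notify() only if all went out."""
    while queue:
        if not try_deliver(queue[0]):
            return False  # receiver is full again: a wakeup would be pointless
        queue.pop(0)
    notify()
    return True


notifications = []
queue = [b"a", b"b", b"c"]

# First attempt: the receiver rejects b"b", so no notification is sent.
first = flush_and_maybe_notify(queue, lambda p: p != b"b",
                               lambda: notifications.append("kick"))
left_after_first = list(queue)
notified_after_first = list(notifications)

# Second attempt: everything is accepted, so the iothread is notified once.
second = flush_and_maybe_notify(queue, lambda p: True,
                                lambda: notifications.append("kick"))
```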
Alon Levy [Wed, 12 Sep 2012 13:13:28 +0000 (16:13 +0300)]
hw/qxl: support client monitor configuration via device
Until now we used only the agent to change the monitor count and each
monitor resolution. This patch introduces the qemu part of using the
device as the mediator instead of the agent via virtio-serial.
Spice (>=0.11.5) calls the new QXLInterface::client_monitors_config,
which returns whether the interrupt is enabled and, if so and given a
non-NULL monitors config, generates the interrupt
QXL_INTERRUPT_CLIENT_MONITORS_CONFIG with a CRC checksum so that the guest
can verify a second call hasn't interfered.
The maximal number of monitors is limited on the QXLRom to 64.
Uri Lublin [Tue, 11 Sep 2012 07:09:58 +0000 (10:09 +0300)]
qxl: better cleanup for surface destroy
Add back a call to qxl_spice_destroy_surface_wait_complete() in
qxl_spice_destroy_surface_wait(), which was removed by commit
c480bb7da465186b84d8427e068ef7502e47ffbf.
It is needed to complete surface-removal cleanup in the non-async case.
For async, qxl_spice_destroy_surface_wait_complete() is called upon
operation completion.
The recent introduction of set_client_capabilities has broken
(seamless) migration by trying to call qxl_send_events pre (seamless
incoming) and post (*) migration, triggering the following assert:
qxl_send_events: Assertion `qemu_spice_display_is_running(&d->ssd)' failed.
The solution is easy: pre migration the guest will have already received
the client caps on the migration source side, and post migration there no
longer is a guest, so we can simply ignore the set_client_capabilities call
in both those scenarios.
*) Post migration, so not fatal to the migration itself, but still a crash.
spice: send updates only for changed screen content
When creating screen updates, compare the current guest screen
against the mirror (which holds the most recent update sent), then
only create updates for the screen areas which actually changed.
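The compare-against-mirror idea can be sketched as below. This is a Python
illustration only: it works on whole rows, which is an assumption made for
brevity (the granularity of the real code is not stated here), and
changed_row_spans() and send_updates() are hypothetical names.

```python
def changed_row_spans(screen, mirror):
    """Yield (first_row, end_row) for runs of rows that differ.

    Assumes screen and mirror have the same number of rows.
    """
    start = None
    for y, (new, old) in enumerate(zip(screen, mirror)):
        if new != old:
            if start is None:
                start = y
        elif start is not None:
            yield (start, y)
            start = None
    if start is not None:
        yield (start, len(screen))


def send_updates(screen, mirror):
    """Return the spans that need an update and refresh the mirror."""
    spans = list(changed_row_spans(screen, mirror))
    for first, end in spans:
        # Remember what was sent, so unchanged rows are skipped next time.
        mirror[first:end] = [bytes(row) for row in screen[first:end]]
    return spans


screen = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
mirror = list(screen)
screen[1] = b"bxbb"
screen[2] = b"cccx"
first_pass = send_updates(screen, mirror)   # rows 1..2 changed
second_pass = send_updates(screen, mirror)  # nothing changed since
```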
[ v2: drop redundant qemu_spice_create_one_update call ]
Creating one function which creates a single update for a given
rectangle. And one (for now) pretty simple wrapper around it to
queue up screen updates for the dirty region.
Jan Kiszka [Thu, 23 Aug 2012 11:02:33 +0000 (13:02 +0200)]
VGA: Flush coalesced MMIO on related MMIO/PIO accesses
In preparation for no longer flushing coalesced MMIO unconditionally on
vmexits, mark VGA MMIO and PIO regions as synchronous with respect to
coalesced MMIO and flush the buffer explicitly on PIO accesses that do not
use generic memory regions yet.
Jan Kiszka [Thu, 23 Aug 2012 11:02:32 +0000 (13:02 +0200)]
memory: Flush coalesced MMIO on mapping and state changes
Flush pending coalesced MMIO before performing mapping or state changes
that could affect the event orderings or route the buffered requests to
a wrong region.
Jan Kiszka [Thu, 23 Aug 2012 11:02:30 +0000 (13:02 +0200)]
memory: Use transaction_begin/commit also for single-step operations
Also wrap simple operations consisting of only a single step with
memory_region_transaction_begin/commit. This allows performing
additional steps like coalesced MMIO flushing from a single place.
This requires dropping some micro-optimizations: the skipping of
topology updates after updating disabled or unregistered regions.
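Why wrapping single-step operations helps can be sketched with a nesting
counter. This is a toy Python model: MemoryModel, set_enabled() and the
logged strings are hypothetical, and running the flush at the outermost
begin is an assumption made for the illustration.

```python
class MemoryModel:
    def __init__(self):
        self.depth = 0
        self.log = []

    def transaction_begin(self):
        if self.depth == 0:
            self.log.append("flush coalesced MMIO")  # hypothetical central hook
        self.depth += 1

    def transaction_commit(self):
        assert self.depth > 0, "commit without begin"
        self.depth -= 1
        if self.depth == 0:
            self.log.append("update topology")

    def set_enabled(self, region, enabled):
        # A single-step operation still goes through begin/commit, so the
        # hooks above run without being repeated in every operation.
        self.transaction_begin()
        region["enabled"] = enabled
        self.transaction_commit()


m = MemoryModel()
r1, r2 = {"enabled": False}, {"enabled": False}

m.set_enabled(r1, True)          # standalone: one flush, one update
single_log = list(m.log)

m.log.clear()
m.transaction_begin()            # batched: still one flush, one update
m.set_enabled(r1, False)
m.set_enabled(r2, True)
m.transaction_commit()
batched_log = list(m.log)
```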
Jan Kiszka [Thu, 23 Aug 2012 11:02:29 +0000 (13:02 +0200)]
memory: Flush coalesced MMIO on selected region access
Instead of flushing pending coalesced MMIO requests on every vmexit,
this provides a mechanism to selectively flush when memory regions
related to the coalesced one are accessed. This first of all includes
the coalesced region itself but can also be applied to other regions, e.g.
of the same device, by calling memory_region_set_flush_coalesced.
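The selective flush can be sketched as below. This is a toy Python model:
Region, Bus and their methods are hypothetical, and only the idea of a
per-region "flush coalesced" flag is taken from the text above.

```python
class Region:
    def __init__(self, name, flush_coalesced=False):
        self.name = name
        self.flush_coalesced = flush_coalesced


class Bus:
    def __init__(self):
        self.buffer = []   # coalesced writes, not yet seen by the device model
        self.handled = []  # order in which the device model saw accesses

    def coalesced_write(self, region, value):
        self.buffer.append((region.name, value))

    def flush(self):
        self.handled.extend(self.buffer)
        self.buffer.clear()

    def access(self, region, what):
        if region.flush_coalesced:
            self.flush()   # keep ordering relative to the buffered writes
        self.handled.append((region.name, what))


vram = Region("vram", flush_coalesced=True)   # the coalesced region itself
ctrl = Region("ctrl", flush_coalesced=True)   # same device, flagged explicitly
timer = Region("timer")                       # unrelated region

bus = Bus()
bus.coalesced_write(vram, 1)
bus.access(timer, "read")    # unrelated: buffered write stays pending
pending_after_timer = list(bus.buffer)
bus.access(ctrl, "write")    # related: flush first, then handle the access
```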
Peter Maydell [Wed, 15 Aug 2012 11:08:13 +0000 (12:08 +0100)]
kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()
Move the init of the irqchip_inject_ioctl field of KVMState out of
kvm_irqchip_create() and into kvm_init(), so that kvm_set_irq()
can be used even when no irqchip is created (for architectures
that support async interrupt notification even without an in
kernel irqchip).
Peter Maydell [Wed, 18 Jul 2012 10:11:09 +0000 (11:11 +0100)]
update-linux-headers.sh: Don't hard code list of architectures
Rather than hardcoding the list of architectures in the kernel
header update script, just import headers for every architecture
which supports KVM (with a blacklist exception for ia64 which
has KVM headers but is dead). This reduces the number of QEMU
files which need to be updated to add support for a new KVM
architecture.
optimizer.c contains some cases where the break appears in both the
if and the else parts. Fix that by moving it to the outer part. Also
move some common code there.
tcg/optimize: swap brcond/setcond arguments when possible
brcond and setcond ops are not commutative, but it's easy to compute the
new condition after swapping the arguments. Try to always put the constant
argument in second position like for commutative ops, to help backends to
generate better code.
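The argument swap can be sketched as below. This is a Python illustration,
not the TCG code: the condition names, the SWAPPED table and canonicalize()
are hypothetical, with integers standing in for constants and strings for
temporaries.

```python
# Condition that holds for (b, a) exactly when the key holds for (a, b).
SWAPPED = {
    "eq": "eq", "ne": "ne",
    "lt": "gt", "gt": "lt", "le": "ge", "ge": "le",
    "ltu": "gtu", "gtu": "ltu", "leu": "geu", "geu": "leu",
}


def canonicalize(cond, a, b):
    """Move a constant first operand to second position, adjusting cond."""
    a_const = isinstance(a, int)
    b_const = isinstance(b, int)
    if a_const and not b_const:
        return SWAPPED[cond], b, a
    return cond, a, b


swapped = canonicalize("lt", 5, "t0")      # 5 < t0 is the same test as t0 > 5
unchanged = canonicalize("ge", "t0", 5)    # constant already in second place
```

Note that swapping operands maps lt to gt, not to ge: the comparison is
mirrored, not negated.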
Now that there are two passes of optimization (optimize.c, liveness)
there is no point in outputting the statistics of the liveness part
only. Update the code to take into account both optimizations.
Gerd Hoffmann [Thu, 30 Aug 2012 11:05:10 +0000 (13:05 +0200)]
xhci: rework interrupt handling
Split xhci_irq_update into a function which handles intx updates
(including lowering the irq line once the guests acks the interrupt)
and one which is used for raising an irq only.
Gerd Hoffmann [Tue, 28 Aug 2012 11:38:01 +0000 (13:38 +0200)]
xhci: update port handling
This patch changes the way xhci ports are linked to USBPorts. The fixed
1:1 relationship between xhci ports and USBPorts is gone. Now each
USBPort represents a physical plug which usually has two xhci ports
assigned: one usb2 and one usb3 port. USB devices show up at one or the
other, depending on whether they support superspeed or not.
This patch also makes the number of usb2 and usb3 ports runtime
configurable by adding 'p2' and 'p3' properties. It is allowed to
have different numbers of usb2 and usb3 ports. Specifying p2=4,p3=2
will give you an xhci adapter which supports all speeds on physical
ports 1+2 and usb2 only on ports 3+4.
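The p2/p3 example can be worked through with a small helper. This is a
Python sketch: physical_ports() is a hypothetical function that only models
which speeds each physical port supports, not how the individual xhci
ports are numbered.

```python
def physical_ports(p2, p3):
    """Return {physical port number: speeds} for p2 usb2 and p3 usb3 ports."""
    ports = {}
    for n in range(1, max(p2, p3) + 1):
        speeds = []
        if n <= p2:
            speeds.append("usb2")
        if n <= p3:
            speeds.append("usb3")
        ports[n] = speeds
    return ports


layout = physical_ports(p2=4, p3=2)
```

For p2=4,p3=2 this gives both speeds on physical ports 1 and 2 and usb2
only on ports 3 and 4, matching the description above.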
Gerd Hoffmann [Thu, 23 Aug 2012 11:26:25 +0000 (13:26 +0200)]
xhci: update register layout
Change the register layout to be a bit more sparse and also not depend
on the number of ports. Useful for making the number of ports
runtime-configurable.
Gerd Hoffmann [Fri, 17 Aug 2012 09:04:36 +0000 (11:04 +0200)]
xhci: drop buffering
This patch splits the xhci_xfer_data function into three.
The xhci_xfer_data function used to do two things:
(1) copy transfer data between guest memory and a temporary buffer.
(2) report transfer results to the guest using events.
Now we have three functions to handle this:
(1) xhci_xfer_map creates a scatter list for the transfer and
uses that (instead of the temporary buffer) to build a
USBPacket.
(2) xhci_xfer_unmap undoes the mapping.
(3) xhci_xfer_report sends out events.
The patch also fixes reporting of transaction errors which must be
reported unconditionally, not only in case the guest asks for it
using the ISP flag.
Gerd Hoffmann [Fri, 17 Aug 2012 12:05:21 +0000 (14:05 +0200)]
xhci: rip out background transfer code
The original xhci code (the one which used libusb directly) used to use
'background transfers' for iso streams. In upstream qemu the iso
stream buffering is handled by usb-host & usb-redir, so we will
never ever need this. It has been left in as reference, but is dead
code anyway. Rip it out.
Hans de Goede [Mon, 3 Sep 2012 10:04:49 +0000 (12:04 +0200)]
usb-redir: Ensure our peer has the necessary caps when redirecting to XHCI
In order for redirection to work properly when redirecting to an emulated
XHCI controller, the usb-redir-host must support both
usb_redir_cap_ep_info_max_packet_size and usb_redir_cap_64bits_ids.
Reject any devices redirected to an XHCI controller when these are not
supported.