Peter Maydell [Mon, 6 Jun 2016 15:59:28 +0000 (16:59 +0100)]
target-arm: Don't try to set ESR IL bit in arm_cpu_do_interrupt_aarch64()
Remove some incorrect code from arm_cpu_do_interrupt_aarch64()
which attempts to set the IL bit in the syndrome register based
on the value of env->thumb. This is wrong in several ways:
* IL doesn't indicate Thumb-vs-ARM, it indicates instruction
length (which may be 16 or 32 for Thumb and is always 32 for ARM)
* not every syndrome format uses IL like this -- for some IL is
always set, and for some it is always clear
* the code is changing esr_el[new_el] even for interrupt entry,
which is not supposed to modify ESR_ELx at all
Delete the code, and instead rely on the syndrome value in
env->exception.syndrome having already been set up with the
correct value of IL.
Peter Maydell [Mon, 6 Jun 2016 15:59:28 +0000 (16:59 +0100)]
target-arm: Set IL bit in syndromes for insn abort, watchpoint, swstep
For some exception syndrome types, the IL bit should always be set.
This includes the instruction abort, watchpoint and software step
syndrome types; add the missing ARM_EL_IL bit to the syndrome
values returned by syn_insn_abort(), syn_swstep() and syn_watchpoint().
target-arm: A64: Create Instruction Syndromes for Data Aborts
Add support for generating the ISS (Instruction Specific Syndrome) for
Data Abort exceptions taken from AArch64.
These syndromes are used by hypervisors for example to trap and emulate
memory accesses.
We save the decoded data out-of-band with the TBs at translation time.
When exceptions hit, the extra data attached to the TB is used to
recreate the state needed to encode instruction syndromes.
This avoids the need to emit moves with every load/store.
Peter Maydell [Mon, 6 Jun 2016 14:17:52 +0000 (15:17 +0100)]
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
readdir_r() to readdir() conversion, various minor cleanups
# gpg: Signature made Mon 06 Jun 2016 10:52:52 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Greg Kurz <[email protected]>"
# gpg: aka "Gregory Kurz (Groug) <[email protected]>"
# gpg: aka "Gregory Kurz (Cimai Technology) <[email protected]>"
# gpg: aka "Gregory Kurz (Meiosys Technology) <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9p: switch back to readdir()
9p: add locking to V9fsDir
9p: introduce the V9fsDir type
9p: drop useless out: label
9p: drop useless inclusion of hw/i386/pc.h
9p/fsdev: remove obsolete references to virtio
9p: some more cleanup in #include directives
Peter Maydell [Mon, 6 Jun 2016 12:58:24 +0000 (13:58 +0100)]
Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160606-1' into staging
virtio-gpu: scanout fix, live migration support
vmsvga: security fixes
# gpg: Signature made Mon 06 Jun 2016 08:05:00 BST
# gpg: using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg: aka "Gerd Hoffmann <[email protected]>"
# gpg: aka "Gerd Hoffmann (private) <[email protected]>"
* remotes/kraxel/tags/pull-vga-20160606-1:
virtio-gpu: add live migration support
vmsvga: don't process more than 1024 fifo commands at once
vmsvga: shadow fifo registers
vmsvga: add more fifo checks
vmsvga: move fifo sanity checks to vmsvga_fifo_length
virtio-gpu: fix scanout rectangles
Commit fcaafb1001b9c42817714dd3b2aadcfdb997b53d accidentally broke reads from
scsi-disk devices when being updated from its original form to use the new
byte-based block functions. Add the extra missing sector to offset conversion
in order to restore read functionality.
Peter Wu [Sun, 5 Jun 2016 14:35:48 +0000 (16:35 +0200)]
gdbstub: avoid busy loop while waiting for gdb
While waiting for a gdb response, or while sending an acknowledgement
there is not much to do, so do not mark the socket as non-blocking to
avoid a busy loop while paused at gdb. This only affects the user-mode
emulation (qemu-arm -g 1234 ./a.out).
Note that this issue was reported before at
https://lists.nongnu.org/archive/html/qemu-devel/2013-02/msg02277.html.
While at it, close the gdb client fd on EOF or error while reading.
Greg Kurz [Mon, 6 Jun 2016 09:52:34 +0000 (11:52 +0200)]
9p: add locking to V9fsDir
If several threads concurrently call readdir() with the same directory
stream pointer, it is possible that they all get a pointer to the same
dirent structure, whose content is overwritten each time readdir() is
called.
We must thus serialize accesses to the dirent structure.
This may be achieved with a mutex like below:
lock_mutex();
readdir();
// work with the dirent
unlock_mutex();
This patch adds all the locking, to prepare the switch to readdir().
Dmitry Fleytman [Sat, 4 Jun 2016 07:02:43 +0000 (10:02 +0300)]
e1000e: Fix build with gcc 4.6.3 and ust tracing
This patch fixes used-uninitialized false
positive while compiling with ust tracing
backend plus gcc 4.6.3:
hw/net/e1000e.c: In function ‘e1000e_io_write’:
hw/net/e1000e.c:170:39: error: ‘idx’ may be used uninitialized in this function [-Werror=uninitialized]
hw/net/e1000e.c: In function ‘e1000e_io_read’:
hw/net/e1000e.c:145:35: error: ‘idx’ may be used uninitialized in this function [-Werror=uninitialized]
cc1: all warnings being treated as errors
make: *** [hw/net/e1000e.o] Error 1
Gerd Hoffmann [Mon, 23 May 2016 13:22:07 +0000 (15:22 +0200)]
virtio-gpu: add live migration support
Store some additional state for cursor and resource backing storage,
so we can write out and reload things. Implement vmsave+vmload for
2d mode. Continue blocking live migration in 3d/virgl mode.
Gerd Hoffmann [Mon, 30 May 2016 07:09:21 +0000 (09:09 +0200)]
vmsvga: don't process more than 1024 fifo commands at once
vmsvga_fifo_run is called in regular intervals (on each display update)
and will resume where it left off. So we can simply exit the loop,
without having to worry about how processing will continue.
Gerd Hoffmann [Mon, 30 May 2016 07:09:20 +0000 (09:09 +0200)]
vmsvga: shadow fifo registers
The fifo is normal ram. So kvm vcpu threads and qemu iothread can
access the fifo in parallel without syncronization. Which in turn
implies we can't use the fifo pointers in-place because the guest
can try changing them underneath us. So add shadows for them, to
make sure the guest can't modify them after we've applied sanity
checks.
Gerd Hoffmann [Mon, 30 May 2016 07:09:18 +0000 (09:09 +0200)]
vmsvga: move fifo sanity checks to vmsvga_fifo_length
Sanity checks are applied when the fifo is enabled by the guest
(SVGA_REG_CONFIG_DONE write). Which doesn't help much if the guest
changes the fifo registers afterwards. Move the checks to
vmsvga_fifo_length so they are done each time qemu is about to read
from the fifo.
The arm target was handled by 06486077, but other targets
were ignored. This handles all the rest which actually support
disassembly (that is, skipping moxie and tilegx).
Dmitry Fleytman [Thu, 2 Jun 2016 19:12:28 +0000 (22:12 +0300)]
e1000e: Fix build with ust trace backend
ust trace backend has limitation of maximum 10
arguments per event. Traces with more arguments
cannot be compiled for this backend.
Trace e1000e_rx_rss_ip6 introduced by previous
commits has 11 arguments and fails to compile with
ust trace backend.
This patch fixes the problem by splitting this
tracepoint into two successive tracepoints with
smaller number of arguments.
For more information see comment regarding TP_ARGS
in lttng/tracepoint.h:
/*
* TP_ARGS takes tuples of type, argument separated by a comma.
* It can take up to 10 tuples (which means that less than 10 tuples is
* fine too).
* Each tuple is also separated by a comma.
*/
Build log generated by this problem:
In file included from ./trace/generated-tracers.h:9:0,
from /home/travis/build/qemu/qemu/include/trace.h:4,
from util/oslib-posix.c:36:
./trace/generated-ust-provider.h:16556:3: error: unknown type name ‘_TP_EXPROTO_Bool’
In file included from /home/travis/build/qemu/qemu/include/trace.h:4:0,
from util/oslib-posix.c:36:
./trace/generated-tracers.h: In function ‘trace_e1000e_rx_rss_ip6’:
./trace/generated-tracers.h:8379:431: error: expected string literal before ‘_SDT_ASM_OPERANDS_ipv6_enabled’
./trace/generated-tracers.h:8379:431: error: implicit declaration of function ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=implicit-function-declaration]
./trace/generated-tracers.h:8379:431: error: nested extern declaration of ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [util/oslib-posix.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from ./trace/generated-tracers.h:9:0,
from /home/travis/build/qemu/qemu/include/trace.h:4,
from util/hbitmap.c:16:
./trace/generated-ust-provider.h:16556:3: error: unknown type name ‘_TP_EXPROTO_Bool’
In file included from /home/travis/build/qemu/qemu/include/trace.h:4:0,
from util/hbitmap.c:16:
./trace/generated-tracers.h: In function ‘trace_e1000e_rx_rss_ip6’:
./trace/generated-tracers.h:8379:431: error: expected string literal before ‘_SDT_ASM_OPERANDS_ipv6_enabled’
./trace/generated-tracers.h:8379:431: error: implicit declaration of function ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=implicit-function-declaration]
./trace/generated-tracers.h:8379:431: error: nested extern declaration of ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [util/hbitmap.o] Error 1
Peter Krempa [Wed, 11 May 2016 10:31:04 +0000 (12:31 +0200)]
audio: pa: Set volume of recording stream instead of recording device
Since pulseaudio 1.0 it's possible to set the individual stream volume
rather than setting the device volume. With this, setting hardware mixer
of a emulated sound card doesn't mess up the volume configuration of the
host.
A side effect is that this limits compatible pulseaudio version to 1.0
which was released on 2011-09-27.
Gerd Hoffmann [Mon, 30 May 2016 08:40:55 +0000 (10:40 +0200)]
virtio-gpu: fix scanout rectangles
Commit "ca58b45 ui/virtio-gpu: add and use qemu_create_displaysurface_pixman"
breaks scanouts which use a region of the underlying resource only.
So, we need another way to handle the underlying issue. Lets create a
new pixman image, grab a reference on the pixman providing the
underlying storage, hook up a destroy callback which releases the
reference. That way regions work again and releasing the backing
storage should still be impossible thanks to the extra reference we are
holding.
Gerd Hoffmann [Wed, 1 Jun 2016 06:22:30 +0000 (08:22 +0200)]
vnc: add configurable keyboard delay
Limits the rate kbd events from the vnc server are forwarded to the
guest, so input devices which are typically low-bandwidth can keep
up even on bulky input.
Alexander Graf [Tue, 24 May 2016 14:19:19 +0000 (17:19 +0300)]
vnc: Add support for color map
Our current VNC code does not handle color maps (aka non-true-color) at all
and aborts if a client requests them. There are 2 major issues with this:
1) A VNC viewer on an 8-bit X11 system may request color maps
2) RealVNC _always_ starts requesting color maps, then moves on to full color
In order to support these 2 use cases, let's just create a fake color map
that covers exactly our normal true color 8 bit color space. That way we don't
lose anything over a client that wants true color.
Actually this is a very old patch originally submitted in 2013 by
Alexander. The situation is still the same with RealVNC, it does not
connect by default to QEMU VNC. The problem is that this client is
really popular. This is better to be kludged.
Peter Maydell [Thu, 2 Jun 2016 13:26:57 +0000 (14:26 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Thu 02 Jun 2016 07:23:18 BST using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request: (31 commits)
Add ENET device to i.MX6 SOC.
Add ENET/Gbps Ethernet support to FEC device
i.MX: move FEC device to a register array structure.
i.MX: Rename i.MX FEC defines to ENET_XXX
i.MX: reset TX/RX descriptors when FEC is disabled.
i.MX: Fix FEC code for ECR register reset value.
i.MX: Fix FEC code for MDIO address selection
i.MX: Fix FEC code for MDIO operation selection
net: handle optional VLAN header in checksum computation.
net: improve UDP/TCP checksum computation.
e1000e: Introduce qtest for e1000e device
net: Introduce e1000e device emulation
e1000: Move out code that will be reused in e1000e
e1000_regs: Add definitions for Intel 82574-specific bits
vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
net_pkt: Extend packet abstraction as required by e1000e functionality
rtl8139: Move more TCP definitions to common header
net_pkt: Name vmxnet3 packet abstractions more generic
vmxnet3: Use common MAC address tracing macros
net: Add macros for MAC address tracing
...
Peter Maydell [Thu, 2 Jun 2016 12:42:52 +0000 (13:42 +0100)]
Merge remote-tracking branch 'remotes/famz/tags/pull-docker-20160601' into staging
v2: Fix warning due to include.
Various temp dir/file changes.
Don't use "find -executable" to be compatible with Mac.
# gpg: Signature made Wed 01 Jun 2016 10:30:33 BST using RSA key ID 6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/pull-docker-20160601:
.gitignore: Ignore docker source copy
MAINTAINERS: Add tests/docker
docker: Add EXTRA_CONFIGURE_OPTS
docs: Add text for tests/docker in build-system.txt
docker: Add travis tool
docker: Add mingw test
docker: Add clang test
docker: Add full test
docker: Add quick test
docker: Add common.rc
docker: Add test runner
docker: Add images
Makefile: Rules for docker testing
Makefile: Always include rules.mak
rules.mak: Add "COMMA" constant
tests: Add utilities for docker testing
According to the FEC chapter of i.MX25 reference manual
When writing to MMFR register, the MDIO device and adress are selected by
bit 27 to 23 and bit 22 to 18 respectively. This is a total of 10 bits
that need to be used by the Phy chip/address decoding function.
This patch fixes the number of bits used from 9 to 10.
According to the FEC chapter of i.MX25 reference manual
When writing the MMFR register, bit 29 and 28 select the requested operation.
* 10 means read operation with valid MII mgmt frame
* 11 means read operation with non compliant MII mgmt frame
* 01 means write operation with valid MII mgmt frame
* 00 means write operation with non compliant MII mgmt frame
So while bit 28 does change beween read/write for valid MII mgmt frame, the
mening is inverted for non compliant MII mgmt frame.
Bit 29 on the other hand means read/write whatever the type of mgmt frame
involved.
So this patch change the operation selection from bit 28 to bit 29 as it is
more generic.
Dmitry Fleytman [Wed, 1 Jun 2016 08:23:45 +0000 (11:23 +0300)]
net: Introduce e1000e device emulation
This patch introduces emulation for the Intel 82574 adapter, AKA e1000e.
This implementation is derived from the e1000 emulation code, and
utilizes the TX/RX packet abstractions that were initially developed for
the vmxnet3 device. Although some parts of the introduced code may be
shared with e1000, the differences are substantial enough so that the
only shared resources for the two devices are the definitions in
hw/net/e1000_regs.h.
Similarly to vmxnet3, the new device uses virtio headers for task
offloads (for backends that support virtio extensions). Usage of
virtio headers may be forcibly disabled via a boolean device property
"vnet" (which is enabled by default). In such case task offloads
will be performed in software, in the same way it is done on
backends that do not support virtio headers.
The device code is split into two parts:
1. hw/net/e1000e.c: QEMU-specific code for a network device;
2. hw/net/e1000e_core.[hc]: Device emulation according to the spec.
The new device name is e1000e.
Intel specifications for the 82574 controller are available at:
http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf
Throughput measurement results (iperf2):
Fedora 22 guest, TCP, RX
4 ++------------------------------------------+
| |
| X X X X X
3.5 ++ X X X X |
| X |
| |
3 ++ |
G | X |
b | |
/ 2.5 ++ |
s | |
| |
2 ++ |
| |
| |
1.5 X+ |
| |
+ + + + + + + + + + + +
1 ++--+---+---+---+---+---+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Fedora 22 guest, TCP, TX
18 ++-------------------------------------------+
| X |
16 ++ X X X X X
| X |
14 ++ |
| |
12 ++ |
G | X |
b 10 ++ |
/ | |
s 8 ++ |
| |
6 ++ X |
| |
4 ++ |
| X |
2 ++ X |
X + + + + + + + + + + +
0 ++--+---+---+---+---+----+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Fedora 22 guest, UDP, RX
3 ++------------------------------------------+
| X
| |
2.5 ++ |
| |
| |
2 ++ X |
G | |
b | |
/ 1.5 ++ |
s | X |
| |
1 ++ |
| |
| X |
0.5 ++ |
| X |
X + + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Fedora 22 guest, UDP, TX
1 ++------------------------------------------+
| X
0.9 ++ |
| |
0.8 ++ |
0.7 ++ |
| |
G 0.6 ++ |
b | |
/ 0.5 ++ |
s | X |
0.4 ++ |
| |
0.3 ++ |
0.2 ++ X |
| |
0.1 ++ X |
X X + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Windows 2012R2 guest, TCP, RX
3.2 ++------------------------------------------+
| X |
3 ++ |
| |
2.8 ++ |
| |
2.6 ++ X |
G | X X X X X
b 2.4 ++ X X |
/ | |
s 2.2 ++ |
| |
2 ++ |
| X X |
1.8 ++ |
| |
1.6 X+ |
+ + + + + + + + + + + +
1.4 ++--+---+---+---+---+---+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Windows 2012R2 guest, TCP, TX
14 ++-------------------------------------------+
| |
| X X
12 ++ |
| |
10 ++ |
| |
G | |
b 8 ++ |
/ | X |
s 6 ++ |
| |
| |
4 ++ X |
| |
2 ++ |
| X X X |
+ X X + + X X + + + + +
0 X+--+---+---+---+---+----+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Windows 2012R2 guest, UDP, RX
1.6 ++------------------------------------------X
| |
1.4 ++ |
| |
1.2 ++ |
| X |
| |
G 1 ++ |
b | |
/ 0.8 ++ |
s | |
0.6 ++ X |
| |
0.4 ++ |
| X |
| |
0.2 ++ X |
X + + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Windows 2012R2 guest, UDP, TX
0.6 ++------------------------------------------+
| X
| |
0.5 ++ |
| |
| |
0.4 ++ |
G | |
b | |
/ 0.3 ++ X |
s | |
| |
0.2 ++ |
| |
| X |
0.1 ++ |
| X |
X X + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Dmitry Fleytman [Wed, 1 Jun 2016 08:23:30 +0000 (11:23 +0300)]
pci: fix unaligned access in pci_xxx_quad()
Replace legacy cpu_to_le64w()/le64_to_cpup()
calls with stq_le_p()/ldq_le_p().
Motivation for this modification is that
follow up patches add utility function
pcie_dev_ser_num_init() for PCIe DSN
capability creation which uses
pci_set_quad() with a misaligned offset.
Fam Zheng [Wed, 1 Jun 2016 04:25:17 +0000 (12:25 +0800)]
Makefile: Rules for docker testing
This adds a group of make targets to run docker tests, all are available
in source tree without running ./configure.
The usage is shown with "make docker".
Besides the fixed ones, dynamic targets for building each image and
running each test in each image are generated automatically by make,
scanning $(SRC_PATH)/tests/docker/ files with specific patterns.
Alternative to manually list particular targets (docker-TEST@IMAGE)
set, you can control which tests/images to run by filtering variables,
TESTS= and IMAGES=, which are expressed in Makefile pattern syntax,
"foo% %bar ...". For example:
$ make docker-test IMAGES="ubuntu fedora"
Unfortunately, it's impossible to propagate "-j $JOBS" into make in
containers, however since each combination is made a first class target
in the top Makefile, "make -j$N docker-test" still parallels the tests
coarsely.
Still, $J is made a magic variable to let all make invocations in
containers to use -j$J.
Instead of providing a live version of the source tree to the docker
container we snapshot it with git-archive. This ensures the tree is in a
pristine state for whatever operations the container is going to run on
them.
Uncommitted changes known to files known by the git index will be
included in the snapshot if there are any.
Peter Maydell [Tue, 31 May 2016 09:37:21 +0000 (10:37 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160531' into staging
ppc patch queue for 2016-05-31
Here's another ppc patch queue. This batch is all preliminaries
towards two significant features:
1) Full hypervisor-mode support for POWER8
Patches 1-8 start fixing various bugs with TCG's handling of
hypervisor mode
2) CPU hotplug support
Patches 9-12 make some preliminary fixes towards implementing CPU
hotplug on ppc64 (and other non-x86 platforms). These patches are
actually to generic code, not ppc, but are included here with
Paolo's ACK.
# gpg: Signature made Tue 31 May 2016 01:39:44 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <[email protected]>"
# gpg: aka "David Gibson (Red Hat) <[email protected]>"
# gpg: aka "David Gibson (ozlabs.org) <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160531:
cpu: Add a sync version of cpu_remove()
cpu: Reclaim vCPU objects
exec: Do vmstate unregistration from cpu_exec_exit()
exec: Remove cpu from cpus list during cpu_exec_exit()
ppc: Add PPC_64H instruction flag to POWER7 and POWER8
ppc: Get out of emulation on SMT "OR" ops
ppc: Fix sign extension issue in mtmsr(d) emulation
ppc: Change 'invalid' bit mask of tlbiel and tlbie
ppc: tlbie, tlbia and tlbisync are HV only
ppc: Do some batching of TCG tlb flushes
ppc: Use split I/D mmu modes to avoid flushes on interrupts
ppc: Remove MMU_MODEn_SUFFIX definitions
Peter Maydell [Tue, 31 May 2016 08:29:23 +0000 (09:29 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* docs/atomics fixes and atomic_rcu_* optimization (Emilio)
* NBD bugfix (Eric)
* Memory fixes and cleanups (Paolo, Paul)
* scsi-block support for SCSI status, including persistent
reservations (Paolo)
* kvm_stat moves to the Linux repository
* SCSI bug fixes (Peter, Prasad)
* Killing qemu_char_get_next_serial, non-ARM parts (Xiaoqiang)
# gpg: Signature made Sun 29 May 2016 08:11:20 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg: aka "Paolo Bonzini <[email protected]>"
* remotes/bonzini/tags/for-upstream: (30 commits)
exec: hide mr->ram_addr from qemu_get_ram_ptr users
memory: split memory_region_from_host from qemu_ram_addr_from_host
exec: remove ram_addr argument from qemu_ram_block_from_host
memory: remove qemu_get_ram_fd, qemu_set_ram_fd, qemu_ram_block_host_ptr
scsi-generic: Merge block max xfer len in INQUIRY response
scsi-block: always use SG_IO
scsi-disk: introduce scsi_disk_req_check_error
scsi-disk: add need_fua_emulation to SCSIDiskClass
scsi-disk: introduce dma_readv and dma_writev
scsi-disk: introduce a common base class
xen-hvm: ignore background I/O sections
docs/atomics: update comparison with Linux
atomics: do not emit consume barrier for atomic_rcu_read
atomics: emit an smp_read_barrier_depends() barrier only for Alpha and Thread Sanitizer
docs/atomics: update atomic_read/set comparison with Linux
bt: rewrite csrhci_write to avoid out-of-bounds writes
block/iscsi: avoid potential overflow of acb->task->cdb
scsi: megasas: check 'read_queue_head' index value
scsi: megasas: initialise local configuration data buffer
scsi: megasas: use appropriate property buffer size
...
Bharata B Rao [Thu, 12 May 2016 03:48:14 +0000 (09:18 +0530)]
cpu: Add a sync version of cpu_remove()
This sync API will be used by the CPU hotplug code to wait for the CPU to
completely get removed before flagging the failure to the device_add
command.
Sync version of this call is needed to correctly recover from CPU
realization failures when ->plug() handler fails.
Gu Zheng [Thu, 12 May 2016 03:48:13 +0000 (09:18 +0530)]
cpu: Reclaim vCPU objects
In order to deal well with the kvm vcpus (which can not be removed without any
protection), we do not close KVM vcpu fd, just record and mark it as stopped
into a list, so that we can reuse it for the appending cpu hot-add request if
possible. It is also the approach that kvm guys suggested:
https://www.mail-archive.com/[email protected]/msg102839.html
Signed-off-by: Chen Fan <[email protected]> Signed-off-by: Gu Zheng <[email protected]> Signed-off-by: Zhu Guihua <[email protected]> Signed-off-by: Bharata B Rao <[email protected]>
[- Explicit CPU_REMOVE() from qemu_kvm/tcg_destroy_vcpu()
isn't needed as it is done from cpu_exec_exit()
- Use iothread mutex instead of global mutex during
destroy
- Don't cleanup vCPU object from vCPU thread context
but leave it to the callers (device_add/device_del)] Reviewed-by: Thomas Huth <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: David Gibson <[email protected]>
Bharata B Rao [Thu, 12 May 2016 03:48:12 +0000 (09:18 +0530)]
exec: Do vmstate unregistration from cpu_exec_exit()
cpu_exec_init() does vmstate_register for the CPU device. This needs to be
undone from cpu_exec_exit(). This change is needed to support CPU hot
removal.
Bharata B Rao [Thu, 12 May 2016 03:48:11 +0000 (09:18 +0530)]
exec: Remove cpu from cpus list during cpu_exec_exit()
CPUState *cpu gets added to the cpus list during cpu_exec_init(). It
should be removed from cpu_exec_exit().
cpu_exec_exit() is called from generic CPU::instance_finalize and some
archs like PowerPC call it from CPU unrealizefn. So ensure that we
dequeue the cpu only once.
Now -1 value for cpu->cpu_index indicates that we have already dequeued
the cpu for CONFIG_USER_ONLY case also.
Otherwise tight loops at smt_low for example, which OPAL does,
eat so much CPU that we can't boot a kernel anymore. With that,
I can boot 8 CPUs just fine with powernv.
ppc: Change 'invalid' bit mask of tlbiel and tlbie
Otherwise it will trip on the forms used in recent architecture.
Ideally, we should have different handlers for different architecture
levels but our current implementation of TLB flushing is dumb enough
that this will do for now.
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ] Signed-off-by: Cédric Le Goater <[email protected]>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type] Signed-off-by: David Gibson <[email protected]>
ppc: Use split I/D mmu modes to avoid flushes on interrupts
We rework the way the MMU indices are calculated, providing separate
indices for I and D side based on MSR:IR and MSR:DR respectively,
and thus no longer need to flush the TLB on context changes. This also
adds correct support for HV as a separate address space.