Add OpenRISC Multicore PIC which handles inter processor interrupts
(IPI) between cores. In OpenRISC all device interrupts are routed to
each core enabling this device to be simple.
Peter Maydell [Fri, 20 Oct 2017 12:33:32 +0000 (13:33 +0100)]
Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20171020' into staging
The last big chunk of s390x changes:
- (experimental) smp support under tcg
- provide the virtio-input devices for virtio-ccw
- improve error handling in the css code
- enable some simple virtio tests for s390x
- low-address protection in tcg
- some more cleanups and fixes
* remotes/cohuck/tags/s390x-20171020: (46 commits)
s390x/tcg: low-address protection support
accel/tcg: allow to invalidate a write TLB entry immediately
tests: Enable the very simple virtio tests on s390x, too
libqtest: Add qtest_[v]startf()
s390x: refactor error handling for MSCH handler
s390x: refactor error handling for HSCH handler
s390x: refactor error handling for CSCH handler
s390x: refactor error handling for XSCH handler
s390x: improve error handling for SSCH and RSCH
s390x/css: IO instr handler ending control
s390x: move s390x_new_cpu() into board code
s390x: fix cpu object referrence leak in s390x_new_cpu()
s390x/event-facility: variable-length event masks
s390x/MAINTAINERS: add mailing list
virtio-ccw: Add the virtio-input devices for CCW bus
target/s390x: special handling when starting a CPU with WAIT PSW
s390x/tcg: refactor stfl(e) to use s390_get_feat_block()
s390x/tcg: unlock NMI
s390x/cpumodel: allow to enable SENSE RUNNING STATUS for qemu
s390x/tcg: switch to new SIGP handling code
...
Peter Maydell [Fri, 20 Oct 2017 11:45:56 +0000 (12:45 +0100)]
Merge remote-tracking branch 'remotes/famz/tags/docker-pull-request' into staging
# gpg: Signature made Fri 20 Oct 2017 07:30:45 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/docker-pull-request:
docker: Fix PATH for ccache
docker: fix out-of-tree 'make docker-test-build@debian-powerpc-cross'
docker: allow running from srcdir != builddir build
docker: cleanup temp directory after test
docker: Don't allocate tty unless DEBUG=1
This is a neat way to implement low address protection, whereby
only the first 512 bytes of the first two pages (each 4096 bytes) of
every address space are protected.
Store a tec of 0 for the access exception, this is what is defined by
Enhanced Suppression on Protection in case of a low address protection
(Bit 61 set to 0, rest undefined).
We have to make sure to pass the access address, not the masked page
address, into mmu_translate*().
Drop the check from testblock, so we can properly test this via
kvm-unit-tests.
This will check every access going through one of the MMUs.
accel/tcg: allow to invalidate a write TLB entry immediately
Background: s390x implements Low-Address Protection (LAP). If LAP is
enabled, writing to effective addresses (before any translation)
0-511 and 4096-4607 triggers a protection exception.
So we have subpage protection on the first two pages of every address
space (where the lowcore, the CPU-private data, resides).
By immediately invalidating the write entry but allowing the caller to
continue, we force every write access onto these first two pages into
the slow path. We will get a TLB fault with the specific accessed
addresses and can then evaluate if protection applies or not.
We have to make sure to ignore the invalid bit if tlb_fill() succeeds.
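The two protected ranges named above (0-511 and 4096-4607) can be
checked with a small predicate once the slow path hands over the exact
access address. A minimal sketch; the helper names is_low_address() and
lap_write_faults() are assumptions for illustration and need not match
the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr lies in one of the two ranges
 * covered by low-address protection, 0-511 or 4096-4607. */
bool is_low_address(uint64_t addr)
{
    return addr <= 511 || (addr >= 4096 && addr <= 4607);
}

/* Hypothetical slow-path check: only writes reach this point, and a
 * write faults only if the guest enabled LAP and the effective
 * (untranslated) address is a low address. */
bool lap_write_faults(bool lap_enabled, uint64_t addr)
{
    return lap_enabled && is_low_address(addr);
}
```

Because the check needs the byte-exact address, it only works if the
unmasked access address is passed down rather than the page address.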
Eric Blake [Wed, 18 Oct 2017 14:20:27 +0000 (16:20 +0200)]
libqtest: Add qtest_[v]startf()
We have several callers that were formatting the argument strings
themselves; consolidate this effort by adding new convenience
functions directly in libqtest, and update some call-sites that
can benefit from it.
Note that the new functions qtest_startf() and qtest_vstartf()
behave more like qtest_init() (the caller must assign global_qtest
after the fact, rather than getting it implicitly set). This helps
us prepare for future patches that get rid of the global variable,
by explicitly highlighting which tests still depend on it now.
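The general shape of such a pair of convenience functions is a
printf-style wrapper that forwards to a va_list variant, which formats
the arguments and hands the finished string to the existing init
routine. A minimal sketch of that pattern only; DemoState, demo_init(),
demo_vstartf(), demo_startf() and demo_quit() are hypothetical stand-ins,
not the libqtest API:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a test-session handle. */
typedef struct DemoState {
    char *args;   /* formatted argument string, owned by the handle */
} DemoState;

/* Hypothetical stand-in for an init function taking a finished string. */
DemoState *demo_init(const char *args)
{
    DemoState *s = malloc(sizeof(*s));
    size_t n = strlen(args) + 1;

    assert(s);
    s->args = malloc(n);
    assert(s->args);
    memcpy(s->args, args, n);
    return s;
}

/* va_list variant: measure, format, then call the plain init function. */
DemoState *demo_vstartf(const char *fmt, va_list ap)
{
    va_list ap2;
    int len;
    char *buf;
    DemoState *s;

    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap2);   /* length without the NUL */
    va_end(ap2);
    assert(len >= 0);

    buf = malloc((size_t)len + 1);
    assert(buf);
    vsnprintf(buf, (size_t)len + 1, fmt, ap);
    s = demo_init(buf);
    free(buf);
    return s;
}

/* Variadic front end; the caller keeps the returned handle explicitly. */
DemoState *demo_startf(const char *fmt, ...)
{
    va_list ap;
    DemoState *s;

    va_start(ap, fmt);
    s = demo_vstartf(fmt, ap);
    va_end(ap);
    return s;
}

void demo_quit(DemoState *s)
{
    free(s->args);
    free(s);
}
```

Returning the handle instead of setting a global is what makes the
remaining users of the global variable visible at the call site.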
Halil Pasic [Tue, 17 Oct 2017 14:04:53 +0000 (16:04 +0200)]
s390x: refactor error handling for MSCH handler
Simplify the error handling of the MSCH. Let the code detecting the
condition tell (in a less ambiguous way) how it's to be handled. No
changes in behavior.
Halil Pasic [Tue, 17 Oct 2017 14:04:52 +0000 (16:04 +0200)]
s390x: refactor error handling for HSCH handler
Simplify the error handling of the HSCH. Let the code detecting the
condition tell (in a less ambiguous way) how it's to be handled. No
changes in behavior.
Halil Pasic [Tue, 17 Oct 2017 14:04:51 +0000 (16:04 +0200)]
s390x: refactor error handling for CSCH handler
Simplify the error handling of the CSCH. Let the code detecting the
condition tell (in a less ambiguous way) how it's to be handled. No
changes in behavior.
Halil Pasic [Tue, 17 Oct 2017 14:04:50 +0000 (16:04 +0200)]
s390x: refactor error handling for XSCH handler
Simplify the error handling of the XSCH. Let the code detecting the
condition tell (in a less ambiguous way) how it's to be handled. No
changes in behavior.
Halil Pasic [Tue, 17 Oct 2017 14:04:49 +0000 (16:04 +0200)]
s390x: improve error handling for SSCH and RSCH
Simplify the error handling of the SSCH and RSCH handler avoiding
arbitrary and cryptic error codes being used to tell how the instruction
is supposed to end. Let the code detecting the condition tell how it's
to be handled in a less ambiguous way. It's best to handle SSCH and RSCH
in one go as the emulation of the two shares a lot of code.
For passthrough this change isn't pure refactoring, but changes the way
kernel reported EFAULT is handled. After clarifying the kernel interface
we decided that EFAULT shall be mapped to unit exception. Same goes for
unexpected error codes and absence of required ORB flags.
Halil Pasic [Tue, 17 Oct 2017 14:04:48 +0000 (16:04 +0200)]
s390x/css: IO instr handler ending control
CSS code needs to tell the IO instruction handlers located in ioinst.c
how the emulated instruction should be ended. Currently this is done by
returning generic (POSIX) error codes, and mapping them to outcomes like
condition codes. This makes bugs easy to create and hard to recognize.
As a preparation for moving away from (mis)using generic error codes for
flow control let us introduce a type which tells the instruction
handler function how to end the instruction, in a more straight-forward
and less ambiguous way.
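The difference between the two styles can be sketched as follows. The
type and function names are hypothetical, and the errno-to-condition-code
table is only an illustration of the old approach, not a copy of the
QEMU code:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical ending type: each value says directly which condition
 * code the instruction ends with. */
typedef enum DemoIOInstEnding {
    DEMO_CC_EXPECTED        = 0,
    DEMO_CC_STATUS_PRESENT  = 1,
    DEMO_CC_BUSY            = 2,
    DEMO_CC_NOT_OPERATIONAL = 3,
} DemoIOInstEnding;

/* Old style (illustrative): the handler maps generic error codes to
 * condition codes after the fact, so any unexpected errno silently
 * falls into the default case. */
int demo_cc_from_errno(int ret)
{
    switch (ret) {
    case 0:       return 0;
    case -EBUSY:  return 2;
    case -ENODEV: return 3;
    default:      return 1;
    }
}

/* New style: the code that detects the condition returns the ending
 * itself, and the handler only has to apply it. */
int demo_cc_from_ending(DemoIOInstEnding ending)
{
    assert(ending <= DEMO_CC_NOT_OPERATIONAL);
    return (int)ending;
}
```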
Igor Mammedov [Tue, 17 Oct 2017 13:41:19 +0000 (15:41 +0200)]
s390x: fix cpu object referrence leak in s390x_new_cpu()
object_new() returns cpu with refcnt == 1 and after realize
refcnt == 2*. s390x_new_cpu() as an owner of the first refcnt
should have released it on exit in both cases (on error and
success) to avoid it leaking. Do so for both cases.
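The ownership rule can be shown with a toy reference-counted object. This
is a sketch under stated assumptions: DemoObj and the demo_*() functions
are hypothetical and only model the counts described above (1 after
creation, 2 after a successful realize), they are not the QOM API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy reference-counted object. */
typedef struct DemoObj {
    int refcnt;
    bool realized;
} DemoObj;

DemoObj *demo_new(void)
{
    DemoObj *o = calloc(1, sizeof(*o));

    assert(o);
    o->refcnt = 1;              /* reference owned by the creator */
    return o;
}

/* Drops one reference, frees at zero, returns the remaining count. */
int demo_unref(DemoObj *o)
{
    int left = --o->refcnt;

    if (left == 0) {
        free(o);
    }
    return left;
}

/* On success the (modelled) parent takes its own reference. */
bool demo_realize(DemoObj *o, bool fail)
{
    if (fail) {
        return false;
    }
    o->realized = true;
    o->refcnt++;
    return true;
}

/* The creator drops its reference on both paths: on failure this frees
 * the object, on success the parent's reference keeps it alive. */
DemoObj *demo_new_cpu(bool fail)
{
    DemoObj *o = demo_new();

    if (!demo_realize(o, fail)) {
        demo_unref(o);
        return NULL;
    }
    demo_unref(o);
    return o;
}
```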
Cornelia Huck [Wed, 11 Oct 2017 13:39:53 +0000 (09:39 -0400)]
s390x/event-facility: variable-length event masks
The architecture supports masks of variable length for sclp write
event mask. We currently only support 4 byte event masks, as that
is what Linux uses.
Let's extend this to the maximum mask length supported by the
architecture and return 0 to the guest for the mask bits we don't
support in core.
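One way to picture "return 0 for the mask bits we don't support" is a
copy that honours the guest-chosen mask length and zero-fills everything
beyond the bytes the core knows about. A minimal sketch; the function
name, the 4-byte constant and the byte-array layout are assumptions for
illustration, not the actual SCLP structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SUPPORTED_LEN 4   /* mask bytes the core knows about */

/* Fill a guest-visible mask of guest_len bytes: supported bytes are
 * copied, any bytes beyond DEMO_SUPPORTED_LEN read as 0. */
void demo_write_mask(uint8_t *dst, size_t guest_len,
                     const uint8_t supported[DEMO_SUPPORTED_LEN])
{
    size_t n = guest_len < DEMO_SUPPORTED_LEN ? guest_len
                                              : DEMO_SUPPORTED_LEN;

    assert(dst && supported);
    memset(dst, 0, guest_len);
    memcpy(dst, supported, n);
}
```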
target/s390x: special handling when starting a CPU with WAIT PSW
When we try to start a CPU with a WAIT PSW, we have to take care that
TCG will actually try to continue executing instructions.
We must therefore really only unhalt the CPU if we don't have a WAIT
PSW. Also document the special order for restart interrupts, which
load a new PSW and change the state to operating.
To keep KVM working, simply don't look at the WAIT bit when
loading the PSW. Otherwise the behavior of a restart interrupt when
a CPU is stopped would be changed.
This effectively enables experimental SMP support. Floating interrupts are
still a mess, so allow it but print a big warning. There also seems
to be a problem with CPU hotplug (after the main loop started).
s390x/tcg: implement STOP and RESET interrupts for TCG
Implement them like KVM implements/handles them. Both can only be
triggered via SIGP instructions. RESET has (almost) the lowest priority if
the CPU is running, and the highest if the CPU is STOPPED. This is handled
in SIGP code already. On delivery, we only have to care about the
"CPU running" scenario.
STOP is defined to be delivered after all other interrupts have been
delivered. Therefore it has the actual lowest priority.
As both can wake up a CPU if sleeping, indicate them correctly to
external code (e.g. cpu_has_work()).
We want to use the same code base for TCG, so let's cleanly factor it
out.
The sigp mutex is currently not really needed, as everything is
protected by the iothread mutex. But this could change later, so leave
it in place and initialize it properly from common code.
target/s390x: interpret PSW_MASK_WAIT only for TCG
KVM handles the wait PSW itself and triggers a WAIT ICPT in case it
really wants to sleep (disabled wait).
This will later allow us to change the order of loading a restart
interrupt and setting a CPU to OPERATING on SIGP RESTART without
changing KVM behavior.
s390x/tcg: handle WAIT PSWs during interrupt injection
If we encounter a WAIT PSW, we have to halt immediately. Using
cpu_loop_exit() at this point feels wrong. Simply leaving
cs->exception_index set doesn't result in an immediate stop.
This is also necessary to properly handle SIGP STOP interrupts later.
The CPU_INTERRUPT_HALT will be processed immediately and properly set
the CPU to halted (also resetting cs->exception_index to EXCP_HLT)
s390x/tcg: a CPU cannot switch state due to an interrupt
Going to OPERATING here looks wrong. A CPU should never even be
!OPERATING at this point. Unhalting will already be done in
cpu_handle_halt() if there is work, so we can drop this statement
completely.
s390x/tcg: take care of external interrupt subclasses
We can now let go of INTERRUPT_EXT. When cr0 changes, we have to
revalidate if we now have a pending external interrupt, just like
when the PSW (or SYSTEM MASK only) changes.
s390x/tcg: rework checking for deliverable interrupts
Currently, enabling/disabling of interrupts is not really supported.
Let's improve interrupt handling code by explicitly checking for
deliverable interrupts only. This is the first step. Checking for
external interrupt subclasses will be done next.
There are still some leftovers from old virtio interrupts in there.
Most importantly, we don't have to queue service interrupts anymore.
Just like KVM, we can simply multiplex the SCLP service interrupts and
avoid the queue.
Also, now only valid parameters/cpu_addr will be stored on service
interrupts.
External interrupts are currently all handled like floating external
interrupts, they are queued. Let's prepare for a split of floating
and local interrupts by turning INTERRUPT_EXT into a mask.
While we can have various floating external interrupts of one kind, there
is usually only one (or a fixed number) of the local external interrupts.
So turn INTERRUPT_EXT into a mask and properly indicate the kind of
external interrupt. Floating interrupts will have to be moved out of
one CPU instance later once we have SMP support.
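Turning a single "external interrupt pending" flag into a mask means one
bit per kind of external interrupt. A minimal sketch of the idea; the bit
assignments, DemoCPU and the demo_*() helpers are hypothetical and do not
reflect the real definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, one bit per kind of interrupt. */
#define DEMO_EXT_SERVICE     (1u << 0)   /* floating */
#define DEMO_EXT_CPU_TIMER   (1u << 1)   /* local */
#define DEMO_EXT_CLOCK_COMP  (1u << 2)   /* local */
#define DEMO_EXT_ALL \
    (DEMO_EXT_SERVICE | DEMO_EXT_CPU_TIMER | DEMO_EXT_CLOCK_COMP)

typedef struct DemoCPU {
    uint32_t pending;   /* mask of pending external interrupts */
} DemoCPU;

/* Raising a local interrupt twice still leaves exactly one pending
 * bit, which matches "usually only one" per kind. */
void demo_raise(DemoCPU *cpu, uint32_t kind)
{
    assert((kind & ~DEMO_EXT_ALL) == 0);
    cpu->pending |= kind;
}

void demo_clear(DemoCPU *cpu, uint32_t kind)
{
    cpu->pending &= ~kind;
}

bool demo_ext_pending(const DemoCPU *cpu)
{
    return (cpu->pending & DEMO_EXT_ALL) != 0;
}
```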
The only floating external interrupts used right now are SERVICE
interrupts, so let's use that name. Following patches will clean up
SERVICE interrupt injection.
This gets rid of the ugly special handling for cpu timer and clock
comparator interrupts. And we really only store the parameters as
defined by the PoP.
Halil Pasic [Wed, 4 Oct 2017 15:41:37 +0000 (17:41 +0200)]
s390x/css: be more consistent if broken beyond repair
Calling do_subchannel_work with no function control flags set in SCSW is
a programming error. Currently we handle this differently in
do_subchannel_work_virtual and do_subchannel_work_passthrough. Let's be
consistent and guard with a common assert against this programming error.
Peter Maydell [Fri, 20 Oct 2017 09:49:55 +0000 (10:49 +0100)]
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2017-10-19-1' into staging
Merge tpm 2017/10/19 v1
# gpg: Signature made Thu 19 Oct 2017 16:42:39 BST
# gpg: using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2017-10-19-1: (21 commits)
tpm: move recv_data_callback to TPM interface
tpm: add a QOM TPM interface
tpm-tis: fold TPMTISEmuState in TPMState
tpm-tis: remove tpm_tis.h header
tpm-tis: move TPMState to TIS header
tpm: remove locty_data from TPMState
tpm-emulator: fix error handling
tpm: add TPMBackendCmd to hold the request state
tpm: remove locty argument from receive_cb
tpm: remove needless cast
tpm: remove unused TPMBackendCmd
tpm: remove configure_tpm() hop
tpm: remove init() class method
tpm: remove TPMDriverOps
tpm: move TPMSizedBuffer to tpm_tis.h
tpm: remove tpm_register_driver()
tpm: replace tpm_get_backend_driver() to drop be_drivers
tpm: lookup tpm backend class in tpm_driver_find_by_type()
tpm: make tpm_get_backend_driver() static
tpm-tis: remove RAISE_STS_IRQ
...
Fam Zheng [Wed, 18 Oct 2017 07:38:41 +0000 (15:38 +0800)]
docker: Fix PATH for ccache
Before bcd7f06f57fb6f780a3e2f7a46c22b6f6c8238aa we sourced /etc/profile
so the PATH included the right paths to ccache binaries. Now we need to
update $PATH explicitly from the run script.
Keep the old /usr/lib around just so that in the future, ccache from 32
bit images will just work.
Paolo Bonzini [Wed, 18 Oct 2017 13:06:29 +0000 (15:06 +0200)]
docker: allow running from srcdir != builddir build
The new script uses "git submodule", which is picky about being invoked
from the top of the git checkout. Invoke the script from $(SRC_PATH)
to avoid git's wrath.
Peter Xu [Tue, 17 Oct 2017 07:12:46 +0000 (15:12 +0800)]
docker: cleanup temp directory after test
There are temp directories named "docker-src.*" after doing docker
tests. I don't see much point in keeping that (it only contains the
qemu.tar which is exactly current tree, and the copied "run" file).
Let's remove it after the test has finished.
Fam Zheng [Fri, 13 Oct 2017 01:19:54 +0000 (09:19 +0800)]
docker: Don't allocate tty unless DEBUG=1
The existence of tty in the container seems to urge gcc into colorizing
the errors, but the escape chars will clutter the report once turned
into email replies on patchew. Move -t to debug mode.
Peter Maydell [Thu, 19 Oct 2017 17:42:51 +0000 (18:42 +0100)]
Merge remote-tracking branch 'remotes/mcayland/tags/qemu-sparc-signed' into staging
qemu-sparc update
# gpg: Signature made Thu 19 Oct 2017 07:50:16 BST
# gpg: using RSA key 0x5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <[email protected]>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C C9C4 5BC2 C56F AE0F 321F
* remotes/mcayland/tags/qemu-sparc-signed:
sun4u: fix assert when adding NICs which aren't the in-built model
sun4u: update PCI topology to include simba PCI bridges
build: automatically handle GIT submodule checkout for dtc
On my system, I see the following with a fresh clone:
% ./configure --disable-gtk --target-list=aarch64-softmmu
% make -j8
GEN aarch64-softmmu/config-devices.mak.tmp
GEN config-host.h
mkdir -p dtc/libfdt
GIT ui/keycodemapdb dtc
mkdir -p dtc/tests
GEN qemu-options.def
[snip]
GEN migration/trace.h
make: *** [git-submodule-update] Error 1
make: *** Waiting for unfinished jobs....
Upon closer inspection, the root cause of the error is:
% git submodule update --init ui/keycodemapdb dtc
fatal: destination path 'dtc' already exists and is not an empty directory.
Clone of 'git://git.qemu-project.org/dtc.git' into submodule path 'dtc' failed
This patch fixes this race condition by forcing the 'dtc/%' rule which caused
'dtc' to be non-empty to wait on '.git-submodule-status'.
The previous patch cleaned up error handling a bit, and exposed an
existing bug: error_report_err() could be called with a NULL error.
Instead, make tpm_emulator_set_locality() set the error.
* remotes/bonzini/tags/for-upstream: (29 commits)
scsi: reject configurations with logical block size > physical block size
qdev: defer DEVICE_DEL event until instance_finalize()
Revert "qdev: Free QemuOpts when the QOM path goes away"
qdev: store DeviceState's canonical path to use when unparenting
qemu-pr-helper: use new libmultipath API
watch_mem_write: implement 8-byte accesses
notdirty_mem_write: implement 8-byte accesses
memory: reuse section_from_flat_range()
kvm: simplify kvm_align_section()
kvm: region_add and region_del is not called on updates
kvm: fix error message when failing to unregister slot
kvm: tolerate non-existing slot for log_start/log_stop/log_sync
kvm: fix alignment of ram address
memory: call log_start after region_add
target/i386: trap on instructions longer than >15 bytes
target/i386: introduce x86_ld*_code
tco: add trace events
docs/devel/loads-stores.rst: Document our various load and store APIs
nios2: define tcg_env
build: remove CONFIG_LIBDECNUMBER
...
* remotes/kraxel/tags/opengl-20171017-pull-request:
egl-headless: add dmabuf support
egl-helpers: add egl_texture_blit and egl_texture_blend
egl-helpers: add dmabuf import support
opengl: add flipping vertex shader
opengl: move shader init from console-gl.c to shader.c
console: add support for dmabufs
Gerd Hoffmann [Thu, 19 Oct 2017 07:46:29 +0000 (09:46 +0200)]
seabios: update to 1.11 prerelease
This is the seabios update for qemu 2.11. Well, almost, seabios is in
freeze for the upcoming 1.11 release. This updates seabios to current
git master snapshot, and it will be updated again to 1.11 final before
the 2.11 release.
With this two-step update, seabios gets somewhat wider testing before
the actual release, and the update to 1.11 final (which will most likely
happen after qemu freeze) should have bugfix patches only.
git shortlog
============
Aleksandr Bezzubikov (3):
pci: refactor pci_find_capapibilty to get bdf as the first argument instead of the whole pci_device
pci: add QEMU-specific PCI capability structure
pci: enable RedHat PCI bridges to reserve additional resources on PCI init
Ben Warren (5):
QEMU DMA: Add DMA write capability
romfile-loader: Switch to using named structs
QEMU fw_cfg: Add command to write back address of file
QEMU fw_cfg: Add functions for accessing files by key
QEMU fw_cfg: Write fw_cfg back on S3 resume
Daniel Verkamp (5):
nvme: support NVMe 1.0 controllers
nvme: extend command timeout to 5 seconds
nvme: fix reversed loop condition in cmd_readwrite
nvme: fix extraction of status code bits
nvme: fix copy-paste mistake in comment
Filippo Sironi (1):
nvme: Use the Maximum Queue Entries Supported (MQES) to initialize I/O queues
Gerd Hoffmann (7):
usb: add hub portmap
usb-xhci: use hub portmap
std: add cp437 to unicode map
kbd: make enqueue_key public, add ascii_to_keycode
romfile: add support for constant files.
paravirt: serial console configuration.
add serial console support
Igor Mammedov (1):
drop "etc/boot-cpus" fw_cfg file and reuse legacy QEMU_CFG_NB_CPUS
Jason Wang (1):
virtio: IOMMU support
Julian Stecklina (2):
block: add NVMe boot support
nvme: fix out of memory behavior
Julius Werner (1):
coreboot: Adapt to upstream CBMEM console changes
Kevin O'Connor (26):
usb: Make usb_time_sigatt variable static
tpm: Add comment banners to tcg.c separating major parts of spec
tpm: Don't call tpm_set_failure() from tpm12_get_capability()
tpm: Move code around in tcgbios.c to keep like code together
acpi: Generalize find_fadt() and find_tcpa_by_rsdp() into find_acpi_table()
tpm: Don't call tpm_build_and_send_cmd() from tpm20_stirrandom()
tpm: Rework tpm_build_and_send_cmd() into tpm_simple_cmd()
ps2port: Disable keyboard/mouse prior to resetting ps2 controller
docs: Note release dates for 1.10.1 and 1.10.2
resume: Don't attempt to use generic reboot mechanisms on QEMU
boot: Increase description size in boot menu
src: Minor - remove tab characters that slipped into SeaBIOS C code
NVMe: Allow NVMe to be enabled on real hardware
smm: Backup and restore A20 on an SMI based mode switch
stacks: Make sure to initialize Call16Data
stacks: Don't update the A20 settings if they haven't changed
stacks: There is no need to disable NMI if it is already disabled
vga: Fix bug in stdvga_get_linesize()
docs: Fix typos in Memory_Model.md
tcgbios: Fix use of unitialized variable
boot: Rename drive_g to drive
disk: Don't require the 'struct drive_s' to be in the f-segment
block: Rename disk_op_s->drive_gf to drive_fl
virtio: Allocate drive_s storage in low memory
xhci: Build TRBs directly in xhci_trb_queue()
xhci: Verify the device is still present in xhci_cmd_submit()
Ladi Prosek (1):
ahci: Set upper 32-bit registers to zero
Patrick Rudolph (4):
SeaVGABios/cbvga: Advertise correct pixel format
SeaVGABIOS/vbe: Query driver for scanline pitch v2
SeaVGABios/cbvga: Use active mode to clear screen
SeaVGABios/cbvga: Advertise compatible VESA modes
Paul Menzel (1):
vgasrc: Increase debug level
Petr Berky (1):
config: Add function to check if fw_cfg exists
Ricardo Ribalda Delgado (1):
serialio: Support for mmap serial ports
Roman Kagan (11):
blockcmd: accept only disks and CD-ROMs
blockcmd: generic SCSI luns enumeration
virtio-scsi: enumerate luns with REPORT LUNS
esp-scsi: enumerate luns with REPORT LUNS
usb-uas: enumerate luns with REPORT LUNS
pvscsi: fix the comment about lun enumeration
mpt-scsi: try to enumerate luns with REPORT LUNS
lsi-scsi: reset in case of a serious problem
lsi-scsi: try to enumerate luns with REPORT LUNS
blockcmd: start REPORT_LUNS with the smallest buffer
Revert "lsi-scsi: reset in case of a serious problem"
Stefan Berger (1):
tpm: Log TPM 2 digest structure in little endian format
Youness Alaoui (1):
nvme: Enable NVMe support for non-qemu hardware
Zeh, Werner (1):
ahci: Disable Native Command Queueing
Mark Cave-Ayland [Sun, 15 Oct 2017 09:05:59 +0000 (10:05 +0100)]
sun4u: fix assert when adding NICs which aren't the in-built model
Commit 8d93297 introduced a bug whereby non-inbuilt NICs are realized before
setting the default MAC address causing an assert. Switch NIC creation
over from pci_create_simple() to pci_create() which works exactly the
same except omitting the realize as originally intended.
Mark Cave-Ayland [Sun, 11 Jun 2017 09:12:08 +0000 (10:12 +0100)]
sun4u: update PCI topology to include simba PCI bridges
This patch updates the sun4u model to be much closer to a real Ultra 5
by moving devices behind the 2 simba PCI bridges (A and B) as found on real
hardware.
The most noticeable change introduced by this patchset is that in-built devices
are no longer attached to the PCI root bus, but instead behind PCI bridge A.
Along with this the interrupt routing is updated accordingly to match the
official documentation.
Since the existing code currently bypasses the PCI bridge interrupt
swizzling, the interrupt mapping functions are reorganised so that
pci_pbm_map_irq() is used by the PCI bridges and pci_apb_map_irq() is
used by the PCI host bridge.
Behind the sabre PCI host bridge, the PCI IO space now needs to be
split into two separate halves at 0x8000000. Therefore we also set up a new
PCI IO space region of increased size on the PCI host bridge and enable
32-bit PCI IO accesses to allow IO accesses to reach devices behind PCI
bridge B correctly.
As part of this change we also combine the onboard sunhme NIC and the ebus
into a single multi-function device as done on a real Ultra 5. For other
NICs the existing behaviour is preserved, i.e. we initialise them and
place them into the next free slot on PCI bus B.
Finally we mark the physically unavailable slots (plus slot 0 in busA) as
reserved to ensure that users can't plug devices into non-existent slots
which will break interrupt routing.
Note: since this commit changes PCI topology and interrupt routing, an
updated openbios-sparc64 binary is included with this commit containing the
associated changes to maintain bisectability.
Logical block size of a SCSI disk should never be larger than
physical block size. From an ATA/SCSI perspective, it makes no sense
to have the logical block size greater than the physical block size,
and it cannot even be effectively expressed in the command set. The
whole point of adding the physical block size to the ATA/SCSI command
set was to communicate a desire for a larger block size (than logical),
while maintaining backwards compatibility with legacy 512 byte block
size.
When setting logical_block_size > physical_block_size, QEMU cannot express
it in READ CAPACITY(16) output, and all it can do is set the physical
block exponent to 0 (i.e. logical_block_size == physical_block_size).
Reporting the error properly, however, is better.
Michael Roth [Mon, 16 Oct 2017 22:23:15 +0000 (17:23 -0500)]
qdev: defer DEVICE_DEL event until instance_finalize()
DEVICE_DEL is currently emitted when a Device is unparented, as
opposed to when it is finalized. The main design motivation for this
seems to be that after unparent()/unrealize(), the Device is no
longer visible to the guest, and thus the operation is complete
from the perspective of management.
However, there are cases where remaining host-side cleanup is also
pertinent to management. This is generally handled by treating these
resources as aspects of the "backend", which can be managed via
separate interfaces/events, such as blockdev_add/del, netdev_add/del,
object_add/del, etc, but some devices, namely vfio-pci, do not have
this level of compartmentalization, and possibly do not lend themselves
well to it.
In the case of vfio-pci, the "backend" cleanup happens as part of
the finalization of the vfio-pci device itself, in particular the
cleanup of the VFIO group FD. Failing to wait for this cleanup can
result in tools like libvirt attempting to rebind the device to
the host while it's still being used by VFIO, which can result in
host crashes or other misbehavior depending on the host driver.
Deferring DEVICE_DEL still affords us the ability to manage backends
explicitly, while also addressing cases like vfio-pci's, so we
implement that approach here.
An alternative proposal involving having VFIO emit a separate event
to denote completion of host-side cleanup was discussed, but the
prevailing opinion seems to be that it is not worth the added
complexity, and leaves the issue open for other Device implementations
to solve in the future.
This patch originally addressed an issue where a DEVICE_DELETED
event could be emitted (in device_unparent()) before a Device's
QemuOpts were cleaned up (in device_finalize()), leading to a
"duplicate ID" error if management attempted to immediately add
a device with the same ID in response to the DEVICE_DELETED event.
An alternative will be implemented in a subsequent patch where we
defer the DEVICE_DELETED event until device_finalize(), which would
also prevent the race, so we revert the original fix in preparation.
Michael Roth [Mon, 16 Oct 2017 22:23:13 +0000 (17:23 -0500)]
qdev: store DeviceState's canonical path to use when unparenting
device_unparent(dev, ...) is called when a device is unparented,
either directly, or as a result of a parent device being
finalized, and handles some final cleanup for the device. Part
of this includes emitting a DEVICE_DELETED QMP event to notify
management, which includes the device's path in the composition
tree as provided by object_get_canonical_path().
object_get_canonical_path() assumes the device is still connected
to the machine/root container, and will assert otherwise, but
in some situations this isn't the case:
If the parent is finalized as a result of object_unparent(), it
will still be attached to the composition tree at the time any
children are unparented as a result of that same call to
object_unparent(). However, in some cases, object_unparent()
will complete without finalizing the parent device, due to
lingering references that won't be released till some time later.
One such example is if the parent has MemoryRegion children (which
take a ref on their parent), which in turn have AddressSpaces (which
take a ref on their regions), since those AddressSpaces get cleaned
up asynchronously by the RCU thread.
In this case qdev:device_unparent() may be called for a child Device
that no longer has a path to the root/machine container, causing
object_get_canonical_path() to assert.
Fix this by storing the canonical path during realize() so the
information will still be available for device_unparent() in such
cases.
Paolo Bonzini [Tue, 17 Oct 2017 18:11:58 +0000 (20:11 +0200)]
qemu-pr-helper: use new libmultipath API
libmultipath has recently changed its API. The new API supports multi-threaded
clients better. Unfortunately there is no backwards-compatibility, so we just
switch to the new one. Running QEMU compiled with the new library on the old
library will likely crash, while doing the opposite will cause QEMU not to
start at all (because udev, get_multipath_config and put_multipath_config
are undefined).
Paolo Bonzini [Tue, 17 Oct 2017 12:16:05 +0000 (14:16 +0200)]
watch_mem_write: implement 8-byte accesses
Aligned 8-byte memory writes by a 64-bit target on a 64-bit host should
always turn into atomic 8-byte writes on the host, however a write
watchpoint would end up tearing the 8-byte write into two 4-byte
writes in access_with_adjusted_size().
Andrew Baumann [Fri, 13 Oct 2017 18:19:13 +0000 (11:19 -0700)]
notdirty_mem_write: implement 8-byte accesses
Aligned 8-byte memory writes by a 64-bit target on a 64-bit host should
always turn into atomic 8-byte writes on the host, however if we missed
in the softmmu, and the TLB line was marked as not dirty, then we
would end up tearing the 8-byte write into two 4-byte writes in
access_with_adjusted_size().