Kevin Wolf [Wed, 7 Feb 2018 16:58:47 +0000 (17:58 +0100)]
block: Fail bdrv_truncate() with negative size
Most callers have their own checks, but something like this should also
be checked centrally. As it happens, x-blockdev-create can pass negative
image sizes to format drivers (because there is no QAPI type that would
reject negative numbers) and triggers the check added by this patch.
Kevin Wolf [Wed, 7 Feb 2018 15:38:05 +0000 (16:38 +0100)]
file-posix: Fix no-op bdrv_truncate() with falloc preallocation
If bdrv_truncate() is called, but the requested size is the same as
before, don't call posix_fallocate(), which returns -EINVAL for length
zero and would therefore make bdrv_truncate() fail.
The problem can be triggered by creating a zero-sized raw image with
'falloc' preallocation mode.
Kevin Wolf [Fri, 2 Feb 2018 15:12:18 +0000 (16:12 +0100)]
ssh: Use QAPI BlockdevOptionsSsh object
Create a BlockdevOptionsSsh object in connect_to_ssh() and take the
options from there. 'host_key_check' is still processed separately
because it's not in the schema yet.
Kevin Wolf [Thu, 1 Feb 2018 16:26:27 +0000 (17:26 +0100)]
sheepdog: QAPIfy "redundancy" create option
The "redundancy" option for Sheepdog image creation is currently a
string that can encode one or two integers depending on its format,
which at the same time implicitly selects a mode.
This patch turns it into a QAPI union and converts the string into such
a QAPI object before interpreting the values.
Kevin Wolf [Wed, 31 Jan 2018 18:47:48 +0000 (19:47 +0100)]
nfs: Use QAPI options in nfs_client_open()
Using the QAPI visitor to turn all options into QAPI BlockdevOptionsNfs
simplifies the code a lot. It will also be useful for implementing the
QAPI based .bdrv_co_create callback.
Kevin Wolf [Fri, 16 Feb 2018 17:48:25 +0000 (18:48 +0100)]
rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()
This is almost exactly the same code. The differences are that
qemu_rbd_connect() supports BlockdevOptionsRbd.server and that the cache
mode is set explicitly.
Supporting 'server' is a welcome new feature for image creation.
Caching is disabled by default, so leave it that way.
Kevin Wolf [Fri, 16 Feb 2018 17:54:52 +0000 (18:54 +0100)]
rbd: Assign s->snap/image_name in qemu_rbd_open()
Now that the options are already available in qemu_rbd_open() and not
only parsed in qemu_rbd_connect(), we can assign s->snap and
s->image_name there instead of passing the fields by reference to
qemu_rbd_connect().
Kevin Wolf [Thu, 15 Feb 2018 19:31:04 +0000 (20:31 +0100)]
rbd: Remove non-schema options from runtime_opts
Instead of the QemuOpts in qemu_rbd_connect(), we want to use QAPI
objects. As a preparation, fetch those options directly from the QDict
that .bdrv_open() supports in the rbd driver and that are not in the
schema.
Kevin Wolf [Thu, 15 Feb 2018 18:13:47 +0000 (19:13 +0100)]
rbd: Factor out qemu_rbd_connect()
The code to establish an RBD connection is duplicated between open and
create. In order to be able to share the code, factor out the code from
qemu_rbd_open() as a first step.
Kevin Wolf [Tue, 9 Jan 2018 15:50:57 +0000 (16:50 +0100)]
block: x-blockdev-create QMP command
This adds a synchronous x-blockdev-create QMP command that can create
qcow2 images on a given node name.
We don't want to block while creating an image, so this is not the final
interface in all aspects, but BlockdevCreateOptionsQcow2 and
.bdrv_co_create() are what they actually might look like in the end. In
any case, this should be good enough to test whether we interpret
BlockdevCreateOptions as we should.
Kevin Wolf [Thu, 11 Jan 2018 15:18:08 +0000 (16:18 +0100)]
qcow2: Use visitor for options in qcow2_create()
Instead of manually creating the BlockdevCreateOptions object, use a
visitor to parse the given options into the QAPI object.
This involves translation from the old command line syntax to the syntax
mandated by the QAPI schema. Option names are still checked against
qcow2_create_opts, so only the old option names are allowed on the
command line, even if they are translated in qcow2_create().
In contrast, new option values are optionally recognised besides the old
values: 'compat' accepts 'v2'/'v3' as an alias for '0.10'/'1.1', and
'encrypt.format' accepts 'qcow' as an alias for 'aes' now.
Kevin Wolf [Thu, 11 Jan 2018 16:37:23 +0000 (17:37 +0100)]
util: Add qemu_opts_to_qdict_filtered()
This allows, given a QemuOpts for a QemuOptsList that was merged from
multiple QemuOptsList, to only consider those options that exist in one
specific list. Block drivers need this to separate format-layer create
options from protocol-level options.
Kevin Wolf [Thu, 11 Jan 2018 13:38:34 +0000 (14:38 +0100)]
qcow2: Handle full/falloc preallocation in qcow2_co_create()
Once qcow2_co_create() can be called directly on an already existing
node, we must provide the 'full' and 'falloc' preallocation modes
outside of creating the image on the protocol layer. Fortunately, we
have preallocated truncate now which can provide this functionality.
Kevin Wolf [Tue, 9 Jan 2018 16:35:26 +0000 (17:35 +0100)]
qcow2: Let qcow2_create() handle protocol layer
Currently, qcow2_create() only parses the QemuOpts and then calls
qcow2_co_create() for the actual image creation, which includes both the
creation of the actual file on the file system and writing a valid empty
qcow2 image into that file.
The plan is that qcow2_co_create() becomes the function that implements
the functionality for a future 'blockdev-create' QMP command, which only
creates the qcow2 layer on an already opened file node.
This is a first step towards that goal: Let's move out anything that
deals with the protocol layer from qcow2_co_create() into
qcow2_create(). This means that qcow2_co_create() doesn't need a file
name any more.
Kevin Wolf [Thu, 1 Mar 2018 16:45:23 +0000 (17:45 +0100)]
qcow2: Rename qcow2_co_create2() to qcow2_co_create()
The functions originally known as qcow2_create() and qcow2_create2()
are now called qcow2_co_create_opts() and qcow2_co_create(), which
matches the names of the BlockDriver callbacks that they will implement
at the end of this patch series.
Kevin Wolf [Fri, 24 Nov 2017 15:01:07 +0000 (16:01 +0100)]
block/qapi: Introduce BlockdevCreateOptions
This creates a BlockdevCreateOptions union type that will contain all of
the options for image creation. We'll start out with an empty struct
type BlockdevCreateNotSupported for all drivers.
Alberto Garcia [Tue, 6 Mar 2018 16:14:12 +0000 (18:14 +0200)]
qcow2: Make qemu-img check detect corrupted L1 tables in snapshots
'qemu-img check' cannot detect if a snapshot's L1 table is corrupted.
This patch checks the table's offset and size and reports corruption
if the values are not valid.
This patch doesn't add code to fix that corruption yet, only to detect
and report it.
Alberto Garcia [Tue, 6 Mar 2018 16:14:11 +0000 (18:14 +0200)]
qcow2: Check snapshot L1 table in qcow2_snapshot_delete()
This function deletes a snapshot from disk, removing its entry from
the snapshot table, freeing its L1 table and decreasing the refcounts
of all clusters.
The L1 table offset and size are however not validated. If we use
invalid values in this function we'll probably corrupt the image even
more, so we should return an error instead.
We now have a function to take care of this, so let's use it.
Alberto Garcia [Tue, 6 Mar 2018 16:14:08 +0000 (18:14 +0200)]
qcow2: Check L1 table parameters in qcow2_expand_zero_clusters()
This function iterates over all snapshots of a qcow2 file in order to
expand all zero clusters, but it does not validate the snapshots' L1
tables first.
We now have a function to take care of this, so let's use it.
We can also take the opportunity to replace the sector-based
bdrv_read() with bdrv_pread().
Alberto Garcia [Tue, 6 Mar 2018 16:14:06 +0000 (18:14 +0200)]
qcow2: Generalize validate_table_offset() into qcow2_validate_table()
This function checks that the offset and size of a table are valid.
While the offset checks are fine, the size check is too generic, since
it only verifies that the total size in bytes fits in a 64-bit
integer. In practice all tables used in qcow2 have much smaller size
limits, so the size needs to be checked again for each table using its
actual limit.
This patch generalizes this function by allowing the caller to specify
the maximum size for that table. In addition to that it allows passing
an Error variable.
The function is also renamed and made public since we're going to use
it in other parts of the code.
Paolo Bonzini [Thu, 1 Mar 2018 16:36:18 +0000 (17:36 +0100)]
block: convert bdrv_invalidate_cache callback to coroutine_fn
QED's bdrv_invalidate_cache implementation would like to reuse functions
that acquire/release the metadata locks. Call it from coroutine context
to simplify the logic.
Paolo Bonzini [Thu, 1 Mar 2018 16:36:17 +0000 (17:36 +0100)]
qed: make bdrv_qed_do_open a coroutine_fn
It is called from qed_invalidate_cache in coroutine context (incoming
migration runs in a coroutine), so it's cleaner if metadata is always
loaded from a coroutine.
Paolo Bonzini [Thu, 1 Mar 2018 16:36:16 +0000 (17:36 +0100)]
qcow2: make qcow2_do_open a coroutine_fn
It is called from qcow2_invalidate_cache in coroutine context (incoming
migration runs in a coroutine), so it's cleaner if metadata is always
loaded from a coroutine.
block: implement the bdrv_reopen_prepare helper for LUKS driver
If the bdrv_reopen_prepare helper isn't provided, the qemu-img commit
command fails to re-open the base layer after committing changes into
it. Provide a no-op implementation for the LUKS driver, since there
is not any custom work that needs doing to re-open it.
Peter Maydell [Fri, 9 Mar 2018 10:58:57 +0000 (10:58 +0000)]
Merge remote-tracking branch 'remotes/riscv/tags/riscv-qemu-upstream-v8.2' into staging
QEMU RISC-V Emulation Support (RV64GC, RV32GC)
This release renames the SiFive machines to sifive_e and sifive_u
to represent the SiFive Everywhere and SiFive Unleashed platforms.
SiFive has configurable soft-core IP, so it is intended that these
machines will be extended to enable a variety of SiFive IP blocks.
The CPU definition infrastructure has been improved and there are
now vendor CPU modules including the SiFiVe E31, E51, U34 and U54
cores. The emulation accuracy for the E series has been improved
by disabling the MMU for the E series. S mode has been disabled on
cores that only support M mode and U mode. The two Spike machines
that support two privileged ISA versions have been coalesced into
one file. This series has Signed-off-by from the core contributors.
*** Known Issues ***
* Disassembler has some checkpatch warnings for the sake of code brevity
* scripts/qemu-binfmt-conf.sh has checkpatch warnings due to line length
* PMP (Physical Memory Protection) is as-of-yet unused and needs testing
*** Changelog ***
v8.2
* Rebase
v8.1
* Fix missed case of renaming spike_v1.9 to spike_v1.9.1
v8
* Added linux-user/riscv/target_elf.h during rebase
* Make resetvec configurable and clear mpp and mie on reset
* Use SiFive E31, E51, U34 and U54 cores in SiFive machines
* Define SiFive E31, E51, U34 and U54 cores
* Refactor CPU core definition in preparation for vendor cores
* Prevent S or U mode unless S or U extensions are present
* SiFive E Series cores have no MMU
* SiFive E Series cores have U mode
* Make privileged ISA v1.10 implicit in CPU types
* Remove DRAM_BASE and EXT_IO_BASE as they vary by machine
* Correctly handle mtvec and stvec alignment with respect to RVC
* Print more machine mode state in riscv_cpu_dump_state
* Make riscv_isa_string use compact extension order method
* Fix bug introduced in v6 RISCV_CPU_TYPE_NAME macro change
* Parameterize spike v1.9.1 config string
* Coalesce spike_v1.9.1 and spike_v1.10 machines
* Rename sifive_e300 to sifive_e, and sifive_u500 to sifive_u
v7
* Make spike_v1.10 the default machine
* Rename spike_v1.9 to spike_v1.9.1 to match privileged spec version
* Remove empty target/riscv/trace-events file
* Monitor ROM 32-bit reset code needs to be target endian
* Add TARGET_TIOCGPTPEER to linux-user/riscv/termbits.h
* Add -initrd support to the virt board
* Fix naming in spike machine interface header
* Update copyright notice on RISC-V Spike machines
* Update copyright notice on RISC-V HTIF Console device
* Change CPU Core and translator to GPLv2+
* Change RISC-V Disassembler to GPLv2+
* Change SiFive Test Finisher to GPLv2+
* Change SiFive CLINT to GPLv2+
* Change SiFive PRCI to GPLv2+
* Change SiFive PLIC to GPLv2+
* Change RISC-V spike machines to GPLv2+
* Change RISC-V virt machine to GPLv2+
* Change SiFive E300 machine to GPLv2+
* Change SiFive U500 machine to GPLv2+
* Change RISC-V Hart Array to GPLv2+
* Change RISC-V HTIF device to GPLv2+
* Change SiFiveUART device to GPLv2+
v6
* Drop IEEE 754-201x minimumNumber/maximumNumber for fmin/fmax
* Remove some unnecessary commented debug statements
* Change RISCV_CPU_TYPE_NAME to use riscv-cpu suffix
* Define all CPU variants for linux-user
* qemu_log calls require trailing \n
* Replace PLIC printfs with qemu_log
* Tear out unused HTIF code and eliminate shouting debug messages
* Fix illegal instruction when sfence.vma is passed (rs2) arguments
* Make updates to PTE accessed and dirty bits atomic
* Only require atomic PTE updates on MTTCG enabled guests
* Page fault if accessed or dirty bits can't be updated
* Fix get_physical_address PTE reads and writes on riscv32
* Remove erroneous comments from the PLIC
* Default enable MTTCG
* Make WFI less conservative
* Unify local interrupt handling
* Expunge HTIF interrupts
* Always access mstatus.mip under a lock
* Don't implement rdtime/rdtimeh in system mode (bbl emulates them)
* Implement insreth/cycleh for rv32 and always enable user-mode counters
* Add GDB stub support for reading and writing CSRs
* Rename ENABLE_CHARDEV #ifdef from HTIF code
* Replace bad HTIF ELF code with load_elf symbol callback
* Convert chained if else fault handlers to switch statements
* Use RISCV exception codes for linux-user page faults
v5
* Implement NaN-boxing for flw, set high order bits to 1
* Use float_muladd_negate_* flags to floatXX_muladd
* Use IEEE 754-201x minimumNumber/maximumNumber for fmin/fmax
* Fix TARGET_NR_syscalls
* Update linux-user/riscv/syscall_nr.h
* Fix FENCE.I, needs to terminate translation block
* Adjust unusual convention for interruptno >= 0
v4
* Add @riscv: since 2.12 to CpuInfoArch
* Remove misleading little-endian comment from load_kernel
* Rename cpu-model property to cpu-type
* Drop some unnecessary inline function attributes
* Don't allow GDB to set value of x0 register
* Remove unnecessary empty property lists
* Add Test Finisher device to implement poweroff in virt machine
* Implement priv ISA v1.10 trap and sret/mret xPIE/xIE behavior
* Store fflags data in fp_status
* Purge runtime users of helper_raise_exception
* Fix validate_csr
* Tidy gen_jalr
* Tidy immediate shifts
* Add gen_exception_inst_addr_mis
* Add gen_exception_debug
* Add gen_exception_illegal
* Tidy helper_fclass_*
* Split rounding mode setting to a new function
* Enforce MSTATUS_FS via TB flags
* Implement acquire/release barrier semantics
* Use atomic operations as required
* Fix FENCE and FENCE_I
* Remove commented code from spike machines
* PAGE_WRITE permissions can be set on loads if page is already dirty
* The result of format conversion on an NaN must be a quiet NaN
* Add missing process_queued_cpu_work to riscv linux-user
* Remove float(32|64)_classify from cpu.h
* Removed nonsensical unions aliasing the same type
* Use uintN_t instead of uintN_fast_t in fpu_helper.c
* Use macros for FPU exception values in softfloat_flags_to_riscv
* Move code to set round mode into set_fp_round_mode function
* Convert set_fp_exceptions from a macro to an inline function
* Convert round mode helper into an inline function
* Make fpu_helper ieee_rm array static const
* Include cpu_mmu_index in cpu_get_tb_cpu_state flags
* Eliminate MPRV influence on mmu_index
* Remove unrecoverable do_unassigned_access function
* Only update PTE accessed and dirty bits if necessary
* Remove unnecessary tlb_flush in set_mode as mode is in mmu_idx
* Remove buggy support for misa writes. misa writes are optional
and are not implemented in any known hardware
* Always set PTE read or execute permissions during page walk
* Reorder helper function declarations to match order in helper.c
* Remove redundant variable declaration in get_physical_address
* Remove duplicated code from get_physical_address
* Use mmu_idx instead of mem_idx in riscv_cpu_get_phys_page_debug
v3
* Fix indentation in PMP and HTIF debug macros
* Fix disassembler checkpatch open brace '{' on next line errors
* Fix trailing statements on next line in decode_inst_decompress
* NOTE: the other checkpatch issues have been reviewed previously
v2
* Remove redundant NULL terminators from disassembler register arrays
* Change disassembler register name arrays to const
* Refine disassembler internal function names
* Update dates in disassembler copyright message
* Remove #ifdef CONFIG_USER_ONLY version of cpu_has_work
* Use ULL suffix on 64-bit constants
* Move riscv_cpu_mmu_index from cpu.h to helper.c
* Move riscv_cpu_hw_interrupts_pending from cpu.h to helper.c
* Remove redundant TARGET_HAS_ICE from cpu.h
* Use qemu_irq instead of void* for irq definition in cpu.h
* Remove duplicate typedef from struct CPURISCVState
* Remove redundant g_strdup from cpu_register
* Remove redundant tlb_flush from riscv_cpu_reset
* Remove redundant mode calculation from get_physical_address
* Remove redundant debug mode printf and dcsr comment
* Remove redundant clearing of MSB for bare physical addresses
* Use g_assert_not_reached for invalid mode in get_physical_address
* Use g_assert_not_reached for unreachable checks in get_physical_address
* Use g_assert_not_reached for unreachable type in raise_mmu_exception
* Return exception instead of aborting for misaligned fetches
* Move exception defines from cpu.h to cpu_bits.h
* Remove redundant breakpoint control definitions from cpu_bits.h
* Implement riscv_cpu_unassigned_access exception handling
* Log and raise exceptions for unimplemented CSRs
* Match Spike HTIF exit behavior - don’t print TEST-PASSED
* Make frm,fflags,fcsr writes trap when mstatus.FS is clear
* Use g_assert_not_reached for unreachable invalid mode
* Make hret,uret,dret generate illegal instructions
* Move riscv_cpu_dump_state and int/fpr regnames to cpu.c
* Lift interrupt flag and mask into constants in cpu_bits.h
* Change trap debugging to use qemu_log_mask LOG_TRACE
* Change CSR debugging to use qemu_log_mask LOG_TRACE
* Change PMP debugging to use qemu_log_mask LOG_TRACE
* Remove commented code from pmp.c
* Change CpuInfoRISCV qapi schema docs to Since 2.12
* Change RV feature macro to use target_ulong cast
* Remove riscv_feature and instead use misa extension flags
* Make riscv_flush_icache_syscall a no-op
* Undo checkpatch whitespace fixes in unrelated linux-user code
* Remove redudant constants and tidy up cpu_bits.h
* Make helper_fence_i a no-op
* Move include "exec/cpu-all" to end of cpu.h
* Rename set_privilege to riscv_set_mode
* Move redundant forward declaration for cpu_riscv_translate_address
* Remove TCGV_UNUSED from riscv_translate_init
* Add comment to pmp.c stating the code is untested and currently unused
* Use ctz to simplify decoding of PMP NAPOT address ranges
* Change pmp_is_in_range to use than equal for end addresses
* Fix off by one error in pmp_update_rule
* Rearrange PMP_DEBUG so that formatting is compile-time checked
* Rearrange trap debugging so that formatting is compile-time checked
* Rearrange PLIC debugging so that formatting is compile-time checked
* Use qemu_log/qemu_log_mask for HTIF logging and debugging
* Move exception and interrupt names into cpu.c
* Add Palmer Dabbelt as a RISC-V Maintainer
* Rebase against current qemu master branch
v1
* initial version based on forward port from riscv-qemu repository
*** Background ***
"RISC-V is an open, free ISA enabling a new era of processor innovation
through open standard collaboration. Born in academia and research,
RISC-V ISA delivers a new level of free, extensible software and
hardware freedom on architecture, paving the way for the next 50 years
of computing design and innovation."
The QEMU RISC-V port has been developed and maintained out-of-tree for
several years by Sagar Karandikar and Bastian Koppelmann. The RISC-V
Privileged specification has evolved substantially over this period but
has recently been solidifying. The RISC-V Base ISA has been frozon for
some time and the Privileged ISA, GCC toolchain and Linux ABI are now
quite stable. I have recently joined Sagar and Bastian as a RISC-V QEMU
Maintainer and hope to support upstreaming the port.
There are multiple vendors taping out, preparing to ship, or shipping
silicon that implements the RISC-V Privileged ISA Version 1.10. There
are also several RISC-V Soft-IP cores implementing Privileged ISA
Version 1.10 that run on FPGA such as SiFive's Freedom U500 Platform
and the U54‑MC RISC-V Core IP, among many more implementations from a
variety of vendors. See https://riscv.org/ for more details.
RISC-V support was upstreamed in binutils 2.28 and GCC 7.1 in the first
half of 2016. RISC-V support is now available in LLVM top-of-tree and
the RISC-V Linux port was accepted into Linux 4.15-rc1 late last year
and is available in the Linux 4.15 release. GLIBC 2.27 added support
for the RISC-V ISA running on Linux (requires at least binutils-2.30,
gcc-7.3.0, and linux-4.15). We believe it is timely to submit the
RISC-V QEMU port for upstream review with the goal of incorporating
RISC-V support into the upcoming QEMU 2.12 release.
The RISC-V QEMU port is still under active development, mostly with
respect to device emulation, the addition of Hypervisor support as
specified in the RISC-V Draft Privileged ISA Version 1.11, and Vector
support once the first draft is finalized later this year. We believe
now is the appropriate time for RISC-V QEMU development to be carried
out in the main QEMU repository as the code will benefit from more
rigorous review. The RISC-V QEMU port currently supports all the ISA
extensions that have been finalized and frozen in the Base ISA.
Blog post about recent additions to RISC-V QEMU: https://goo.gl/fJ4zgk
The RISC-V QEMU wiki: https://github.com/riscv/riscv-qemu/wiki
Instructions for building a busybox+dropbear root image, BBL (Berkeley
Boot Loader) and linux kernel image for use with the RISC-V QEMU
'virt' machine: https://github.com/michaeljclark/busybear-linux
*** Overview ***
The RISC-V QEMU port implements the following specifications:
* RISC-V Instruction Set Manual Volume I: User-Level ISA Version 2.2
* RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.9.1
* RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.10
The RISC-V QEMU port supports the following instruction set extensions:
* RV32GC with Supervisor-mode and User-mode (RV32IMAFDCSU)
* RV64GC with Supervisor-mode and User-mode (RV64IMAFDCSU)
The RISC-V QEMU port adds the following targets to QEMU:
The RISC-V QEMU port supports the following hardware:
* HTIF Console (Host Target Interface)
* SiFive CLINT (Core Local Interruptor) for Timer interrupts and IPIs
* SiFive PLIC (Platform Level Interrupt Controller)
* SiFive Test (Test Finisher) for exiting simulation
* SiFive UART, PRCI, AON, PWM, QSPI support is partially implemented
* VirtIO MMIO (GPEX PCI support will be added in a future patch)
* Generic 16550A UART emulation using 'hw/char/serial.c'
* MTTCG and SMP support (PLIC and CLINT) on the 'virt' machine
The RISC-V QEMU full system emulator supports 5 machines:
* Alex Suykov
* Andreas Schwab
* Antony Pavlov
* Bastian Koppelmann
* Bruce Hoult
* Chih-Min Chao
* Daire McNamara
* Darius Rad
* David Abdurachmanov
* Hesham Almatary
* Ivan Griffin
* Jim Wilson
* Kito Cheng
* Michael Clark
* Palmer Dabbelt
* Richard Henderson
* Sagar Karandikar
* Shea Levy
* Stefan O'Rear
Notes:
* contributor email addresses available off-list on request.
* checkpatch has been run on all 23 patches.
* checkpatch exceptions are noted in patches that have errors.
* passes "make check" on full build for all targets
* tested riscv-linux-4.6.2 on 'spike_v1.9.1' machine
* tested riscv-linux-4.15 on 'spike_v1.10' and 'virt' machines
* tested SiFive HiFive1 binaries in 'sifive_e' machine
* tested RV64 on 32-bit i386
This patch series includes the following patches:
# gpg: Signature made Thu 08 Mar 2018 19:40:20 GMT
# gpg: using DSA key 6BF1D7B357EF3E4F
# gpg: Good signature from "Michael Clark <[email protected]>"
# gpg: aka "Michael Clark <[email protected]>"
# gpg: aka "Michael Clark <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 7C99 930E B17C D8BA 073D 5EFA 6BF1 D7B3 57EF 3E4F
* remotes/riscv/tags/riscv-qemu-upstream-v8.2: (23 commits)
RISC-V Build Infrastructure
SiFive Freedom U Series RISC-V Machine
SiFive Freedom E Series RISC-V Machine
SiFive RISC-V PRCI Block
SiFive RISC-V UART Device
RISC-V VirtIO Machine
SiFive RISC-V Test Finisher
RISC-V Spike Machines
SiFive RISC-V PLIC Block
SiFive RISC-V CLINT Block
RISC-V HART Array
RISC-V HTIF Console
Add symbol table callback interface to load_elf
RISC-V Linux User Emulation
RISC-V Physical Memory Protection
RISC-V TCG Code Generation
RISC-V GDB Stub
RISC-V FPU Support
RISC-V CPU Helpers
RISC-V Disassembler
...
* remotes/cohuck/tags/s390x-20180308:
s390x/virtio: Convert virtio-ccw from *_exit to *_unrealize
pc-bios/s390-ccw: Move string arrays from bootmap header to .c file
s390x/sclp: clean up sclp masks
s390x/sclp: proper support of larger send and receive masks
vfio-ccw: license text should indicate GPL v2 or later
s390x/sclpconsole: Remove dead code - remove exit handlers
numa: we don't implement NUMA for s390x
hw/s390x: Add the possibility to specify the netboot image on the command line
target/s390x: Remove leading underscores from #defines
s390/ipl: only print boot menu error if -boot menu=on was specified
hw/s390x/ipl: Bail out if the network bootloader can not be found
Stefan Hajnoczi [Wed, 7 Mar 2018 14:42:05 +0000 (14:42 +0000)]
vl: introduce vm_shutdown()
Commit 00d09fdbbae5f7864ce754913efc84c12fdf9f1a ("vl: pause vcpus before
stopping iothreads") and commit dce8921b2baaf95974af8176406881872067adfa
("iothread: Stop threads before main() quits") tried to work around the
fact that emulation was still active during termination by stopping
iothreads. They suffer from race conditions:
1. virtio_scsi_handle_cmd_vq() racing with iothread_stop_all() hits the
virtio_scsi_ctx_check() assertion failure because the BDS AioContext
has been modified by iothread_stop_all().
2. Guest vq kick racing with main loop termination leaves a readable
ioeventfd that is handled by the next aio_poll() when external
clients are enabled again, resulting in unwanted emulation activity.
This patch obsoletes those commits by fully disabling emulation activity
when vcpus are stopped.
Use the new vm_shutdown() function instead of pause_all_vcpus() so that
vm change state handlers are invoked too. Virtio devices will now stop
their ioeventfds, preventing further emulation activity after vm_stop().
Note that vm_stop(RUN_STATE_SHUTDOWN) cannot be used because it emits a
QMP STOP event that may affect existing clients.
It is no longer necessary to call replay_disable_events() directly since
vm_shutdown() does so already.
Drop iothread_stop_all() since it is no longer used.
Stefan Hajnoczi [Wed, 7 Mar 2018 14:42:04 +0000 (14:42 +0000)]
virtio-scsi: fix race between .ioeventfd_stop() and vq handler
If the main loop thread invokes .ioeventfd_stop() just as the vq handler
function begins in the IOThread then the handler may lose the race for
the AioContext lock. By the time the vq handler is able to acquire the
AioContext lock the ioeventfd has already been removed and the handler
isn't supposed to run anymore!
Use the new aio_wait_bh_oneshot() function to perform ioeventfd removal
from within the IOThread. This way no races with the vq handler are
possible.
Stefan Hajnoczi [Wed, 7 Mar 2018 14:42:03 +0000 (14:42 +0000)]
virtio-blk: fix race between .ioeventfd_stop() and vq handler
If the main loop thread invokes .ioeventfd_stop() just as the vq handler
function begins in the IOThread then the handler may lose the race for
the AioContext lock. By the time the vq handler is able to acquire the
AioContext lock the ioeventfd has already been removed and the handler
isn't supposed to run anymore!
Use the new aio_wait_bh_oneshot() function to perform ioeventfd removal
from within the IOThread. This way no races with the vq handler are
possible.
Stefan Hajnoczi [Wed, 7 Mar 2018 14:42:02 +0000 (14:42 +0000)]
block: add aio_wait_bh_oneshot()
Sometimes it's necessary for the main loop thread to run a BH in an
IOThread and wait for its completion. This primitive is useful during
startup/shutdown to synchronize and avoid race conditions.
Sergio Lopez [Wed, 7 Mar 2018 11:44:59 +0000 (12:44 +0100)]
virtio-blk: dataplane: Don't batch notifications if EVENT_IDX is present
Commit 5b2ffbe4d99843fd8305c573a100047a8c962327 ("virtio-blk: dataplane:
notify guest as a batch") deferred guest notification to a BH in order
batch notifications, with purpose of avoiding flooding the guest with
interruptions.
This optimization came with a cost. The average latency perceived in the
guest is increased by a few microseconds, but also when multiple IO
operations finish at the same time, the guest won't be notified until
all completions from each operation has been run. On the contrary,
virtio-scsi issues the notification at the end of each completion.
On the other hand, nowadays we have the EVENT_IDX feature that allows a
better coordination between QEMU and the Guest OS to avoid sending
unnecessary interruptions.
With this change, virtio-blk/dataplane only batches notifications if the
EVENT_IDX feature is not present.
Some numbers obtained with fio (ioengine=sync, iodepth=1, direct=1):
- Test specs:
* fio-3.4 (ioengine=sync, iodepth=1, direct=1)
* qemu master
* virtio-blk with a dedicated iothread (default poll-max-ns)
* backend: null_blk nr_devices=1 irqmode=2 completion_nsec=280000
* 8 vCPUs pinned to isolated physical cores
* Emulator and iothread also pinned to separate isolated cores
* variance between runs < 1%
Deepa Srinivasan [Sat, 16 Dec 2017 00:59:13 +0000 (16:59 -0800)]
block: Fix qemu crash when using scsi-block
Starting qemu with the following arguments causes qemu to segfault:
... -device lsi,id=lsi0 -drive file=iscsi:<...>,format=raw,if=none,node-name=
iscsi1 -device scsi-block,bus=lsi0.0,id=<...>,drive=iscsi1
This patch fixes blk_aio_ioctl() so it does not pass stack addresses to
blk_aio_ioctl_entry() which may be invoked after blk_aio_ioctl() returns. More
details about the bug follow.
blk_aio_ioctl() invokes blk_aio_prwv() with blk_aio_ioctl_entry as the
coroutine parameter. blk_aio_prwv() ultimately calls aio_co_enter().
When blk_aio_ioctl() is executed from within a coroutine context (e.g.
iscsi_bh_cb()), aio_co_enter() adds the coroutine (blk_aio_ioctl_entry) to
the current coroutine's wakeup queue. blk_aio_ioctl() then returns.
When blk_aio_ioctl_entry() executes later, it accesses an invalid pointer:
....
BlkRwCo *rwco = &acb->rwco;
rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset,
rwco->qiov->iov[0].iov_base); <--- qiov is
invalid here
...
In the case when blk_aio_ioctl() is called from a non-coroutine context,
blk_aio_ioctl_entry() executes immediately. But if bdrv_co_ioctl() calls
qemu_coroutine_yield(), blk_aio_ioctl() will return. When the coroutine
execution is complete, control returns to blk_aio_ioctl_entry() after the call
to blk_co_ioctl(). There is no invalid reference after this point, but the
function is still holding on to invalid pointers.
The fix is to change blk_aio_prwv() to accept a void pointer for the IO buffer
rather than a QEMUIOVector. blk_aio_prwv() passes this through in BlkRwCo and the
coroutine function casts it to QEMUIOVector or uses the void pointer directly.
Thomas Huth [Tue, 6 Mar 2018 06:18:01 +0000 (07:18 +0100)]
pc-bios/s390-ccw: Move string arrays from bootmap header to .c file
bootmap.h can currently only be included once - otherwise the linker
complains about multiple definitions of the "magic" strings. It's a
bad style to define string arrays in header files, so let's better
move these to the bootmap.c file instead where they are used.
Claudio Imbrenda [Fri, 23 Feb 2018 17:42:57 +0000 (18:42 +0100)]
s390x/sclp: clean up sclp masks
Introduce an sccb_mask_t to be used for SCLP event masks instead of just
unsigned int or uint32_t. This will allow later to extend the mask with
more ease.
Claudio Imbrenda [Fri, 23 Feb 2018 17:42:56 +0000 (18:42 +0100)]
s390x/sclp: proper support of larger send and receive masks
Until 67915de9f0383ccf4a ("s390x/event-facility: variable-length event masks")
we only supported sclp event masks with a size of exactly 4 bytes, even
though the architecture allows the guests to set up sclp event masks
from 1 to 1021 bytes in length.
After that patch, the behaviour was almost compliant, but some issues
were still remaining, in particular regarding the handling of selective
reads and migration.
When setting the sclp event mask, a mask size is also specified. Until
now we only considered the size in order to decide which bits to save
in the internal state. On the other hand, when a guest performs a
selective read, it sends a mask, but it does not specify a size; the
implied size is the size of the last mask that has been set.
Specifying bits in the mask of selective read that are not available in
the internal mask should return an error, and bits past the end of the
mask should obviously be ignored. This can only be achieved by keeping
track of the lenght of the mask.
The mask length is thus now part of the internal state that needs to be
migrated.
This patch fixes the handling of selective reads, whose size will now
match the length of the event mask, as per architecture.
While the default behaviour is to be compliant with the architecture,
when using older machine models the old broken behaviour is selected
(allowing only masks of size exactly 4), in order to be able to migrate
toward older versions.
Cornelia Huck [Tue, 27 Feb 2018 17:25:41 +0000 (18:25 +0100)]
vfio-ccw: license text should indicate GPL v2 or later
The license text currently specifies "any version" of the GPL. It
is unlikely that GPL v1 was ever intended; change this to the
standard "or any later version" text.
Right now it is possible to crash QEMU for s390x by providing e.g.
-numa node,nodeid=0,cpus=0-1
Problem is, that numa.c uses mc->cpu_index_to_instance_props as an
indicator whether NUMA is supported by a machine type. We don't
implement NUMA for s390x ("topology") yet. However we need
mc->cpu_index_to_instance_props for query-cpus.
So let's fix this case by also checking for mc->get_default_cpu_node_id,
which will be needed by machine_set_cpu_numa_node().
qemu-system-s390x: -numa node,nodeid=0,cpus=0-1: NUMA is not supported by
this machine-type
While at it, make s390_cpu_index_to_props() look like on other
architectures.
Thomas Huth [Tue, 27 Feb 2018 11:32:34 +0000 (12:32 +0100)]
hw/s390x: Add the possibility to specify the netboot image on the command line
The file name of the netboot binary is currently hard-coded to
"s390-netboot.img", without a possibility for the user to select
an alternative firmware image here. That's unfortunate, especially
since the basics are already there: The filename is a property of
the s390-ipl device. So we just have to add a check whether the user
already provided the property and only set the default if the string
is still empty. Now it is possible to select a different firmware
image with "-global s390-ipl.netboot_fw=/path/to/s390-netboot.img".
Thomas Huth [Mon, 5 Mar 2018 05:16:58 +0000 (06:16 +0100)]
target/s390x: Remove leading underscores from #defines
We should not use leading underscores followed by a capital letter
in #defines since such identifiers are reserved by the C standard.
For ASCE_ORIGIN, REGION_ENTRY_ORIGIN and SEGMENT_ENTRY_ORIGIN I also
added parentheses around the value to silence an error message from
checkpatch.pl.
s390/ipl: only print boot menu error if -boot menu=on was specified
It is possible that certain QEMU configurations may not
create an IPLB (such as when -kernel is provided). In
this case, a misleading error message will be printed
stating that the "boot menu is not supported for this
device type".
To amend this, only print this message iff boot menu=on
was provided on the commandline. Otherwise, return silently.
While we're at it, remove trailing periods from error
messages.
Thomas Huth [Tue, 27 Feb 2018 10:05:13 +0000 (11:05 +0100)]
hw/s390x/ipl: Bail out if the network bootloader can not be found
If QEMU fails to load 's390-netboot.img', the guest firmware currently
loops forever and just floods the console with "Network boot device
detected" messages. The code in ipl.c apparently already tried to stop
the VM with vm_stop() in this case, but this is in vain since the run
state is later reset due to a call to vm_start() from vl.c again.
To avoid the ugly firmware loop, let's simply exit QEMU directly instead
since it just does not make sense to continue if the required firmware
image can not be loaded. While we're at it, also add the file name of
the netboot binary to the error message, so that the user has a better
hint about what is missing.
Peter Maydell [Thu, 8 Mar 2018 13:42:26 +0000 (13:42 +0000)]
Merge remote-tracking branch 'remotes/mcayland/tags/qemu-sparc-signed' into staging
qemu-sparc update
# gpg: Signature made Thu 08 Mar 2018 07:23:01 GMT
# gpg: using RSA key 5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <[email protected]>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C C9C4 5BC2 C56F AE0F 321F
* remotes/mcayland/tags/qemu-sparc-signed:
sparc: fix leon3 casa instruction when MMU is disabled
hw/sparc/sun4m: Fix implicit creation of "-drive if=scsi" devices
Peter Maydell [Thu, 8 Mar 2018 12:56:39 +0000 (12:56 +0000)]
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2018-03-07-1' into staging
Merge tpm 2018/03/07
# gpg: Signature made Wed 07 Mar 2018 12:42:13 GMT
# gpg: using RSA key 75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2018-03-07-1:
tpm: convert tpm_tis.c to use trace-events
tpm: convert tpm_emulator.c to use trace-events
tpm: convert tpm_util.c to use trace-events
tpm: convert tpm_passthrough.c to use trace-events
tpm: convert tpm_crb.c to use trace-events
Peter Maydell [Thu, 8 Mar 2018 10:02:46 +0000 (10:02 +0000)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Multiboot patches
# gpg: Signature made Wed 07 Mar 2018 11:15:17 GMT
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
multiboot: fprintf(stderr...) -> error_report()
multiboot: Use header names when displaying fields
multiboot: Remove unused variables from multiboot.c
multiboot: bss_end_addr can be zero
KONRAD Frederic [Fri, 2 Mar 2018 09:59:25 +0000 (10:59 +0100)]
sparc: fix leon3 casa instruction when MMU is disabled
Since the commit af7a06bac7d3abb2da48ef3277d2a415772d2ae8:
`casa [..](10), .., ..` (and probably others alternate space instructions)
triggers a data access exception when the MMU is disabled.
When we enter get_asi(...) dc->mem_idx is set to MMU_PHYS_IDX when the MMU
is disabled. Just keep mem_idx unchanged in this case so we passthrough the
MMU when it is disabled.
Thomas Huth [Wed, 7 Mar 2018 09:39:28 +0000 (10:39 +0100)]
hw/sparc/sun4m: Fix implicit creation of "-drive if=scsi" devices
The global hack for creating SCSI devices has recently been removed,
but this apparently broke SCSI devices on some boards that were not
ready for this change yet. For the sun4m machines you now get:
$ sparc-softmmu/qemu-system-sparc -boot d -cdrom x.iso
qemu-system-sparc: -cdrom x.iso: machine type does not support if=scsi,bus=0,unit=2
Fix it by calling scsi_bus_legacy_handle_cmdline() after creating the
corresponding SCSI controller.
Jack Schwartz [Thu, 21 Dec 2017 17:25:18 +0000 (09:25 -0800)]
multiboot: fprintf(stderr...) -> error_report()
Change all fprintf(stderr...) calls in hw/i386/multiboot.c to call
error_report() instead, including the mb_debug macro. Remove the "\n"
from strings passed to all modified calls, since error_report() appends
one.
This adds defaults configs for RISC-V, enables the build for the RISC-V
CPU core, hardware, and Linux User Emulation. The 'qemu-binfmt-conf.sh'
script is updated to add the RISC-V ELF magic.
Expected checkpatch errors for consistency reasons:
ERROR: line over 90 characters
FILE: scripts/qemu-binfmt-conf.sh
Michael Clark [Fri, 2 Mar 2018 12:31:14 +0000 (01:31 +1300)]
SiFive RISC-V UART Device
QEMU model of the UART on the SiFive E300 and U500 series SOCs.
BBL supports the SiFive UART for early console access via the SBI
(Supervisor Binary Interface) and the linux kernel SBI console.
The SiFive UART implements the pre qom legacy interface consistent
with the 16550a UART in 'hw/char/serial.c'.
Michael Clark [Fri, 2 Mar 2018 12:31:12 +0000 (01:31 +1300)]
RISC-V HTIF Console
HTIF (Host Target Interface) provides console emulation for QEMU. HTIF
allows identical copies of BBL (Berkeley Boot Loader) and linux to run
on both Spike and QEMU. BBL provides HTIF console access via the
SBI (Supervisor Binary Interface) and the linux kernel SBI console.
The HTIT chardev implements the pre qom legacy interface consistent
with the 16550a UART in 'hw/char/serial.c'.
Michael Clark [Fri, 2 Mar 2018 12:31:11 +0000 (01:31 +1300)]
RISC-V TCG Code Generation
TCG code generation for the RV32IMAFDC and RV64IMAFDC. The QEMU
RISC-V code generator has complete coverage for the Base ISA v2.2,
Privileged ISA v1.9.1 and Privileged ISA v1.10:
- RISC-V Instruction Set Manual Volume I: User-Level ISA Version 2.2
- RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.9.1
- RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.10