Max Filippov [Thu, 11 Jan 2018 20:56:45 +0000 (12:56 -0800)]
target/xtensa: allow different default CPU for MMU/noMMU
Define default core for noMMU configurations and use that core as
machine default with noMMU XTFPGA machines.
This is done to avoid offering non-working configuration (MMU core on a
noMMU machine) as a default.
Max Filippov [Thu, 24 Sep 2015 13:18:51 +0000 (16:18 +0300)]
hw/xtensa/xtfpga: support noMMU cores
Cores with and without MMU have system RAM and ROM at different locations.
Also with noMMU cores system IO region is accessible through two physical
address ranges.
Max Filippov [Fri, 22 Dec 2017 21:53:36 +0000 (13:53 -0800)]
hw/xtensa: extract xtensa_create_memory_regions
XTFPGA boards should populate core memory regions the same way sim
machine does. Move xtensa_create_memory_regions implementation to a
separate file and use it to create instruction and data memory regions
on XTFPGA boards.
Max Filippov [Fri, 22 Dec 2017 04:51:05 +0000 (20:51 -0800)]
hw/xtensa/xtfpga: rewrite mini bootloader
Don't load jump target into the CPU config, instead put it and initial
a2 as literals into the mini bootloader and use l32r to load them
natively. With these changes it should be possible to do warm reboot of
the guest.
* remotes/pmaydell/tags/pull-target-arm-20180111: (26 commits)
hw/intc/arm_gic: reserved register addresses are RAZ/WI
hw/intc/arm_gicv3: Make reserved register addresses RAZ/WI
target/arm: Make disas_thumb2_insn() generate its own UNDEF exceptions
linux-user/arm/nwfpe: Check coprocessor number for FPA emulation
hw/sd/pxa2xx_mmci: add read/write() trace events
hw/timer/pxa2xx_timer: replace hw_error() -> qemu_log_mask()
imx_fec: Reserve full FSL_IMX25_FEC_SIZE page for the register file
imx_fec: Fix a typo in imx_enet_receive()
imx_fec: Use correct length for packet size
imx_fec: Add support for multiple Tx DMA rings
imx_fec: Emulate SHIFT16 in ENETx_RACC
imx_fec: Use MIN instead of explicit ternary operator
imx_fec: Use ENET_FTRL to determine truncation length
imx_fec: Move Tx frame buffer away from the stack
imx_fec: Change queue flushing heuristics
imx_fec: Refactor imx_eth_enable_rx()
imx_fec: Do not link to netdev
Virt: ACPI: fix qemu assert due to re-assigned table data address
target/arm: Fix stlxp for aarch64_be
linux-user: Activate armeb handler registration
...
Peter Maydell [Thu, 11 Jan 2018 13:25:40 +0000 (13:25 +0000)]
hw/intc/arm_gic: reserved register addresses are RAZ/WI
The GICv2 specification says that reserved register addresses
must RAZ/WI; now that we implement external abort handling
for Arm CPUs this means we must return MEMTX_OK rather than
MEMTX_ERROR, to avoid generating a spurious guest data abort.
Peter Maydell [Thu, 11 Jan 2018 13:25:40 +0000 (13:25 +0000)]
hw/intc/arm_gicv3: Make reserved register addresses RAZ/WI
The GICv3 specification says that reserved register addresses
should RAZ/WI. This means we need to return MEMTX_OK, not MEMTX_ERROR,
because now that we support generating external aborts the
latter will cause an abort on new board models.
Peter Maydell [Thu, 11 Jan 2018 13:25:40 +0000 (13:25 +0000)]
target/arm: Make disas_thumb2_insn() generate its own UNDEF exceptions
Refactor disas_thumb2_insn() so that it generates the code for raising
an UNDEF exception for invalid insns, rather than returning a flag
which the caller must check to see if it needs to generate the UNDEF
code. This brings the function in to line with the behaviour of
disas_thumb_insn() and disas_arm_insn().
Peter Maydell [Thu, 11 Jan 2018 13:25:39 +0000 (13:25 +0000)]
linux-user/arm/nwfpe: Check coprocessor number for FPA emulation
Our copy of the nwfpe code for emulating of the old FPA11 floating
point unit doesn't check the coprocessor number in the instruction
when it emulates it. This means that we might treat some
instructions which should really UNDEF as being FPA11 instructions by
accident.
The kernel's copy of the nwfpe code doesn't make this error; I suspect
the bug was noticed and fixed as part of the process of mainlining
the nwfpe code more than a decade ago.
Add a check that the coprocessor number (which is always in bits
[11:8] of the instruction) is either 1 or 2, which is where the
FPA11 lives.
Andrey Smirnov [Thu, 11 Jan 2018 13:25:38 +0000 (13:25 +0000)]
imx_fec: Reserve full FSL_IMX25_FEC_SIZE page for the register file
Some i.MX SoCs (e.g. i.MX7) have FEC registers going as far as offset
0x614, so to avoid getting aborts when accessing those on QEMU, extend
the register file to cover FSL_IMX25_FEC_SIZE(16K) of address space
instead of just 1K.
Andrey Smirnov [Thu, 11 Jan 2018 13:25:37 +0000 (13:25 +0000)]
imx_fec: Use correct length for packet size
Use 'frame_size' instead of 'len' when calling qemu_send_packet(),
failing to do so results in malformed packets send in case when that
packed is fragmented into multiple DMA transactions.
Andrey Smirnov [Thu, 11 Jan 2018 13:25:35 +0000 (13:25 +0000)]
imx_fec: Change queue flushing heuristics
In current implementation, packet queue flushing logic seem to suffer
from a deadlock like scenario if a packet is received by the interface
before before Rx ring is initialized by Guest's driver. Consider the
following sequence of events:
1. A QEMU instance is started against a TAP device on Linux
host, running Linux guest, e. g., something to the effect
of:
qemu-system-arm \
-net nic,model=imx.fec,netdev=lan0 \
netdev tap,id=lan0,ifname=tap0,script=no,downscript=no \
... rest of the arguments ...
2. Once QEMU starts, but before guest reaches the point where
FEC deriver is done initializing the HW, Guest, via TAP
interface, receives a number of multicast MDNS packets from
Host (not necessarily true for every OS, but it happens at
least on Fedora 25)
3. Recieving a packet in such a state results in
imx_eth_can_receive() returning '0', which in turn causes
tap_send() to disable corresponding event (tap.c:203)
4. Once Guest's driver reaches the point where it is ready to
recieve packets it prepares Rx ring descriptors and writes
ENET_RDAR_RDAR to ENET_RDAR register to indicate to HW that
more descriptors are ready. And at this points emulation
layer does this:
Zhaoshenglong [Thu, 11 Jan 2018 13:25:34 +0000 (13:25 +0000)]
Virt: ACPI: fix qemu assert due to re-assigned table data address
acpi_data_push uses g_array_set_size to resize the memory size. If there
is no enough contiguous memory, the address will be changed. If we use
the old value, it will assert.
qemu-kvm: hw/acpi/bios-linker-loader.c:214: bios_linker_loader_add_checksum:
Assertion `start_offset < file->blob->len' failed.`
This issue only happens in building SRAT table now but here we unify the
pattern for other tables as well to avoid possible issues in the future.
Michael Weiser [Thu, 11 Jan 2018 13:25:33 +0000 (13:25 +0000)]
target/arm: Fix stlxp for aarch64_be
ldxp loads two consecutive doublewords from memory regardless of CPU
endianness. On store, stlxp currently assumes to work with a 128bit
value and consequently switches order in big-endian mode. With this
change it packs the doublewords in reverse order in anticipation of the
128bit big-endian store operation interposing them so they end up in
memory in the right order. This makes it work for both MTTCG and !MTTCG.
It effectively implements the ARM ARM STLXP operation pseudo-code:
data = if BigEndian() then el1:el2 else el2:el1;
With this change an aarch64_be Linux 4.14.4 kernel succeeds to boot up
in system emulation mode.
Michael Weiser [Thu, 11 Jan 2018 13:25:33 +0000 (13:25 +0000)]
linux-user: Separate binfmt arm CPU families
Give big-endian arm and aarch64 CPUs their own family in
qemu-binfmt-conf.sh to make sure we register qemu-user for binaries of
the opposite endianness on arm and aarch64. Apart from the family
assignments of the magic values, qemu_get_family() needs to be able to
distinguish the two and recognise aarch64{,_be} as well.
Michael Weiser [Thu, 11 Jan 2018 13:25:31 +0000 (13:25 +0000)]
linux-user: Fix endianess of aarch64 signal trampoline
Since for aarch64 the signal trampoline is synthesized directly into the
signal frame we need to make sure the instructions end up little-endian.
Otherwise the wrong endianness will cause a SIGILL upon return from the
signal handler on big-endian targets.
Michael Weiser [Thu, 11 Jan 2018 13:25:31 +0000 (13:25 +0000)]
linux-user: Add support for big-endian aarch64
Enable big-endian mode for data accesses on aarch64 for big-endian linux
user mode. Activate it for all exception levels as documented by ARM:
Set the SCTLR EE bit for ELs 1 through 3. Additionally set bit E0E in
EL1 to enable it in EL0 as well.
Peter Maydell [Thu, 11 Jan 2018 13:24:17 +0000 (13:24 +0000)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.12-20180111' into staging
ppc patch queue 2018-01-11
This pull request supersedes ppc-for-2.12-20180108 and several before
it. The earlier pull request included a patch which exposed a bug in
the ARM TCG backend. I've pulled that out and will repost once the
ARM bug is fixed (a patch has been posted by Richard Henderson).
Higlights from this series:
* SLOF update
* Several new devices for embedded platforms
* Fix to correctly set compatiblity mode for hotplugged CPUs
* dtc compile fix for older MacOS versions
* remotes/dgibson/tags/ppc-for-2.12-20180111:
spapr: Correct compatibility mode setting for hotplugged CPUs
hw/ppc: Remove the deprecated spapr-pci-vfio-host-bridge device
Update dtc to fix compilation problem on Mac OS 10.6
target/ppc: more use of the PPC_*() macros
ppc/pnv: change powernv_ prefix to pnv_ for overall naming consistency
hw/ide: Emulate SiI3112 SATA controller
spapr_pci: use warn_report()
ppc4xx_i2c: Implement basic I2C functions
sm501: Add some more unimplemented registers
sm501: Add panel hardware cursor registers also to read function
pseries: Update SLOF firmware image to qemu-slof-20171214
Peter Maydell [Thu, 11 Jan 2018 11:52:40 +0000 (11:52 +0000)]
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-01-10' into staging
nbd patches for 2018-01-10
- Vladimir Sementsov-Ogievskiy: nbd: rename nbd_option and nbd_opt_reply
- Vladimir Sementsov-Ogievskiy: nbd/server: add additional assert to nbd_export_put
# gpg: Signature made Wed 10 Jan 2018 22:53:49 GMT
# gpg: using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <[email protected]>"
# gpg: aka "Eric Blake (Free Software Programmer) <[email protected]>"
# gpg: aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2018-01-10:
nbd: rename nbd_option and nbd_opt_reply
nbd/server: add additional assert to nbd_export_put
Peter Maydell [Thu, 11 Jan 2018 09:54:15 +0000 (09:54 +0000)]
Merge remote-tracking branch 'remotes/mcayland/tags/qemu-sparc-signed' into staging
qemu-sparc update
# gpg: Signature made Tue 09 Jan 2018 22:12:22 GMT
# gpg: using RSA key 0x5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <[email protected]>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C C9C4 5BC2 C56F AE0F 321F
* remotes/mcayland/tags/qemu-sparc-signed: (25 commits)
sun4u_iommu: add trace event for IOMMU translations
sun4u_iommu: convert from IOMMU_DPRINTF to trace-events
sun4u_iommu: update to reflect IOMMU is no longer part of the APB device
sun4u: split IOMMU device out from apb.c to sun4u_iommu.c
apb: QOMify IOMMU
sun4m: remove include/hw/sparc/sun4m.h and all references to it
sun4m: move IOMMU declarations from sun4m.h to sun4m_iommu.h
sun4m: move sun4m_iommu.c from hw/dma to hw/sparc
sun4u: switch from EBUS_DPRINTF() macro to trace-events
sparc64: introduce trace-events for hw/sparc64
apb: replace OBIO interrupt numbers in pci_pbmA_map_irq() with constants
ebus: wire up OBIO interrupts to APB pbm via qdev GPIOs
apb: remove busA property from PBMPCIBridge state
apb: split pci_pbm_map_irq() into separate functions for bus A and bus B
apb: remove pci_apb_init() and instantiate APB device using qdev
apb: move the two secondary PCI bridges objects into APBState
apb: use gpios to wire up the apb device to the SPARC CPU IRQs
apb: return APBState from pci_apb_init() rather than PCIBus
apb: APB QOMify tidy-up
sun4u: move initialisation of all ISABus devices into ebus_realize()
...
David Gibson [Thu, 4 Jan 2018 03:33:21 +0000 (14:33 +1100)]
spapr: Correct compatibility mode setting for hotplugged CPUs
Currently the pseries machine sets the compatibility mode for the
guest's cpus in two places: 1) at machine reset and 2) after CAS
negotiation.
This means that if we set or negotiate a compatiblity mode, then
hotplug a cpu, the hotplugged cpu doesn't get the right mode set and
will incorrectly have the full native features.
To correct this, we set the compatibility mode on a cpu when it is
brought online with the 'start-cpu' RTAS call. Given that we no
longer need to set the compatibility mode on all CPUs at machine
reset, so we change that to only set the mode for the boot cpu.
Thomas Huth [Wed, 3 Jan 2018 09:10:38 +0000 (10:10 +0100)]
hw/ppc: Remove the deprecated spapr-pci-vfio-host-bridge device
It's a deprecated dummy device since QEMU v2.6.0. That should have
been enough time to allow the users to update their scripts in case
they still use it, so let's remove this legacy code now.
John Arbuckle [Thu, 4 Jan 2018 19:49:52 +0000 (14:49 -0500)]
Update dtc to fix compilation problem on Mac OS 10.6
Currently QEMU does not build on Mac OS 10.6
because of a missing patch in the dtc
subproject. Updating dtc to make the patch
available fixes this problem.
Cédric Le Goater [Fri, 15 Dec 2017 13:56:01 +0000 (14:56 +0100)]
ppc/pnv: change powernv_ prefix to pnv_ for overall naming consistency
The 'pnv' prefix is now used for all and the routines populating the
device tree start with 'pnv_dt'. The handler of the PnvXScomInterface
is also renamed to 'dt_xscom' which should reflect that it is
populating the device tree under the 'xscom@' node of the chip.
BALATON Zoltan [Sat, 16 Dec 2017 22:42:39 +0000 (23:42 +0100)]
hw/ide: Emulate SiI3112 SATA controller
This is a common generic PCI SATA controller that is also used in PCs
but more importantly guests running on the Sam460ex board prefer this
card and have a driver for it (unlike for other SATA controllers
already emulated).
pseries: Update SLOF firmware image to qemu-slof-20171214
The main changes are:
- able to handle more devices with specified bootindex;
- implements flatten device tree rendering, for both QEMU and guest kernel.
The full list is:
> boot: use a temporary bootdev-buf
> boot: do not concatenate bootdev
> libvirtio: Mark struct virtio_scsi_req_cmd as packed
> fdt: Implement "fdt-fetch" method for client interface
> rtas: Store RTAS address and entry in the device tree
> board-qemu: Fix slof-build-id length
> fdt: Pass the resulting device tree to QEMU
> fdt: Fix version and add a word for FDT header size
> tree: Rework set-chosen-cpu and store /chosen ihandle and phandle
> node: Add some documentation
> Revert various SLOF-to-QEMU private hypercalls
> Use input-device and output-device
> netboot: Create bootp-response when bootp is used
> libnet/ipv6: assign times_asked value directly
> usb-xhci: Reset ERSTSZ together with ERSTBA
> virtio-net: rework the driver to support multiple open
> board-qemu: add private hcall to inform host on "phandle" update
sun4u: split IOMMU device out from apb.c to sun4u_iommu.c
By separating the sun4u IOMMU device into new sun4u_iommu.c and sun4m_iommu.h
files we noticeably simplify apb.c whilst bringing sun4u in line with all the
other IOMMU-supporting architectures.
This is in preparation to split the IOMMU device out of the APB. As part of
this commit we also enforce separation of the IOMMU and APB devices by using
a QOM object link to pass the IOMMU reference and accessing the IOMMU registers
via a separate memory region mapped into the APB config space rather than
directly.
Mark Cave-Ayland [Thu, 21 Dec 2017 07:32:57 +0000 (07:32 +0000)]
ebus: wire up OBIO interrupts to APB pbm via qdev GPIOs
This enables us to remove the static array mapping in the ISA IRQ
handler (and the embedded reference to the APB device) by formalising
the interrupt wiring via the qdev GPIO API.
For more clarity we replace the APB OBIO interrupt numbers with constants
designating the interrupt source, and rename isa_irq_handler() to
ebus_isa_irq_handler().
Mark Cave-Ayland [Thu, 21 Dec 2017 07:32:57 +0000 (07:32 +0000)]
apb: remove busA property from PBMPCIBridge state
Since the previous commit the only remaining use of the qdev busA property is
to configure the PCI bridge in front of the onboard ebus devices differently
to allow early OpenBIOS serial console access.
Instead we can now manually update the PCI configuration for bridge A in
pci_pbm_reset() and thus completely remove the busA property from the
PBMPCIBridge state.
Mark Cave-Ayland [Thu, 21 Dec 2017 07:32:57 +0000 (07:32 +0000)]
apb: remove pci_apb_init() and instantiate APB device using qdev
By making the special_base and mem_base values qdev properties, we can move
the remaining parts of pci_apb_init() into the pbm init() and realize()
functions.
This finally allows us to instantiate the APB directly using standard qdev
create/init functions in sun4u.c.
Mark Cave-Ayland [Thu, 21 Dec 2017 07:32:57 +0000 (07:32 +0000)]
sun4u: remove pci_ebus_init() function
This is initialisation that should really take place in the ebus realize
function. As part of this we also rework the ebus IRQ mapping so that
instead of having to pass in the array of pbm_irqs, we obtain a reference
to them by looking up the APB device during ebus realize.
Mark Cave-Ayland [Thu, 21 Dec 2017 07:32:57 +0000 (07:32 +0000)]
sun4u: ebus QOMify tidy-up
The main change here is to introduce the proper TYPE_EBUS/EBUS QOM macros
and remove the use of DO_UPCAST.
Alongside this there are some a couple of minor cosmetic changes and a rename
of pci_ebus_realize() to ebus_realize() since the ebus device is always what
is effectively a PCI-ISA bridge.
nbd/server: add additional assert to nbd_export_put
This place is not obvious, nbd_export_close may theoretically reduce
refcount to 0. It may happen if someone calls nbd_export_put on named
export not through nbd_export_set_name when refcount is 1.
Peter Maydell [Tue, 9 Jan 2018 18:23:27 +0000 (18:23 +0000)]
Merge remote-tracking branch 'remotes/xtensa/tags/20180109-xtensa' into staging
target/xtensa updates:
- add libisa to the xtensa target;
- change xtensa instruction translator to use it;
- switch existing xtensa cores to use it;
- add support for a number of instructions: salt/saltu, const16,
GPIO32 group, debug mode and MMU-related;
- add disassembler for Xtensa.
Max Filippov [Thu, 2 Nov 2017 22:05:56 +0000 (15:05 -0700)]
target/xtensa: implement const16
const16 is an opcode that shifts 16 lower bits of an address register
to the 16 upper bits and puts its immediate operand into the lower 16
bits. It is not controlled by an Xtensa option and doesn't have a fixed
opcode.
Max Filippov [Sat, 18 Feb 2017 00:21:36 +0000 (16:21 -0800)]
target/xtensa: implement GPIO32
GPIO32 is not in the core ISA, but it was widely used in Diamond Cores.
This implementation doesn't do actual I/O and doesn't handle the case of
GPIO32 state being a part of coprocessor.
Max Filippov [Sun, 29 Jan 2017 11:50:25 +0000 (03:50 -0800)]
target/xtensa: add internal/noop SRs and opcodes
Add two special registers: MMID and DDR:
- MMID is write-only and the only side effect of writing to it is output
to the trace port, which is not emulated;
- DDR is only accessible in debug mode, which is not emulated.
Add two debug-mode-only opcodes:
- rfdd and rfdo do return from the debug mode, which is not emulated.
Add three internal opcodes for full MMU:
- hwwdtlba and hwwitlba are the internal opcodes that write a value into
autoupdate DTLB or ITLB entry.
- ldpte is internal opcode that loads PTE entry that covers the most
recent page fault address.
None of these three opcodes may appear in a valid instruction.
Max Filippov [Sat, 4 Nov 2017 03:30:30 +0000 (20:30 -0700)]
target/xtensa: tests: fix memctl SR test
memctl SR is not available on dc232b, as it was introduced in more
recent hardware release. Now that this information is available through
the libisa the test fails. Fix the test.
Max Filippov [Sat, 4 Nov 2017 02:44:46 +0000 (19:44 -0700)]
target/xtensa: use libisa for instruction decoding
Replace manual opcode analysis with libisa-based code. This makes it
possible to support variable-encoding instructions of the core ISA, like
const16, and will allow to support advanced Xtensa features, like FLIX
and TIE.
Peter Maydell [Tue, 9 Jan 2018 15:22:47 +0000 (15:22 +0000)]
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-01-08' into staging
nbd patches for 2018-01-08
- Eric Blake: 0/2 Optimize sparse reads over NBD
- Murilo Opsfelder Araujo: block/nbd: fix segmentation fault when .desc is not null-terminated
# gpg: Signature made Mon 08 Jan 2018 15:21:19 GMT
# gpg: using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <[email protected]>"
# gpg: aka "Eric Blake (Free Software Programmer) <[email protected]>"
# gpg: aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2018-01-08:
block/nbd: fix segmentation fault when .desc is not null-terminated
nbd/server: Optimize final chunk of sparse read
nbd/server: Implement sparse reads atop structured reply
Peter Maydell [Mon, 8 Jan 2018 22:14:24 +0000 (22:14 +0000)]
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
- Aneesh no longer listed in MAINTAINERS,
- deprecation of the handle backend,
- improved error reporting, especially when the local backend fails to
open the VirtFS root,
- virtio-9p-test to behave more like a real virtio guest driver: set
DRIVER_OK when ready to use the device and process the used ring
for completed requests,
- cosmetic fixes (mostly coding style related).
# gpg: Signature made Mon 08 Jan 2018 10:19:18 GMT
# gpg: using RSA key 0x71D4D5E5822F73D6
# gpg: Good signature from "Greg Kurz <[email protected]>"
# gpg: aka "Gregory Kurz <[email protected]>"
# gpg: aka "[jpeg image of size 3330]"
# Primary key fingerprint: B482 8BAF 9431 40CE F2A3 4910 71D4 D5E5 822F 73D6
* remotes/gkurz/tags/for-upstream:
MAINTAINERS: Drop Aneesh as 9pfs maintainer
9pfs: deprecate handle backend
fsdev: improve error handling of backend init
fsdev: improve error handling of backend opts parsing
tests: virtio-9p: set DRIVER_OK before using the device
tests: virtio-9p: fix ISR dependence
9pfs: make pdu_marshal() and pdu_unmarshal() static functions
9pfs: fix error path in pdu_submit()
9pfs: fix type in *_parse_opts declarations
9pfs: handle: fix type definition
9pfs: fix some type definitions
fsdev: fix some type definitions
9pfs: fix XattrOperations typedef
virtio-9p: move unrealize/realize after virtio_9p_transport definition
In commit c97d6d2cdf97ed we accidentally added code to configure
that uses '==' for string equality testing. This is a bashism --
the portable way to write this is '='.
This fixes the "Unexpected operator error" complaint produced
if the system /bin/sh is dash.
block/nbd: fix segmentation fault when .desc is not null-terminated
The find_desc_by_name() from util/qemu-option.c relies on the .name not being
NULL to call strcmp(). This check becomes unsafe when the list is not
NULL-terminated, which is the case of nbd_runtime_opts in block/nbd.c, and can
result in segmentation fault when strcmp() tries to access an invalid memory:
#0 0x00007fff8c75f7d4 in __strcmp_power9 () from /lib64/libc.so.6
#1 0x00000000102d3ec8 in find_desc_by_name (desc=0x1036d6f0, name=0x28e46670 "server.path") at util/qemu-option.c:166
#2 0x00000000102d93e0 in qemu_opts_absorb_qdict (opts=0x28e47a80, qdict=0x28e469a0, errp=0x7fffec247c98) at util/qemu-option.c:1026
#3 0x000000001012a2e4 in nbd_open (bs=0x28e42290, options=0x28e469a0, flags=24578, errp=0x7fffec247d80) at block/nbd.c:406
#4 0x00000000100144e8 in bdrv_open_driver (bs=0x28e42290, drv=0x1036e070 <bdrv_nbd_unix>, node_name=0x0, options=0x28e469a0, open_flags=24578, errp=0x7fffec247f50) at block.c:1135
#5 0x0000000010015b04 in bdrv_open_common (bs=0x28e42290, file=0x0, options=0x28e469a0, errp=0x7fffec247f50) at block.c:1395
>From gdb, the desc[i].name was not NULL and resulted in strcmp() accessing an
invalid memory:
>>> p desc[5]
$8 = {
name = 0x1037f098 "R27A",
type = 1561964883,
help = 0xc0bbb23e <error: Cannot access memory at address 0xc0bbb23e>,
def_value_str = 0x2 <error: Cannot access memory at address 0x2>
}
>>> p desc[6]
$9 = {
name = 0x103dac78 <__gcov0.do_qemu_init_bdrv_nbd_init> "\001",
type = 272101528,
help = 0x29ec0b754403e31f <error: Cannot access memory at address 0x29ec0b754403e31f>,
def_value_str = 0x81f343b9 <error: Cannot access memory at address 0x81f343b9>
}
This patch fixes the segmentation fault in strcmp() by adding a NULL element at
the end of nbd_runtime_opts.desc list, which is the common practice to most of
other structs like runtime_opts in block/null.c. Thus, the desc[i].name != NULL
check becomes safe because it will not evaluate to true when .desc list reached
its end.
Eric Blake [Tue, 7 Nov 2017 03:09:12 +0000 (21:09 -0600)]
nbd/server: Optimize final chunk of sparse read
If we are careful to handle 0-length read requests correctly,
we can optimize our sparse read to send the NBD_REPLY_FLAG_DONE
bit on our last OFFSET_DATA or OFFSET_HOLE chunk rather than
needing a separate chunk.
The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire. Time to implement
this in the server; our client can already read such data.
We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here). Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read). Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
Peter Maydell [Mon, 8 Jan 2018 13:44:01 +0000 (13:44 +0000)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches
# gpg: Signature made Fri 22 Dec 2017 14:09:01 GMT
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (35 commits)
block: Keep nodes drained between reopen_queue/multiple
commit: Simplify reopen of base
test-bdrv-drain: Test graph changes in drained section
block: Allow graph changes in subtree drained section
test-bdrv-drain: Recursive draining with multiple parents
test-bdrv-drain: Test behaviour in coroutine context
test-bdrv-drain: Tests for bdrv_subtree_drain
block: Add bdrv_subtree_drained_begin/end()
block: Don't notify parents in drain call chain
test-bdrv-drain: Test nested drain sections
block: Nested drain_end must still call callbacks
block: Don't block_job_pause_all() in bdrv_drain_all()
test-bdrv-drain: Test drain vs. block jobs
blockjob: Pause job on draining any job BDS
test-bdrv-drain: Test bs->quiesce_counter
test-bdrv-drain: Test callback for bdrv_drain
block: Make bdrv_drain() driver callbacks non-recursive
block: Assert drain_all is only called from main AioContext
block: Remove unused bdrv_requests_pending
block: Mention -drive cyls/heads/secs/trans/serial/addr in deprecation chapter
...
Peter Maydell [Mon, 8 Jan 2018 11:39:50 +0000 (11:39 +0000)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-hvf' into staging
Initial support for the HVF accelerator
# gpg: Signature made Sat 23 Dec 2017 07:51:18 GMT
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg: aka "Paolo Bonzini <[email protected]>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream-hvf:
i386: hvf: cleanup x86_gen.h
i386: hvf: remove VM_PANIC from "in"
i386: hvf: remove addr_t
i386: hvf: simplify flag handling
i386: hvf: abort on decoding error
i386: hvf: remove ZERO_INIT macro
i386: hvf: remove more dead emulator code
i386: hvf: unify register enums between HVF and the rest
i386: hvf: header cleanup
i386: hvf: move all hvf files in the same directory
i386: hvf: inject General Protection Fault when vmexit through vmcall
i386: hvf: refactor event injection code for hvf
i386: hvf: implement vga dirty page tracking
i386: refactor KVM cpuid code so that it applies to hvf as well
i386: hvf: implement hvf_get_supported_cpuid
i386: hvf: use new helper functions for put/get xsave
i386: hvf: fix licensing issues; isolate task handling code (GPL v2-only)
i386: hvf: add code base from Google's QEMU repository
apic: add function to apic that will be used by hvf
Greg Kurz [Mon, 8 Jan 2018 10:18:23 +0000 (11:18 +0100)]
9pfs: deprecate handle backend
This backend raise some concerns:
- doesn't support symlinks
- fails +100 tests in the PJD POSIX file system test suite [1]
- requires the QEMU process to run with the CAP_DAC_READ_SEARCH
capability, which isn't recommended for security reasons
This backend should not be used and wil be removed. The 'local'
backend is the recommended alternative.
Greg Kurz [Mon, 8 Jan 2018 10:18:23 +0000 (11:18 +0100)]
fsdev: improve error handling of backend init
This patch changes some error messages in the backend init code and
convert backends to propagate QEMU Error objects instead of calling
error_report().
One notable improvement is that the local backend now provides a more
detailed error report when it fails to open the shared directory.
Greg Kurz [Mon, 8 Jan 2018 10:18:23 +0000 (11:18 +0100)]
fsdev: improve error handling of backend opts parsing
This patch changes some error messages in the backend opts parsing
code and convert backends to propagate QEMU Error objects instead
of calling error_report().