Git Repo - qemu.git/log

virtio-mem: Support "prealloc=on" option

For scarce memory resources, such as hugetlb, we want to be able to
prealloc such memory resources in order to not crash later on access. On
simple user errors we could otherwise easily run out of memory resources
an crash the VM -- pretty much undesired.

For ordinary memory devices, such as DIMMs, we preallocate memory via the
memory backend for such use cases; however, with virtio-mem we're dealing
with sparse memory backends; preallocating the whole memory backend
destroys the whole purpose of virtio-mem.

Instead, we want to preallocate memory when actually exposing memory to the
VM dynamically, and fail plugging memory gracefully + warn the user in case
preallocation fails.

A common use case for hugetlb will be using "reserve=off,prealloc=off" for
the memory backend and "prealloc=on" for the virtio-mem device. This
way, no huge pages will be reserved for the process, but we can recover
if there are no actual huge pages when plugging memory. Libvirt is
already prepared for this.

Note that preallocation cannot protect from the OOM killer -- which
holds true for any kind of preallocation in QEMU. It's primarily useful
only for scarce memory resources such as hugetlb, or shared file-backed
memory. It's of little use for ordinary anonymous memory that can be
swapped, KSM merged, ... but we won't forbid it.

Reviewed-by: Michal Privoznik <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20211217134611 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

util/oslib-posix: Forward SIGBUS to MCE handler under Linux

Temporarily modifying the SIGBUS handler is really nasty, as we might be
unlucky and receive an MCE SIGBUS while having our handler registered.
Unfortunately, there is no way around messing with SIGBUS when
MADV_POPULATE_WRITE is not applicable or not around.

Let's forward SIGBUS that don't belong to us to the already registered
handler and document the situation.

Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Michal Privoznik <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20211217134611 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge tag 'pull-target-arm-20220107' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
* Add dummy Aspeed AST2600 Display Port MCU (DPMCU)
* Add missing FEAT_TLBIOS instructions
* arm_gicv3_its: Various bug fixes and cleanups
* kudo-bmc: Add more devices

# gpg: Signature made Fri 07 Jan 2022 09:20:24 AM PST
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Peter Maydell <[email protected]>" [full]
# gpg:                 aka "Peter Maydell <[email protected]>" [full]
# gpg:                 aka "Peter Maydell <[email protected]>" [full]

* tag 'pull-target-arm-20220107' of https://git.linaro.org/people/pmaydell/qemu-arm:
  hw/arm: kudo add lm75s on bus 13
  hw/arm: add i2c muxes to kudo-bmc
  hw/arm: attach MMC to kudo-bmc
  hw/arm: Add kudo i2c eeproms.
  hw/intc/arm_gicv3_its: Rename max_l2_entries to num_l2_entries
  hw/intc/arm_gicv3_its: Fix various off-by-one errors
  hw/intc/arm_gicv3_its: Use FIELD macros for CTEs
  hw/intc/arm_gicv3_its: Correct comment about CTE RDBase field size
  hw/intc/arm_gicv3_its: Use FIELD macros for DTEs
  hw/intc/arm_gicv3_its: Correct handling of MAPI
  hw/intc/arm_gicv3_its: Don't misuse GITS_TYPE_PHYSICAL define
  hw/intc/arm_gicv3_its: Correct setting of TableDesc entry_sz
  hw/intc/arm_gicv3_its: Reduce code duplication in extract_table_params()
  hw/intc/arm_gicv3_its: Don't return early in extract_table_params() loop
  hw/intc/arm_gicv3_its: Remove maxids union from TableDesc
  hw/intc/arm_gicv3_its: Remove redundant ITS_CTLR_ENABLED define
  hw/intc/arm_gicv3_its: Correct off-by-one bounds check on rdbase
  target/arm: Add missing FEAT_TLBIOS instructions
  Add dummy Aspeed AST2600 Display Port MCU (DPMCU)

Signed-off-by: Richard Henderson <[email protected]>

hw/arm: kudo add lm75s on bus 13

Add the four lm75s behind the mux on bus 13.

Tested by booting the firmware:
lm75 42-0048: hwmon0: sensor 'lm75'
lm75 43-0049: supply vs not found, using dummy regulator
lm75 43-0049: hwmon1: sensor 'lm75'
lm75 44-0048: supply vs not found, using dummy regulator
lm75 44-0048: hwmon2: sensor 'lm75'
lm75 45-0049: supply vs not found, using dummy regulator
lm75 45-0049: hwmon3: sensor 'lm75'

Signed-off-by: Patrick Venture <[email protected]>
Reviewed-by: Titus Rwantare <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20220102215844.2888833 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/arm: add i2c muxes to kudo-bmc

Signed-off-by: Patrick Venture <[email protected]>
Reviewed-by: Hao Wu <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20220102215844.2888833 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/arm: attach MMC to kudo-bmc

Signed-off-by: Shengtan Mao <[email protected]>
Reviewed-by: Hao Wu <[email protected]>
Reviewed-by: Chris Rauer <[email protected]>
Message-id: 20220102215844.2888833 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/arm: Add kudo i2c eeproms.

Signed-off-by: Chris Rauer <[email protected]>
Reviewed-by: Hao Wu <[email protected]>
Reviewed-by: Patrick Venture <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: 20220102215844.2888833 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/intc/arm_gicv3_its: Rename max_l2_entries to num_l2_entries

In several places we have a local variable max_l2_entries which is
the number of entries which will fit in a level 2 table. The
calculations done on this value are correct; rename it to
num_l2_entries to fit the convention we're using in this code.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>

hw/intc/arm_gicv3_its: Fix various off-by-one errors

The ITS code has to check whether various parameters passed in
commands are in-bounds, where the limit is defined in terms of the
number of bits that are available for the parameter.  (For example,
the GITS_TYPER.Devbits ID register field specifies the number of
DeviceID bits minus 1, and device IDs passed in the MAPTI and MAPD
command packets must fit in that many bits.)

Currently we have off-by-one bugs in many of these bounds checks.
The typical problem is that we define a max_foo as 1 << n. In
the Devbits example, we set
  s->dt.max_ids = 1UL << (GITS_TYPER.Devbits + 1).
However later when we do the bounds check we write
  if (devid > s->dt.max_ids) { /* command error */ }
which incorrectly permits a devid of 1 << n.

These bugs will not cause QEMU crashes because the ID values being
checked are only used for accesses into tables held in guest memory
which we access with address_space_*() functions, but they are
incorrect behaviour of our emulation.

Fix them by standardizing on this pattern:
* bounds limits are named num_foos and are the 2^n value
   (equal to the number of valid foo values)
* bounds checks are either
   if (fooid < num_foos) { good }
   or
   if (fooid >= num_foos) { bad }

In this commit we fix the handling of the number of IDs
in the device table and the collection table, and the number
of commands that will fit in the command queue.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>

hw/intc/arm_gicv3_its: Use FIELD macros for CTEs

Use FIELD macros to handle CTEs, rather than ad-hoc mask-and-shift.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3_its: Correct comment about CTE RDBase field size

The comment says that in our CTE format the RDBase field is 36 bits;
in fact for us it is only 16 bits, because we use the RDBase format
where it specifies a 16-bit CPU number. The code already uses
RDBASE_PROCNUM_LENGTH (16) as the field width, so fix the comment
to match it.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3_its: Use FIELD macros for DTEs

Currently the ITS code that reads and writes DTEs uses open-coded
shift-and-mask to assemble the various fields into the 64-bit DTE
word. The names of the macros used for mask and shift values are
also somewhat inconsistent, and don't follow our usual convention
that a MASK macro should specify the bits in their place in the word.
Replace all these with use of the FIELD macro.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3_its: Correct handling of MAPI

The MAPI command takes arguments DeviceID, EventID, ICID, and is
defined to be equivalent to MAPTI DeviceID, EventID, EventID, ICID.
(That is, where MAPTI takes an explicit pINTID, MAPI uses the EventID
as the pINTID.)

We didn't quite get this right. In particular the error checks for
MAPI include "EventID does not specify a valid LPI identifier", which
is the same as MAPTI's error check for the pINTID field. QEMU's code
skips the pINTID error check entirely in the MAPI case.

We can fix this bug and in the process simplify the code by switching
to the obvious implementation of setting pIntid = eventid early
if ignore_pInt is true.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3_its: Don't misuse GITS_TYPE_PHYSICAL define

The GITS_TYPE_PHYSICAL define is the value we set the
GITS_TYPER.Physical field to -- this is 1 to indicate that we support
physical LPIs.  (Support for virtual LPIs is the GITS_TYPER.Virtual
field.) We also use this define as the *value* that we write into an
interrupt translation table entry's INTTYPE field, which should be 1
for a physical interrupt and 0 for a virtual interrupt.  Finally, we
use it as a *mask* when we read the interrupt translation table entry
INTTYPE field.

Untangle this confusion: define an ITE_INTTYPE_VIRTUAL and
ITE_INTTYPE_PHYSICAL to be the valid values of the ITE INTTYPE
field, and replace the ad-hoc collection of ITE_ENTRY_* defines with
use of the FIELD() macro to define the fields of an ITE and the
FIELD_EX64() and FIELD_DP64() macros to read and write them.
We use ITE in the new setup, rather than ITE_ENTRY, because
ITE stands for "Interrupt translation entry" and so the extra
"entry" would be redundant.

We take the opportunity to correct the name of the field that holds
the GICv4 'doorbell' interrupt ID (this is always the value 1023 in a
GICv3, which is why we were calling it the 'spurious' field).

The GITS_TYPE_PHYSICAL define is then used in only one place, where
we set the initial GITS_TYPER value.  Since GITS_TYPER.Physical is
essentially a boolean, hiding the '1' value behind a macro is more
confusing than helpful, so expand out the macro there and remove the
define entirely.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3_its: Correct setting of TableDesc entry_sz

We set the TableDesc entry_sz field from the appropriate
GITS_BASER.ENTRYSIZE field.  That ID register field specifies the
number of bytes per table entry minus one.  However when we use
td->entry_sz we assume it to be the number of bytes per table entry
(for instance we calculate the number of entries in a page by
dividing the page size by the entry size).

The effects of this bug are:
* we miscalculate the maximum number of entries in the table,
   so our checks on guest index values are wrong (too lax)
* when looking up an entry in the second level of an indirect
   table, we calculate an incorrect index into the L2 table.
   Because we make the same incorrect calculation on both
   reads and writes of the L2 table, the guest won't notice
   unless it's unlucky enough to use an index value that
   causes us to index off the end of the L2 table page and
   cause guest memory corruption in whatever follows

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3_its: Reduce code duplication in extract_table_params()

The extract_table_params() decodes the fields in the GITS_BASER<n>
registers into TableDesc structs. Since the fields are the same for
all the GITS_BASER<n> registers, there is currently a lot of code
duplication within the switch (type) statement. Refactor so that the
cases include only what is genuinely different for each type:
the calculation of the number of bits in the ID value that indexes
into the table.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>

hw/intc/arm_gicv3_its: Don't return early in extract_table_params() loop

In extract_table_params() we process each GITS_BASER<n> register. If
the register's Valid bit is not set, this means there is no
in-guest-memory table and so we should not try to interpret the other
fields in the register. This was incorrectly coded as a 'return'
rather than a 'break', so instead of looping round to process the
next GITS_BASER<n> we would stop entirely, treating any later tables
as being not valid also.

This has no real guest-visible effects because (since we don't have
GITS_TYPER.HCC != 0) the guest must in any case set up all the
GITS_BASER<n> to point to valid tables, so this only happens in an
odd misbehaving-guest corner case.

Fix the check to 'break', so that we leave the case statement and
loop back around to the next GITS_BASER<n>.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3_its: Remove maxids union from TableDesc

The TableDesc struct defines properties of the in-guest-memory tables
which the guest tells us about by writing to the GITS_BASER<n>
registers. This struct currently has a union 'maxids', but all the
fields of the union have the same type (uint32_t) and do the same
thing (record one-greater-than the maximum ID value that can be used
as an index into the table).

We're about to add another table type (the GICv4 vPE table); rather
than adding another specifically-named union field for that table
type with the same type as the other union fields, remove the union
entirely and just have a 'uint32_t max_ids' struct field.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3_its: Remove redundant ITS_CTLR_ENABLED define

We currently define a bitmask for the GITS_CTLR ENABLED bit in
two ways: as ITS_CTLR_ENABLED, and via the FIELD() macro as
R_GITS_CTLR_ENABLED_MASK. Consistently use the FIELD macro version
everywhere and remove the redundant ITS_CTLR_ENABLED define.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>

hw/intc/arm_gicv3_its: Correct off-by-one bounds check on rdbase

The checks in the ITS on the rdbase values in guest commands are
off-by-one: they permit the guest to pass us a value equal to
s->gicv3->num_cpu, but the valid values are 0...num_cpu-1. This
meant the guest could cause us to index off the end of the
s->gicv3->cpu[] array when calling gicv3_redist_process_lpi(), and we
would probably crash.

(This is not a security bug, because this code is only usable
with emulation, not with KVM.)

Cc: [email protected]
Fixes: 17fb5e36aabd4b ("hw/intc: GICv3 redistributor ITS processing")
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

target/arm: Add missing FEAT_TLBIOS instructions

Some of the instructions added by the FEAT_TLBIOS extension were forgotten
when the extension was originally added to QEMU.

Fixes: 7113d618505b ("target/arm: Add support for FEAT_TLBIOS")
Signed-off-by: Idan Horowitz <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 20211231103928.1455657 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Add dummy Aspeed AST2600 Display Port MCU (DPMCU)

AST2600 Display Port MCU introduces 0x18000000~0x1803FFFF as it's memory
and io address. If guest machine try to access DPMCU memory, it will
cause a fatal error.

Signed-off-by: Troy Lee <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Message-id: 20211210083034 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

util/oslib-posix: Support concurrent os_mem_prealloc() invocation

Add a mutex to protect the SIGBUS case, as we cannot mess concurrently
with the sigbus handler and we have to manage the global variable
sigbus_memset_context. The MADV_POPULATE_WRITE path can run
concurrently.

Note that page_mutex and page_cond are shared between concurrent
invocations, which shouldn't be a problem.

This is a preparation for future virtio-mem prealloc code, which will call
os_mem_prealloc() asynchronously from an iothread when handling guest
requests.

Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Michal Privoznik <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20211217134611 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE

Let's simplify the case when we only want a single thread and don't have
to mess with signal handlers.

Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Michal Privoznik <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20211217134611 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

util/oslib-posix: Don't create too many threads with small memory or little pages

Let's limit the number of threads to something sane, especially that
- We don't have more threads than the number of pages we have
- We don't have threads that initialize small (< 64 MiB) memory

Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Michal Privoznik <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20211217134611 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

util/oslib-posix: Introduce and use MemsetContext for touch_all_pages()

Let's minimize the number of global variables to prepare for
os_mem_prealloc() getting called concurrently and make the code a bit
easier to read.

The only consumer that really needs a global variable is the sigbus
handler, which will require protection via a mutex in the future either way
as we cannot concurrently mess with the SIGBUS handler.

Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Michal Privoznik <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20211217134611 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()

Let's sense support and use it for preallocation. MADV_POPULATE_WRITE
does not require a SIGBUS handler, doesn't actually touch page content,
and avoids context switches; it is, therefore, faster and easier to handle
than our current approach.

While MADV_POPULATE_WRITE is, in general, faster than manual
prefaulting, and especially faster with 4k pages, there is still value in
prefaulting using multiple threads to speed up preallocation.

More details on MADV_POPULATE_WRITE can be found in the Linux commits
4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault
page tables") and eb2faa513c24 ("mm/madvise: report SIGBUS as -EFAULT for
MADV_POPULATE_(READ|WRITE)"), and in the man page proposal [1].

This resolves the TODO in do_touch_pages().

In the future, we might want to look into using fallocate(), eventually
combined with MADV_POPULATE_READ, when dealing with shared file/fd
mappings and not caring about memory bindings.

[1] https://lkml.kernel.org/r/20210816081922 [email protected]

Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Michal Privoznik <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20211217134611 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

util/oslib-posix: Let touch_all_pages() return an error

Let's prepare touch_all_pages() for returning differing errors. Return
an error from the thread and report the last processed error.

Translate SIGBUS to -EFAULT, as a SIGBUS can mean all different kind of
things (memory error, read error, out of memory). When allocating memory
fails via the current SIGBUS-based mechanism, we'll get:
os_mem_prealloc: preallocating memory failed: Bad address

Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Michal Privoznik <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20211217134611 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

hw/vhost-user-blk: turn on VIRTIO_BLK_F_SIZE_MAX feature for virtio blk device

Turn on pre-defined feature VIRTIO_BLK_F_SIZE_MAX for virtio blk device to
avoid guest DMA request sizes which are too large for hardware spec.

Signed-off-by: Andy Pei <[email protected]>
Message-Id: <1641202092 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Raphael Norwitz <[email protected]>

hw/i386: expose a "smbios-entry-point-type" PC machine property

The i440fx and Q35 machine types are both hardcoded to use the
legacy SMBIOS 2.1 (32-bit) entry point. This is a sensible
conservative choice because SeaBIOS only supports SMBIOS 2.1

EDK2, however, can also support SMBIOS 3.0 (64-bit) entry points,
and QEMU already uses this on the ARM virt machine type.

This adds a property to allow the choice of SMBIOS entry point
versions For example to opt in to 64-bit SMBIOS entry point:

$QEMU -machine q35,smbios-entry-point-type=64

Based on a patch submitted by Daniel Berrangé.

Signed-off-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>
Message-Id: <20211026151100.1691925 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>

hw/smbios: Use qapi for SmbiosEntryPointType

This prepares for exposing the SMBIOS entry point type as a
machine property on x86.

Based on a patch from Daniel P. Berrangé.

Signed-off-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>
Message-Id: <20211026151100.1691925 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Acked-by: Markus Armbruster <[email protected]>

smbios: Rename SMBIOS_ENTRY_POINT_* enums

Rename the enums to match the naming style used by QAPI, and to
use "32" and "64" instead of "20" and "31".  This will allow us
to more easily move the enum to the QAPI schema later.

About the naming choice: "SMBIOS 2.1 entry point"/"SMBIOS 3.0
entry point" and "32-bit entry point"/"64-bit entry point" are
synonymous in the SMBIOS specification.  However, the phrases
"32-bit entry point" and "64-bit entry point" are used more often.

The new names also avoid confusion between the entry point format
and the actual SMBIOS version reported in the entry point
structure.  For example: currently the 32-bit entry point
actually report SMBIOS 2.8 support, not 2.1.

Based on portions of a patch submitted by Daniel P. Berrangé.

Signed-off-by: Eduardo Habkost <[email protected]>
Message-Id: <20211026151100.1691925 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie_aer: Don't trigger a LSI if none are defined

Skip triggering an LSI when the AER root error status is updated if no
LSI is defined for the device. We can have a root bridge with no LSI,
MSI and MSI-X defined, for example on POWER systems.

Signed-off-by: Frederic Barrat <[email protected]>
Message-Id: <20211116170133 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>

pci: Export the pci_intx() function

Move the pci_intx() definition to the PCI header file, so that it can
be called from other PCI files. It is used by the next patch.

Signed-off-by: Frederic Barrat <[email protected]>
Message-Id: <20211116170133 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>

vhost-user-blk: propagate error return from generic vhost

Fix the only callsite that doesn't propagate the error code from the
generic vhost code.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20211111153354 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Raphael Norwitz <[email protected]>

vhost: stick to -errno error return convention

The generic vhost code expects that many of the VhostOps methods in the
respective backends set errno on errors. However, none of the existing
backends actually bothers to do so. In a number of those methods errno
from the failed call is clobbered by successful later calls to some
library functions; on a few code paths the generic vhost code then
negates and returns that errno, thus making failures look as successes
to the caller.

As a result, in certain scenarios (e.g. live migration) the device
doesn't notice the first failure and goes on through its state
transitions as if everything is ok, instead of taking recovery actions
(break and reestablish the vhost-user connection, cancel migration, etc)
before it's too late.

To fix this, consolidate on the convention to return negated errno on
failures throughout generic vhost, and use it for error propagation.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20211111153354 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-user: stick to -errno error return convention

VhostOps methods in user_ops are not very consistent in their error
returns: some return negated errno while others just -1.

Make sure all of them consistently return negated errno. This also
helps error propagation from the functions being called inside.
Besides, this synchronizes the error return convention with the other
two vhost backends, kernel and vdpa, and will therefore allow for
consistent error propagation in the generic vhost code (in a followup
patch).

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20211111153354 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-vdpa: stick to -errno error return convention

Almost all VhostOps methods in vdpa_ops follow the convention of
returning negated errno on error.

Adjust the few that don't. To that end, rework vhost_vdpa_add_status to
check if setting of the requested status bits has succeeded and return
the respective error code it hasn't, and propagate the error codes
wherever it's appropriate.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20211111153354 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-backend: stick to -errno error return convention

Almost all VhostOps methods in kernel_ops follow the convention of
returning negated errno on error.

Adjust the only one that doesn't.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20211111153354 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>

vhost-backend: avoid overflow on memslots_limit

Fix the (hypothetical) potential problem when the value parsed out of
the vhost module parameter in sysfs overflows the return value from
vhost_kernel_memslots_limit.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20211111153354 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

chardev/char-socket: tcp_chr_sync_read: don't clobber errno

After the return from tcp_chr_recv, tcp_chr_sync_read calls into a
function which eventually makes a system call and may clobber errno.

Make a copy of errno right after tcp_chr_recv and restore the errno on
return from tcp_chr_sync_read.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20211111153354 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

chardev/char-socket: tcp_chr_recv: don't clobber errno

tcp_chr_recv communicates the specific error condition to the caller via
errno. However, after setting it, it may call into some system calls or
library functions which can clobber the errno.

Avoid this by moving the errno assignment to the end of the function.

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20211111153354 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

vhost-user-blk: reconnect on any error during realize

vhost-user-blk realize only attempts to reconnect if the previous
connection attempt failed on "a problem with the connection and not an
error related to the content (which would fail again the same way in the
next attempt)".

However this distinction is very subtle, and may be inadvertently broken
if the code changes somewhere deep down the stack and a new error gets
propagated up to here.

OTOH now that the number of reconnection attempts is limited it seems
harmless to try reconnecting on any error.

So relax the condition of whether to retry connecting to check for any
error.

This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
during realize".

Signed-off-by: Roman Kagan <[email protected]>
Message-Id: <20211111153354 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Raphael Norwitz <[email protected]>

trace-events,pci: unify trace events format

Unify format used by trace_pci_update_mappings_del(),
trace_pci_update_mappings_add(), trace_pci_cfg_write() and
trace_pci_cfg_read() to print the device name and bus number,
slot number and function number.

For instance:

  pci_cfg_read virtio-net-pci 00:0 @0x20 -> 0xffffc00c
  pci_cfg_write virtio-net-pci 00:0 @0x20 <- 0xfea0000c
  pci_update_mappings_del d=0x555810b92330 01:00.0 4,0xffffc000+0x4000
  pci_update_mappings_add d=0x555810b92330 01:00.0 4,0xfea00000+0x4000

becomes

  pci_cfg_read virtio-net-pci 01:00.0 @0x20 -> 0xffffc00c
  pci_cfg_write virtio-net-pci 01:00.0 @0x20 <- 0xfea0000c
  pci_update_mappings_del virtio-net-pci 01:00.0 4,0xffffc000+0x4000
  pci_update_mappings_add virtio-net-pci 01:00.0 4,0xfea00000+0x4000

Signed-off-by: Laurent Vivier <[email protected]>
Message-Id: <20211105192541 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Yanan Wang <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-pci: add support for configure interrupt

Add support for configure interrupt, The process is used kvm_irqfd_assign
to set the gsi to kernel. When the configure notifier was signal by
host, qemu will inject a msix interrupt to guest

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-mmio: add support for configure interrupt

Add configure interrupt support for virtio-mmio bus. This
interrupt will be working while the backend is vhost-vdpa

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-net: add support for configure interrupt

Add functions to support configure interrupt in virtio_net
The functions are config_pending and config_mask, while
this input idx is VIRTIO_CONFIG_IRQ_IDX will check the
function of configure interrupt.

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge tag 'linux-user-for-7.0-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging

linux-user pull request 20220106

update netlink entries
nios2 fixes
/proc/self/maps fixes
set/getscheduler update
prctl cleanup and fixes
target_signal.h cleanup
and some trivial fixes

# gpg: Signature made Thu 06 Jan 2022 02:41:07 AM PST
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Laurent Vivier <[email protected]>" [undefined]
# gpg:                 aka "Laurent Vivier <[email protected]>" [undefined]
# gpg:                 aka "Laurent Vivier (Red Hat) <[email protected]>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* tag 'linux-user-for-7.0-pull-request' of https://gitlab.com/laurent_vivier/qemu: (27 commits)
  linux-user: netlink: update IFLA_BRPORT entries
  linux-user: netlink: Add IFLA_VFINFO_LIST
  linux-user: netlink: update IFLA entries
  linux-user/syscall.c: malloc to g_try_malloc
  linux-user/nios2: Use set_sigmask in do_rt_sigreturn
  linux-user/nios2: Fix sigmask in setup_rt_frame
  linux-user/nios2: Fix EA vs PC confusion
  linux-user/nios2: Map a real kuser page
  linux-user/elfload: Rename ARM_COMMPAGE to HI_COMMPAGE
  linux-user/nios2: Fixes for signal frame setup
  linux-user/nios2: Properly emulate EXCP_TRAP
  linux-user/syscall.c: fix missed flag for shared memory in open_self_maps
  linux-user: call set/getscheduler set/getparam directly
  linux-user: add sched_getattr support
  linux-user/signal: Map exit signals in SIGCHLD siginfo_t
  target/sh4: Implement prctl_unalign_sigbus
  target/hppa: Implement prctl_unalign_sigbus
  target/alpha: Implement prctl_unalign_sigbus
  linux-user: Add code for PR_GET/SET_UNALIGN
  linux-user: Disable more prctl subcodes
  ...

Signed-off-by: Richard Henderson <[email protected]>

vhost: add support for configure interrupt

Add functions to support configure interrupt.
The configure interrupt process will start in vhost_dev_start
and stop in vhost_dev_stop.

Also add the functions to support vhost_config_pending and
vhost_config_mask, for masked_config_notifier, we only
use the notifier saved in vq 0.

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio: add support for configure interrupt

Add the functions to support the configure interrupt in virtio
The function virtio_config_guest_notifier_read will notify the
guest if there is an configure interrupt.
The function virtio_config_set_guest_notifier_fd_handler is
to set the fd hander for the notifier

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-vdpa: add support for config interrupt

Add new call back function in vhost-vdpa, this function will
set the event fd to kernel. This function will be called
in the vhost_dev_start and vhost_dev_stop

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: introduce new VhostOps vhost_set_config_call

This patch introduces new VhostOps vhost_set_config_call. This function allows the
vhost to set the event fd to kernel

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-pci: decouple the single vector from the interrupt process

To reuse the interrupt process in configure interrupt
Need to decouple the single vector from the interrupt process. Add new function
kvm_virtio_pci_vector_use_one and _release_one. These functions are use
for the single vector, the whole process will finish in a loop for the vq number.

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-pci: decouple notifier from interrupt process

To reuse the notifier process in configure interrupt.
Use the virtio_pci_get_notifier function to get the notifier.
the INPUT of this function is the IDX, the OUTPUT is notifier and
the vector

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio: introduce macro IRTIO_CONFIG_IRQ_IDX

To support configure interrupt for vhost-vdpa
Introduce VIRTIO_CONFIG_IRQ_IDX -1 as configure interrupt's queue index,
Then we can reuse the functions guest_notifier_mask and guest_notifier_pending.
Add the check of queue index in these drivers, if the driver does not support
configure interrupt, the function will just return

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20211104164827 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

acpi: validate hotplug selector on access

When bus is looked up on a pci write, we didn't
validate that the lookup succeeded.
Fuzzers thus can trigger QEMU crash by dereferencing the NULL
bus pointer.

Fixes: b32bd763a1 ("pci: introduce acpi-index property for PCI device")
Fixes: CVE-2021-4158
Cc: "Igor Mammedov" <[email protected]>
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/770
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Ani Sinha <[email protected]>

linux-user: netlink: update IFLA_BRPORT entries

add IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT and IFLA_BRPORT_MCAST_EHT_HOSTS_CNT

  # QEMU_LOG=unimp ip a
  Unknown QEMU_IFLA_BRPORT type 37
  Unknown QEMU_IFLA_BRPORT type 38

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20211219154514.2165728 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: netlink: Add IFLA_VFINFO_LIST

# QEMU_LOG=unimp ip a
Unknown host QEMU_IFLA type: 22

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20211219154514.2165728 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: netlink: update IFLA entries

Add IFLA_PHYS_PORT_ID, IFLA_PARENT_DEV_NAME, IFLA_PARENT_DEV_BUS_NAME

  # QEMU_LOG=unimp ip a
  Unknown host QEMU_IFLA type: 56
  Unknown host QEMU_IFLA type: 57
  Unknown host QEMU_IFLA type: 34

Signed-off-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20211219154514.2165728 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/syscall.c: malloc to g_try_malloc

Use g_try_malloc instead of malloc to alocate the target ifconfig.
Also replace the corresponding free with g_free.

Signed-off-by: Ahmed Abouzied <[email protected]>
Message-Id: <20220104143841 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/nios2: Use set_sigmask in do_rt_sigreturn

Using do_sigprocmask directly was incorrect, as it will
leave the signal blocked by the outer layers of linux-user.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <20211221025012.1057923 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/nios2: Fix sigmask in setup_rt_frame

Do not cast the signal mask elements; trust __put_user.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <20211221025012.1057923 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/nios2: Fix EA vs PC confusion

The real kernel will talk about the user PC as EA,
because that's where the hardware will have copied it,
and where it expects to put it to then use ERET.
But qemu does not emulate all of the exception stuff
while emulating user-only. Manipulate PC directly.

This fixes signal entry and return, and eliminates
some slight confusion from target_cpu_copy_regs.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20211221025012.1057923 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/nios2: Map a real kuser page

The first word of page1 is data, so the whole thing
can't be implemented with emulation of addresses.
Use init_guest_commpage for the allocation.

Hijack trap number 16 to implement cmpxchg.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20211221025012.1057923 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/elfload: Rename ARM_COMMPAGE to HI_COMMPAGE

Arm will no longer be the only target requiring a commpage,
but it will continue to be the only target placing the page
at the high end of the address space.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <20211221025012.1057923 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/nios2: Fixes for signal frame setup

Do not confuse host and guest addresses. Lock and unlock
the target_rt_sigframe structure in setup_rt_sigframe.

Since rt_setup_ucontext always returns 0, drop the return
value entirely. This eliminates the only write to the err
variable in setup_rt_sigframe.

Always copy the siginfo structure.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <20211221025012.1057923 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/nios2: Properly emulate EXCP_TRAP

The real kernel has to load the instruction and extract
the imm5 field; for qemu, modify the translator to do this.

The use of R_AT for this in cpu_loop was a bug. Handle
the other trap numbers as per the kernel's trap_table.

Reviewed-by: Alex Bennée <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20211221025012.1057923 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/syscall.c: fix missed flag for shared memory in open_self_maps

The possible variants for region type in /proc/self/maps are either
private "p" or shared "s". In the current implementation,
we mark shared regions as "-". It could break memory mapping parsers
such as included into ASan/HWASan sanitizers.

Fixes: 01ef6b9e4e4e ("linux-user: factor out reading of /proc/self/maps")
Signed-off-by: Andrey Kazmin <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Acked-by: Alex Bennée <[email protected]>
Message-Id: <20211227125048 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: call set/getscheduler set/getparam directly

There seems to be difference in syscall and libc definition of these
methods and therefore musl does not implement them (1e21e78bf7). Call
syscall directly to ensure the behavior of the libc of user application,
not the libc that was used to build QEMU.

Signed-off-by: Tonis Tiigi <[email protected]>
Message-Id: <20220105041819 [email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: add sched_getattr support

These syscalls are not exposed by glibc. The struct type need to be
redefined as it can't be included directly before
https://lkml.org/lkml/2020/5/28/810 .

sched_attr type can grow in future kernel versions. When client sends
values that QEMU does not understand it will return E2BIG with same
semantics as old kernel would so client can retry with smaller inputs.

Signed-off-by: Tonis Tiigi <[email protected]>
Message-Id: <20220105041819 [email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/signal: Map exit signals in SIGCHLD siginfo_t

When converting a siginfo_t from waitid(), the interpretation of si_status
depends on the value of si_code: For CLD_EXITED, it is an exit code and
should be copied verbatim. For other codes, it is a signal number
(possibly with additional high bits from ptrace) that should be mapped.

This code was previously changed in commit 1c3dfb506ea3
("linux-user/signal: Decode waitid si_code"), but the fix was
incomplete.

Tested with the following test program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main() {
     pid_t pid = fork();
     if (pid == 0) {
     exit(12);
     } else {
     siginfo_t siginfo = {};
     waitid(P_PID, pid, &siginfo, WEXITED);
     printf("Code: %d, status: %d\n", (int)siginfo.si_code, (int)siginfo.si_status);
     }

     pid = fork();
     if (pid == 0) {
     raise(SIGUSR2);
     } else {
     siginfo_t siginfo = {};
     waitid(P_PID, pid, &siginfo, WEXITED);
     printf("Code: %d, status: %d\n", (int)siginfo.si_code, (int)siginfo.si_status);
     }
    }

Output with an x86_64 host and mips64el target before 1c3dfb506ea3
(incorrect: exit code 12 is translated like a signal):

    Code: 1, status: 17
    Code: 2, status: 17

After 1c3dfb506ea3 (incorrect: signal number is not translated):

    Code: 1, status: 12
    Code: 2, status: 12

With this patch:

    Code: 1, status: 12
    Code: 2, status: 17

Signed-off-by: Matthias Schiffer <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <81534fde7cdfc6acea4889d886fbefdd606630fb.1635019124 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target/sh4: Implement prctl_unalign_sigbus

Leave TARGET_ALIGNED_ONLY set, but use the new CPUState
flag to set MO_UNALN for the instructions that the kernel
handles in the unaligned trap.

The Linux kernel does not handle all memory operations: no
floating-point and no MAC.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20211227150127.2659293 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target/hppa: Implement prctl_unalign_sigbus

Leave TARGET_ALIGNED_ONLY set, but use the new CPUState
flag to set MO_UNALN for the instructions that the kernel
handles in the unaligned trap.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20211227150127.2659293 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

target/alpha: Implement prctl_unalign_sigbus

Leave TARGET_ALIGNED_ONLY set, but use the new CPUState
flag to set MO_UNALN for the instructions that the kernel
handles in the unaligned trap.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20211227150127.2659293 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add code for PR_GET/SET_UNALIGN

This requires extra work for each target, but adds the
common syscall code, and the necessary flag in CPUState.

Reviewed-by: Warner Losh <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <20211227150127.2659293 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Disable more prctl subcodes

Create a list of subcodes that we want to pass on, a list of
subcodes that should not be passed on because they would affect
the running qemu itself, and a list that probably could be
implemented but require extra work. Do not pass on unknown subcodes.

Reviewed-by: Laurent Vivier <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <20211227150127.2659293 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Split out do_prctl and subroutines

Since the prctl constants are supposed to be generic, supply
any that are not provided by the host.

Split out subroutines for PR_GET_FP_MODE, PR_SET_FP_MODE,
PR_GET_VL, PR_SET_VL, PR_RESET_KEYS, PR_SET_TAGGED_ADDR_CTRL,
PR_GET_TAGGED_ADDR_CTRL. Return EINVAL for guests that do
not support these options rather than pass them on to the host.

Reviewed-by: Laurent Vivier <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <20211227150127.2659293 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Remove TARGET_SIGSTKSZ

TARGET_SIGSTKSZ is not used, we should remove it.

Signed-off-by: Song Gao <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <1637893388 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: target_syscall.h remove definition TARGET_MINSIGSTKSZ

TARGET_MINSIGSTKSZ has been defined in generic/signal.h
or target_signal.h, We don't need to define it again.

Signed-off-by: Song Gao <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <1637893388 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Move target_signal.h generic definitions to generic/signal.h

No code change

Suggested-by: Richard Henderson <[email protected]>
Signed-off-by: Song Gao <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <1637893388 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Mark cpu_loop() with noreturn attribute

cpu_loop() never exits, so mark it with QEMU_NORETURN.

Reviewed-by: Richard Henderson <[email protected]>
Reviewed-By: Warner Losh <[email protected]>
Reviewed-by: Bin Meng <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Warner Losh <[email protected]>
Message-Id: <20211106113916 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user/hexagon: Use generic target_stat64 structure

Linux Hexagon port doesn't define a specific 'struct stat'
but uses the generic one (see Linux commit 6103ec56c65c [*]
"asm-generic: add generic ABI headers" which predates the
introduction of the Hexagon port).

Remove the target specific target_stat (which in fact is the
target_stat64 structure but uses incorrect target_long and
ABI unsafe long long types) and use the generic target_stat64
instead.

[*] https://github.com/torvalds/linux/commit/6103ec56c65c3#diff-5f59b07b38273b7d6a74193bc81a8cd18928c688276eae20cb10c569de3253ee

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Taylor Simpson <[email protected]>
Tested-by: Taylor Simpson <[email protected]>
Message-Id: <20211116210919.2823206 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

qemu-binfmt-conf.sh: fix -F option

qemu-binfmt-conf.sh should use "-F" as short option for "--qemu-suffix".
Fix the getopt call to make this work.

Fixes: 7155be7cda5c ("qemu-binfmt-conf.sh: allow to provide a suffix to the interpreter name")
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <20211129135100 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

virtio-mem: Don't skip alignment checks when warning about block size

If we warn about the block size being smaller than the default, we skip
some alignment checks.

This can currently only fail on x86-64, when specifying a block size of
1 MiB, however, we detect the THP size of 2 MiB.

Fixes: 228957fea3a9 ("virtio-mem: Probe THP size to determine default block size")
Cc: "Michael S. Tsirkin" <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20211011173305 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge tag 'pull-request-2022-01-05' of https://gitlab.com/thuth/qemu into staging

* Add compat machines for 7.0
* Some minor qtest and unit test improvements
* Remove -no-quit option
* Fixes for the docs

# gpg: Signature made Wed 05 Jan 2022 02:10:49 AM PST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Thomas Huth <[email protected]>" [undefined]
# gpg:                 aka "Thomas Huth <[email protected]>" [undefined]
# gpg:                 aka "Thomas Huth <[email protected]>" [unknown]
# gpg:                 aka "Thomas Huth <[email protected]>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2022-01-05' of https://gitlab.com/thuth/qemu:
  docs/tools/qemu-trace-stap.rst: Do not hard-code the QEMU binary name
  gitlab-ci: Enable docs in the centos job
  docs/sphinx: fix compatibility with sphinx < 1.8
  qemu-options: Remove the deprecated -no-quit option
  tests/unit/test-util-sockets: Use g_file_open_tmp() to create temp file
  tests/qtest/hd-geo-test: Check for the lsi53c895a controller before using it
  tests/qtest/test-x86-cpuid-compat: Check for machines before using them
  hw: Add compat machines for 7.0

Signed-off-by: Richard Henderson <[email protected]>

docs/tools/qemu-trace-stap.rst: Do not hard-code the QEMU binary name

In downstream, we want to use a different name for the QEMU binary,
and some people might also use the docs for non-x86 binaries, that's
why we already created the |qemu_system| placeholder in the past.
Use it now in the stap trace doc, too.

Message-Id: <20220104103319 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>

gitlab-ci: Enable docs in the centos job

We just ran into a problem that the docs don't build on RHEL8 / CentOS 8
anymore. Seems like these distros are using one of the oldest Sphinx
versions that we still have to support. Thus enable the docs build in
the CI on CentOS so that such bugs don't slip in so easily again.

Message-Id: <20220104091240 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>

docs/sphinx: fix compatibility with sphinx < 1.8

SphinxDirective was added with sphinx 1.8 (2018-09-13).

Reported-by: Thomas Huth <[email protected]>
Signed-off-by: Marc-André Lureau <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Message-Id: <20220104074649.1712440 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

qemu-options: Remove the deprecated -no-quit option

This option was just a wrapper around the -display ...,window-close=off
parameter, and the name "no-quit" is rather confusing compared to
"window-close" (since there are still other means to quit the emulator),
so let's remove this now.

Message-Id: <20211215082417 [email protected]>
Acked-by: Michal Prívozník <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>

tests/unit/test-util-sockets: Use g_file_open_tmp() to create temp file

Similarly to commit e63ed64c6d1 ("tests/qtest/virtio-net-failover:
Use g_file_open_tmp() to create temporary file"), avoid calling
g_test_rand_int() before g_test_init(): use g_file_open_tmp().

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20211224234504.3413370 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

tests/qtest/hd-geo-test: Check for the lsi53c895a controller before using it

The lsi53c895a SCSI controller might have been disabled in the target
binary, so let's check for its availability first before using it.

Message-Id: <20211222153600 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

tests/qtest/test-x86-cpuid-compat: Check for machines before using them

The user might have disabled the pc-i440fx machine type (or it's older
versions, like done in downstream RHEL) in the QEMU binary, so let's
better check whether the machine types are available before using them.

Message-Id: <20211222153923.1000420 [email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>

hw: Add compat machines for 7.0

Add 7.0 machine types for arm/i440fx/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Acked-by: Cédric Le Goater <[email protected]>
Message-Id: <20211217143948 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

common-user: Really fix i386 calls to safe_syscall_set_errno_tail

Brown bag time: offset 0 from esp is the return address,
offset 4 is the first argument.

Fixes: d7478d4229f0 ("common-user: Fix tail calls to safe_syscall_set_errno_tail")
Signed-off-by: Richard Henderson <[email protected]>

Merge tag 'pull-tcg-20220104' of https://gitlab.com/rth7680/qemu into staging

Fix for safe_syscall_base.
Fix for folding of vector add/sub.
Fix build on loongarch64 with gcc 8.
Remove decl for qemu_run_machine_init_done_notifiers.

# gpg: Signature made Tue 04 Jan 2022 04:39:35 PM PST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Richard Henderson <[email protected]>" [ultimate]

* tag 'pull-tcg-20220104' of https://gitlab.com/rth7680/qemu:
  common-user: Fix tail calls to safe_syscall_set_errno_tail
  sysemu: Cleanup qemu_run_machine_init_done_notifiers()
  linux-user: Fix trivial build error on loongarch64 hosts
  tcg/optimize: Fix folding of vector ops

Signed-off-by: Richard Henderson <[email protected]>

common-user: Fix tail calls to safe_syscall_set_errno_tail

For the ABIs in which the syscall return register is not
also the first function argument register, move the errno
value into the correct place.

Fixes: a3310c0397e2 ("linux-user: Move syscall error detection into safe_syscall_base")
Reported-by: Laurent Vivier <[email protected]>
Tested-by: Laurent Vivier <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Message-Id: <20220104190454 [email protected]>

sysemu: Cleanup qemu_run_machine_init_done_notifiers()

Remove qemu_run_machine_init_done_notifiers() since no implementation
and user.

Fixes: f66dc8737c9 ("vl: move all generic initialization out of vl.c")
Signed-off-by: Xiaoyao Li <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20220104024136.1433545 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

linux-user: Fix trivial build error on loongarch64 hosts

When building using GCC 8.3.0 on loongarch64 (Loongnix) we get:

  In file included from ../linux-user/signal.c:33:
  ../linux-user/host/loongarch64/host-signal.h: In function ‘host_signal_write’:
  ../linux-user/host/loongarch64/host-signal.h:57:9: error: a label can only be part of a statement and a declaration is not a statement
         uint32_t sel = (insn >> 15) & 0b11111111111;
         ^~~~~~~~

We don't use the 'sel' variable more than once, so drop it.

Meson output for the record:

  Host machine cpu family: loongarch64
  Host machine cpu: loongarch64
  C compiler for the host machine: cc (gcc 8.3.0 "cc (Loongnix 8.3.0-6.lnd.vec.27) 8.3.0")
  C linker for the host machine: cc ld.bfd 2.31.1-system

Fixes: ad812c3bd65 ("linux-user: Implement CPU-specific signal handler for loongarch64 hosts")
Reported-by: Song Gao <[email protected]>
Suggested-by: Song Gao <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: WANG Xuerui <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20220104215027.2180972 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg/optimize: Fix folding of vector ops

Bitwise operations are easy to fold, because the operation is
identical regardless of element size. But add and sub need
extra element size info that is not currently propagated.

Fixes: 2f9f08ba43d
Cc: [email protected]
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/799
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

Merge tag 'pull-ppc-20220104' of https://github.com/legoater/qemu into staging

ppc 7.0 queue:

* Cleanup of PowerNV PHBs (Daniel and Cedric)
* Cleanup and fixes for PPC405 machine (Cedric)
* Fix for xscvspdpn (Matheus)
* Rework of powerpc exception handling 1/n (Fabiano)
* Optimisation for PMU (Richard and Daniel)

# gpg: Signature made Mon 03 Jan 2022 11:04:06 PM PST
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <[email protected]>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-ppc-20220104' of https://github.com/legoater/qemu: (26 commits)
  target/ppc: do not call hreg_compute_hflags() in helper_store_mmcr0()
  target/ppc: Use env->pnc_cyc_cnt
  target/ppc: Rewrite pmu_increment_insns
  target/ppc: Cache per-pmc insn and cycle count settings
  target/ppc: powerpc_excp: Stop passing excp_model around
  target/ppc: powerpc_excp: Move system call vectored code together
  target/ppc: powerpc_excp: Set vector earlier
  target/ppc: powerpc_excp: Add excp_vectors bounds check
  target/ppc: powerpc_excp: Set alternate SRRs directly
  target/ppc: do not silence snan in xscvspdpn
  ppc/ppc405: Dump specific registers
  ppc/ppc405: Introduce a store helper for SPR_40x_PID
  ppc/ppc405: Fix timer initialization
  ppc/ppc405: Rework ppc_40x_timers_init() to use a PowerPCCPU
  ppc/ppc405: Restore TCR and STR write handlers
  ppc/ppc405: Activate MMU logs
  ppc/ppc4xx: Convert printfs()
  target/ppc: Print out literal exception names in logs
  target/ppc: Remove static inline
  target/ppc: Check effective address validity
  ...

Signed-off-by: Richard Henderson <[email protected]>