Git Repo - qemu.git/log

net/colo-compare.c: Fix incorrect return when input wrong size

Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net/colo-compare.c: Fix ACK track reverse issue

The TCP protocol ACK maybe bigger than uint32_t MAX.
At this time, the ACK will reverse to 0. This patch
fix the max_ack and min_ack track issue.

Signed-off-by: Zhang Chen <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

net: vmxnet3: validate configuration values during activate (CVE-2021-20203)

While activating device in vmxnet3_acticate_device(), it does not
validate guest supplied configuration values against predefined
minimum - maximum limits. This may lead to integer overflow or
OOB access issues. Add checks to avoid it.

Fixes: CVE-2021-20203
Buglink: https://bugs.launchpad.net/qemu/+bug/1913873
Reported-by: Gaoning Pan <[email protected]>
Signed-off-by: Prasad J Pandit <[email protected]>
Signed-off-by: Jason Wang <[email protected]>

Merge tag 'sev-hashes-pull-request' of https://gitlab.com/berrange/qemu into staging

Add property for requesting AMD SEV measured kernel launch

- The 'sev-guest' object gains a boolean 'kernel-hashes' property
   which must be enabled to request a measured kernel launch.

# gpg: Signature made Thu 18 Nov 2021 02:33:25 PM CET
# gpg:                using RSA key DAF3A6FDB26B62912D0E8E3FBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>" [full]
# gpg:                 aka "Daniel P. Berrange <[email protected]>" [full]

* tag 'sev-hashes-pull-request' of https://gitlab.com/berrange/qemu:
  target/i386/sev: Replace qemu_map_ram_ptr with address_space_map
  target/i386/sev: Perform padding calculations at compile-time
  target/i386/sev: Fail when invalid hashes table area detected
  target/i386/sev: Rephrase error message when no hashes table in guest firmware
  target/i386/sev: Add kernel hashes only if sev-guest.kernel-hashes=on
  qapi/qom,target/i386: sev-guest: Introduce kernel-hashes=on|off option

Signed-off-by: Richard Henderson <[email protected]>

target/i386/sev: Replace qemu_map_ram_ptr with address_space_map

Use address_space_map/unmap and check for errors.

Signed-off-by: Dov Murik <[email protected]>
Acked-by: Brijesh Singh <[email protected]>
[Two lines wrapped for length - Daniel]
Signed-off-by: Daniel P. Berrangé <[email protected]>

target/i386/sev: Perform padding calculations at compile-time

In sev_add_kernel_loader_hashes, the sizes of structs are known at
compile-time, so calculate needed padding at compile-time.

No functional change intended.

Signed-off-by: Dov Murik <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Acked-by: Brijesh Singh <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

target/i386/sev: Fail when invalid hashes table area detected

Commit cff03145ed3c ("sev/i386: Introduce sev_add_kernel_loader_hashes
for measured linux boot", 2021-09-30) introduced measured direct boot
with -kernel, using an OVMF-designated hashes table which QEMU fills.

However, no checks are performed on the validity of the hashes area
designated by OVMF.  Specifically, if OVMF publishes the
SEV_HASH_TABLE_RV_GUID entry but it is filled with zeroes, this will
cause QEMU to write the hashes entries over the first page of the
guest's memory (GPA 0).

Add validity checks to the published area.  If the hashes table area's
base address is zero, or its size is too small to fit the aligned hashes
table, display an error and stop the guest launch.  In such case, the
following error will be displayed:

    qemu-system-x86_64: SEV: guest firmware hashes table area is invalid (base=0x0 size=0x0)

Signed-off-by: Dov Murik <[email protected]>
Reported-by: Brijesh Singh <[email protected]>
Acked-by: Brijesh Singh <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

target/i386/sev: Rephrase error message when no hashes table in guest firmware

Signed-off-by: Dov Murik <[email protected]>
Acked-by: Brijesh Singh <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

target/i386/sev: Add kernel hashes only if sev-guest.kernel-hashes=on

Commit cff03145ed3c ("sev/i386: Introduce sev_add_kernel_loader_hashes
for measured linux boot", 2021-09-30) introduced measured direct boot
with -kernel, using an OVMF-designated hashes table which QEMU fills.

However, if OVMF doesn't designate such an area, QEMU would completely
abort the VM launch.  This breaks launching with -kernel using older
OVMF images which don't publish the SEV_HASH_TABLE_RV_GUID.

Fix that so QEMU will only look for the hashes table if the sev-guest
kernel-hashes option is set to on.  Otherwise, QEMU won't look for the
designated area in OVMF and won't fill that area.

To enable addition of kernel hashes, launch the guest with:

    -object sev-guest,...,kernel-hashes=on

Signed-off-by: Dov Murik <[email protected]>
Reported-by: Tom Lendacky <[email protected]>
Acked-by: Brijesh Singh <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

qapi/qom,target/i386: sev-guest: Introduce kernel-hashes=on|off option

Introduce new boolean 'kernel-hashes' option on the sev-guest object.
It will be used to to decide whether to add the hashes of
kernel/initrd/cmdline to SEV guest memory when booting with -kernel.
The default value is 'off'.

Signed-off-by: Dov Murik <[email protected]>
Acked-by: Brijesh Singh <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>

Merge tag 'vfio-fixes-20211117.0' of git://github.com/awilliam/qemu-vfio into staging

VFIO fixes 2021-11-17

* Fix hostwin memory leak (Peng Liang)

# gpg: Signature made Wed 17 Nov 2021 08:05:09 PM CET
# gpg:                using RSA key 42F6C04E540BD1A99E7B8A90239B9B6E3BB08B22
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Alex Williamson <[email protected]>" [full]
# gpg:                 aka "Alex Williamson <[email protected]>" [full]
# gpg:                 aka "Alex Williamson <[email protected]>" [full]
# gpg:                 aka "Alex Williamson <[email protected]>" [full]

* tag 'vfio-fixes-20211117.0' of git://github.com/awilliam/qemu-vfio:
  vfio: Fix memory leak of hostwin

Signed-off-by: Richard Henderson <[email protected]>

vfio: Fix memory leak of hostwin

hostwin is allocated and added to hostwin_list in vfio_host_win_add, but
it is only deleted from hostwin_list in vfio_host_win_del, which causes
a memory leak. Also, freeing all elements in hostwin_list is missing in
vfio_disconnect_container.

Fix: 2e4109de8e58 ("vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)")
CC: [email protected]
Signed-off-by: Peng Liang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>

Merge tag 'pull-request-2021-11-17' of https://gitlab.com/thuth/qemu into staging

* Remove some unused #defines in s390x code
* rSTify some of the development process pages from the Wiki
* Revert a useless patch in the device-crash-test script
* Bump timeout of the Cirrus-CI jobs to 80 minutes

# gpg: Signature made Wed 17 Nov 2021 11:13:43 AM CET
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Thomas Huth <[email protected]>" [full]
# gpg:                 aka "Thomas Huth <[email protected]>" [full]
# gpg:                 aka "Thomas Huth <[email protected]>" [unknown]
# gpg:                 aka "Thomas Huth <[email protected]>" [full]

* tag 'pull-request-2021-11-17' of https://gitlab.com/thuth/qemu:
  gitlab-ci/cirrus: Increase timeout to 80 minutes
  Revert "device-crash-test: Ignore errors about a bus not being available"
  docs: rSTify the "SubmitAPatch" wiki
  docs: rSTify the "SubmitAPullRequest" wiki
  docs: rSTify the "TrivialPatches" wiki
  target/s390x/cpu.h: Remove unused SIGP_MODE defines

Signed-off-by: Richard Henderson <[email protected]>

Merge tag 'pull-riscv-to-apply-20211117-1' of github.com:alistair23/qemu into staging

Sixth RISC-V PR for QEMU 6.2

- Fix build for riscv hosts
- Soft code alphabetically

# gpg: Signature made Wed 17 Nov 2021 10:19:25 AM CET
# gpg:                using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054
# gpg: Good signature from "Alistair Francis <[email protected]>" [full]

* tag 'pull-riscv-to-apply-20211117-1' of github.com:alistair23/qemu:
  meson.build: Merge riscv32 and riscv64 cpu family
  target/riscv: machine: Sort the .subsections

Signed-off-by: Richard Henderson <[email protected]>

gitlab-ci/cirrus: Increase timeout to 80 minutes

The jobs on Cirrus-CI sometimes get delayed quite a bit, waiting to
be scheduled, so while the build test itself finishes within 60 minutes,
the total run time of the jobs can be longer due to this waiting time.
Thus let's increase the timeout on the gitlab side a little bit, so
that these jobs are not marked as failing just because of the delay.

Message-Id: <20211116163309 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Willian Rampazzo <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>

Revert "device-crash-test: Ignore errors about a bus not being available"

This reverts commit ca89d15f8e42f2e5eac5bd200af38fdbfb32e875.

There is already an entry for this kind of messages earlier in the
ERROR_RULE_LIST - when I added this patch, I just got fooled by
the other errors that occur due to a race between QMP connection
and QEMU terminating early (which still spit out the 'No bus found'
messages in their backtrace), but these other problems have now
fortunately been tackled by John Snow, so we certainly don't need
this duplicated entry here anymore.

Message-Id: <20211112072220 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

meson.build: Merge riscv32 and riscv64 cpu family

In ba0e73336200, we merged riscv32 and riscv64 in configure.
However, meson does not treat them the same. We need to merge
them here as well.

Fixes: ba0e73336200
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-id: 20211116095042 [email protected]
Signed-off-by: Alistair Francis <[email protected]>

target/riscv: machine: Sort the .subsections

Move the codes around so that the order of .subsections matches
the one they are referenced in vmstate_riscv_cpu.

Signed-off-by: Bin Meng <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-id: 20211030030606 [email protected]
Signed-off-by: Alistair Francis <[email protected]>

docs: rSTify the "SubmitAPatch" wiki

- The original wiki is here[1]. I copied the wiki source[2] into a .wiki
  file, and used `pandoc` to convert it to rST:

    $> pandoc -f Mediawiki -t rst submitting-a-patch.wiki -o
       submitting-a-patch.rst

- The only minor touch-ups I did was to fix URLs.  But 99%, it is a 1-1
  conversion.

  (An example of a "touch-up": under the section "Patch emails must
  include a Signed-off-by: line", I updated the "see SubmittingPatches
  1.12"  to "1.12) Sign your work")

- I have also converted a couple other related wiki pages (included in
  this patch series) that were hyperlinked within the SubmitAPatch page,
  or a page that it refers to:

  - SubmitAPullRequest: https://wiki.qemu.org/Contribute/SubmitAPullRequest
  - TrivialPatches: https://wiki.qemu.org/Contribute/TrivialPatches

- Over time, many people contributed to this wiki page; you can find all
  the authors in the wiki history[3].

[1] https://wiki.qemu.org/Contribute/SubmitAPatch
[2] http://wiki.qemu.org/index.php?title=Contribute/SubmitAPatch&action=edit
[3] http://wiki.qemu.org/index.php?title=Contribute/SubmitAPatch&action=history

Signed-off-by: Kashyap Chamarthy <[email protected]>
Message-Id: <20211110144902 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
[thuth: Cosmetic fixes]
Signed-off-by: Thomas Huth <[email protected]>

docs: rSTify the "SubmitAPullRequest" wiki

The original wiki is here[1].  I converted by copying the wiki source
into a .wiki file and convert to rST using `pandoc`:

    $ pandoc -f Mediawiki -t rst submitting-a-pull-request.wiki \
        -o submitting-a-pull-request.rst

This is a 1-1 conversion; no content changes.

[1] https://wiki.qemu.org/Contribute/SubmitAPullRequest

Signed-off-by: Kashyap Chamarthy <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20211110144902 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

docs: rSTify the "TrivialPatches" wiki

The original wiki is here[1]. I converted by copying the wiki source
into a .wiki file and convert to rST using `pandoc`:

$ pandoc -f Mediawiki -t rst trivial-patches.wiki -o trivial-patches.rst

Update the active maintainer names (and drop Michael Tokarev's inactive
repo) to reflect current reality.

[1] https://wiki.qemu.org/Contribute/TrivialPatches

Signed-off-by: Kashyap Chamarthy <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20211110144902 [email protected]>
Signed-off-by: Thomas Huth <[email protected]>

target/s390x/cpu.h: Remove unused SIGP_MODE defines

These are unused since commit 075e52b816648f21 ("s390x/cpumodel:
we are always in zarchitecture mode") and it's unlikely that we
will ever need them again. So let's simply remove them now.

Message-Id: <20211015124219.1330830 [email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>

Merge tag 'python-pull-request' of https://gitlab.com/jsnow/qemu into staging

Pull request

# gpg: Signature made Wed 17 Nov 2021 01:33:06 AM CET
# gpg:                using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <[email protected]>" [full]

* tag 'python-pull-request' of https://gitlab.com/jsnow/qemu:
  scripts/device-crash-test: hide tracebacks for QMP connect errors
  scripts/device-crash-test: don't emit AQMP connection errors to stdout
  scripts/device-crash-test: simplify Exception handling
  python/aqmp: fix ConnectError string method
  python/aqmp: Fix disconnect during capabilities negotiation

Signed-off-by: Richard Henderson <[email protected]>

Update version for v6.2.0-rc1 release

Signed-off-by: Richard Henderson <[email protected]>

scripts/device-crash-test: hide tracebacks for QMP connect errors

Generally, the traceback for a connection failure is uninteresting and
all we need to know is that the connection attempt failed.

Reduce the verbosity in these cases, except when debugging.

Signed-off-by: John Snow <[email protected]>
Reported-by: Thomas Huth <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Message-id: 20211111143719.2162525 [email protected]
Signed-off-by: John Snow <[email protected]>

scripts/device-crash-test: don't emit AQMP connection errors to stdout

These errors are expected, so they shouldn't clog up terminal output. In
the event that they're *not* expected, we'll be seeing an awful lot more
output concerning the nature of the failure.

Reported-by: Thomas Huth <[email protected]>
Signed-off-by: John Snow <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Message-id: 20211111143719.2162525 [email protected]
Signed-off-by: John Snow <[email protected]>

scripts/device-crash-test: simplify Exception handling

We don't need to handle KeyboardInterruptError specifically; we can
instead tighten the scope of the broad Exception handlers to only catch
"Exception", which has the effect of allowing all BaseException classes
that do not inherit from Exception to be raised through.

KeyboardInterruptError and a few other important ones are
BaseExceptions, so this does the same thing with less code.

Signed-off-by: John Snow <[email protected]>
Reported-by: Thomas Huth <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Message-id: 20211111143719.2162525 [email protected]
Signed-off-by: John Snow <[email protected]>

python/aqmp: fix ConnectError string method

When ConnectError is used to wrap an Exception that was initialized
without an error message, we are treated to a traceback with a rubbish
line like this:

... ConnectError: Failed to establish session:

Correct this to use the name of an exception as a fallback message:

... ConnectError: Failed to establish session: EOFError

Better!

Signed-off-by: John Snow <[email protected]>
Reported-by: Thomas Huth <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Message-id: 20211111143719.2162525 [email protected]
Signed-off-by: John Snow <[email protected]>

python/aqmp: Fix disconnect during capabilities negotiation

If we receive ConnectionResetError (ECONNRESET) while attempting to
perform capabilities negotiation -- prior to the establishment of the
async reader/writer tasks -- the disconnect function is not aware that
we are in an error pathway.

As a result, when attempting to close the StreamWriter, we'll see the
same ConnectionResetError that caused us to initiate a disconnect in the
first place, which will cause the disconnect task itself to fail, which
emits a CRITICAL logging event.

I still don't know if there's a smarter way to check to see if an
exception received at this point is "the same" exception as the one that
caused the initial disconnect, but for now the problem can be avoided by
improving the error pathway detection in the exit path.

Reported-by: Thomas Huth <[email protected]>
Signed-off-by: John Snow <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Message-id: 20211111143719.2162525 [email protected]
Signed-off-by: John Snow <[email protected]>

Merge tag 'pull-nbd-2021-11-16' of https://repo.or.cz/qemu/ericb into staging

nbd patches for 2021-11-16

- Rich Jones: Add 'qemu-nbd --selinux-label' option for running Unix
  socket with appropriate SELinux labeling
- Eric Blake: Address clang sanitizer warning

# gpg: Signature made Tue 16 Nov 2021 05:32:26 PM CET
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <[email protected]>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <[email protected]>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]

* tag 'pull-nbd-2021-11-16' of https://repo.or.cz/qemu/ericb:
  nbd/server: Add --selinux-label option
  nbd/server: Silence clang sanitizer warning

Signed-off-by: Richard Henderson <[email protected]>

Merge tag 'pull-for-6.2-161121-1' of https://github.com/stsquad/qemu into staging

Misc build and test fixes:

  - force NOUSER for base docker images
  - don't run TCG VM tests by default
  - remove useless meson test
  - add Centos 8 custom runner
  - split up custom-runners to individual files
  - skip cirrus checks on master/stable branches

# gpg: Signature made Tue 16 Nov 2021 05:22:09 PM CET
# gpg:                using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>" [full]

* tag 'pull-for-6.2-161121-1' of https://github.com/stsquad/qemu:
  gitlab: skip cirrus jobs on master and stable branches
  gitlab-ci: Split custom-runners.yml in one file per runner
  Jobs based on custom runners: add CentOS Stream 8
  meson: remove useless libdl test
  tests/vm: don't build using TCG by default
  tests/vm: sort the special variable list
  tests/docker: force NOUSER=1 for base images

Signed-off-by: Richard Henderson <[email protected]>

gitlab: skip cirrus jobs on master and stable branches

On the primary QEMU repository we want the CI jobs to run on the staging
branch as a gating CI test.

Cirrus CI has very limited job concurrency, so if there are too many
jobs triggered they'll queue up and hit the GitLab CI job timeout before
they complete on Cirrus.

If we let Cirrus jobs run again on the master branch immediately after
merging from staging, that just increases the chances jobs will get
queued and subsequently timeout.

The same applies for merges to the stable branches.

User forks meanwhile should be allowed to run Cirrus CI jobs freely.

Signed-off-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Willian Rampazzo <[email protected]>
Message-Id: <20211116112757.1909176 [email protected]>
Signed-off-by: Alex Bennée <[email protected]>

gitlab-ci: Split custom-runners.yml in one file per runner

To ease maintenance, add the custom-runners/ directory and
split custom-runners.yml in 3 files, all included by the
current custom-runners.yml:
- ubuntu-18.04-s390x.yml
- ubuntu-20.04-aarch64.yml
- centos-stream-8-x86_64.yml

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <20211115095608.2436223 [email protected]>
Reviewed-by: Willian Rampazzo <[email protected]>
Message-Id: <20211115142915.3797652 [email protected]>

Jobs based on custom runners: add CentOS Stream 8

This introduces three different parts of a job designed to run
on a custom runner managed by Red Hat.  The goals include:

  a) propose a model for other organizations that want to onboard
     their own runners, with their specific platforms, build
     configuration and tests.

  b) bring awareness to the differences between upstream QEMU and the
     version available under CentOS Stream, which is "A preview of
     upcoming Red Hat Enterprise Linux minor and major releases".

  c) because of b), it should be easier to identify and reduce the gap
     between Red Hat's downstream and upstream QEMU.

The components of this custom job are:

  I) OS build environment setup code:

     - additions to the existing "build-environment.yml" playbook
       that can be used to set up CentOS/EL 8 systems.

     - a CentOS Stream 8 specific "build-environment.yml" playbook
       that adds to the generic one.

II) QEMU build configuration: a script that will produce binaries with
     features as similar as possible to the ones built and packaged on
     CentOS stream 8.

III) Scripts that define the minimum amount of testing that the
     binaries built with the given configuration (point II) under the
     given OS build environment (point I) should be subjected to.

IV) Job definition: GitLab CI jobs that will dispatch the build/test
     jobs (see points #II and #III) to the machine specifically
     configured according to #I.

Signed-off-by: Cleber Rosa <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Willian Rampazzo <[email protected]>
Tested-by: Willian Rampazzo <[email protected]>
Message-Id: <20211111160501 [email protected]>
Message-Id: <20211115142915.3797652 [email protected]>

meson: remove useless libdl test

dlopen is never used after it is sought via cc.find_library, because
plugins use gmodule instead; remove the test.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Message-Id: <20211110092454 [email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20211115142915.3797652 [email protected]>

tests/vm: don't build using TCG by default

While it is useful to run these images using TCG their performance
will not be anything like the native guests. Don't do it by default.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/393
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Willian Rampazzo <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20211115142915.3797652 [email protected]>

tests/vm: sort the special variable list

Making the list alphabetical makes it easier to find the config option
you are looking for.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Willian Rampazzo <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20211115142915.3797652 [email protected]>

tests/docker: force NOUSER=1 for base images

As base images are often used to build further images like toolchains
ensure we don't add the local user by accident. The local user should
only exist on local images and not anything that gets pushed up to the
public registry.

Reported-by: Richard Henderson <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20211115142915.3797652 [email protected]>

nbd/server: Add --selinux-label option

Under SELinux, Unix domain sockets have two labels.  One is on the
disk and can be set with commands such as chcon(1).  There is a
different label stored in memory (called the process label).  This can
only be set by the process creating the socket.  When using SELinux +
SVirt and wanting qemu to be able to connect to a qemu-nbd instance,
you must set both labels correctly first.

For qemu-nbd the options to set the second label are awkward.  You can
create the socket in a wrapper program and then exec into qemu-nbd.
Or you could try something with LD_PRELOAD.

This commit adds the ability to set the label straightforwardly on the
command line, via the new --selinux-label flag.  (The name of the flag
is the same as the equivalent nbdkit option.)

A worked example showing how to use the new option can be found in
this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1984938

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1984938
Signed-off-by: Richard W.M. Jones <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
[eblake: rebase to configure changes, reject --selinux-label if it is
not compiled in or not used on a Unix socket]
Note that we may relax some of these restrictions at a later date,
such as making it possible to label a TCP socket, although it may be
smarter to do so as a generic QMP action rather than more one-off
command lines in qemu-nbd.
Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20211115202944 [email protected]>
Reviewed-by: Thomas Huth <[email protected]>
[eblake: adjust meson output as suggested by thuth]
Signed-off-by: Eric Blake <[email protected]>

nbd/server: Silence clang sanitizer warning

clang's sanitizer is picky: memset(NULL, x, 0) is technically
undefined behavior, even though no sane implementation of memset()
deferences the NULL. Caught by the nbd-qemu-allocation iotest.

The alternative to checking before each memset is to instead force an
allocation of 1 element instead of g_new0(type, 0)'s behavior of
returning NULL for a 0-length array.

Reported-by: Peter Maydell <[email protected]>
Fixes: 3b1f244c59 (nbd: Allow export of multiple bitmaps for one device)
Signed-off-by: Eric Blake <[email protected]>
Message-Id: <20211115223943 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>

Merge tag 'pull-block-2021-11-16' of https://gitlab.com/hreitz/qemu into staging

Block patches for 6.2.0-rc1:
- Fixes to image streaming job and block layer reconfiguration to make
  iotest 030 pass again
- docs: Deprecate incorrectly typed device_add arguments
- file-posix: Fix alignment after reopen changing O_DIRECT

# gpg: Signature made Tue 16 Nov 2021 01:57:03 PM CET
# gpg:                using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Hanna Reitz <[email protected]>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00  4D34 A1FA 40D0 9801 9CDF

* tag 'pull-block-2021-11-16' of https://gitlab.com/hreitz/qemu:
  file-posix: Fix alignment after reopen changing O_DIRECT
  softmmu/qdev-monitor: fix use-after-free in qdev_set_id()
  docs: Deprecate incorrectly typed device_add arguments
  iotests/030: Unthrottle parallel jobs in reverse
  block: Let replace_child_noperm free children
  block: Let replace_child_tran keep indirect pointer
  transactions: Invoke clean() after everything else
  block: Restructure remove_file_or_backing_child()
  block: Pass BdrvChild ** to replace_child_noperm
  block: Drop detached child from ignore list
  block: Unite remove_empty_child and child_free
  block: Manipulate children list in .attach/.detach
  stream: Traverse graph after modification

Signed-off-by: Richard Henderson <[email protected]>

Merge tag 'machine-core-20211115' of https://github.com/philmd/qemu into staging

Machine core patches

- Rework SMP parsing unit test to work on WinGW:

  https://github.com/qemu/qemu/runs/4078386652

  This fixes:

    Test smp_parse failed!
    Expected error report: Invalid SMP CPUs 1. The min CPUs supported by machine '(null)' is 2
      Output error report: Invalid SMP CPUs 1. The min CPUs supported by machine '(NULL)' is 2

# gpg: Signature made Mon 15 Nov 2021 11:46:36 PM CET
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <[email protected]>" [full]

* tag 'machine-core-20211115' of https://github.com/philmd/qemu:
  tests/unit/test-smp-parse: Explicit MachineClass name
  tests/unit/test-smp-parse: QOM'ify smp_machine_class_init()
  tests/unit/test-smp-parse: Restore MachineClass fields after modifying

Signed-off-by: Richard Henderson <[email protected]>

file-posix: Fix alignment after reopen changing O_DIRECT

At the end of a reopen, we already call bdrv_refresh_limits(), which
should update bs->request_alignment according to the new file
descriptor. However, raw_probe_alignment() relies on s->needs_alignment
and just uses 1 if it isn't set. We neglected to update this field, so
starting with cache=writeback and then reopening with cache=none means
that we get an incorrect bs->request_alignment == 1 and unaligned
requests fail instead of being automatically aligned.

Fix this by recalculating s->needs_alignment in raw_refresh_limits()
before calling raw_probe_alignment().

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211104113109 [email protected]>
Reviewed-by: Hanna Reitz <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
[hreitz: Fix iotest 142 for block sizes greater than 512 by operating on
a file with a size of 1 MB]
Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20211116101431 [email protected]>

softmmu/qdev-monitor: fix use-after-free in qdev_set_id()

Reported by Coverity (CID 1465222).

Fixes: 4a1d937796de0fecd8b22d7dbebf87f38e8282fd ("softmmu/qdev-monitor: add error handling in qdev_set_id")
Cc: Damien Hedde <[email protected]>
Cc: Kevin Wolf <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-Id: <20211102163342 [email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Reviewed-by: Damien Hedde <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

Merge tag 'pull-target-arm-20211115-1' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
* Support multiple redistributor regions for TCG GICv3
* Send RTC_CHANGE QMP event from pl031

# gpg: Signature made Mon 15 Nov 2021 07:53:40 PM CET
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Peter Maydell <[email protected]>" [full]
# gpg:                 aka "Peter Maydell <[email protected]>" [full]
# gpg:                 aka "Peter Maydell <[email protected]>" [full]

* tag 'pull-target-arm-20211115-1' of https://git.linaro.org/people/pmaydell/qemu-arm:
  hw/rtc/pl031: Send RTC_CHANGE QMP event
  hw/intc/arm_gicv3: Support multiple redistributor regions
  hw/intc/arm_gicv3: Set GICR_TYPER.Last correctly when nb_redist_regions > 1
  hw/intc/arm_gicv3: Move checking of redist-region-count to arm_gicv3_common_realize

Signed-off-by: Richard Henderson <[email protected]>

docs: Deprecate incorrectly typed device_add arguments

While introducing a non-QemuOpts code path for device creation for JSON
-device, we noticed that QMP device_add doesn't check its input
correctly (accepting arguments that should have been rejected), and that
users may be relying on this behaviour (libvirt did until it was fixed
recently).

Let's use a deprecation period before we fix this bug in QEMU to avoid
nasty surprises for users.

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211111143530 [email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

iotests/030: Unthrottle parallel jobs in reverse

See the comment for why this is necessary.

Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20211111120829 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

block: Let replace_child_noperm free children

In most of the block layer, especially when traversing down from other
BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
it becomes NULL, it is expected that the corresponding BdrvChild pointer
also becomes NULL and the BdrvChild object is freed.

Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
pointer to NULL, it should also immediately set the corresponding
BdrvChild pointer (like bs->file or bs->backing) to NULL.

In that context, it also makes sense for this function to free the
child.  Sometimes we cannot do so, though, because it is called in a
transactional context where the caller might still want to reinstate the
child in the abort branch (and free it only on commit), so this behavior
has to remain optional.

In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
that the BdrvChild passed to bdrv_replace_child_tran() must have had a
non-NULL .bs pointer initially.  Make a note of that and assert it.

Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20211111120829 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

block: Let replace_child_tran keep indirect pointer

As of a future commit, bdrv_replace_child_noperm() will clear the
indirect BdrvChild pointer passed to it if the new child BDS is NULL.
bdrv_replace_child_tran() will want to let it do that, but revert this
change in its abort handler.  For that, we need to have it receive a
BdrvChild ** pointer, too, and keep it stored in the
BdrvReplaceChildState object that we attach to the transaction.

Note that we do not need to store it in the BdrvReplaceChildState when
new_bs is not NULL, because then there is nothing to revert.  This is
important so that bdrv_replace_node_noperm() can pass a pointer to a
loop-local variable to bdrv_replace_child_tran() without worrying that
this pointer will outlive one loop iteration.

(Of course, for that to work, bdrv_replace_node_noperm() and in turn
bdrv_replace_node() and its relatives may not be called with a NULL @to
node.  Luckily, they already are not, but now we should assert this.)

bdrv_remove_file_or_backing_child() on the other hand needs to ensure
that the indirect pointer it passes will stay valid for the duration of
the transaction.  Ensure this by keeping a strong reference to the BDS
whose &bs->backing or &bs->file it passes to bdrv_replace_child_tran(),
and giving up that reference only in the transaction .clean() handler.

Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20211111120829 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

transactions: Invoke clean() after everything else

Invoke the transaction drivers' .clean() methods only after all
.commit() or .abort() handlers are done.

This makes it easier to have nested transactions where the top-level
transactions pass objects to lower transactions that the latter can
still use throughout their commit/abort phases, while the top-level
transaction keeps a reference that is released in its .clean() method.

(Before this commit, that is also possible, but the top-level
transaction would need to take care to invoke tran_add() before the
lower-level transaction does. This commit makes the ordering
irrelevant, which is just a bit nicer.)

Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20211111120829 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

block: Restructure remove_file_or_backing_child()

As of a future patch, bdrv_replace_child_tran() will take a BdrvChild **
pointer. Prepare for that by getting such a pointer and using it where
applicable, and (dereferenced) as a parameter for
bdrv_replace_child_tran().

Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20211111120829 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

block: Pass BdrvChild ** to replace_child_noperm

bdrv_replace_child_noperm() modifies BdrvChild.bs, and can potentially
set it to NULL. That is dangerous, because BDS parents generally assume
that their children's .bs pointer is never NULL. We therefore want to
let bdrv_replace_child_noperm() set the corresponding BdrvChild pointer
to NULL, too.

This patch lays the foundation for it by passing a BdrvChild ** pointer
to bdrv_replace_child_noperm() so that it can later use it to NULL the
BdrvChild pointer immediately after setting BdrvChild.bs to NULL.

(We will still need to undertake some intermediate steps, though.)

Signed-off-by: Hanna Reitz <[email protected]>
Message-Id: <20211111120829 [email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

block: Drop detached child from ignore list

bdrv_attach_child_common_abort() restores the parent's AioContext. To
do so, the child (which was supposed to be attached, but is now detached
again by this abort handler) is added to the ignore list for the
AioContext changing functions.

However, since we modify a BDS's children list in the BdrvChildClass's
.attach and .detach handlers, the child is already effectively detached
from the parent by this point. We do not need to put it into the ignore
list.

Use this opportunity to clean up the empty line structure: Keep setting
the ignore list, invoking the AioContext function, and freeing the
ignore list in blocks separated by empty lines.

Signed-off-by: Hanna Reitz <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20211111120829 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

block: Unite remove_empty_child and child_free

Now that bdrv_remove_empty_child() no longer removes the child from the
parent's children list but only checks that it is not in such a list, it
is only a wrapper around bdrv_child_free() that checks that the child is
empty and unused. That should apply to all children that we free, so
put those checks into bdrv_child_free() and drop
bdrv_remove_empty_child().

Signed-off-by: Hanna Reitz <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20211111120829 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

block: Manipulate children list in .attach/.detach

The children list is specific to BDS parents. We should not modify it
in the general children modification code, but let BDS parents deal with
it in their .attach() and .detach() methods.

This also has the advantage that a BdrvChild is removed from the
children list before its .bs pointer can become NULL. BDS parents
generally assume that their children's .bs pointer is never NULL, so
this is actually a bug fix.

Signed-off-by: Hanna Reitz <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20211111120829 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

stream: Traverse graph after modification

bdrv_cor_filter_drop() modifies the block graph. That means that other
parties can also modify the block graph before it returns. Therefore,
we cannot assume that the result of a graph traversal we did before
remains valid afterwards.

We should thus fetch `base` and `unfiltered_base` afterwards instead of
before.

Signed-off-by: Hanna Reitz <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <20211111120829 [email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <20211115145409 [email protected]>
Signed-off-by: Hanna Reitz <[email protected]>

Merge tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

pci,pc,virtio: bugfixes

pci power management fixes
acpi hotplug fixes
misc other fixes

Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Mon 15 Nov 2021 05:15:09 PM CET
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [full]
# gpg:                 aka "Michael S. Tsirkin <[email protected]>" [full]

* tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
  pcie: expire pending delete
  pcie: fast unplug when slot power is off
  pcie: factor out pcie_cap_slot_unplug()
  pcie: add power indicator blink check
  pcie: implement slot power control for pcie root ports
  pci: implement power state
  vdpa: Check for existence of opts.vhostdev
  vdpa: Replace qemu_open_old by qemu_open at
  virtio: use virtio accessor to access packed event
  virtio: use virtio accessor to access packed descriptor flags
  tests: bios-tables-test update expected blobs
  hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC
  bios-tables-test: Allow changes in DSDT ACPI tables
  hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type
  pcie: rename 'native-hotplug' to 'x-native-hotplug'
  hw/mem/pc-dimm: Restrict NUMA-specific code to NUMA machines
  vhost: Fix last vq queue index of devices with no cvq
  vhost: Rename last_index to vq_index_end
  softmmu/qdev-monitor: fix use-after-free in qdev_set_id()
  net/vhost-vdpa: fix memory leak in vhost_vdpa_get_max_queue_pairs()

Signed-off-by: Richard Henderson <[email protected]>

tests/unit/test-smp-parse: Explicit MachineClass name

If the MachineClass::name pointer is not explicitly set, it is NULL.
Per the C standard, passing a NULL pointer to printf "%s" format is
undefined. Some implementations display it as 'NULL', other as 'null'.
Since we are comparing the formatted output, we need a stable value.
The easiest is to explicit a machine name string.

Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Yanan Wang <[email protected]>
Tested-by: Yanan Wang <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20211115145900.2531865 [email protected]>

tests/unit/test-smp-parse: QOM'ify smp_machine_class_init()

smp_machine_class_init() is the actual TypeInfo::class_init().
Declare it as such in smp_machine_info, and avoid to call it
manually in each test. Move smp_machine_info definition just
before we register the type to avoid a forward declaration.

Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Yanan Wang <[email protected]>
Tested-by: Yanan Wang <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20211115145900.2531865 [email protected]>

tests/unit/test-smp-parse: Restore MachineClass fields after modifying

There is a single MachineClass object, registered with
type_register_static(&smp_machine_info). Since the same
object is used multiple times (an MachineState object
is instantiated in both test_generic and test_with_dies),
we should restore its internal state after modifying for
the test purpose.

Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Yanan Wang <[email protected]>
Tested-by: Yanan Wang <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20211115145900.2531865 [email protected]>

hw/rtc/pl031: Send RTC_CHANGE QMP event

The PL031 currently is not able to report guest RTC change to the QMP
monitor as opposed to mc146818 or spapr RTCs. This patch adds the call
to qapi_event_send_rtc_change() when the Load Register is written. The
value which is reported corresponds to the difference between the guest
reference time and the reference time kept in softmmu/rtc.c.

For instance adding 20s to the guest RTC value will report 20. Adding
an extra 20s to the guest RTC value will report 20 + 20 = 40.

The inclusion of qapi/qapi-types-misc-target.h in hw/rtl/pl031.c
require to compile the PL031 with specific_ss.add() to avoid
./qapi/qapi-types-misc-target.h:18:13: error: attempt to use poisoned
"TARGET_<ARCH>".

Signed-off-by: Eric Auger <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Message-id: 20210920122535 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

hw/intc/arm_gicv3: Support multiple redistributor regions

Our GICv3 QOM interface includes an array property
redist-region-count which allows board models to specify that the
registributor registers are not in a single contiguous range, but
split into multiple pieces.  We implemented this for KVM, but
currently the TCG GICv3 model insists that there is only one region.
You can see the limit being hit with a setup like:
  qemu-system-aarch64 -machine virt,gic-version=3 -smp 124

Add support for split regions to the TCG GICv3.  To do this we switch
from allocating a simple array of MemoryRegions to an array of
GICv3RedistRegion structs so that we can use the GICv3RedistRegion as
the opaque pointer in the MemoryRegion read/write callbacks.  Each
GICv3RedistRegion contains the MemoryRegion, a backpointer allowing
the read/write callback to get hold of the GICv3State, and an index
which allows us to calculate which CPU's redistributor is being
accessed.

Note that arm_gicv3_kvm always passes in NULL as the ops argument
to gicv3_init_irqs_and_mmio(), so the only MemoryRegion read/write
callbacks we need to update to handle this new scheme are the
gicv3_redist_read/write functions used by the emulated GICv3.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3: Set GICR_TYPER.Last correctly when nb_redist_regions > 1

The 'Last' bit in the GICR_TYPER GICv3 redistributor register is
supposed to be set to 1 if this is the last redistributor in a series
of contiguous redistributor pages.  Currently we set Last only for
the redistributor for CPU (num_cpu - 1).  This only works if there is
a single redistributor region; if there are multiple redistributor
regions then we need to set the Last bit for the last redistributor
in each region.

This doesn't cause any problems currently because only the KVM GICv3
supports multiple redistributor regions, and it ignores the value in
GICv3State::gicr_typer.  But we need to fix this before we can enable
support for multiple regions in the emulated GICv3.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

hw/intc/arm_gicv3: Move checking of redist-region-count to arm_gicv3_common_realize

The GICv3 devices have an array property redist-region-count.
Currently we check this for errors (bad values) in
gicv3_init_irqs_and_mmio(), just before we use it.  Move this error
checking to the arm_gicv3_common_realize() function, where we
sanity-check all of the other base-class properties. (This will
always be before gicv3_init_irqs_and_mmio() is called, because
that function is called in the subclass realize methods, after
they have called the parent-class realize.)

The motivation for this refactor is:
* we would like to use the redist_region_count[] values in
   arm_gicv3_common_realize() in a subsequent patch, so we need
   to have already done the sanity-checking first
* this removes the only use of the Error** argument to
   gicv3_init_irqs_and_mmio(), so we can remove some error-handling
   boilerplate

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>

pcie: expire pending delete

Add an expire time for pending delete, once the time is over allow
pressing the attention button again.

This makes pcie hotplug behave more like acpi hotplug, where one can
try sending an 'device_del' monitor command again in case the guest
didn't respond to the first attempt.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-Id: <20211111130859.1171890 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie: fast unplug when slot power is off

In case the slot is powered off (and the power indicator turned off too)
we can unplug right away, without round-trip to the guest.

Also clear pending attention button press, there is nothing to care
about any more.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-Id: <20211111130859.1171890 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie: factor out pcie_cap_slot_unplug()

No functional change.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-Id: <20211111130859.1171890 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie: add power indicator blink check

Refuse to push the attention button in case the guest is busy with some
hotplug operation (as indicated by the power indicator blinking).

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-Id: <20211111130859.1171890 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie: implement slot power control for pcie root ports

With this patch hot-plugged pci devices will only be visible to the
guest if the guests hotplug driver has enabled slot power.

This should fix the hot-plug race which one can hit when hot-plugging
a pci device at boot, while the guest is in the middle of the pci bus
scan.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-Id: <20211111130859.1171890 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pci: implement power state

This allows to power off pci devices.  In "off" state the devices will
not be visible.  No pci config space access, no pci bar access, no dma.

Default state is "on", so this patch (alone) should not change behavior.

Use case:  Allows hotplug controllers implement slot power.  Hotplug
controllers doing so should set the inital power state for devices in
the ->plug callback.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-Id: <20211111130859.1171890 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vdpa: Check for existence of opts.vhostdev

Since net_init_vhost_vdpa is trying to open it. Not specifying it in the
command line crash qemu.

Fixes: 7327813d17 ("vhost-vdpa: open device fd in net_init_vhost_vdpa()")
Signed-off-by: Eugenio Pérez <[email protected]>
Message-Id: <20211112193431.2379298 [email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vdpa: Replace qemu_open_old by qemu_open at

There is no reason to keep using the old one, since we neither use the
variadics arguments nor open it with O_DIRECT.

Also, net_client_init1, the caller of net_init_vhost_vdpa, wants all
net_client_init_fun to use Error API, so it's a good step in that
direction.

Signed-off-by: Eugenio Pérez <[email protected]>
Message-Id: <20211112193431.2379298 [email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio: use virtio accessor to access packed event

We used to access packed descriptor event and off_wrap via
address_space_{write|read}_cached(). When we hit the cache, memcpy()
is used which is not atomic which may lead a wrong value to be read or
wrote.

This patch fixes this by switching to use
virito_{stw|lduw}_phys_cached() to make sure the access is atomic.

Fixes: 683f7665679c1 ("virtio: event suppression support for packed ring")
Cc: [email protected]
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20211111063854 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio: use virtio accessor to access packed descriptor flags

We used to access packed descriptor flags via
address_space_{write|read}_cached(). When we hit the cache, memcpy()
is used which is not an atomic operation which may lead a wrong value
is read or wrote.

So this patch switches to use virito_{stw|lduw}_phys_cached() to make
sure the aceess is atomic.

Fixes: 86044b24e865f ("virtio: basic packed virtqueue support")
Cc: [email protected]
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20211111063854 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

tests: bios-tables-test update expected blobs

The changes are the result of
        'hw/i386/acpi-build: Deny control on PCIe Native Hot-Plug in _OSC'
which hides PCIE hotplug bit in host-bridge _OSC

Method (_OSC, 4, NotSerialized)  // _OSC: Operating System Capabilities
             {
                 CreateDWordField (Arg3, Zero, CDW1)
                 If ((Arg0 == ToUUID ("33db4d5b-1ff7-401c-9657-7441c03dd766") /* PCI Host Bridge Device */))
                 {
                     CreateDWordField (Arg3, 0x04, CDW2)
                     CreateDWordField (Arg3, 0x08, CDW3)
                     Local0 = CDW3 /* \_SB_.PCI0._OSC.CDW3 */
-                    Local0 &= 0x1F
+                    Local0 &= 0x1E

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <20211112110857.3116853 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC

There are two ways to enable ACPI PCI Hot-plug:

        * Disable the Hot-plug Capable bit on PCIe slots.

This was the first approach which led to regression [1-2], as
I/O space for a port is allocated only when it is hot-pluggable,
which is determined by HPC bit.

        * Leave the HPC bit on and disable PCIe Native Hot-plug in _OSC
          method.

This removes the (future) ability of hot-plugging switches with PCIe
Native hotplug since ACPI PCI Hot-plug only works with cold-plugged
bridges. If the user wants to explicitely use this feature, they can
disable ACPI PCI Hot-plug with:
        --global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off

Change the bit in _OSC method so that the OS selects ACPI PCI Hot-plug
instead of PCIe Native.

[1] https://gitlab.com/qemu-project/qemu/-/issues/641
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409

Signed-off-by: Julia Suvorova <[email protected]>
Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <20211112110857.3116853 [email protected]>
Reviewed-by: Ani Sinha <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

bios-tables-test: Allow changes in DSDT ACPI tables

Prepare for changing the _OSC method in q35 DSDT.

Signed-off-by: Julia Suvorova <[email protected]>
Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Ani Sinha <[email protected]>
Message-Id: <20211112110857.3116853 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type

To solve issues [1-2] the Hot Plug Capable bit in PCIe Slots will be
turned on, while the switch to ACPI Hot-plug will be done in the
DSDT table.

Introducing 'x-keep-native-hpc' property disables the HPC bit only
in 6.1 and as a result keeps the forced 'reserve-io' on
pcie-root-ports in 6.1 too.

[1] https://gitlab.com/qemu-project/qemu/-/issues/641
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409

Signed-off-by: Julia Suvorova <[email protected]>
Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <20211112110857.3116853 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie: rename 'native-hotplug' to 'x-native-hotplug'

Mark property as experimental/internal adding 'x-' prefix.

Property was introduced in 6.1 and it should have provided
ability to turn on native PCIE hotplug on port even when
ACPI PCI hotplug is in use is user explicitly sets property
on CLI. However that never worked since slot is wired to
ACPI hotplug controller.
Another non-intended usecase: disable native hotplug on slot
when APCI based hotplug is disabled, which works but slot has
'hotplug' property for this taks.

It should be relatively safe to rename it to experimental
as no users should exist for it and given that the property
is broken we don't really want to leave it around for much
longer lest users start using it.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Ani Sinha <[email protected]>
Message-Id: <20211112110857.3116853 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge tag 'pull-ppc-20211112' of https://github.com/legoater/qemu into staging

ppc 6.2 queue :

* Fix of a regression in floating point load instructions (Matheus)
* Associativity fix for pseries machine (Daniel)
* tlbivax fix for BookE machines (Danel)

# gpg: Signature made Fri 12 Nov 2021 12:11:29 PM CET
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <[email protected]>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-ppc-20211112' of https://github.com/legoater/qemu:
  ppc/mmu_helper.c: do not truncate 'ea' in booke206_invalidate_ea_tlb()
  spapr_numa.c: fix FORM1 distance-less nodes
  target/ppc: Fix register update on lf[sd]u[x]/stf[sd]u[x]

Signed-off-by: Richard Henderson <[email protected]>

Merge tag 'pull-tcg-20211111' of https://gitlab.com/rth7680/qemu into staging

appease coverity vs extract2
update docs for ctpop opcodes
tcg/s390x build fix for gcc11

# gpg: Signature made Thu 11 Nov 2021 12:05:20 PM CET
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Richard Henderson <[email protected]>" [ultimate]

* tag 'pull-tcg-20211111' of https://gitlab.com/rth7680/qemu:
  tcg/s390x: Fix tcg_out_vec_op argument type
  tcg: Document ctpop opcodes
  tcg: Remove TCI experimental status
  tcg/optimize: Add an extra cast to fold_extract2

Signed-off-by: Richard Henderson <[email protected]>

tcg/s390x: Fix tcg_out_vec_op argument type

Newly defined tcg_out_vec_op (34ef767609 tcg/s390x: Add host vector framework)
for s390x uses pointer argument definition.
This fails on gcc 11 as original declaration uses array argument:

In file included from ../tcg/tcg.c:430:
/builddir/build/BUILD/qemu-6.1.50/tcg/s390x/tcg-target.c.inc:2702:42: error: argument 5 of type 'const TCGArg *' {aka 'const long unsigned int *'} declared as a pointer [-Werror=array-parameter=]
2702 |                            const TCGArg *args, const int *const_args)
      |                            ~~~~~~~~~~~~~~^~~~
../tcg/tcg.c:121:41: note: previously declared as an array 'const TCGArg[16]' {aka 'const long unsigned int[16]'}
  121 |                            const TCGArg args[TCG_MAX_OP_ARGS],
      |                            ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
In file included from ../tcg/tcg.c:430:
/builddir/build/BUILD/qemu-6.1.50/tcg/s390x/tcg-target.c.inc:2702:59: error: argument 6 of type 'const int *' declared as a pointer [-Werror=array-parameter=]
2702 |                            const TCGArg *args, const int *const_args)
      |                                                ~~~~~~~~~~~^~~~~~~~~~
../tcg/tcg.c:122:38: note: previously declared as an array 'const int[16]'
  122 |                            const int const_args[TCG_MAX_OP_ARGS]);
      |                            ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixing argument type to pass build.

Signed-off-by: Miroslav Rezanina <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Message-Id: <20211027085629 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Document ctpop opcodes

Fixes: a768e4e99247
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/658
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg: Remove TCI experimental status

The following commits (released in v6.0.0) made raised the
quality of the TCI backend to the other TCG architectures,
thus is is not considerated experimental anymore:
- c6fbea47664..2f74f45e32b
- dc09f047edd..9e9acb7b348
- b6139eb0578..2fc6f16ca5e
- dbcbda2cd84..5e8892db93f

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20211106111457 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

tcg/optimize: Add an extra cast to fold_extract2

There is no bug, but silence a warning about computation
in int32_t being assigned to a uint64_t.

Reported-by: Coverity CID 1465220
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

ppc/mmu_helper.c: do not truncate 'ea' in booke206_invalidate_ea_tlb()

'tlbivax' is implemented by gen_tlbivax_booke206() via
gen_helper_booke206_tlbivax(). In case the TLB needs to be flushed,
booke206_invalidate_ea_tlb() is called. All these functions, but
booke206_invalidate_ea_tlb(), uses a 64-bit effective address 'ea'.

booke206_invalidate_ea_tlb() uses an uint32_t 'ea' argument that
truncates the original 'ea' value for apparently no particular reason.
This function retrieves the tlb pointer by calling booke206_get_tlbm(),
which also uses a target_ulong address as parameter - in this case, a
truncated 'ea' address. All the surrounding logic considers the
effective TLB address as a 64 bit value, aside from the signature of
booke206_invalidate_ea_tlb().

Last but not the least, PowerISA 2.07B section 6.11.4.9 [2] makes it
clear that the effective address "EA" is a 64 bit value.

Commit 01662f3e5133 introduced this code and no changes were made ever
since. An user detected a problem with tlbivax [1] stating that this
address truncation was the cause. This same behavior might be the source
of several subtle bugs that were never caught.

For all these reasons, this patch assumes that this address truncation
is the result of a mistake/oversight of the original commit, and changes
booke206_invalidate_ea_tlb() 'ea' argument to 'vaddr'.

[1] https://gitlab.com/qemu-project/qemu/-/issues/52
[2] https://wiki.raptorcs.com/wiki/File:PowerISA_V2.07B.pdf

Fixes: 01662f3e5133 ("PPC: Implement e500 (FSL) MMU")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/52
Signed-off-by: Daniel Henrique Barboza <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* Fixes for SGX
* force_rcu notifiers

# gpg: Signature made Wed 10 Nov 2021 10:57:48 PM CET
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Paolo Bonzini <[email protected]>" [full]
# gpg:                 aka "Paolo Bonzini <[email protected]>" [full]

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
  sgx: Reset the vEPC regions during VM reboot
  numa: avoid crash with SGX and "info numa"
  accel/tcg: Register a force_rcu notifier
  rcu: Introduce force_rcu notifier
  target/i386: sgx: mark device not user creatable

Signed-off-by: Richard Henderson <[email protected]>

hw/mem/pc-dimm: Restrict NUMA-specific code to NUMA machines

When trying to use the pc-dimm device on a non-NUMA machine, we get:

  $ qemu-system-arm -M none -cpu max -S \
      -object memory-backend-file,id=mem1,size=1M,mem-path=/tmp/1m \
      -device pc-dimm,id=dimm1,memdev=mem1
  Segmentation fault (core dumped)

  (gdb) bt
  #0  pc_dimm_realize (dev=0x555556da3e90, errp=0x7fffffffcd10) at hw/mem/pc-dimm.c:184
  #1  0x0000555555fe1f8f in device_set_realized (obj=0x555556da3e90, value=true, errp=0x7fffffffce18) at hw/core/qdev.c:531
  #2  0x0000555555feb4a9 in property_set_bool (obj=0x555556da3e90, v=0x555556e54420, name=0x5555563c3c41 "realized", opaque=0x555556a704f0, errp=0x7fffffffce18) at qom/object.c:2257

To avoid that crash, restrict the pc-dimm NUMA check to machines
supporting NUMA, and do not allow the use of 'node' property on
non-NUMA machines.

Suggested-by: Igor Mammedov <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <20211106145016 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: Fix last vq queue index of devices with no cvq

The -1 assumes that cvq device model is accounted in data_queue_pairs,
if cvq does not exists, but it's actually the opposite: Devices with
!cvq are ok but devices with cvq does not add the last queue to
data_queue_pairs.

This is not a problem to vhost-net, but it is to vhost-vdpa:
* Devices with cvq gets initialized at last data vq device model, not
at cvq one.
* Devices with !cvq never gets initialized, since last_index is the
first queue of the last device model.

Because of that, the right change in last_index is to actually add the
cvq, not to remove the missing one.

This is not a problem to vhost-net, but it is to vhost-vdpa, which
device model trust to reach the last index to finish starting the
device.

Also, as the previous commit, rename it to index_end.

Tested with vp_vdpa with host's vhost=on and vhost=off, with ctrl_vq=on
and ctrl_vq=off.

Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the virtio device")
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Eugenio Pérez <[email protected]>
Message-Id: <20211104085625.2054959 [email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: Rename last_index to vq_index_end

The doc of this field pointed out that last_index is the last vq index.
This is misleading, since it's actually one past the end of the vqs.

Renaming and modifying comment.

Signed-off-by: Eugenio Pérez <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <20211104085625.2054959 [email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

softmmu/qdev-monitor: fix use-after-free in qdev_set_id()

Reported by Coverity (CID 1465222).

Fixes: 4a1d937796de0fecd8b22d7dbebf87f38e8282fd ("softmmu/qdev-monitor: add error handling in qdev_set_id")
Cc: Damien Hedde <[email protected]>
Cc: Kevin Wolf <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-Id: <20211102163342 [email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Kevin Wolf <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Reviewed-by: Damien Hedde <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>

net/vhost-vdpa: fix memory leak in vhost_vdpa_get_max_queue_pairs()

Use g_autofree to ensure that `config` is freed when
vhost_vdpa_get_max_queue_pairs() returns.

Reported-by: Coverity (CID 1465228: RESOURCE_LEAK)
Fixes: 402378407d ("vhost-vdpa: multiqueue support")
Signed-off-by: Stefano Garzarella <[email protected]>
Message-Id: <20211102155157 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Acked-by: Jason Wang <[email protected]>

sgx: Reset the vEPC regions during VM reboot

For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot. For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

Qemu can invoke the ioctl to bring its vEPC pages back to uninitialized
state. There is a possibility that some pages fail to be removed if they
are SECS pages, and the child and SECS pages could be in separate vEPC
regions. Therefore, the ioctl returns the number of EREMOVE failures,
telling Qemu to try the ioctl again after it's done with all vEPC regions.

The related kernel patches:
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Zhong <[email protected]>
Message-Id: <20211101162009 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

spapr_numa.c: fix FORM1 distance-less nodes

Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA
nodes in which the distance info is absent in
machine_state->numa_state->nodes. This happens when QEMU adds a default
NUMA node and when the user adds NUMA nodes without specifying the
distances.

During the discussions of the forementioned patch [1] it was found that
FORM1 guests were behaving in a strange way in the same scenario, with
the kernel seeing the distances between the nodes as '160', as we can
see in this example with 4 NUMA nodes without distance information:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node   0   1   2   3
  0:  10  160  160  160
  1:  160  10  160  160
  2:  160  160  10  160
  3:  160  160  160  10

Turns out that we have the same problem with FORM1 guests - we are
calculating associativity domain using zeroed values. And as it also
turns out, the solution from 71e6fae3a99 applies to FORM1 as well.

This patch creates a wrapper called 'get_numa_distance' that contains
the logic used in FORM2 to define node distances when this information
is absent. This helper is then used in all places where we need to read
distance information from machine_state->numa_state->nodes. That way
we'll guarantee that the NUMA node distance is always being curated
before being used.

After this patch, the FORM1 guest mentioned above will have the
following topology:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10

This is compatible with what FORM2 guests and other archs do in this
case.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html

Fixes: 690fbe4295d5 ("spapr_numa: consider user input when defining associativity")
CC: Aneesh Kumar K.V <[email protected]>
CC: Nicholas Piggin <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Daniel Henrique Barboza <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>

numa: avoid crash with SGX and "info numa"

Add the MEMORY_DEVICE_INFO_KIND_SGX_EPC case, so that enclave
memory is included in the output of "info numa" instead of crashing
the monitor.

Fixes: a7c565a941 ("sgx-epc: Add the fill_device_info() callback support", 2021-09-30)
Signed-off-by: Paolo Bonzini <[email protected]>

accel/tcg: Register a force_rcu notifier

A TCG vCPU doing a busy loop systematicaly hangs the QEMU monitor
if the user passes 'device_add' without argument. This is because
drain_cpu_all() which is called from qmp_device_add() cannot return
if readers don't exit read-side critical sections. That is typically
what busy-looping TCG vCPUs do:

int cpu_exec(CPUState *cpu)
{
[...]
    rcu_read_lock();
[...]
    while (!cpu_handle_exception(cpu, &ret)) {
        // Busy loop keeps vCPU here
    }
[...]
    rcu_read_unlock();

    return ret;
}

For MTTCG, have all vCPU threads register a force_rcu notifier that will
kick them out of the loop using async_run_on_cpu(). The notifier is called
with the rcu_registry_lock mutex held, using async_run_on_cpu() ensures
there are no deadlocks.

For RR, a single thread runs all vCPUs. Just register a single notifier
that kicks the current vCPU to the next one.

For MTTCG:
Suggested-by: Paolo Bonzini <[email protected]>
For RR:
Suggested-by: Richard Henderson <[email protected]>
Fixes: 7bed89958bfb ("device_core: use drain_call_rcu in in qmp_device_add")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/650
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20211109183523 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

rcu: Introduce force_rcu notifier

The drain_rcu_call() function can be blocked as long as an RCU reader
stays in a read-side critical section. This is typically what happens
when a TCG vCPU is executing a busy loop. It can deadlock the QEMU
monitor as reported in https://gitlab.com/qemu-project/qemu/-/issues/650 .

This can be avoided by allowing drain_rcu_call() to enforce an RCU grace
period. Since each reader might need to do specific actions to end a
read-side critical section, do it with notifiers.

Prepare ground for this by adding a notifier list to the RCU reader
struct and use it in wait_for_readers() if drain_rcu_call() is in
progress. An API is added for readers to register their notifiers.

This is largely based on a draft from Paolo Bonzini.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20211109183523 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'pull-qapi-2021-11-10' of git://repo.or.cz/qemu/armbru into staging

QAPI patches patches for 2021-11-10

# gpg: Signature made Wed 10 Nov 2021 06:21:23 AM CET
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Markus Armbruster <[email protected]>" [full]
# gpg:                 aka "Markus Armbruster <[email protected]>" [full]

* tag 'pull-qapi-2021-11-10' of git://repo.or.cz/qemu/armbru:
  qapi: Belatedly mark unstable QMP parts with feature 'unstable'
  docs/devel/qapi-code-gen: Belatedly document feature documentation
  docs/devel/qapi-code-gen: Drop a duplicate paragraph

Signed-off-by: Richard Henderson <[email protected]>

Merge tag 'pull-monitor-2021-11-10' of git://repo.or.cz/qemu/armbru into staging

Monitor patches patches for 2021-11-10

# gpg: Signature made Wed 10 Nov 2021 06:15:38 AM CET
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Markus Armbruster <[email protected]>" [full]
# gpg:                 aka "Markus Armbruster <[email protected]>" [full]

* tag 'pull-monitor-2021-11-10' of git://repo.or.cz/qemu/armbru:
  monitor: Fix find_device_state() for IDs containing slashes

Signed-off-by: Richard Henderson <[email protected]>

target/ppc: Fix register update on lf[sd]u[x]/stf[sd]u[x]

These instructions should update the GPR indicated by the field RA
instead of RT. This error caused a regression on Mac OS 9 boot and some
graphical glitches in OS X.

Fixes: a39a106634a9 ("target/ppc: Move load and store floating point instructions to decodetree")
Reported-by: Mark Cave-Ayland <[email protected]>
Tested-by: Mark Cave-Ayland <[email protected]>
Signed-off-by: Matheus Ferst <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>