Git Repo - qemu.git/log

block/curl: fix minor memory leaks

Signed-off-by: Jeff Cody <[email protected]>
Reviewed-by: Richard W.M. Jones <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>

block/curl: check error return of curl_global_init()

If curl_global_init() fails, per the documentation no other curl
functions may be called, so make sure to check the return value.

Also, some minor changes to the initialization latch variable 'inited':

- Make it static in the file, for clarity
- Change the name for clarity
- Make it a bool

Signed-off-by: Jeff Cody <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Richard W.M. Jones <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>

block/sheepdog: code beautification

No functional changes, just whitespace manipulation.

Signed-off-by: Jeff Cody <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>

block/sheepdog: remove spurious NULL check

'tag' is already checked in the lines immediately preceding this check,
and set to non-NULL if NULL. No need to check again, it hasn't changed.

Signed-off-by: Jeff Cody <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Signed-off-by: Jeff Cody <[email protected]>

blockjob: kick jobs on set-speed

If users set an unreasonably low speed (like one byte per second), the
calculated delay may exceed many hours. While we like to punish users
for asking for stupid things, we do also like to allow users to correct
their wicked ways.

When a user provides a new speed, kick the job to allow it to recalculate
its delay.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 20171213204611 [email protected]
Signed-off-by: Jeff Cody <[email protected]>

backup: use copy_bitmap in incremental backup

We can use copy_bitmap instead of sync_bitmap. copy_bitmap is
initialized from sync_bitmap and it is more informative: we will not try
to process data, that is already in progress (by write notifier).

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Message-id: 20171012135313 [email protected]
Signed-off-by: Jeff Cody <[email protected]>

backup: simplify non-dirty bits progress processing

Set fake progress for non-dirty clusters in copy_bitmap initialization,
to. It simplifies code and allows further refactoring.

This patch changes user's view of backup progress, but formally it
doesn't changed: progress hops are just moved to the beginning.

Actually it's just a point of view: when do we actually skip clusters?
We can say in the very beginning, that we skip these clusters and do
not think about them later.

Of course, if go through disk sequentially, it's logical to say, that
we skip clusters between copied portions to the left and to the right
of them. But even now copying progress is not sequential because of
write notifiers. Future patches will introduce new backup architecture
which will do copying in several coroutines in parallel, so it will
make no sense to publish fake progress by parts in parallel with
other copying requests.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Message-id: 20171012135313 [email protected]
Signed-off-by: Jeff Cody <[email protected]>

backup: init copy_bitmap from sync_bitmap for incremental

We should not copy non-dirty clusters in write notifiers. So,
initialize copy_bitmap from sync_bitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: John Snow <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 20171012135313 [email protected]
Signed-off-by: Jeff Cody <[email protected]>

backup: move from done_bitmap to copy_bitmap

Use HBitmap copy_bitmap instead of done_bitmap. This is needed to
improve incremental backup in following patches and to unify backup
loop for full/incremental modes in future patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: John Snow <[email protected]>
Message-id: 20171012135313 [email protected]
Signed-off-by: Jeff Cody <[email protected]>

hbitmap: add next_zero function

The function searches for next zero bit.
Also add interface for BdrvDirtyBitmap and unit test.

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Reviewed-by: John Snow <[email protected]>
Message-id: 20171012135313 [email protected]
Signed-off-by: Jeff Cody <[email protected]>

Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2017-12-15-1' into staging

Merge tpm 2017/12/15 v1

# gpg: Signature made Fri 15 Dec 2017 04:44:15 GMT
# gpg:                using RSA key 0x75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <[email protected]>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE  C66B 75AD 6580 2A0B 4211

* remotes/stefanberger/tags/pull-tpm-2017-12-15-1: (32 commits)
  tpm: tpm_passthrough: Fail startup if FE buffer size < BE buffer size
  tpm: tpm_emulator: get and set buffer size of device
  tpm: tpm_passthrough: Read the buffer size from the host device
  tpm: pull tpm_util_request() out of tpm_util_test()
  tpm: Move getting TPM buffer size to backends
  tpm: remove tpm_register_model()
  tpm-tis: use DEFINE_PROP_TPMBE
  qdev: add DEFINE_PROP_TPMBE
  tpm-tis: check that at most one TPM device exists
  tpm-tis: remove redundant 'tpm_tis:' in error messages
  tpm-emulator: add a FIXME comment about blocking cancel
  acpi: change TPM TIS data conditions
  tpm: add tpm_cmd_get_size() to tpm_util
  tpm: add TPM interface to lookup TPM version
  tpm: lookup the the TPM interface instead of TIS device
  tpm: rename qemu_find_tpm() -> qemu_find_tpm_be()
  tpm-tis: simplify header inclusion
  tpm-passthrough: workaround a possible race
  tpm-passthrough: simplify create()
  tpm-passthrough: make it safer to destroy after creation
  ...

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-201712151' into staging

Merge qio 2017/12/15 v1

# gpg: Signature made Fri 15 Dec 2017 15:07:34 GMT
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <[email protected]>"
# gpg:                 aka "Daniel P. Berrange <[email protected]>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qio-201712151:
  io: introduce a network socket listener API

Signed-off-by: Peter Maydell <[email protected]>

sparc: Make sure we mmap at SHMLBA alignment

SPARC Linux has an oddity that it insists that mmap()
of MAP_FIXED memory must be at an alignment defined by
SHMLBA, which is more aligned than the page size
(typically, SHMLBA alignment is to 16K, and pages are 8K).
This is a relic of ancient hardware that had cache
aliasing constraints, but even on modern hardware the
kernel still insists on the alignment.

To ensure that we get mmap() alignment sufficient to
make the kernel happy, change QEMU_VMALLOC_ALIGN,
qemu_fd_getpagesize() and qemu_mempath_getpagesize()
to use the maximum of getpagesize() and SHMLBA.

In particular, this allows 'make check' to pass on Sparc:
we were previously failing the ivshmem tests.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: 1512752248 [email protected]

io: introduce a network socket listener API

The existing QIOChannelSocket class provides the ability to
listen on a single socket at a time. This patch introduces
a QIONetListener class that provides a higher level API
concept around listening for network services, allowing
for listening on multiple sockets.

Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Daniel P. Berrange <[email protected]>

Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20171215-v2' into staging

s390x changes for 2.12:
- Lots of tcg improvements: ccw hotplug is now working and we can run
  a Linux kernel built for z12 under tcg
- zPCI improvements to get virtio-pci working
- get rid of the cssid restrictions for virtual and non-virtual channel
  devices
- we now support 8TB+ systems
- 2.12 compat machine
- fixes and cleanups

# gpg: Signature made Fri 15 Dec 2017 10:57:01 GMT
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# gpg:                 aka "Cornelia Huck <[email protected]>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20171215-v2: (46 commits)
  s390-ccw-virtio: allow for systems larger that 7.999TB
  s390x: change the QEMU cpu model to a stripped down z12
  s390x/tcg: we already implement the Set-Program-Parameter facility
  s390x/tcg: implement extract-CPU-time facility
  s390x/tcg: Implement SIGNAL ADAPTER instruction
  s390x/tcg: Implement STORE CHANNEL PATH STATUS
  s390x/tcg: wire up SET CHANNEL MONITOR
  s390x/tcg: wire up SET ADDRESS LIMIT
  s390x/tcg: implement Interlocked-Access Facility 2
  s390x/tcg: ASI/ASGI/ALSI/ALSGI are atomic with Interlocked-acccess facility 1
  s390x/tcg: wire up STORE CHANNEL REPORT WORD
  s390x/tcg: indicate value of TODPR in STCKE
  s390x/tcg: implement SET CLOCK PROGRAMMABLE FIELD
  s390x/tcg: fix and cleanup mcck injection
  s390x/kvm: factor out build_channel_report_mcic() into cpu.h
  s390x/css: attach css bridge
  s390x: deprecate s390-squash-mcss machine prop
  s390x/css: unrestrict cssids
  s390x/pci: search for subregion inside the BARs
  s390x/pci: move the memory region write from pcistg
  ...

# Conflicts:
# include/hw/compat.h

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.12-20171215' into staging

ppc patch queue 2017-12-15

First pull request for qemu-2.12.  This has quite a bit of stuff
accumulated while 2.11 was finalizing.  Highlights are:

  * Some preliminary work towards implementing the "XIVE" POWER9
    interrupt controller
  * Some fixes for problems during reboot with MTTCG
  * A substantial TCG performance improvement via
    tcg_get_lookup_and_goto_ptr
  * Numerous assorted cleanups and bugfixes that weren't urgent enough
    for 2.11

# gpg: Signature made Fri 15 Dec 2017 03:14:12 GMT
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <[email protected]>"
# gpg:                 aka "David Gibson (Red Hat) <[email protected]>"
# gpg:                 aka "David Gibson (ozlabs.org) <[email protected]>"
# gpg:                 aka "David Gibson (kernel.org) <[email protected]>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.12-20171215: (24 commits)
  spapr: don't initialize PATB entry if max-cpu-compat < power9
  spapr: Assume msi_nonbroken
  spapr: Rename machine init functions for clarity
  target/ppc: introduce the PPC_BIT() macro
  spapr_events: drop bogus cell from "interrupt-ranges" property
  spapr: fix LSI interrupt specifiers in the device tree
  spapr: replace numa_get_node() with lookup in pc-dimm list
  spapr: introduce a spapr_qirq() helper
  spapr: introduce a spapr_irq_set_lsi() helper
  spapr: move the IRQ allocation routines under the machine
  ppc/xics: assign of the CPU 'intc' pointer under the core
  ppc/xics: introduce an icp_create() helper
  spapr/rtas: do not reset the MSR in stop-self command
  spapr/rtas: fix reboot of a a SMP TCG guest
  spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  e500: fix pci host bridge class/type
  openpic: debug w/ info_report()
  pcc: define the Power-saving mode Exit Cause Enable bits in PowerPCCPUClass
  nvram: add AT24Cx i2c eeprom
  e500: name openpic and pci host bridge
  ...

Signed-off-by: Peter Maydell <[email protected]>

s390-ccw-virtio: allow for systems larger that 7.999TB

KVM does not allow memory regions > KVM_MEM_MAX_NR_PAGES, basically
limiting the memory per slot to 8TB-4k. As memory slots on s390/kvm must
be a multiple of 1MB we need start a new memory region if we cross
8TB-1M.

With that (and optimistic overcommitment in the kernel) I was able to
start a 24TB guest on a 1TB system.

Signed-off-by: Christian Borntraeger <[email protected]>
Message-Id: <20171211122146 [email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
[CH: 1UL -> 1ULL in KVM_MEM_MAX_NR_PAGES; build fix on 32 bit hosts]
Signed-off-by: Cornelia Huck <[email protected]>

Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20171214-tag' into staging

Xen 2017/12/14

# gpg: Signature made Fri 15 Dec 2017 00:26:26 GMT
# gpg:                using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <[email protected]>"
# gpg:                 aka "Stefano Stabellini <[email protected]>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3  0AEA 894F 8F48 70E1 AE90

* remotes/sstabellini/tags/xen-20171214-tag:
  xen/pt: Set is_express to avoid out-of-bounds write
  xenfb: activate input handlers for raw pointer devices
  xenfb: Add [feature|request]-raw-pointer
  xenfb: Use Input Handlers directly
  ui: generate qcode to linux mappings
  xen-disk: use an IOThread per instance

Signed-off-by: Peter Maydell <[email protected]>

tpm: tpm_passthrough: Fail startup if FE buffer size < BE buffer size

If the requested buffer size of the frontend is smaller than the fixed
buffer size of the host's TPM, fail the startup_tpm() interface function,
which will make the device unusable. We fail it because the backend TPM
could produce larger packets than what the frontend could pass to the OS.

The current combination of TIS frontend and either passthrough or emulator
backend will not lead to this case since the TIS can support any size of
buffer.

Signed-off-by: Stefan Berger <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

tpm: tpm_emulator: get and set buffer size of device

Convert the tpm_emulator backend to get the current buffer size
of the external device and set it to the buffer size that the
frontend (TIS) requests.

Signed-off-by: Stefan Berger <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

tpm: tpm_passthrough: Read the buffer size from the host device

Rather than hard coding the buffer size in the tpm_passthrough
backend read the TPM I/O buffer size from the host device.

Signed-off-by: Stefan Berger <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

tpm: pull tpm_util_request() out of tpm_util_test()

Signed-off-by: Stefan Berger <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

tpm: Move getting TPM buffer size to backends

Rather than setting the size of the TPM buffer in the front-end,
query the backend for the size of the buffer. In this patch we
just move the hard-coded buffer size of 4096 to the backends.

Signed-off-by: Stefan Berger <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>

tpm: remove tpm_register_model()

Query object classes that implements TPMIf instead.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-tis: use DEFINE_PROP_TPMBE

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

qdev: add DEFINE_PROP_TPMBE

A property to lookup a tpm backend.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-tis: check that at most one TPM device exists

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-tis: remove redundant 'tpm_tis:' in error messages

The reported error message is already prefixed with the -device
name & arguments.

Before:
qemu-system-x86_64: -device tpm-tis,id=foo,tpmdev=foo,irq=21: tpm_tis: IRQ 21 is outside valid range of 0 to 15

After:
qemu-system-x86_64: -device tpm-tis,id=foo,tpmdev=foo,irq=21: IRQ 21 is outside valid range of 0 to 15

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-emulator: add a FIXME comment about blocking cancel

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

acpi: change TPM TIS data conditions

The device should be exposed if present. It shouldn't have an
undefined version (or else backend init failed, and device should fail
too). Finally, make the fields specific to TIS device model.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm: add tpm_cmd_get_size() to tpm_util

The function is generally useful and used in the following patches.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm: add TPM interface to lookup TPM version

Do not hardcode TPM device model to lookup version, use an interface
instead.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm: lookup the the TPM interface instead of TIS device

This will allow to introduce new devices implementing TPM.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm: rename qemu_find_tpm() -> qemu_find_tpm_be()

find_tpm() will be introduced to lookup the TPM device.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-tis: simplify header inclusion

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-passthrough: workaround a possible race

The TPM backend processing thread has common shared variable race
issues. (they should not be so easy to reach since guest interaction
with the device is slow compared to host emulation)

An obvious one is setting op_cancelled from device thread after
calling write(cancel_fd). The backend thread may return before the
device thread has set the variable. Instead set it before
cancellation. Even if the write() failed, the end result is command
get possibly cancelled (even if cancellation came from external
sources it doesn't matter much).

It's worth to consider removing the backend processing thread for now.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-passthrough: simplify create()

Use a similar code as tpm_emulator_create(), call handle_opts() and
handle failure cleanup with object_unref() in create().

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-passthrough: make it safer to destroy after creation

Check fds values before closing, to avoid close(-1).

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-backend: move set 'id' to common code

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-passthrough: pass TPMPassthruState to handle_device_opts

It doesn't need TPMBackend. Also reorder arguments for consistency.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-be: update optional function pointers

QEMU code doesn't generally have assert() for mandatory
callbacks/function pointers, probably because the crash is pretty
obvious. Document the methods instead of going into the code.

Make get_tpm_options() mandatory to implement (since all
backend implementation have it).

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-passthrough: don't save guessed cancel_path in options

The value is later unneeded, and may leak if the free visitor doesn't
consider it since has_cancel_path is false. And for consistency with
"path" it shouldn't be returned in get_tpm_options().

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm: remove unused opened code

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-be: ask model to the TPM interface

No need to store the mode in the backend, or to let the frontend set
it itself.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-be: report error instead of front-end

Backend can give more accurate error description, and lift out the job
from the frontend.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-be: call request_completed() out of thread

Lift from the backend implementation the responsability to call the
request_completed() callback outside of thread context. This also
simplify frontend/interface work, as they no longer need to care
whether the callback is called from a different thread.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-tis: no longer expose TPMState

Now that there is an interface instead.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-backend: store TPMIf interface, improve backend_init()

Store the TPM interface, the actual object may be different from
TPMState. Keep a reference on the interface, and check the backend
wasn't already initialized.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm: move TpmIf in include/sysemu/tpm.h

This is a better location than hw/tpm, since we are going to use the
interface from outside hw/tpm.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

tpm-tis: remove unused locty_number

This field slipped in commit 5086bf9784.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>

xen/pt: Set is_express to avoid out-of-bounds write

The passed-through device might be an express device. In this case the
old code allocated a too small emulated config space in
pci_config_alloc() since pci_config_size() returned the size for a
non-express device. This leads to an out-of-bound write in
xen_pt_config_reg_init(), which sometimes results in crashes. So set
is_express as already done for KVM in vfio-pci.

Shortened ASan report:

==17512==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x611000041648 at pc 0x55e0fdac51ff bp 0x7ffe4af07410 sp 0x7ffe4af07408
WRITE of size 2 at 0x611000041648 thread T0
    #0 0x55e0fdac51fe in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x55e0fdac51fe in stw_he_p include/qemu/bswap.h:330
    #2 0x55e0fdac51fe in stw_le_p include/qemu/bswap.h:379
    #3 0x55e0fdac51fe in pci_set_word include/hw/pci/pci.h:490
    #4 0x55e0fdac51fe in xen_pt_config_reg_init hw/xen/xen_pt_config_init.c:1991
    #5 0x55e0fdac51fe in xen_pt_config_init hw/xen/xen_pt_config_init.c:2067
    #6 0x55e0fdabcf4d in xen_pt_realize hw/xen/xen_pt.c:830
    #7 0x55e0fdf59666 in pci_qdev_realize hw/pci/pci.c:2034
    #8 0x55e0fdda7d3d in device_set_realized hw/core/qdev.c:914
[...]

0x611000041648 is located 8 bytes to the right of 256-byte region [0x611000041540,0x611000041640)
allocated by thread T0 here:
    #0 0x7ff596a94bb8 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd9bb8)
    #1 0x7ff57da66580 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x50580)
    #2 0x55e0fdda7d3d in device_set_realized hw/core/qdev.c:914
[...]

Signed-off-by: Simon Gaiser <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

xenfb: activate input handlers for raw pointer devices

If the frontend requests raw pointers, the input handlers must be
activated to have the input events delivered to the xenfb backend.
Without activation, the input events are delivered to handlers
registered earlier, which would be the emulated USB tablet or
emulated PS/2 mouse.
HVM xen_kbdfront can incorrectly scale absolute coordinates when
the display resolution is not 800x600.

Signed-off-by: Owen Smith <[email protected]>
Reviewed-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

xenfb: Add [feature|request]-raw-pointer

Writes "feature-raw-pointer" during init to indicate the backend
can pass raw unscaled values for absolute axes to the frontend.
Frontends set "request-raw-pointer" to indicate the backend should
not attempt to scale absolute values to console size.
"request-raw-pointer" is only valid if "request-abs-pointer" is
also set. Raw unscaled pointer values are in the range [0, 0x7fff]

"feature-raw-pointer" and "request-raw-pointer" added to Xen
header in commit 7868654ff7fe5e4a2eeae2b277644fa884a5031e

Signed-off-by: Owen Smith <[email protected]>
Reviewed-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

xenfb: Use Input Handlers directly

Avoid the unneccessary calls through the input-legacy.c file by
using the qemu_input_handler_*() calls directly. This did require
reworking the event and sync handlers to use the reverse mapping
from qcode to linux using qemu_input_qcode_to_linux().
Removes the scancode2linux mapping, and supporting documention.

Signed-off-by: Owen Smith <[email protected]>
Reviewed-by: Gerd Hoffmann <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

ui: generate qcode to linux mappings

Use keycodedb to generate a qcode to linux mapping

Signed-off-by: Owen Smith <[email protected]>
Reviewed-by: Gerd Hoffmann <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

xen-disk: use an IOThread per instance

This patch allocates an IOThread object for each xen_disk instance and
sets the AIO context appropriately on connect. This allows processing
of I/O to proceed in parallel.

The patch also adds tracepoints into xen_disk to make it possible to
follow the state transtions of an instance in the log.

Signed-off-by: Paul Durrant <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

spapr: don't initialize PATB entry if max-cpu-compat < power9

if KVM is enabled and KVM capabilities MMU radix is available,
the partition table entry (patb_entry) for the radix mode is
initialized by default in ppc_spapr_reset().

It's a problem if we want to migrate the guest to a POWER8 host
while the kernel is not started to set the value to the one
expected for a POWER8 CPU.

The "-machine max-cpu-compat=power8" should allow to migrate
a POWER9 KVM host to a POWER8 KVM host, but because patb_entry
is set, the destination QEMU tries to enable radix mode on the
POWER8 host. This fails and cancels the migration:

    Process table config unsupported by the host
    error while loading state for instance 0x0 of device 'spapr'
    load of migration failed: Invalid argument

This patch doesn't set the PATB entry if the user provides
a CPU compatibility mode that doesn't support radix mode.

Signed-off-by: Laurent Vivier <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Assume msi_nonbroken

We conditionally adjust part of the guest device tree based on the
global msi_nonbroken flag. However, the main machine type code
initializes msi_nonbroken to true and there's nothing that would set
it to false again.

So replace the test with an assert().

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>

spapr: Rename machine init functions for clarity

Machine objects have two init functions - the generic QOM level
instance_init which should only do static object initialization, and
the Machine specific MachineClass::init which does the actual
construction of the machine.

In spapr the functions implementing these two have names -
ppc_machine_initfn() and ppc_spapr_init() - which don't correspond closely
to either of those. To prevent people (read, me) from confusing which is
which, rename them spapr_instance_init() and spapr_machine_init() to
make it clearer which is which.

While we're there rename ppc_spapr_reset() to spapr_machine_reset() to
match.

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Reviewed-by: Suraj Jitindar Singh <[email protected]>

target/ppc: introduce the PPC_BIT() macro

and use them in a couple of obvious places. Other macros will be used
in the model of the XIVE interrupt controller.

Signed-off-by: Cédric Le Goater <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr_events: drop bogus cell from "interrupt-ranges" property

According to LoPAPR 1.1 B.6.12, the "/event-sources" node has an "interrupt-
ranges" property, the format of which is described in B.6.9.1.2 as follows:

“interrupt-ranges”
Standard property name that defines the interrupt number(s) and range(s)
handled by this unit.

prop-encoded-array: List of (int-number, range) specifications.

Int-number is encoded as with encode-int.
Range is encoded as with encode-int.

The first entry in this list shall contain the int-number associated with
the first “reg” property entry. The int-num-ber is the value representing
the interrupt source as would appear in the PowerPC External Interrupt
Architecture XISR. The range shall be the number of sequential interrupt
numbers which this unit can generate.

There's no such thing as a cell count at the end of the array, like the
one introduced by commit ffbb1705a33d in QEMU 2.8. It doesn't seem it had
any impact on existing guests and I couldn't find any related workaround
in linux. So, let's just drop the bogus lines.

Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: fix LSI interrupt specifiers in the device tree

LoPAPR 1.1 B.6.9.1.2 describes the "#interrupt-cells" property of the
PowerPC External Interrupt Source Controller node as follows:

“#interrupt-cells”

  Standard property name to define the number of cells in an interrupt-
  specifier within an interrupt domain.

  prop-encoded-array: An integer, encoded as with encode-int, that denotes
  the number of cells required to represent an interrupt specifier in its
  child nodes.

  The value of this property for the PowerPC External Interrupt option shall
  be 2. Thus all interrupt specifiers (as used in the standard “interrupts”
  property) shall consist of two cells, each containing an integer encoded
  as with encode-int. The first integer represents the interrupt number the
  second integer is the trigger code: 0 for edge triggered, 1 for level
  triggered.

This patch fixes the interrupt specifiers in the "interrupt-map" property
of the PHB node, that were setting the second cell to 8 (confusion with
IRQ_TYPE_LEVEL_LOW ?) instead of 1.

VIO devices and RTAS event sources use the same format for interrupt
specifiers: while here, we introduce a common helper to handle the
encoding details.

Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Tested-by: Cédric Le Goater <[email protected]>
--
v3: - reference public LoPAPR instead of internal PAPR+ in changelog
    - change helper name to spapr_dt_xics_irq()

v2: - drop the erroneous changes to the "interrupts" prop in PCI device nodes
    - introduce a common helper to encode interrupt specifiers
Signed-off-by: David Gibson <[email protected]>

spapr: replace numa_get_node() with lookup in pc-dimm list

SPAPR is the last user of numa_get_node() and a bunch of
supporting code to maintain numa_info[x].addr list.

Get LMB node id from pc-dimm list, which allows to
remove ~80LOC maintaining dynamic address range
lookup list.

It also removes pc-dimm dependency on numa_[un]set_mem_node_id()
and makes pc-dimms a sole source of information about which
node it belongs to and removes duplicate data from global
numa_info.

Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: introduce a spapr_qirq() helper

xics_get_qirq() is only used by the sPAPR machine. Let's move it there
and change its name to reflect its scope. It will be useful for XIVE
support which will use its own set of qirqs.

Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: introduce a spapr_irq_set_lsi() helper

It will make synchronisation easier with the XIVE interrupt mode when
available. The 'irq' parameter refers to the global IRQ number space.

Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: move the IRQ allocation routines under the machine

Also change the prototype to use a sPAPRMachineState and prefix them
with spapr_irq_. It will let us synchronise the IRQ allocation with
the XIVE interrupt mode when available.

Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

ppc/xics: assign of the CPU 'intc' pointer under the core

The 'intc' pointer of the CPU references the interrupt presenter in
the XICS interrupt mode. When the XIVE interrupt mode is available and
activated, the machine will need to reassign this pointer to reflect
the change.

Moving this assignment under the realize routine of the CPU will ease
the process when the interrupt mode is toggled.

Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

ppc/xics: introduce an icp_create() helper

The sPAPR and the PowerNV core objects create the interrupt presenter
object of the CPUs in a very similar way. Let's provide a common
routine in which we use the presenter 'type' as a child identifier.

Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr/rtas: do not reset the MSR in stop-self command

When a CPU is stopped with the 'stop-self' RTAS call, its state
'halted' is switched to 1 and, in this case, the MSR is not taken into
account anymore in the cpu_has_work() routine. Only the pending
hardware interrupts are checked with their LPCR:PECE* enablement bit.

The CPU is now also protected from the decrementer interrupt by the
LPCR:PECE* bits which are disabled in the 'stop-self' RTAS
call. Reseting the MSR is pointless.

Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr/rtas: fix reboot of a a SMP TCG guest

Just like for hot unplug CPUs, when a guest is rebooted, the secondary
CPUs can be awaken by the decrementer and start entering SLOF at the
same time the boot CPU is.

To be safe, let's disable on the secondaries all the exceptions which
can cause an exit while the CPU is in power-saving mode.

Based on previous work from Nikunj A Dadhania <[email protected]>

Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr/rtas: disable the decrementer interrupt when a CPU is unplugged

When a CPU is stopped with the 'stop-self' RTAS call, its state
'halted' is switched to 1 and, in this case, the MSR is not taken into
account anymore in the cpu_has_work() routine. Only the pending
hardware interrupts are checked with their LPCR:PECE* enablement bit.

If the DECR timer fires after 'stop-self' is called and before the CPU
'stop' state is reached, the nearly-dead CPU will have some work to do
and the guest will crash. This case happens very frequently with the
not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
occasionally fired but after 'stop' state, so no work is to be done
and the guest survives.

I suspect there is a race between the QEMU mainloop triggering the
timers and the TCG CPU thread but I could not quite identify the root
cause. To be safe, let's disable in the LPCR all the exceptions which
can cause an exit while the CPU is in power-saving mode and reenable
them when the CPU is started.

Signed-off-by: Cédric Le Goater <[email protected]>
Signed-off-by: David Gibson <[email protected]>

e500: fix pci host bridge class/type

Correct some confusion wrt. the PCI facing
side of the PCI host bridge (not PCIe root complex).
The ref. manual for the mpc8533 (as well as
mpc8540 and mpc8540) give the class code as
PCI_CLASS_PROCESSOR_POWERPC.
While the PCI_HEADER_TYPE field is oddly omitted,
the tables in the "PCI Configuration Header"
section shows a type 0 layout using all 6 BAR
registers (as 2x 32, and 2x 64 bit regions)

So 997505065dc92e533debf5cb23012ba4e673d387
seems to be in error. Although there was
perhaps some confusion as the mpc8533
has a separate PCIe root complex.
With PCIe, a root complex has PCI_HEADER_TYPE=1.

Neither the PCI host bridge, nor the PCIe
root complex advertise class PCI_CLASS_BRIDGE_PCI.

This was confusing Linux guests, which try
to interpret the host bridge as a pci-pci
bridge, but get confused and re-enumerate
the bus when the primary/secondary/subordinate
bus registers don't have valid values.

Signed-off-by: Michael Davidsaver <[email protected]>
Signed-off-by: David Gibson <[email protected]>

openpic: debug w/ info_report()

Replace *printf() with *_report().
Remove trailing new lines.

Signed-off-by: Michael Davidsaver <[email protected]>
Signed-off-by: David Gibson <[email protected]>

pcc: define the Power-saving mode Exit Cause Enable bits in PowerPCCPUClass

and use the value to define precisely the default value of the LPCR in
the helper routine cpu_ppc_set_papr()

Signed-off-by: Cédric Le Goater <[email protected]>
Signed-off-by: David Gibson <[email protected]>

nvram: add AT24Cx i2c eeprom

Signed-off-by: Michael Davidsaver <[email protected]>
Signed-off-by: David Gibson <[email protected]>

e500: name openpic and pci host bridge

Signed-off-by: Michael Davidsaver <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr_cpu_core: instantiate CPUs separately

The current code assumes that only the CPU core object holds a
reference on each individual CPU object, and happily frees their
allocated memory when the core is unrealized. This is dangerous
as some other code can legitimely keep a pointer to a CPU if it
calls object_ref(), but it would end up with a dangling pointer.

Let's allocate all CPUs with object_new() and let QOM free them
when their reference count reaches zero. This greatly simplify the
code as we don't have to fiddle with the instance size anymore.

Signed-off-by: Greg Kurz <[email protected]>
Acked-by: Igor Mammedov <[email protected]>
Signed-off-by: David Gibson <[email protected]>

spapr: Add pseries-2.12 machine type

While we're at it fix a couple of small errors in the 2.11 and 2.10 models
(they didn't have any real effect, but don't quite match the template).

Signed-off-by: David Gibson <[email protected]>

ppc/xics: remove useless if condition

The previous code section uses a 'first < 0' test and returns. Therefore,
there is no need to test the 'first' variable against '>= 0' afterwards.

Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>

target/ppc: Use tcg_gen_lookup_and_goto_ptr

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Daniel Henrique Barboza <[email protected]>
Signed-off-by: David Gibson <[email protected]>

s390x: change the QEMU cpu model to a stripped down z12

We are good enough to boot upstream Linux kernels / Fedora 26/27. That
should be sufficient for now.

As the QEMU CPU model is migration safe, let's add compatibility code.
Generate the feature list to reduce the chance of messing things up in the
future.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208165529 [email protected]>
[CH: squashed 's390x/cpumodel: make qemu cpu model play with "none" machine'
(20171213132407 [email protected]) and 's390x/tcg: don't include z13
features in the qemu model' (20171213171512 [email protected]) into
patch]
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: we already implement the Set-Program-Parameter facility

The Set-Program-Parameter facility (also known as Load-Program-Parameter
facility) provides the LPP instruction used to load the program
parameter. We already implement that instruction in TCG, so add it to our
list.

Note: Not documented in the PoP but in "The Load-Program-Parameter and
CPU-Measurement Facilities) - SA23-2260-05 document.

While at it, make the whole list ordered (according to cpu_features_def.h).

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: implement extract-CPU-time facility

It only provides the EXTRACT CPU TIME instruction. We can reuse the stpt
helper, which calculates the CPU timer value.

As the instruction is not privileged, but we don't have a CPU timer
value in case of linux user, we simply reuse cpu_get_host_ticks() to
produce some descending value.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: Implement SIGNAL ADAPTER instruction

KVM suppresses SIGA, setting cc=3. Let's do the same for TCG, so we're at
least equal.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: Implement STORE CHANNEL PATH STATUS

Just like KVM does, we should suppress this instruction:
    When this instruction is not provided, it is
    checked for privileged operation exception and the
    instruction is suppressed by the machine

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: wire up SET CHANNEL MONITOR

Let's just wire it up like KVM.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: wire up SET ADDRESS LIMIT

Let's handle it just like KVM:
    Depending on the model, this instruction may not be
    provided. When this instruction is not provided, it is
    checked for operand exception and privileged-opera-
    tion exception, and then is suppressed.

Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: implement Interlocked-Access Facility 2

With this facility, OI/OIY, NI/NIY and XI/XIY are atomic. All operate on
one byte (MO_UB). Emulate old behavior.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: ASI/ASGI/ALSI/ALSGI are atomic with Interlocked-acccess facility 1

The semantics of ASI/ASGI/ALSI/ALSGI changed. Let's implement them just
like LOAD AND ADD, so they are atomic. Emulate old behavior.

This fixes random crashes when booting a Linux kernel compiled for
z196+ with SMP + MTTCG.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: wire up STORE CHANNEL REPORT WORD

CRW machine check handling requires STCRW. So let's wire it up.

Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: indicate value of TODPR in STCKE

We were not yet using the value of the TOD Programmable Register.

Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: implement SET CLOCK PROGRAMMABLE FIELD

Needed for machine check handling inside Linux (when restoring registers).

Except for SIGP and machine checks, we don't make use of the register
yet. Sufficient for now.

Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/tcg: fix and cleanup mcck injection

The architecture mode indication wasn't stored. The split of certain
64bit fields was unnecessary. Also, the complete clock comparator, not
just bit 0-55 (starting at byte 1) was stored.

We now generate a proper MCIC via the same helper we use for KVM.

There is more to clean up, but we will change the other parts later on
either way.

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/kvm: factor out build_channel_report_mcic() into cpu.h

We'll need it later on in two places. Refactor it to just indicate the
validity bits. While at it, introduce a define for the used CR14 bit (we'll
also need later on).

Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <20171208160207 [email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/css: attach css bridge

Logically, the css bridge should be attached to the machine.

Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Tested-by: Bjoern Walk <[email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x: deprecate s390-squash-mcss machine prop

With the cssids unrestricted (commit "s390x/css: unrestrict cssids") the
s390-squash-mcss machine property should not be used. Actually Libvirt
never supported this, so the expectation is that removing it should be
pretty painless. But let's play nice and deprecate it first.

Signed-off-by: Halil Pasic <[email protected]>
Message-Id: <20171206144438 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/css: unrestrict cssids

The default css 0xfe is currently restricted to virtual subchannel
devices. The hope when the decision was made was, that non-virtual
subchannel devices will come around when guest can exploit multiple
channel subsystems. Since the guests generally don't do, the pain
of the partitioned (cssid) namespace outweighs the gain.

Let us remove the corresponding restrictions (virtual devices
can be put only in 0xfe and non-virtual devices in any css except
the 0xfe -- while s390-squash-mcss then remaps everything to cssid 0).

At the same time, change our schema for generating css bus ids to put
both virtual and non-virtual devices into the default css (spilling over
into other css images, if needed). The intention is to deprecate
s390-squash-mcss. With this change devices without a specified devno
won't end up hidden to guests not supporting multiple channel subsystems,
unless this can not be avoided (default css full).

Let us also advertise the changes to the management software (so it can
tell are cssids unrestricted or restricted).

The adverse effect of getting rid of the restriction on migration should
not be too severe. Vfio-ccw devices are not live-migratable yet, and for
virtual devices using the extra freedom would only make sense with the
aforementioned guest support in place.

The auto-generated bus ids are affected by both changes. We hope to not
encounter any auto-generated bus ids in production as Libvirt is always
explicit about the bus id. Since 8ed179c937 ("s390x/css: catch section
mismatch on load", 2017-05-18) the worst that can happen because the same
device ended up having a different bus id is a cleanly failed migration.
I find it hard to reason about the impact of changed auto-generated bus
ids on migration for command line users as I don't know which rules is
such an user supposed to follow.

Another pain-point is down- or upgrade of QEMU for command line users.
The old way and the new way of doing vfio-ccw are mutually incompatible.
Libvirt is only going to support the new way, so for libvirt users, the
possible problems at QEMU downgrade are the following. If a domain
contains virtual devices placed into a css different than 0xfe the domain
will refuse to start with a QEMU not having this patch. Putting devices
into a css different that 0xfe however won't make much sense in the near
future (guest support). Libvirt will refuse to do vfio-ccw with a QEMU
not having this patch. This is business as usual.

Signed-off-by: Halil Pasic <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Message-Id: <20171206144438 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/pci: search for subregion inside the BARs

When dispatching memory access to PCI BAR region, we must
look for possible subregions, used by the PCI device to map
different memory areas inside the same PCI BAR.

Since the data offset we received is calculated starting at the
region start address we need to adjust the offset for the subregion.

The data offset inside the subregion is calculated by substracting
the subregion's starting address from the data offset in the region.

The access to the MSIX region is now handled in a generic way,
we do not need the specific trap_msix() function anymore.

Signed-off-by: Pierre Morel <[email protected]>
Reviewed-by: Yi Min Zhao <[email protected]>
Message-Id: <1512046530 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/pci: move the memory region write from pcistg

Let's move the memory region write from pcistg into a dedicated
function.
This allows us to prepare a later patch searching for subregions
inside of the memory region.

Signed-off-by: Pierre Morel <[email protected]>
Reviewed-by: Yi Min Zhao <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Message-Id: <1512046530 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>

s390x/pci: move the memory region read from pcilg

Let's move the memory region read from pcilg into a dedicated function.
This allows us to prepare a later patch.

Signed-off-by: Pierre Morel <[email protected]>
Reviewed-by: Yi Min Zhao <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Message-Id: <1512046530 [email protected]>
Signed-off-by: Cornelia Huck <[email protected]>