Git Repo - qemu.git/log

virtiofsd: Fix fuse_daemonize ignored return values

QEMU's compiler enables warnings/errors for ignored values
and the (void) trick used in the fuse code isn't enough.
Turn all the return values into a return value on the function.
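
As a rough illustration of the pattern (a sketch only; report_status() is a
hypothetical helper and not the fuse_daemonize code itself), a function can
hand a write() failure back to its caller rather than discarding it with a
(void) cast:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one status byte to a waiting parent over a
 * pipe. The result of write() is checked and turned into the function's
 * own return value instead of being thrown away.
 */
int report_status(int fd, char status)
{
    ssize_t n;

    do {
        n = write(fd, &status, sizeof(status));
    } while (n < 0 && errno == EINTR); /* retry if interrupted by a signal */

    return n == (ssize_t)sizeof(status) ? 0 : -1;
}
```

The caller can then decide whether a failed write is fatal, which is the
kind of decision a (void) cast hides.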

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Remove unused enum fuse_buf_copy_flags

Signed-off-by: Xiao Yang <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: remove unused notify reply support

Notify reply support is unused by virtiofsd. The code would need to be
updated to validate input buffer sizes. Remove this unused code since
changes to it are untestable.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: remove mountpoint dummy argument

Classic FUSE file system daemons take a mountpoint argument but
virtiofsd exposes a vhost-user UNIX domain socket instead. The
mountpoint argument is not used by virtiofsd but the user is still
required to pass a dummy argument on the command-line.

Remove the mountpoint argument to clean up the command-line.

Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Format imported files to qemu style

Mostly using a set like:

indent -nut -i 4 -nlp -br -cs -ce --no-space-after-function-call-names file
clang-format -style=file -i -- file
clang-tidy -fix-errors -checks=readability-braces-around-statements file
clang-format -style=file -i -- file

With manual cleanups.

The .clang-format used is below.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Aleksandar Markovic <[email protected]>

Language:        Cpp
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false # although we like it, it creates churn
AlignConsecutiveDeclarations: false
AlignEscapedNewlinesLeft: true
AlignOperands:   true
AlignTrailingComments: false # churn
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: None
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterReturnType: None # AlwaysBreakAfterDefinitionReturnType is taken into account
AlwaysBreakBeforeMultilineStrings: false
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
  AfterControlStatement: false
  AfterEnum:       false
  AfterFunction:   true
  AfterStruct:     false
  AfterUnion:      false
  BeforeElse:      false
  IndentBraces:    false
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Custom
BreakBeforeTernaryOperators: false
BreakStringLiterals: true
ColumnLimit:     80
ContinuationIndentWidth: 4
Cpp11BracedListStyle: false
DerivePointerAlignment: false
DisableFormat:   false
ForEachMacros:   [
  'CPU_FOREACH',
  'CPU_FOREACH_REVERSE',
  'CPU_FOREACH_SAFE',
  'IOMMU_NOTIFIER_FOREACH',
  'QLIST_FOREACH',
  'QLIST_FOREACH_ENTRY',
  'QLIST_FOREACH_RCU',
  'QLIST_FOREACH_SAFE',
  'QLIST_FOREACH_SAFE_RCU',
  'QSIMPLEQ_FOREACH',
  'QSIMPLEQ_FOREACH_SAFE',
  'QSLIST_FOREACH',
  'QSLIST_FOREACH_SAFE',
  'QTAILQ_FOREACH',
  'QTAILQ_FOREACH_REVERSE',
  'QTAILQ_FOREACH_SAFE',
  'QTAILQ_RAW_FOREACH',
  'RAMBLOCK_FOREACH'
]
IncludeCategories:
  - Regex:           '^"qemu/osdep.h'
    Priority:        -3
  - Regex:           '^"(block|chardev|crypto|disas|exec|fpu|hw|io|libdecnumber|migration|monitor|net|qapi|qemu|qom|standard-headers|sysemu|ui)/'
    Priority:        -2
  - Regex:           '^"(elf.h|qemu-common.h|glib-compat.h|qemu-io.h|trace-tcg.h)'
    Priority:        -1
  - Regex:           '.*'
    Priority:        1
IncludeIsMainRegex: '$'
IndentCaseLabels: false
IndentWidth:     4
IndentWrappedFunctionNames: false
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: '.*_BEGIN$' # only PREC_BEGIN ?
MacroBlockEnd:   '.*_END$'
MaxEmptyLinesToKeep: 2
PointerAlignment: Right
ReflowComments:  true
SortIncludes:    true
SpaceAfterCStyleCast: false
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInContainerLiterals: true
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard:        Auto
UseTab:          Never
...

Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Trim down imported files

There's a lot of the original fuse code we don't need; trim it down.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
with additional trimming by:
Signed-off-by: Misono Tomohiro <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Xiao Yang <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Add passthrough_ll

passthrough_ll is one of the examples in the upstream fuse project
and is the main part of our daemon here.  It passes through requests
from fuse to the underlying filesystem, using syscalls as directly
as possible.

From libfuse fuse-3.8.0

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
  Fixed up 'GPL' to 'GPLv2' as per Dan's comments and consistent
  with the 'LICENSE' file in libfuse;  patch sent to libfuse to fix
  it upstream.
Reviewed-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Add fuse_lowlevel.c

fuse_lowlevel is one of the largest files from the library
and does most of the work. Add it separately to keep the diff
sizes small.
Again this is from upstream fuse-3.8.0

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Add auxiliary .c's

Add most of the non-main .c files we need from upstream fuse-3.8.0

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Pull in kernel's fuse.h

Update scripts/update-linux-headers.sh to add fuse.h and
use it to pull in fuse.h from the kernel; from v5.5-rc1

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

virtiofsd: Pull in upstream headers

Pull in headers from libfuse's upstream fuse-3.8.0

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>

Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging

Fix mmap guest space and brk
Add FS/FD/RTC/KCOV ioctls

# gpg: Signature made Thu 23 Jan 2020 08:21:41 GMT
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Laurent Vivier <[email protected]>" [full]
# gpg:                 aka "Laurent Vivier <[email protected]>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <[email protected]>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-5.0-pull-request:
  linux-user: Add support for read/clear RTC voltage low detector using ioctls
  linux-user: Add support for getting/setting RTC PLL correction using ioctls
  linux-user: Add support for getting/setting RTC wakeup alarm using ioctls
  linux-user: Add support for getting/setting RTC periodic interrupt and epoch using ioctls
  linux-user: Add support for getting/setting RTC time and alarm using ioctls
  linux-user: Add support for enabling/disabling RTC features using ioctls
  linux-user: Add support for TYPE_LONG and TYPE_ULONG in do_ioctl()
  linux-user: Add support for KCOV_INIT_TRACE ioctl
  linux-user: Add support for KCOV_<ENABLE|DISABLE> ioctls
  configure: Detect kcov support and introduce CONFIG_KCOV
  linux-user: Add support for FDFMT<BEG|TRK|END> ioctls
  linux-user: Add support for FD<SETEMSGTRESH|SETMAXERRS|GETMAXERRS> ioctls
  linux-user: Add support for FS_IOC32_<GET|SET>VERSION ioctls
  linux-user: Add support for FS_IOC32_<GET|SET>FLAGS ioctls
  linux-user: Add support for FS_IOC_<GET|SET>VERSION ioctls
  linux-user: Reserve space for brk
  linux-user:Fix align mistake when mmap guest space

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

virtio, pc: fixes, features

Bugfixes all over the place.
CPU hotplug with secureboot.

Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Thu 23 Jan 2020 07:08:32 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [full]
# gpg:                 aka "Michael S. Tsirkin <[email protected]>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  vhost: coding style fix
  i386:acpi: Remove _HID from the SMBus ACPI entry
  vhost: Only align sections for vhost-user
  vhost: Add names to section rounded warning
  vhost-vsock: delete vqs in vhost_vsock_unrealize to avoid memleaks
  virtio-scsi: convert to new virtio_delete_queue
  virtio-scsi: delete vqs in unrealize to avoid memleaks
  virtio-9p-device: convert to new virtio_delete_queue
  virtio-9p-device: fix memleak in virtio_9p_device_unrealize
  bios-tables-test: document expected file update
  acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command
  acpi: cpuhp: spec: add typical usecases
  acpi: cpuhp: introduce 'Command data 2' field
  acpi: cpuhp: spec: clarify store into 'Command data' when 'Command field' == 0
  acpi: cpuhp: spec: fix 'Command data' description
  acpi: cpuhp: spec: clarify 'CPU selector' register usage and endianness
  tests: q35: MCH: add default SMBASE SMRAM lock test
  q35: implement 128K SMRAM at default SMBASE address

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20200121' into staging

Remove another limit to NB_MMU_MODES.
Fix compilation using uclibc.
Fix defaulting of -accel parameters.
Tidy cputlb basic routines.
Adjust git.orderfile for decodetree.

# gpg: Signature made Wed 22 Jan 2020 02:44:18 GMT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Richard Henderson <[email protected]>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20200121:
  scripts/git.orderfile: Display decodetree before C source
  cputlb: Hoist timestamp outside of loops over tlbs
  cputlb: Initialize tlbs as flushed
  cputlb: Partially merge tlb_dyn_init into tlb_init
  cputlb: Split out tlb_mmu_flush_locked
  cputlb: Hoist tlb portions in tlb_flush_one_mmuidx_locked
  cputlb: Hoist tlb portions in tlb_mmu_resize_locked
  cputlb: Pass CPUTLBDescFast to tlb_n_entries and sizeof_tlb
  cputlb: Make tlb_n_entries private to cputlb.c
  cputlb: Merge tlb_table_flush_by_mmuidx into tlb_flush_one_mmuidx_locked
  vl: Only choose enabled accelerators in configure_accelerators
  vl: Remove useless test in configure_accelerators
  vl: Reduce scope of variables in configure_accelerators
  vl: Remove unused variable in configure_accelerators
  util/cacheinfo: fix crash when compiling with uClibc
  cputlb: Handle NB_MMU_MODES > TARGET_PAGE_BITS_MIN

Signed-off-by: Peter Maydell <[email protected]>

vhost: coding style fix

Drop a trailing whitespace. Make line shorter.

Fixes: 76525114736e8 ("vhost: Only align sections for vhost-user")
Signed-off-by: Michael S. Tsirkin <[email protected]>

linux-user: Add support for read/clear RTC voltage low detector using ioctls

This patch implements functionalities of following ioctls:

RTC_VL_READ - Read voltage low detection information

    Read the voltage low for RTCs that support voltage low.
    The third ioctl's argument points to an int in which
    the voltage low is returned.

RTC_VL_CLR - Clear voltage low information

    Clear the information about voltage low for RTCs that
    support voltage low. The third ioctl(2) argument is
    ignored.

Implementation notes:

    Since one ioctl has a pointer to 'int' as its third argument,
    and another ioctl has NULL as its third argument, their
    implementation was straightforward.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Filip Bozuta <[email protected]>
Message-Id: <1579117007 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for getting/setting RTC PLL correction using ioctls

This patch implements functionalities of following ioctls:

RTC_PLL_GET - Getting PLL correction

    Read the PLL correction for RTCs that support PLL. The PLL correction
    is returned in the following structure:

        struct rtc_pll_info {
            int pll_ctrl;        /* placeholder for fancier control */
            int pll_value;       /* get/set correction value */
            int pll_max;         /* max +ve (faster) adjustment value */
            int pll_min;         /* max -ve (slower) adjustment value */
            int pll_posmult;     /* factor for +ve correction */
            int pll_negmult;     /* factor for -ve correction */
            long pll_clock;      /* base PLL frequency */
        };

    A pointer to this structure should be passed as the third
    ioctl's argument.

RTC_PLL_SET - Setting PLL correction

    Sets the PLL correction for RTCs that support PLL. The PLL correction
    that is set is specified by the rtc_pll_info structure pointed to by
    the third ioctl's argument.

Implementation notes:

    All ioctls in this patch have a pointer to a structure rtc_pll_info
    as their third argument. All elements of this structure are of
    type 'int', except the last one that is of type 'long'. That is
    the reason why a separate target structure (target_rtc_pll_info)
    is defined in linux-user/syscall_defs. The rest of the
    implementation is straightforward.
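
To make the size difference concrete, here is a minimal sketch. The struct
and function names are hypothetical stand-ins, not QEMU's
target_rtc_pll_info, and byte swapping between guest and host is left out.
Only the last field changes width between a 32-bit and a 64-bit ABI:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout as seen by a guest whose 'long' is 4 bytes. */
struct sketch_pll_info_32 {
    int32_t pll_ctrl, pll_value, pll_max, pll_min;
    int32_t pll_posmult, pll_negmult;
    int32_t pll_clock;
};

/* Hypothetical layout as seen by a host whose 'long' is 8 bytes. */
struct sketch_pll_info_64 {
    int32_t pll_ctrl, pll_value, pll_max, pll_min;
    int32_t pll_posmult, pll_negmult;
    int64_t pll_clock;
};

/* Copy field by field, sign-extending pll_clock to the wider type. */
void sketch_pll_widen(struct sketch_pll_info_64 *dst,
                      const struct sketch_pll_info_32 *src)
{
    dst->pll_ctrl = src->pll_ctrl;
    dst->pll_value = src->pll_value;
    dst->pll_max = src->pll_max;
    dst->pll_min = src->pll_min;
    dst->pll_posmult = src->pll_posmult;
    dst->pll_negmult = src->pll_negmult;
    dst->pll_clock = src->pll_clock;
}
```

Because the two layouts differ in size, a plain memcpy between them would
not be safe, which is why a per-target definition is needed.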

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Filip Bozuta <[email protected]>
Message-Id: <1579117007 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for getting/setting RTC wakeup alarm using ioctls

This patch implements functionalities of following ioctls:

RTC_WKALM_SET, RTC_WKALM_RD - Getting/Setting wakeup alarm

    Some RTCs support a more powerful alarm interface, using these
    ioctls to read or write the RTC's alarm time (respectively)
    with this structure:

        struct rtc_wkalrm {
            unsigned char enabled;
            unsigned char pending;
            struct rtc_time time;
        };

    The enabled flag is used to enable or disable the alarm
    interrupt, or to read its current status; when using these
    calls, RTC_AIE_ON and RTC_AIE_OFF are not used. The pending
    flag is used by RTC_WKALM_RD to report a pending interrupt
    (so it's mostly useless on Linux, except when talking to the
    RTC managed by EFI firmware). The time field is as used with
    RTC_ALM_READ and RTC_ALM_SET except that the tm_mday, tm_mon,
    and tm_year fields are also valid. A pointer to this structure
    should be passed as the third ioctl's argument.

Implementation notes:

    All ioctls in this patch have a pointer to a structure
    rtc_wkalrm as their third argument. That is the reason why a
    corresponding definition is added in linux-user/syscall_types.h.
    Since all elements of this structure are either of type
    'unsigned char' or 'struct rtc_time' (that was covered in one
    of the previous patches), the rest of the implementation is
    straightforward.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Filip Bozuta <[email protected]>
Message-Id: <1579117007 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for getting/setting RTC periodic interrupt and epoch using ioctls

This patch implements functionalities of following ioctls:

RTC_IRQP_READ, RTC_IRQP_SET - Getting/Setting IRQ rate

    Read and set the frequency for periodic interrupts, for RTCs
    that support periodic interrupts. The periodic interrupt must
    be separately enabled or disabled using the RTC_PIE_ON,
    RTC_PIE_OFF requests. The third ioctl's argument is an
    unsigned long * or an unsigned long, respectively. The value
    is the frequency in interrupts per second. The set of allowable
    frequencies is the multiples of two in the range 2 to
    8192. Only a privileged process (i.e., one having the
    CAP_SYS_RESOURCE capability) can set frequencies above the
    value specified in /proc/sys/dev/rtc/max-user-freq. (This
    file contains the value 64 by default.)

RTC_EPOCH_READ, RTC_EPOCH_SET - Getting/Setting epoch

    Many RTCs encode the year in an 8-bit register which is either
    interpreted as an 8-bit binary number or as a BCD number. In
    both cases, the number is interpreted relative to this RTC's
    Epoch. The RTC's Epoch is initialized to 1900 on most systems
    but on Alpha and MIPS it might also be initialized to 1952,
    1980, or 2000, depending on the value of an RTC register for
    the year. With some RTCs, these operations can be used to
    read or to set the RTC's Epoch, respectively. The third
    ioctl's argument is an unsigned long * or an unsigned long,
    respectively, and the value returned (or assigned) is the
    Epoch. To set the RTC's Epoch the process must be privileged
    (i.e., have the CAP_SYS_TIME capability).
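
A small sketch of the year decoding described above, assuming a single
8-bit year register. rtc_decode_year() is a hypothetical name, and real
drivers may apply extra wraparound rules that are not shown here:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Interpret an 8-bit RTC year register either as a plain binary number
 * or as two BCD digits, then offset it by the RTC's Epoch.
 */
unsigned rtc_decode_year(uint8_t reg, bool is_bcd, unsigned epoch)
{
    unsigned value = is_bcd ? (reg >> 4) * 10u + (reg & 0x0fu) : reg;

    return epoch + value;
}
```

For example, a BCD register value of 0x20 with an Epoch of 2000 decodes to
the year 2020.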

Implementation notes:

    All ioctls in this patch have a pointer to 'ulong' as their
    third argument. That is the reason why corresponding parts
    of added code in linux-user/syscall_defs.h contain special
    handling related to 'ulong' type: they use 'abi_ulong' type
    to make sure that ioctl's code is calculated correctly for
    both 32-bit and 64-bit targets. Also, 'MK_PTR(TYPE_ULONG)'
    is used for a similar reason in linux-user/ioctls.h.
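
The sketch below shows why the argument type matters for the ioctl code.
It assumes the asm-generic encoding (direction in bits 30-31, argument size
in bits 16-29, type in bits 8-15, number in bits 0-7); some architectures
use a different layout. sketch_iow() is a hypothetical helper, not a kernel
or QEMU macro:

```c
#include <stdint.h>

#define SKETCH_IOC_WRITE 1u /* direction value for "write" in asm-generic */

/* Build a write-type ioctl number from its type, number and argument size. */
uint32_t sketch_iow(uint32_t type, uint32_t nr, uint32_t size)
{
    return (SKETCH_IOC_WRITE << 30) | (size << 16) | (type << 8) | nr;
}
```

With type 'p' and number 0x0c (the values behind RTC_IRQP_SET), an 8-byte
argument gives 0x4008700c while a 4-byte argument gives 0x4004700c, so the
same ioctl has different codes for 64-bit and 32-bit targets.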

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Filip Bozuta <[email protected]>
Message-Id: <1579117007 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for getting/setting RTC time and alarm using ioctls

This patch implements functionalities of following ioctls:

RTC_RD_TIME - Getting RTC time

    Returns this RTC's time in the following structure:

        struct rtc_time {
            int tm_sec;
            int tm_min;
            int tm_hour;
            int tm_mday;
            int tm_mon;
            int tm_year;
            int tm_wday;     /* unused */
            int tm_yday;     /* unused */
            int tm_isdst;    /* unused */
        };

    The fields in this structure have the same meaning and ranges
    as the tm structure described in gmtime man page. A pointer
    to this structure should be passed as the third ioctl's argument.

RTC_SET_TIME - Setting RTC time

    Sets this RTC's time to the time specified by the rtc_time
    structure pointed to by the third ioctl's argument. To set
    the RTC's time the process must be privileged (i.e., have the
    CAP_SYS_TIME capability).

RTC_ALM_READ, RTC_ALM_SET - Getting/Setting alarm time

    Read and set the alarm time, for RTCs that support alarms.
    The alarm interrupt must be separately enabled or disabled
    using the RTC_AIE_ON, RTC_AIE_OFF requests. The third
    ioctl's argument is a pointer to a rtc_time structure. Only
    the tm_sec, tm_min, and tm_hour fields of this structure are
    used.

Implementation notes:

    All ioctls in this patch have a pointer to a structure rtc_time
    as their third argument. That is the reason why a corresponding
    definition is added in linux-user/syscall_types.h. Since all
    elements of this structure are of type 'int', the rest of the
    implementation is straightforward.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Filip Bozuta <[email protected]>
Message-Id: <1579117007 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for enabling/disabling RTC features using ioctls

This patch implements functionalities of following ioctls:

RTC_AIE_ON, RTC_AIE_OFF - Alarm interrupt enabling on/off

    Enable or disable the alarm interrupt, for RTCs that support
    alarms.  The third ioctl's argument is ignored.

RTC_UIE_ON, RTC_UIE_OFF - Update interrupt enabling on/off

    Enable or disable the interrupt on every clock update, for
    RTCs that support this once-per-second interrupt. The third
    ioctl's argument is ignored.

RTC_PIE_ON, RTC_PIE_OFF - Periodic interrupt enabling on/off

    Enable or disable the periodic interrupt, for RTCs that support
    these periodic interrupts. The third ioctl's argument
    is ignored. Only a privileged process (i.e., one having the
    CAP_SYS_RESOURCE capability) can enable the periodic interrupt
    if the frequency is currently set above the value specified in
    /proc/sys/dev/rtc/max-user-freq.

RTC_WIE_ON, RTC_WIE_OFF - Watchdog interrupt enabling on/off

    Enable or disable the Watchdog interrupt, for RTCs that support
    this Watchdog interrupt. The third ioctl's argument is
    ignored.

Implementation notes:

    Since all of the involved ioctls have NULL as their third argument,
    their implementation was straightforward.

    The line '#include <linux/rtc.h>' was added to recognize
    preprocessor definitions for these ioctls. This needs to be
    done only once in this series of commits. Also, the content
    of this file (with respect to ioctl definitions) has remained
    unchanged for a long time, so there is no need to worry about
    supporting older Linux kernel versions.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Filip Bozuta <[email protected]>
Message-Id: <1579117007 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for TYPE_LONG and TYPE_ULONG in do_ioctl()

Function "do_ioctl()" located in file "syscall.c" was missing
an option for TYPE_LONG and TYPE_ULONG. This caused some ioctls
to not be recognised because they had the third argument that was
of type 'long' or 'unsigned long'.

For example:

Since the implemented ioctls RTC_IRQP_SET and RTC_EPOCH_SET
are of type IOW (writing type) and have unsigned long as
their third argument, they were not recognised in QEMU
before the changes of this patch.
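
A much simplified sketch of the dispatch idea follows. All names here are
hypothetical stand-ins rather than QEMU's real type tags or do_ioctl()
code. An argument type with no case of its own falls through to the
"unsupported" path, which is what happened to long and unsigned long
arguments before this change:

```c
/* Hypothetical stand-ins for the argument type tags. */
enum sketch_arg_type {
    SK_TYPE_NULL,
    SK_TYPE_INT,
    SK_TYPE_LONG,
    SK_TYPE_ULONG,
    SK_TYPE_PTR,
    SK_TYPE_OTHER
};

enum sketch_action {
    SK_CALL_NO_ARG,     /* ioctl(fd, cmd) */
    SK_CALL_BY_VALUE,   /* ioctl(fd, cmd, arg) with arg passed as is */
    SK_CALL_VIA_BUFFER, /* convert the pointed-to data first */
    SK_UNSUPPORTED
};

/* Decide how the third argument of an ioctl would be handled. */
enum sketch_action sketch_classify(enum sketch_arg_type type)
{
    switch (type) {
    case SK_TYPE_NULL:
        return SK_CALL_NO_ARG;
    case SK_TYPE_INT:
    case SK_TYPE_LONG:  /* the kind of case this patch adds */
    case SK_TYPE_ULONG: /* the kind of case this patch adds */
        return SK_CALL_BY_VALUE;
    case SK_TYPE_PTR:
        return SK_CALL_VIA_BUFFER;
    default:
        return SK_UNSUPPORTED;
    }
}
```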

Signed-off-by: Filip Bozuta <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <1579117007 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for KCOV_INIT_TRACE ioctl

KCOV_INIT_TRACE ioctl plays a role in kernel coverage tracing.
This ioctl's third argument is of type 'unsigned long', and the
implementation in QEMU is straightforward.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Aleksandar Markovic <[email protected]>
Message-Id: <1579214991 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for KCOV_<ENABLE|DISABLE> ioctls

KCOV_ENABLE and KCOV_DISABLE play a role in kernel coverage
tracing. These ioctls do not use the third argument of ioctl()
system call and are straightforward to implement in QEMU.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Aleksandar Markovic <[email protected]>
Message-Id: <1579214991 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

configure: Detect kcov support and introduce CONFIG_KCOV

kcov is a kernel code coverage tracing tool. It requires kernel 4.4+
compiled with certain kernel options.

This patch checks if the kcov header "sys/kcov.h" is present on the build
machine, and stores the result in the variable CONFIG_KCOV, meant to
be used in linux-user code related to the support for three ioctls
that were introduced at the same time as the mentioned header
(their definition was a part of the first version of that header).

Signed-off-by: Aleksandar Markovic <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Message-Id: <1579214991 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for FDFMT<BEG|TRK|END> ioctls

FDFMTBEG, FDFMTTRK, and FDFMTEND ioctls provide means for controlling
formatting of a floppy drive.

FDFMTTRK's third argument is a pointer to the structure:

struct format_descr {
    unsigned int device, head, track;
};

defined in Linux kernel header <linux/fd.h>.

Since all fields of the structure are of type 'unsigned int', there is
no need to define "target_format_descr".

FDFMTBEG and FDFMTEND ioctls do not use the third argument.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Aleksandar Markovic <[email protected]>
Message-Id: <1579214991 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for FD<SETEMSGTRESH|SETMAXERRS|GETMAXERRS> ioctls

FDSETEMSGTRESH, FDSETMAXERRS, and FDGETMAXERRS ioctls are commands
for controlling error reporting of a floppy drive.

FDSETEMSGTRESH's third argument is a pointer to the structure:

struct floppy_max_errors {
    unsigned int
      abort,      /* number of errors to be reached before aborting */
      read_track, /* maximal number of errors permitted to read an
                   * entire track at once */
      reset,      /* maximal number of errors before a reset is tried */
      recal,      /* maximal number of errors before a recalibrate is
                   * tried */
      /*
       * Threshold for reporting FDC errors to the console.
       * Setting this to zero may flood your screen when using
       * ultra cheap floppies ;-)
       */
      reporting;
};

defined in Linux kernel header <linux/fd.h>.

Since all fields of the structure are of type 'unsigned int', there is
no need to define "target_floppy_max_errors".

FDSETMAXERRS and FDGETMAXERRS ioctls do not use the third argument.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Aleksandar Markovic <[email protected]>
Message-Id: <1579214991 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for FS_IOC32_<GET|SET>VERSION ioctls

These FS_IOC32_<GET|SET>VERSION ioctls are identical to
FS_IOC_<GET|SET>VERSION ioctls, but without the anomaly of their
number defined as if their third argument is of type long, while
it is treated internally in the kernel as if it is of type int.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Aleksandar Markovic <[email protected]>
Message-Id: <1579214991 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for FS_IOC32_<GET|SET>FLAGS ioctls

These FS_IOC32_<GET|SET>FLAGS ioctls are identical to
FS_IOC_<GET|SET>FLAGS ioctls, but without the anomaly of their
number defined as if their third argument is of type long, while
it is treated internally in the kernel as if it is of type int.

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Aleksandar Markovic <[email protected]>
Message-Id: <1579214991 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Add support for FS_IOC_<GET|SET>VERSION ioctls

A very specific thing for these two ioctls is that their code
implies that their third argument is of type 'long', but the
kernel uses that argument as if it is of type 'int'. This anomaly
is recognized also in commit 6080723 (linux-user: Implement
FS_IOC_GETFLAGS and FS_IOC_SETFLAGS ioctls).

Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Aleksandar Markovic <[email protected]>
Message-Id: <1579214991 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user: Reserve space for brk

With bad luck, we can wind up with no space at all for brk,
which will generally cause the guest malloc to fail.

This bad luck is easier to come by with ET_DYN (PIE) binaries,
where either the stack or the interpreter (ld.so) gets placed
immediately after the main executable.

But there's nothing preventing this same thing from happening
with ET_EXEC (normal) binaries, during probe_guest_base().

In both cases, reserve some extra space via mmap and release
it back to the system after loading the interpreter and
allocating the stack.

The choice of 16MB is somewhat arbitrary. It's enough for libc
to get going, but without being so large that 32-bit guests or
32-bit hosts are in danger of running out of virtual address space.
It is expected that libc will be able to fall back to mmap arenas
after the limited brk space is exhausted.

Launchpad: https://bugs.launchpad.net/qemu/+bug/1749393
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Tested-by: Alex Bennée <[email protected]>
Message-Id: <20200117230245 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

linux-user:Fix align mistake when mmap guest space

In init_guest_space, we need to mmap the guest space. If the address
returned by the first mmap is not aligned to align, which is set to
MAX(SHMLBA, qemu_host_page_size), we need to unmap it and mmap again
with a larger size. The new size is named real_size and is currently
aligned_size + qemu_host_page_size, where aligned_size is the guest
space size. The extra qemu_host_page_size is meant to avoid a memory
error when we align real_start manually (ROUND_UP(real_start, align)).
But when SHMLBA > qemu_host_page_size, the added size is smaller than
the amount needed to align, which can cause an error (it shows up on a
MIPS machine). Changing real_size from aligned_size +
qemu_host_page_size to aligned_size + align solves it.
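
The arithmetic can be checked with a small sketch. The function names are
hypothetical, and align is assumed to be a power of two:

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * After aligning the start of a mapping [real_start, real_start + real_size),
 * does a region of aligned_size bytes still fit inside that mapping?
 */
bool sketch_aligned_fits(uint64_t real_start, uint64_t real_size,
                         uint64_t aligned_size, uint64_t align)
{
    uint64_t start = sketch_round_up(real_start, align);

    return start + aligned_size <= real_start + real_size;
}
```

Rounding up moves the start by at most align - 1 bytes, so padding the
mapping with align bytes is always enough, while padding with a smaller
page size is not. With a 0x1000 page size, a 0x40000 align, a mapping at
0x1000 and a 0x100000 guest space, the old padding fails the check and the
new padding passes it.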

Signed-off-by: Xinyu Li <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <20191213022919 [email protected]>
Signed-off-by: Laurent Vivier <[email protected]>

i386:acpi: Remove _HID from the SMBus ACPI entry

Per the ACPI spec (version 6.1, section 6.1.5 _HID), _HID is not required
on enumerated buses (like PCI in this case); _ADR is required (and is
already there). And the _HID value is wrong. Linux appears to ignore
the _HID entry, but Windows 10 detects it as 'Unknown Device' and there
is no driver available. See https://bugs.launchpad.net/qemu/+bug/1856724

Signed-off-by: Corey Minyard <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Igor Mammedov <[email protected]>
Reviewed-by: Igor Mammedov <[email protected]>
Message-Id: <20200120170725 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: Only align sections for vhost-user

I added hugepage alignment code in c1ece84e7c9 to deal with
vhost-user + postcopy which needs aligned pages when using userfault.
However, on x86 the lower 2MB of address space tends to be shotgun'd
with small fragments around the 512-640k range (e.g. video RAM), and
HyperV synic pages also tend to sit around there, again splitting
it up. The alignment code complains with a 'Section rounded to ...'
error and gives up.

Since vhost-user already filters out devices without an fd
(see vhost-user.c vhost_user_mem_section_filter) it shouldn't be
affected by those overlaps.

Turn the alignment off on vhost-kernel so that it doesn't try
and align, and thus won't hit the rounding issues.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20200116202414 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>

vhost: Add names to section rounded warning

Add the memory region names to section rounding/alignment
warnings.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-Id: <20200116202414 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-vsock: delete vqs in vhost_vsock_unrealize to avoid memleaks

The receive/transmit/event vqs were not cleaned up in vhost_vsock_unrealize.
This patch saves the receive/transmit vq pointers in realize() and cleans up
the vqs through those pointers in unrealize(). The leak stack is as follows:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f86a1356970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f86a09aa49d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x5604852f85ca (./x86_64-softmmu/qemu-system-x86_64+0x2c3e5ca)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x560485356208 (./x86_64-softmmu/qemu-system-x86_64+0x2c9c208)  /mnt/sdb/qemu/hw/virtio/vhost-vsock.c:339
  #4 0x560485305a17 (./x86_64-softmmu/qemu-system-x86_64+0x2c4ba17)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #5 0x5604858e6b65 (./x86_64-softmmu/qemu-system-x86_64+0x322cb65)  /mnt/sdb/qemu/hw/core/qdev.c:865
  #6 0x5604861e6c41 (./x86_64-softmmu/qemu-system-x86_64+0x3b2cc41)  /mnt/sdb/qemu/qom/object.c:2102

Reported-by: Euler Robot <[email protected]>
Signed-off-by: Pan Nengyuan <[email protected]>
Message-Id: <20200115062535 [email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-scsi: convert to new virtio_delete_queue

Use virtio_delete_queue to make the code clearer.

Signed-off-by: Pan Nengyuan <[email protected]>
Message-Id: <20200117075547 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>

virtio-scsi: delete vqs in unrealize to avoid memleaks

This patch fixes memleaks when attaching/detaching a virtio-scsi device. The
memory leak stack is as follows:

Direct leak of 21504 byte(s) in 3 object(s) allocated from:
  #0 0x7f491f2f2970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f491e94649d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x564d0f3919fa (./x86_64-softmmu/qemu-system-x86_64+0x2c3e9fa)  /mnt/sdb/qemu/hw/virtio/virtio.c:2333
  #3 0x564d0f2eca55 (./x86_64-softmmu/qemu-system-x86_64+0x2b99a55)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:912
  #4 0x564d0f2ece7b (./x86_64-softmmu/qemu-system-x86_64+0x2b99e7b)  /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:924
  #5 0x564d0f39ee47 (./x86_64-softmmu/qemu-system-x86_64+0x2c4be47)  /mnt/sdb/qemu/hw/virtio/virtio.c:3531
  #6 0x564d0f980224 (./x86_64-softmmu/qemu-system-x86_64+0x322d224)  /mnt/sdb/qemu/hw/core/qdev.c:865

Reported-by: Euler Robot <[email protected]>
Signed-off-by: Pan Nengyuan <[email protected]>
Message-Id: <20200117075547 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>

virtio-9p-device: convert to new virtio_delete_queue

Use virtio_delete_queue to make the code clearer.

Signed-off-by: Pan Nengyuan <[email protected]>
Message-Id: <20200117060927 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Christian Schoenebeck <[email protected]>

virtio-9p-device: fix memleak in virtio_9p_device_unrealize

v->vq was not cleaned up in virtio_9p_device_unrealize. The memory leak
stack is as follows:

Direct leak of 14336 byte(s) in 2 object(s) allocated from:
  #0 0x7f819ae43970 (/lib64/libasan.so.5+0xef970)  ??:?
  #1 0x7f819872f49d (/lib64/libglib-2.0.so.0+0x5249d)  ??:?
  #2 0x55a3a58da624 (./x86_64-softmmu/qemu-system-x86_64+0x2c14624)  /mnt/sdb/qemu/hw/virtio/virtio.c:2327
  #3 0x55a3a571bac7 (./x86_64-softmmu/qemu-system-x86_64+0x2a55ac7)  /mnt/sdb/qemu/hw/9pfs/virtio-9p-device.c:209
  #4 0x55a3a58e7bc6 (./x86_64-softmmu/qemu-system-x86_64+0x2c21bc6)  /mnt/sdb/qemu/hw/virtio/virtio.c:3504
  #5 0x55a3a5ebfb37 (./x86_64-softmmu/qemu-system-x86_64+0x31f9b37)  /mnt/sdb/qemu/hw/core/qdev.c:876

Reported-by: Euler Robot <[email protected]>
Signed-off-by: Pan Nengyuan <[email protected]>
Message-Id: <20200117060927 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Christian Schoenebeck <[email protected]>
Acked-by: Greg Kurz <[email protected]>

bios-tables-test: document expected file update

Document the flow for the case where a contributor
updates the expected files.

Signed-off-by: Michael S. Tsirkin <[email protected]>

acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command

Firmware can enumerate the APs present at boot by broadcasting a wakeup IPI,
so that the woken-up secondary CPUs can register themselves.
However, in the CPU hotplug case, firmware needs to know the architecture
specific CPU IDs of possible and hotplugged CPUs so that it can
prepare the environment for and wake the hotplugged AP.

Reuse and extend existing CPU hotplug interface to return architecture
specific ID for currently selected CPU in 2 registers:
- lower 32 bits in ACPI_CPU_CMD_DATA_OFFSET_RW
- upper 32 bits in ACPI_CPU_CMD_DATA2_OFFSET_R
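
A small sketch of the two-register split described above. The helper
names are hypothetical and only the 32-bit halves come from the text;
how the registers are actually accessed is not modelled.

```python
MASK32 = 0xFFFFFFFF


def split_cpu_id(arch_id):
    """Split a 64-bit architecture specific CPU ID into two 32-bit halves.

    The first value stands for what ACPI_CPU_CMD_DATA_OFFSET_RW would
    return, the second for ACPI_CPU_CMD_DATA2_OFFSET_R.
    """
    return arch_id & MASK32, (arch_id >> 32) & MASK32


def join_cpu_id(lower, upper):
    """Rebuild the 64-bit ID from the two register reads."""
    return ((upper & MASK32) << 32) | (lower & MASK32)


# A made-up 64-bit ID used only to exercise the helpers.
example_lower, example_upper = split_cpu_id(0x0000000180000003)
```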

On x86, firmware will use CPHP_GET_CPU_ID_CMD for fetching the APIC ID
when handling hotplug SMI.

Later, CPHP_GET_CPU_ID_CMD will be used on ARM to retrieve MPIDR,
which serves a purpose similar to the APIC ID.

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1575896942 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>

acpi: cpuhp: spec: add typical usecases

Document work-flows for
  * enabling/detecting modern CPU hotplug interface
  * finding a CPU with pending 'insert/remove' event
  * enumerating present and possible CPUs

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1575896942 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>

acpi: cpuhp: introduce 'Command data 2' field

No functional change in practice, patch only aims to properly
document (in spec and code) intended usage of the reserved space.

The new field is to be used for 2 purposes:
  - detection of modern CPU hotplug interface using
    CPHP_GET_NEXT_CPU_WITH_EVENT_CMD command.
    procedure will be described in follow up patch:
      "acpi: cpuhp: spec: add typical usecases"
  - for returning upper 32 bits of architecture specific CPU ID,
    for new CPHP_GET_CPU_ID_CMD command added by follow up patch:
      "acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command"

Change is backward compatible with 4.2 and older machines, as field was
unconditionally reserved and always returned 0x0 if modern CPU hotplug
interface was enabled.

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1575896942 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>

acpi: cpuhp: spec: clarify store into 'Command data' when 'Command field' == 0

Write section of 'Command data' register should describe what happens
when it's written into. Correct description in case the last stored
'Command field' value is equal to 0, to reflect that currently it's not
supported.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Message-Id: <1575896942 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

acpi: cpuhp: spec: fix 'Command data' description

Correct the returned value description for the case 'Command field' == 0x0:
it's not the PXM but the CPU selector value with a pending event.

In addition describe 0 blanket value in case of not supported
'Command field' value.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Message-Id: <1575896942 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

acpi: cpuhp: spec: clarify 'CPU selector' register usage and endianness

* Move reserved registers to the top of the section, so reader would be
aware of effects when reading registers description.
* State registers endianness explicitly at the beginning of the section
* Describe register behavior when the 'CPU selector' register contains
a value that doesn't point to a possible CPU.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Message-Id: <1575896942 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

tests: q35: MCH: add default SMBASE SMRAM lock test

test lockable SMRAM at default SMBASE feature, introduced by
patch "q35: implement 128K SMRAM at default SMBASE address"

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1575899217 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

q35: implement 128K SMRAM at default SMBASE address

It's not what real HW does, implementing which would be overkill [**]
and would require complex cross stack changes (QEMU+firmware) to make
it work.
So considering that SMRAM is owned by MCH, for simplicity (ab)use a
reserved Q35 register, which allows QEMU and firmware to easily init
RAM at SMBASE and make it available only from SMM context.

Patch uses commit (2f295167e0 q35/mch: implement extended TSEG sizes)
for inspiration and uses reserved register in config space at 0x9c
offset [*] to extend q35 pci-host with ability to use 128K at
0x30000 as SMRAM and hide it (like TSEG) from non-SMM context.

Usage:
  1: write 0xff in the register
  2: if the feature is supported, follow up read from the register
     should return 0x01. At this point RAM at 0x30000 is still
     available for SMI handler configuration from non-SMM context
  3: writing 0x02 in the register, locks SMBASE area, making its contents
     available only from SMM context. In non-SMM context, reads return
     0xff and writes are ignored. Further writes into the register are
     ignored until the system reset.
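
The three usage steps can be sketched as a small state model. This is
Python, not the QEMU implementation, and the class and method names are
hypothetical. Two points are assumptions because the text does not
state them: writes other than 0xff and 0x02 are ignored, and the
register keeps returning the last accepted value after locking.

```python
class SmbaseRegisterModel:
    """Toy model of the reserved register at config space offset 0x9c."""

    NEGOTIATE = 0xFF  # step 1: value written to probe for the feature
    SUPPORTED = 0x01  # step 2: value read back when supported
    LOCK = 0x02       # step 3: value written to lock the SMBASE area

    def __init__(self):
        self.reset()

    def reset(self):
        """System reset: the register accepts writes again."""
        self.value = 0x00
        self.locked = False

    def write(self, data):
        if self.locked:
            return  # further writes are ignored until reset
        if data == self.NEGOTIATE:
            self.value = self.SUPPORTED
        elif data == self.LOCK and self.value == self.SUPPORTED:
            self.value = self.LOCK
            self.locked = True
        # Any other write is ignored in this sketch (assumption).

    def read(self):
        return self.value

    def ram_read(self, backing_byte, in_smm):
        """Read one byte of the RAM at SMBASE."""
        if self.locked and not in_smm:
            return 0xFF  # non-SMM reads return 0xff once locked
        return backing_byte
```

A caller would write 0xff, check that read() returns 0x01, set up the
SMI handler while the RAM is still visible, and then write 0x02.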

*) https://www.mail-archive.com/[email protected]/msg455991.html
**) https://www.mail-archive.com/[email protected]/msg646965.html

Signed-off-by: Igor Mammedov <[email protected]>
Message-Id: <1575896942 [email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Tested-by: Laszlo Ersek <[email protected]>

scripts/git.orderfile: Display decodetree before C source

To avoid scrolling each instruction when reviewing tcg
helpers written for the decodetree script, display the
.decode files (similar to header declarations) before
the C source (implementation of previous declarations).

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Message-Id: <20191230082856 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Hoist timestamp outside of loops over tlbs

Do not call get_clock_realtime() in tlb_mmu_resize_locked,
but hoist it outside of any loop over a set of tlbs. There are
only two (indirect) callers, tlb_flush_by_mmuidx_async_work
and tlb_flush_page_locked, so this is not onerous.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Initialize tlbs as flushed

There's little point in leaving these data structures half initialized,
and relying on a flush to be done during reset.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Partially merge tlb_dyn_init into tlb_init

Merge into the only caller, but at the same time split
out tlb_mmu_init to initialize a single tlb entry.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Split out tlb_mmu_flush_locked

We will want to be able to flush a tlb without resizing.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Hoist tlb portions in tlb_flush_one_mmuidx_locked

No functional change, but the smaller expressions make
the code easier to read.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Hoist tlb portions in tlb_mmu_resize_locked

No functional change, but the smaller expressions make
the code easier to read.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Pass CPUTLBDescFast to tlb_n_entries and sizeof_tlb

We do not need the entire CPUArchState to compute these values.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Make tlb_n_entries private to cputlb.c

There are no users of this function outside cputlb.c,
and its interface will change in the next patch.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Merge tlb_table_flush_by_mmuidx into tlb_flush_one_mmuidx_locked

There is only one caller for tlb_table_flush_by_mmuidx. Place
the result at the earlier line number, due to an expected user
in the near future.

Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

vl: Only choose enabled accelerators in configure_accelerators

By choosing "tcg:kvm" when kvm is not enabled, we generate
an incorrect warning: "invalid accelerator kvm".

At the same time, use g_str_has_suffix rather than open-coding
the same operation.

Presumably the inverse is also true with --disable-tcg.

Fixes: 28a0961757fc
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

vl: Remove useless test in configure_accelerators

The result of g_strsplit is never NULL.

Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

vl: Reduce scope of variables in configure_accelerators

The accel_list and tmp variables are only used when manufacturing
-machine accel, options based on -accel.

Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

vl: Remove unused variable in configure_accelerators

The accel_initialised variable no longer has any setters.

Fixes: 6f6e1698a68c
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Aleksandar Markovic <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

util/cacheinfo: fix crash when compiling with uClibc

uClibc defines _SC_LEVEL1_ICACHE_LINESIZE and _SC_LEVEL1_DCACHE_LINESIZE
but the corresponding sysconf calls return -1, which is a valid result,
meaning that the limit is indeterminate.

Handle this situation using the fallback values instead of crashing due
to an assertion failure.
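
A sketch of the fallback idea, using Python's os.sysconf rather than
the C code in util/cacheinfo. The fallback of 64 bytes and the helper
names are assumptions for illustration. Treating 0 like -1 is also an
assumption of this sketch.

```python
import os

FALLBACK_LINESIZE = 64  # assumed fallback value, not taken from the patch


def linesize_or_fallback(raw, fallback=FALLBACK_LINESIZE):
    """Use the sysconf result only when it is a positive size."""
    return raw if raw > 0 else fallback


def query_linesize(name, fallback=FALLBACK_LINESIZE):
    """Query a cache line size by sysconf name, never failing.

    An unknown name, an error from sysconf, or an indeterminate result
    (-1) all lead to the fallback instead of an assertion failure.
    """
    names = getattr(os, "sysconf_names", {})
    if name not in names:
        return fallback
    try:
        raw = os.sysconf(name)
    except (OSError, ValueError):
        return fallback
    return linesize_or_fallback(raw, fallback)


dcache_linesize = query_linesize("SC_LEVEL1_DCACHE_LINESIZE")
```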

Signed-off-by: Carlos Santos <[email protected]>
Message-Id: <20191017123713 [email protected]>
Signed-off-by: Richard Henderson <[email protected]>

cputlb: Handle NB_MMU_MODES > TARGET_PAGE_BITS_MIN

In target/arm we will shortly have "too many" mmu_idx.
The current minimum barrier is caused by the way in which
tlb_flush_page_by_mmuidx is coded.

We can remove this limitation by allocating memory for
consumption by the worker. Let us assume that this is
the unlikely case, as will be the case for the majority
of targets which have so far satisfied the BUILD_BUG_ON,
and only allocate memory when necessary.
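
One way to picture the limitation, as a hedged sketch: assume the page
address and the mmu_idx bitmap are passed to the worker packed into a
single word, with the bitmap in the bits below the page boundary. That
packing is an assumption of this sketch, the commit text does not spell
it out. A bitmap that does not fit then needs a separately allocated
record. All names and the page size are illustrative.

```python
TARGET_PAGE_BITS = 10  # assumed small minimum page size
TARGET_PAGE_SIZE = 1 << TARGET_PAGE_BITS
TARGET_PAGE_MASK = ~(TARGET_PAGE_SIZE - 1)


def encode_flush_request(addr, idxmap):
    """Pack the request into one word when possible.

    The common case fits the bitmap below the page boundary. Otherwise
    a separate record is allocated (a dict stands in for that here).
    """
    page = addr & TARGET_PAGE_MASK
    if idxmap < TARGET_PAGE_SIZE:
        return page | idxmap
    return {"addr": page, "idxmap": idxmap}


def decode_flush_request(request):
    """Recover (page address, idxmap) from either representation."""
    if isinstance(request, dict):
        return request["addr"], request["idxmap"]
    return request & TARGET_PAGE_MASK, request & (TARGET_PAGE_SIZE - 1)
```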

Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>

Merge remote-tracking branch 'remotes/philmd-gitlab/tags/edk2-next-20200121' into staging

EDK2 firmware patches

Another set of build-sys patches, to help building the firmware
binaries we use for testing. We almost have reproducible builds.

# gpg: Signature made Tue 21 Jan 2020 15:14:09 GMT
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <[email protected]>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* remotes/philmd-gitlab/tags/edk2-next-20200121:
  gitlab-ci.yml: Add jobs to build EDK2 firmware binaries
  roms/edk2-funcs: Force softfloat ARM toolchain prefix on Debian

Signed-off-by: Peter Maydell <[email protected]>

gitlab-ci.yml: Add jobs to build EDK2 firmware binaries

Add two GitLab jobs to build the EDK2 firmware binaries.

The first job builds a Docker image with the packages required
to build EDK2, and stores this image in the GitLab registry.
The second job pulls the image from the registry and builds the
EDK2 firmware binaries.

The docker image is only rebuilt if the GitLab YAML or the
Dockerfile is updated.
The second job is only built when the roms/edk2/ submodule is
updated, when a git-ref starts with 'edk2' or when the last
commit contains 'EDK2'. The files generated are archived in
the artifacts.zip file.

With edk2-stable201905, it took 2 minutes 52 seconds to build
the docker image, and 36 minutes 28 seconds to generate the
artifacts.zip with the firmware binaries (filesize: 10MiB).

See: https://gitlab.com/philmd/qemu/pipelines/107553178

Reviewed-by: Laszlo Ersek <[email protected]>
Acked-by: Thomas Huth <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>

roms/edk2-funcs: Force softfloat ARM toolchain prefix on Debian

The Debian (based) distributions currently provide 2 ARM
toolchains, documented as [1]:

* The ARM EABI (armel) port targets a range of older 32-bit ARM
  devices, particularly those used in NAS hardware and a variety
  of *plug computers.
* The newer ARM hard-float (armhf) port supports newer, more
  powerful 32-bit devices using version 7 of the ARM architecture
  specification.

For various reasons documented in [2], the EDK2 project suggests
using the softfloat toolchain (named 'armel' by Debian).

Force the softfloat cross toolchain prefix on Debian distributions.

[1] https://www.debian.org/ports/arm/#status
[2] https://github.com/tianocore/edk2/commit/41203b9a

Reviewed-by: Laszlo Ersek <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>

Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-5.0-pull-request' into staging

Fix m68k single-stepping with remote gdb

# gpg: Signature made Tue 21 Jan 2020 12:21:12 GMT
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Laurent Vivier <[email protected]>" [full]
# gpg:                 aka "Laurent Vivier <[email protected]>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <[email protected]>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier/tags/m68k-for-5.0-pull-request:
  m68k: Fix regression causing Single-Step via GDB/RSP to not single step

Signed-off-by: Peter Maydell <[email protected]>

m68k: Fix regression causing Single-Step via GDB/RSP to not single step

A regression that was introduced, with the refactor to TranslatorOps,
drops two lines that update the PC when single-stepping is being performed.

Fixes: 11ab74b01e0a ("target/m68k: Convert to TranslatorOps")
Reported-by: Lucien Murray-Pitts <[email protected]>
Suggested-by: Lucien Murray-Pitts <[email protected]>
Suggested-by: Richard Henderson <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>
Message-Id: <20200116165454.2076265 [email protected]>

Makefile: add missing mkdir MANUAL_BUILDDIR

The MANUAL_BUILDDIR directory is automatically created by sphinx-build
for the other targets.  The index.html target does not use sphinx-build
so we must manually create the directory to avoid the following error:

  GEN     docs/built/index.html
  /bin/sh: docs/built/index.html: No such file or directory

Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-id: 20200120163400 [email protected]
Reviewed-by: Miroslav Rezanina <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/gkurz/tags/9p-next-2020-01-20' into staging

Assorted fixes and cleanups.
v2: - fix 32-bit build

# gpg: Signature made Mon 20 Jan 2020 14:14:11 GMT
# gpg:                using RSA key B4828BAF943140CEF2A3491071D4D5E5822F73D6
# gpg: Good signature from "Greg Kurz <[email protected]>" [full]
# gpg:                 aka "Gregory Kurz <[email protected]>" [full]
# gpg:                 aka "[jpeg image of size 3330]" [full]
# Primary key fingerprint: B482 8BAF 9431 40CE F2A3  4910 71D4 D5E5 822F 73D6

* remotes/gkurz/tags/9p-next-2020-01-20:
  9pfs/9p.c: remove unneeded labels
  virtfs-proxy-helper.c: remove 'err_out' label in setugid()
  9p: init_in_iov_from_pdu can truncate the size
  9p: local: always return -1 on error in local_unlinkat_common
  9pfs: local: Fix possible memory leak in local_link()

Signed-off-by: Peter Maydell <[email protected]>

9pfs/9p.c: remove unneeded labels

'out' label in v9fs_xattr_write() and 'out_nofid' label in
v9fs_complete_rename() can be replaced by appropriate return
calls.

CC: Greg Kurz <[email protected]>
Signed-off-by: Daniel Henrique Barboza <[email protected]>
Acked-by: Greg Kurz <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>

virtfs-proxy-helper.c: remove 'err_out' label in setugid()

'err_out' can be removed and be replaced by 'return -errno'
in its only instance in the function.

CC: Greg Kurz <[email protected]>
Signed-off-by: Daniel Henrique Barboza <[email protected]>
Acked-by: Greg Kurz <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>

9p: init_in_iov_from_pdu can truncate the size

init_in_iov_from_pdu might not be able to allocate the full buffer size
requested, which comes from the client and could be larger than the
transport has available at the time of the request. Specifically, this
can happen with read operations, with the client requesting a read up to
the max allowed, which might be more than the transport has available at
the time.

Today the implementation of init_in_iov_from_pdu throws an error in
both the Xen and Virtio transports.

Instead, change the V9fsTransport interface so that the size becomes a
pointer and can be limited by the implementation of
init_in_iov_from_pdu.

Change both the Xen and Virtio implementations to set the size to the
size of the buffer they managed to allocate, instead of throwing an
error. However, if the allocated buffer size is less than P9_IOHDRSZ
(the size of the header), still throw an error, as that case cannot be
handled.
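
A minimal sketch of the new behaviour, in Python instead of the C
transport code. The function name, the exception type and the value
used for P9_IOHDRSZ are assumptions for illustration. Where the C
interface passes the size by pointer, this sketch returns the granted
size.

```python
P9_IOHDRSZ = 24  # assumed header size, used only for this sketch


def init_in_iov(requested_size, transport_available):
    """Return the buffer size actually granted to the request.

    The size is truncated to what the transport can provide. A buffer
    smaller than the header is still an error.
    """
    granted = min(requested_size, transport_available)
    if granted < P9_IOHDRSZ:
        raise BufferError("buffer smaller than the 9p header")
    return granted
```

A read that asks for more than the transport has available now gets a
short buffer back instead of failing.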

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
[groug: fix 32-bit build]
Signed-off-by: Greg Kurz <[email protected]>

9p: local: always return -1 on error in local_unlinkat_common

local_unlinkat_common() is supposed to always return -1 on error.
This is being done by jumps to the 'err_out' label, which is
a 'return ret' call, and 'ret' is initialized with -1.

Unfortunately there is a condition in which the function will
return 0 on error: in a case where flags == AT_REMOVEDIR, 'ret'
will be 0 when reaching

map_dirfd = openat_dir(...)

And, if map_dirfd == -1 and errno != ENOENT, the existing 'err_out'
jump will execute 'return ret', when ret is still set to zero
at that point.

This patch fixes it by replacing all jumps to the 'err_out' label with
'return -1' calls, ensuring that the function will always
return -1 on error conditions. 'ret' can be left uninitialized
since it's now being used just to store the result of 'unlinkat'
calls.
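
The control flow can be modelled in a few lines of Python. This is a
simplified stand-in for the C function, with hypothetical names. The
flag map_dir_open_fails stands for the case where map_dirfd == -1 and
errno != ENOENT.

```python
def unlinkat_common_old(remove_dir, map_dir_open_fails):
    """Old flow: every error path returns the shared 'ret' variable."""
    ret = -1
    if remove_dir:
        # Earlier steps succeeded and left 0 in ret.
        ret = 0
        if map_dir_open_fails:
            return ret  # 'goto err_out' while ret is still 0
    return 0  # stands in for the result of the final unlinkat() call


def unlinkat_common_new(remove_dir, map_dir_open_fails):
    """New flow: each error path returns -1 directly."""
    if remove_dir and map_dir_open_fails:
        return -1
    return 0  # stands in for the result of the final unlinkat() call
```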

CC: Greg Kurz <[email protected]>
Signed-off-by: Daniel Henrique Barboza <[email protected]>
[groug: changed prefix in title to be "9p: local:"]
Signed-off-by: Greg Kurz <[email protected]>

9pfs: local: Fix possible memory leak in local_link()

There is a possible memory leak when local_link returns -1 without freeing
odirpath and oname.

Reported-by: Euler Robot <[email protected]>
Signed-off-by: Jaijun Chen <[email protected]>
Signed-off-by: Xiang Zheng <[email protected]>
Reviewed-by: Christian Schoenebeck <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>

qapi: Fix code generation with Python 3.5

Recent commit 3e7fb5811b "qapi: Fix code generation for empty
modules" switched QAPISchema.visit() from

    for entity in self._entity_list:

effectively to

    for mod in self._module_dict.values():
        for entity in mod._entity_list:

Visits in the same order as long as .values() is in insertion order.
That's the case only for Python 3.6 and later.  Before, it's in some
arbitrary order, which results in broken generated code.

Fix by making self._module_dict an OrderedDict rather than a dict.
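
A self-contained sketch of the idea. The Module and Schema classes are
simplified stand-ins, not the real QAPISchema code. Only _module_dict,
_entity_list and the nested loop follow the text above.

```python
from collections import OrderedDict


class Module:
    """Simplified stand-in for a schema module."""

    def __init__(self, name):
        self.name = name
        self._entity_list = []


class Schema:
    """Simplified stand-in for QAPISchema."""

    def __init__(self):
        # OrderedDict keeps insertion order on every Python 3 version,
        # while a plain dict only does so from Python 3.6 on.
        self._module_dict = OrderedDict()

    def add_entity(self, module_name, entity):
        if module_name not in self._module_dict:
            self._module_dict[module_name] = Module(module_name)
        self._module_dict[module_name]._entity_list.append(entity)

    def visit(self):
        """Return entities in module insertion order."""
        visited = []
        for mod in self._module_dict.values():
            for entity in mod._entity_list:
                visited.append(entity)
        return visited
```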

Fixes: 3e7fb5811baab213dcc7149c3aa69442d683c26c
Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Alex Bennée <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Tested-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: BALATON Zoltan <[email protected]>
Tested-by: Alex Bennée <[email protected]>
Message-id: 20200116202558 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/juanquintela/tags/migration-pull-pull-request' into staging

Migration pull request

# gpg: Signature made Mon 20 Jan 2020 10:29:53 GMT
# gpg:                using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <[email protected]>" [full]
# gpg:                 aka "Juan Quintela <[email protected]>" [full]
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* remotes/juanquintela/tags/migration-pull-pull-request: (29 commits)
  multifd: Be consistent about using uint64_t
  migration: Support QLIST migration
  apic: Use 32bit APIC ID for migration instance ID
  migration: Change SaveStateEntry.instance_id into uint32_t
  migration: Define VMSTATE_INSTANCE_ID_ANY
  Bug #1829242 correction.
  migration/multifd: fix destroyed mutex access in terminating multifd threads
  migration/multifd: fix nullptr access in terminating multifd threads
  migration/multifd: not use multifd during postcopy
  migration/multifd: clean pages after filling packet
  migration/postcopy: enable compress during postcopy
  migration/postcopy: enable random order target page arrival
  migration/postcopy: set all_zero to true on the first target page
  migration/postcopy: count target page number to decide the place_needed
  migration/postcopy: wait for decompress thread in precopy
  migration/postcopy: reduce memset when it is zero page and matches_target_page_size
  migration/ram: Yield periodically to the main loop
  migration: savevm_state_handler_insert: constant-time element insertion
  migration: add savevm_state_handler_remove()
  misc: use QEMU_IS_ALIGNED
  ...

Signed-off-by: Peter Maydell <[email protected]>

multifd: Be consistent about using uint64_t

We always transmit ram_addr_t as uint64_t. Be consistent in its
use (on 64-bit systems it is always uint64_t; the problem is on
32-bit systems).

Signed-off-by: Juan Quintela <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>

migration: Support QLIST migration

Support QLIST migration using the same principle as QTAILQ:
94869d5c52 ("migration: migrate QTAILQ").

The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD
and QLIST_RAW_REVERSE.

Tests also are provided.

Signed-off-by: Eric Auger <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

apic: Use 32bit APIC ID for migration instance ID

Migration is silently broken now with x2apic config like this:

-smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
-device intel-iommu,intremap=on,eim=on

After migration, the guest kernel could hang at any point, because the
x2apic bit is not migrated correctly in IA32_APIC_BASE on some vcpus, so
any operation related to x2apic could then be broken (e.g., RDMSR on
x2apic MSRs could fail because KVM would think that the vcpu hasn't
enabled x2apic at all).

The issue is that the x2apic bit was never applied correctly for vcpus
whose ID > 255 when migration completes, and that's because when we
migrate the APIC we use APICCommonState.id as the instance ID of the
migration stream, while that's too short for x2apic.

Let's use the newly introduced initial_apic_id for that.
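
A sketch of why a short ID breaks this, assuming the old field holds
only 8 bits (the text implies this with "ID > 255"). The function names
are hypothetical.

```python
def instance_id_from_8bit_id(apic_id):
    """Instance ID derived from an ID field that keeps only 8 bits."""
    return apic_id & 0xFF


def instance_id_from_32bit_id(apic_id):
    """Instance ID derived from a full 32-bit APIC ID."""
    return apic_id & 0xFFFFFFFF


# Two example vcpus whose APIC IDs differ only above bit 7.
low_id, high_id = 1, 257
collides = instance_id_from_8bit_id(low_id) == instance_id_from_8bit_id(high_id)
distinct = instance_id_from_32bit_id(low_id) != instance_id_from_32bit_id(high_id)
```

With the truncated ID, the state of the two vcpus can no longer be told
apart in the stream.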

Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: Change SaveStateEntry.instance_id into uint32_t

It was always used as 32bit, so define it that way to be clear.
Instead of using -1 as the auto-gen magic value, we switch to
UINT32_MAX. We also make sure that we never auto-generate this value,
so that overflowed instance IDs do not go unnoticed.
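
A hedged sketch of the check, in Python. Picking "highest ID in use
plus one" as the auto-generation policy and raising OverflowError are
assumptions of this sketch. Only the UINT32_MAX magic value comes from
the text, and the constant name VMSTATE_INSTANCE_ID_ANY is borrowed
from the related patch.

```python
UINT32_MAX = 0xFFFFFFFF
VMSTATE_INSTANCE_ID_ANY = UINT32_MAX  # magic value meaning "auto-generate"


def resolve_instance_id(requested, ids_in_use):
    """Return the instance ID to register.

    An explicit ID is used as given. For the magic value, pick the next
    free ID and refuse to hand out the magic value itself.
    """
    if requested != VMSTATE_INSTANCE_ID_ANY:
        return requested
    new_id = max(ids_in_use, default=-1) + 1
    if new_id >= VMSTATE_INSTANCE_ID_ANY:
        raise OverflowError("auto-generated instance ID would overflow")
    return new_id
```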

Suggested-by: Juan Quintela <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: Define VMSTATE_INSTANCE_ID_ANY

Define the new macro VMSTATE_INSTANCE_ID_ANY for callers who want to
auto-generate the vmstate instance ID.  Previously it was hard coded
as -1 instead of this macro.  It helps to change this default value in
the follow up patches.  No functional change.

Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

Bug #1829242 correction.

Added type conversions to ram_addr_t before all left shifts of page
indexes by TARGET_PAGE_BITS, to correct overflows when the page
address was 4GB or more.
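
The class of bug can be shown with a small sketch. PAGE_BITS, addr_t
and the 32-bit index type are assumptions chosen for illustration (a
12-bit page shift and a 64-bit address type standing in for
ram_addr_t); the actual types in the fixed code may differ.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS 12          /* assumed: 4 KiB pages */
typedef uint64_t addr_t;      /* stand-in for ram_addr_t, assumed 64-bit */

/* Buggy form: the shift happens in 32 bits, then the result is widened. */
static addr_t page_addr_truncated(uint32_t page_index)
{
    return page_index << PAGE_BITS;
}

/* Fixed form: widen first, then shift, so no high bits are lost. */
static addr_t page_addr(uint32_t page_index)
{
    addr_t addr = (addr_t)page_index << PAGE_BITS;

    assert((addr >> PAGE_BITS) == page_index);  /* index survives the shift */
    return addr;
}
```

For page index 1 << 20 (the page at 4 GiB with this page size) the
truncated form wraps to 0, while the widened form yields 0x100000000.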

Signed-off-by: Alexey Romko <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/multifd: fix destroyed mutex access in terminating multifd threads

One multifd thread locks the IOChannel mutexes of all the other multifd
threads to tell them to quit, by setting p->quit or shutting down p->c.
In this scenario, if some multifd threads have already terminated and
multifd_load_cleanup/multifd_save_cleanup has destroyed their mutexes,
trying to lock those mutexes accesses a destroyed mutex.

Here is the coredump stack:
    #0  0x00007f81a2794437 in raise () from /usr/lib64/libc.so.6
    #1  0x00007f81a2795b28 in abort () from /usr/lib64/libc.so.6
    #2  0x00007f81a278d1b6 in __assert_fail_base () from /usr/lib64/libc.so.6
    #3  0x00007f81a278d262 in __assert_fail () from /usr/lib64/libc.so.6
    #4  0x000055eb1bfadbd3 in qemu_mutex_lock_impl (mutex=0x55eb1e2d1988, file=<optimized out>, line=<optimized out>) at util/qemu-thread-posix.c:64
    #5  0x000055eb1bb4564a in multifd_send_terminate_threads (err=<optimized out>) at migration/ram.c:1015
    #6  0x000055eb1bb4bb7f in multifd_send_thread (opaque=0x55eb1e2d19f8) at migration/ram.c:1171
    #7  0x000055eb1bfad628 in qemu_thread_start (args=0x55eb1e170450) at util/qemu-thread-posix.c:502
    #8  0x00007f81a2b36df5 in start_thread () from /usr/lib64/libpthread.so.0
    #9  0x00007f81a286048d in clone () from /usr/lib64/libc.so.6

To fix it, destroy the mutexes only after all the other multifd threads
have terminated.

Signed-off-by: Jiahui Cen <[email protected]>
Signed-off-by: Ying Fang <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/multifd: fix nullptr access in terminating multifd threads

One multifd channel shuts down the IOChannels of all the other multifd
channels when it fails to receive an IOChannel. In this scenario, if some
multifd channels have not received their IOChannel yet, shutting them down
causes a NULL pointer access in qio_channel_shutdown.

Here is the coredump stack:
    #0  object_get_class (obj=obj@entry=0x0) at qom/object.c:908
    #1  0x00005563fdbb8f4a in qio_channel_shutdown (ioc=0x0, how=QIO_CHANNEL_SHUTDOWN_BOTH, errp=0x0) at io/channel.c:355
    #2  0x00005563fd7b4c5f in multifd_recv_terminate_threads (err=<optimized out>) at migration/ram.c:1280
    #3  0x00005563fd7bc019 in multifd_recv_new_channel (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce00) at migration/ram.c:1478
    #4  0x00005563fda82177 in migration_ioc_process_incoming (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce30) at migration/migration.c:605
    #5  0x00005563fda8567d in migration_channel_process_incoming (ioc=0x556400255610) at migration/channel.c:44
    #6  0x00005563fda83ee0 in socket_accept_incoming_migration (listener=0x5563fff6b920, cioc=0x556400255610, opaque=<optimized out>) at migration/socket.c:166
    #7  0x00005563fdbc25cd in qio_net_listener_channel_func (ioc=<optimized out>, condition=<optimized out>, opaque=<optimized out>) at io/net-listener.c:54
    #8  0x00007f895b6fe9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
    #9  0x00005563fdc18136 in glib_pollfds_poll () at util/main-loop.c:218
    #10 0x00005563fdc181b5 in os_host_main_loop_wait (timeout=1000000000) at util/main-loop.c:241
    #11 0x00005563fdc183a2 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517
    #12 0x00005563fd8edb37 in main_loop () at vl.c:1791
    #13 0x00005563fd74fd45 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4473

To fix it up, let's check p->c before calling qio_channel_shutdown.
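
A minimal sketch of that guard, using hypothetical stand-in types
rather than the real multifd structures (struct channel,
struct recv_params, channel_shutdown() and terminate_all() are all
invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

struct channel { int shut_down; };   /* stand-in for an IOChannel */

struct recv_params {
    struct channel *c;               /* NULL until the channel arrives */
    int quit;
};

static void channel_shutdown(struct channel *ioc)
{
    assert(ioc != NULL);             /* the crash in the trace above */
    ioc->shut_down = 1;
}

/* Ask every receiver to quit; returns how many had no channel yet. */
static int terminate_all(struct recv_params *params, int n)
{
    int skipped = 0;
    int i;

    for (i = 0; i < n; i++) {
        params[i].quit = 1;
        if (params[i].c) {
            channel_shutdown(params[i].c);
        } else {
            skipped++;               /* nothing to shut down yet */
        }
    }
    return skipped;
}
```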

Signed-off-by: Jiahui Cen <[email protected]>
Signed-off-by: Ying Fang <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/multifd: not use multifd during postcopy

We don't support multifd during postcopy, but the user can still enable
both multifd and postcopy, which leads to migration failure.

Skip multifd during postcopy.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/multifd: clean pages after filling packet

This is a preparation for the next patch:

not use multifd during postcopy.

Without postcopy enabled, everything looks good. With postcopy enabled,
migration may fail even when multifd is not used during postcopy. The
reason is that the pages are not properly cleared, so *old* target pages
continue to be transferred.

After cleaning the pages, migration succeeds.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/postcopy: enable compress during postcopy

Postcopy requires placing a whole host page, while the migration thread
migrates memory in units of the target page size. Postcopy therefore
needs to collect all target pages of one host page before placing it via
userfaultfd.

To enable compress during postcopy, there are two problems to solve:

    1. Random order for target page arrival
    2. Target pages in one host page must arrive without being
       interrupted by target pages from another host page

The first one is handled by the previous cleanup patch.

This patch handles the second one by:

    1. Flush compress thread for each host page
    2. Wait for decompress thread before placing host page

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/postcopy: enable random order target page arrival

Now that the number of target pages received is used to track a host
page, we can handle target pages of one host page arriving in random
order.

This is a preparation for enabling compress during postcopy.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/postcopy: set all_zero to true on the first target page

For the first target page, all_zero is set to true for this round of
checks.

Now that target_pages has been introduced, we can use this variable
instead of checking the address offset.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/postcopy: count target page number to decide the place_needed

Postcopy requires placing a whole host page rather than a single target
page.

Currently, the code relies on the page offset to decide whether this is
the last target page. We can instead count the target pages during the
iteration. When the number of target pages equals
(host page size / target page size), this is the last target
page in the host page.
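
The counting rule can be sketched like this. The tracker struct and
function name are hypothetical; only the comparison against
(host page size / target page size) comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-host-page bookkeeping. */
struct host_page_tracker {
    size_t host_page_size;
    size_t target_page_size;
    size_t target_pages;      /* target pages received so far */
};

/*
 * Account for one received target page. Returns true when the host
 * page is complete and should be placed; the counter is then reset
 * for the next host page. Arrival order within the host page does
 * not matter, since only the count is checked.
 */
static bool target_page_received(struct host_page_tracker *t)
{
    size_t per_host;

    assert(t->target_page_size > 0);
    assert(t->host_page_size % t->target_page_size == 0);
    per_host = t->host_page_size / t->target_page_size;

    t->target_pages++;
    if (t->target_pages == per_host) {
        t->target_pages = 0;
        return true;          /* place_needed */
    }
    return false;
}
```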

This is a preparation for non-ordered target page transmission.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/postcopy: wait for decompress thread in precopy

Compress is not supported with postcopy, so it is safe to wait for the
decompress thread only in precopy.

This is a preparation for a later patch.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/postcopy: reduce memset when it is zero page and matches_target_page_size

In this case, the page_buffer content is not used.

Skip the memset to save some time.

Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration/ram: Yield periodically to the main loop

Usually, the incoming migration coroutine yields to the main loop
while its IO channel is waiting for data to receive. But there is a case
where RAM migration and data reception proceed at the same speed: a VM
with huge zeroed RAM. In this case, the IO channel never has to wait for
data, so the main loop is starved and, for instance, does not respond to
QMP commands.

For this case, yield periodically, but not too often, so as not to
affect the speed of migration.
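
The idea of yielding every so many iterations can be sketched as
below. The function, the callback type, and the interval parameter are
hypothetical; the real code yields from a coroutine, which this plain C
sketch replaces with an optional callback.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*yield_fn)(void *opaque);   /* hypothetical yield hook */

/*
 * Process n_pages pages, giving control back via yield() once every
 * `interval` pages. Returns the number of yield points reached.
 */
static unsigned load_pages(unsigned n_pages, unsigned interval,
                           yield_fn yield, void *opaque)
{
    unsigned yields = 0;
    unsigned i;

    assert(interval > 0);
    for (i = 0; i < n_pages; i++) {
        /* ... load one page here ... */
        if ((i + 1) % interval == 0) {
            if (yield) {
                yield(opaque);            /* let the main loop run */
            }
            yields++;
        }
    }
    return yields;
}
```

A large interval keeps the overhead negligible, while a small one makes
the main loop more responsive; the right value is a tuning decision.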

Signed-off-by: Yury Kotov <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: savevm_state_handler_insert: constant-time element insertion

savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
sorting is maintained by searching from head to tail for a suitable
insertion spot.  Insertion is thus an O(n) operation.

If we instead keep track of the head of each priority's subqueue
within that larger queue we can reduce this operation to O(1) time.

savevm_state_handler_remove() becomes slightly more complex to
accommodate these gains: we need to replace the head of a priority's
subqueue when removing it.

With O(1) insertion, booting VMs with many SaveStateEntry objects is
more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
objects to insert.
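
A minimal sketch of the technique, assuming a small fixed number of
priority levels and a hand-rolled doubly linked list instead of QEMU's
TAILQ macros. All names here (PRI_MAX, struct entry, struct pqueue,
pq_insert, pq_remove) are hypothetical. Insertion scans at most PRI_MAX
head pointers, which is constant with respect to the queue length.

```c
#include <assert.h>
#include <stddef.h>

#define PRI_MAX 4   /* hypothetical number of priority levels */

struct entry {
    int priority;                 /* 0 .. PRI_MAX-1, higher sorts first */
    struct entry *prev, *next;
};

struct pqueue {
    struct entry *head, *tail;
    struct entry *pri_head[PRI_MAX];  /* first entry of each priority */
};

static void pq_insert(struct pqueue *q, struct entry *e)
{
    struct entry *before = NULL;
    int i;

    assert(e->priority >= 0 && e->priority < PRI_MAX);

    /* Nearest lower priority that has entries; bounded by PRI_MAX. */
    for (i = e->priority - 1; i >= 0; i--) {
        if (q->pri_head[i]) {
            before = q->pri_head[i];
            break;
        }
    }
    if (before) {                 /* link in front of that subqueue */
        e->next = before;
        e->prev = before->prev;
        if (before->prev) {
            before->prev->next = e;
        } else {
            q->head = e;
        }
        before->prev = e;
    } else {                      /* nothing lower: append at the tail */
        e->next = NULL;
        e->prev = q->tail;
        if (q->tail) {
            q->tail->next = e;
        } else {
            q->head = e;
        }
        q->tail = e;
    }
    if (!q->pri_head[e->priority]) {
        q->pri_head[e->priority] = e;
    }
}

static void pq_remove(struct pqueue *q, struct entry *e)
{
    /* If e heads its subqueue, promote its successor or clear the slot. */
    if (q->pri_head[e->priority] == e) {
        struct entry *n = e->next;
        q->pri_head[e->priority] =
            (n && n->priority == e->priority) ? n : NULL;
    }
    if (e->prev) {
        e->prev->next = e->next;
    } else {
        q->head = e->next;
    }
    if (e->next) {
        e->next->prev = e->prev;
    } else {
        q->tail = e->prev;
    }
    e->prev = e->next = NULL;
}
```

Entries of equal priority keep their insertion order, because a new
entry is always linked at the end of its own subqueue.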

Signed-off-by: Scott Cheloha <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: add savevm_state_handler_remove()

Create a function to abstract common logic needed when removing a
SaveStateEntry element from the savevm_state.handlers queue.

For now we just remove the element. Soon it will involve additional
cleanup.

Signed-off-by: Scott Cheloha <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

misc: use QEMU_IS_ALIGNED

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Stefan Berger <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>

migration: Fix the re-run check of the migrate-incoming command

The current check sets an error but doesn't fail the command.
This may cause a problem if a new connection attempt with the same URI
affects the first connection.

Signed-off-by: Yury Kotov <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>