Git Repo - qemu.git/log

qemu-timer: Avoid overflows when converting timeout to struct timespec

In qemu_poll_ns(), when we convert an int64_t nanosecond timeout into
a struct timespec, we may accidentally run into overflow problems if
the timeout is very long. This happens because the tv_sec field is a
time_t, which is signed, so we might end up setting it to a negative
value by mistake. This will result in what was intended to be a
near-infinite timeout turning into an instantaneous timeout, and we'll
busy loop. Cap the maximum timeout at INT32_MAX seconds (about 68 years)
to avoid this problem.

This specifically manifested on ARM hosts as an extreme slowdown on
guest shutdown (when the guest reprogrammed the PL031 RTC to not
generate alarms using a very long timeout) but could happen on other
hosts and guests too.

Reported-by: Christoffer Dall <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Message-id: 1416939705 [email protected]

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

The final 2.2 patches from me.

# gpg: Signature made Wed 26 Nov 2014 11:12:25 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  s390x/kvm: Fix compile error
  fw_cfg: fix boot order bug when dynamically modified via QOM
  -machine vmport=auto: Fix handling of VMWare ioport emulation for xen

Signed-off-by: Peter Maydell <[email protected]>

s390x/kvm: Fix compile error

commit a2b257d6212a "memory: expose alignment used for allocating RAM
as MemoryRegion API" triggered a compile error on KVM/s390x.

Fix the prototype and the implementation of legacy_s390_alloc.

Cc: Igor Mammedov <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

fw_cfg: fix boot order bug when dynamically modified via QOM

When we dynamically modify boot order, the length of
boot order will be changed, but we don't update
s->files->f[i].size with new length. This casuse
seabios read a wrong vale of qemu cfg file about
bootorder.

Cc: Gerd Hoffmann <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

-machine vmport=auto: Fix handling of VMWare ioport emulation for xen

c/s 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

or

c/s b154537ad07598377ebf98252fb7d2aff127983b

moved the testing of xen_enabled() from pc_init1() to
pc_machine_initfn().

xen_enabled() does not return the correct value in
pc_machine_initfn().

Changed vmport from a bool to an enum. Added the value "auto" to do
the old way. Move check of xen_enabled() back to pc_init1().

Acked-by: Eric Blake <[email protected]>
Reviewed-by: Eduardo Habkost <[email protected]>
Signed-off-by: Don Slutz <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Update version for v2.2.0-rc3 release

Signed-off-by: Peter Maydell <[email protected]>

input: move input-send-event into experimental namespace

Ongoing discussions on how we are going to specify the console,
so tag the command as experiental so we can refine things in
the 2.3 development cycle.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-id: 1416923657 [email protected]
[Spell out "not a stable API", and x- the QAPI schema, too]
Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Amos Kong <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, misc bugfixes

A bunch of bugfixes for 2.2.

Signed-off-by: Michael S. Tsirkin <[email protected]>
# gpg: Signature made Mon 24 Nov 2014 18:59:47 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>"
# gpg:                 aka "Michael S. Tsirkin <[email protected]>"

* remotes/mst/tags/for_upstream:
  pc: acpi: mark all possible CPUs as enabled in SRAT
  pcie: fix improper use of negative value
  pcie: fix typo in pcie_cap_deverr_init()
  target-i386: move generic memory hotplug methods to DSDTs
  acpi-build: mark RAM dirty on table update
  hw/pci: fix crash on shpc error flow
  pc: count in 1Gb hugepage alignment when sizing hotplug-memory container
  pc: explicitly check maxmem limit when adding DIMM
  pc: pc-dimm: use backend alignment during address auto allocation
  pc: align DIMM's address/size by backend's alignment value
  memory: expose alignment used for allocating RAM as MemoryRegion API
  pc: limit DIMM address and size to page aligned values
  pc: make pc_dimm_plug() more readble
  pc: kvm: check if KVM has free memory slots to avoid abort()
  qemu-char: fix tcp_get_fds

Signed-off-by: Peter Maydell <[email protected]>

pc: acpi: mark all possible CPUs as enabled in SRAT

If QEMU is started with -numa ... Windows only notices that
CPU has been hot-added but it will not online such CPUs.

It's caused by the fact that possible CPUs are flagged as
not enabled in SRAT and Windows honoring that information
doesn't use corresponding CPU.

ACPI 5.0 Spec regarding to flag says:
"
Table 5-47 Local APIC Flags
...
Enabled: if zero, this processor is unusable, and the operating system
support will not attempt to use it.
"

Fix QEMU to adhere to spec and mark possible CPUs as enabled
in SRAT.

With that Windows onlines hot-added CPUs as expected.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie: fix improper use of negative value

Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pcie: fix typo in pcie_cap_deverr_init()

Reported-by:
https://bugs.launchpad.net/qemu/+bug/1393440

Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

target-i386: move generic memory hotplug methods to DSDTs

This makes it simpler to keep the SSDT byte-for-byte identical for a
given machine type, which is a goal we want to have for 2.2 and newer
types.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

acpi-build: mark RAM dirty on table update

acpi build modifies internal FW CFG RAM on first access
but we forgot to mark it dirty.
If this RAM has been migrated already, it won't be
migrated again, returning corrupted tables to guest.

Signed-off-by: Michael S. Tsirkin <[email protected]>

hw/pci: fix crash on shpc error flow

If the pci bridge enters in error flow as part
of init process it will only delete the shpc mmio
subregion but not remove it from the properties list,
resulting in segmentation fault when the bridge runs
the exit function.

Example: add a pci bridge without specifing the chassis number:
    <qemu-bin> ... -device pci-bridge,id=p1
Result:
    (qemu) qemu-system-x86_64: -device pci-bridge,id=p1: Bridge chassis not specified. Each bridge is required to be assigned a unique chassis id > 0.
    qemu-system-x86_64: -device pci-bridge,id=p1: Device
    initialization failed.
    Segmentation fault (core dumped)

    if (child->class->unparent) {
    #0  0x00005555558d629b in object_finalize_child_property (obj=0x555556d2e830, name=0x555556d30630 "shpc-mmio[0]", opaque=0x555556a42fc8) at qom/object.c:1078
    #1  0x00005555558d4b1f in object_property_del_all (obj=0x555556d2e830) at qom/object.c:367
    #2  0x00005555558d4ca1 in object_finalize (data=0x555556d2e830) at qom/object.c:412
    #3  0x00005555558d55a1 in object_unref (obj=0x555556d2e830) at qom/object.c:720
    #4  0x000055555572c907 in qdev_device_add (opts=0x5555563544f0) at qdev-monitor.c:566
    #5  0x0000555555744f16 in device_init_func (opts=0x5555563544f0, opaque=0x0) at vl.c:2213
    #6  0x00005555559cf5f0 in qemu_opts_foreach (list=0x555555e0f8e0 <qemu_device_opts>, func=0x555555744efa <device_init_func>, opaque=0x0, abort_on_failure=1) at util/qemu-option.c:1057
    #7  0x000055555574a11b in main (argc=16, argv=0x7fffffffdde8, envp=0x7fffffffde70) at vl.c:423

Unparent the shpc mmio region as part of shpc cleanup.

Signed-off-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Amos Kong <[email protected]>

pc: count in 1Gb hugepage alignment when sizing hotplug-memory container

if DIMMs with different size/alignment are interleaved
in creation order, it could lead to hotplug-memory
container fragmentation and following inability to use
all RAM upto maxmem.
For example:
    -m 4G,slots=3,maxmem=7G
    -object memory-backend-file,id=mem-1,size=256M,mem-path=/pagesize-2MB
    -device pc-dimm,id=mem1,memdev=mem-1
    -object memory-backend-file,id=mem-2,size=1G,mem-path=/pagesize-1GB
    -device pc-dimm,id=mem2,memdev=mem-2
    -object memory-backend-file,id=mem-3,size=256M,mem-path=/pagesize-2MB
    -device pc-dimm,id=mem3,memdev=mem-3

fragments hotplug-memory container and doesn't allow
to use 1GB hugepage backend to consume remainig 1Gb.

To ease managment factor count in max 1Gb alignment for
each memory slot when sizing hotplug-memory region so
that regadless of fragmentaion it would be possible to
add max aligned DIMM.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: explicitly check maxmem limit when adding DIMM

Currently maxmem limit is not checked and depends on
hotplug region container not being able to fit more RAM
than maxmem. Do check explicitly so that it would
be possible to change hotplug container size later
to deal with fragmentation.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block patches for 2.2.0-rc3

# gpg: Signature made Mon 24 Nov 2014 12:52:23 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"

* remotes/kevin/tags/for-upstream:
Revert "qemu-img info: show nocow info"

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

Three patches to fix ExtINT for the QEMU implementation of the local APIC.

# gpg: Signature made Mon 24 Nov 2014 13:38:36 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  apic: fix incorrect handling of ExtINT interrupts wrt processor priority
  apic: fix loss of IPI due to masked ExtINT
  apic: avoid getting out of halted state on masked PIC interrupts

Signed-off-by: Peter Maydell <[email protected]>

apic: fix incorrect handling of ExtINT interrupts wrt processor priority

This fixes another failure with ExtINT, demonstrated by QNX. The failure
mode is as follows:
- IPI sent to cpu 0 (bit set in APIC irr)
- IPI accepted by cpu 0 (bit cleared in irr, set in isr)
- IPI sent to cpu 0 (bit set in both irr and isr)
- PIC interrupt sent to cpu 0

The PIC interrupt causes CPU_INTERRUPT_HARD to be set, but
apic_irq_pending observes that the highest pending APIC interrupt priority
(the IPI) is the same as the processor priority (since the IPI is still
being handled), so apic_get_interrupt returns a spurious interrupt rather
than the pending PIC interrupt. The result is an endless sequence of
spurious interrupts, since nothing will clear CPU_INTERRUPT_HARD.

Instead, ExtINT interrupts should have ignored the processor priority.
Calling apic_check_pic early in apic_get_interrupt ensures that
apic_deliver_pic_intr is called instead of delivering the spurious
interrupt. apic_deliver_pic_intr then clears CPU_INTERRUPT_HARD if needed.

Reported-by: Richard Bilson <[email protected]>
Tested-by: Richard Bilson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

apic: fix loss of IPI due to masked ExtINT

This patch fixes an obscure failure of the QNX kernel on QEMU x86 SMP.
In QNX, all hardware interrupts come via the PIC, and are delivered by
the cpu 0 LAPIC in ExtINT mode, while IPIs are delivered by the LAPIC
in fixed mode.

This bug happens as follows:
- cpu 0 masks a particular PIC interrupt
- IPI sent to cpu 0 (CPU_INTERRUPT_HARD is set)
- before the IPI is accepted, the masked interrupt line is asserted by the
device

Since the interrupt is masked, apic_deliver_pic_intr will clear
CPU_INTERRUPT_HARD. The IPI will still be set in the APIC irr, but since
CPU_INTERRUPT_HARD is not set the cpu will not notice. Depending on the
scenario this can cause a system hang, i.e. if cpu 0 is expected to unmask
the interrupt.

In order to fix this, do a full check of the APIC before an EXTINT
is acknowledged. This can result in clearing CPU_INTERRUPT_HARD, but
can also result in delivering the lost IPI.

Reported-by: Richard Bilson <[email protected]>
Tested-by: Richard Bilson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

apic: avoid getting out of halted state on masked PIC interrupts

After the next patch, if a masked PIC interrupts causes CPU_INTERRUPT_POLL
to be set, the CPU will spuriously get out of halted state.  While this
is technically valid, we should avoid that.

Make CPU_INTERRUPT_POLL run apic_update_irq in the right thread and then
look at CPU_INTERRUPT_HARD.  If CPU_INTERRUPT_HARD does not get set,
do not report the CPU as having work.

Also move the handling of software-disabled APIC from apic_update_irq
to apic_irq_pending, and always trigger CPU_INTERRUPT_POLL.  This will
be important once we will add a case that resets CPU_INTERRUPT_HARD
from apic_update_irq.  We want to run it even if we go through
CPU_INTERRUPT_POLL, and even if the local APIC is software disabled.

Reported-by: Richard Bilson <[email protected]>
Tested-by: Richard Bilson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Revert "qemu-img info: show nocow info"

This reverts commit 000c4dfff4d7686e2fba3066a477a1290ed60622.

The main reason for reverting this commit before the 2.2 release is that
it adds a QAPI interface that we don't want to keep: The 'nocow' flag
doesn't generally make sense for block nodes, but only for the raw-posix
driver. It should therefore be part of ImageInfoSpecific rather than
ImageInfo.

The commit contains more problems, but unlike the API stability issue
they wouldn't justify reverting it.

Conflicts:
block/qapi.c

Signed-off-by: Kevin Wolf <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>

pc: pc-dimm: use backend alignment during address auto allocation

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: align DIMM's address/size by backend's alignment value

Performance wise it's better to align GVA by the backend's
page size.

Also do not allow to create DIMM device with suboptimal
size (i.e. not aligned to backends page size) to aviod
memory loss.

Do above only for 2.2 and newer machine types to avoid
breaking working configs with 2.1 machine type.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

memory: expose alignment used for allocating RAM as MemoryRegion API

introduce memory_region_get_alignment() that returns
underlying memory block alignment or 0 if it's not
relevant/implemented for backend.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: limit DIMM address and size to page aligned values

When running in KVM mode, kvm_set_phys_mem() will silently
fail if registered MemoryRegion address/size is not page
aligned. Causing memory hotplug failure in guest.

Mapping non aligned MemoryRegion in TCG mode 'works', but
sane guest OS still expects page aligned memory module
and fails to initialize it if it's not aligned.

So do not allow non aligned (i.e. valid) address/size
values for DIMM to avoid either KVM failure or guest
issues caused by it.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: make pc_dimm_plug() more readble

split addr initialization from declaration so that
later when new local vars are added property getter
wouldn't drift off of error check.

Signed-off-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

pc: kvm: check if KVM has free memory slots to avoid abort()

When more memory devices are used than available
KVM memory slots, QEMU crashes with:

kvm_alloc_slot: no free slot available
Aborted (core dumped)

Fix this by checking that KVM has a free slot before
attempting to map memory in guest address space.

Signed-off-by: Igor Mammedov <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

qemu-char: fix tcp_get_fds

tcp_get_fds API discards fds if there's more than 1 of these.

It's tricky to fix this without API changes in the generic case.

However, this API is only used by tests ATM, and tests know how
many fds they expect.

So let's not waste cycles trying to fix this properly:
simply assume at most 16 fds (tests use at most 8 now).
assert if some test tries to get more.

Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging

# gpg: Signature made Fri 21 Nov 2014 11:12:37 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/net-pull-request:
  rtl8139: fix Pointer to local outside scope
  pcnet: fix Negative array index read
  net/socket: fix Uninitialized scalar variable
  net/slirp: fix memory leak

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-gtk-20141121-1' into staging

gtk: two bugfixes for 2.2.

# gpg: Signature made Fri 21 Nov 2014 07:38:45 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <[email protected]>"
# gpg:                 aka "Gerd Hoffmann <[email protected]>"
# gpg:                 aka "Gerd Hoffmann (private) <[email protected]>"

* remotes/kraxel/tags/pull-gtk-20141121-1:
  gtk: Don't crash if -nodefaults
  gtk: fix possible memory leak about local_err

Signed-off-by: Peter Maydell <[email protected]>

rtl8139: fix Pointer to local outside scope

Coverity spot:
Assigning: iov = struct iovec [3]({{buf, 12UL},
{(void *)dot1q_buf, 4UL},
{buf + 12, size - 12}})
(address of temporary variable of type struct iovec [3]).
out_of_scope: Temporary variable of type struct iovec [3] goes out of scope.

Pointer to local outside scope (RETURN_LOCAL)
use_invalid:
Using iov, which points to an out-of-scope temporary variable of type struct iovec [3].

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

pcnet: fix Negative array index read

s->xmit_pos maybe assigned to a negative value (-1),
but in this branch variable s->xmit_pos as an index to
array s->buffer. Let's add a check for s->xmit_pos.

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

net/socket: fix Uninitialized scalar variable

If is_connected parameter is false, the saddr
variable will no initialize. Coverity report:
uninit_use: Using uninitialized value saddr.sin_port.

We don't need add saddr information to nc->info_str
when is_connected is false.

Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

net/slirp: fix memory leak

commit b412eb61 introduce 'cmd:' target for guestfwd,
and fwd don't be used in this scenario, and will leak
memory in true branch with 'cmd:'. Let's allocate memory
for fwd variable just in else statement.

Cc: Alexander Graf <[email protected]>
Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Jason Wang <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

gtk: Don't crash if -nodefaults

This fixes a crash by just skipping the vte resize hack if cur is NULL.

Reproducer:

qemu-system-x86_64 -nodefaults

Signed-off-by: Fam Zheng <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

gtk: fix possible memory leak about local_err

local_err in gd_vc_gfx_init() is not freed, and we don't use it,
so remove it.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>

hw/arm/virt: set stdout-path instead of linux,stdout-path

ePAPR 1.1 defines the stdout-path property, making the os-specific
linux,stdout-path property redundant. Change the DT setup for ARM virt
to use the generic property - supported by Linux since 3.15.

The old QEMU behaviour was not present in any released version of
QEMU, and was only added to QEMU after the kernel changed, so
this should not break any existing setups.

Signed-off-by: Leif Lindholm <[email protected]>
[PMM: add note to commit about the old behaviour never hving been
in a released version of QEMU]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/agraf/tags/signed-ppc-for-upstream' into staging

Patch queue for ppc - 2014-11-20

Hopefully the last few fixups for 2.2:

  - KVM memory slot fix (should usually only occur on PPC)
  - e300 fix
  - Altivec mtvscr instruction fix

# gpg: Signature made Thu 20 Nov 2014 13:53:34 GMT using RSA key ID 03FEDC60
# gpg: Good signature from "Alexander Graf <[email protected]>"
# gpg:                 aka "Alexander Graf <[email protected]>"

* remotes/agraf/tags/signed-ppc-for-upstream:
  target-ppc: Altivec's mtvscr Decodes Wrong Register
  kvm: Fix memory slot page alignment logic
  target-ppc: Fix breakpoint registers for e300

Signed-off-by: Peter Maydell <[email protected]>

target-ppc: Altivec's mtvscr Decodes Wrong Register

The Move to Vector Status and Control Register (mtvscr) instruction
uses VRB as the source register. Fix the code generator to correctly
decode the VRB field. That is, use "rB(ctx->opcode)" instead of
"rD(ctx->opcode)".

Signed-off-by: Tom Musta <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

kvm: Fix memory slot page alignment logic

Memory slots have to be page aligned to get entered into KVM. There
is existing logic that tries to ensure that we pad memory slots that
are not page aligned to the biggest region that would still fit in the
alignment requirements.

Unfortunately, that logic is broken. It tries to calculate the start
offset based on the region size.

Fix up the logic to do the thing it was intended to do and document it
properly in the comment above it.

With this patch applied, I can successfully run an e500 guest with more
than 3GB RAM (at which point RAM starts overlapping subpage memory regions).

Cc: [email protected]
Signed-off-by: Alexander Graf <[email protected]>

target-ppc: Fix breakpoint registers for e300

In the previous patch, the registers were added to init_proc_G2LE
instead of init_proc_e300.

Signed-off-by: Fabien Chouteau <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>

Merge remote-tracking branch 'remotes/amit-migration/tags/for-2.2-2' into staging

Fix from a while back that unfortunately got ignored.  Dave Gilbert says
it may actually fix a case where autoconverge would break on a repeat
migration (and not just fix stats).

# gpg: Signature made Thu 20 Nov 2014 12:52:41 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"

* remotes/amit-migration/tags/for-2.2-2:
  migration: static variables will not be reset at second migration

Signed-off-by: Peter Maydell <[email protected]>

migration: static variables will not be reset at second migration

The static variables in migration_bitmap_sync will not be reset in
the case of a second attempted migration.

Signed-off-by: ChenLiang <[email protected]>
Signed-off-by: Gonglei <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

Update version for v2.2.0-rc2 release

Signed-off-by: Peter Maydell <[email protected]>

hw/ide/core.c: Prevent SIGSEGV during migration

The other callers to blk_set_enable_write_cache() in this file
already check for s->blk == NULL.

Signed-off-by: Don Slutz <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-id: 1416259239 [email protected]
Cc: [email protected]
Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging

# gpg: Signature made Tue 18 Nov 2014 15:04:53 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg: aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/net-pull-request:
net: The third parameter of getsockname should be initialized

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Tue 18 Nov 2014 15:04:14 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/tracing-pull-request:
  Tracing: Fix simpletrace.py error on tcg enabled binary traces
  Tracing docs fix configure option and description

Signed-off-by: Peter Maydell <[email protected]>

net: The third parameter of getsockname should be initialized

Signed-off-by: zhanghailiang <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

Tracing: Fix simpletrace.py error on tcg enabled binary traces

simpletrace.py does not recognize the tcg option while reading trace-events file. In result simpletrace does not work on binary traces and tcg enabled events. Moved transformation of tcg enabled events to _read_events() which is used by simpletrace.

Signed-off-by: Christoph Seifert <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

Tracing docs fix configure option and description

Fix the example trace configure option.
Update the text to say that multiple backends are allowed and what
happens when multiple backends are enabled.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-id: 1412691161 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block patches for 2.2.0-rc2

# gpg: Signature made Tue 18 Nov 2014 11:32:55 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <[email protected]>"

* remotes/kevin/tags/for-upstream:
  block/raw-posix: Catch fsync() errors
  block/raw-posix: Only sync after successful preallocation
  block/raw-posix: Fix preallocating write() loop
  raw-posix: The SEEK_HOLE code is flawed, rewrite it
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  raw-posix: Fix comment for raw_co_get_block_status()

Signed-off-by: Peter Maydell <[email protected]>

Merge remote-tracking branch 'remotes/amit-migration/tags/for-2.2' into staging

Fix for CVE-2014-7840, avoiding arbitrary qemu memory overwrite for
migration by Michael S. Tsirkin.

# gpg: Signature made Tue 18 Nov 2014 11:23:00 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"
# gpg:                 aka "Amit Shah <[email protected]>"

* remotes/amit-migration/tags/for-2.2:
  migration: fix parameter validation on ram load

Signed-off-by: Peter Maydell <[email protected]>

linux-headers: update to 3.18-rc5

This updates the Linux header to version 3.18-rc5, adding support for
(among other things) read-only memslots on ARM and arm64.

Signed-off-by: Ard Biesheuvel <[email protected]>
Message-id: 1416248898 [email protected]
Signed-off-by: Peter Maydell <[email protected]>

migration: fix parameter validation on ram load

During migration, the values read from migration stream during ram load
are not validated. Especially offset in host_from_stream_offset() and
also the length of the writes in the callers of said function.

To fix this, we need to make sure that the [offset, offset + length]
range fits into one of the allocated memory regions.

Validating addr < len should be sufficient since data seems to always be
managed in TARGET_PAGE_SIZE chunks.

Fixes: CVE-2014-7840
Note: follow-up patches add extra checks on each block->host access.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Amit Shah <[email protected]>

block/raw-posix: Catch fsync() errors

fsync() may fail, and that case should be handled.

Reported-by: László Érsek <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/raw-posix: Only sync after successful preallocation

The loop which filled the file with zeroes may have been left early due
to an error. In that case, the fsync() should be skipped.

Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

block/raw-posix: Fix preallocating write() loop

write() may write less bytes than requested; in this case, the number of
bytes written is returned. This is the byte count we should be
subtracting from the number of bytes still to be written, and not the
byte count we requested to write.

Reported-by: László Érsek <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>

exec: Handle multipage ranges in invalidate_and_set_dirty()

The code in invalidate_and_set_dirty() needs to handle addr/length
combinations which cross guest physical page boundaries. This can happen,
for example, when disk I/O reads large blocks into guest RAM which previously
held code that we have cached translations for. Unfortunately we were only
checking the clean/dirty status of the first page in the range, and then
were calling a tb_invalidate function which only handles ranges that don't
cross page boundaries. Fix the function to deal with multipage ranges.

The symptoms of this bug were that guest code would misbehave (eg segfault),
in particular after a guest reboot but potentially any time the guest
reused a page of its physical RAM for new code.

Cc: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1416167061 [email protected]

Merge remote-tracking branch 'mreitz/block' into queue-block

* mreitz/block:
  raw-posix: The SEEK_HOLE code is flawed, rewrite it
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  raw-posix: Fix comment for raw_co_get_block_status()

raw-posix: The SEEK_HOLE code is flawed, rewrite it

On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.

Additionally, unlikely lseek() failures are treated badly:

* When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
  -ENXIO, there's in fact a trailing hole.  Can happen only when
  something truncated the file since we opened it.

* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
  then try_seek_hole() reports a trailing hole.  This is okay only
  when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
  found by SEEK_HOLE has since become trailing somehow).  For other
  failures (unlikely), it's wrong.

* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
  then try_seek_hole() reports bogus data [-1,start), which its caller
  raw_co_get_block_status() turns into zero sectors of data.  Could
  theoretically lead to infinite loops in code that attempts to scan
  data vs. hole forward.

Rewrite from scratch, with very careful comments.

Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

raw-posix: SEEK_HOLE suffices, get rid of FIEMAP

Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
follows:

1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl

2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()

3. Else pretend there are no holes

Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().

Commit 4f11aa8 (May 2014) changed it to try the three methods in order
until success, because "there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa."

Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.

As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow.  I guess this puts it somewhere between -7 "The obvious use is
wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
to Misuse scale[*].

"Fortunately", the FIEMAP code is used only when

* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is

  Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
  was introduced for ext4 and btrfs) and 2012.

* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails

  Unlikely.

Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
bugs.  Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.

I don't want to worry about this crap, not even theoretically.  Get
rid of it.

[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

raw-posix: Fix comment for raw_co_get_block_status()

Missed in commit 705be72.

Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Signed-off-by: Max Reitz <[email protected]>

target-arm: handle address translations that start at level 3

The ARMv8 address translation system defines that a page table walk
starts at a level which depends on the translation granule size
and the number of bits of virtual address that need to be resolved.
Where the translation granule is 64KB and the guest sets the
TCR.TxSZ field to between 35 and 39, it's actually possible to
start at level 3 (the final level). QEMU's implementation failed
to handle this case, and so we would set level to 2 and behave
incorrectly (including invoking the C undefined behaviour of
shifting left by a negative number). Correct the code that
determines the starting level to deal with the start-at-3 case,
by replacing the if-else ladder with an expression derived from
the ARM ARM pseudocode version.

This error was detected by the Coverity scan, which spotted
the potential shift by a negative number.

Signed-off-by: Peter Maydell <[email protected]>
Message-id: 1415890569 [email protected]

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

A smattering of fixes for problems that Coverity reported.

# gpg: Signature made Mon 17 Nov 2014 17:03:25 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  hcd-musb: fix dereference null return value
  target-cris/translate.c: fix out of bounds read
  shpc: fix error propaagation
  qemu-char: fix MISSING_COMMA
  acl: fix memory leak
  nvme: remove superfluous check
  loader: fix NEGATIVE_RETURNS
  qga: fix false negative argument passing
  mips_mipssim: fix use-after-free for filename
  l2tpv3: fix fd leak
  l2tpv3: fix possible double free
  libcacard: fix resource leak

Signed-off-by: Peter Maydell <[email protected]>

hcd-musb: fix dereference null return value

usb_ep_get and usb_handle_packet can deal with a NULL device, but we have
to avoid dereferencing NULL pointers when building the id.

Thanks to Gonglei for an initial stab at fixing this.

Signed-off-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging

Update OpenBIOS images

# gpg: Signature made Sat 15 Nov 2014 13:12:02 GMT using RSA key ID AE0F321F
# gpg: Good signature from "Mark Cave-Ayland <[email protected]>"

* remotes/mcayland/tags/qemu-openbios-signed:
Update OpenBIOS images

Signed-off-by: Peter Maydell <[email protected]>

target-cris/translate.c: fix out of bounds read

In function t_gen_mov_TN_preg and t_gen_mov_preg_TN, The begin check about the
validity of in-parameter 'r' is useless. We still access cpu_PR[r] in the
follow code if it is invalid. Which will be an out-of-bounds read error.

Fix it by using assert() to ensure it is valid before using it.

Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

shpc: fix error propaagation

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

qemu-char: fix MISSING_COMMA

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

acl: fix memory leak

If 'i != index' for all acl->entries, variable
entry leaks the storage it points to.

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nvme: remove superfluous check

Operands don't affect result (CONSTANT_EXPRESSION_RESULT)
((n->bar.aqa >> AQA_ASQS_SHIFT) & AQA_ASQS_MASK) > 4095
is always false regardless of the values of its operands.
This occurs as the logical second operand of '||'.

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

loader: fix NEGATIVE_RETURNS

lseek will return -1 on error, g_malloc0(size) and read(,,size)
paramenters cannot be negative. We should add a check for return
value of lseek().

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

qga: fix false negative argument passing

Function send_response(s, &qdict->base) returns a negative number
when any failures occured. But strerror()'s parameter cannot be
negative. Let's change the testing condition and pass '-ret' to
strerr().

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

mips_mipssim: fix use-after-free for filename

May pass freed pointer filename as an argument to error_report.

Signed-off-by: Gonglei <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

l2tpv3: fix fd leak

In this false branch, fd will leak when it is zero.
Change the testing condition.

Signed-off-by: Gonglei <[email protected]>
[Fix net_l2tpv3_cleanup as well. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>

Update OpenBIOS images

Update OpenBIOS images to SVN r1327 built from submodule.

Signed-off-by: Mark Cave-Ayland <[email protected]>

Merge remote-tracking branch 'remotes/sstabellini/xen-2014-11-14' into staging

* remotes/sstabellini/xen-2014-11-14:
xen_disk: fix unmapping of persistent grants
pc: piix4_pm: init legacy PCI hotplug when running on Xen

Signed-off-by: Peter Maydell <[email protected]>

l2tpv3: fix possible double free

freeaddrinfo(result) does not assign result = NULL, after frees it.
There will be a double free when it goes error case.
It is reported by covertiy.

Reviewed-by: Gonglei <[email protected]>
Cc: [email protected]
Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

libcacard: fix resource leak

In function connect_to_qemu(), getaddrinfo() will allocate memory
that is stored into server, it should be freed by using freeaddrinfo()
before connect_to_qemu() return.

Cc: [email protected]
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: zhanghailiang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Fri 14 Nov 2014 11:05:54 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <[email protected]>"
# gpg:                 aka "Stefan Hajnoczi <[email protected]>"

* remotes/stefanha/tags/block-pull-request:
  vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info
  block: Fix max nb_sectors in bdrv_make_zero
  ahci: factor out FIS decomposition from handle_cmd
  ahci: Check cmd_fis[1] more explicitly
  ahci: Reorder error cases in handle_cmd
  ahci: Fix FIS decomposition
  ahci: add is_ncq predicate helper
  ide: Correct handling of malformed/short PRDTs
  ahci: unify sglist preparation
  ide: repair PIO transfers for cases where nsector > 1
  ahci: Fix byte count regression for ATAPI/PIO

Signed-off-by: Peter Maydell <[email protected]>

xen_disk: fix unmapping of persistent grants

This patch fixes two issues with persistent grants and the disk PV backend
(Qdisk):

- Keep track of memory regions where persistent grants have been mapped
   since we need to unmap them as a whole. It is not possible to unmap a
   single grant if it has been batch-mapped. A new check has also been added
   to make sure persistent grants are only used if the whole mapped region
   can be persistently mapped in the batch_maps case.
- Unmap persistent grants before switching to the closed state, so the
   frontend can also free them.

Signed-off-by: Roger Pau Monné <[email protected]>
Reported-by: George Dunlap <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Kevin Wolf <[email protected]>
Cc: Stefan Hajnoczi <[email protected]>
Cc: George Dunlap <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>

pc: piix4_pm: init legacy PCI hotplug when running on Xen

If user starts QEMU with "-machine pc,accel=xen", then
compat property in xenfv won't work and it would cause error:
"Unsupported bus. Bus doesn't have property 'acpi-pcihp-bsel' set"
when PCI device is added with -device on QEMU CLI.

From: Igor Mammedov <[email protected]>

In case of Xen instead of using compat property, just use the fact
that xen doesn't use QEMU's fw_cfg/acpi tables to switch piix4_pm
into legacy PCI hotplug mode when Xen is enabled.

Signed-off-by: Igor Mammedov <[email protected]>
Signed-off-by: Li Liang <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>

vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info

When extent types don't match, we return -ENOTSUP. In this case, be
polite to the caller and don't modify bdi.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Max Reitz <[email protected]>
Message-id: 1415938161 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

block: Fix max nb_sectors in bdrv_make_zero

In bdrv_rw_co we report -EINVAL for nb_sectors > INT_MAX /
BDRV_SECTOR_SIZE, so a caller shouldn't exceed it.

Signed-off-by: Fam Zheng <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Message-id: 1415603264 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

ahci: factor out FIS decomposition from handle_cmd

In order to make handle_cmd more readable at the macro level,
the details of how to decompose particular types of FIS packets
are left to helper functions.

In our case, the only type of FIS packet we currently expect to
see is a Register H2D FIS packet, but the gory details of its
decomposition are of no particular interest in handle_cmd.

This patch keeps the receipt of FIS packets and the decomposition
thereof separated to two different functions.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1415058979 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

ahci: Check cmd_fis[1] more explicitly

Instead of checking for a known byte, inspect the
fields of this byte explicitly to produce more meaningful
error messages and improve the readability of this section.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1415058979 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

ahci: Reorder error cases in handle_cmd

Error checking in ahci's handle_cmd is re-ordered so that we
initialize as few things as possible before we've done our
sanity checking. This simplifies returning from this call
in case of an error.

A check to make sure the DMA memory map succeeds with the
correct size is also added, and the debug print of the
command fis is cleaned up with its size corrected.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1415058979 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

ahci: Fix FIS decomposition

This patch introduces a few changes to how FIS packets are
deciphered in the AHCI virtual device. The summary of
changes can be grouped into two pieces:

[A] Changes to how we apply a preliminary sieve to FISes,
[B] Changes in how we internalize a decomposed FIS.

== Changes to how we apply a preliminary sieve to FISes ==

(1) Packets may now either update the Control register or
    the Command register, but not both. This is according
    to the SATA 3.2 specification which states:
    "...the device either initiates processing of the command
    indicated in the Command register or initiates processing
    of the control request indicated [...] depending on the
    state of the C bit in the FIS."

    See SATA 3.2 section 10.5.5.4, "Reception" in the 10.5.5
    "Register Host to Device FIS" section.

    This change accounts for the first two regions of change
    within the diff. All other changes belong to the following
    changes.

== Changes in how we internalize a decomposed FIS ==

(2) Instead of trying to extract the sector number out of the
    FIS from bytes 4-10 and setting it with ide_set_sector,
    we set the appropriate IDEState registers and trust that
    ide_get_sector can retrieve the correct sector later.

    By "constructing" the sector for use with ide_set_sector,
    we are duplicating the mechanisms of ide_get_sector.
    This change makes the FIS decomposition more obvious.

    SATA 3.2 as a specification does not make the legacy
    register mapping with respect to the D2H FIS obvious.
    However, SATA 3.2 section 10.5.5.1 "Register Host to
    Device FIS layout" describes all of the "cmd_fis"
    bytes:

    0 - FIS Type (0x27)
    1 - Port Multiplier Port and Command Update flag
    2 - ATA Command
    3 - Features_Low
    4 - LBA 7:0
    5 - LBA 15:8
    6 - LBA 23:16
    7 - Device, AKA "Drive Select."
    8 - LBA 31:24
    9 - LBA 39:32
    10 - LBA 47:40
    11 - Features_High
    12 - Count Low
    13 - Count High
    14 - ICC
    15 - Control
    16-19 - Auxiliary (for NCQ, defined per-command)

    Most of these registers map to existing IDEState registers
    in obvious ways, especially features, select, hob_features,
    and nsector (count). ICC is reserved in older specifications
    but is not supported in our implementation, and remains
    unused here. The Control register is not valid for a command
    that is trying to update the command register and is to be
    considered reserved at this point.

    What is not obvious is the LBA register mappings, but SATA 1.0
    can help inform of us legacy device support, see SATA 1.0 section
    8.5.2 "Register - Host to Device."

    LBA 7:0   - Sector Number    (sector)
    LBA 15:8  - Cyl Low          (lcyl)
    LBA 23:16 - Cyl High         (hcyl)
    LBA 31:24 - Sector Num Exp.  (hob_sector)
    LBA 39:32 - Cyl Low Exp.     (hob_lcyl)
    LBA 47:40 - Cyl High Exp.    (hob_hcyl)

    These mappings help guide which registers the FIS should be decomposed
    into/towards for CHS, LBA28 and LBA48 commands.

    As a note: The prior confusion that can be seen in the documentation
    arises from the fact that CHS and LBA28 commands use the low nybble
    of the drive select register to store LBA 27:24, whereas LNA48 commands
    use the hob_sector, hob_lcyl and hob_hcyl registers as explained above.

    The decomposition as it stands now will correctly decompose CHS, LBA28
    and LBA48 commands into their appropriate registers where the core
    IDE/ATAPI layers can deal with them correctly.

    See the below point for more information.

(3) We save cmd_fis[7] as ide_state->select, which informs
    decisions about if we are using LBA or CHS.
    This corrects a bug in AHCI wherein we attempt to set and/or
    retrieve the sector number by using ide_set_sector and
    ide_get_sector, which depend on the select register to
    determine if we are using LBA or CHS.

    Without this adjustment, LBA48 read/writes are currently
    broken. Thanks to Eniac Zheng @ HP for pointing this out.

(4) Save cmd_fis[11] as ide_state->hob_feature, as defined in SATA 3.2.

(5) For several ATA commands, the sector count register set to 0
    is a magic number that means 256 sectors. For LBA48 commands,
    this means 65,536 sectors. We drop the magic sector correction
    here, and trust the ide core layer to handle the conversion
    appropriately, in ide_cmd_lba48_transform(). As it stands,
    the current AHCI code is only compliant with LBA28 commands.
    By simply removing the magic, it will work with LBA28 and LBA48.

(6) We expand FIS decomposition to include both ATAPI and IDE devices.
    We leave the logic of determining if the fields are valid or not
    to the respective layers.

    This change intends to make it clearer that AHCI is only a
    composition mechanism for the FIS packets: the meanings of
    the registers is best left to the implementation layers for
    those devices.

(7) Forcefully setting the feature, hcyl and lcyl registers for ATAPI
    commands is removed.
    - The hcyl and lcyl magic present here is valid at boot only,
      and should not be overridden for every PACKET command.
    - The feature register is defined as valid for the PACKET command,
      so we should not suppress it. The ATAPI layer does not even
      currently depend on or require 0x01 as mandatory.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1415058979 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

ahci: add is_ncq predicate helper

A small helper to determine which S/ATA commands
are destined to be routed to the NCQ pathways.

This references SATA 3.2 section 13.6,
Native Command Queueing. See sections 13.6.4,
13.6.5, 13.6.6, 13.6.7 and 13.6.8 for all
SATA commands considered to be part of the
NCQ feature set. This is summarized in a small
list in section 13.6.3.1 and again in 13.6.3.2.

Not all of these NCQ commands are currently supported,
so the error pathways are adjusted slightly to be more
informative in the case they are encountered.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1415058979 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

ide: Correct handling of malformed/short PRDTs

This impacts both BMDMA and AHCI HBA interfaces for IDE.
Currently, we confuse the difference between a PRDT having
"0 bytes" and a PRDT having "0 complete sectors."

When we receive an incomplete sector, inconsistent error checking
leads to an infinite loop wherein the call succeeds, but it
didn't give us enough bytes -- leading us to re-call the
DMA chain over and over again. This leads to, in the BMDMA case,
leaked memory for short PRDTs, and infinite loops and resource
usage in the AHCI case.

The .prepare_buf() callback is reworked to return the number of
bytes that it successfully prepared. 0 is a valid, non-error
answer that means the table was empty and described no bytes.
-1 indicates an error.

Our current implementation uses the io_buffer in IDEState to
ultimately describe the size of a prepared scatter-gather list.
Even though the AHCI PRDT/SGList can be as large as 256GiB, the
AHCI command header limits transactions to just 4GiB. ATA8-ACS3,
however, defines the largest transaction to be an LBA48 command
that transfers 65,536 sectors. With a 512 byte sector size, this
is just 32MiB.

Since our current state structures use the int type to describe
the size of the buffer, and this state is migrated as int32, we
are limited to describing 2GiB buffer sizes unless we change the
migration protocol.

For this reason, this patch begins to unify the assertions in the
IDE pathways that the scatter-gather list provided by either the
AHCI PRDT or the PCI BMDMA PRDs can only describe, at a maximum,
2GiB. This should be resilient enough unless we need a sector
size that exceeds 32KiB.

Further, the likelihood of any guest operating system actually
attempting to transfer this much data in a single operation is
very slim.

To this end, the IDEState variables have been updated to more
explicitly clarify our maximum supported size. Callers to the
prepare_buf callback have been reworked to understand the new
return code, and all versions of the prepare_buf callback have
been adjusted accordingly.

Lastly, the ahci_populate_sglist helper, relied upon by the
AHCI implementation of .prepare_buf() as well as the PCI
implementation of the callback have had overflow assertions
added to help make clear the reasonings behind the various
type changes.

[Added %d -> %"PRId64" fix John sent because off_pos changed from int to
int64_t.
--Stefan]

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1414785819 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

ahci: unify sglist preparation

The intent of this patch is to further unify the creation and
deletion of the sglist used for all AHCI transfers, including
emulated PIO, ATAPI R/W, and native DMA R/W.

By replacing ahci_start_transfer's call to ahci_populate_sglist
with ahci_dma_prepare_buf, we reduce the number of direct calls
where we manipulate the scatter-gather list in the AHCI code.

To make this switch, the constant "0" passed as an offset
in ahci_dma_prepare_buf is adjusted to use io_buffer_offset.

For DMA pathways, this has no effect: io_buffer_offset is always
updated to 0 at the beginning of a DMA transfer loop regardless.
DMA pathways through ide_dma_cb() update the io_buffer_offset
accordingly, and for circumstances where we might make several
trips through this loop, this may actually correct a design flaw.

For PIO pathways, the newly updated ahci_dma_prepare_buf will
now prepare the sglist at the correct offset. It will also set
io_buffer_size, but this is not used in the cmd_read_pio or
cmd_write_pio pathways.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1414785819 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

ide: repair PIO transfers for cases where nsector > 1

Currently, for emulated PIO transfers through the AHCI device,
any attempt made to request more than a single sector's worth
of data will result in the same sector being transferred over
and over.

For example, if we request 8 sectors via PIO READ SECTORS, the
AHCI device will give us the same sector eight times.

This patch adds offset tracking into the PIO pathways so that
we can fulfill these requests appropriately.

Signed-off-by: John Snow <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Message-id: 1414785819 [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>

ahci: Fix byte count regression for ATAPI/PIO

This patch fixes a regression caused by commit
659142ecf71a0da240ab0ff7cf929ee25c32b9bc.
The problem occurs when we wish to return early
from the ahci_start_transfer function, but are now
updating the transferred byte count in the AHCI
command header via ahci_commit_buf.

This will cause problems in the Windows 8 installer.

Don't update the byte count in the command header
for the transmission of ATAPI packets: These commands
will distort the final byte count of the actual data
payload.

The call to ahci_commit_buf remains in the "out"
portion of the call in order to clean up the sglist.
The byte count is maintained by forcing size to be 0.

Signed-off-by: John Snow <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

x86 and SCSI fixes.  I left out the APIC device model
patches, pending confirmation from the submitter that they really
fix QNX.

# gpg: Signature made Thu 13 Nov 2014 15:13:38 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <[email protected]>"
# gpg:                 aka "Paolo Bonzini <[email protected]>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  acpi: accurate overflow check
  smbios: change 'ram_addr_t' variables to 'uint64_t'
  kvmclock: Add comment explaining why we need cpu_clean_all_dirty()
  target-i386: fix Coverity complaints about overflows
  apic_common: migrate missing fields
  target-i386: eliminate dead code and hoist common code out of "if"
  virtio-scsi: Fix comment for VirtIOSCSIReq
  virtio-scsi: dataplane: suppress guest notification
  esp: Do not overwrite ESP_TCHI after reset
  virtio-scsi: dataplane: fix allocation for 'cmd_vrings'
  esp: fix coding standards
  virtio-scsi: work around bug in old BIOSes
  esp-pci: fixup deadlock with linux

Signed-off-by: Peter Maydell <[email protected]>

acpi: accurate overflow check

Compare clock in ns, because acpi_pm_tmr_update uses rounded
to ns value instead of ticks.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
[This lets Windows boot in icount mode. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>

smbios: change 'ram_addr_t' variables to 'uint64_t'

ram_addr_t should not be used except if referring to a RAMBlobk.
Using 'uint64_t' avoids a -Wconstant-conversion warning, which
clang >= 3.4 produces in "smbios_get_tables()".

Signed-off-by: SeokYeon Hwang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvmclock: Add comment explaining why we need cpu_clean_all_dirty()

Try to explain why commit 317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c
needed a cpu_clean_all_dirty() call just after calling
cpu_synchronize_all_states().

Signed-off-by: Eduardo Habkost <[email protected]>
Cc: Andrey Korolyov <[email protected]>
Cc: Marcin GibuÅ‚a <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

target-i386: fix Coverity complaints about overflows

sipi_vector is an int; it is shifted by 12 and passed as a 64-bit value,
which makes Coverity think that we wanted (uint64_t)sipi_vector << 12.

But actually it must be between 0 and 255. Make this explicit.

Signed-off-by: Paolo Bonzini <[email protected]>

apic_common: migrate missing fields

This patch adds missed sipi_vector and wait_for_sipi fields to a new
subsection of the vmstate of the apic_common module. Saving and loading
of these fields makes migration of the apic state deterministic.

Signed-off-by: Pavel Dovgalyuk <[email protected]>
[Initialize the field in pre_load and kvm_apic_realize. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>