David Gibson [Tue, 20 Dec 2016 05:49:05 +0000 (16:49 +1100)]
KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing implementation
This adds a not yet working outline of the HPT resizing PAPR
extension. Specifically it adds the necessary ioctl() functions,
their basic steps, the work function which will handle preparation for
the resize, and synchronization between these, the guest page fault
path and guest HPT update path.
The actual guts of the implementation isn't here yet, so for now the
calls will always fail.
The kvm_unmap_rmapp() function, called from certain MMU notifiers, is used
to force all guest mappings of a particular host page to be set ABSENT, and
removed from the reverse mappings.
For HPT resizing, we will have some cases where we want to set just a
single guest HPTE ABSENT and remove its reverse mappings. To prepare with
this, we split out the logic from kvm_unmap_rmapp() to evict a single HPTE,
moving it to a new helper function.
The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
table (HPT) that userspace expects a guest VM to have, and is also used to
clear that HPT when necessary (e.g. guest reboot).
At present, once the ioctl() is called for the first time, the HPT size can
never be changed thereafter - it will be cleared but always sized as from
the first call.
With upcoming HPT resize implementation, we're going to need to allow
userspace to resize the HPT at reset (to change it back to the default size
if the guest changed it).
So, we need to allow this ioctl() to change the HPT size.
This patch also updates Documentation/virtual/kvm/api.txt to reflect
the new behaviour. In fact the documentation was already slightly
incorrect since 572abd5 "KVM: PPC: Book3S HV: Don't fall back to
smaller HPT size in allocation ioctl"
David Gibson [Tue, 20 Dec 2016 05:49:02 +0000 (16:49 +1100)]
KVM: PPC: Book3S HV: Split HPT allocation from activation
Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
and sets it up as the active page table for a VM. For the upcoming HPT
resize implementation we're going to want to allocate HPTs separately from
activating them.
So, split the allocation itself out into kvmppc_allocate_hpt() and perform
the activation with a new kvmppc_set_hpt() function. Likewise we split
kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
which unsets it as an active HPT, then frees it.
We also move the logic to fall back to smaller HPT sizes if the first try
fails into the single caller which used that behaviour,
kvmppc_hv_setup_htab_rma(). This introduces a slight semantic change, in
that previously if the initial attempt at CMA allocation failed, we would
fall back to attempting smaller sizes with the page allocator. Now, we
try first CMA, then the page allocator at each size. As far as I can tell
this change should be harmless.
To match, we make kvmppc_free_hpt() just free the actual HPT itself. The
call to kvmppc_free_lpid() that was there, we move to the single caller.
David Gibson [Tue, 20 Dec 2016 05:49:01 +0000 (16:49 +1100)]
KVM: PPC: Book3S HV: Don't store values derivable from HPT order
Currently the kvm_hpt_info structure stores the hashed page table's order,
and also the number of HPTEs it contains and a mask for its size. The
last two can be easily derived from the order, so remove them and just
calculate them as necessary with a couple of helper inlines.
David Gibson [Tue, 20 Dec 2016 05:49:00 +0000 (16:49 +1100)]
KVM: PPC: Book3S HV: Gather HPT related variables into sub-structure
Currently, the powerpc kvm_arch structure contains a number of variables
tracking the state of the guest's hashed page table (HPT) in KVM HV. This
patch gathers them all together into a single kvm_hpt_info substructure.
This makes life more convenient for the upcoming HPT resizing
implementation.
David Gibson [Tue, 20 Dec 2016 05:48:59 +0000 (16:48 +1100)]
KVM: PPC: Book3S HV: Rename kvm_alloc_hpt() for clarity
The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
all obvious from the name. In practice kvmppc_alloc_hpt() allocates an HPT
by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
it with CMA only.
To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().
David Gibson [Tue, 20 Dec 2016 05:48:58 +0000 (16:48 +1100)]
KVM: PPC: Book3S HV: HPT resizing documentation and reserved numbers
This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
advertise whether KVM is capable of handling the PAPR extensions for
resizing the hashed page table during guest runtime. It also adds
definitions for two new VM ioctl()s to implement this extension, and
documentation of the same.
Note that, HPT resizing is already possible with KVM PR without kernel
modification, since the HPT is managed within userspace (qemu). The
capability defined here will only be set where an in-kernel implementation
of resizing is necessary, i.e. for KVM HV. To determine if the userspace
resize implementation can be used, it's necessary to check
KVM_CAP_PPC_ALLOC_HTAB. Unfortunately older kernels incorrectly set
KVM_CAP_PPC_ALLOC_HTAB even with KVM PR. If userspace it want to support
resizing with KVM PR on such kernels, it will need a workaround.
David Gibson [Tue, 20 Dec 2016 05:48:57 +0000 (16:48 +1100)]
Documentation: Correct duplicate section number in kvm/api.txt
Both KVM_CREATE_SPAPR_TCE_64 and KVM_REINJECT_CONTROL have section number
4.98 in Documentation/virtual/kvm/api.txt, presumably due to a naive merge.
This corrects the duplication.
[[email protected] - correct section numbers for following sections,
KVM_PPC_CONFIGURE_V3_MMU and KVM_PPC_GET_RMMU_INFO, as well.]
Paul Mackerras [Mon, 30 Jan 2017 10:21:53 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Enable radix guest support
This adds a few last pieces of the support for radix guests:
* Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
KVM_PPC_GET_RMMU_INFO ioctls for radix guests
* On POWER9, allow secondary threads to be on/off-lined while guests
are running.
* Set up LPCR and the partition table entry for radix guests.
* Don't allocate the rmap array in the kvm_memory_slot structure
on radix.
* Don't try to initialize the HPT for radix guests, since they don't
have an HPT.
* Take out the code that prevents the HV KVM module from
initializing on radix hosts.
At this stage, we only support radix guests if the host is running
in radix mode, and only support HPT guests if the host is running in
HPT mode. Thus a guest cannot switch from one mode to the other,
which enables some simplifications.
Paul Mackerras [Mon, 30 Jan 2017 10:21:52 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Invalidate ERAT on guest entry/exit for POWER9 DD1
On POWER9 DD1, we need to invalidate the ERAT (effective to real
address translation cache) when changing the PIDR register, which
we do as part of guest entry and exit.
Paul Mackerras [Mon, 30 Jan 2017 10:21:51 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Allow guest exit path to have MMU on
If we allow LPCR[AIL] to be set for radix guests, then interrupts from
the guest to the host can be delivered by the hardware with relocation
on, and thus the code path starting at kvmppc_interrupt_hv can be
executed in virtual mode (MMU on) for radix guests (previously it was
only ever executed in real mode).
Most of the code is indifferent to whether the MMU is on or off, but
the calls to OPAL that use the real-mode OPAL entry code need to
be switched to use the virtual-mode code instead. The affected
calls are the calls to the OPAL XICS emulation functions in
kvmppc_read_one_intr() and related functions. We test the MSR[IR]
bit to detect whether we are in real or virtual mode, and call the
opal_rm_* or opal_* function as appropriate.
The other place that depends on the MMU being off is the optimization
where the guest exit code jumps to the external interrupt vector or
hypervisor doorbell interrupt vector, or returns to its caller (which
is __kvmppc_vcore_entry). If the MMU is on and we are returning to
the caller, then we don't need to use an rfid instruction since the
MMU is already on; a simple blr suffices. If there is an external
or hypervisor doorbell interrupt to handle, we branch to the
relocation-on version of the interrupt vector.
Paul Mackerras [Mon, 30 Jan 2017 10:21:50 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement
With radix, the guest can do TLB invalidations itself using the tlbie
(global) and tlbiel (local) TLB invalidation instructions. Linux guests
use local TLB invalidations for translations that have only ever been
accessed on one vcpu. However, that doesn't mean that the translations
have only been accessed on one physical cpu (pcpu) since vcpus can move
around from one pcpu to another. Thus a tlbiel might leave behind stale
TLB entries on a pcpu where the vcpu previously ran, and if that task
then moves back to that previous pcpu, it could see those stale TLB
entries and thus access memory incorrectly. The usual symptom of this
is random segfaults in userspace programs in the guest.
To cope with this, we detect when a vcpu is about to start executing on
a thread in a core that is a different core from the last time it
executed. If that is the case, then we mark the core as needing a
TLB flush and then send an interrupt to any thread in the core that is
currently running a vcpu from the same guest. This will get those vcpus
out of the guest, and the first one to re-enter the guest will do the
TLB flush. The reason for interrupting the vcpus executing on the old
core is to cope with the following scenario:
CPU 0 CPU 1 CPU 4
(core 0) (core 0) (core 1)
VCPU 0 runs task X VCPU 1 runs
core 0 TLB gets
entries from task X
VCPU 0 moves to CPU 4
VCPU 0 runs task X
Unmap pages of task X
tlbiel
(still VCPU 1) task X moves to VCPU 1
task X runs
task X sees stale TLB
entries
That is, as soon as the VCPU starts executing on the new core, it
could unmap and tlbiel some page table entries, and then the task
could migrate to one of the VCPUs running on the old core and
potentially see stale TLB entries.
Since the TLB is shared between all the threads in a core, we only
use the bit of kvm->arch.need_tlb_flush corresponding to the first
thread in the core. To ensure that we don't have a window where we
can miss a flush, this moves the clearing of the bit from before the
actual flush to after it. This way, two threads might both do the
flush, but we prevent the situation where one thread can enter the
guest before the flush is finished.
Paul Mackerras [Mon, 30 Jan 2017 10:21:49 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode
If the guest is in radix mode, then it doesn't have a hashed page
table (HPT), so all of the hypercalls that manipulate the HPT can't
work and should return an error. This adds checks to make them
return H_FUNCTION ("function not supported").
This adds code to keep track of dirty pages when requested (that is,
when memslot->dirty_bitmap is non-NULL) for radix guests. We use the
dirty bits in the PTEs in the second-level (partition-scoped) page
tables, together with a bitmap of pages that were dirty when their
PTE was invalidated (e.g., when the page was paged out). This bitmap
is stored in the first half of the memslot->dirty_bitmap area, and
kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
bitmap that gets returned to userspace.
Paul Mackerras [Mon, 30 Jan 2017 10:21:47 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests
This adapts our implementations of the MMU notifier callbacks
(unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
to call radix functions when the guest is using radix. These
implementations are much simpler than for HPT guests because we
have only one PTE to deal with, so we don't need to traverse
rmap chains.
Paul Mackerras [Mon, 30 Jan 2017 10:21:46 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Page table construction and page faults for radix guests
This adds the code to construct the second-level ("partition-scoped" in
architecturese) page tables for guests using the radix MMU. Apart from
the PGD level, which is allocated when the guest is created, the rest
of the tree is all constructed in response to hypervisor page faults.
As well as hypervisor page faults for missing pages, we also get faults
for reference/change (RC) bits needing to be set, as well as various
other error conditions. For now, we only set the R or C bit in the
guest page table if the same bit is set in the host PTE for the
backing page.
This code can take advantage of the guest being backed with either
transparent or ordinary 2MB huge pages, and insert 2MB page entries
into the guest page tables. There is no support for 1GB huge pages
yet.
This adds code to branch around the parts that radix guests don't
need - clearing and loading the SLB with the guest SLB contents,
saving the guest SLB contents on exit, and restoring the host SLB
contents.
Since the host is now using radix, we need to save and restore the
host value for the PID register.
On hypervisor data/instruction storage interrupts, we don't do the
guest HPT lookup on radix, but just save the guest physical address
for the fault (from the ASDR register) in the vcpu struct.
Paul Mackerras [Mon, 30 Jan 2017 10:21:44 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Add basic infrastructure for radix guests
This adds a field in struct kvm_arch and an inline helper to
indicate whether a guest is a radix guest or not, plus a new file
to contain the radix MMU code, which currently contains just a
translate function which knows how to traverse the guest page
tables to translate an address.
Paul Mackerras [Mon, 30 Jan 2017 10:21:43 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9
POWER9 adds a register called ASDR (Access Segment Descriptor
Register), which is set by hypervisor data/instruction storage
interrupts to contain the segment descriptor for the address
being accessed, assuming the guest is using HPT translation.
(For radix guests, it contains the guest real address of the
access.)
Thus, for HPT guests on POWER9, we can use this register rather
than looking up the SLB with the slbfee. instruction.
Paul Mackerras [Mon, 30 Jan 2017 10:21:42 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9
This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
for HPT guests on POWER9. With this, we can return 1 for the
KVM_CAP_PPC_MMU_HASH_V3 capability.
Paul Mackerras [Mon, 30 Jan 2017 10:21:41 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest. The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables. (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).
The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.
The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU. It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.
Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.
Paul Mackerras [Mon, 30 Jan 2017 10:21:40 +0000 (21:21 +1100)]
powerpc/64: Allow for relocation-on interrupts from guest to host
With host and guest both using radix translation, it is feasible
for the host to take interrupts that come from the guest with
relocation on, and that is in fact what the POWER9 hardware will
do when LPCR[AIL] = 3. All such interrupts use HSRR0/1 not SRR0/1
except for system call with LEV=1 (hcall).
Therefore this adds the KVM tests to the _HV variants of the
relocation-on interrupt handlers, and adds the KVM test to the
relocation-on system call entry point.
We also instantiate the relocation-on versions of the hypervisor
data storage and instruction interrupt handlers, since these can
occur with relocation on in radix guests.
Paul Mackerras [Mon, 30 Jan 2017 10:21:39 +0000 (21:21 +1100)]
powerpc/64: Make type of partition table flush depend on partition type
When changing a partition table entry on POWER9, we do a particular
form of the tlbie instruction which flushes all TLBs and caches of
the partition table for a given logical partition ID (LPID).
This instruction has a field in the instruction word, labelled R
(radix), which should be 1 if the partition was previously a radix
partition and 0 if it was a HPT partition. This implements that
logic.
Paul Mackerras [Mon, 30 Jan 2017 10:21:37 +0000 (21:21 +1100)]
powerpc/64: More definitions for POWER9
This adds definitions for bits in the DSISR register which are used
by POWER9 for various translation-related exception conditions, and
for some more bits in the partition table entry that will be needed
by KVM.
Paul Mackerras [Mon, 30 Jan 2017 10:21:36 +0000 (21:21 +1100)]
powerpc/64: Enable use of radix MMU under hypervisor on POWER9
To use radix as a guest, we first need to tell the hypervisor via
the ibm,client-architecture call first that we support POWER9 and
architecture v3.00, and that we can do either radix or hash and
that we would like to choose later using an hcall (the
H_REGISTER_PROC_TBL hcall).
Then we need to check whether the hypervisor agreed to us using
radix. We need to do this very early on in the kernel boot process
before any of the MMU initialization is done. If the hypervisor
doesn't agree, we can't use radix and therefore clear the radix
MMU feature bit.
Later, when we have set up our process table, which points to the
radix tree for each process, we need to install that using the
H_REGISTER_PROC_TBL hcall.
Paul Mackerras [Mon, 30 Jan 2017 10:21:35 +0000 (21:21 +1100)]
powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options
This fixes the byte index values for some of the option bits in
the "ibm,architectur-vec-5" property. The "platform facilities options"
bits are in byte 17 not byte 14, so the upper 8 bits of their
definitions need to be 0x11 not 0x0E. The "sub processor support" option
is in byte 21 not byte 15.
Note none of these options are actually looked up in
"ibm,architecture-vec-5" at this time, so there is no bug.
When checking whether option bits are set, we should check that
the offset of the byte being checked is less than the vector
length that we got from the hypervisor.
Paul Mackerras [Mon, 30 Jan 2017 10:21:34 +0000 (21:21 +1100)]
powerpc/64: Don't try to use radix MMU under a hypervisor
Currently, if the kernel is running on a POWER9 processor under a
hypervisor, it will try to use the radix MMU even though it doesn't have
the necessary code to use radix under a hypervisor (it doesn't negotiate
use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The
result is that the guest kernel will crash when it tries to turn on the
MMU.
This fixes it by looking for the /chosen/ibm,architecture-vec-5
property, and if it exists, clears the radix MMU feature bit, before we
decide whether to initialize for radix or HPT. This property is created
by the hypervisor as a result of the guest calling the
ibm,client-architecture-support method to indicate its capabilities, so
it will indicate whether the hypervisor agreed to us using radix.
Systems without a hypervisor may have this property also (for example,
skiboot creates it), so we check the HV bit in the MSR to see whether we
are running as a guest or not. If we are in hypervisor mode, then we can
do whatever we like including using the radix MMU.
The reason for using this property is that in future, when we have
support for using radix under a hypervisor, we will need to check this
property to see whether the hypervisor agreed to us using radix.
Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Cc: [email protected] # v4.7+ Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
Nicholas Piggin [Fri, 27 Jan 2017 04:00:34 +0000 (14:00 +1000)]
KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts
64-bit Book3S exception handlers must find the dynamic kernel base
to add to the target address when branching beyond __end_interrupts,
in order to support kernel running at non-0 physical address.
Support this in KVM by branching with CTR, similarly to regular
interrupt handlers. The guest CTR saved in HSTATE_SCRATCH1 and
restored after the branch.
Without this, the host kernel hangs and crashes randomly when it is
running at a non-0 address and a KVM guest is started.
Marc Zyngier [Wed, 25 Jan 2017 13:33:11 +0000 (13:33 +0000)]
arm/arm64: KVM: Stop propagating cacheability status of a faulted page
Now that we unconditionally flush newly mapped pages to the PoC,
there is no need to care about the "uncached" status of individual
pages - they must all be visible all the way down.
Marc Zyngier [Wed, 25 Jan 2017 12:29:59 +0000 (12:29 +0000)]
arm/arm64: KVM: Enforce unconditional flush to PoC when mapping to stage-2
When we fault in a page, we flush it to the PoC (Point of Coherency)
if the faulting vcpu has its own caches off, so that it can observe
the page we just brought it.
But if the vcpu has its caches on, we skip that step. Bad things
happen when *another* vcpu tries to access that page with its own
caches disabled. At that point, there is no garantee that the
data has made it to the PoC, and we access stale data.
The obvious fix is to always flush to PoC when a page is faulted
in, no matter what the state of the vcpu is.
Cc: [email protected] Fixes: 2d58b733c876 ("arm64: KVM: force cache clean on page fault when caches are off") Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
Vijaya Kumar K [Thu, 26 Jan 2017 14:20:51 +0000 (19:50 +0530)]
KVM: arm/arm64: vgic: Implement VGICv3 CPU interface access
VGICv3 CPU interface registers are accessed using
KVM_DEV_ARM_VGIC_CPU_SYSREGS ioctl. These registers are accessed
as 64-bit. The cpu MPIDR value is passed along with register id.
It is used to identify the cpu for registers access.
The VM that supports SEIs expect it on destination machine to handle
guest aborts and hence checked for ICC_CTLR_EL1.SEIS compatibility.
Similarly, VM that supports Affinity Level 3 that is required for AArch64
mode, is required to be supported on destination machine. Hence checked
for ICC_CTLR_EL1.A3V compatibility.
The arch/arm64/kvm/vgic-sys-reg-v3.c handles read and write of VGIC
CPU registers for AArch64.
For AArch32 mode, arch/arm/kvm/vgic-v3-coproc.c file is created but
APIs are not implemented.
Updated arch/arm/include/uapi/asm/kvm.h with new definitions
required to compile for AArch32.
The version of VGIC v3 specification is defined here
Documentation/virtual/kvm/devices/arm-vgic-v3.txt
Vijaya Kumar K [Thu, 26 Jan 2017 14:20:50 +0000 (19:50 +0530)]
KVM: arm/arm64: vgic: Introduce VENG0 and VENG1 fields to vmcr struct
ICC_VMCR_EL2 supports virtual access to ICC_IGRPEN1_EL1.Enable
and ICC_IGRPEN0_EL1.Enable fields. Add grpen0 and grpen1 member
variables to struct vmcr to support read and write of these fields.
Also refactor vgic_set_vmcr and vgic_get_vmcr() code.
Drop ICH_VMCR_CTLR_SHIFT and ICH_VMCR_CTLR_MASK macros and instead
use ICH_VMCR_EOI* and ICH_VMCR_CBPR* macros.
Vijaya Kumar K [Thu, 26 Jan 2017 14:20:48 +0000 (19:50 +0530)]
KVM: arm/arm64: vgic: Introduce find_reg_by_id()
In order to implement vGICv3 CPU interface access, we will need to perform
table lookup of system registers. We would need both index_to_params() and
find_reg() exported for that purpose, but instead we export a single
function which combines them both.
Vijaya Kumar K [Thu, 26 Jan 2017 14:20:47 +0000 (19:50 +0530)]
KVM: arm/arm64: vgic: Add distributor and redistributor access
VGICv3 Distributor and Redistributor registers are accessed using
KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_REDIST_REGS
with KVM_SET_DEVICE_ATTR and KVM_GET_DEVICE_ATTR ioctls.
These registers are accessed as 32-bit and cpu mpidr
value passed along with register offset is used to identify the
cpu for redistributor registers access.
The version of VGIC v3 specification is defined here
Documentation/virtual/kvm/devices/arm-vgic-v3.txt
Also update arch/arm/include/uapi/asm/kvm.h to compile for
AArch32 mode.
KVM: s390: Add debug logging to basic cpu model interface
Let's log something for changes in facilities, cpuid and ibc now that we
have a cpu model in QEMU. All of these calls are pretty seldom, so we
will not spill the log, the they will help to understand pontential
guest issues, for example if some instructions are fenced off.
As the s390 debug feature has a limited amount of parameters and
strings must not go away we limit the facility printing to 3 double
words, instead of building that list dynamically. This should be enough
for several years. If we ever exceed 3 double words then the logging
will be incomplete but no functional impact will happen.
KVM: s390: guestdbg: filter PER i-fetch on EXECUTE properly
When we get a PER i-fetch event on an EXECUTE or EXECUTE RELATIVE LONG
instruction, because the executed instruction generated a PER i-fetch
event, then the PER address points at the EXECUTE function, not the
fetched one.
Therefore, when filtering PER events, we have to take care of the
really fetched instruction, which we can only get by reading in guest
virtual memory.
For icpt code 4 and 56, we directly have additional information about an
EXECUTE instruction at hand. For icpt code 8, we always have to read
in guest virtual memory.
sparse with __CHECK_ENDIAN__ shows that ar_t was never properly
used across KVM on s390. We can now:
- fix all places
- do not make ar_t special
Since ar_t is just used as a register number (no endianness issues
for u8), and all other register numbers are also just plain int
variables, let's just use u8, which matches the __u8 in the userspace
ABI for the memop ioctl.
Heiko Carstens [Tue, 13 Dec 2016 13:25:32 +0000 (14:25 +0100)]
KVM: s390: get rid of bogus cc initialization
The plo inline assembly has a cc output operand that is always written
to and is also as such an operand declared. Therefore the compiler is
free to omit the rather pointless and misleading initialization.
Janosch Frank [Thu, 4 Aug 2016 07:57:36 +0000 (09:57 +0200)]
KVM: s390: instruction-execution-protection support
The new Instruction Execution Protection needs to be enabled before
the guest can use it. Therefore we pass the IEP facility bit to the
guest and enable IEP interpretation.
When we access guest memory and run into a protection exception, we
need to pass the exception data to the guest. ESOP2 provides detailed
information about all protection exceptions which ESOP1 only partially
provided.
The gaccess changes make sure, that the guest always gets all
available information.
Junaid Shahid [Thu, 22 Dec 2016 04:29:32 +0000 (20:29 -0800)]
kvm: x86: mmu: Verify that restored PTE has needed perms in fast page fault
Before fast page fault restores an access track PTE back to a regular PTE,
it now also verifies that the restored PTE would grant the necessary
permissions for the faulting access to succeed. If not, it falls back
to the slow page fault path.
Junaid Shahid [Thu, 22 Dec 2016 04:29:29 +0000 (20:29 -0800)]
kvm: x86: mmu: Set SPTE_SPECIAL_MASK within mmu.c
Instead of the caller including the SPTE_SPECIAL_MASK in the masks being
supplied to kvm_mmu_set_mmio_spte_mask() and kvm_mmu_set_mask_ptes(),
those functions now themselves include the SPTE_SPECIAL_MASK.
Note that bit 63 is now reset in the default MMIO mask.
Rename the EPT_VIOLATION_READ/WRITE/INSTR constants to
EPT_VIOLATION_ACC_READ/WRITE/INSTR to more clearly indicate that these
signify the type of the memory access as opposed to the permissions
granted by the PTE.
Thomas Huth [Wed, 25 Jan 2017 12:27:22 +0000 (13:27 +0100)]
KVM: PPC: Book3S PR: Refactor program interrupt related code into separate function
The function kvmppc_handle_exit_pr() is quite huge and thus hard to read,
and even contains a "spaghetti-code"-like goto between the different case
labels of the big switch statement. This can be made much more readable
by moving the code related to injecting program interrupts / instruction
emulation into a separate function instead.
Paul Mackerras [Tue, 6 Dec 2016 09:42:05 +0000 (20:42 +1100)]
KVM: PPC: Book3S HV: Fix H_PROD to actually wake the target vcpu
The H_PROD hypercall is supposed to wake up an idle vcpu. We have
an implementation, but because Linux doesn't use it except when
doing cpu hotplug, it was never tested properly. AIX does use it,
and reported it broken. It turns out we were waking the wrong
vcpu (the one doing H_PROD, not the target of the prod) and we
weren't handling the case where the target needs an IPI to wake
it. Fix it by using the existing kvmppc_fast_vcpu_kick_hv()
function, which is intended for this kind of thing, and by using
the target vcpu not the current vcpu.
We were also not looking at the prodded flag when checking whether a
ceded vcpu should wake up, so this adds checks for the prodded flag
alongside the checks for pending exceptions.
Nicholas Piggin [Wed, 21 Dec 2016 18:29:26 +0000 (04:29 +1000)]
KVM: PPC: Book3S: Move 64-bit KVM interrupt handler out from alt section
A subsequent patch to make KVM handlers relocation-safe makes them
unusable from within alt section "else" cases (due to the way fixed
addresses are taken from within fixed section head code).
Stop open-coding the KVM handlers, and add them both as normal. A more
optimal fix may be to allow some level of alternate feature patching in
the exception macros themselves, but for now this will do.
The TRAMP_KVM handlers must be moved to the "virt" fixed section area
(name is arbitrary) in order to be closer to .text and avoid the dreaded
"relocation truncated to fit" error.
Li Zhong [Fri, 11 Nov 2016 04:57:36 +0000 (12:57 +0800)]
KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend
This patch improves the code that takes lock twice to check the resend flag
and do the actual resending, by checking the resend flag locklessly, and
add a boolean parameter check_resend to icp_[rm_]deliver_irq(), so the
resend flag can be checked in the lock when doing the delivery.
We need make sure when we clear the ics's bit in the icp's resend_map, we
don't miss the resend flag of the irqs that set the bit. It could be
ordered through the barrier in test_and_clear_bit(), and a newly added
wmb between setting irq's resend flag, and icp's resend_map.
Li Zhong [Fri, 11 Nov 2016 04:57:35 +0000 (12:57 +0800)]
KVM: PPC: Book 3S: XICS: Implement ICS P/Q states
This patch implements P(Presented)/Q(Queued) states for ICS irqs.
When the interrupt is presented, set P. Present if P was not set.
If P is already set, don't present again, set Q.
When the interrupt is EOI'ed, move Q into P (and clear Q). If it is
set, re-present.
The asserted flag used by LSI is also incorporated into the P bit.
When the irq state is saved, P/Q bits are also saved, they need some
qemu modifications to be recognized and passed around to be restored.
KVM_XICS_PENDING bit set and saved should also indicate
KVM_XICS_PRESENTED bit set and saved. But it is possible some old
code doesn't have/recognize the P bit, so when we restore, we set P
for PENDING bit, too.
Li Zhong [Fri, 11 Nov 2016 04:57:34 +0000 (12:57 +0800)]
KVM: PPC: Book 3S: XICS: Fix potential issue with duplicate IRQ resends
It is possible that in the following order, one irq is resent twice:
CPU 1 CPU 2
ics_check_resend()
lock ics_lock
see resend set
unlock ics_lock
/* change affinity of the irq */
kvmppc_xics_set_xive()
write_xive()
lock ics_lock
see resend set
unlock ics_lock
icp_deliver_irq() /* resend */
icp_deliver_irq() /* resend again */
It doesn't have any user-visible effect at present, but needs to be avoided
when the following patch implementing the P/Q stuff is applied.
This patch clears the resend flag before releasing the ics lock, when we
know we will do a re-delivery after checking the flag, or setting the flag.
Li Zhong [Fri, 11 Nov 2016 04:57:33 +0000 (12:57 +0800)]
KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter
Some counters are added in Commit 6e0365b78273 ("KVM: PPC: Book3S HV:
Add ICP real mode counters"), to provide some performance statistics to
determine whether further optimizing is needed for real mode functions.
The n_reject counter counts how many times ICP rejects an irq because of
priority in real mode. The redelivery of an lsi that is still asserted
after eoi doesn't fall into this category, so the increasement there is
removed.
Also, it needs to be increased in icp_rm_deliver_irq() if it rejects
another one.
Li Zhong [Fri, 11 Nov 2016 04:57:32 +0000 (12:57 +0800)]
KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECT
Commit b0221556dbd3 ("KVM: PPC: Book3S HV: Move virtual mode ICP functions
to real-mode") removed the setting of the XICS_RM_REJECT flag. And
since that commit, nothing else sets the flag any more, so we can remove
the flag and the remaining code that handles it, including the counter
that counts how many times it get set.
Paul Mackerras [Tue, 20 Dec 2016 03:02:29 +0000 (14:02 +1100)]
KVM: PPC: Book3S HV: Don't try to signal cpu -1
If the target vcpu for kvmppc_fast_vcpu_kick_hv() is not running on
any CPU, then we will have vcpu->arch.thread_cpu == -1, and as it
happens, kvmppc_fast_vcpu_kick_hv will call kvmppc_ipi_thread with
-1 as the cpu argument. Although this is not meaningful, in the past,
before commit 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs
to other cores on POWER9", 2016-11-18), it was harmless because CPU
-1 is not in the same core as any real CPU thread. On a POWER9,
however, we don't do the "same core" check, so we were trying to
do a msgsnd to thread -1, which is invalid. To avoid this, we add
a check to see that vcpu->arch.thread_cpu is >= 0 before calling
kvmppc_ipi_thread() with it. Since vcpu->arch.thread_vcpu can change
asynchronously, we use READ_ONCE to ensure that the value we check is
the same value that we use as the argument to kvmppc_ipi_thread().
Fixes: 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9") Signed-off-by: Paul Mackerras <[email protected]>
Christoffer Dall [Tue, 17 Jan 2017 22:09:13 +0000 (23:09 +0100)]
KVM: arm/arm64: vgic: Add debugfs vgic-state file
Add a file to debugfs to read the in-kernel state of the vgic. We don't
do any locking of the entire VGIC state while traversing all the IRQs,
so if the VM is running the user/developer may not see a quiesced state,
but should take care to pause the VM using facilities in user space for
that purpose.
We also don't support LPIs yet, but they can be added easily if needed.
Christoffer Dall [Mon, 23 Jan 2017 13:07:18 +0000 (14:07 +0100)]
KVM: arm/arm64: Remove struct vgic_irq pending field
One of the goals behind the VGIC redesign was to get rid of cached or
intermediate state in the data structures, but we decided to allow
ourselves to precompute the pending value of an IRQ based on the line
level and pending latch state. However, this has now become difficult
to base proper GICv3 save/restore on, because there is a potential to
modify the pending state without knowing if an interrupt is edge or
level configured.
See the following post and related message for more background:
https://lists.cs.columbia.edu/pipermail/kvmarm/2017-January/023195.html
This commit gets rid of the precomputed pending field in favor of a
function that calculates the value when needed, irq_is_pending().
The soft_pending field is renamed to pending_latch to represent that
this latch is the equivalent hardware latch which gets manipulated by
the input signal for edge-triggered interrupts and when writing to the
SPENDR/CPENDR registers.
After this commit save/restore code should be able to simply restore the
pending_latch state, line_level state, and config state in any order and
get the desired result.
Linus Torvalds [Sun, 22 Jan 2017 20:47:48 +0000 (12:47 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"Restore the retrigger callbacks in the IO APIC irq chips. That
addresses a long standing regression which got introduced with the
rewrite of the x86 irq subsystem two years ago and went unnoticed so
far"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Restore IO-APIC irq_chip retrigger callback
Linus Torvalds [Sun, 22 Jan 2017 20:40:09 +0000 (12:40 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost fixes from Michael Tsirkin:
"Random fixes and cleanups that accumulated over the time"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio/s390: virtio: constify virtio_config_ops structures
virtio/s390: add missing \n to end of dev_err message
virtio/s390: support READ_STATUS command for virtio-ccw
tools/virtio/ringtest: tweaks for s390
tools/virtio/ringtest: fix run-on-all.sh for offline cpus
virtio_console: fix a crash in config_work_handler
vhost/scsi: silence uninitialized variable warning
vhost: scsi: constify target_core_fabric_ops structures
Linus Torvalds [Sun, 22 Jan 2017 20:36:47 +0000 (12:36 -0800)]
Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:
- fix a regression that thermal zone dynamically allocated sysfs
attributes are freed before they're removed, which is introduced in
4.10-rc1 (Jacob von Chorus)
- fix a boot warning because deprecated hwmon API is used (Fabio
Estevam)
- a couple of fixes for rockchip thermal driver (Brian Norris, Caesar
Wang)
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: rockchip: fixes the conversion table
thermal: core: move tz->device.groups cleanup to thermal_release
thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()
thermal: rockchip: handle set_trips without the trip points
thermal: rockchip: optimize the conversion table
thermal: rockchip: fixes invalid temperature case
thermal: rockchip: don't pass table structs by value
thermal: rockchip: improve conversion error messages
Linus Torvalds [Sun, 22 Jan 2017 03:01:06 +0000 (19:01 -0800)]
Merge tag 'usb-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few small USB fixes for 4.10-rc5.
Most of these are gadget/dwc2 fixes for reported issues, all of these
have been in linux-next for a while. The last one is a single xhci
WARN_ON removal to handle an issue that the dwc3 driver is hitting in
the 4.10-rc tree. The warning is harmless and needs to be removed, and
a "real" fix that is more complex will show up in 4.11-rc1 for this
device.
That last patch hasn't been in linux-next yet due to the weekend
timing, but it's a "simple" WARN_ON() removal so what could go wrong?
:)"
Famous last words.
* tag 'usb-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: remove WARN_ON if dma mask is not set for platform devices
usb: dwc2: host: fix Wmaybe-uninitialized warning
usb: dwc2: gadget: Fix GUSBCFG.USBTRDTIM value
usb: gadget: udc: atmel: remove memory leak
usb: dwc3: exynos fix axius clock error path to do cleanup
usb: dwc2: Avoid suspending if we're in gadget mode
usb: dwc2: use u32 for DT binding parameters
usb: gadget: f_fs: Fix iterations on endpoints.
usb: dwc2: gadget: Fix DMA memory freeing
usb: gadget: composite: Fix function used to free memory
Linus Torvalds [Sun, 22 Jan 2017 02:53:06 +0000 (18:53 -0800)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"Two fixes:
- a regression fix for the multiple-pmem-namespace-per-region support
added in 4.9. Even if an existing environment is not using that
feature the act of creating and a destroying a single namespace
with the ndctl utility will lead to the proliferation of extra
unwanted namespace devices.
- a fix for the error code returned from the pmem driver when the
memcpy_mcsafe() routine returns -EFAULT. Btrfs seems to be the only
block I/O consumer that tries to parse the meaning of the error
code when it is non-zero.
Neither of these fixes are critical, the namespace leak is awkward in
that it can cause device naming to change and complicates debugging
namespace initialization issues. The error code fix is included out of
caution for what other consumers might be expecting -EIO for block I/O
errors"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, namespace: fix pmem namespace leak, delete when size set to zero
pmem: return EIO on read_pmem() failure
Linus Torvalds [Sun, 22 Jan 2017 02:46:45 +0000 (18:46 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"One fix for Samsung Exynos524x SoCs where recent IOMMU patches have
caused some of these clocks to turn off when they were always left on
before"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk/samsung: exynos542x: mark some clocks as critical
Linus Torvalds [Sun, 22 Jan 2017 02:07:40 +0000 (18:07 -0800)]
Merge tag 'arc-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- more intc updates [Yuriv]
- fix module build when unwinder is turned off
- IO Coherency Programming model updates
- other miscellaneous
* tag 'arc-4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Revert "ARC: mm: IOC: Don't enable IOC by default"
ARC: mm: split arc_cache_init to allow __init reaping of bulk
ARCv2: IOC: Use actual memory size to setup aperture size
ARCv2: IOC: Adhere to progamming model guidelines to avoid DMA corruption
ARCv2: IOC: refactor the IOC and SLC operations into own functions
ARC: module: Fix !CONFIG_ARC_DW2_UNWIND builds
ARCv2: save r30 on kernel entry as gcc uses it for code-gen
ARCv2: IRQ: Call entry/exit functions for chained handlers in MCIP
ARC: IRQ: Use hwirq instead of virq in mask/unmask
ARC: mmu: clarify the MMUv3 programming model
Linus Torvalds [Sun, 22 Jan 2017 01:58:45 +0000 (17:58 -0800)]
Merge tag 'powerpc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Two fixes for fallout from the hugetlb changes we merged this cycle.
Ten other fixes, four only affect Power9, and the rest are a bit of a
mixture though nothing terrible.
Thanks to: Aneesh Kumar K.V, Anton Blanchard, Benjamin Herrenschmidt,
Dave Martin, Gavin Shan, Madhavan Srinivasan, Nicholas Piggin, Reza
Arbab"
* tag 'powerpc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Ignore reserved field in DCSR and PVR reads and writes
powerpc/ptrace: Preserve previous TM fprs/vsrs on short regset write
powerpc/ptrace: Preserve previous fprs/vsrs on short regset write
powerpc/perf: Use MSR to report privilege level on P9 DD1
selftest/powerpc: Wrong PMC initialized in pmc56_overflow test
powerpc/eeh: Enable IO path on permanent error
powerpc/perf: Fix PM_BRU_CMPL event code for power9
powerpc/mm: Fix little-endian 4K hugetlb
powerpc/mm/hugetlb: Don't panic when we don't find the default huge page size
powerpc: Fix pgtable pmd cache init
powerpc/icp-opal: Fix missing KVM case and harden replay
powerpc/mm: Fix memory hotplug BUG() on radix
Linus Torvalds [Fri, 20 Jan 2017 22:19:34 +0000 (14:19 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix for timer setup on VHE machines
- Drop spurious warning when the timer races against the vcpu running
again
- Prevent a vgic deadlock when the initialization fails (for stable)
s390:
- Fix a kernel memory exposure (for stable)
x86:
- Fix exception injection when hypercall instruction cannot be
patched"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: do not expose random data via facility bitmap
KVM: x86: fix fixing of hypercalls
KVM: arm/arm64: vgic: Fix deadlock on error handling
KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems
KVM: arm/arm64: Fix occasional warning from the timer work function
Linus Torvalds [Fri, 20 Jan 2017 22:17:04 +0000 (14:17 -0800)]
Merge branch 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
Pull SCSI target fixes from Bart Van Assche:
- two small fixes for the ibmvscsis driver
- ten patches with bug fixes for the target mode of the qla2xxx driver
- four patches that avoid that the "sparse" and "smatch" static
analyzer tools report false positives for the qla2xxx code base
* 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux:
qla2xxx: Disable out-of-order processing by default in firmware
qla2xxx: Fix erroneous invalid handle message
qla2xxx: Reduce exess wait during chip reset
qla2xxx: Terminate exchange if corrupted
qla2xxx: Fix crash due to null pointer access
qla2xxx: Collect additional information to debug fw dump
qla2xxx: Reset reserved field in firmware options to 0
qla2xxx: Set tcm_qla2xxx version to automatically track qla2xxx version
qla2xxx: Include ATIO queue in firmware dump when in target mode
qla2xxx: Fix wrong IOCB type assumption
qla2xxx: Avoid that building with W=1 triggers complaints about set-but-not-used variables
qla2xxx: Move two arrays from header files to .c files
qla2xxx: Declare an array with file scope static
qla2xxx: Fix indentation
ibmvscsis: Fix sleeping in interrupt context
ibmvscsis: Fix max transfer length
Linus Torvalds [Fri, 20 Jan 2017 20:28:02 +0000 (12:28 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Just two small fixes for this -rc.
One is just killing an unused variable from Keith, but the other
fixes a performance regression for nbd in this series, where we
inadvertently flipped when we set MSG_MORE when outputting data"
* 'for-linus' of git://git.kernel.dk/linux-block:
nbd: only set MSG_MORE when we have more to send
blk-mq: Remove unused variable
Linus Torvalds [Fri, 20 Jan 2017 20:25:11 +0000 (12:25 -0800)]
Merge tag 'spi-fix-v4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"The usual small smattering of driver specific fixes. A few bits that
stand out here:
- the R-Car patches adding fallbacks are just adding new compatible
strings to the driver so that device trees are written in a more
robustly future proof fashion, this isn't strictly a fix but it's
just new IDs and it's better to get it into mainline sooner to
improve the ABI
- the DesignWare "switch to new API part 2" patch is actually a
misleadingly titled fix for a bit that got missed in the original
conversion"
* tag 'spi-fix-v4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: davinci: use dma_mapping_error()
spi: spi-axi: Free resources on error path
spi: pxa2xx: add missed break
spi: dw-mid: switch to new dmaengine_terminate_* API (part 2)
spi: dw: Make debugfs name unique between instances
spi: sh-msiof: Do not use C++ style comment
spi: armada-3700: Set mode bits correctly
spi: armada-3700: fix unsigned compare than zero on irq
spi: sh-msiof: Add R-Car Gen 2 and 3 fallback bindings
spi: SPI_FSL_DSPI should depend on HAS_DMA
Linus Torvalds [Fri, 20 Jan 2017 20:15:48 +0000 (12:15 -0800)]
Merge tag 'ceph-for-4.10-rc5' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Three filesystem endianness fixes (one goes back to the 2.6 era, all
marked for stable) and two fixups for this merge window's patches"
* tag 'ceph-for-4.10-rc5' of git://github.com/ceph/ceph-client:
ceph: fix bad endianness handling in parse_reply_info_extra
ceph: fix endianness bug in frag_tree_split_cmp
ceph: fix endianness of getattr mask in ceph_d_revalidate
libceph: make sure ceph_aes_crypt() IV is aligned
ceph: fix ceph_get_caps() interruption
Linus Torvalds [Fri, 20 Jan 2017 19:56:29 +0000 (11:56 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Fix two regressions, one introduced in 4.9 and a less recent one in
4.2"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix time_to_jiffies nsec sanity check
fuse: clear FR_PENDING flag when moving requests out of pending queue
Linus Torvalds [Fri, 20 Jan 2017 19:47:18 +0000 (11:47 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of 12 fixes including the mpt3sas one that was causing
hangs on ATA passthrough.
The others are a couple of zoned block device fixes, a SAS device
detection bug which lead to SATA drives not being matched to bays, two
qla2xxx MSI fixes, a qla2xxx req for rsp confusion caused by cut and
paste, and a few other minor fixes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpt3sas: fix hang on ata passthrough commands
scsi: lpfc: Set elsiocb contexts to NULL after freeing it
scsi: sd: Ignore zoned field for host-managed devices
scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
scsi: bfa: fix wrongly initialized variable in bfad_im_bsg_els_ct_request()
scsi: ses: Fix SAS device detection in enclosure
scsi: libfc: Fix variable name in fc_set_wwpn
scsi: lpfc: avoid double free of resource identifiers
scsi: qla2xxx: remove irq_affinity_notifier
scsi: qla2xxx: fix MSI-X vector affinity
scsi: qla2xxx: Fix apparent cut-n-paste error.
scsi: qla2xxx: Get mutex lock before checking optrom_state
Linus Torvalds [Fri, 20 Jan 2017 19:44:47 +0000 (11:44 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- avoid potential stack information leak via the ptrace ABI caused by
uninitialised variables
- SWIOTLB DMA API fall-back allocation fix when the SWIOTLB buffer is
not initialised (all RAM is suitable for 32-bit DMA masks)
- fix the bad_mode function returning for unhandled exceptions coming
from user space
- fix name clash in __page_to_voff()
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: avoid returning from bad_mode
arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields
arm64/ptrace: Avoid uninitialised struct padding in fpr_set()
arm64/ptrace: Preserve previous registers for short regset write
arm64/ptrace: Preserve previous registers for short regset write
arm64/ptrace: Preserve previous registers for short regset write
arm64: mm: avoid name clash in __page_to_voff()
arm64: Fix swiotlb fallback allocation
KVM: s390: do not expose random data via facility bitmap
kvm_s390_get_machine() populates the facility bitmap by copying bytes
from the host results that are stored in a 256 byte array in the prefix
page. The KVM code does use the size of the target buffer (2k), thus
copying and exposing unrelated kernel memory (mostly machine check
related logout data).
Let's use the size of the source buffer instead. This is ok, as the
target buffer will always be greater or equal than the source buffer as
the KVM internal buffers (and thus S390_ARCH_FAC_LIST_SIZE_BYTE) cover
the maximum possible size that is allowed by STFLE, which is 256
doublewords. All structures are zero allocated so we can leave bytes
256-2047 unchanged.
Mathias Nyman [Fri, 20 Jan 2017 13:38:24 +0000 (15:38 +0200)]
xhci: remove WARN_ON if dma mask is not set for platform devices
The warn on is a bit too much, we will anyway set the dma mask if not set
previously.
The main reason for this fix is that 4.10-rc1 has a dwc3 change that
pass a parent sysdev dev pointer instead of setting the dma mask of
its xhci platform device. xhci platform driver can then get more
attributes from the sysdev than just the dma mask.
The usb core and xhci changes are not yet in 4.10, and a fix like
this was preferred instead of taking those big changes this late in
the rc-cycle.
Anton Blanchard [Thu, 19 Jan 2017 03:19:10 +0000 (14:19 +1100)]
powerpc: Ignore reserved field in DCSR and PVR reads and writes
IBM bit 31 (for the rest of us - bit 0) is a reserved field in the
instruction definition of mtspr and mfspr. Hardware is encouraged to
(and does) ignore it.
As a result, if userspace executes an mtspr DSCR with the reserved bit
set, we get a DSCR facility unavailable exception. The kernel fails to
match against the expected value/mask, and we silently return to
userspace to try and re-execute the same mtspr DSCR instruction. We
loop forever until the process is killed.
We should do something here, and it seems mirroring what hardware does
is the better option vs killing the process. While here, relax the
matching of mfspr PVR too.
Dave Martin [Thu, 5 Jan 2017 16:50:57 +0000 (16:50 +0000)]
powerpc/ptrace: Preserve previous TM fprs/vsrs on short regset write
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the check pointed registers, the thread's old check pointed
registers are preserved.
Fixes: 9d3918f7c0e5 ("powerpc/ptrace: Enable support for NT_PPC_CVSX") Fixes: 19cbcbf75a0c ("powerpc/ptrace: Enable support for NT_PPC_CFPR") Cc: [email protected] # v4.8+ Signed-off-by: Dave Martin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
Linus Torvalds [Fri, 20 Jan 2017 00:40:03 +0000 (16:40 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"We've been sitting on fixes for a while, and they keep trickling in at
a low rate. Nothing in here comes across as particularly scary or
noteworthy, for the most part it's a large collection of small DT
tweaks"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (24 commits)
ARM: dts: da850-evm: fix read access to SPI flash
ARM: dts: omap3: Fix Card Detect and Write Protect on Logic PD SOM-LV
ARM64: dts: meson-gxbb-odroidc2: Disable SCPI DVFS
ARM: dts: OMAP5 / DRA7: indicate that SATA port 0 is available.
ARM: dts: NSP: Fix DT ranges error
ARM: multi_v7_defconfig: set bcm47xx watchdog
ARM: multi_v7_defconfig: fix config typo
ARM: dts: dra72-evm-revc: fix typo in ethernet-phy node
soc: ti: wkup_m3_ipc: Fix error return code in wkup_m3_ipc_probe()
ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation
ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc
ARM: dts: sun8i: Support DTB build for NanoPi M1
ARM: dts: sun6i: hummingbird: Enable display engine again
ARM: dts: sun6i: Disable display pipeline by default
ARM, ARM64: dts: drop "arm,amba-bus" in favor of "simple-bus" part 3
ARM: dts: imx6qdl-nitrogen6_som2: fix sgtl5000 pinctrl init
ARM: dts: imx6qdl-nitrogen6_max: fix sgtl5000 pinctrl init
ARM: OMAP1: DMA: Correct the number of logical channels
ARM: dts: am335x-icev2: Remove the duplicated pinmux setting
ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
...
Linus Torvalds [Fri, 20 Jan 2017 00:33:00 +0000 (16:33 -0800)]
Merge tag 'xfs-for-linux-4.10-rc5-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"I have a few more patches this week -- one to make the behavior of a
quota id ioctl consistent with the other filesystems, and the rest
improve validation of i_mode & i_size values coming into xfs so that
we don't read off the ends of arrays or crash when handed garbage disk
data.
Summary:
- inode i_mode sanitization
- prevent overflows in getnextquota
- minor build fixes"
* tag 'xfs-for-linux-4.10-rc5-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix xfs_mode_to_ftype() prototype
xfs: don't wrap ID in xfs_dq_get_next_id
xfs: sanity check inode di_mode
xfs: sanity check inode mode when creating new dentry
xfs: replace xfs_mode_to_ftype table with switch statement
xfs: add missing include dependencies to xfs_dir2.h
xfs: sanity check directory inode di_size
xfs: make the ASSERT() condition likely