]> Git Repo - linux.git/log
linux.git
3 years agovduse: Implement an MMU-based software IOTLB
Xie Yongji [Tue, 31 Aug 2021 10:36:32 +0000 (18:36 +0800)]
vduse: Implement an MMU-based software IOTLB

This implements an MMU-based software IOTLB to support mapping
kernel dma buffer into userspace dynamically. The basic idea
behind it is treating MMU (VA->PA) as IOMMU (IOVA->PA). The
software IOTLB will set up MMU mapping instead of IOMMU mapping
for the DMA transfer so that the userspace process is able to
use its virtual address to access the dma buffer in kernel.

To avoid security issue, a bounce-buffering mechanism is
introduced to prevent userspace accessing the original buffer
directly which may contain other kernel data. During the mapping,
unmapping, the software IOTLB will copy the data from the original
buffer to the bounce buffer and back, depending on the direction
of the transfer. And the bounce-buffer addresses will be mapped
into the user address space instead of the original one.

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa: Support transferring virtual addressing during DMA mapping
Xie Yongji [Tue, 31 Aug 2021 10:36:31 +0000 (18:36 +0800)]
vdpa: Support transferring virtual addressing during DMA mapping

This patch introduces an attribute for vDPA device to indicate
whether virtual address can be used. If vDPA device driver set
it, vhost-vdpa bus driver will not pin user page and transfer
userspace virtual address instead of physical address during
DMA mapping. And corresponding vma->vm_file and offset will be
also passed as an opaque pointer.

Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
Xie Yongji [Tue, 31 Aug 2021 10:36:30 +0000 (18:36 +0800)]
vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()

The upcoming patch is going to support VA mapping/unmapping.
So let's factor out the logic of PA mapping/unmapping firstly
to make the code more readable.

Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
Xie Yongji [Tue, 31 Aug 2021 10:36:29 +0000 (18:36 +0800)]
vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()

Add an opaque pointer for DMA mapping.

Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovhost-iotlb: Add an opaque pointer for vhost IOTLB
Xie Yongji [Tue, 31 Aug 2021 10:36:28 +0000 (18:36 +0800)]
vhost-iotlb: Add an opaque pointer for vhost IOTLB

Add an opaque pointer for vhost IOTLB. And introduce
vhost_iotlb_add_range_ctx() to accept it.

Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovhost-vdpa: Handle the failure of vdpa_reset()
Xie Yongji [Tue, 31 Aug 2021 10:36:27 +0000 (18:36 +0800)]
vhost-vdpa: Handle the failure of vdpa_reset()

The vdpa_reset() may fail now. This adds check to its return
value and fail the vhost_vdpa_open().

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa: Add reset callback in vdpa_config_ops
Xie Yongji [Tue, 31 Aug 2021 10:36:26 +0000 (18:36 +0800)]
vdpa: Add reset callback in vdpa_config_ops

This adds a new callback to support device specific reset
behavior. The vdpa bus driver will call the reset function
instead of setting status to zero during resetting.

Signed-off-by: Xie Yongji <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa: Fix some coding style issues
Xie Yongji [Tue, 31 Aug 2021 10:36:25 +0000 (18:36 +0800)]
vdpa: Fix some coding style issues

Fix some code indent issues and following checkpatch warning:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
371: FILE: include/linux/vdpa.h:371:
+static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agofile: Export receive_fd() to modules
Xie Yongji [Tue, 31 Aug 2021 10:36:24 +0000 (18:36 +0800)]
file: Export receive_fd() to modules

Export receive_fd() so that some modules can use
it to pass file descriptor between processes without
missing any security stuffs.

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agoeventfd: Export eventfd_wake_count to modules
Xie Yongji [Tue, 31 Aug 2021 10:36:23 +0000 (18:36 +0800)]
eventfd: Export eventfd_wake_count to modules

Export eventfd_wake_count so that some modules can use
the eventfd_signal_count() to check whether the
eventfd_signal() call should be deferred to a safe context.

NB(mst): this patch is not needed in Linus tree since there
eventfd_signal_count() has been superseded by an already exported
eventfd_signal_allowed().

Signed-off-by: Xie Yongji <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Jason Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agoiova: Export alloc_iova_fast() and free_iova_fast()
Xie Yongji [Tue, 31 Aug 2021 10:36:22 +0000 (18:36 +0800)]
iova: Export alloc_iova_fast() and free_iova_fast()

Export alloc_iova_fast() and free_iova_fast() so that
some modules can make use of the per-CPU cache to get
rid of rbtree spinlock in alloc_iova() and free_iova()
during IOVA allocation.

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovirtio-blk: remove unneeded "likely" statements
Max Gurtovoy [Sun, 5 Sep 2021 08:57:17 +0000 (11:57 +0300)]
virtio-blk: remove unneeded "likely" statements

Usually we use "likely/unlikely" to optimize the fast path. Remove
redundant "likely/unlikely" statements in the control path to simplify
the code and make it easier to read.

Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Max Gurtovoy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
3 years agovirtio-balloon: Use virtio_find_vqs() helper
Xianting Tian [Fri, 23 Jul 2021 05:42:59 +0000 (13:42 +0800)]
virtio-balloon: Use virtio_find_vqs() helper

Use the helper virtio_find_vqs().

Signed-off-by: Xianting Tian <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agoKVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
Zelin Deng [Wed, 28 Apr 2021 02:22:01 +0000 (10:22 +0800)]
KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted

When MSR_IA32_TSC_ADJUST is written by guest due to TSC ADJUST feature
especially there's a big tsc warp (like a new vCPU is hot-added into VM
which has been up for a long time), tsc_offset is added by a large value
then go back to guest. This causes system time jump as tsc_timestamp is
not adjusted in the meantime and pvclock monotonic character.
To fix this, just notify kvm to update vCPU's guest time before back to
guest.

Cc: [email protected]
Signed-off-by: Zelin Deng <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <1619576521[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoKVM: MMU: mark role_regs and role accessors as maybe unused
Paolo Bonzini [Mon, 6 Sep 2021 09:52:23 +0000 (05:52 -0400)]
KVM: MMU: mark role_regs and role accessors as maybe unused

It is reasonable for these functions to be used only in some configurations,
for example only if the host is 64-bits (and therefore supports 64-bit
guests).  It is also reasonable to keep the role_regs and role accessors
in sync even though some of the accessors may be used only for one of the
two sets (as is the case currently for CR4.LA57)..

Because clang reports warnings for unused inlines declared in a .c file,
mark both sets of accessors as __maybe_unused.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoKVM: MIPS: Remove a "set but not used" variable
Huacai Chen [Tue, 6 Apr 2021 02:49:11 +0000 (10:49 +0800)]
KVM: MIPS: Remove a "set but not used" variable

This fix a build warning:

   arch/mips/kvm/vz.c: In function '_kvm_vz_restore_htimer':
>> arch/mips/kvm/vz.c:392:10: warning: variable 'freeze_time' set but not used [-Wunused-but-set-variable]
     392 |  ktime_t freeze_time;
         |          ^~~~~~~~~~~

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Message-Id: <20210406024911.2008046[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoMerge tag 'kvmarm-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmar...
Paolo Bonzini [Mon, 6 Sep 2021 10:34:11 +0000 (06:34 -0400)]
Merge tag 'kvmarm-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 5.15

- Page ownership tracking between host EL1 and EL2

- Rely on userspace page tables to create large stage-2 mappings

- Fix incompatibility between pKVM and kmemleak

- Fix the PMU reset state, and improve the performance of the virtual PMU

- Move over to the generic KVM entry code

- Address PSCI reset issues w.r.t. save/restore

- Preliminary rework for the upcoming pKVM fixed feature

- A bunch of MM cleanups

- a vGIC fix for timer spurious interrupts

- Various cleanups

3 years agoMerge tag 'kvm-s390-next-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Mon, 6 Sep 2021 10:33:40 +0000 (06:33 -0400)]
Merge tag 'kvm-s390-next-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fix and feature for 5.15

- enable interpretion of specification exceptions
- fix a vcpu_idx vs vcpu_id mixup

3 years agox86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
Lai Jiangshan [Sat, 14 Aug 2021 03:51:29 +0000 (11:51 +0800)]
x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait

Commit f4e61f0c9add3 ("x86/kvm: Fix broken irq restoration in kvm_wait")
replaced "local_irq_restore() when IRQ enabled" with "local_irq_enable()
when IRQ enabled" to suppress a warnning.

Although there is no similar debugging warnning for doing local_irq_enable()
when IRQ enabled as doing local_irq_restore() in the same IRQ situation.  But
doing local_irq_enable() when IRQ enabled is no less broken as doing
local_irq_restore() and we'd better avoid it.

Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Message-Id: <20210814035129[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoKVM: stats: Add VM stat for remote tlb flush requests
Jing Zhang [Tue, 17 Aug 2021 00:26:39 +0000 (00:26 +0000)]
KVM: stats: Add VM stat for remote tlb flush requests

Add a new stat that counts the number of times a remote TLB flush is
requested, regardless of whether it kicks vCPUs out of guest mode. This
allows us to look at how often flushes are initiated.

Unlike remote_tlb_flush, this one applies to ARM's instruction-set-based
TLB flush implementation, so apply it there too.

Original-by: David Matlack <[email protected]>
Signed-off-by: Jing Zhang <[email protected]>
Message-Id: <20210817002639.3856694[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoKVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
Sean Christopherson [Thu, 2 Sep 2021 17:59:51 +0000 (10:59 -0700)]
KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()

Don't export KVM's MMU notifier count helpers, under no circumstance
should any downstream module, including x86's vendor code, have a
legitimate reason to piggyback KVM's MMU notifier logic.  E.g in the x86
case, only KVM's MMU should be elevating the notifier count, and that
code is always built into the core kvm.ko module.

Fixes: edb298c663fc ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range")
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Message-Id: <20210902175951.1387989[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoKVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
Sean Christopherson [Wed, 1 Sep 2021 22:10:23 +0000 (15:10 -0700)]
KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page

Move "lpage_disallowed_link" out of the first 64 bytes, i.e. out of the
first cache line, of kvm_mmu_page so that "spt" and to a lesser extent
"gfns" land in the first cache line.  "lpage_disallowed_link" is accessed
relatively infrequently compared to "spt", which is accessed any time KVM
is walking and/or manipulating the shadow page tables.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210901221023.1303578[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoKVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
Sean Christopherson [Wed, 1 Sep 2021 22:10:22 +0000 (15:10 -0700)]
KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality

Move "tdp_mmu_page" into the 1-byte void left by the recently removed
"mmio_cached" so that it resides in the first 64 bytes of kvm_mmu_page,
i.e. in the same cache line as the most commonly accessed fields.

Don't bother wrapping tdp_mmu_page in CONFIG_X86_64, including the field in
32-bit builds doesn't affect the size of kvm_mmu_page, and a future patch
can always wrap the field in the unlikely event KVM gains a 1-byte flag
that is 32-bit specific.

Note, the size of kvm_mmu_page is also unchanged on CONFIG_X86_64=y due
to it previously sharing an 8-byte chunk with write_flooding_count.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210901221023.1303578[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoRevert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
Sean Christopherson [Tue, 31 Aug 2021 16:42:22 +0000 (09:42 -0700)]
Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"

Revert a misguided illegal GPA check when "translating" a non-nested GPA.
The check is woefully incomplete as it does not fill in @exception as
expected by all callers, which leads to KVM attempting to inject a bogus
exception, potentially exposing kernel stack information in the process.

 WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
 CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0
 RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
 Call Trace:
  x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853
  kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199
  handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336
  __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline]
  vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038
  vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712
  vcpu_run arch/x86/kvm/x86.c:9779 [inline]
  kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010
  kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652

The bug has escaped notice because practically speaking the GPA check is
useless.  The GPA check in question only comes into play when KVM is
walking guest page tables (or "translating" CR3), and KVM already handles
illegal GPA checks by setting reserved bits in rsvd_bits_mask for each
PxE, or in the case of CR3 for loading PTDPTRs, manually checks for an
illegal CR3.  This particular failure doesn't hit the existing reserved
bits checks because syzbot sets guest.MAXPHYADDR=1, and IA32 architecture
simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging
doesn't define any reserved PA bits checks, which KVM emulates by only
incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32.

Simply remove the bogus check.  There is zero meaningful value and no
architectural justification for supporting guest.MAXPHYADDR < 32, and
properly filling the exception would introduce non-trivial complexity.

This reverts commit ec7771ab471ba6a945350353617e2e3385d0e013.

Fixes: ec7771ab471b ("KVM: x86: mmu: Add guest physical address check in translate_gpa()")
Cc: [email protected]
Reported-by: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210831164224.1119728[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoKVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
Jia He [Mon, 30 Aug 2021 14:53:36 +0000 (22:53 +0800)]
KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page

After reverting and restoring the fast tlb invalidation patch series,
the mmio_cached is not removed. Hence a unused field is left in
kvm_mmu_page.

Cc: Sean Christopherson <[email protected]>
Signed-off-by: Jia He <[email protected]>
Message-Id: <20210830145336[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agokvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
Eduardo Habkost [Fri, 3 Sep 2021 21:16:00 +0000 (17:16 -0400)]
kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710

Support for 710 VCPUs was tested by Red Hat since RHEL-8.4,
so increase KVM_SOFT_MAX_VCPUS to 710.

Signed-off-by: Eduardo Habkost <[email protected]>
Message-Id: <20210903211600.2002377[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agokvm: x86: Increase MAX_VCPUS to 1024
Eduardo Habkost [Fri, 3 Sep 2021 21:15:59 +0000 (17:15 -0400)]
kvm: x86: Increase MAX_VCPUS to 1024

Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs.

I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it
might involve complicated questions around the meaning of
"supported" and "recommended" in the upstream tree.
KVM_SOFT_MAX_VCPUS will be changed in a separate patch.

For reference, visible effects of this change are:
- KVM_CAP_MAX_VCPUS will now return 1024 (of course)
- Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
  will now be 1024
- KVM_MAX_VCPU_ID will change from 1151 to 4096
- Size of struct kvm will increase from 19328 to 22272 bytes
  (in x86_64)
- Size of struct kvm_ioapic will increase from 1780 to 5084 bytes
  (in x86_64)
- Bitmap stack variables that will grow:
  - At kvm_hv_flush_tlb() kvm_hv_send_ipi(),
    vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long
  - vcpu_bitmap at bioapic_write_indirect() will be 128 bytes long
    once patch "KVM: x86: Fix stack-out-of-bounds memory access
    from ioapic_write_indirect()" is applied

Signed-off-by: Eduardo Habkost <[email protected]>
Message-Id: <20210903211600.2002377[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agokvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
Eduardo Habkost [Fri, 3 Sep 2021 21:15:58 +0000 (17:15 -0400)]
kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS

Instead of requiring KVM_MAX_VCPU_ID to be manually increased
every time we increase KVM_MAX_VCPUS, set it to 4*KVM_MAX_VCPUS.
This should be enough for CPU topologies where Cores-per-Package
and Packages-per-Socket are not powers of 2.

In practice, this increases KVM_MAX_VCPU_ID from 1023 to 1152.
The only side effect of this change is making some fields in
struct kvm_ioapic larger, increasing the struct size from 1628 to
1780 bytes (in x86_64).

Signed-off-by: Eduardo Habkost <[email protected]>
Message-Id: <20210903211600.2002377[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoiwlwifi: fix printk format warnings in uefi.c
Randy Dunlap [Sat, 21 Aug 2021 02:09:01 +0000 (19:09 -0700)]
iwlwifi: fix printk format warnings in uefi.c

The kernel test robot reports printk format warnings in uefi.c, so
correct them.

../drivers/net/wireless/intel/iwlwifi/fw/uefi.c: In function 'iwl_uefi_get_pnvm':
../drivers/net/wireless/intel/iwlwifi/fw/uefi.c:52:30: warning: format '%zd' expects argument of type 'signed size_t', but argument 7 has type 'long unsigned int' [-Wformat=]
   52 |                              "PNVM UEFI variable not found %d (len %zd)\n",
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   53 |                              err, package_size);
      |                                   ~~~~~~~~~~~~
      |                                   |
      |                                   long unsigned int
../drivers/net/wireless/intel/iwlwifi/fw/uefi.c:59:29: warning: format '%zd' expects argument of type 'signed size_t', but argument 6 has type 'long unsigned int' [-Wformat=]
   59 |         IWL_DEBUG_FW(trans, "Read PNVM from UEFI with size %zd\n", package_size);
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~
      |                                                                    |
      |                                                                    long unsigned int

Fixes: 84c3c9952afb ("iwlwifi: move UEFI code to a separate file")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: Luca Coelho <[email protected]>
Cc: [email protected]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
3 years agoKVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
Maxim Levitsky [Thu, 26 Aug 2021 09:57:49 +0000 (12:57 +0300)]
KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation

If we are emulating an invalid guest state, we don't have a correct
exit reason, and thus we shouldn't do anything in this function.

Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <20210826095750.1650467[email protected]>
Cc: [email protected]
Fixes: 95b5a48c4f2b ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoKVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
Sean Christopherson [Tue, 24 Aug 2021 00:58:24 +0000 (17:58 -0700)]
KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host

Include pml5_root in the set of special roots if and only if the host,
and thus NPT, is using 5-level paging.  mmu_alloc_special_roots() expects
special roots to be allocated as a bundle, i.e. they're either all valid
or all NULL.  But for pml5_root, that expectation only holds true if the
host uses 5-level paging, which causes KVM to WARN about pml5_root being
NULL when the other special roots are valid.

The silver lining of 4-level vs. 5-level NPT being tied to the host
kernel's paging level is that KVM's shadow root level is constant; unlike
VMX's EPT, KVM can't choose 4-level NPT based on guest.MAXPHYADDR.  That
means KVM can still expect pml5_root to be bundled with the other special
roots, it just needs to be conditioned on the shadow root level.

Fixes: cb0f722aff6e ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host")
Reported-by: Maxim Levitsky <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210824005824[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agocpufreq: mediatek-hw: Add support for CPUFREQ HW
Hector.Yuan [Fri, 3 Sep 2021 08:39:24 +0000 (16:39 +0800)]
cpufreq: mediatek-hw: Add support for CPUFREQ HW

Introduce cpufreq HW driver which can support
CPU frequency adjust in MT6779 platform.

Signed-off-by: Hector.Yuan <[email protected]>
[ Viresh: Massaged the patch and cleaned some stuff. ]
Signed-off-by: Viresh Kumar <[email protected]>
3 years agocpufreq: Add of_perf_domain_get_sharing_cpumask
Hector.Yuan [Fri, 3 Sep 2021 08:39:23 +0000 (16:39 +0800)]
cpufreq: Add of_perf_domain_get_sharing_cpumask

Add of_perf_domain_get_sharing_cpumask function to group cpu
to specific performance domain.

Signed-off-by: Hector.Yuan <[email protected]>
[ Viresh: create separate routine parse_perf_domain() and always set the
  cpumask. ]
Signed-off-by: Viresh Kumar <[email protected]>
3 years agodt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
Hector.Yuan [Fri, 3 Sep 2021 08:39:22 +0000 (16:39 +0800)]
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW

Add devicetree bindings for MediaTek HW driver.

Signed-off-by: Hector.Yuan <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
3 years agodt-bindings: mfd: Add Broadcom CRU
Rafał Miłecki [Tue, 13 Jul 2021 09:47:45 +0000 (11:47 +0200)]
dt-bindings: mfd: Add Broadcom CRU

CRU is a block used in e.g. Northstar devices. It can be seen in the
bcm5301x.dtsi and this binding documents its proper usage.

Signed-off-by: Rafał Miłecki <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
3 years agonvme: update MAINTAINERS email address
Chaitanya Kulkarni [Mon, 6 Sep 2021 02:26:01 +0000 (19:26 -0700)]
nvme: update MAINTAINERS email address

Update the MAINTAINERS file email address.

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agonvme: add error handling support for add_disk()
Luis Chamberlain [Mon, 30 Aug 2021 21:25:33 +0000 (14:25 -0700)]
nvme: add error handling support for add_disk()

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agonvme: only call synchronize_srcu when clearing current path
Daniel Wagner [Wed, 1 Sep 2021 09:25:24 +0000 (11:25 +0200)]
nvme: only call synchronize_srcu when clearing current path

The function nmve_mpath_clear_current_path returns true if the current
path has changed. In this case we have to wait for all concurrent
submissions to finish. But if we didn't change the current path, there
is no point in waiting for another RCU period to finish.

Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agonvme: update keep alive interval when kato is modified
Tatsuya Sasaki [Wed, 1 Sep 2021 08:23:42 +0000 (08:23 +0000)]
nvme: update keep alive interval when kato is modified

Currently the connection between host and NVMe-oF target gets
disconnected by keep-alive timeout when a user connects to a target
with a relatively large kato value and then sets the smaller kato
with a set features command (e.g. connects with 60 seconds kato value
and then sets 10 seconds kato value).

The cause is that keep alive command interval on the host, which is
defined as unsigned int kato in nvme_ctrl structure, does not follow
the kato value changes.

This patch updates the keep alive interval in the following steps when
the kato is modified by a set features command: stops the keep alive
work queue, then sets the kato as new timer value and re-start the queue.

Signed-off-by: Tatsuya Sasaki <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agonvme-tcp: Do not reset transport on data digest errors
Daniel Wagner [Mon, 30 Aug 2021 13:36:26 +0000 (15:36 +0200)]
nvme-tcp: Do not reset transport on data digest errors

The spec says

  7.4.6.1 Digest Error handling

  When a host detects a data digest error in a C2HData PDU, that host
  shall continue processing C2HData PDUs associated with the command and
  when the command processing has completed, if a successful status was
  returned by the controller, the host shall fail the command with a
  non-fatal transport error.

Currently the transport is reseted when a data digest error is
detected. Instead, when a digest error is detected, mark the final
status as NVME_SC_DATA_XFER_ERROR and let the upper layer handle
the error.

In order to keep track of the final result maintain a status field in
nvme_tcp_request object and use it to overwrite the completion queue
status (which might be successful even though a digest error has been
detected) when completing the request.

Signed-off-by: Daniel Wagner <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agonvmet: fixup buffer overrun in nvmet_subsys_attr_serial()
Hannes Reinecke [Mon, 6 Sep 2021 07:04:03 +0000 (09:04 +0200)]
nvmet: fixup buffer overrun in nvmet_subsys_attr_serial()

The serial number is copied into the buffer via memcpy_and_pad()
with the length NVMET_SN_MAX_SIZE. So when printing out we also
need to take just that length as anything beyond that will be
uninitialized.

Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agonvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req
Christoph Hellwig [Fri, 27 Aug 2021 06:11:12 +0000 (08:11 +0200)]
nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req

The target core code never needs the host-side nvme_ctrl structure.
Open code two uses of nvmet_is_passthru_req in passthru.c, and then
switch the helpers used by the core to return bool.  Also rename the
fuctions to better match their usage:

  nvmet_passthru_ctrl -> nvmet_is_passthru_subsys
  nvmet_req_passthru_ctrl -> nvmet_is_passthru_req

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
3 years agonvmet: looks at the passthrough controller when initializing CAP
Adam Manzanares [Thu, 26 Aug 2021 21:15:45 +0000 (21:15 +0000)]
nvmet: looks at the passthrough controller when initializing CAP

For a passthru controller make cap initialization dependent on the cap of
the passthru controller, given that multiple Command Set support needs
to be supported by the underlying controller.  For that move the
initialization of CAP later so that it can use the fully initialized
nvmet_ctrl structure.

Fixes: ab5d0b38c047 (nvmet: add Command Set Identifier support)
Signed-off-by: Adam Manzanares <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
[hch: refactored the code a bit to keep it more contained in passthru.c]
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agonvme: move nvme_multi_css into nvme.h
Adam Manzanares [Thu, 26 Aug 2021 21:15:45 +0000 (21:15 +0000)]
nvme: move nvme_multi_css into nvme.h

Preparatory patch in order to reuse nvme_multi_css in the nvme target
code.

Signed-off-by: Adam Manzanares <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agonvme-multipath: revalidate paths during rescan
Hannes Reinecke [Tue, 24 Aug 2021 14:57:42 +0000 (16:57 +0200)]
nvme-multipath: revalidate paths during rescan

When triggering a rescan due to a namespace resize we will be
receiving AENs on every controller, triggering a rescan of all
attached namespaces. If multipath is active only the current path and
the ns_head disk will be updated, the other paths will still refer to
the old size until AENs for the remaining controllers are received.

If I/O comes in before that it might be routed to one of the old
paths, triggering an I/O failure with 'access beyond end of device'.
With this patch the old paths are skipped from multipath path
selection until the controller serving these paths has been rescanned.

Signed-off-by: Hannes Reinecke <[email protected]>
[dwagner: - introduce NVME_NS_READY flag instead of NVME_NS_INVALIDATE
          - use 'revalidate' instead of 'invalidate' which
    follows the zoned device code path.
  - clear NVME_NS_READY before clearing current_path]
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agonvme-multipath: set QUEUE_FLAG_NOWAIT
Christoph Hellwig [Mon, 14 Jun 2021 11:17:34 +0000 (13:17 +0200)]
nvme-multipath: set QUEUE_FLAG_NOWAIT

The nvme multipathing code just dispatches bios to one of the blk-mq
based paths and never blocks on its own, so set QUEUE_FLAG_NOWAIT
to support REQ_NOWAIT bios.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
3 years agodrm/i915: use linux/stddef.h due to "isystem: trim/fixup stdarg.h and other headers"
Stephen Rothwell [Fri, 20 Aug 2021 02:33:48 +0000 (12:33 +1000)]
drm/i915: use linux/stddef.h due to "isystem: trim/fixup stdarg.h and other headers"

After merging the drm tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

In file included from drivers/gpu/drm/i915/i915_debugfs.c:39:
drivers/gpu/drm/i915/gt/intel_gt_requests.h:9:10: fatal error: stddef.h: No such file or directory
    9 | #include <stddef.h>
      |          ^~~~~~~~~~

Caused by commit

  564f963eabd1 ("isystem: delete global -isystem compile option")

from the kbuild tree interacting with commit

  b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")

Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC")
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
3 years agovdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro
Cai Huoqing [Mon, 2 Aug 2021 01:37:17 +0000 (09:37 +0800)]
vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro

it's a nice refactor to make use of
PFN_PHYS/PFN_UP/PFN_DOWN helper macro

Signed-off-by: Cai Huoqing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Jason Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovsock_test: update message bounds test for MSG_EOR
Arseny Krasnov [Fri, 3 Sep 2021 12:33:18 +0000 (15:33 +0300)]
vsock_test: update message bounds test for MSG_EOR

Set 'MSG_EOR' in one of message sent, check that 'MSG_EOR'
is visible in corresponding message at receiver.

Signed-off-by: Arseny Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agoaf_vsock: rename variables in receive loop
Arseny Krasnov [Fri, 3 Sep 2021 12:33:03 +0000 (15:33 +0300)]
af_vsock: rename variables in receive loop

Record is supported via MSG_EOR flag, while current logic operates
with message, so rename variables from 'record' to 'message'.

Signed-off-by: Arseny Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agoInput: edt-ft5x06 - added case for EDT EP0110M09
Oliver Graute [Mon, 6 Sep 2021 02:09:25 +0000 (19:09 -0700)]
Input: edt-ft5x06 - added case for EDT EP0110M09

Add Support for EP011M09 Firmware

Signed-off-by: Oliver Graute <[email protected]>
Reviewed-by: Marco Felsch <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
3 years agoMerge branch 'next' into for-linus
Dmitry Torokhov [Mon, 6 Sep 2021 01:58:05 +0000 (18:58 -0700)]
Merge branch 'next' into for-linus

Prepare input updates for 5.15 merge window.

3 years agoNTB: switch from 'pci_' to 'dma_' API
Christophe JAILLET [Sun, 22 Aug 2021 14:04:12 +0000 (16:04 +0200)]
NTB: switch from 'pci_' to 'dma_' API

The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below.

It has been compile tested.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET <[email protected]>
Acked-by: Serge Semin <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
3 years agontb: ntb_pingpong: remove redundant initialization of variables msg_data and spad_data
Colin Ian King [Wed, 9 Jun 2021 11:21:28 +0000 (12:21 +0100)]
ntb: ntb_pingpong: remove redundant initialization of variables msg_data and spad_data

The variables msg_data and spad_data are being initialized with values
that are never read, they are being updated later on. The initializations
are redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Serge Semin <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
3 years agovirtio/vsock: support MSG_EOR bit processing
Arseny Krasnov [Fri, 3 Sep 2021 12:32:48 +0000 (15:32 +0300)]
virtio/vsock: support MSG_EOR bit processing

If packet has 'EOR' bit - set MSG_EOR in 'recvmsg()' flags.

Signed-off-by: Arseny Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovhost/vsock: support MSG_EOR bit processing
Arseny Krasnov [Fri, 3 Sep 2021 12:32:35 +0000 (15:32 +0300)]
vhost/vsock: support MSG_EOR bit processing

'MSG_EOR' handling has similar logic as 'MSG_EOM' - if bit present
in packet's header, reset it to 0. Then restore it back if packet
processing wasn't completed. Instead of bool variable for each
flag, bit mask variable was added: it has logical OR of 'MSG_EOR'
and 'MSG_EOM' if needed, to restore flags, this variable is ORed
with flags field of packet.

Signed-off-by: Arseny Krasnov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
3 years agovirtio/vsock: add 'VIRTIO_VSOCK_SEQ_EOR' bit.
Arseny Krasnov [Fri, 3 Sep 2021 12:32:23 +0000 (15:32 +0300)]
virtio/vsock: add 'VIRTIO_VSOCK_SEQ_EOR' bit.

This bit is used to handle POSIX MSG_EOR flag passed from
userspace in 'send*()' system calls. It marks end of each
record and is visible to receiver using 'recvmsg()' system
call.

Signed-off-by: Arseny Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovirtio/vsock: rename 'EOR' to 'EOM' bit.
Arseny Krasnov [Fri, 3 Sep 2021 12:31:06 +0000 (15:31 +0300)]
virtio/vsock: rename 'EOR' to 'EOM' bit.

This current implemented bit is used to mark end of messages
('EOM' - end of message), not records('EOR' - end of record).
Also rename 'record' to 'message' in implementation as it is
different things.

Signed-off-by: Arseny Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovirtio: Bind virtio device to device-tree node
Viresh Kumar [Tue, 27 Jul 2021 05:23:52 +0000 (10:53 +0530)]
virtio: Bind virtio device to device-tree node

Bind the virtio devices with their of_node. This will help users of the
virtio devices to mention their dependencies on the device in the DT
itself. Like GPIO pin users can use the phandle of the device node, or
the node may contain more subnodes to add i2c or spi eeproms and other
users.

Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
Link: https://lore.kernel.org/r/94c12705602929968477aaf27e02439eb7a7f253.1627362340.git.viresh.kumar@linaro.org
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agouapi: virtio_ids: Sync ids with specification
Viresh Kumar [Tue, 27 Jul 2021 05:23:51 +0000 (10:53 +0530)]
uapi: virtio_ids: Sync ids with specification

This synchronizes the virtio ids with the latest list from virtio
specification.

Signed-off-by: Viresh Kumar <[email protected]>
Link: https://lore.kernel.org/r/61b27e3bc61fb0c9f067001e95cfafc5d37d414a.1627362340.git.viresh.kumar@linaro.org
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agodt-bindings: gpio: Add bindings for gpio-virtio
Viresh Kumar [Tue, 27 Jul 2021 05:23:50 +0000 (10:53 +0530)]
dt-bindings: gpio: Add bindings for gpio-virtio

This patch adds binding for virtio GPIO controller, it is based on
virtio-device bindings.

Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
Link: https://lore.kernel.org/r/acf7402ef4aabc0ad6295c32846f2bef1cd9b56a.1627362340.git.viresh.kumar@linaro.org
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
3 years agodt-bindings: i2c: Add bindings for i2c-virtio
Viresh Kumar [Tue, 27 Jul 2021 05:23:49 +0000 (10:53 +0530)]
dt-bindings: i2c: Add bindings for i2c-virtio

This patch adds binding for virtio I2C device, it is based on
virtio-device bindings.

Acked-by: Wolfram Sang <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
Link: https://lore.kernel.org/r/33c317b95097ce491845c697db1e8285e3ec1d41.1627362340.git.viresh.kumar@linaro.org
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
3 years agodt-bindings: virtio: Add binding for virtio devices
Viresh Kumar [Tue, 27 Jul 2021 05:23:48 +0000 (10:53 +0530)]
dt-bindings: virtio: Add binding for virtio devices

Allow virtio device sub-nodes to be added to the virtio mmio or pci
nodes. The compatible property for virtio device must be of the format
"virtio,device<ID>", where ID is virtio device ID in hexadecimal format.

Signed-off-by: Viresh Kumar <[email protected]>
Link: https://lore.kernel.org/r/d8319fd18df7086b12cdcc23193c313893aa071a.1627362340.git.viresh.kumar@linaro.org
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
3 years agovdpa_sim: Use iova_shift() for the size passed to alloc_iova()
Xie Yongji [Mon, 9 Aug 2021 10:09:23 +0000 (18:09 +0800)]
vdpa_sim: Use iova_shift() for the size passed to alloc_iova()

The size passed to alloc_iova() should be the size of page frames.
Now we use byte granularity for the iova domain, so it's safe to
pass the size in bytes to alloc_iova(). But it would be better to use
iova_shift() for the size to avoid future bugs if we change granularity.

Signed-off-by: Xie Yongji <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
3 years agovhost scsi: Convert to SPDX identifier
Cai Huoqing [Sat, 21 Aug 2021 12:33:20 +0000 (20:33 +0800)]
vhost scsi: Convert to SPDX identifier

use SPDX-License-Identifier instead of a verbose license text

Signed-off-by: Cai Huoqing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
3 years agovdpa/mlx5: Add multiqueue support
Eli Cohen [Mon, 23 Aug 2021 05:21:23 +0000 (08:21 +0300)]
vdpa/mlx5: Add multiqueue support

Multiqueue support requires additional virtio_net_q objects to be added
or removed per the configured number of queue pairs. In addition the RQ
tables needs to be modified to match the number of configured receive
queues so the packets are dispatched to the right virtqueue according to
the hash result.

Note: qemu v6.0.0 is broken when the device requests more than two data
queues; no net device will be created for the vdpa device. To avoid
this, one should specify mq=off to qemu. In this case it will end up
with a single queue.

Signed-off-by: Eli Cohen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa/mlx5: Add support for control VQ and MAC setting
Eli Cohen [Mon, 23 Aug 2021 05:21:22 +0000 (08:21 +0300)]
vdpa/mlx5: Add support for control VQ and MAC setting

Add support to handle control virtqueue configurations per virtio
specification. The control virtqueue is implemented in software and no
hardware offloading is involved.

Control VQ configuration need task context, therefore all configurations
are handled in a workqueue created for the purpose.

Modifications are made to the memory registration code to allow for
saving a copy of itolb to be used by the control VQ to access the vring.

The max number of data virtqueus supported by the driver has been
updated to 2 since multiqueue is not supported at this stage and we need
to ensure consistency of VQ indices mapping to either data or control
VQ.

Signed-off-by: Eli Cohen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa/mlx5: Ensure valid indices are provided
Eli Cohen [Mon, 23 Aug 2021 05:21:21 +0000 (08:21 +0300)]
vdpa/mlx5: Ensure valid indices are provided

Following patches add control virtuqeue and multiqueue support. We want
to verify that the index value to callbacks referencing a virtqueue is
valid.

The logic defining valid indices is as follows:
CVQ clear: 0 and 1.
CVQ set, MQ clear: 0, 1 and 2
CVQ set, MQ set: 0..nvq where nvq is whatever provided to
_vdpa_register_device()

Signed-off-by: Eli Cohen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa/mlx5: Decouple virtqueue callback from struct mlx5_vdpa_virtqueue
Eli Cohen [Mon, 23 Aug 2021 05:21:20 +0000 (08:21 +0300)]
vdpa/mlx5: Decouple virtqueue callback from struct mlx5_vdpa_virtqueue

Instead, define an array of struct vdpa_callback on struct mlx5_vdpa_net
and use it to store callbacks for any virtqueue provided. This is
required due to the fact that callback configurations arrive before feature
negotiation. With control VQ and multiqueue introduced next we want to
save the information until after feature negotiation where we know the
CVQ index.

Signed-off-by: Eli Cohen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa/mlx5: function prototype modifications in preparation to control VQ
Eli Cohen [Mon, 23 Aug 2021 05:21:19 +0000 (08:21 +0300)]
vdpa/mlx5: function prototype modifications in preparation to control VQ

Use struct mlx5_vdpa_dev as an argument to setup_driver() and a few
others in preparation to control virtqueue support in a subsequent
patch. The control virtqueue is part of struct mlx5_vdpa_dev so this is
required.

Signed-off-by: Eli Cohen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovdpa/mlx5: Remove redundant header file inclusion
Eli Cohen [Mon, 23 Aug 2021 05:21:18 +0000 (08:21 +0300)]
vdpa/mlx5: Remove redundant header file inclusion

linux/if_vlan.h is not required.
Remove it.

Signed-off-by: Eli Cohen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovDPA/ifcvf: enable multiqueue and control vq
Zhu Lingshan [Wed, 18 Aug 2021 09:57:14 +0000 (17:57 +0800)]
vDPA/ifcvf: enable multiqueue and control vq

This commit enbales multi-queue and control vq
features for ifcvf

Signed-off-by: Zhu Lingshan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
3 years agovDPA/ifcvf: detect and use the onboard number of queues directly
Zhu Lingshan [Wed, 18 Aug 2021 09:57:13 +0000 (17:57 +0800)]
vDPA/ifcvf: detect and use the onboard number of queues directly

To enable this multi-queue feature for ifcvf, this commit
intends to detect and use the onboard number of queues
directly than IFCVF_MAX_QUEUE_PAIRS = 1 (removed)

Signed-off-by: Zhu Lingshan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
3 years agovDPA/ifcvf: implement management netlink framework for ifcvf
Zhu Lingshan [Thu, 12 Aug 2021 03:24:54 +0000 (11:24 +0800)]
vDPA/ifcvf: implement management netlink framework for ifcvf

This commit implements the management netlink framework for ifcvf,
including register and add / remove a device

It works with iproute2:
[root@localhost lszhu]# vdpa mgmtdev show -jp
{
    "mgmtdev": {
        "pci/0000:01:00.5": {
            "supported_classes": [ "net" ]
        },
        "pci/0000:01:00.6": {
            "supported_classes": [ "net" ]
        }
    }
}

[root@localhost lszhu]# vdpa dev add mgmtdev pci/0000:01:00.5 name vdpa0
[root@localhost lszhu]# vdpa dev add mgmtdev pci/0000:01:00.6 name vdpa1

Signed-off-by: Zhu Lingshan <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agovDPA/ifcvf: introduce get_dev_type() which returns virtio dev id
Zhu Lingshan [Thu, 12 Aug 2021 03:24:53 +0000 (11:24 +0800)]
vDPA/ifcvf: introduce get_dev_type() which returns virtio dev id

This commit introduces a new function get_dev_type() which returns
the virtio device id of a device, to avoid duplicated code.

Signed-off-by: Zhu Lingshan <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
3 years agonet: create netdev->dev_addr assignment helpers
Jakub Kicinski [Thu, 2 Sep 2021 18:10:37 +0000 (11:10 -0700)]
net: create netdev->dev_addr assignment helpers

Recent work on converting address list to a tree made it obvious
we need an abstraction around writing netdev->dev_addr. Without
such abstraction updating the main device address is invisible
to the core.

Introduce a number of helpers which for now just wrap memcpy()
but in the future can make necessary changes to the address
tree.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agoMerge branch 'bnxt_en-fixes'
David S. Miller [Sun, 5 Sep 2021 19:43:04 +0000 (20:43 +0100)]
Merge branch 'bnxt_en-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes

This series includes 3 fixes related to devlink firmware and chip
versions.  The other 2 patches fix a UDP tunneling issue and an
error recovery issue.
====================

Signed-off-by: David S. Miller <[email protected]>
3 years agobnxt_en: Fix possible unintended driver initiated error recovery
Michael Chan [Sun, 5 Sep 2021 18:10:59 +0000 (14:10 -0400)]
bnxt_en: Fix possible unintended driver initiated error recovery

If error recovery is already enabled, bnxt_timer() will periodically
check the heartbeat register and the reset counter.  If we get an
error recovery async. notification from the firmware (e.g. change in
primary/secondary role), we will immediately read and update the
heartbeat register and the reset counter.  If the timer for the next
health check expires soon after this, we may read the heartbeat register
again in quick succession and find that it hasn't changed.  This will
trigger error recovery unintentionally.

The likelihood is small because we also reset fw_health->tmr_counter
which will reset the interval for the next health check.  But the
update is not protected and bnxt_timer() can miss the update and
perform the health check without waiting for the full interval.

Fix it by only reading the heartbeat register and reset counter in
bnxt_async_event_process() if error recovery is trasitioning to the
enabled state.  Also add proper memory barriers so that when enabling
for the first time, bnxt_timer() will see the tmr_counter interval and
perform the health check after the full interval has elapsed.

Fixes: 7e914027f757 ("bnxt_en: Enable health monitoring.")
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agobnxt_en: Fix UDP tunnel logic
Michael Chan [Sun, 5 Sep 2021 18:10:58 +0000 (14:10 -0400)]
bnxt_en: Fix UDP tunnel logic

The current logic assumes that when the driver sends the message to the
firmware to add the VXLAN or Geneve port, the firmware will never fail
the operation.  The UDP ports are always stored and are used to check
the tunnel packets in .ndo_features_check().  These tunnnel packets
will fail to offload on the transmit side if firmware fails the call to
add the UDP ports.

To fix the problem, bp->vxlan_port and bp->nge_port will only be set to
the offloaded ports when the HWRM_TUNNEL_DST_PORT_ALLOC firmware call
succeeds.  When deleting a UDP port, we check that the port was
previously added successfuly first by checking the FW ID.

Fixes: 1698d600b361 ("bnxt_en: Implement .ndo_features_check().")
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agobnxt_en: Fix asic.rev in devlink dev info command
Michael Chan [Sun, 5 Sep 2021 18:10:57 +0000 (14:10 -0400)]
bnxt_en: Fix asic.rev in devlink dev info command

The current asic.rev is incomplete and does not include the metal
revision.  Add the metal revision and decode the complete asic
revision into the more common and readable form (A0, B0, etc).

Fixes: 7154917a12b2 ("bnxt_en: Refactor bnxt_dl_info_get().")
Reviewed-by: Edwin Peer <[email protected]>
Reviewed-by: Somnath Kotur <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agobnxt_en: fix read of stored FW_PSID version on P5 devices
Edwin Peer [Sun, 5 Sep 2021 18:10:56 +0000 (14:10 -0400)]
bnxt_en: fix read of stored FW_PSID version on P5 devices

P5 devices store NVM arrays using a different internal representation.
This implementation detail permeates into the HWRM API, requiring the
caller to explicitly index the array elements in HWRM_NVM_GET_VARIABLE
on these devices. Conversely, older devices do not support the indexed
mode of operation and require reading the raw NVM content.

Fixes: db28b6c77f40 ("bnxt_en: Fix devlink info's stored fw.psid version format.")
Signed-off-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agobnxt_en: fix stored FW_PSID version masks
Edwin Peer [Sun, 5 Sep 2021 18:10:55 +0000 (14:10 -0400)]
bnxt_en: fix stored FW_PSID version masks

The FW_PSID version components are 8 bits wide, not 4.

Fixes: db28b6c77f40 ("bnxt_en: Fix devlink info's stored fw.psid version format.")
Signed-off-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agoMerge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 5 Sep 2021 18:56:18 +0000 (11:56 -0700)]
Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Improvements for the flamegraph python script, including:
       - Display perf.data header
       - Display PIDs of user stacks
       - Added option to change color scheme
       - Default to blue/green color scheme to improve accessibility
       - Correctly identify kernel stacks when debuginfo is available

   - Improvements for 'perf bench futex':
       - Add --mlockall parameter
       - Add --broadcast and --pi to the 'requeue' sub benchmark

   - Add support for PMU aliases.

   - Introduce an ARM Coresight ETE decoder.

   - Add a 'perf bench' entry for evlist open/close operations, to help
     quantify improvements with multithreading 'perf record'.

   - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf
     script's python scripting.

   - Add a 'perf test' entry for PMU aliases.

   - Add a 'perf test' entry for 'perf record/perf report/perf script'
     pipe mode.

  Fixes:

   - perf script dlfilter (API for filtering via dynamically loaded
     shared object introduced in v5.14) fixes and a 'perf test' entry
     for it.

   - Fix get_current_dir_name() compilation on Android.

   - Fix issues with asciidoc and double dashes uses.

   - Fix memory leaks in the BTF handling code.

   - Fix leftover problems in the Documentation from the infrastructure
     originally lifted from the git codebase.

   - Fix *probe_vfs_getname.sh 'perf test' failures.

   - Handle fd gaps in 'perf test's test__dso_data_reopen().

   - Make sure to show disasembly warnings for 'perf annotate --stdio'.

   - Fix output from pipe to file and vice-versa in 'perf
     record/report/script'.

   - Correct 'perf data -h' output.

   - Fix wrong comm in system-wide mode with 'perf record --delay'.

   - Do not allow --for-each-cgroup without cpu in 'perf stat'

   - Make 'perf test --skip' work on shell tests.

   - Fix libperf's verbose printing.

  Misc improvements:

   - Preparatory patches for multithreading various 'perf record' phases
     (synthesizing, opening, recording, etc).

   - Add sparse context/locking annotations in compiler-types.h, also to
     help with the multithreading effort.

   - Optimize the generation of the arch specific erno tables used in
     'perf trace'.

   - Optimize libperf's perf_cpu_map__max().

   - Improve ARM's CoreSight warnings.

   - Report collisions in AUX records.

   - Improve warnings for the LLVM 'perf test' entry.

   - Improve the PMU events 'perf test' codebase.

   - perf test: Do not compare overheads in the zstd comp test

   - Better support annotation on ARM.

   - Update 'perf trace's cmd string table to decode sys_bpf() first
     arg.

  Vendor events:

   - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and
     Elhart Lake.

   - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake
     servers.

  Hardware tracing:

   - Improvements for the ARM hardware tracing auxtrace support"

* tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (130 commits)
  perf tests: Add test for PMU aliases
  perf pmu: Add PMU alias support
  perf session: Report collisions in AUX records
  perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event
  perf build: Report failure for testing feature libopencsd
  perf cs-etm: Show a warning for an unknown magic number
  perf cs-etm: Print the decoder name
  perf cs-etm: Create ETE decoder
  perf cs-etm: Update OpenCSD decoder for ETE
  perf cs-etm: Fix typo
  perf cs-etm: Save TRCDEVARCH register
  perf cs-etm: Refactor out ETMv4 header saving
  perf cs-etm: Initialise architecture based on TRCIDR1
  perf cs-etm: Refactor initialisation of decoder params.
  tools build: Fix feature detect clean for out of source builds
  perf evlist: Add evlist__for_each_entry_from() macro
  perf evsel: Handle precise_ip fallback in evsel__open_cpu()
  perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu()
  perf evsel: Move test_attr__open() to success path in evsel__open_cpu()
  perf evsel: Move ignore_missing_thread() to fallback code
  ...

3 years agoMerge tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Sun, 5 Sep 2021 18:50:41 +0000 (11:50 -0700)]
Merge tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT

 - bootconfig can now start histograms

 - bootconfig supports group/all enabling

 - histograms now can put values in linear size buckets

 - execnames can be passed to synthetic events

 - introduce "event probes" that attach to other events and can retrieve
   data from pointers of fields, or record fields as different types (a
   pointer to a string as a string instead of just a hex number)

 - various fixes and clean ups

* tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits)
  tracing/doc: Fix table format in histogram code
  selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes
  selftests/ftrace: Add selftest for testing eprobe events on synthetic events
  selftests/ftrace: Add test case to test adding and removing of event probe
  selftests/ftrace: Fix requirement check of README file
  selftests/ftrace: Add clear_dynamic_events() to test cases
  tracing: Add a probe that attaches to trace events
  tracing/probes: Reject events which have the same name of existing one
  tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs
  tracing/probe: Change traceprobe_set_print_fmt() to take a type
  tracing/probes: Use struct_size() instead of defining custom macros
  tracing/probes: Allow for dot delimiter as well as slash for system names
  tracing/probe: Have traceprobe_parse_probe_arg() take a const arg
  tracing: Have dynamic events have a ref counter
  tracing: Add DYNAMIC flag for dynamic events
  tracing: Replace deprecated CPU-hotplug functions.
  MAINTAINERS: Add an entry for os noise/latency
  tracepoint: Fix kerneldoc comments
  bootconfig/tracing/ktest: Update ktest example for boot-time tracing
  tools/bootconfig: Use per-group/all enable option in ftrace2bconf script
  ...

3 years agoMerge tag 'arc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Sun, 5 Sep 2021 18:43:03 +0000 (11:43 -0700)]
Merge tag 'arc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC updates from Vineet Gupta:
 "Finally a big pile of changes for ARC (atomics/mm). These are from our
  internal arc64 tree, preparing mainline for eventual arc64 support.
  I'm spreading them out to avoid tsunami of patches in one release.

   - MM rework:
       - Implement up to 4 paging levels
       - Enable STRICT_MM_TYPECHECK
       - switch pgtable_t back to 'struct page *'

   - Atomics rework / implement relaxed accessors

   - Retire legacy MMUv1,v2; ARC750 cores

   - A few other build errors, typos"

* tag 'arc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (33 commits)
  ARC: mm: vmalloc sync from kernel to user table to update PMD ...
  ARC: mm: support 4 levels of page tables
  ARC: mm: support 3 levels of page tables
  ARC: mm: switch to asm-generic/pgalloc.h
  ARC: mm: switch pgtable_t back to struct page *
  ARC: mm: hack to allow 2 level build with 4 level code
  ARC: mm: disintegrate pgtable.h into levels and flags
  ARC: mm: disintegrate mmu.h (arcv2 bits out)
  ARC: mm: move MMU specific bits out of entry code ...
  ARC: mm: move MMU specific bits out of ASID allocator
  ARC: mm: non-functional code movement/cleanup
  ARC: mm: pmd_populate* to use the canonical set_pmd (and drop pmd_set)
  ARC: ioremap: use more commonly used PAGE_KERNEL based uncached flag
  ARC: mm: Enable STRICT_MM_TYPECHECKS
  ARC: mm: Fixes to allow STRICT_MM_TYPECHECKS
  ARC: mm: move mmu/cache externs out to setup.h
  ARC: mm: remove tlb paranoid code
  ARC: mm: use SCRATCH_DATA0 register for caching pgdir in ARCv2 only
  ARC: retire MMUv1 and MMUv2 support
  ARC: retire ARC750 support
  ...

3 years agoMerge tag 'riscv-for-linus-5.15-mw0' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Sep 2021 18:31:23 +0000 (11:31 -0700)]
Merge tag 'riscv-for-linus-5.15-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - support PC-relative instructions (auipc and branches) in kprobes

 - support for forced IRQ threading

 - support for the hlt/nohlt kernel command line options, via the
   generic idle loop

 - show the edge/level triggered behavior of interrupts
   in /proc/interrupts

 - a handful of cleanups to our address mapping mechanisms

 - support for allocating gigantic hugepages via CMA

 - support for the undefined behavior sanitizer (UBSAN)

 - a handful of cleanups to the VDSO that allow the kernel to build with
   LLD.

 - support for hugepage migration

* tag 'riscv-for-linus-5.15-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
  riscv: add support for hugepage migration
  RISC-V: Fix VDSO build for !MMU
  riscv: use strscpy to replace strlcpy
  riscv: explicitly use symbol offsets for VDSO
  riscv: Enable Undefined Behavior Sanitizer UBSAN
  riscv: Keep the riscv Kconfig selects sorted
  riscv: Support allocating gigantic hugepages using CMA
  riscv: fix the global name pfn_base confliction error
  riscv: Move early fdt mapping creation in its own function
  riscv: Simplify BUILTIN_DTB device tree mapping handling
  riscv: Use __maybe_unused instead of #ifdefs around variable declarations
  riscv: Get rid of map_size parameter to create_kernel_page_table
  riscv: Introduce va_kernel_pa_offset for 32-bit kernel
  riscv: Optimize kernel virtual address conversion macro
  dt-bindings: riscv: add starfive jh7100 bindings
  riscv: Enable GENERIC_IRQ_SHOW_LEVEL
  riscv: Enable idle generic idle loop
  riscv: Allow forced irq threading
  riscv: Implement thread_struct whitelist for hardened usercopy
  riscv: kprobes: implement the branch instructions
  ...

3 years agoEnable '-Werror' by default for all kernel builds
Linus Torvalds [Sun, 5 Sep 2021 18:24:05 +0000 (11:24 -0700)]
Enable '-Werror' by default for all kernel builds

... but make it a config option so that broken environments can disable
it when required.

We really should always have a clean build, and will disable specific
over-eager warnings as required, if we can't fix them.  But while I
fairly religiously enforce that in my own tree, it doesn't get enforced
by various build robots that don't necessarily report warnings.

So this just makes '-Werror' a default compiler flag, but allows people
to disable it for their configuration if they have some particular
issues.

Occasionally, new compiler versions end up enabling new warnings, and it
can take a while before we have them fixed (or the warnings disabled if
that is what it takes), so the config option allows for that situation.

Hopefully this will mean that I get fewer pull requests that have new
warnings that were not noticed by various automation we have in place.

Knock wood.

Signed-off-by: Linus Torvalds <[email protected]>
3 years agoMerge tag 'usb-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 5 Sep 2021 18:19:15 +0000 (11:19 -0700)]
Merge tag 'usb-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull more USB updates from Greg KH:
 "Here are some straggler USB-serial changes for 5.15-rc1.

  These were not included in the first pull request as they came in
  "late" from Johan and I had missed them in my pull request earlier
  this week.

  Nothing big in here, just some USB to serial driver updates and fixes.
  All of these were in linux-next before I pulled them into my tree, and
  have been in linux-next all this week from my tree with no reported
  problems"

* tag 'usb-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: pl2303: fix GL type detection
  USB: serial: replace symbolic permissions by octal permissions
  USB: serial: cp210x: determine fw version for CP2105 and CP2108
  USB: serial: cp210x: clean up type detection
  USB: serial: cp210x: clean up set-chars request
  USB: serial: cp210x: clean up control-request timeout
  USB: serial: cp210x: fix flow-control error handling
  USB: serial: cp210x: fix control-characters error handling
  USB: serial: io_edgeport: drop unused descriptor helper

3 years agonet: dsa: b53: Fix IMP port setup on BCM5301x
Rafał Miłecki [Sun, 5 Sep 2021 17:23:28 +0000 (19:23 +0200)]
net: dsa: b53: Fix IMP port setup on BCM5301x

Broadcom's b53 switches have one IMP (Inband Management Port) that needs
to be programmed using its own designed register. IMP port may be
different than CPU port - especially on devices with multiple CPU ports.

For that reason it's required to explicitly note IMP port index and
check for it when choosing a register to use.

This commit fixes BCM5301x support. Those switches use CPU port 5 while
their IMP port is 8. Before this patch b53 was trying to program port 5
with B53_PORT_OVERRIDE_CTRL instead of B53_GMII_PORT_OVERRIDE_CTRL(5).

It may be possible to also replace "cpu_port" usages with
dsa_is_cpu_port() but that is out of the scope of thix BCM5301x fix.

Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Rafał Miłecki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agoip_gre: validate csum_start only on pull
Willem de Bruijn [Sun, 5 Sep 2021 15:21:09 +0000 (11:21 -0400)]
ip_gre: validate csum_start only on pull

The GRE tunnel device can pull existing outer headers in ipge_xmit.
This is a rare path, apparently unique to this device. The below
commit ensured that pulling does not move skb->data beyond csum_start.

But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and
thus csum_start is irrelevant.

Refine to exclude this. At the same time simplify and strengthen the
test.

Simplify, by moving the check next to the offending pull, making it
more self documenting and removing an unnecessary branch from other
code paths.

Strengthen, by also ensuring that the transport header is correct and
therefore the inner headers will be after skb_reset_inner_headers.
The transport header is set to csum_start in skb_partial_csum_set.

Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
Reported-by: Ido Schimmel <[email protected]>
Suggested-by: Alexander Duyck <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agoMerge tag 'mtd/for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Linus Torvalds [Sun, 5 Sep 2021 17:50:12 +0000 (10:50 -0700)]
Merge tag 'mtd/for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD updates from Miquel Raynal:
 "MTD changes:
   - blkdevs:
       - Simplify the refcounting in blktrans_{open, release}
       - Simplify blktrans_getgeo
       - Remove blktrans_ref_mutex
       - Simplify blktrans_dev_get
       - Use lockdep_assert_held
       - Don't hold del_mtd_blktrans_dev in blktrans_{open, release}
   - ftl:
       - Don't cast away the type when calling add_mtd_blktrans_dev
       - Don't cast away the type when calling add_mtd_blktrans_dev
       - Use container_of() rather than cast
       - Fix use-after-free
       - Add discard support
       - Allow use of MTD_RAM for testing purposes
   - concat:
       - Check _read, _write callbacks existence before assignment
       - Judge callback existence based on the master
   - maps:
       - Maps: remove dead MTD map driver for PMC-Sierra MSP boards
   - mtdblock:
       - Warn if added for a NAND device
       - Add comment about UBI block devices
       - Update old JFFS2 mention in Kconfig
   - partitions:
       - Redboot: convert to YAML

  NAND core changes:
   - Repair Miquel Raynal's email address in MAINTAINERS
   - Fix a couple of spelling mistakes in Kconfig
   - bbt: Skip bad blocks when searching for the BBT in NAND
   - Remove never changed ret variable

  Raw NAND changes:
   - cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
   - intel: Fix error handling in probe
   - omap: Fix kernel doc warning on 'calcuate' typo
   - gpmc: Fix the ECC bytes vs. OOB bytes equation

  SPI-NAND core changes:
   - Properly fill the OOB area.
   - Fix comment

  SPI-NAND drivers changes:
   - macronix: Add Quad support for serial NAND flash"

* tag 'mtd/for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (30 commits)
  mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
  mtd_blkdevs: simplify the refcounting in blktrans_{open, release}
  mtd_blkdevs: simplify blktrans_getgeo
  mtd_blkdevs: remove blktrans_ref_mutex
  mtd_blkdevs: simplify blktrans_dev_get
  mtd/rfd_ftl: don't cast away the type when calling add_mtd_blktrans_dev
  mtd/ftl: don't cast away the type when calling add_mtd_blktrans_dev
  mtd_blkdevs: use lockdep_assert_held
  mtd_blkdevs: don't hold del_mtd_blktrans_dev in blktrans_{open, release}
  mtd: rawnand: intel: Fix error handling in probe
  mtd: mtdconcat: Check _read, _write callbacks existence before assignment
  mtd: mtdconcat: Judge callback existence based on the master
  mtd: maps: remove dead MTD map driver for PMC-Sierra MSP boards
  mtd: rfd_ftl: use container_of() rather than cast
  mtd: rfd_ftl: fix use-after-free
  mtd: rfd_ftl: add discard support
  mtd: rfd_ftl: allow use of MTD_RAM for testing purposes
  mtdblock: Warn if added for a NAND device
  mtd: spinand: macronix: Add Quad support for serial NAND flash
  mtdblock: Add comment about UBI block devices
  ...

3 years agobinfmt: a.out: Fix bogus semicolon
Geert Uytterhoeven [Sun, 5 Sep 2021 09:30:34 +0000 (11:30 +0200)]
binfmt: a.out: Fix bogus semicolon

    fs/binfmt_aout.c: In function ‘load_aout_library’:
    fs/binfmt_aout.c:311:27: error: expected ‘)’ before ‘;’ token
      311 |    MAP_FIXED | MAP_PRIVATE;
  |                           ^
    fs/binfmt_aout.c:309:10: error: too few arguments to function ‘vm_mmap’
      309 |  error = vm_mmap(file, start_addr, ex.a_text + ex.a_data,
  |          ^~~~~~~
    In file included from fs/binfmt_aout.c:12:
    include/linux/mm.h:2626:35: note: declared here
     2626 | extern unsigned long __must_check vm_mmap(struct file *, unsigned long,
  |                                   ^~~~~~~

Fix this by reverting the accidental replacement of a comma by a
semicolon.

Fixes: 42be8b42535183f8 ("binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()")
Reported-by: [email protected]
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
3 years agobonding: complain about missing route only once for A/B ARP probes
David Decotigny [Sat, 4 Sep 2021 06:31:29 +0000 (23:31 -0700)]
bonding: complain about missing route only once for A/B ARP probes

On configs where there is no confirgured direct route to the target of
the ARP probes, these probes are still sent and may be replied to
properly, so no need to repeatedly complain about the missing route.

Signed-off-by: David Decotigny <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agoip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address
Antonio Quartulli [Fri, 3 Sep 2021 16:58:42 +0000 (18:58 +0200)]
ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address

GRE interfaces are not Ether-like and therefore it is not
possible to generate the v6LL address the same way as (for example)
GRETAP devices.

With default settings, a GRE interface will attempt generating its v6LL
address using the EUI64 approach, but this will fail when the local
endpoint of the GRE tunnel is set to "any". In this case the GRE
interface will end up with no v6LL address, thus violating RFC4291.

SIT interfaces already implement a different logic to ensure that a v6LL
address is always computed.

Change the GRE v6LL generation logic to follow the same approach as SIT.
This way GRE interfaces will always have a v6LL address as well.

Behaviour of GRETAP interfaces has not been changed as they behave like
classic Ether-like interfaces.

To avoid code duplication sit_add_v4_addrs() has been renamed to
add_v4_addrs() and adapted to handle also the IP6GRE/GRE cases.

Signed-off-by: Antonio Quartulli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agonet: stmmac: Fix overall budget calculation for rxtx_napi
Song Yoong Siang [Fri, 3 Sep 2021 02:00:26 +0000 (10:00 +0800)]
net: stmmac: Fix overall budget calculation for rxtx_napi

tx_done is not used for napi_complete_done(). Thus, NAPI busy polling
mechanism by gro_flush_timeout and napi_defer_hard_irqs will not able
be triggered after a packet is transmitted when there is no receive
packet.

Fix this by taking the maximum value between tx_done and rx_done as
overall budget completed by the rxtx NAPI poll to ensure XDP Tx ZC
operation is continuously polling for next Tx frame. This gives
benefit of lower packet submission processing latency and jitter
under XDP Tx ZC mode.

Performance of tx-only using xdp-sock on Intel ADL-S platform is
the same with and without this patch.

root@intel-corei7-64:~# ./xdpsock -i enp0s30f4 -t -z -q 1 -n 10
 sock0@enp0s30f4:1 txonly xdp-drv
                   pps            pkts           10.00
rx                 0              0
tx                 511630         8659520

 sock0@enp0s30f4:1 txonly xdp-drv
                   pps            pkts           10.00
rx                 0              0
tx                 511625         13775808

 sock0@enp0s30f4:1 txonly xdp-drv
                   pps            pkts           10.00
rx                 0              0
tx                 511619         18892032

Fixes: 132c32ee5bc0 ("net: stmmac: Add TX via XDP zero-copy socket")
Cc: <[email protected]> # 5.13.x
Co-developed-by: Ong Boon Leong <[email protected]>
Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: Song Yoong Siang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agoiwlwifi: pnvm: Fix a memory leak in 'iwl_pnvm_get_from_fs()'
Christophe JAILLET [Thu, 2 Sep 2021 20:38:11 +0000 (22:38 +0200)]
iwlwifi: pnvm: Fix a memory leak in 'iwl_pnvm_get_from_fs()'

A firmware is requested but never released in this function. This leads to
a memory leak in the normal execution path.

Add the missing 'release_firmware()' call.
Also introduce a temp variable (new_len) in order to keep the value of
'pnvm->size' after the firmware has been released.

Fixes: cdda18fbbefa ("iwlwifi: pnvm: move file loading code to a separate function")
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Dan Carpenter <[email protected]>
Acked-by: Luca Coelho <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/1b5d80f54c1dbf85710fd285243932943b498fe7.1630614969.git.christophe.jaillet@wanadoo.fr
3 years agonet/9p: increase default msize to 128k
Christian Schoenebeck [Sat, 4 Sep 2021 15:12:51 +0000 (17:12 +0200)]
net/9p: increase default msize to 128k

Let's raise the default msize value to 128k.

The 'msize' option defines the maximum message size allowed for any
message being transmitted (in both directions) between 9p server and 9p
client during a 9p session.

Currently the default 'msize' is just 8k, which is way too conservative.
Such a small 'msize' value has quite a negative performance impact,
because individual 9p messages have to be split up far too often into
numerous smaller messages to fit into this message size limitation.

A default value of just 8k also has a much higher probablity of hitting
short-read issues like: https://gitlab.com/qemu-project/qemu/-/issues/409

Unfortunately user feedback showed that many 9p users are not aware that
this option even exists, nor the negative impact it might have if it is
too low.

Link: http://lkml.kernel.org/r/61ea0f0faaaaf26dd3c762eabe4420306ced21b9.1630770829.git.linux_oss@crudebyte.com
Link: https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html
Signed-off-by: Christian Schoenebeck <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
3 years agonet/9p: use macro to define default msize
Christian Schoenebeck [Sat, 4 Sep 2021 15:07:12 +0000 (17:07 +0200)]
net/9p: use macro to define default msize

Use a macro to define the default value for the 'msize' option
at one place instead of using two separate integer literals.

Link: http://lkml.kernel.org/r/28bb651ae0349a7d57e8ddc92c1bd5e62924a912.1630770829.git.linux_oss@crudebyte.com
Signed-off-by: Christian Schoenebeck <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
3 years agonet/9p: increase tcp max msize to 1MB
Dominique Martinet [Sat, 4 Sep 2021 23:29:22 +0000 (08:29 +0900)]
net/9p: increase tcp max msize to 1MB

Historically TCP has been limited to 64K buffers, but increasing
msize provides huge performance benefits especially as latency
increase so allow for bigger buffers.

Ideally further improvements could change the allocation from the
current contiguous chunk in slab (kmem_cache) to some scatter-gather
compatible API...

Note this only increases the max possible setting, not the default
value.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dominique Martinet <[email protected]>
3 years agoNTB: perf: Fix an error code in perf_setup_inbuf()
Yang Li [Mon, 7 Jun 2021 08:40:36 +0000 (16:40 +0800)]
NTB: perf: Fix an error code in perf_setup_inbuf()

When the function IS_ALIGNED() returns false, the value of ret is 0.
So, we set ret to -EINVAL to indicate this error.

Clean up smatch warning:
drivers/ntb/test/ntb_perf.c:602 perf_setup_inbuf() warn: missing error
code 'ret'.

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
This page took 0.126852 seconds and 4 git commands to generate.