Git Repo - linux.git/log
3 years ago KVM: selftests: Use perf_test_destroy_vm in memslot_modification_stress_test
David Matlack [Thu, 11 Nov 2021 00:12:57 +0000 (00:12 +0000)]
KVM: selftests: Use perf_test_destroy_vm in memslot_modification_stress_test

Change memslot_modification_stress_test to use perf_test_destroy_vm
instead of manually calling ucall_uninit and kvm_vm_free.

No functional change intended.

Signed-off-by: David Matlack <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Message-Id: <20211111001257.1446428[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Wait for all vCPU to be created before entering guest mode
David Matlack [Thu, 11 Nov 2021 00:12:56 +0000 (00:12 +0000)]
KVM: selftests: Wait for all vCPU to be created before entering guest mode

Thread creation requires taking the mmap_sem in write mode, which causes
vCPU threads running in guest mode to block while they are populating
memory. Fix this by waiting for all vCPU threads to be created and start
running before entering guest mode on any one vCPU thread.

This substantially improves the "Populate memory time" when using 1GiB
pages since it allows all vCPUs to zero pages in parallel rather than
blocking because a writer is waiting (which is waiting for another vCPU
that is busy zeroing a 1GiB page).

Before:

  $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
  ...
  Populate memory time: 52.811184013s

After:

  $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
  ...
  Populate memory time: 10.204573342s
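
The gating idea can be sketched in plain userspace C with pthreads. This is
only an illustration under stated assumptions, not the selftest code: the
names nr_started, all_started, worker and run_workers are hypothetical, and
the busy-wait with sched_yield() stands in for whatever synchronization the
real helper uses.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_WORKERS 4

static atomic_int nr_started;   /* workers that have started running */
static atomic_bool all_started; /* gate opened by the creating thread */

static void *worker(void *arg)
{
	(void)arg;

	/* Report in, then hold off until every thread exists. */
	atomic_fetch_add(&nr_started, 1);
	while (!atomic_load(&all_started))
		sched_yield();

	/* Stand-in for entering guest mode and populating memory. */
	return NULL;
}

/* Returns the number of workers that ran, or -1 if creation failed. */
static int run_workers(void)
{
	pthread_t threads[NR_WORKERS];
	int created = 0, i;

	atomic_store(&nr_started, 0);
	atomic_store(&all_started, false);

	/* Thread creation happens while no worker is doing real work yet. */
	while (created < NR_WORKERS &&
	       pthread_create(&threads[created], NULL, worker, NULL) == 0)
		created++;

	/* Wait until every created worker reports in, then open the gate. */
	while (atomic_load(&nr_started) < created)
		sched_yield();
	atomic_store(&all_started, true);

	for (i = 0; i < created; i++)
		pthread_join(threads[i], NULL);

	return created == NR_WORKERS ? atomic_load(&nr_started) : -1;
}
```

Because the gate opens only after the last pthread_create() returns, no
worker is in its hot path while another thread is still being created.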

Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111001257.1446428[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Move vCPU thread creation and joining to common helpers
David Matlack [Thu, 11 Nov 2021 00:12:55 +0000 (00:12 +0000)]
KVM: selftests: Move vCPU thread creation and joining to common helpers

Move vCPU thread creation and joining to common helper functions. This
is in preparation for the next commit which ensures that all vCPU
threads are fully created before entering guest mode on any one
vCPU.

No functional change intended.

Signed-off-by: David Matlack <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Message-Id: <20211111001257.1446428[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Start at iteration 0 instead of -1
David Matlack [Thu, 11 Nov 2021 00:12:54 +0000 (00:12 +0000)]
KVM: selftests: Start at iteration 0 instead of -1

Start at iteration 0 instead of -1 to avoid having to initialize
vcpu_last_completed_iteration when setting up vCPU threads. This
simplifies the next commit where we move vCPU thread initialization
out to a common helper.

No functional change intended.

Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111001257.1446428[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Sync perf_test_args to guest during VM creation
Sean Christopherson [Thu, 11 Nov 2021 00:03:10 +0000 (00:03 +0000)]
KVM: selftests: Sync perf_test_args to guest during VM creation

Copy perf_test_args to the guest during VM creation instead of relying on
the caller to do so at their leisure.  Ideally, tests wouldn't even be
able to modify perf_test_args, i.e. they would have no motivation to do
the sync, but enforcing that is arguably a net negative for readability.

No functional change intended.

[Set wr_fract=1 by default and add helper to override it since the new
 access_tracking_perf_test needs to set it dynamically.]

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Fill per-vCPU struct during "perf_test" VM creation
Sean Christopherson [Thu, 11 Nov 2021 00:03:09 +0000 (00:03 +0000)]
KVM: selftests: Fill per-vCPU struct during "perf_test" VM creation

Fill the per-vCPU args when creating the perf_test VM instead of having
the caller do so.  This helps ensure that any adjustments to the number
of pages (and thus vcpu_memory_bytes) are reflected in the per-VM args.
Automatically filling the per-vCPU args will also allow a future patch
to do the sync to the guest during creation.

Signed-off-by: Sean Christopherson <[email protected]>
[Updated access_tracking_perf_test as well.]
Signed-off-by: David Matlack <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Create VM with adjusted number of guest pages for perf tests
Sean Christopherson [Thu, 11 Nov 2021 00:03:08 +0000 (00:03 +0000)]
KVM: selftests: Create VM with adjusted number of guest pages for perf tests

Use the already computed guest_num_pages when creating the so-called
extra VM pages for a perf test, and add a comment explaining why the
pages are allocated as extra pages.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Remove perf_test_args.host_page_size
Sean Christopherson [Thu, 11 Nov 2021 00:03:07 +0000 (00:03 +0000)]
KVM: selftests: Remove perf_test_args.host_page_size

Remove perf_test_args.host_page_size and instead use getpagesize() so
that it's somewhat obvious that, for tests that care about the host page
size, they care about the system page size, not the hardware page size,
e.g. that the logic is unchanged if hugepages are in play.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Move per-VM GPA into perf_test_args
Sean Christopherson [Thu, 11 Nov 2021 00:03:06 +0000 (00:03 +0000)]
KVM: selftests: Move per-VM GPA into perf_test_args

Move the per-VM GPA into perf_test_args instead of storing it as a
separate global variable.  It's not obvious that guest_test_phys_mem
holds a GPA, nor that it's connected/coupled with per_vcpu->gpa.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Use perf util's per-vCPU GPA/pages in demand paging test
Sean Christopherson [Thu, 11 Nov 2021 00:03:05 +0000 (00:03 +0000)]
KVM: selftests: Use perf util's per-vCPU GPA/pages in demand paging test

Grab the per-vCPU GPA and number of pages from perf_util in the demand
paging test instead of duplicating perf_util's calculations.

Note, this may or may not result in a functional change.  It's not clear
that the test's calculations are guaranteed to yield the same value as
perf_util, e.g. if guest_percpu_mem_size != vcpu_args->pages.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Capture per-vCPU GPA in perf_test_vcpu_args
Sean Christopherson [Thu, 11 Nov 2021 00:03:04 +0000 (00:03 +0000)]
KVM: selftests: Capture per-vCPU GPA in perf_test_vcpu_args

Capture the per-vCPU GPA in perf_test_vcpu_args so that tests can get
the GPA without having to calculate the GPA on their own.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Use shorthand local var to access struct perf_tests_args
Sean Christopherson [Thu, 11 Nov 2021 00:03:03 +0000 (00:03 +0000)]
KVM: selftests: Use shorthand local var to access struct perf_tests_args

Use 'pta' as a local pointer to the global perf_tests_args in order to
shorten line lengths and make the code borderline readable.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Require GPA to be aligned when backed by hugepages
Sean Christopherson [Thu, 11 Nov 2021 00:03:02 +0000 (00:03 +0000)]
KVM: selftests: Require GPA to be aligned when backed by hugepages

Assert that the GPA for a memslot backed by a hugepage is aligned to
the hugepage size and fix perf_test_util accordingly.  Lack of GPA
alignment prevents KVM from backing the guest with hugepages, e.g. x86's
write-protection of hugepages when dirty logging is activated is
otherwise not exercised.

Add a comment explaining that guest_page_size is for non-huge pages to
try and avoid confusion about what it actually tracks.

Cc: Ben Gardon <[email protected]>
Cc: Yanan Wang <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Aaron Lewis <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
[Used get_backing_src_pagesz() to determine alignment dynamically.]
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Assert mmap HVA is aligned when using HugeTLB
Sean Christopherson [Thu, 11 Nov 2021 00:03:01 +0000 (00:03 +0000)]
KVM: selftests: Assert mmap HVA is aligned when using HugeTLB

Manually padding and aligning the mmap region is only needed when using
THP. When using HugeTLB, mmap will always return an address aligned to
the HugeTLB page size. Add a comment to clarify this and assert the mmap
behavior for HugeTLB.
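
A rough userspace sketch of the "pad and align" approach that is still
needed for THP follows. The struct and function names (aligned_region,
map_aligned) are hypothetical and this is not the selftest library code; it
only shows why the mapping is over-allocated by one alignment unit.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

struct aligned_region {
	void *mmap_start;  /* what mmap() returned */
	size_t mmap_size;  /* padded size, needed for munmap() */
	void *host_mem;    /* aligned start actually used */
};

/* alignment must be a power of two, e.g. the THP size. */
static int map_aligned(struct aligned_region *r, size_t size, size_t alignment)
{
	uintptr_t addr;

	/* Pad by one alignment unit so an aligned start always fits. */
	r->mmap_size = size + alignment;
	r->mmap_start = mmap(NULL, r->mmap_size, PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (r->mmap_start == MAP_FAILED)
		return -1;

	/* Round the returned address up to the requested alignment. */
	addr = (uintptr_t)r->mmap_start;
	addr = (addr + alignment - 1) & ~((uintptr_t)alignment - 1);
	r->host_mem = (void *)addr;
	return 0;
}
```

With HugeTLB the padding step would be unnecessary, since the commit states
that mmap already returns an address aligned to the HugeTLB page size.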

[Removed requirement that HugeTLB mmaps must be padded per Yanan's
 feedback and added assertion that mmap returns aligned addresses
 when using HugeTLB.]

Cc: Ben Gardon <[email protected]>
Cc: Yanan Wang <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Aaron Lewis <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Expose align() helpers to tests
Sean Christopherson [Thu, 11 Nov 2021 00:03:00 +0000 (00:03 +0000)]
KVM: selftests: Expose align() helpers to tests

Refactor align() to work with non-pointers and split into separate
helpers for aligning up vs. down. Add align_ptr_up() for use with
pointers. Expose all helpers so that they can be used by tests and/or
other utilities.  The align_down() helper in particular will be used to
ensure gpa alignment for hugepages.
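
For illustration, helpers of this shape could look like the sketch below.
The names align_up(), align_down() and align_ptr_up() come from the commit
message; the bodies, the uint64_t types and the power-of-two assertion are
assumptions made for this example, not a copy of the selftest header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x down to a multiple of size (size must be a power of two). */
static inline uint64_t align_down(uint64_t x, uint64_t size)
{
	uint64_t mask = size - 1;

	assert(size != 0 && (size & mask) == 0);
	return x & ~mask;
}

/* Round x up to a multiple of size (size must be a power of two). */
static inline uint64_t align_up(uint64_t x, uint64_t size)
{
	uint64_t mask = size - 1;

	assert(size != 0 && (size & mask) == 0);
	return (x + mask) & ~mask;
}

/* Pointer flavor of align_up(). */
static inline void *align_ptr_up(void *x, size_t size)
{
	return (void *)(uintptr_t)align_up((uintptr_t)x, size);
}
```

For example, align_down(0x1234, 0x1000) yields 0x1000 and
align_up(0x1234, 0x1000) yields 0x2000; an already aligned value is
returned unchanged by both.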

No functional change intended.

[Added separate up/down helpers and replaced open-coded alignment
 bit math throughout the KVM selftests.]

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Explicitly state indicies for vm_guest_mode_params array
Sean Christopherson [Thu, 11 Nov 2021 00:02:59 +0000 (00:02 +0000)]
KVM: selftests: Explicitly state indicies for vm_guest_mode_params array

Explicitly state the indices when populating vm_guest_mode_params to
make it marginally easier to visualize what's going on.
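
The technique is C designated initializers keyed by enum value. The sketch
below uses made-up mode names and made-up parameter values purely for
illustration; it is not the real vm_guest_mode_params table.

```c
#include <assert.h>

/* Hypothetical modes; the real enum lives in the selftest headers. */
enum guest_mode { MODE_A, MODE_B, MODE_C, NUM_MODES };

struct mode_params {
	unsigned int pa_bits;
	unsigned int va_bits;
	unsigned int page_shift;
};

/* Each entry names its index, so reordering the enum cannot silently
 * pair a mode with the wrong parameters. Values are illustrative only. */
static const struct mode_params mode_params[NUM_MODES] = {
	[MODE_A] = { 52, 48, 12 },
	[MODE_B] = { 48, 48, 12 },
	[MODE_C] = { 40, 48, 16 },
};

static_assert(sizeof(mode_params) / sizeof(mode_params[0]) == NUM_MODES,
	      "one entry per mode");
```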

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
[Added indices for new guest modes.]
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20211111000310.1435032[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago KVM: selftests: Add event channel upcall support to xen_shinfo_test
David Woodhouse [Mon, 15 Nov 2021 16:50:22 +0000 (16:50 +0000)]
KVM: selftests: Add event channel upcall support to xen_shinfo_test

When I first looked at this, there was no support for guest exception
handling in the KVM selftests. In fact it was merged into 5.10 before
the Xen support got merged in 5.11, and I could have used it from the
start.

Hook it up now, to exercise the Xen upcall delivery. I'm about to make
things a bit more interesting by handling the full 2level event channel
stuff in-kernel on top of the basic vector injection that we already
have, and I'll want to build more tests on top.

Signed-off-by: David Woodhouse <[email protected]>
Message-Id: <20211115165030[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years ago udp: Validate checksum in udp_read_sock()
Cong Wang [Mon, 15 Nov 2021 04:40:06 +0000 (20:40 -0800)]
udp: Validate checksum in udp_read_sock()

It turns out the skbs in the sock receive queue could have bad checksums,
which is why both ->poll() and ->recvmsg() validate checksums. We have to do
the same for the ->read_sock() path too, before they are redirected in sockmap.

Fixes: d7f571188ecf ("udp: Implement ->read_sock() for sockmap")
Reported-by: Daniel Borkmann <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
3 years ago mips: lantiq: add support for clk_get_parent()
Randy Dunlap [Mon, 15 Nov 2021 01:20:51 +0000 (17:20 -0800)]
mips: lantiq: add support for clk_get_parent()

Provide a simple implementation of clk_get_parent() in the
lantiq subarch so that callers of it will build without errors.
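
As a sketch of what a "simple implementation" can mean here, the stub
below just reports that no parent clock is known. This is an assumption
about the shape of such a stub, compiled in userspace with a stand-in
struct clk; it is not quoted from the patch.

```c
#include <stddef.h>

/* Stand-in for the kernel's opaque struct clk (hypothetical layout). */
struct clk {
	unsigned long rate;
};

/* Minimal stub: satisfies the linker and tells callers that the
 * platform does not track a parent for this clock. */
struct clk *clk_get_parent(struct clk *clk)
{
	(void)clk;
	return NULL;
}
```

Callers then need to treat a NULL parent as "no parent available".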

Fixes this build error:
ERROR: modpost: "clk_get_parent" [drivers/iio/adc/ingenic-adc.ko] undefined!

Fixes: 171bb2f19ed6 ("MIPS: Lantiq: Add initial support for Lantiq SoCs")
Signed-off-by: Randy Dunlap <[email protected]>
Suggested-by: Russell King (Oracle) <[email protected]>
Cc: [email protected]
Cc: John Crispin <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: [email protected]
Cc: Russell King <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
Acked-by: John Crispin <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
3 years ago mips: bcm63xx: add support for clk_get_parent()
Randy Dunlap [Mon, 15 Nov 2021 00:42:18 +0000 (16:42 -0800)]
mips: bcm63xx: add support for clk_get_parent()

BCM63XX selects HAVE_LEGACY_CLK but does not provide/support
clk_get_parent(), so add a simple implementation of that
function so that callers of it will build without errors.

Fixes these build errors:

mips-linux-ld: drivers/iio/adc/ingenic-adc.o: in function `jz4770_adc_init_clk_div':
ingenic-adc.c:(.text+0xe4): undefined reference to `clk_get_parent'
mips-linux-ld: drivers/iio/adc/ingenic-adc.o: in function `jz4725b_adc_init_clk_div':
ingenic-adc.c:(.text+0x1b8): undefined reference to `clk_get_parent'

Fixes: e7300d04bd08 ("MIPS: BCM63xx: Add support for the Broadcom BCM63xx family of SOCs.")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kernel test robot <[email protected]>
Suggested-by: Russell King (Oracle) <[email protected]>
Cc: Artur Rojek <[email protected]>
Cc: Paul Cercueil <[email protected]>
Cc: [email protected]
Cc: Jonathan Cameron <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: [email protected]
Cc: Florian Fainelli <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Russell King <[email protected]>
Cc: [email protected]
Cc: Jonas Gorski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
Acked-by: Russell King (Oracle) <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
3 years ago MIPS: generic/yamon-dt: fix uninitialized variable error
Colin Ian King [Wed, 10 Nov 2021 23:28:24 +0000 (23:28 +0000)]
MIPS: generic/yamon-dt: fix uninitialized variable error

In the case where fw_getenv returns an error when fetching values
for ememsize and memsize, the variable phys_memsize is not assigned
a value and will be uninitialized on the zero check of phys_memsize.
Fix this by initializing phys_memsize to zero.
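
The pattern can be sketched in userspace as follows. The lookup callbacks
(env_missing, env_memsize_only), the read_phys_memsize() wrapper and the
FALLBACK_MEMSIZE default are hypothetical stand-ins for fw_getenv() and the
surrounding kernel code, chosen only to make the example self-contained.

```c
#include <stdlib.h>
#include <string.h>

#define FALLBACK_MEMSIZE (32UL << 20) /* hypothetical default */

typedef const char *(*env_lookup_fn)(const char *name);

/* Firmware environment with neither variable set. */
static const char *env_missing(const char *name)
{
	(void)name;
	return NULL;
}

/* Firmware environment that only provides "memsize". */
static const char *env_memsize_only(const char *name)
{
	return strcmp(name, "memsize") == 0 ? "0x1000" : NULL;
}

static unsigned long read_phys_memsize(env_lookup_fn lookup)
{
	unsigned long phys_memsize = 0; /* defined even if both lookups fail */
	const char *var;

	var = lookup("ememsize");
	if (var)
		phys_memsize = strtoul(var, NULL, 0);

	if (!phys_memsize) {
		var = lookup("memsize");
		if (var)
			phys_memsize = strtoul(var, NULL, 0);
	}

	/* Without the initializer this check would read garbage. */
	if (!phys_memsize)
		phys_memsize = FALLBACK_MEMSIZE;

	return phys_memsize;
}
```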

Cleans up cppcheck error:
arch/mips/generic/yamon-dt.c:100:7: error: Uninitialized variable: phys_memsize [uninitvar]

Fixes: f41d2430bbd6 ("MIPS: generic/yamon-dt: Support > 256MB of RAM")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
3 years ago MIPS: syscalls: Wire up futex_waitv syscall
Wang Haojun [Wed, 3 Nov 2021 02:55:21 +0000 (10:55 +0800)]
MIPS: syscalls: Wire up futex_waitv syscall

Wire up the futex_waitv syscall.

Fixes this build warning: #warning syscall futex_waitv not implemented [-Wcpp]

Signed-off-by: Wang Haojun <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
3 years ago bpf: Fix toctou on read-only map's constant scalar tracking
Daniel Borkmann [Tue, 9 Nov 2021 18:48:08 +0000 (18:48 +0000)]
bpf: Fix toctou on read-only map's constant scalar tracking

Commit a23740ec43ba ("bpf: Track contents of read-only maps as scalars") is
checking whether maps are read-only both from BPF program side and user space
side, and then, given their content is constant, reading out their data via
map->ops->map_direct_value_addr() which is then subsequently used as known
scalar value for the register, that is, it is marked as __mark_reg_known()
with the read value at verification time. Before a23740ec43ba, the register
content was marked as an unknown scalar so the verifier could not make any
assumptions about the map content.

The current implementation however is prone to a TOCTOU race, meaning, the
value read as known scalar for the register is not guaranteed to be exactly
the same at a later point when the program is executed, and as such, the
prior made assumptions of the verifier with regards to the program will be
invalid which can cause issues such as OOB access, etc.

While the BPF_F_RDONLY_PROG map flag is always fixed and required to be
specified at map creation time, the map->frozen property is initially set to
false for the map given the map value needs to be populated, e.g. for global
data sections. Once complete, the loader "freezes" the map from user space
such that no subsequent updates/deletes are possible anymore. For the rest
of the lifetime of the map, this freeze one-time trigger cannot be undone
anymore after a successful BPF_MAP_FREEZE cmd return. Meaning, any new BPF_*
cmd calls which would update/delete map entries will be rejected with -EPERM
since map_get_sys_perms() removes the FMODE_CAN_WRITE permission. This also
means that pending update/delete map entries must still complete before this
guarantee is given. This corner case is not an issue for loaders since they
create and prepare such program private map in successive steps.

However, a malicious user is able to trigger this TOCTOU race in two different
ways: i) via userfaultfd, and ii) via batched updates. For i) userfaultfd is
used to expand the competition interval, so that map_update_elem() can modify
the contents of the map after map_freeze() and bpf_prog_load() were executed.
This works, because userfaultfd halts the parallel thread which triggered a
map_update_elem() at the time where we copy key/value from the user buffer and
this already passed the FMODE_CAN_WRITE capability test given at that time the
map was not "frozen". Then, the main thread performs the map_freeze() and
bpf_prog_load(), and once that had completed successfully, the other thread
is woken up to complete the pending map_update_elem() which then changes the
map content. For ii) the idea of the batched update is similar, meaning, when
there are a large number of updates to be processed, it can increase the
competition interval between the two. It is therefore possible in practice to
modify the contents of the map after executing map_freeze() and bpf_prog_load().

One way to fix both i) and ii) at the same time is to expand the use of the
map's map->writecnt. The latter was introduced in fc9702273e2e ("bpf: Add mmap()
support for BPF_MAP_TYPE_ARRAY") and further refined in 1f6cb19be2e2 ("bpf:
Prevent re-mmap()'ing BPF map as writable for initially r/o mapping") with
the rationale to make a writable mmap()'ing of a map mutually exclusive with
read-only freezing. The counter indicates writable mmap() mappings and then
prevents/fails the freeze operation. Its semantics can be expanded beyond
just mmap() by generally indicating ongoing write phases. This would essentially
span any parallel regular and batched flavor of update/delete operation and
then also have map_freeze() fail with -EBUSY. For the check_mem_access() in
the verifier we expand upon the bpf_map_is_rdonly() check ensuring that all
last pending writes have completed via bpf_map_write_active() test. Once the
map->frozen is set and bpf_map_write_active() indicates a map->writecnt of 0
only then we are really guaranteed to use the map's data as known constants.
For map->frozen being set and pending writes in process of still being completed
we fall back to marking that register as unknown scalar so we don't end up
making assumptions about it. With this, both TOCTOU reproducers from i) and
ii) are fixed.
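
The write-phase counter can be modeled in userspace with C11 atomics. The
sketch below is a toy model under stated assumptions: struct toy_map and
the function names are hypothetical, the freeze_mutex and the
BPF_F_RDONLY_PROG flag check are left out, and only the -EBUSY result of a
freeze with writes in flight is taken from the description above.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_map {
	atomic_llong writecnt; /* writable mmaps plus in-flight updates */
	bool frozen;
};

static void map_write_active_inc(struct toy_map *m)
{
	atomic_fetch_add(&m->writecnt, 1);
}

static void map_write_active_dec(struct toy_map *m)
{
	atomic_fetch_sub(&m->writecnt, 1);
}

static bool map_write_active(struct toy_map *m)
{
	return atomic_load(&m->writecnt) != 0;
}

/* Freezing fails while any write phase is still in progress. */
static int map_freeze(struct toy_map *m)
{
	if (map_write_active(m) || m->frozen)
		return -EBUSY;
	m->frozen = true;
	return 0;
}

/* Verifier-side test: contents count as constant only if the map is
 * frozen and no write is pending; otherwise treat them as unknown. */
static bool map_value_is_known_constant(struct toy_map *m)
{
	return m->frozen && !map_write_active(m);
}
```

An update path would bracket its work with map_write_active_inc() and
map_write_active_dec(), so a slow copy from user space keeps the counter
nonzero for its whole duration.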

Note that the map->writecnt has been converted into an atomic64 in the fix in
order to avoid a double freeze_mutex mutex_{un,}lock() pair when updating
map->writecnt in the various map update/delete BPF_* cmd flavors. Spanning
the freeze_mutex over entire map update/delete operations in syscall side
would not be possible due to then causing everything to be serialized.
Similarly, something like synchronize_rcu() after setting map->frozen to wait
for update/deletes to complete is not possible either since it would also
have to span the user copy which can sleep. On the libbpf side, this won't
break d66562fba1ce ("libbpf: Add BPF object skeleton support") as the
anonymous mmap()-ed "map initialization image" is remapped as a BPF map-backed
mmap()-ed memory where for .rodata it's non-writable.

Fixes: a23740ec43ba ("bpf: Track contents of read-only maps as scalars")
Reported-by: [email protected]
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
3 years ago samples/bpf: Fix build error due to -isystem removal
Alexander Lobakin [Mon, 15 Nov 2021 13:07:41 +0000 (14:07 +0100)]
samples/bpf: Fix build error due to -isystem removal

Since recent Kbuild updates we no longer include files from compiler
directories. However, samples/bpf/hbm_kern.h hasn't been tuned for
this (LLVM 13):

  CLANG-bpf  samples/bpf/hbm_out_kern.o
In file included from samples/bpf/hbm_out_kern.c:55:
samples/bpf/hbm_kern.h:12:10: fatal error: 'stddef.h' file not found
         ^~~~~~~~~~
1 error generated.
  CLANG-bpf  samples/bpf/hbm_edt_kern.o
In file included from samples/bpf/hbm_edt_kern.c:53:
samples/bpf/hbm_kern.h:12:10: fatal error: 'stddef.h' file not found
         ^~~~~~~~~~
1 error generated.

It is enough to just drop both stdbool.h and stddef.h from includes
to fix those.

Fixes: 04e85bbf71c9 ("isystem: delete global -isystem compile option")
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
3 years ago Merge branch 'Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs'
Alexei Starovoitov [Sun, 14 Nov 2021 18:33:25 +0000 (10:33 -0800)]
Merge branch 'Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs'

Dmitrii Banshchikov says:

====================

Various locking issues are possible with bpf_ktime_get_coarse_ns() and
bpf_timer_* set of helpers.

syzbot found a locking issue with bpf_ktime_get_coarse_ns() helper executed in
BPF_PROG_TYPE_PERF_EVENT prog type - [1]. The issue is possible because the
helper uses a non-fast version of the time accessor that isn't safe to call from every context.
The helper was added because it provided performance benefits in comparison to
bpf_ktime_get_ns() helper.

A similar locking issue is possible with bpf_timer_* set of helpers when used
in tracing progs.

The solution is to restrict use of the helpers in tracing progs.

In the [1] discussion it was stated that bpf_spin_lock related helpers shall
also be excluded for tracing progs. The verifier has a compatibility check
between a map and a program. If a tracing program tries to use a map whose
value contains struct bpf_spin_lock, the verifier rejects it, which is why
bpf_spin_lock is already restricted.

Patch 1 restricts helpers
Patch 2 adds tests

v1 -> v2:
 * Limit the helpers via func proto getters instead of allowed callback
 * Add note about helpers' restrictions to linux/bpf.h
 * Add Fixes tag
 * Remove extra \0 from btf_str_sec
 * Besides asm tests, add prog tests
 * Trim CC

1. https://lore.kernel.org/all/00000000000013aebd05cff8e064@google.com/
====================

Signed-off-by: Alexei Starovoitov <[email protected]>
3 years ago selftests/bpf: Add tests for restricted helpers
Dmitrii Banshchikov [Sat, 13 Nov 2021 14:22:27 +0000 (18:22 +0400)]
selftests/bpf: Add tests for restricted helpers

This patch adds tests that bpf_ktime_get_coarse_ns(), bpf_timer_* and
bpf_spin_lock()/bpf_spin_unlock() helpers are forbidden in tracing progs
as their use there may result in various locking issues.

Signed-off-by: Dmitrii Banshchikov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
3 years ago bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs
Dmitrii Banshchikov [Sat, 13 Nov 2021 14:22:26 +0000 (18:22 +0400)]
bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs

Use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in tracing
progs may result in locking issues.

bpf_ktime_get_coarse_ns() uses the ktime_get_coarse_ns() time accessor, which
isn't safe to call from every context:
======================================================
WARNING: possible circular locking dependency detected
5.15.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.4/14877 is trying to acquire lock:
ffffffff8cb30008 (tk_core.seq.seqcount){----}-{0:0}, at: ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255

but task is already holding lock:
ffffffff90dbf200 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_deactivate+0x61/0x400 lib/debugobjects.c:735

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&obj_hash[i].lock){-.-.}-{2:2}:
       lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0xd1/0x120 kernel/locking/spinlock.c:162
       __debug_object_init+0xd9/0x1860 lib/debugobjects.c:569
       debug_hrtimer_init kernel/time/hrtimer.c:414 [inline]
       debug_init kernel/time/hrtimer.c:468 [inline]
       hrtimer_init+0x20/0x40 kernel/time/hrtimer.c:1592
       ntp_init_cmos_sync kernel/time/ntp.c:676 [inline]
       ntp_init+0xa1/0xad kernel/time/ntp.c:1095
       timekeeping_init+0x512/0x6bf kernel/time/timekeeping.c:1639
       start_kernel+0x267/0x56e init/main.c:1030
       secondary_startup_64_no_verify+0xb1/0xbb

-> #0 (tk_core.seq.seqcount){----}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3051 [inline]
       check_prevs_add kernel/locking/lockdep.c:3174 [inline]
       validate_chain+0x1dfb/0x8240 kernel/locking/lockdep.c:3789
       __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5015
       lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625
       seqcount_lockdep_reader_access+0xfe/0x230 include/linux/seqlock.h:103
       ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255
       ktime_get_coarse include/linux/timekeeping.h:120 [inline]
       ktime_get_coarse_ns include/linux/timekeeping.h:126 [inline]
       ____bpf_ktime_get_coarse_ns kernel/bpf/helpers.c:173 [inline]
       bpf_ktime_get_coarse_ns+0x7e/0x130 kernel/bpf/helpers.c:171
       bpf_prog_a99735ebafdda2f1+0x10/0xb50
       bpf_dispatcher_nop_func include/linux/bpf.h:721 [inline]
       __bpf_prog_run include/linux/filter.h:626 [inline]
       bpf_prog_run include/linux/filter.h:633 [inline]
       BPF_PROG_RUN_ARRAY include/linux/bpf.h:1294 [inline]
       trace_call_bpf+0x2cf/0x5d0 kernel/trace/bpf_trace.c:127
       perf_trace_run_bpf_submit+0x7b/0x1d0 kernel/events/core.c:9708
       perf_trace_lock+0x37c/0x440 include/trace/events/lock.h:39
       trace_lock_release+0x128/0x150 include/trace/events/lock.h:58
       lock_release+0x82/0x810 kernel/locking/lockdep.c:5636
       __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:149 [inline]
       _raw_spin_unlock_irqrestore+0x75/0x130 kernel/locking/spinlock.c:194
       debug_hrtimer_deactivate kernel/time/hrtimer.c:425 [inline]
       debug_deactivate kernel/time/hrtimer.c:481 [inline]
       __run_hrtimer kernel/time/hrtimer.c:1653 [inline]
       __hrtimer_run_queues+0x2f9/0xa60 kernel/time/hrtimer.c:1749
       hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1811
       local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
       __sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1103
       sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
       __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
       _raw_spin_unlock_irqrestore+0xd4/0x130 kernel/locking/spinlock.c:194
       try_to_wake_up+0x702/0xd20 kernel/sched/core.c:4118
       wake_up_process kernel/sched/core.c:4200 [inline]
       wake_up_q+0x9a/0xf0 kernel/sched/core.c:953
       futex_wake+0x50f/0x5b0 kernel/futex/waitwake.c:184
       do_futex+0x367/0x560 kernel/futex/syscalls.c:127
       __do_sys_futex kernel/futex/syscalls.c:199 [inline]
       __se_sys_futex+0x401/0x4b0 kernel/futex/syscalls.c:180
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

There is a possible deadlock with bpf_timer_* set of helpers:
hrtimer_start()
  lock_base();
  trace_hrtimer...()
    perf_event()
      bpf_run()
        bpf_timer_start()
          hrtimer_start()
            lock_base()         <- DEADLOCK

Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in
BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT
and BPF_PROG_TYPE_RAW_TRACEPOINT prog types.
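
One way to picture the restriction is a per-helper lookup that returns no
prototype for tracing program types, so such programs fail to load. The
enums, struct func_proto and get_func_proto() below are hypothetical
simplifications for illustration, not the kernel's verifier interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

enum prog_type {
	PROG_SOCKET_FILTER,
	PROG_KPROBE,
	PROG_TRACEPOINT,
	PROG_PERF_EVENT,
	PROG_RAW_TRACEPOINT,
};

enum helper_id {
	HELPER_KTIME_GET_NS,
	HELPER_KTIME_GET_COARSE_NS,
	HELPER_TIMER_START,
};

struct func_proto {
	const char *name;
};

static const struct func_proto ktime_get_ns_proto = { "ktime_get_ns" };
static const struct func_proto ktime_get_coarse_ns_proto = { "ktime_get_coarse_ns" };
static const struct func_proto timer_start_proto = { "timer_start" };

static bool is_tracing_prog(enum prog_type type)
{
	return type == PROG_KPROBE || type == PROG_TRACEPOINT ||
	       type == PROG_PERF_EVENT || type == PROG_RAW_TRACEPOINT;
}

/* NULL means "helper not available for this program type". */
static const struct func_proto *get_func_proto(enum helper_id id,
						enum prog_type type)
{
	switch (id) {
	case HELPER_KTIME_GET_NS:
		return &ktime_get_ns_proto;
	case HELPER_KTIME_GET_COARSE_NS:
		return is_tracing_prog(type) ? NULL : &ktime_get_coarse_ns_proto;
	case HELPER_TIMER_START:
		return is_tracing_prog(type) ? NULL : &timer_start_proto;
	}
	return NULL;
}
```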

Fixes: d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper")
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Reported-by: [email protected]
Signed-off-by: Dmitrii Banshchikov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
3 years ago iavf: Restore VLAN filters after link down
Akeem G Abodunrin [Fri, 4 Jun 2021 16:53:27 +0000 (09:53 -0700)]
iavf: Restore VLAN filters after link down

Restore VLAN filters after the link is brought down and back up, since
all filters are deleted from HW during the netdev link down routine.

Fixes: ed1f5b58ea01 ("i40evf: remove VLAN filters on close")
Signed-off-by: Akeem G Abodunrin <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoiavf: Fix for setting queues to 0
Grzegorz Szczurek [Fri, 4 Jun 2021 16:49:00 +0000 (09:49 -0700)]
iavf: Fix for setting queues to 0

Setting combine to 0 is now rejected with the appropriate error code.
This is implemented by adding a check for combine being equal to zero.
Without this patch, when the user requested it, no error was returned
and combine was set to the default value for the VF.

Fixes: 5520deb15326 ("iavf: Enable support for up to 16 queues")
Signed-off-by: Grzegorz Szczurek <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoiavf: Fix for the false positive ASQ/ARQ errors while issuing VF reset
Surabhi Boob [Fri, 4 Jun 2021 16:48:59 +0000 (09:48 -0700)]
iavf: Fix for the false positive ASQ/ARQ errors while issuing VF reset

While issuing a VF reset from the guest OS, the VF driver prints
logs about critical / overflow error detection. This is not an
actual error: the VF_MBX_ARQLEN register is set to all FF's
for a short period of time, and the VF catches those bits if
it reads the register during that window.
This patch introduces an additional check to ignore this condition
while the VF is in reset.

Fixes: 19b73d8efaa4 ("i40evf: Add additional check for reset")
Signed-off-by: Surabhi Boob <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoiavf: validate pointers
Mitch Williams [Fri, 4 Jun 2021 16:48:58 +0000 (09:48 -0700)]
iavf: validate pointers

In some cases, the ethtool get_rxfh handler may be called with a null
key or indir parameter. So check these pointers, or you will have a very
bad day.

Fixes: 43a3d9ba34c9 ("i40evf: Allow PF driver to configure RSS")
Signed-off-by: Mitch Williams <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoiavf: prevent accidental free of filter structure
Jacob Keller [Fri, 4 Jun 2021 16:48:57 +0000 (09:48 -0700)]
iavf: prevent accidental free of filter structure

In iavf_config_clsflower, the filter structure could be accidentally
released at the end, if iavf_parse_cls_flower or iavf_handle_tclass ever
return a non-zero but positive value.

In this case, the function continues through to the end, and will call
kfree() on the filter structure even though it has been added to the
linked list.

This can actually happen because iavf_parse_cls_flower will return
a positive IAVF_ERR_CONFIG value instead of the traditional negative
error codes.

Fix this by ensuring that the kfree() check and error checks are
similar. Use the more idiomatic "if (err)" to catch all non-zero error
codes.
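
The difference between the two checks can be sketched in userspace as
follows. This is an illustration only: struct demo_filter,
DEMO_ERR_CONFIG and both functions are hypothetical, and setting "freed"
stands in for kfree() on the filter.

```c
#include <stdbool.h>

#define DEMO_ERR_CONFIG 4	/* hypothetical positive error, like IAVF_ERR_CONFIG */

struct demo_filter {
	bool on_list;	/* filter was added to the linked list */
	bool freed;	/* stands in for kfree(filter) */
};

/* Mismatched checks: "err < 0" early, "if (err)" at the end. */
void demo_config_buggy(struct demo_filter *f, int parse_ret)
{
	int err = parse_ret;

	if (err < 0)		/* misses positive error codes */
		goto out;
	f->on_list = true;
out:
	if (err)
		f->freed = true;	/* frees a filter that is still listed */
}

/* Matching checks: any non-zero value is treated as an error. */
void demo_config_fixed(struct demo_filter *f, int parse_ret)
{
	int err = parse_ret;

	if (err)
		goto out;
	f->on_list = true;
out:
	if (err)
		f->freed = true;
}
```

With a positive error code the first variant leaves the filter both on
the list and freed; the second never adds it to the list.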

Fixes: 0075fa0fadd0 ("i40evf: Add support to apply cloud filters")
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoiavf: Fix failure to exit out from last all-multicast mode
Piotr Marczak [Fri, 4 Jun 2021 16:48:56 +0000 (09:48 -0700)]
iavf: Fix failure to exit out from last all-multicast mode

The driver could only quit allmulti when allmulti and promisc modes were
turned on at the same time. If promisc had been off there was no way to
turn off allmulti mode.
The patch corrects this behavior. Switching allmulti does not depend on
the promisc mode state anymore.

Fixes: f42a5c74da99 ("i40e: Add allmulti support for the VF")
Signed-off-by: Piotr Marczak <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoiavf: don't clear a lock we don't hold
Nicholas Nunley [Fri, 4 Jun 2021 16:48:55 +0000 (09:48 -0700)]
iavf: don't clear a lock we don't hold

In iavf_configure_clsflower() the function will bail out if it is unable
to obtain the crit_section lock in a reasonable time. However, it will
clear the lock when exiting, so fix this.
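
A minimal userspace sketch of the rule "only clear a lock you acquired",
using a C11 atomic_flag in place of the driver's crit_section bit. The
function name and the retry limit are assumptions made for the example.

```c
#include <stdatomic.h>

/*
 * Try to take the flag a bounded number of times. On timeout, return
 * without touching the flag, because another context still owns it.
 */
int demo_configure(atomic_flag *crit_section, int max_tries)
{
	int tries = 0;

	while (atomic_flag_test_and_set(crit_section)) {
		if (++tries >= max_tries)
			return -1;	/* never acquired: do not clear */
	}

	/* ... protected work would happen here ... */

	atomic_flag_clear(crit_section);	/* we hold it, so we release it */
	return 0;
}
```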

Fixes: 640a8af5841f ("i40evf: Reorder configure_clsflower to avoid deadlock on error")
Signed-off-by: Nicholas Nunley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoiavf: free q_vectors before queues in iavf_disable_vf
Nicholas Nunley [Fri, 4 Jun 2021 16:48:54 +0000 (09:48 -0700)]
iavf: free q_vectors before queues in iavf_disable_vf

iavf_free_queues() clears adapter->num_active_queues, which
iavf_free_q_vectors() relies on, so swap the order of these two function
calls in iavf_disable_vf(). This resolves a panic encountered when the
interface is disabled and then later brought up again after PF
communication is restored.
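
The ordering dependency can be shown with a small model. The struct and
functions below are hypothetical simplifications: the only property
carried over from the description is that freeing the queues resets
num_active_queues, which the q_vector teardown reads.

```c
struct demo_adapter {
	int num_active_queues;
	int napi_deleted;	/* counts per-queue teardown steps performed */
};

/* Freeing the queues also resets the active queue count. */
void demo_free_queues(struct demo_adapter *a)
{
	a->num_active_queues = 0;
}

/* The q_vector teardown walks num_active_queues entries. */
void demo_free_q_vectors(struct demo_adapter *a)
{
	for (int i = 0; i < a->num_active_queues; i++)
		a->napi_deleted++;
}

/* Order after the fix: q_vectors first, then queues. */
void demo_disable_vf(struct demo_adapter *a)
{
	demo_free_q_vectors(a);
	demo_free_queues(a);
}
```

In the reverse order the loop in demo_free_q_vectors() sees a count of
zero and skips the teardown entirely.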

Fixes: 65c7006f234c ("i40evf: assign num_active_queues inside i40evf_alloc_queues")
Signed-off-by: Nicholas Nunley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoiavf: check for null in iavf_fix_features
Nicholas Nunley [Fri, 4 Jun 2021 16:48:53 +0000 (09:48 -0700)]
iavf: check for null in iavf_fix_features

If the driver has lost contact with the PF then it enters a disabled state
and frees adapter->vf_res. However, ndo_fix_features can still be called on
the interface, so we need to check for this condition first. Since we have
no information on the features at this time simply leave them unmodified
and return.

Fixes: c4445aedfe09 ("i40evf: Fix VLAN features")
Signed-off-by: Nicholas Nunley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoiavf: Fix return of set the new channel count
Mateusz Palczewski [Tue, 9 Feb 2021 11:59:38 +0000 (11:59 +0000)]
iavf: Fix return of set the new channel count

Fix the return code when setting a new channel count.
This is implemented by checking whether the reset completes in an
appropriate time. This gives the PF extra time to reset the VF when
the user wants to set a new channel count for all VFs.
Without this patch, a misleading return code may be reported to the
user and the VF reset may not be performed correctly by the PF.

Fixes: 5520deb15326 ("iavf: Enable support for up to 16 queues")
Signed-off-by: Grzegorz Szczurek <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
3 years agoNFSD: Fix exposure in nfsd4_decode_bitmap()
Chuck Lever [Sun, 14 Nov 2021 20:16:04 +0000 (15:16 -0500)]
NFSD: Fix exposure in nfsd4_decode_bitmap()

[email protected] reports:
> nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC
> directs it to do so. This can cause nfsd4_decode_state_protect4_a()
> to write client-supplied data beyond the end of
> nfsd4_exchange_id.spo_must_allow[] when called by
> nfsd4_decode_exchange_id().

Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond
@bmlen.
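
As an illustration of a decode loop that cannot run past the destination
array, consider the sketch below. It is not the code of this patch:
demo_decode_bitmap() and its parameters are hypothetical, and the wire
data is modelled as a plain array of host-order words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy at most bmlen words from the wire into bmval, regardless of the
 * client-supplied count. Unfilled destination words are zeroed.
 * Returns the number of words taken from the wire.
 */
size_t demo_decode_bitmap(const uint32_t *wire, size_t count,
			  uint32_t *bmval, size_t bmlen)
{
	size_t n = count < bmlen ? count : bmlen;
	size_t i;

	for (i = 0; i < n; i++)
		bmval[i] = wire[i];
	for (; i < bmlen; i++)
		bmval[i] = 0;

	return n;
}
```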

Reported-by: [email protected]
Fixes: d1c263a031e8 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()")
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
3 years agonet/smc: Make sure the link_id is unique
Wen Gu [Mon, 15 Nov 2021 09:45:07 +0000 (17:45 +0800)]
net/smc: Make sure the link_id is unique

The link_id is supposed to be unique, but smcr_next_link_id() doesn't
skip the used link_id as expected. So the patch fixes this.
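
A sketch of an allocator that really skips ids already in use. The types,
the DEMO_MAX_LINKS limit and the function names are hypothetical, and
treating zero as a reserved id is an assumption of this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_LINKS 3	/* hypothetical number of links per group */

struct demo_link {
	bool usable;
	uint8_t link_id;
};

struct demo_link_group {
	uint8_t next_link_id;
	struct demo_link lnk[DEMO_MAX_LINKS];
};

bool demo_id_in_use(const struct demo_link_group *lgr, uint8_t id)
{
	for (int i = 0; i < DEMO_MAX_LINKS; i++) {
		if (lgr->lnk[i].usable && lgr->lnk[i].link_id == id)
			return true;
	}
	return false;
}

/* Advance the counter until it lands on a non-zero id no usable link holds. */
uint8_t demo_next_link_id(struct demo_link_group *lgr)
{
	uint8_t id;

	do {
		id = ++lgr->next_link_id;
		if (!id)	/* assumption: zero is not a valid id */
			id = ++lgr->next_link_id;
	} while (demo_id_in_use(lgr, id));

	return id;
}
```

The retry has to restart the whole candidate selection; re-checking only
the remaining links for the same candidate would not skip anything.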

Fixes: 026c381fb477 ("net/smc: introduce link_idx for link group array")
Signed-off-by: Wen Gu <[email protected]>
Reviewed-by: Tony Lu <[email protected]>
Acked-by: Karsten Graul <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agosock: fix /proc/net/sockstat underflow in sk_clone_lock()
Tetsuo Handa [Mon, 15 Nov 2021 10:16:56 +0000 (19:16 +0900)]
sock: fix /proc/net/sockstat underflow in sk_clone_lock()

sk_clone_lock() needs to call sock_inuse_add(1) before entering the
sk_free_unlock_clone() error path, for __sk_free() from sk_free() from
sk_free_unlock_clone() calls sock_inuse_add(-1).

Signed-off-by: Tetsuo Handa <[email protected]>
Fixes: 648845ab7e200993 ("sock: Move the socket inuse to namespace.")
Signed-off-by: David S. Miller <[email protected]>
3 years agotipc: only accept encrypted MSG_CRYPTO msgs
Xin Long [Mon, 15 Nov 2021 12:45:24 +0000 (07:45 -0500)]
tipc: only accept encrypted MSG_CRYPTO msgs

The MSG_CRYPTO msgs are always encrypted and sent to other nodes
for key deployment. But if the receiving peers do not validate
that such a msg was actually encrypted, one could craft
a malicious MSG_CRYPTO msg to deploy its key with no need to know
other nodes' keys.

This patch adds that validation by checking TIPC_SKB_CB(skb)->decrypted
and discarding the packet if it never got decrypted.

Note that this is also a supplementary fix to CVE-2021-43267 that
can be triggered by an unencrypted malicious MSG_CRYPTO msg.

Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange")
Acked-by: Ying Xue <[email protected]>
Acked-by: Jon Maloy <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agonet: return correct error code
liuguoqiang [Mon, 15 Nov 2021 08:14:48 +0000 (16:14 +0800)]
net: return correct error code

When kmemdup() fails or register_net_sysctl() returns NULL, ENOMEM
should be returned instead of ENOBUFS.

Signed-off-by: liuguoqiang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agonet: stmmac: socfpga: add runtime suspend/resume callback for stratix10 platform
Meng Li [Mon, 15 Nov 2021 07:04:23 +0000 (15:04 +0800)]
net: stmmac: socfpga: add runtime suspend/resume callback for stratix10 platform

Upstream commit 5ec55823438e ("net: stmmac: add clocks management
for gmac driver") improves clock management for the stmmac driver.
The dwmac-socfpga driver therefore needs to implement the runtime
callbacks itself, because it doesn't use the common
stmmac_pltfr_pm_ops instance. Otherwise, clocks are not disabled
when the system enters suspend.

Fixes: 5ec55823438e ("net: stmmac: add clocks management for gmac driver")
Cc: [email protected]
Signed-off-by: Meng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agoMerge branch 'bnxt_en-fixes'
David S. Miller [Mon, 15 Nov 2021 14:13:20 +0000 (14:13 +0000)]
Merge branch 'bnxt_en-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes

This series includes 3 fixes.  The first one fixes a race condition
between devlink reload and SR-IOV configuration.  The second one
fixes a type mismatch warning in devlink fw live patching.  The
last one fixes unwanted OVS TC dmesg error logs when tc-hw-offload is
disabled on bnxt_en.
====================

Signed-off-by: David S. Miller <[email protected]>
3 years agobnxt_en: reject indirect blk offload when hw-tc-offload is off
Sriharsha Basavapatna [Mon, 15 Nov 2021 07:38:01 +0000 (02:38 -0500)]
bnxt_en: reject indirect blk offload when hw-tc-offload is off

The driver does not check if hw-tc-offload is enabled for the device
before offloading a flow in the context of indirect block callback.
Fix this by checking NETIF_F_HW_TC in the features flag and rejecting
the offload request.  This will avoid unnecessary dmesg error logs when
hw-tc-offload is disabled, such as these:

bnxt_en 0000:19:00.1 eno2np1: dev(ifindex=294) not on same switch
bnxt_en 0000:19:00.1 eno2np1: Error: bnxt_tc_add_flow: cookie=0xffff8dace1c88000 error=-22
bnxt_en 0000:19:00.0 eno1np0: dev(ifindex=294) not on same switch
bnxt_en 0000:19:00.0 eno1np0: Error: bnxt_tc_add_flow: cookie=0xffff8dace1c88000 error=-22

Reported-by: Marcelo Ricardo Leitner <[email protected]>
Fixes: 627c89d00fb9 ("bnxt_en: flow_offload: offload tunnel decap rules via indirect callbacks")
Signed-off-by: Sriharsha Basavapatna <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agobnxt_en: fix format specifier in live patch error message
Edwin Peer [Mon, 15 Nov 2021 07:38:00 +0000 (02:38 -0500)]
bnxt_en: fix format specifier in live patch error message

This fixes a type mismatch warning.

Reported-by: kernel test robot <[email protected]>
Fixes: 3c4153394e2c ("bnxt_en: implement firmware live patching")
Signed-off-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agobnxt_en: extend RTNL to VF check in devlink driver_reinit
Edwin Peer [Mon, 15 Nov 2021 07:37:59 +0000 (02:37 -0500)]
bnxt_en: extend RTNL to VF check in devlink driver_reinit

This fixes the race condition between configuring SR-IOV and devlink
reload.  The SR-IOV configure logic already takes the RTNL lock,
setting sriov_cfg under the lock while changes are underway. Extend
the lock scope in devlink driver_reinit to cover the VF check so that
it does not run concurrently with SR-IOV configure.

Reported-by: Leon Romanovsky <[email protected]>
Fixes: 228ea8c187d8 ("bnxt_en: implement devlink dev reload driver_reinit")
Cc: Leon Romanovsky <[email protected]>
Reviewed-by: Somnath Kotur <[email protected]>
Reviewed-by: Pavan Chebbi <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agonet: ethernet: lantiq_etop: fix build errors/warnings
Randy Dunlap [Mon, 15 Nov 2021 01:02:29 +0000 (17:02 -0800)]
net: ethernet: lantiq_etop: fix build errors/warnings

Fix build error and warnings reported by kernel test robot:

drivers/net/ethernet/lantiq_etop.c: In function 'ltq_etop_probe':
drivers/net/ethernet/lantiq_etop.c:673:15: error: implicit declaration of function 'device_property_read_u32' [-Werror=implicit-function-declaration]
     673 |         err = device_property_read_u32(&pdev->dev, "lantiq,tx-burst-length", &priv->tx_burst_len);

   drivers/net/ethernet/lantiq_etop.c: At top level:
   drivers/net/ethernet/lantiq_etop.c:730:1: warning: no previous prototype for 'init_ltq_etop' [-Wmissing-prototypes]
     730 | init_ltq_etop(void)

   drivers/net/ethernet/lantiq_etop.c: In function 'ltq_etop_hw_init':
   drivers/net/ethernet/lantiq_etop.c:276:25: warning: ignoring return value of 'request_irq' declared with attribute 'warn_unused_result' [-Wunused-result]
     276 |                         request_irq(irq, ltq_etop_dma_irq, 0, "etop_tx", priv);
   drivers/net/ethernet/lantiq_etop.c:284:25: warning: ignoring return value of 'request_irq' declared with attribute 'warn_unused_result' [-Wunused-result]
     284 |                         request_irq(irq, ltq_etop_dma_irq, 0, "etop_rx", priv);

Fixes: 14d4e308e0aa ("net: lantiq: configure the burst length in ethernet drivers")
Fixes: dddb29e42770 ("net: lantiq_etop: remove deprecated IRQF_DISABLED")
Fixes: 504d4721ee8e ("MIPS: Lantiq: Add ethernet driver")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kernel test robot <[email protected]>
Link: lore.kernel.org/r/202111090621[email protected]
To: [email protected]
Cc: Aleksander Jan Bajkowski <[email protected]>
Cc: Hauke Mehrtens <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: John Crispin <[email protected]>
Cc: [email protected]
Cc: Ralf Baechle <[email protected]>
Cc: Michael Opdenacker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agoprintk: Remove printk.h inclusion in percpu.h
Andy Shevchenko [Fri, 12 Nov 2021 14:07:49 +0000 (16:07 +0200)]
printk: Remove printk.h inclusion in percpu.h

After the commit 42a0bb3f7138 ("printk/nmi: generic solution for safe
printk in NMI") the printk.h is not needed anymore in percpu.h.

Moreover `make headerdep` complains (an excerpt)

In file included from linux/printk.h,
                 from linux/dynamic_debug.h:188
                 from linux/printk.h:559 <-- here
                 from linux/percpu.h:9
                 from linux/idr.h:17
include/net/9p/client.h:13: warning: recursive header inclusion

Yeah, it's not a root cause of this, but removing will help to reduce
the noise.

Fixes: 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
3 years agoatlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait
Zekun Shen [Sun, 14 Nov 2021 03:24:40 +0000 (22:24 -0500)]
atlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait

This bug report showed up when running our research tools. The
report is an SOOB read, but an SOOB write also seems possible
a few lines below.

In detail, fw.len and sw.len are inputs coming from IO. A len
larger than the size of self->rpc triggers the SOOB. The patch
fixes the bugs by adding sanity checks.
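
The kind of sanity check meant here can be sketched as below. The buffer
type, its 64-byte size and the function name are hypothetical; the point
is only that a device-supplied length is compared against the size of
the destination before it is used.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's fixed-size RPC buffer. */
struct demo_rpc_buf {
	uint8_t data[64];
};

/* Reject any device-supplied length that would overrun the buffer. */
int demo_check_rpc_len(uint32_t len)
{
	if (len > sizeof(struct demo_rpc_buf))
		return -EINVAL;
	return 0;
}
```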

The bugs are triggerable with compromised/malfunctioning devices.
They are potentially exploitable, given that they first leak up to
0xffff bytes and can then overwrite the region.

The patch is tested with the QEMU emulator.
This is NOT tested with a real device.

Attached is the log we found by fuzzing.

BUG: KASAN: slab-out-of-bounds in
hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
Read of size 4 at addr ffff888016260b08 by task modprobe/213
CPU: 0 PID: 213 Comm: modprobe Not tainted 5.6.0 #1
Call Trace:
 dump_stack+0x76/0xa0
 print_address_description.constprop.0+0x16/0x200
 ? hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
 ? hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
 __kasan_report.cold+0x37/0x7c
 ? aq_hw_read_reg_bit+0x60/0x70 [atlantic]
 ? hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
 kasan_report+0xe/0x20
 hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
 hw_atl_utils_fw_rpc_call+0x95/0x130 [atlantic]
 hw_atl_utils_fw_rpc_wait+0x176/0x210 [atlantic]
 hw_atl_utils_mpi_create+0x229/0x2e0 [atlantic]
 ? hw_atl_utils_fw_rpc_wait+0x210/0x210 [atlantic]
 ? hw_atl_utils_initfw+0x9f/0x1c8 [atlantic]
 hw_atl_utils_initfw+0x12a/0x1c8 [atlantic]
 aq_nic_ndev_register+0x88/0x650 [atlantic]
 ? aq_nic_ndev_init+0x235/0x3c0 [atlantic]
 aq_pci_probe+0x731/0x9b0 [atlantic]
 ? aq_pci_func_init+0xc0/0xc0 [atlantic]
 local_pci_probe+0xd3/0x160
 pci_device_probe+0x23f/0x3e0

Reported-by: Brendan Dolan-Gavitt <[email protected]>
Signed-off-by: Zekun Shen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agonet: bnx2x: fix variable dereferenced before check
Pavel Skripkin [Sat, 13 Nov 2021 22:36:36 +0000 (01:36 +0300)]
net: bnx2x: fix variable dereferenced before check

Smatch says:
bnx2x_init_ops.h:640 bnx2x_ilt_client_mem_op()
warn: variable dereferenced before check 'ilt' (see line 638)

Move ilt_cli variable initialization _after_ ilt validation, because
it's unsafe to deref the pointer before validation check.
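
A small sketch of the "validate first, then derive" ordering. The
structs and demo_client_start() are hypothetical and much simpler than
the driver's ILT structures; the range check on cli_num is an addition
of this example.

```c
#include <stddef.h>

#define DEMO_NUM_CLIENTS 4

struct demo_ilt_client {
	int start;
};

struct demo_ilt {
	struct demo_ilt_client clients[DEMO_NUM_CLIENTS];
};

int demo_client_start(const struct demo_ilt *ilt, int cli_num)
{
	const struct demo_ilt_client *cli;

	/* Validate before anything is computed from the pointer. */
	if (!ilt || cli_num < 0 || cli_num >= DEMO_NUM_CLIENTS)
		return -1;

	cli = &ilt->clients[cli_num];	/* safe only after the check above */
	return cli->start;
}
```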

Fixes: 523224a3b3cd ("bnx2x, cnic, bnx2i: use new FW/HSI")
Signed-off-by: Pavel Skripkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agonet/smc: Transfer remaining wait queue entries during fallback
Wen Gu [Sat, 13 Nov 2021 07:33:35 +0000 (15:33 +0800)]
net/smc: Transfer remaining wait queue entries during fallback

The SMC fallback is currently incomplete. There may be some
wait queue entries remaining in smc socket->wq, which should
be moved to clcsocket->wq during the fallback.

For example, in nginx/wrk benchmark, this issue causes an
all-zeros test result:

server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html

  Running 5s test @ http://11.200.15.93/index.html
    1 threads and 1 connections
    Thread Stats   Avg      Stdev     Max   ± Stdev
      Latency     0.00us    0.00us   0.00us    -nan%
      Req/Sec     0.00      0.00     0.00      -nan%
    0 requests in 5.00s, 0.00B read
  Requests/sec:      0.00
  Transfer/sec:       0.00B

The reason for this all-zeros result is that when wrk uses SMC
to replace TCP, it adds an eppoll_entry to smc socket->wq
and expects to be notified if epoll events like EPOLL_IN/EPOLL_OUT
occur on the smc socket.

However, once a fallback occurs, wrk switches to using clcsocket.
Now it is clcsocket->wq instead of smc socket->wq which will
be woken up. The eppoll_entry remaining in smc socket->wq does
not work anymore and wrk stops the test.

This patch fixes this issue by moving the remaining wait queue
entries from smc socket->wq to clcsocket->wq during the fallback.

Link: https://www.spinics.net/lists/netdev/msg779769.html
Signed-off-by: Wen Gu <[email protected]>
Reviewed-by: Tony Lu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agoMerge branch 'net-ipa-fixes'
David S. Miller [Mon, 15 Nov 2021 13:25:45 +0000 (13:25 +0000)]
Merge branch 'net-ipa-fixes'

Alex Elder says:

====================
net: ipa: HOLB register write fixes

This small series fixes two recently identified bugs related to the
way two registers must be written.  The registers define whether and
when to drop packets if a head-of-line blocking condition is
encountered.  The "enable" (dropping packets) register must be
written twice for newer versions of hardware.  And the timer
register must not be written while dropping is enabled.
====================

Signed-off-by: David S. Miller <[email protected]>
3 years agonet: ipa: disable HOLB drop when updating timer
Alex Elder [Fri, 12 Nov 2021 22:22:10 +0000 (16:22 -0600)]
net: ipa: disable HOLB drop when updating timer

The head-of-line blocking timer should only be modified when
head-of-line drop is disabled.

One of the steps in recovering from a modem crash is to enable
dropping of packets with timeout of 0 (immediate).  We don't know
how the modem configured its endpoints, so before we program the
timer, we need to ensure HOL_BLOCK is disabled.

Fixes: 84f9bd12d46db ("soc: qcom: ipa: IPA endpoints")
Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agonet: ipa: HOLB register sometimes must be written twice
Alex Elder [Fri, 12 Nov 2021 22:22:09 +0000 (16:22 -0600)]
net: ipa: HOLB register sometimes must be written twice

Starting with IPA v4.5, the HOL_BLOCK_EN register must be written
twice when enabling head-of-line blocking avoidance.
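
The rule can be sketched with a fake register that counts writes. All
names below are hypothetical; only the rule itself (a second write of
the enable value on IPA v4.5 and later) comes from the description
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version enum; only the ordering of the values matters. */
enum demo_ipa_version {
	DEMO_IPA_V4_2,
	DEMO_IPA_V4_5,
};

/* Fake register that records how often it was written. */
struct demo_reg {
	uint32_t value;
	unsigned int writes;
};

void demo_reg_write(struct demo_reg *reg, uint32_t val)
{
	reg->value = val;
	reg->writes++;
}

void demo_holb_set_enable(struct demo_reg *reg, enum demo_ipa_version version,
			  bool enable)
{
	uint32_t val = enable ? 1 : 0;

	demo_reg_write(reg, val);
	/* Newer hardware needs the enable value written a second time. */
	if (enable && version >= DEMO_IPA_V4_5)
		demo_reg_write(reg, val);
}
```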

Fixes: 84f9bd12d46db ("soc: qcom: ipa: IPA endpoints")
Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agonet: Clean up some inconsistent indenting
Jiapeng Chong [Fri, 12 Nov 2021 10:16:34 +0000 (18:16 +0800)]
net: Clean up some inconsistent indenting

Eliminate the following smatch warning:

./include/linux/skbuff.h:4229 skb_remcsum_process() warn: inconsistent
indenting.

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agotun: fix bonding active backup with arp monitoring
Nicolas Dichtel [Fri, 12 Nov 2021 07:56:03 +0000 (08:56 +0100)]
tun: fix bonding active backup with arp monitoring

As stated in the bonding doc, trans_start must be set manually for drivers
using NETIF_F_LLTX:
 Drivers that use NETIF_F_LLTX flag must also update
 netdev_queue->trans_start. If they do not, then the ARP monitor will
 immediately fail any slaves using that driver, and those slaves will stay
 down.

Link: https://www.kernel.org/doc/html/v5.15/networking/bonding.html#arp-monitor-operation
Signed-off-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agotipc: use consistent GFP flags
Tadeusz Struk [Thu, 11 Nov 2021 20:59:16 +0000 (12:59 -0800)]
tipc: use consistent GFP flags

Some functions, like tipc_crypto_start(), use inconsistent GFP flags
when allocating memory. The mentioned function uses GFP_ATOMIC
to allocate a crypto instance, and then calls alloc_ordered_workqueue(),
which allocates memory with GFP_KERNEL. The tipc_aead_init() function
even uses GFP_KERNEL and GFP_ATOMIC interchangeably.
No doc comment specifies what context a function is designed to
work in, but the flags should at least be consistent within a function.

Cc: Jon Maloy <[email protected]>
Cc: Ying Xue <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Tadeusz Struk <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years agox86/hyperv: Move required MSRs check to initial platform probing
Sean Christopherson [Thu, 4 Nov 2021 18:22:39 +0000 (18:22 +0000)]
x86/hyperv: Move required MSRs check to initial platform probing

Explicitly check for MSR_HYPERCALL and MSR_VP_INDEX support when probing
for running as a Hyper-V guest instead of waiting until hyperv_init() to
detect the bogus configuration.  Add messages to give the admin a heads
up that they are likely running on a broken virtual machine setup.

At best, silently disabling Hyper-V is confusing and difficult to debug,
e.g. the kernel _says_ it's using all these fancy Hyper-V features, but
always falls back to the native versions.  At worst, the half baked setup
will crash/hang the kernel.

Reviewed-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
3 years agox86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails
Sean Christopherson [Thu, 4 Nov 2021 18:22:38 +0000 (18:22 +0000)]
x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails

Check for a valid hv_vp_index array prior to dereferencing hv_vp_index when
setting Hyper-V's TSC change callback.  If Hyper-V setup failed in
hyperv_init(), the kernel will still report that it's running under
Hyper-V, but will have silently disabled nearly all functionality.

  BUG: kernel NULL pointer dereference, address: 0000000000000010
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP
  CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc2+ #75
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:set_hv_tscchange_cb+0x15/0xa0
  Code: <8b> 04 82 8b 15 12 17 85 01 48 c1 e0 20 48 0d ee 00 01 00 f6 c6 08
  ...
  Call Trace:
   kvm_arch_init+0x17c/0x280
   kvm_init+0x31/0x330
   vmx_init+0xba/0x13a
   do_one_initcall+0x41/0x1c0
   kernel_init_freeable+0x1f2/0x23b
   kernel_init+0x16/0x120
   ret_from_fork+0x22/0x30

Fixes: 93286261de1b ("x86/hyperv: Reenlightenment notifications support")
Cc: [email protected]
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
3 years agoDrivers: hv: balloon: Use VMBUS_RING_SIZE() wrapper for dm_ring_size
Boqun Feng [Mon, 1 Nov 2021 15:00:26 +0000 (23:00 +0800)]
Drivers: hv: balloon: Use VMBUS_RING_SIZE() wrapper for dm_ring_size

Baihua reported an error when booting an ARM64 guest with PAGE_SIZE=64k and
BALLOON is enabled:

hv_vmbus: registering driver hv_balloon
hv_vmbus: probe failed for device 1eccfd72-4b41-45ef-b73a-4a6e44c12924 (-22)

The cause of this is that the ringbuffer size for hv_balloon is not
adjusted with VMBUS_RING_SIZE(), which makes the size not large enough
for ringbuffers on guest with PAGE_SIZE=64k. Therefore use
VMBUS_RING_SIZE() to calculate the ringbuffer size. Note that the old
size (20 * 1024) counts a 4k header in the total size, while
VMBUS_RING_SIZE() expects the parameter as the payload size, so use
16 * 1024.
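
The arithmetic can be checked with a simplified model. This is not the
kernel macro: demo_ring_size() assumes that the ring header occupies
exactly one page and that the total is rounded up to a page multiple.

```c
#include <stddef.h>

/* Round size up to the next multiple of page_size. */
size_t demo_page_align(size_t size, size_t page_size)
{
	return (size + page_size - 1) / page_size * page_size;
}

/* Simplified model: one page of header plus the payload, page aligned. */
size_t demo_ring_size(size_t payload, size_t page_size)
{
	return demo_page_align(page_size + payload, page_size);
}
```

Under these assumptions a 16 * 1024 payload with 4k pages gives
20 * 1024, the old hard-coded value, while the same payload with 64k
pages needs two full 64k pages.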

Cc: <[email protected]> # 5.15.x
Reported-by: Baihua Lu <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Tested-by: Vitaly Kuznetsov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
3 years agoselftests: nft_nat: switch port shadow test cases to socat
Florian Westphal [Thu, 11 Nov 2021 17:23:30 +0000 (18:23 +0100)]
selftests: nft_nat: switch port shadow test cases to socat

There are now at least three distinct flavours of netcat/nc tool:
'original' version, one version ported from openbsd and nmap-ncat.

The script only works with original because it sets the SO_REUSEPORT option.

Other nc versions return 'port already in use' error and port shadow test fails:

PASS: inet IPv6 redirection for ns2-hMHcaRvx
nc: bind failed: Address already in use
ERROR: portshadow test default: got reply from "ROUTER", not CLIENT as intended

Switch to socat instead.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
3 years agomac80211: fix throughput LED trigger
Felix Fietkau [Sat, 13 Nov 2021 06:34:15 +0000 (07:34 +0100)]
mac80211: fix throughput LED trigger

The codepaths for rx with decap offload and tx with itxq were not updating
the counters for the throughput led trigger.

Signed-off-by: Felix Fietkau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
3 years agomac80211: fix monitor_sdata RCU/locking assertions
Johannes Berg [Fri, 12 Nov 2021 12:51:44 +0000 (13:51 +0100)]
mac80211: fix monitor_sdata RCU/locking assertions

Since commit a05829a7222e ("cfg80211: avoid holding the RTNL when
calling the driver") we've not only been protecting the pointer
to monitor_sdata with the RTNL, but also with the wiphy->mtx. This
is relevant in a number of lockdep assertions, e.g. the one we hit
in ieee80211_set_monitor_channel(). However, we're now protecting
all the assignments/dereferences, even the one in interface iter,
with the wiphy->mtx, so switch over the lockdep assertions to that
lock.

Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/r/20211112135143.cb8e8ceffef3.Iaa210f16f6904c8a7a24954fb3396da0ef86ec08@changeid
Signed-off-by: Johannes Berg <[email protected]>
3 years agomac80211: drop check for DONT_REORDER in __ieee80211_select_queue
Felix Fietkau [Wed, 10 Nov 2021 21:22:01 +0000 (22:22 +0100)]
mac80211: drop check for DONT_REORDER in __ieee80211_select_queue

When __ieee80211_select_queue is called, skb->cb has not been cleared yet,
which means that info->control.flags can contain garbage.
In some cases this leads to IEEE80211_TX_CTRL_DONT_REORDER being set, causing
packets marked for other queues to randomly end up in BE instead.

This flag only needs to be checked in ieee80211_select_queue_80211, since
the radiotap parser is the only piece of code that sets it.

Fixes: 66d06c84730c ("mac80211: adhere to Tx control flag that prevents frame reordering")
Cc: [email protected]
Signed-off-by: Felix Fietkau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
3 years agomac80211: fix radiotap header generation
Johannes Berg [Tue, 9 Nov 2021 09:02:04 +0000 (10:02 +0100)]
mac80211: fix radiotap header generation

In commit 8c89f7b3d3f2 ("mac80211: Use flex-array for radiotap header
bitmap") we accidentally pointed the position to the wrong place, so
we overwrite a present bitmap, and thus cause all kinds of trouble.

To see the issue, note that the previous code read:

  pos = (void *)(it_present + 1);

The requirement now is that we need to calculate pos via it_optional,
to not trigger the compiler hardening checks, as:

  pos = (void *)&rthdr->it_optional[...];

Rewriting the original expression, we get (obviously, since that just
adds "+ x - x" terms):

  pos = (void *)(it_present + 1 + rthdr->it_optional - rthdr->it_optional)

and moving the "+ rthdr->it_optional" outside to be used as an array:

  pos = (void *)&rthdr->it_optional[it_present + 1 - rthdr->it_optional];

The original is off by one, fix it.
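
As a rough illustration of the identity used above (a standalone sketch,
not the mac80211 code: struct rt_hdr, pos_old() and pos_new() are
hypothetical names, and the struct is only a simplified stand-in for the
radiotap header), both forms yield the same address when it_present
points at one of the bitmap words in it_optional[]:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the radiotap header; not the kernel definition. */
struct rt_hdr {
	uint8_t  it_version;
	uint8_t  it_pad;
	uint16_t it_len;
	uint32_t it_present;
	uint32_t it_optional[];	/* extra "present" bitmap words, then field data */
};

/* Old form: step one word past the last "present" bitmap word. */
void *pos_old(uint32_t *it_present)
{
	return (void *)(it_present + 1);
}

/* New form: the same address, expressed as an index into it_optional[]. */
void *pos_new(struct rt_hdr *rthdr, uint32_t *it_present)
{
	ptrdiff_t idx = it_present + 1 - rthdr->it_optional;

	assert(idx >= 0);	/* the data can never start before it_optional[0] */
	return (void *)&rthdr->it_optional[idx];
}
```

Dropping the "+ 1" from the index in pos_new() would make it return the
address of the last bitmap word itself, which is the kind of off-by-one
that ends up overwriting a present bitmap.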

Cc: [email protected]
Fixes: 8c89f7b3d3f2 ("mac80211: Use flex-array for radiotap header bitmap")
Reported-by: Sid Hayn <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Tested-by: Sid Hayn <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/20211109100203.c61007433ed6.I1dade57aba7de9c4f48d68249adbae62636fd98c@changeid
Signed-off-by: Johannes Berg <[email protected]>
3 years ago docs: filesystems: Fix grammatical error "with" to "which"
Wasin Thonkaew [Wed, 3 Nov 2021 19:35:04 +0000 (19:35 +0000)]
docs: filesystems: Fix grammatical error "with" to "which"

Signed-off-by: Wasin Thonkaew <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
3 years ago mac80211: do not access the IV when it was stripped
Xing Song [Mon, 1 Nov 2021 02:46:57 +0000 (10:46 +0800)]
mac80211: do not access the IV when it was stripped

ieee80211_get_keyid() returns a wrong value if the IV has been stripped,
e.g. 0 for IP/ARP frames because of the LLC header, or -EINVAL for
disassociation frames because of their length. Don't try to access the
IV if it's not present.

Signed-off-by: Xing Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
3 years ago doc/zh_CN: fix a translation error in management-style
Alex Shi [Wed, 10 Nov 2021 12:02:13 +0000 (20:02 +0800)]
doc/zh_CN: fix a translation error in management-style

'The name of the game' means the most important part of an activity, so
we should translate it by the meaning instead of the words.

Suggested-by: Xinyong Wang <[email protected]>
Signed-off-by: Alex Shi <[email protected]>
Reviewed-by: Yanteng Si <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
3 years ago nl80211: fix radio statistics in survey dump
Johannes Berg [Fri, 29 Oct 2021 07:25:39 +0000 (09:25 +0200)]
nl80211: fix radio statistics in survey dump

Even if userspace specifies the NL80211_ATTR_SURVEY_RADIO_STATS
attribute, we cannot get the statistics because we're not really
parsing the incoming attributes properly any more.

Fix this by passing the attrbuf to nl80211_prepare_wdev_dump()
and filling it there, if given, and using a local version only
if no output is desired.

Since I'm touching it anyway, make nl80211_prepare_wdev_dump()
static.

Fixes: 50508d941c18 ("cfg80211: use parallel_ops for genl")
Reported-by: Jan Fuchs <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Tested-by: Sven Eckelmann <[email protected]>
Link: https://lore.kernel.org/r/20211029092539.2851b4799386.If9736d4575ee79420cbec1bd930181e1d53c7317@changeid
Signed-off-by: Johannes Berg <[email protected]>
3 years ago cfg80211: call cfg80211_stop_ap when switch from P2P_GO type
Nguyen Dinh Phi [Wed, 27 Oct 2021 17:37:22 +0000 (01:37 +0800)]
cfg80211: call cfg80211_stop_ap when switch from P2P_GO type

If userspace tools switch from NL80211_IFTYPE_P2P_GO to
NL80211_IFTYPE_ADHOC via send_msg(NL80211_CMD_SET_INTERFACE), the
cleanup function cfg80211_stop_ap() is not called, which leads to the
initialization of in-use data. For example, this path re-initializes
sdata->assigned_chanctx_list while it is still an element of the
assigned_vifs list, which corrupts that linked list.
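
The failure mode can be reproduced with a minimal circular doubly linked
list in userspace C. This is only a sketch under stated assumptions:
struct list_node and the list_*() helpers below are hypothetical
stand-ins modeled on the kernel's list primitives, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel-style circular doubly linked list. */
struct list_node {
	struct list_node *next, *prev;
};

void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* Insert n right after head. */
void list_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Proper teardown: unlink n from its neighbours, then re-initialize it. */
void list_del_init(struct list_node *n)
{
	assert(n->prev->next == n && n->next->prev == n);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* A list is consistent if every node's neighbours point back at it. */
bool list_consistent(const struct list_node *head)
{
	const struct list_node *n = head;

	do {
		if (n->next->prev != n || n->prev->next != n)
			return false;
		n = n->next;
	} while (n != head);
	return true;
}
```

Calling list_init() on a node that is still linked (the analogue of
re-initializing assigned_chanctx_list without the stop_ap cleanup)
leaves the head pointing at a node that no longer points back, so
list_consistent() fails; calling list_del_init() first keeps the list
intact.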

Signed-off-by: Nguyen Dinh Phi <[email protected]>
Reported-by: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: ac800140c20e ("cfg80211: .stop_ap when interface is going down")
Signed-off-by: Johannes Berg <[email protected]>
3 years ago docs: ftrace: fix the wrong path of tracefs
Zhaoyu Liu [Sat, 13 Nov 2021 13:37:34 +0000 (21:37 +0800)]
docs: ftrace: fix the wrong path of tracefs

Delete "tracing" because it is already included in /proc/mounts.
Delete "echo nop > $tracefs/tracing/current_tracer", since this
command is probably redundant.

Signed-off-by: Zhaoyu Liu <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
3 years ago Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document
Pali Rohár [Fri, 8 Oct 2021 16:01:05 +0000 (18:01 +0200)]
Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document

The file armada_1000_pb.pdf is no longer available on the Marvell website,
so update the link to point to the backup copy on the web archive.

Signed-off-by: Pali Rohár <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
3 years ago Documentation: arm: marvell: Put Armada XP section between Armada 370 and 375
Pali Rohár [Fri, 8 Oct 2021 16:01:04 +0000 (18:01 +0200)]
Documentation: arm: marvell: Put Armada XP section between Armada 370 and 375

From an evolution and feature point of view, Armada XP belongs between the
Armada 370 and Armada 375 families.

Signed-off-by: Pali Rohár <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
3 years ago Documentation: arm: marvell: Add some links to homepage / product infos
Pali Rohár [Fri, 8 Oct 2021 16:01:03 +0000 (18:01 +0200)]
Documentation: arm: marvell: Add some links to homepage / product infos

Webarchive contains some useful resources like product info or links to
other documents.

Signed-off-by: Pali Rohár <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
3 years ago docs: Update Sphinx requirements
Akira Yokosawa [Wed, 10 Nov 2021 09:16:48 +0000 (18:16 +0900)]
docs: Update Sphinx requirements

Commit f546ff0c0c07 ("Move our minimum Sphinx version to 1.7") raised
the minimum version to 1.7.

For pdfdocs, sphinx_pre_install says:

    note: If you want pdf, you need at least Sphinx 2.4.4.

, and current requirements.txt installs Sphinx 2.4.4.

Update Sphinx versions mentioned in docs and remove a note on earlier
Sphinx versions.

Update zh_CN and it_IT translations as well.

Signed-off-by: Akira Yokosawa <[email protected]>
Cc: Federico Vaga <[email protected]>
Cc: Alex Shi <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
3 years ago Merge tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Mon, 15 Nov 2021 03:07:19 +0000 (19:07 -0800)]
Merge tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Update to tracing histogram variable string copy

  A fix to only copy the size of the field to the histogram string did
  not take into account that the size can be larger than the storage"

* tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Add length protection to histogram string copies

3 years ago kbuild: Fix -Wimplicit-fallthrough=5 error for GCC 5.x and 6.x
Gustavo A. R. Silva [Mon, 15 Nov 2021 02:48:44 +0000 (20:48 -0600)]
kbuild: Fix -Wimplicit-fallthrough=5 error for GCC 5.x and 6.x

-Wimplicit-fallthrough=5 used to be under cc-option because it is only
available in GCC 7.x and newer. It no longer is, so the build is now
broken for GCC 5.x and 6.x:

gcc: error: unrecognized command line option '-Wimplicit-fallthrough=5';
did you mean '-Wno-fallthrough'?

Fix this by moving -Wimplicit-fallthrough=5 under cc-option.

Fixes: dee2b702bcf0 ("kconfig: Add support for -Wimplicit-fallthrough")
Reported-by: Nathan Chancellor <[email protected]>
Co-developed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
3 years ago tracing: Add length protection to histogram string copies
Steven Rostedt (VMware) [Sun, 14 Nov 2021 18:28:34 +0000 (13:28 -0500)]
tracing: Add length protection to histogram string copies

The string copies to the histogram storage have a max size of 256 bytes
(defined by MAX_FILTER_STR_VAL). Only the string size of the event field
needs to be copied to the event storage, but no more than what is in the
event storage. Although nothing should be bigger than 256 bytes, there's
no protection against overwriting of the storage if one day there is.

Copy no more than the destination size, and enforce it.

Also had to turn MAX_FILTER_STR_VAL into an unsigned int, to keep the
min() comparison of the string sizes of comparable types.
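
The "copy no more than the destination size" rule could be sketched in
plain userspace C as follows. This is not the kernel implementation:
STR_VAL_MAX and copy_field_str() are hypothetical names, with
STR_VAL_MAX standing in for the 256-byte MAX_FILTER_STR_VAL. The U
suffix mirrors the point above about keeping both operands of the size
comparison unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STR_VAL_MAX 256U	/* hypothetical; mirrors the 256-byte storage */

/*
 * Copy at most field_size bytes of src into a STR_VAL_MAX-byte buffer.
 * Never writes past the destination and always NUL-terminates it.
 * Returns the number of bytes copied, excluding the terminator.
 */
size_t copy_field_str(char *dst, const char *src, size_t field_size)
{
	/* Clamp to the smaller of the field size and the storage size. */
	size_t limit = field_size < STR_VAL_MAX - 1 ? field_size : STR_VAL_MAX - 1;
	const char *end = memchr(src, '\0', limit);
	size_t len = end ? (size_t)(end - src) : limit;

	assert(len < STR_VAL_MAX);	/* enforce the destination bound */
	memcpy(dst, src, len);
	dst[len] = '\0';
	return len;
}
```

With a 512-byte field full of non-NUL characters, the copy stops at 255
bytes plus the terminator instead of overrunning the 256-byte storage.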

Link: https://lore.kernel.org/all/CAHk-=wjREUihCGrtRBwfX47y_KrLCGjiq3t6QtoNJpmVrAEb1w@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tom Zanussi <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Fixes: 63f84ae6b82b ("tracing/histogram: Do not copy the fixed-size char array field over the field size")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
3 years ago Linux 5.16-rc1 v5.16-rc1
Linus Torvalds [Sun, 14 Nov 2021 21:56:52 +0000 (13:56 -0800)]
Linux 5.16-rc1

3 years ago kconfig: Add support for -Wimplicit-fallthrough
Gustavo A. R. Silva [Sun, 14 Nov 2021 00:57:25 +0000 (18:57 -0600)]
kconfig: Add support for -Wimplicit-fallthrough

Add Kconfig support for -Wimplicit-fallthrough for both GCC and Clang.

The compiler option is under configuration CC_IMPLICIT_FALLTHROUGH,
which is enabled by default.

Special thanks to Nathan Chancellor who fixed the Clang bug[1][2]. This
bugfix only appears in Clang 14.0.0, so older versions still contain
the bug and -Wimplicit-fallthrough won't be enabled for them, for now.

This concludes a long journey and now we are finally getting rid
of the unintentional fallthrough bug-class in the kernel, entirely. :)
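
For readers unfamiliar with the bug class, here is a small standalone
sketch of what the warning enforces. The fallthrough macro is modeled on
the kernel's pseudo-keyword; privileges() and its role values are
hypothetical and exist only for illustration. With the warning enabled,
falling from one case into the next without the explicit marker is
diagnosed.

```c
#include <assert.h>

/* Stand-in for the kernel's "fallthrough" pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)	/* compilers without the attribute */
#endif

/* Hypothetical example: higher roles inherit the bits of lower roles. */
int privileges(int role)
{
	int mask = 0;

	switch (role) {
	case 2:			/* admin */
		mask |= 4;
		fallthrough;	/* intentional: admin is also a writer */
	case 1:			/* writer */
		mask |= 2;
		fallthrough;	/* intentional: writer is also a reader */
	case 0:			/* reader */
		mask |= 1;
		break;
	default:		/* unknown role: no privileges */
		break;
	}

	assert((mask & ~7) == 0);	/* only the three defined bits may be set */
	return mask;
}
```

Removing either fallthrough statement keeps the behavior identical but
makes the fall into the next case implicit, which is what the warning
reports.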

Link: https://github.com/llvm/llvm-project/commit/9ed4a94d6451046a51ef393cd62f00710820a7e8
Link: https://bugs.llvm.org/show_bug.cgi?id=51094
Link: https://github.com/KSPP/linux/issues/115
Link: https://github.com/ClangBuiltLinux/linux/issues/236
Co-developed-by: Kees Cook <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Co-developed-by: Linus Torvalds <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
3 years ago Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sun, 14 Nov 2021 20:18:22 +0000 (12:18 -0800)]
Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs cleanups from Darrick Wong:
 "The most 'exciting' aspect of this branch is that the xfsprogs
  maintainer and I have worked through the last of the code
  discrepancies between kernel and userspace libxfs such that there are
  no code differences between the two except for #includes.

  IOWs, diff suffices to demonstrate that the userspace tools behave the
  same as the kernel, and kernel-only bits are clearly marked in the
  /kernel/ source code instead of just the userspace source.

  Summary:

   - Clean up open-coded swap() calls.

   - A little bit of #ifdef golf to complete the reunification of the
     kernel and userspace libxfs source code"

* tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: sync xfs_btree_split macros with userspace libxfs
  xfs: #ifdef out perag code for userspace
  xfs: use swap() to make dabtree code cleaner

3 years ago Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sun, 14 Nov 2021 19:53:59 +0000 (11:53 -0800)]
Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull more parisc fixes from Helge Deller:
 "Fix a build error in stacktrace.c, fix resolving of addresses to
  function names in backtraces, fix single-stepping in assembly code and
  flush userspace pte's when using set_pte_at()"

* tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc/entry: fix trace test in syscall exit path
  parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page
  parisc: Fix implicit declaration of function '__kernel_text_address'
  parisc: Fix backtrace to always include init funtion names

3 years ago Merge tag 'sh-for-5.16' of git://git.libc.org/linux-sh
Linus Torvalds [Sun, 14 Nov 2021 19:37:49 +0000 (11:37 -0800)]
Merge tag 'sh-for-5.16' of git://git.libc.org/linux-sh

Pull arch/sh updates from Rich Felker.

* tag 'sh-for-5.16' of git://git.libc.org/linux-sh:
  sh: pgtable-3level: Fix cast to pointer from integer of different size
  sh: fix READ/WRITE redefinition warnings
  sh: define __BIG_ENDIAN for math-emu
  sh: math-emu: drop unused functions
  sh: fix kconfig unmet dependency warning for FRAME_POINTER
  sh: Cleanup about SPARSE_IRQ
  sh: kdump: add some attribute to function
  maple: fix wrong return value of maple_bus_init().
  sh: boot: avoid unneeded rebuilds under arch/sh/boot/compressed/
  sh: boot: add intermediate vmlinux.bin* to targets instead of extra-y
  sh: boards: Fix the cacography in irq.c
  sh: check return code of request_irq
  sh: fix trivial misannotations

3 years ago Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Sun, 14 Nov 2021 19:30:50 +0000 (11:30 -0800)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

 - Fix early_iounmap

 - Drop cc-option fallbacks for architecture selection

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9156/1: drop cc-option fallbacks for architecture selection
  ARM: 9155/1: fix early early_iounmap()

3 years ago Merge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Nov 2021 19:11:51 +0000 (11:11 -0800)]
Merge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Two fixes due to DT node name changes on Arm, Ltd. boards

 - Treewide rename of Ingenic CGU headers

 - Update ST email addresses

 - Remove Netlogic DT bindings

 - Dropping few more cases of redundant 'maxItems' in schemas

 - Convert toshiba,tc358767 bridge binding to schema

* tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: watchdog: sunxi: fix error in schema
  bindings: media: venus: Drop redundant maxItems for power-domain-names
  dt-bindings: Remove Netlogic bindings
  clk: versatile: clk-icst: Ensure clock names are unique
  of: Support using 'mask' in making device bus id
  dt-bindings: treewide: Update @st.com email address to @foss.st.com
  dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml
  dt-bindings: media: Update maintainers for st,stm32-cec.yaml
  dt-bindings: mfd: timers: Update maintainers for st,stm32-timers
  dt-bindings: timer: Update maintainers for st,stm32-timer
  dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz
  dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml
  dt-bindings: Rename Ingenic CGU headers to ingenic,*.h

3 years ago Merge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Nov 2021 18:43:38 +0000 (10:43 -0800)]
Merge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for POSIX CPU timers to address a problem where POSIX CPU
  timer delivery stops working for a new child task because
  copy_process() copies state information which is only valid for the
  parent task"

* tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()

3 years ago Merge tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 14 Nov 2021 18:38:27 +0000 (10:38 -0800)]
Merge tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for the interrupt subsystem

  Core code:

   - A regression fix for the Open Firmware interrupt mapping code where
     an interrupt controller property in a node caused a map property in
     the same node to be ignored.

  Interrupt chip drivers:

   - Workaround a limitation in SiFive PLIC interrupt chip which
     silently ignores an EOI when the interrupt line is masked.

   - Provide the missing mask/unmask implementation for the CSKY MP
     interrupt controller.

  PCI/MSI:

   - Prevent a use after free when PCI/MSI interrupts are released by
     destroying the sysfs entries before freeing the memory which is
     accessed in the sysfs show() function.

   - Implement a mask quirk for the Nvidia ION AHCI chip which does not
     advertise masking capability despite implementing it. Even worse
     the chip comes out of reset with all MSI entries masked, which due
     to the missing masking capability never get unmasked.

   - Move the check which prevents accessing the MSI[X] masking for XEN
     back into the low level accessors. The recent consolidation missed
     that these accessors can be invoked from places which do not have
     that check which broke XEN. Move them back to the original place
     instead of sprinkling tons of these checks all over the code"

* tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  of/irq: Don't ignore interrupt-controller when interrupt-map failed
  irqchip/sifive-plic: Fixup EOI failed when masked
  irqchip/csky-mpintc: Fixup mask/unmask implementation
  PCI/MSI: Destroy sysfs before freeing entries
  PCI: Add MSI masking quirk for Nvidia ION AHCI
  PCI/MSI: Deal with devices lying about their MSI mask capability
  PCI/MSI: Move non-mask check back into low level accessors

3 years ago Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Nov 2021 18:30:17 +0000 (10:30 -0800)]
Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 static call update from Thomas Gleixner:
 "A single fix for static calls to make the trampoline patching more
  robust by placing explicit signature bytes after the call trampoline
  to prevent patching random other jumps like the CFI jump table
  entries"

* tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  static_call,x86: Robustify trampoline patching

3 years ago Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Nov 2021 17:39:03 +0000 (09:39 -0800)]
Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Avoid touching ~100 config files in order to be able to select the
   preemption model

 - clear cluster CPU masks too, on the CPU unplug path

 - prevent use-after-free in cfs

 - Prevent a race condition when updating CPU cache domains

 - Factor out common shared part of smp_prepare_cpus() into a common
   helper which can be called by both baremetal and Xen, in order to fix
   booting of Xen PV guests

* tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  preempt: Restore preemption model selection configs
  arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology()
  sched/fair: Prevent dead task groups from regaining cfs_rq's
  sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
  x86/smp: Factor out parts of native_smp_prepare_cpus()

3 years ago Merge tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Nov 2021 17:33:12 +0000 (09:33 -0800)]
Merge tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Prevent unintentional page sharing by checking whether a page
   reference to a PMU samples page has been acquired properly before
   that

 - Make sure the LBR_SELECT MSR is saved/restored too

 - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any
   residual data left

* tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Avoid put_page() when GUP fails
  perf/x86/vlbr: Add c->flags to vlbr event constraints
  perf/x86/lbr: Reset LBR_SELECT during vlbr reset

3 years ago Merge tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 14 Nov 2021 17:29:03 +0000 (09:29 -0800)]
Merge tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add the model number of a new CPU, Raptor Lake, to intel-family.h

 - Do not log spurious corrected MCEs on SKL too, due to an erratum

 - Clarify the path of paravirt ops patches upstream

 - Add an optimization to avoid writing out AMX components to sigframes
   when the former are in init state

* tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add Raptor Lake to Intel family
  x86/mce: Add errata workaround for Skylake SKX37
  MAINTAINERS: Add some information to PARAVIRT_OPS entry
  x86/fpu: Optimize out sigframe xfeatures when in init state

3 years ago Merge tag 'perf-tools-for-v5.16-2021-11-13' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 14 Nov 2021 17:25:01 +0000 (09:25 -0800)]
Merge tag 'perf-tools-for-v5.16-2021-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tools updates from Arnaldo Carvalho de Melo:
 "Hardware tracing:

   - ARM:
      * Print the buffer size consistently in hexadecimal in
        ARM Coresight.
      * Add Coresight snapshot mode support.
      * Update --switch-events docs in 'perf record'.
      * Support hardware-based PID tracing.
      * Track task context switch for cpu-mode events.

   - Vendor events:
      * Add metric events JSON file for power10 platform

  perf test:

   - Get 'perf test' unit tests closer to kunit.

   - Topology tests improvements.

   - Remove bashisms from some tests.

  perf bench:

   - Fix memory leak of perf_cpu_map__new() in the futex benchmarks.

  libbpf:

   - Add some more weak libbpf functions to allow building with the
     older libbpf versions present in distros.

  libbeauty:

   - Translate [gs]etsockopt 'level' argument integer values to
     strings.

  tools headers UAPI:

   - Sync futex_waitv, arch prctl, sound, i915_drm and msr-index files
     with the kernel sources.

  Documentation:

   - Add documentation to 'struct symbol'.

   - Synchronize the definition of enum perf_hw_id with code in
     tools/perf/design.txt"

* tag 'perf-tools-for-v5.16-2021-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (67 commits)
  perf tests: Remove bash constructs from stat_all_pmu.sh
  perf tests: Remove bash construct from record+zstd_comp_decomp.sh
  perf test: Remove bash construct from stat_bpf_counters.sh test
  perf bench futex: Fix memory leak of perf_cpu_map__new()
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  tools headers UAPI: Sync sound/asound.h with the kernel sources
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  tools headers UAPI: Sync arch prctl headers with the kernel sources
  perf tools: Add more weak libbpf functions
  perf bpf: Avoid memory leak from perf_env__insert_btf()
  perf symbols: Factor out annotation init/exit
  perf symbols: Bit pack to save a byte
  perf symbols: Add documentation to 'struct symbol'
  tools headers UAPI: Sync files changed by new futex_waitv syscall
  perf test bpf: Use ARRAY_CHECK() instead of ad-hoc equivalent, addressing array_size.cocci warning
  perf arm-spe: Support hardware-based PID tracing
  perf arm-spe: Save context ID in record
  perf arm-spe: Update --switch-events docs in 'perf record'
  perf arm-spe: Track task context switch for cpu-mode events
  ...

3 years ago Merge tag 'irqchip-fixes-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Thomas Gleixner [Sun, 14 Nov 2021 12:59:05 +0000 (13:59 +0100)]
Merge tag 'irqchip-fixes-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

  - Address an issue with the SiFive PLIC being unable to EOI
    a masked interrupt

  - Move the disable/enable methods in the CSky mpintc to
    mask/unmask

  - Fix a regression in the OF irq code where an interrupt-controller
    property in the same node as an interrupt-map property would get
    ignored

Link: https://lore.kernel.org/all/[email protected]
3 years ago net,lsm,selinux: revert the security_sctp_assoc_established() hook
Paul Moore [Fri, 12 Nov 2021 23:18:10 +0000 (18:18 -0500)]
net,lsm,selinux: revert the security_sctp_assoc_established() hook

This patch reverts two prior patches, e7310c94024c
("security: implement sctp_assoc_established hook in selinux") and
7c2ef0240e6a ("security: add sctp_assoc_established hook"), which
create the security_sctp_assoc_established() LSM hook and provide a
SELinux implementation.  Unfortunately these two patches were merged
without proper review (the Reviewed-by and Tested-by tags from
Richard Haines were for previous revisions of these patches that
were significantly different) and there are outstanding objections
from the SELinux maintainers regarding these patches.

Work is currently ongoing to correct the problems identified in the
reverted patches, as well as others that have come up during review,
but it is unclear at this point in time when that work will be ready
for inclusion in the mainline kernel.  In the interest of not keeping
objectionable code in the kernel for multiple weeks, and potentially
a kernel release, we are reverting the two problematic patches.

Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
3 years ago Merge tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux
Linus Torvalds [Sat, 13 Nov 2021 23:32:30 +0000 (15:32 -0800)]
Merge tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux

Pull zstd update from Nick Terrell:
 "Update to zstd-1.4.10.

  Add myself as the maintainer of zstd and update the zstd version in
  the kernel, which is now 4 years out of date, to a much more recent
  zstd release. This includes bug fixes, much more extensive fuzzing,
  and performance improvements. And generates the kernel zstd
  automatically from upstream zstd, so it is easier to keep the zstd
  verison up to date, and we don't fall so far out of date again.

  This includes 5 commits that update the zstd library version:

   - Adds a new kernel-style wrapper around zstd.

     This wrapper API is functionally equivalent to the subset of the
     current zstd API that is currently used. The wrapper API changes to
     be kernel style so that the symbols don't collide with zstd's
     symbols. The update to zstd-1.4.10 maintains the same API and
     preserves the semantics, so that none of the callers need to be
     updated. All callers are updated in the commit, because there are
     zero functional changes.

   - Adds an indirection for `lib/decompress_unzstd.c` so it doesn't
     depend on the layout of `lib/zstd/` to include every source file.
     This allows the next patch to be automatically generated.

   - Imports the zstd-1.4.10 source code. This commit is automatically
     generated from upstream zstd (https://github.com/facebook/zstd).

   - Adds me ([email protected]) as the maintainer of `lib/zstd`.

   - Fixes a newly added build warning for clang.

  The discussion around this patchset has been pretty long, so I've
  included a FAQ-style summary of the history of the patchset, and why
  we are taking this approach.

  Why do we need to update?
  -------------------------

  The zstd version in the kernel is based off of zstd-1.3.1, which
  was released August 20, 2017. Since then zstd has seen many bug fixes
  and performance improvements. And, importantly, upstream zstd is
  continuously fuzzed by OSS-Fuzz, and bug fixes aren't backported to
  older versions. So the only way to sanely get these fixes is to keep
  up to date with upstream zstd.

  There are no known security issues that affect the kernel, but we need
  to be able to update in case there are. And while there are no known
  security issues, there are relevant bug fixes. For example the problem
  with large kernel decompression has been fixed upstream for over 2
  years [1]

  Additionally the performance improvements for kernel use cases are
  significant. Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz:

   - BtrFS zstd compression at levels 1 and 3 is 5% faster

   - BtrFS zstd decompression+read is 15% faster

   - SquashFS zstd decompression+read is 15% faster

   - F2FS zstd compression+write at level 3 is 8% faster

   - F2FS zstd decompression+read is 20% faster

   - ZRAM decompression+read is 30% faster

   - Kernel zstd decompression is 35% faster

   - Initramfs zstd decompression+build is 5% faster

  On top of this, there are significant performance improvements coming
  down the line in the next zstd release, and the new automated update
  patch generation will allow us to pull them easily.

  How is the update patch generated?
  ----------------------------------

  The first two patches are preparation for updating the zstd version.
  Then the 3rd patch in the series imports upstream zstd into the
  kernel. This patch is automatically generated from upstream. A script
  makes the necessary changes and imports it into the kernel. The
  changes are:

   - Replace all libc dependencies with kernel replacements and rewrite
     includes.

   - Remove unnecessary portability macros like: #if defined(_MSC_VER).

   - Use the kernel xxhash instead of bundling it.

  This automation gets tested every commit by upstream's continuous
  integration. When we cut a new zstd release, we will submit a patch to
  the kernel to update the zstd version in the kernel.

  The automated process makes it easy to keep the kernel version of zstd
  up to date. The current zstd in the kernel shares the guts of the
  code, but has a lot of API and minor changes to work in the kernel.
  This is because at the time upstream zstd was not ready to be used in
  the kernel environment as-is. But, since then upstream zstd has
  evolved to support being used in the kernel as-is.

  Why are we updating in one big patch?
  -------------------------------------

  The 3rd patch in the series is very large. This is because it is
  restructuring the code, so it both deletes the existing zstd, and
  re-adds the new structure. Future updates will be directly
  proportional to the changes in upstream zstd since the last import.
  They will admittedly be large, as zstd is an actively developed
  project, and has hundreds of commits between every release. However,
  there is no other great alternative.

  One option ruled out is to replay every upstream zstd commit. This is
  not feasible for several reasons:

   - There are over 3500 upstream commits since the zstd version in the
     kernel.

   - The automation to automatically generate the kernel update was only
     added recently, so older commits cannot easily be imported.

   - Not every upstream zstd commit builds.

   - Only zstd releases are "supported", and individual commits may have
     bugs that were fixed before a release.

  Another option to reduce the patch size would be to first reorganize
  to the new file structure, and then apply the patch. However, the
  current kernel zstd is formatted with clang-format to be more
  "kernel-like". But, the new method imports zstd as-is, without
  additional formatting, to allow for closer correlation with upstream,
  and easier debugging. So the patch wouldn't be any smaller.

  It also doesn't make sense to import upstream zstd commit by commit
  going forward. Upstream zstd doesn't support production use cases
  running off the development branch. We have a lot of post-commit
  fuzzing that catches many bugs, so individual commits may be buggy,
  but fixed before a release. So going forward, I intend to import every
  (important) zstd release into the Kernel.

  So, while it isn't ideal, updating in one big patch is the only path
  I see forward.

  Who is responsible for this code?
  ---------------------------------

  I am. This patchset adds me as the maintainer for zstd. Previously,
  there was no tree for zstd patches. Because of that, there were
  several patches that either got ignored, or took a long time to merge,
  since it wasn't clear which tree should pick them up. I'm officially
  stepping up as maintainer, and setting up my tree as the path through
  which zstd patches get merged. I'll make sure that patches to the
  kernel zstd get ported upstream, so they aren't erased when the next
  version update happens.

  How is this code tested?
  ------------------------

  I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS,
  Kernel, InitRAMFS). I also tested Kernel & InitRAMFS on i386 and
  aarch64. I checked both performance and correctness.

  Also, thanks to many people in the community who have tested these
  patches locally.

  Lastly, this code will bake in linux-next before being merged into
  v5.16.

  Why update to zstd-1.4.10 when zstd-1.5.0 has been released?
  ------------------------------------------------------------

  This patchset has been outstanding since 2020, and zstd-1.4.10 was the
  latest release when it was created. Since the update patch is
  automatically generated from upstream, I could generate it from
  zstd-1.5.0.

  However, there were some large stack usage regressions in zstd-1.5.0,
  which are only fixed in the latest development branch. And the latest
  development branch contains some new code that needs to bake in the
  fuzzer before I would feel comfortable releasing to the kernel.

  Once this patchset has been merged, and we've released zstd-1.5.1, we
  can update the kernel to zstd-1.5.1, and exercise the update process.

  You may notice that zstd-1.4.10 doesn't exist upstream. This release
  is an artificial release based off of zstd-1.4.9, with some fixes for
  the kernel backported from the development branch. I will tag the
  zstd-1.4.10 release after this patchset is merged, so the Linux Kernel
  is running a known version of zstd that can be debugged upstream.

  Why was a wrapper API added?
  ----------------------------

  The first versions of this patchset migrated the kernel to the
  upstream zstd API. It first added a shim API that supported the new
  upstream API with the old code, then updated callers to use the new
  shim API, then transitioned to the new code and deleted the shim API.
  However, Christoph Hellwig suggested that we transition to a kernel
  style API, and hide zstd's upstream API behind that. This is because
  zstd's upstream API supports many other use cases, and does not
  follow the kernel style guide, while the kernel API is focused on the
  kernel's use cases, and follows the kernel style guide.

  Where is the previous discussion?
  ---------------------------------

  Links to the discussions of the previous versions of the patch set are
  below. The largest changes in the design of the patchset are driven by
  the discussions in v11, v5, and v1. Sorry for the mix of links, I
  couldn't find most of the threads on lkml.org"

Link: https://lkml.org/lkml/2020/9/29/27
Link: https://www.spinics.net/lists/linux-crypto/msg58189.html
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Link: https://lore.kernel.org/linux-f2fs-devel/[email protected]/
Link: https://lkml.org/lkml/2020/12/3/1195
Link: https://lkml.org/lkml/2020/12/2/1245
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Link: https://www.spinics.net/lists/linux-btrfs/msg105783.html
Link: https://lkml.org/lkml/2020/9/23/1074
Link: https://www.spinics.net/lists/linux-btrfs/msg105505.html
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Signed-off-by: Nick Terrell <[email protected]>
Tested-by: Paul Jones <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Sedat Dilek <[email protected]> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <[email protected]>
* tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux:
  lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical
  MAINTAINERS: Add maintainer entry for zstd
  lib: zstd: Upgrade to latest upstream zstd version 1.4.10
  lib: zstd: Add decompress_sources.h for decompress_unzstd
  lib: zstd: Add kernel-specific API

3 years agoMerge tag 'virtio-mem-for-5.16' of git://github.com/davidhildenbrand/linux
Linus Torvalds [Sat, 13 Nov 2021 21:14:05 +0000 (13:14 -0800)]
Merge tag 'virtio-mem-for-5.16' of git://github.com/davidhildenbrand/linux

Pull virtio-mem update from David Hildenbrand:
 "Support the VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE feature in virtio-mem,
  now that "accidental" access to logically unplugged memory inside
  added Linux memory blocks is no longer possible, because we:

   - Removed /dev/kmem in commit bbcd53c96071 ("drivers/char: remove
     /dev/kmem for good")

   - Disallowed access to virtio-mem device memory via /dev/mem in
     commit 2128f4e21aa ("virtio-mem: disallow mapping virtio-mem memory
     via /dev/mem")

   - Sanitized access to virtio-mem device memory via /proc/kcore in
     commit 0daa322b8ff9 ("fs/proc/kcore: don't read offline sections,
     logically offline pages and hwpoisoned pages")

   - Sanitized access to virtio-mem device memory via /proc/vmcore in
     commit ce2814622e84 ("virtio-mem: kdump mode to sanitize
     /proc/vmcore access")

  The new VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE feature will be
  required by some hypervisors implementing virtio-mem in the near
  future, so let's support it now that we safely can"

* tag 'virtio-mem-for-5.16' of git://github.com/davidhildenbrand/linux:
  virtio-mem: support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE

3 years agoperf tests: Remove bash constructs from stat_all_pmu.sh
James Clark [Thu, 28 Oct 2021 13:48:27 +0000 (14:48 +0100)]
perf tests: Remove bash constructs from stat_all_pmu.sh

The tests were passing without actually testing anything, and were printing the
following:

  $ ./perf test -v 90
  90: perf all PMU test                                               :
  --- start ---
  test child forked, pid 51650
  Testing cpu/branch-instructions/
  ./tests/shell/stat_all_pmu.sh: 10: [:
   Performance counter stats for 'true':

             137,307      cpu/branch-instructions/

         0.001686672 seconds time elapsed

         0.001376000 seconds user
         0.000000000 seconds sys: unexpected operator

Changing the regexes to a grep works in sh and prints this:

  $ ./perf test -v 90
  90: perf all PMU test                                               :
  --- start ---
  test child forked, pid 60186
  [...]
  Testing tlb_flush.stlb_any
  test child finished with 0
  ---- end ----
  perf all PMU test: Ok
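
As a minimal sketch of the grep-based check (the helper name
output_mentions is hypothetical and is not taken from the script), a
fixed-string grep can stand in for bash's =~ operator in plain sh:

```shell
#!/bin/sh
# Hypothetical helper, not part of stat_all_pmu.sh.
# $1: captured command output, $2: fixed string to look for.
output_mentions() {
	# -q: no output, only an exit status; -F: match a literal string,
	# so characters such as '.' or '<' are not treated as a pattern.
	printf '%s\n' "$1" | grep -qF -- "$2"
}

result="137,307      cpu/branch-instructions/"
if output_mentions "$result" "cpu/branch-instructions/"; then
	echo "event found in output"
fi
```

Because the match is done by grep rather than by a shell builtin, the
same line behaves identically under dash, bash, and other POSIX shells.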

Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Florian Fainelli <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
3 years agoperf tests: Remove bash construct from record+zstd_comp_decomp.sh
James Clark [Thu, 28 Oct 2021 13:48:26 +0000 (14:48 +0100)]
perf tests: Remove bash construct from record+zstd_comp_decomp.sh

Commit 463538a383a2 ("perf tests: Fix test 68 zstd compression for
s390") inadvertently removed the -g flag from all platforms rather than
just s390, because the [[ ]] construct fails in sh. Changing to single
brackets restores testing of call graphs and removes the following error
from the output:

  $ ./perf test -v 85
  85: Zstd perf.data compression/decompression                        :
  --- start ---
  test child forked, pid 50643
  Collecting compressed record file:
  ./tests/shell/record+zstd_comp_decomp.sh: 15: [[: not found
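
A minimal sketch of the single-bracket form, assuming the flag is chosen
from the machine name (the helper pick_gflag is hypothetical and only
illustrates the construct):

```shell
#!/bin/sh
# Hypothetical helper, not part of record+zstd_comp_decomp.sh.
# $1: machine name as printed by 'uname -m'.
pick_gflag() {
	gflag=''
	# '[' is the POSIX test utility; '[[' is a bash/ksh extension that
	# a strict /bin/sh such as dash reports as "[[: not found".
	[ "$1" != s390x ] && gflag='-g'
	printf '%s\n' "$gflag"
}

pick_gflag "$(uname -m)"
```

With '[[' the command fails in sh, the assignment after '&&' never runs,
and the flag stays empty on every platform, which matches the regression
described above.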

Fixes: 463538a383a2 ("perf tests: Fix test 68 zstd compression for s390")
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Florian Fainelli <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
3 years agoperf test: Remove bash construct from stat_bpf_counters.sh test
James Clark [Thu, 28 Oct 2021 13:48:25 +0000 (14:48 +0100)]
perf test: Remove bash construct from stat_bpf_counters.sh test

Currently the test skips with an error because == only works in bash:

  $ ./perf test 91 -v
  Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
  91: perf stat --bpf-counters test                                   :
  --- start ---
  test child forked, pid 44586
  ./tests/shell/stat_bpf_counters.sh: 26: [: -v: unexpected operator
  test child finished with -2
  ---- end ----
  perf stat --bpf-counters test: Skip

Changing == to = does the same thing, but doesn't result in an error:

  ./perf test 91 -v
  Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
  91: perf stat --bpf-counters test                                   :
  --- start ---
  test child forked, pid 45833
  Skipping: --bpf-counters not supported
    Error: unknown option `bpf-counters'
  [...]
  test child finished with -2
  ---- end ----
  perf stat --bpf-counters test: Skip
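
A minimal sketch of the portable comparison (the helper is_verbose is
hypothetical and only illustrates the operator):

```shell
#!/bin/sh
# Hypothetical helper, not part of stat_bpf_counters.sh.
# Succeeds when the first argument is exactly "-v".
is_verbose() {
	# '=' is the POSIX string-equality operator for '['.
	# '==' is accepted by bash but rejected by dash as an
	# "unexpected operator".
	[ "$1" = "-v" ]
}

if is_verbose "-v"; then
	echo "verbose output requested"
fi
```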

Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Florian Fainelli <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>