Git Repo - linux.git/log

net: lantiq_xrx200: confirm skb is allocated before using

xrx200_hw_receive() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.

Add a check in case build_skb() failed to allocate and return NULL.

Fixes: e015593573b3 ("net: lantiq_xrx200: convert to build_skb")
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Aleksander Jan Bajkowski <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: stmmac: work around sporadic tx issue on link-up

This is a follow-up to the discussion in [0]. It seems to me that
at least the IP version used on Amlogic SoC's sometimes has a problem
if register MAC_CTRL_REG is written whilst the chip is still processing
a previous write. But that's just a guess.
Adding a delay between two writes to this register helps, but we can
also simply omit the offending second write. This patch uses the second
approach and is based on a suggestion from Qi Duan.
Benefit of this approach is that we can save few register writes, also
on not affected chip versions.

[0] https://www.spinics.net/lists/netdev/msg831526.html

Fixes: bfab27a146ed ("stmmac: add the experimental PCI support")
Suggested-by: Qi Duan <[email protected]>
Suggested-by: Jerome Brunet <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-08-24 (ixgbe, i40e)

This series contains updates to ixgbe and i40e drivers.

Jake stops incorrect resetting of SYSTIME registers when starting
cyclecounter for ixgbe.

Sylwester corrects a check on source IP address when validating destination
for i40e.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Fix incorrect address type for IPv6 flow rules
ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'ionic-bug-fixes'

Shannon Nelson says:

====================
ionic: bug fixes

These are a couple of maintenance bug fixes for the Pensando ionic
networking driver.

Mohamed takes care of a "plays well with others" issue where the
VF spec is a bit vague on VF mac addresses, but certain customers
have come to expect behavior based on other vendor drivers.

Shannon addresses a couple of corner cases seen in internal
stress testing.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ionic: VF initial random MAC address if no assigned mac

Assign a random mac address to the VF interface station
address if it boots with a zero mac address in order to match
similar behavior seen in other VF drivers. Handle the errors
where the older firmware does not allow the VF to set its own
station address.

Newer firmware will allow the VF to set the station mac address
if it hasn't already been set administratively through the PF.
Setting it will also be allowed if the VF has trust.

Fixes: fbb39807e9ae ("ionic: support sr-iov operations")
Signed-off-by: R Mohamed Shah <[email protected]>
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

ionic: fix up issues with handling EAGAIN on FW cmds

In looping on FW update tests we occasionally see the
FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
waiting for the FW activate step to finsh inside the FW. The
firmware is complaining that the done bit is set when a new
dev_cmd is going to be processed.

Doing a clean on the cmd registers and doorbell before exiting
the wait-for-done and cleaning the done bit before the sleep
prevents this from occurring.

Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

ionic: clear broken state on generation change

There is a case found in heavy testing where a link flap happens just
before a firmware Recovery event and the driver gets stuck in the
BROKEN state. This comes from the driver getting interrupted by a FW
generation change when coming back up from the link flap, and the call
to ionic_start_queues() in ionic_link_status_check() fails. This can be
addressed by having the fw_up code clear the BROKEN bit if seen, rather
than waiting for a user to manually force the interface down and then
back up.

Fixes: 9e8eaf8427b6 ("ionic: stop watchdog when in broken state")
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

rxrpc: Fix locking in rxrpc's sendmsg

Fix three bugs in the rxrpc's sendmsg implementation:

(1) rxrpc_new_client_call() should release the socket lock when returning
     an error from rxrpc_get_call_slot().

(2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
     held in the event that we're interrupted by a signal whilst waiting
     for tx space on the socket or relocking the call mutex afterwards.

     Fix this by: (a) moving the unlock/lock of the call mutex up to
     rxrpc_send_data() such that the lock is not held around all of
     rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
     whether we're return with the lock dropped.  Note that this means
     recvmsg() will not block on this call whilst we're waiting.

(3) After dropping and regaining the call mutex, rxrpc_send_data() needs
     to go and recheck the state of the tx_pending buffer and the
     tx_total_len check in case we raced with another sendmsg() on the same
     call.

Thinking on this some more, it might make sense to have different locks for
sendmsg() and recvmsg().  There's probably no need to make recvmsg() wait
for sendmsg().  It does mean that recvmsg() can return MSG_EOR indicating
that a call is dead before a sendmsg() to that call returns - but that can
currently happen anyway.

Without fix (2), something like the following can be induced:

WARNING: bad unlock balance detected!
5.16.0-rc6-syzkaller #0 Not tainted
-------------------------------------
syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
but there are no more locks to release!

other info that might help us debug this:
no locks held by syz-executor011/3597.
...
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
__lock_release kernel/locking/lockdep.c:5306 [inline]
lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
__mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]

Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Reported-by: [email protected]
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Marc Dionne <[email protected]>
Tested-by: [email protected]
cc: Hawkins Jiawei <[email protected]>
cc: Khalid Masum <[email protected]>
cc: Dan Carpenter <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <[email protected]>

drm/amdgpu: mmVM_L2_CNTL3 register not initialized correctly

The mmVM_L2_CNTL3 register is not assigned an initial value

Signed-off-by: Qu Huang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add MGCG perfmon setting for gfx11

Enable GFX11 MGCG perfmon setting.
V2: set rlc to saft mode before setting.

Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: Fix isa version for the GC 10.3.7

Correct the isa version for handling KFD test.

Fixes: 7c4f4f197e0c ("drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD definitions")
Signed-off-by: Prike Liang <[email protected]>
Reviewed-by: Aaron Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Fix page table setup on Arcturus

When translate_further is enabled, page table depth needs to
be updated. This was missing on Arcturus MMHUB init. This was
causing address translations to fail for SDMA user-mode queues.

Fixes: 352e683b72e7 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach")
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Mukul Joshi <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: update SMU 13.0.0 driver_if header

To fit the latest 78.53 PMFW.

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add sdma instance check for gfx11 CGCG

For some ASICs, like GFX IP v11.0.1, only have one SDMA instance,
so not need to configure SDMA1_RLC_CGCG_CTRL for this case.

Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: enable PCON support for dcn314

[Why]
DCN314 supports PCON.

[How]
Explicitly enable it in dcn314 resources.

Signed-off-by: Roman Li <[email protected]>
Reviewed-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: enable NBIO IP v7.7.0 Clock Gating

Enable AMD_CG_SUPPORT_BIF_MGCG and AMD_CG_SUPPORT_BIF_LS support.

Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add NBIO IP v7.7.0 Clock Gating support

Add BIF Clock Gating MGCG and LS support for NBIO IP v7.7.0.

Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add TX_POWER_CTRL_1 macro definitions for NBIO IP v7.7.0

Add the BIF0_PCIE_TX_POWER_CTRL_1 register offset and mask macro
definitions for AMD_CG_SUPPORT_BIF_LS.

Signed-off-by: Tim Huang <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Merge tag 'cgroup-for-6.0-rc2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull another cgroup fix from Tejun Heo:
"Commit 4f7e7236435c ("cgroup: Fix threadgroup_rwsem <->
  cpus_read_lock() deadlock") required the cgroup
  core to grab cpus_read_lock() before invoking ->attach().

  Unfortunately, it missed adding cpus_read_lock() in
  cgroup_attach_task_all(). Fix it"

* tag 'cgroup-for-6.0-rc2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()

cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()

syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at
cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() like what cgroup_procs_write_start() does.

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo <[email protected]>

xen/privcmd: fix error exit of privcmd_ioctl_dm_op()

The error exit of privcmd_ioctl_dm_op() is calling unlock_pages()
potentially with pages being NULL, leading to a NULL dereference.

Additionally lock_pages() doesn't check for pin_user_pages_fast()
having been completely successful, resulting in potentially not
locking all pages into memory. This could result in sporadic failures
when using the related memory in user mode.

Fix all of that by calling unlock_pages() always with the real number
of pinned pages, which will be zero in case pages being NULL, and by
checking the number of pages pinned by pin_user_pages_fast() matching
the expected number of pages.

Cc: <[email protected]>
Fixes: ab520be8cd5d ("xen/privcmd: Add IOCTL_PRIVCMD_DM_OP")
Reported-by: Rustam Subkhankulov <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Reviewed-by: Oleksandr Tyshchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

Documentation/ABI: Mention retbleed vulnerability info file for sysfs

While reporting for the AMD retbleed vulnerability was added in

6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability")

the new sysfs file was not mentioned so far in the ABI documentation for
sysfs-devices-system-cpu. Fix that.

Fixes: 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability")
Signed-off-by: Salvatore Bonaccorso <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/sev: Mark snp_abort() noreturn

Mark both the function prototype and definition as noreturn in order to
prevent the compiler from doing transformations which confuse objtool
like so:

vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction

This triggers with gcc-12.

Add it and sev_es_terminate() to the objtool noreturn tracking array
too. Sort it while at it.

Suggested-by: Michael Matz <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

io_uring/net: save address for sendzc async execution

We usually copy all bits that a request needs from the userspace for
async execution, so the userspace can keep them on the stack. However,
send zerocopy violates this pattern for addresses and may reloads it
e.g. from io-wq. Save the address if any in ->async_data as usual.

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/d7512d7aa9abcd36e9afe1a4d292a24cb2d157e5.1661342812.git.asml.silence@gmail.com
[axboe: fold in incremental fix]
Signed-off-by: Jens Axboe <[email protected]>

s390/mm: do not trigger write fault when vma does not allow VM_WRITE

For non-protection pXd_none() page faults in do_dat_exception(), we
call do_exception() with access == (VM_READ | VM_WRITE | VM_EXEC).
In do_exception(), vma->vm_flags is checked against that before
calling handle_mm_fault().

Since commit 92f842eac7ee3 ("[S390] store indication fault optimization"),
we call handle_mm_fault() with FAULT_FLAG_WRITE, when recognizing that
it was a write access. However, the vma flags check is still only
checking against (VM_READ | VM_WRITE | VM_EXEC), and therefore also
calling handle_mm_fault() with FAULT_FLAG_WRITE in cases where the vma
does not allow VM_WRITE.

Fix this by changing access check in do_exception() to VM_WRITE only,
when recognizing write access.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 92f842eac7ee3 ("[S390] store indication fault optimization")
Cc: <[email protected]>
Reported-by: David Hildenbrand <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Gerald Schaefer <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390: fix double free of GS and RI CBs on fork() failure

The pointers for guarded storage and runtime instrumentation control
blocks are stored in the thread_struct of the associated task. These
pointers are initially copied on fork() via arch_dup_task_struct()
and then cleared via copy_thread() before fork() returns. If fork()
happens to fail after the initial task dup and before copy_thread(),
the newly allocated task and associated thread_struct memory are
freed via free_task() -> arch_release_task_struct(). This results in
a double free of the guarded storage and runtime info structs
because the fields in the failed task still refer to memory
associated with the source task.

This problem can manifest as a BUG_ON() in set_freepointer() (with
CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled)
when running trinity syscall fuzz tests on s390x. To avoid this
problem, clear the associated pointer fields in
arch_dup_task_struct() immediately after the new task is copied.
Note that the RI flag is still cleared in copy_thread() because it
resides in thread stack memory and that is where stack info is
copied.

Signed-off-by: Brian Foster <[email protected]>
Fixes: 8d9047f8b967c ("s390/runtime instrumentation: simplify task exit handling")
Fixes: 7b83c6297d2fc ("s390/guarded storage: simplify task exit handling")
Cc: <[email protected]> # 4.15
Reviewed-by: Gerald Schaefer <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vasily Gorbik <[email protected]>

xen: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Reviewed-by: Oleksandr Tyshchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY

Commit c70727a5bc18 ("xen: allow more than 512 GB of RAM for 64 bit
pv-domains") from July 2015 replaces the config XEN_MAX_DOMAIN_MEMORY with
a new config XEN_512GB, but misses to adjust arch/x86/configs/xen.config.
As XEN_512GB defaults to yes, there is no need to explicitly set any config
in xen.config.

Just remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY.

Signed-off-by: Lukas Bulwahn <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

LoongArch: mm: Avoid unnecessary page fault retires on shared memory types

Commit d92725256b4f22d0 ("mm: avoid unnecessary page fault retires on
shared memory types") modifies do_page_fault() to handle the VM_FAULT_
COMPLETED case, but forget to change for LoongArch, so fix it as other
architectures does.

Fixes: d92725256b4f22d0 ("mm: avoid unnecessary page fault retires on shared memory types")
Reviewed-by: Guo Ren <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add subword xchg/cmpxchg emulation

LoongArch only support 32-bit/64-bit xchg/cmpxchg in native. But percpu
operation, qspinlock and some drivers need 8-bit/16-bit xchg/cmpxchg. We
add subword xchg/cmpxchg emulation in this patch because the emulation
has better performance than the generic implementation (on NUMA system),
and it can fix some build errors meanwhile [1].

LoongArch's guarantee for forward progress (avoid many ll/sc happening
at the same time and no one succeeds):

We have the "exclusive access (with timeout) of ll" feature to avoid
simultaneous ll (which also blocks other memory load/store on the same
address), and the "random delay of sc" feature to avoid simultaneous
sc. It is a mandatory requirement for multi-core LoongArch processors
to implement such features, only except those single-core and dual-core
processors (they also don't support multi-chip interconnection).

Feature bits are introduced in CPUCFG3, bit 3 and bit 4 [2].

[1] https://lore.kernel.org/loongarch/CAAhV-H6vvkuOzy8OemWdYK3taj5Jn3bFX0ZTwE=twM8ywpBUYA@mail.gmail.com/T/#t
[2] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_cpucfg

Reported-by: Sudip Mukherjee (Codethink) <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Rui Wang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Cleanup headers to avoid circular dependency

When enable GENERIC_IOREMAP, there will be circular dependency to cause
build errors. The root cause is that pgtable.h shouldn't include io.h
but pgtable.h need some macros defined in io.h. So cleanup those macros
and remove the unnecessary inclusions, as other architectures do.

Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Cleanup reset routines with new API

Cleanup reset routines by using new do_kernel_power_off() instead of old
pm_power_off(), and then simplify the whole file (reset.c) organization
by inlining some functions. This cleanup also fix a poweroff error if EFI
runtime is disabled.

Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Fix build warnings in VDSO

Fix build warnings in VDSO as below:

arch/loongarch/vdso/vgettimeofday.c:9:5: warning: no previous prototype for '__vdso_clock_gettime' [-Wmissing-prototypes]
    9 | int __vdso_clock_gettime(clockid_t clock,
      |     ^~~~~~~~~~~~~~~~~~~~
arch/loongarch/vdso/vgettimeofday.c:15:5: warning: no previous prototype for '__vdso_gettimeofday' [-Wmissing-prototypes]
   15 | int __vdso_gettimeofday(struct __kernel_old_timeval *tv,
      |     ^~~~~~~~~~~~~~~~~~~
arch/loongarch/vdso/vgettimeofday.c:21:5: warning: no previous prototype for '__vdso_clock_getres' [-Wmissing-prototypes]
   21 | int __vdso_clock_getres(clockid_t clock_id,
      |     ^~~~~~~~~~~~~~~~~~~
arch/loongarch/vdso/vgetcpu.c:27:5: warning: no previous prototype for '__vdso_getcpu' [-Wmissing-prototypes]
   27 | int __vdso_getcpu(unsigned int *cpu, unsigned int *node, struct getcpu_cache *unused)

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Select PCI_QUIRKS to avoid build error

PCI_LOONGSON is a mandatory for LoongArch and it is selected in Kconfig
unconditionally, but its dependency PCI_QUIRKS is missing and may cause
a build error when "make randconfig":

   arch/loongarch/pci/acpi.c: In function 'pci_acpi_setup_ecam_mapping':
>> arch/loongarch/pci/acpi.c:103:29: error: 'loongson_pci_ecam_ops' undeclared (first use in this function)
     103 |                 ecam_ops = &loongson_pci_ecam_ops;
         |                             ^~~~~~~~~~~~~~~~~~~~~
   arch/loongarch/pci/acpi.c:103:29: note: each undeclared identifier is reported only once for each function it appears in

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for PCI_LOONGSON
   Depends on [n]: PCI [=y] && (MACH_LOONGSON64 [=y] || COMPILE_TEST [=y]) && (OF [=y] || ACPI [=y]) && PCI_QUIRKS [=n]
   Selected by [y]:
   - LOONGARCH [=y]

Fix it by selecting PCI_QUIRKS unconditionally, too.

Reported-by: kernel test robot <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

ACPI: property: Remove default association from integer maximum values

Remove the default association from integer maximum value checks. It is
not necessary and has caused a bug in other associations being unnoticed.

Fixes: 923044133367 ("ACPI: property: Unify integer value reading functions")
Signed-off-by: Sakari Ailus <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI: property: Ignore already existing data node tags

ACPI node pointers are attached to data node handles, in order to resolve
string references to them. _DSD guide allows the same node to be reached
from multiple parent nodes, leading the node enumeration algorithm to each
such nodes more than once. As attached data already already exists,
attaching data with the same tag will fail. Address this problem by
ignoring nodes that have been already tagged.

Fixes: 1d52f10917a7 ("ACPI: property: Tie data nodes to acpi handles")
Reported-by: Pierre-Louis Bossart <[email protected]>
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Pierre-Louis Bossart <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI: property: Fix type detection of unified integer reading functions

The current code expects the type of the value to be an integer type,
instead the value passed to the macro is a pointer.
Ensure the size comparison uses the correct pointer type to choose the
max value, instead of using the integer type.

Fixes: 923044133367 ("ACPI: property: Unify integer value reading functions")
Signed-off-by: Stefan Binding <[email protected]>
Reviewed-by: Sakari Ailus <[email protected]>
Tested-by: Sakari Ailus <[email protected]>
Tested-by: John Garry <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2

Properly report hw rx hash for mt7986 chipset accroding to the new dma
descriptor layout.

Fixes: 197c9e9b17b11 ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset")
Signed-off-by: Lorenzo Bianconi <[email protected]>
Link: https://lore.kernel.org/r/091394ea4e705fbb35f828011d98d0ba33808f69.1661257293.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <[email protected]>

cifs: Add helper function to check smb1+ server

SMB1 server's header_preamble_size is not 0, add use is_smb1 function
to simplify the code, no actual functional changes.

Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Zhang Xiaoxu <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: Use help macro to get the mid header size

It's better to use MID_HEADER_SIZE because the unfolded expression
too long. No actual functional changes, minor readability improvement.

Signed-off-by: Zhang Xiaoxu <[email protected]>
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: Use help macro to get the header preamble size

It's better to use HEADER_PREAMBLE_SIZE because the unfolded expression
too long. No actual functional changes, minor readability improvement.

Signed-off-by: Zhang Xiaoxu <[email protected]>
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix crash with malformed ebtables blob which do not provide all
   entry points, from Florian Westphal.

2) Fix possible TCP connection clogging up with default 5-days
   timeout in conntrack, from Florian.

3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian.

4) Do not allow to update implicit chains.

5) Make table handle allocation per-netns to fix data race.

6) Do not truncated payload length and offset, and checksum offset.
   Instead report EINVAl.

7) Enable chain stats update via static key iff no error occurs.

8) Restrict osf expression to ip, ip6 and inet families.

9) Restrict tunnel expression to netdev family.

10) Fix crash when trying to bind again an already bound chain.

11) Flowtable garbage collector might leave behind pending work to
    delete entries. This patch comes with a previous preparation patch
    as dependency.

12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases
  netfilter: flowtable: fix stuck flows on cleanup due to pending work
  netfilter: flowtable: add function to invoke garbage collection immediately
  netfilter: nf_tables: disallow binding to already bound chain
  netfilter: nft_tunnel: restrict it to netdev family
  netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
  netfilter: nf_tables: do not leave chain stats enabled on error
  netfilter: nft_payload: do not truncate csum_offset and csum_type
  netfilter: nft_payload: report ERANGE for too long offset and length
  netfilter: nf_tables: make table handle allocation per-netns friendly
  netfilter: nf_tables: disallow updates of implicit chain
  netfilter: nft_tproxy: restrict to prerouting hook
  netfilter: conntrack: work around exceeded receive window
  netfilter: ebtables: reject blobs that don't provide all entry points
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: rectify file entry in BONDING DRIVER

Commit c078290a2b76 ("selftests: include bonding tests into the kselftest
infra") adds the bonding tests in the directory:

tools/testing/selftests/drivers/net/bonding/

The file entry in MAINTAINERS for the BONDING DRIVER however refers to:

tools/testing/selftests/net/bonding/

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken file pattern.

Repair this file entry in BONDING DRIVER.

Signed-off-by: Lukas Bulwahn <[email protected]>
Acked-by: Jonathan Toppins <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

fbdev: Move fbdev drivers from strlcpy to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.0

Pull MD fixes from Song:

"1. Fix for clustered raid, by Guoqing Jiang.
2. req_op fix, by Bart Van Assche.
3. Fix race condition in raid recreate, by David Sloan."

* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: call __md_stop_writes in md_stop
  Revert "md-raid: destroy the bitmap after destroying the thread"
  md: Flush workqueue md_rdev_misc_wq in md_alloc()
  md/raid10: Fix the data type of an r10_sync_page_io() argument

fbdev: omap: Remove unnecessary print function dev_err()

The print function dev_err() is redundant because platform_get_irq()
already prints an error.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1957
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: chipsfb: Add missing pci_disable_device() in chipsfb_pci_init()

Add missing pci_disable_device() in error path in chipsfb_pci_init().

Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: fbcon: Destroy mutex on freeing struct fb_info

It's needed to destroy bl_curve_mutex on freeing struct fb_info since
the mutex is embedded in the structure and initialized when it's
allocated.

Signed-off-by: Shigeru Yoshida <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: radeon: Clean up some inconsistent indenting

No functional modification involved.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1932
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: sisfb: Clean up some inconsistent indenting

No functional modification involved.

drivers/video/fbdev/sis/sis_main.c:6165 sisfb_probe() warn: inconsistent indenting.
drivers/video/fbdev/sis/sis_main.c:4266 sisfb_post_300_rwtest() warn: inconsistent indenting.
drivers/video/fbdev/sis/sis_main.c:2388 SISDoSense() warn: inconsistent indenting.
drivers/video/fbdev/sis/sis_main.c:2531 SiS_Sense30x() warn: inconsistent indenting.
drivers/video/fbdev/sis/sis_main.c:2382 SISDoSense() warn: inconsistent indenting.
drivers/video/fbdev/sis/sis_main.c:2250 sisfb_sense_crt1() warn: inconsistent indenting.
drivers/video/fbdev/sis/sis_main.c:672 sisfb_validate_mode() warn: inconsistent indenting.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1934
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: fb_pm2fb: Avoid potential divide by zero error

In `do_fb_ioctl()` of fbmem.c, if cmd is FBIOPUT_VSCREENINFO, var will be
copied from user, then go through `fb_set_var()` and
`info->fbops->fb_check_var()` which could may be `pm2fb_check_var()`.
Along the path, `var->pixclock` won't be modified. This function checks
whether reciprocal of `var->pixclock` is too high. If `var->pixclock` is
zero, there will be a divide by zero error. So, it is necessary to check
whether denominator is zero to avoid crash. As this bug is found by
Syzkaller, logs are listed below.

divide error in pm2fb_check_var
Call Trace:
<TASK>
fb_set_var+0x367/0xeb0 drivers/video/fbdev/core/fbmem.c:1015
do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1110
fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1189

Reported-by: Zheyu Ma <[email protected]>
Signed-off-by: Letu Ren <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: ssd1307fb: Fix repeated words in comments

Delete the redundant word 'set'.

Signed-off-by: Jilin Yuan <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: omapfb: Fix tests for platform_get_irq() failure

The platform_get_irq() returns negative error codes. It can't actually
return zero.

Signed-off-by: Yu Zhe <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

i40e: Fix incorrect address type for IPv6 flow rules

It was not possible to create 1-tuple flow director
rule for IPv6 flow type. It was caused by incorrectly
checking for source IP address when validating user provided
destination IP address.

Fix this by changing ip6src to correct ip6dst address
in destination IP address validation for IPv6 flow type.

Fixes: efca91e89b67 ("i40e: Add flow director support for IPv6")
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter

The ixgbe_ptp_start_cyclecounter is intended to be called whenever the
cyclecounter parameters need to be changed.

Since commit a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x
devices"), this function has cleared the SYSTIME registers and reset the
TSAUXC DISABLE_SYSTIME bit.

While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear
them during ixgbe_ptp_start_cyclecounter. This function may be called
during both reset and link status change. When link changes, the SYSTIME
counter is still operating normally, but the cyclecounter should be updated
to account for the possibly changed parameters.

Clearing SYSTIME when link changes causes the timecounter to jump because
the cycle counter now reads zero.

Extract the SYSTIME initialization out to a new function and call this
during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids
an unnecessary reset of the current time.

This also restores the original SYSTIME clearing that occurred during
ixgbe_ptp_reset before the commit above.

Reported-by: Steve Payne <[email protected]>
Reported-by: Ilya Evenbach <[email protected]>
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

md: call __md_stop_writes in md_stop

From the link [1], we can see raid1d was running even after the path
raid_dtr -> md_stop -> __md_stop.

Let's stop write first in destructor to align with normal md-raid to
fix the KASAN issue.

[1]. https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZcec6hQFm8nhg@mail.gmail.com/T/#m7f12bf90481c02c6d2da68c64aeed4779b7df74a

Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop")
Reported-by: Mikulas Patocka <[email protected]>
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Song Liu <[email protected]>

Revert "md-raid: destroy the bitmap after destroying the thread"

This reverts commit e151db8ecfb019b7da31d076130a794574c89f6f. Because it
obviously breaks clustered raid as noticed by Neil though it fixed KASAN
issue for dm-raid, let's revert it and fix KASAN issue in next commit.

[1]. https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev/T/#m95ac225cab7409f66c295772483d091084a6d470

Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Song Liu <[email protected]>

Merge tag 'trace-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:

- Fix build warning for when MODULES and FTRACE_WITH_DIRECT_CALLS are
   not set. A warning happens with ops_references_rec() defined but not
   used.

* tag 'trace-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix build warning for ops_references_rec() not used

md: Flush workqueue md_rdev_misc_wq in md_alloc()

A race condition still exists when removing and re-creating md devices
in test cases. However, it is only seen on some setups.

The race condition was tracked down to a reference still being held
to the kobject by the rdev in the md_rdev_misc_wq which will be released
in rdev_delayed_delete().

md_alloc() waits for previous deletions by waiting on the md_misc_wq,
but the md_rdev_misc_wq may still be holding a reference to a recently
removed device.

To fix this, also flush the md_rdev_misc_wq in md_alloc().

Signed-off-by: David Sloan <[email protected]>
[[email protected]: rewrote commit message]
Signed-off-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Song Liu <[email protected]>

md/raid10: Fix the data type of an r10_sync_page_io() argument

Fix the following sparse warning:

drivers/md/raid10.c:2647:60: sparse: sparse: incorrect type in argument 5 (different base types) @@ expected restricted blk_opf_t [usertype] opf @@ got int rw @@

This patch does not change any functionality since REQ_OP_READ = READ = 0
and since REQ_OP_WRITE = WRITE = 1.

Cc: Rong A Chen <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Paul Menzel <[email protected]>
Fixes: 4ce4c73f662b ("md/core: Combine two sync_page_io() arguments")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Song Liu <[email protected]>

cifs: skip extra NULL byte in filenames

Since commit:
cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty
alloc_path_with_tree_prefix() function was no longer including the
trailing separator when @path is empty, although @out_len was still
assuming a path separator thus adding an extra byte to the final
filename.

This has caused mount issues in some Synology servers due to the extra
NULL byte in filenames when sending SMB2_CREATE requests with
SMB2_FLAGS_DFS_OPERATIONS set.

Fix this by checking if @path is not empty and then add extra byte for
separator. Also, do not include any trailing NULL bytes in filename
as MS-SMB2 requires it to be 8-byte aligned and not NULL terminated.

Cc: [email protected]
Fixes: 7eacba3b00a3 ("cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty")
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull dmi update from Jean Delvare.

Tiny cleanup.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware: dmi: Use the proper accessor for the version field

lib/cpumask_kunit: add tests file to MAINTAINERS

cpumask related files are listed under the BITMAP API section, so the
file with tests for cpumask should be added to that list.

Signed-off-by: Sander Vanheule <[email protected]>
Reviewed-by: David Gow <[email protected]>
Acked-by: Yury Norov <[email protected]>
Signed-off-by: Yury Norov <[email protected]>

lib/cpumask_kunit: log mask contents

For extra context, log the contents of the masks under test. This
should help with finding out why a certain test fails.

Link: https://lore.kernel.org/lkml/CABVgOSkPXBc-PWk1zBZRQ_Tt+Sz1ruFHBj3ixojymZF=Vi4tpQ@mail.gmail.com/
Suggested-by: David Gow <[email protected]>
Signed-off-by: Sander Vanheule <[email protected]>
Signed-off-by: Yury Norov <[email protected]>

lib/test_cpumask: follow KUnit style guidelines

The cpumask test suite doesn't follow the KUnit style guidelines, as
laid out in Documentation/dev-tools/kunit/style.rst. The file is
renamed to lib/cpumask_kunit.c to clearly distinguish it from other,
non-KUnit, tests.

Link: https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Maíra Canal <[email protected]>
Signed-off-by: Sander Vanheule <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Reviewed-by: David Gow <[email protected]>
Acked-by: Yury Norov <[email protected]>
Signed-off-by: Yury Norov <[email protected]>

lib/test_cpumask: fix cpu_possible_mask last test

Since cpumask_first() on the cpu_possible_mask must return at most
nr_cpu_ids - 1 for a valid result, cpumask_last() cannot return anything
larger than this value. As test_cpumask_weight() also verifies that the
total weight of cpu_possible_mask must equal nr_cpu_ids, the last bit
set in this mask must be at nr_cpu_ids - 1.

Fixes: c41e8866c28c ("lib/test: introduce cpumask KUnit test suite")
Link: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Maíra Canal <[email protected]>
Signed-off-by: Sander Vanheule <[email protected]>
Tested-by: Maíra Canal <[email protected]>
Reviewed-by: David Gow <[email protected]>
Acked-by: Yury Norov <[email protected]>
Signed-off-by: Yury Norov <[email protected]>

lib/test_cpumask: drop cpu_possible_mask full test

When the number of CPUs that can possibly be brought online is known at
boot time, e.g. when HOTPLUG is disabled, nr_cpu_ids may be smaller than
NR_CPUS. In that case, cpu_possible_mask would not be completely filled,
and cpumask_full(cpu_possible_mask) can return false for valid system
configurations.

Without this test, cpu_possible_mask contents are still constrained by
a check on cpumask_weight(), as well as tests in test_cpumask_first(),
test_cpumask_last(), test_cpumask_next(), and test_cpumask_iterators().

Fixes: c41e8866c28c ("lib/test: introduce cpumask KUnit test suite")
Link: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Maíra Canal <[email protected]>
Signed-off-by: Sander Vanheule <[email protected]>
Tested-by: Maíra Canal <[email protected]>
Reviewed-by: David Gow <[email protected]>
Signed-off-by: Yury Norov <[email protected]>

io_uring: conditional ->async_data allocation

There are opcodes that need ->async_data only in some cases and
allocation it unconditionally may hurt performance. Add an option to
opdef to make move the allocation part from the core io_uring to opcode
specific code.
Note, we can't just set opdef->async_size to zero because there are
other helpers that rely on it, e.g. io_alloc_async_data().

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/9dc62be9e88dd0ed63c48365340e8922d2498293.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring/notif: order notif vs send CQEs

Currently, there is no ordering between notification CQEs and
completions of the send flushing it, this quite complicates the
userspace, especially since we don't flush notification when the
send(+flush) request fails, i.e. there will be only one CQE. What we
can do is to make sure that notification completions come only after
sends.

The easiest way to achieve this is to not try to complete a notification
inline from io_sendzc() but defer it to task_work, considering that
io-wq sendzc is disallowed CQEs will be naturally ordered because
task_works will only be executed after we're done with submission and so
inline completion.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/cddfd1c2bf91f22b9fe08e13b7dffdd8f858a151.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring/net: fix indentation

Fix up indentation before we get complaints from tooling.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/bd5754e3764215ccd7fb04cd636ea9167aaa275d.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring/net: fix zc send link failing

Failed requests should be marked with req_set_fail(), so links and cqe
skipping work correctly, which is missing in io_sendzc(). Note,
io_sendzc() return IOU_OK on failure, so the core code won't do the
cleanup for us.

Fixes: 06a5464be84e4 ("io_uring: wire send zc request type")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/e47d46fda9db30154ce66a549bb0d3380b780520.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring/net: fix must_hold annotation

Fix up the io_alloc_notif()'s __must_hold as we don't have a ctx
argument there but should get it from the slot instead.

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/cbb0a920f18e0aed590bf58300af817b9befb8a3.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

loop: Check for overflow while configuring loop

The userspace can configure a loop using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.

loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typdef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).

The function directly copies offset from info to the device as
follows (See line 980 of loop.c):
lo->lo_offset = info->lo_offset;

This results in an overflow, which triggers a warning in iomap_iter()
due to a call to iomap_iter_done() which has:
WARN_ON_ONCE(iter->iomap.offset > iter->pos);

Thus, check for negative value during loop_set_status_from_info().

Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e

Reported-and-tested-by: [email protected]
Cc: [email protected]
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Siddh Raman Pant <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'sysctl-data-races'

Kuniyuki Iwashima says:

====================
net: sysctl: Fix data-races around net.core.XXX

This series fixes data-races around all knobs in net_core_table and
netns_core_table except for bpf stuff.

These knobs are skipped:

  - 4 bpf knobs
  - netdev_rss_key: Written only once by net_get_random_once() and
                    read-only knob
  - rps_sock_flow_entries: Protected with sock_flow_mutex
  - flow_limit_cpu_bitmap: Protected with flow_limit_update_mutex
  - flow_limit_table_len: Protected with flow_limit_update_mutex
  - default_qdisc: Protected with qdisc_mod_lock
  - warnings: Unused
  - high_order_alloc_disable: Protected with static_key_mutex
  - skb_defer_max: Already using READ_ONCE()
  - sysctl_txrehash: Already using READ_ONCE()

Note 5th patch fixes net.core.message_cost and net.core.message_burst,
and lib/ratelimit.c does not have an explicit maintainer.

Changes:
  v3:
    * Fix build failures of CONFIG_SYSCTL=n case in 13th & 14th patches

  v2: https://lore.kernel.org/netdev/20220818035227 [email protected]/
    * Remove 4 bpf knobs and added 6 knobs

  v1: https://lore.kernel.org/netdev/20220816052347 [email protected]/
====================

Signed-off-by: David S. Miller <[email protected]>

net: Fix a data-race around sysctl_somaxconn.

While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix a data-race around netdev_unregister_timeout_secs.

While reading netdev_unregister_timeout_secs, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout configurable")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix a data-race around gro_normal_batch.

While reading gro_normal_batch, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Acked-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix data-races around sysctl_devconf_inherit_init_net.

While reading sysctl_devconf_inherit_init_net, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.

While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix a data-race around netdev_budget_usecs.

While reading netdev_budget_usecs, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix data-races around sysctl_max_skb_frags.

While reading sysctl_max_skb_frags, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix a data-race around netdev_budget.

While reading netdev_budget, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix a data-race around sysctl_net_busy_read.

While reading sysctl_net_busy_read, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 2d48d67fa8cd ("net: poll/select low latency socket support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix a data-race around sysctl_net_busy_poll.

While reading sysctl_net_busy_poll, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 060212928670 ("net: add low latency socket poll")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix a data-race around sysctl_tstamp_allow_data.

While reading sysctl_tstamp_allow_data, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix data-races around sysctl_optmem_max.

While reading sysctl_optmem_max, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ratelimit: Fix data-races in ___ratelimit().

While reading rs->interval and rs->burst, they can be changed
concurrently via sysctl (e.g. net_ratelimit_state). Thus, we
need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix data-races around netdev_tstamp_prequeue.

While reading netdev_tstamp_prequeue, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 3b098e2d7c69 ("net: Consistent skb timestamping")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix data-races around netdev_max_backlog.

While reading netdev_max_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

While at it, we remove the unnecessary spaces in the doc.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix data-races around weight_p and dev_weight_[rt]x_bias.

While reading weight_p, it can be changed concurrently.  Thus, we need
to add READ_ONCE() to its reader.

Also, dev_[rt]x_weight can be read/written at the same time.  So, we
need to use READ_ONCE() and WRITE_ONCE() for its access.  Moreover, to
use the same weight_p while changing dev_[rt]x_weight, we add a mutex
in proc_do_dev_weight().

Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix data-races around sysctl_[rw]mem_(max|default).

While reading sysctl_[rw]mem_(max|default), they can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/core/skbuff: Check the return value of skb_copy_bits()

skb_copy_bits() could fail, which requires a check on the return
value.

Signed-off-by: Li Zhong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2022-08-24

1) Fix a refcount leak in __xfrm_policy_check.
   From Xin Xiong.

2) Revert "xfrm: update SA curlft.use_time". This
   violates RFC 2367. From Antony Antony.

3) Fix a comment on XFRMA_LASTUSED.
   From Antony Antony.

4) x->lastused is not cloned in xfrm_do_migrate.
   Fix from Antony Antony.

5) Serialize the calls to xfrm_probe_algs.
   From Herbert Xu.

6) Fix a null pointer dereference of dst->dev on a metadata
   dst in xfrm_lookup_with_ifid. From Nikolay Aleksandrov.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <[email protected]>

fec: Restart PPS after link state change

On link state change, the controller gets reset,
causing PPS to drop out and the PHC to lose its
time and calibration. So we restart it if needed,
restoring calibration and time registers.

Changes since v2:
* Add `fec_ptp_save_state()`/`fec_ptp_restore_state()`
* Use `ktime_get_real_ns()`
* Use `BIT()` macro
Changes since v1:
* More ECR #define's
* Stop PPS in `fec_ptp_stop()`

Signed-off-by: Csókás Bence <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: neigh: don't call kfree_skb() under spin_lock_irqsave()

It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So add all skb to
a tmp list, then free them after spin_unlock_irqrestore() at
once.

Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop")
Suggested-by: Denis V. Lunev <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

x86/sev: Don't use cc_platform_has() for early SEV-SNP calls

When running identity-mapped and depending on the kernel configuration,
it is possible that the compiler uses jump tables when generating code
for cc_platform_has().

This causes a boot failure because the jump table uses un-mapped kernel
virtual addresses, not identity-mapped addresses. This has been seen
with CONFIG_RETPOLINE=n.

Similar to sme_encrypt_kernel(), use an open-coded direct check for the
status of SNP rather than trying to eliminate the jump table. This
preserves any code optimization in cc_platform_has() that can be useful
post boot. It also limits the changes to SEV-specific files so that
future compiler features won't necessarily require possible build changes
just because they are not compatible with running identity-mapped.

[ bp: Massage commit message. ]

Fixes: 5e5ccff60a29 ("x86/sev: Add helper for validating pages in early enc attribute changes")
Reported-by: Sean Christopherson <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]> # 5.19.x
Link: https://lore.kernel.org/all/[email protected]/

x86/boot: Don't propagate uninitialized boot_params->cc_blob_address

In some cases, bootloaders will leave boot_params->cc_blob_address
uninitialized rather than zeroing it out. This field is only meant to be
set by the boot/compressed kernel in order to pass information to the
uncompressed kernel when SEV-SNP support is enabled.

Therefore, there are no cases where the bootloader-provided values
should be treated as anything other than garbage. Otherwise, the
uncompressed kernel may attempt to access this bogus address, leading to
a crash during early boot.

Normally, sanitize_boot_params() would be used to clear out such fields
but that happens too late: sev_enable() may have already initialized
it to a valid value that should not be zeroed out. Instead, have
sev_enable() zero it out unconditionally beforehand.

Also ensure this happens for !CONFIG_AMD_MEM_ENCRYPT as well by also
including this handling in the sev_enable() stub function.

[ bp: Massage commit message and comments. ]

Fixes: b190a043c49a ("x86/sev: Add SEV-SNP feature detection/setup")
Reported-by: Jeremi Piotrowski <[email protected]>
Reported-by: [email protected]
Signed-off-by: Michael Roth <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216387
Link: https://lore.kernel.org/r/[email protected]

netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases

Currently, net.netfilter.nf_conntrack_frag6_high_thresh can only be lowered.

I found this issue while investigating a probable kernel issue
causing flakes in tools/testing/selftests/net/ip_defrag.sh

In particular, these sysctl changes were ignored:
ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_high_thresh=9000000 >/dev/null 2>&1
ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_low_thresh=7000000 >/dev/null 2>&1

This change is inline with commit 836196239298 ("net/ipfrag: let ip[6]frag_high_thresh
in ns be higher than in init_net")

Fixes: 8db3d41569bb ("netfilter: nf_defrag_ipv6: use net_generic infra")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: flowtable: fix stuck flows on cleanup due to pending work

To clear the flow table on flow table free, the following sequence
normally happens in order:

  1) gc_step work is stopped to disable any further stats/del requests.
  2) All flow table entries are set to teardown state.
  3) Run gc_step which will queue HW del work for each flow table entry.
  4) Waiting for the above del work to finish (flush).
  5) Run gc_step again, deleting all entries from the flow table.
  6) Flow table is freed.

But if a flow table entry already has pending HW stats or HW add work
step 3 will not queue HW del work (it will be skipped), step 4 will wait
for the pending add/stats to finish, and step 5 will queue HW del work
which might execute after freeing of the flow table.

To fix the above, this patch flushes the pending work, then it sets the
teardown flag to all flows in the flowtable and it forces a garbage
collector run to queue work to remove the flows from hardware, then it
flushes this new pending work and (finally) it forces another garbage
collector run to remove the entry from the software flowtable.

Stack trace:
[47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
[47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
[47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
[47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
[47773.889727] Call Trace:
[47773.890214]  dump_stack+0xbb/0x107
[47773.890818]  print_address_description.constprop.0+0x18/0x140
[47773.892990]  kasan_report.cold+0x7c/0xd8
[47773.894459]  kasan_check_range+0x145/0x1a0
[47773.895174]  down_read+0x99/0x460
[47773.899706]  nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
[47773.907137]  flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
[47773.913372]  process_one_work+0x8ac/0x14e0
[47773.921325]
[47773.921325] Allocated by task 592159:
[47773.922031]  kasan_save_stack+0x1b/0x40
[47773.922730]  __kasan_kmalloc+0x7a/0x90
[47773.923411]  tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
[47773.924363]  tcf_ct_init+0x71c/0x1156 [act_ct]
[47773.925207]  tcf_action_init_1+0x45b/0x700
[47773.925987]  tcf_action_init+0x453/0x6b0
[47773.926692]  tcf_exts_validate+0x3d0/0x600
[47773.927419]  fl_change+0x757/0x4a51 [cls_flower]
[47773.928227]  tc_new_tfilter+0x89a/0x2070
[47773.936652]
[47773.936652] Freed by task 543704:
[47773.937303]  kasan_save_stack+0x1b/0x40
[47773.938039]  kasan_set_track+0x1c/0x30
[47773.938731]  kasan_set_free_info+0x20/0x30
[47773.939467]  __kasan_slab_free+0xe7/0x120
[47773.940194]  slab_free_freelist_hook+0x86/0x190
[47773.941038]  kfree+0xce/0x3a0
[47773.941644]  tcf_ct_flow_table_cleanup_work

Original patch description and stack trace by Paul Blakey.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Paul Blakey <[email protected]>
Tested-by: Paul Blakey <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: flowtable: add function to invoke garbage collection immediately

Expose nf_flow_table_gc_run() to force a garbage collector run from the
offload infrastructure.

Signed-off-by: Pablo Neira Ayuso <[email protected]>