Git Repo - linux.git/log

of: property: Add in-ports/out-ports support to of_graph_get_port_parent()

Similar to the existing "ports" node name, coresight device tree bindings
have added "in-ports" and "out-ports" as standard node names for a
collection of ports.

Add support for these names to of_graph_get_port_parent() so that
remote-endpoint parsing can find the correct parent node for these
coresight ports too.
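
The parent lookup described above can be modelled in a few lines of
Python. This is only a userspace sketch, not the kernel implementation:
the Node class, PORT_CONTAINERS and port_parent() are hypothetical names
introduced for illustration, and real device tree node names may carry a
unit address (for example "port@0"), which this model ignores.

```python
from dataclasses import dataclass
from typing import Optional

# Node names treated as a "collection of ports", taken from the commit text.
PORT_CONTAINERS = {"ports", "in-ports", "out-ports"}


@dataclass
class Node:
    """Hypothetical stand-in for a device tree node."""
    name: str
    parent: Optional["Node"] = None


def port_parent(endpoint: Node) -> Optional[Node]:
    """Return the device node that owns `endpoint`.

    Handles both layouts:
      device -> port -> endpoint
      device -> (ports | in-ports | out-ports) -> port -> endpoint
    """
    port = endpoint.parent
    if port is None:
        return None
    above = port.parent
    if above is not None and above.name in PORT_CONTAINERS:
        # Skip the container node and return the device above it.
        return above.parent
    return above


# A coresight-style layout using "out-ports" as the container.
etm = Node("etm")
out_ports = Node("out-ports", etm)
port = Node("port", out_ports)
ep = Node("endpoint", port)
print(port_parent(ep).name)  # prints "etm"
```

Without "out-ports" in the set, the walk would stop at the container
node and report it as the parent, which is the mismatch the commit
addresses.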

Signed-off-by: Saravana Kannan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

of: property: Improve finding the supplier of a remote-endpoint property

After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
remote-endpoint properties created a fwnode link from the consumer device
to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
trying to create device links or detecting cycles. So, improve this the
same way we improved finding the consumer of a remote-endpoint property.

Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
Signed-off-by: Saravana Kannan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

of: property: Improve finding the consumer of a remote-endpoint property

We have a more accurate function to find the right consumer of a
remote-endpoint property instead of searching for a parent with
compatible string property. So, use that instead. While at it, make the
code to find the consumer a bit more flexible and based on the property
being parsed.

Fixes: f7514a663016 ("of: property: fw_devlink: Add support for remote-endpoint")
Signed-off-by: Saravana Kannan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

Revert "usb: dwc3: Support EBC feature of DWC_usb31"

This reverts commit 398aa9a7e77cf23c2a6f882ddd3dcd96f21771dc.

The update to the gadget API to support EBC feature is incomplete. It's
missing at least the following:
* New usage documentation
* Gadget capability check
* Condition for the user to check how many and which endpoints can be
   used as "fifo_mode"
* Description of how it can affect completed request (e.g. dwc3 won't
   update TRB on completion -- ie. how it can affect request's actual
   length report)

Let's revert this until it's ready.

Fixes: 398aa9a7e77c ("usb: dwc3: Support EBC feature of DWC_usb31")
Signed-off-by: Thinh Nguyen <[email protected]>
Link: https://lore.kernel.org/r/3042f847ff904b4dd4e4cf66a1b9df470e63439e.1707441690.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>

ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default

Commit 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS
and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there
are multiple ftrace_ops at the same function. However, x86 only
supports jumping to direct_call from ftrace_regs_caller, so when the
function tracer is set on the same target function on x86,
ftrace-direct stops working, as shown below (this actually works on
arm64).

First, insmod ftrace-direct.ko to put a direct_call on
'wake_up_process()'.

# insmod kernel/samples/ftrace/ftrace-direct.ko
# less trace
...
          <idle>-0       [006] ..s1.   564.686958: my_direct_func: waking up rcu_preempt-17
          <idle>-0       [007] ..s1.   564.687836: my_direct_func: waking up kcompactd0-63
          <idle>-0       [006] ..s1.   564.690926: my_direct_func: waking up rcu_preempt-17
          <idle>-0       [006] ..s1.   564.696872: my_direct_func: waking up rcu_preempt-17
          <idle>-0       [007] ..s1.   565.191982: my_direct_func: waking up kcompactd0-63

Set up a function filter on 'wake_up_process' too, and enable it.

# cd /sys/kernel/tracing/
# echo wake_up_process > set_ftrace_filter
# echo function > current_tracer
# less trace
...
          <idle>-0       [006] ..s3.   686.180972: wake_up_process <-call_timer_fn
          <idle>-0       [006] ..s3.   686.186919: wake_up_process <-call_timer_fn
          <idle>-0       [002] ..s3.   686.264049: wake_up_process <-call_timer_fn
          <idle>-0       [002] d.h6.   686.515216: wake_up_process <-kick_pool
          <idle>-0       [002] d.h6.   686.691386: wake_up_process <-kick_pool

Now only the function tracer output is shown on x86.
But if you enable a 'kprobe on ftrace' event (which uses the SAVE_REGS
flag) on the same function, the direct call output is shown again.

# echo 'p wake_up_process' >> dynamic_events
# echo 1 > events/kprobes/p_wake_up_process_0/enable
# echo > trace
# less trace
...
          <idle>-0       [006] ..s2.  2710.345919: p_wake_up_process_0: (wake_up_process+0x4/0x20)
          <idle>-0       [006] ..s3.  2710.345923: wake_up_process <-call_timer_fn
          <idle>-0       [006] ..s1.  2710.345928: my_direct_func: waking up rcu_preempt-17
          <idle>-0       [006] ..s2.  2710.349931: p_wake_up_process_0: (wake_up_process+0x4/0x20)
          <idle>-0       [006] ..s3.  2710.349934: wake_up_process <-call_timer_fn
          <idle>-0       [006] ..s1.  2710.349937: my_direct_func: waking up rcu_preempt-17

To fix this issue, use the SAVE_REGS flag by default when a direct_call
shares a function with multiple ftrace_ops.

Link: https://lore.kernel.org/linux-trace-kernel/170484558617.178953.1590516949390270842.stgit@devnote2
Fixes: 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS")
Cc: [email protected]
Cc: Florent Revest <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Tested-by: Mark Rutland <[email protected]> [arm64]
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

RISC-V: KVM: Use correct restricted types

__le32 and __le64 types should be used with le32_to_cpu() and
le64_to_cpu() and __user is needed for pointers referencing
guest memory, as sparse helpfully points out.
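
As a rough userspace illustration of why the explicit little-endian
conversion matters, the Python sketch below decodes a shared record with
a fixed byte order instead of the host's. The record layout (STEAL_FMT)
and the decode_steal_time() helper are assumptions made for this
example; they are not taken from the kernel sources.

```python
import struct
from typing import Tuple

# Hypothetical layout: little-endian 32-bit sequence counter, 4 bytes of
# padding, then a little-endian 64-bit steal counter.
STEAL_FMT = "<I4xQ"


def decode_steal_time(raw: bytes) -> Tuple[int, int]:
    """Decode (sequence, steal) using an explicit little-endian format,
    the role le32_to_cpu()/le64_to_cpu() play in the kernel."""
    sequence, steal = struct.unpack_from(STEAL_FMT, raw)
    return sequence, steal


raw = struct.pack(STEAL_FMT, 2, 123456789)
print(decode_steal_time(raw))            # prints (2, 123456789)

# Reading the same bytes with the wrong byte order gives a different value.
print(struct.unpack(">I", raw[:4])[0])   # prints 33554432
```

The restricted __le32/__le64 types let sparse flag any place where such
a value is used without going through the conversion helper.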

Fixes: e9f12b5fff8a ("RISC-V: KVM: Implement SBI STA extension")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Andrew Jones <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>

RISC-V: paravirt: Use correct restricted types

__le32 and __le64 types should be used with le32_to_cpu() and
le64_to_cpu(), as sparse helpfully points out.

Fixes: fdf68acccfc6 ("RISC-V: paravirt: Implement steal-time support")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Andrew Jones <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>

RISC-V: paravirt: steal_time should be static

steal_time is not used outside paravirt.c, so make it static,
as sparse suggested.

Fixes: fdf68acccfc6 ("RISC-V: paravirt: Implement steal-time support")
Signed-off-by: Andrew Jones <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Signed-off-by: Anup Patel <[email protected]>

selftests: net: add more missing kernel config

The reuseport_addr_any.sh is currently skipping DCCP tests and
pmtu.sh is skipping all the FOU/GUE related cases: add the missing
options.

Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/38d3ca7f909736c1aef56e6244d67c82a9bba6ff.1707326987.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>

devlink: Fix command annotation documentation

The command example string is not rendered as a command.
Fix the command annotation.

Fixes: a8ce7b26a51e ("devlink: Expose port function commands to control migratable")
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY

Do not report the XDP capability NETDEV_XDP_ACT_XSK_ZEROCOPY as the
bonding driver does not support XDP and AF_XDP in zero-copy mode even
if the real NIC drivers do.

Note that the driver used to report everything as supported before a
device was bonded. Instead of just masking out the zero-copy support
from this, have the driver report that no XDP feature is supported
until a real device is bonded. This seems to be more truthful as it is
the real drivers that decide what XDP features are supported.

Fixes: cb9e6e584d58 ("bonding: add xdp_features support")
Reported-by: Prashant Batra <[email protected]>
Link: https://lore.kernel.org/all/CAJ8uoz2ieZCopgqTvQ9ZY6xQgTbujmC6XkMTamhp68O-h_-rLg@mail.gmail.com/T/
Signed-off-by: Magnus Karlsson <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/handshake: Fix handshake_req_destroy_test1

Recently, handshake_req_destroy_test1 started failing:

Expected handshake_req_destroy_test == req, but
handshake_req_destroy_test == 0000000000000000
req == 0000000060f99b40
not ok 11 req_destroy works

This is because "sock_release(sock)" was replaced with "fput(filp)"
to address a memory leak. Note that sock_release() is synchronous
but fput() usually delays the final close and clean-up.

The delay is not consequential in the other cases that were changed
but handshake_req_destroy_test1 is testing that handshake_req_cancel()
followed by closing the file actually does call the ->hp_destroy
method. Thus the PTR_EQ test at the end has to be sure that the
final close is complete before it checks the pointer.

We cannot use a completion here because if ->hp_destroy is never
called (ie, there is an API bug) then the test will hang.

Reported-by: Guenter Roeck <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/T/#mac5c6299f86799f1c71776f3a07f9c566c7c3c40
Fixes: 4a0f07d71b04 ("net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()")
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Link: https://lore.kernel.org/r/170724699027.91401.7839730697326806733.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>

net/mlx5: DPLL, Fix possible use after free after delayed work timer triggers

I recently managed to hit the following use-after-free warning:

[ 2169.711665] ==================================================================
[ 2169.714009] BUG: KASAN: slab-use-after-free in __run_timers.part.0+0x179/0x4c0
[ 2169.716293] Write of size 8 at addr ffff88812b326a70 by task swapper/4/0

[ 2169.719022] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.8.0-rc2jiri+ #2
[ 2169.720974] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 2169.722457] Call Trace:
[ 2169.722756]  <IRQ>
[ 2169.723024]  dump_stack_lvl+0x58/0xb0
[ 2169.723417]  print_report+0xc5/0x630
[ 2169.723807]  ? __virt_addr_valid+0x126/0x2b0
[ 2169.724268]  kasan_report+0xbe/0xf0
[ 2169.724667]  ? __run_timers.part.0+0x179/0x4c0
[ 2169.725116]  ? __run_timers.part.0+0x179/0x4c0
[ 2169.725570]  __run_timers.part.0+0x179/0x4c0
[ 2169.726003]  ? call_timer_fn+0x320/0x320
[ 2169.726404]  ? lock_downgrade+0x3a0/0x3a0
[ 2169.726820]  ? kvm_clock_get_cycles+0x14/0x20
[ 2169.727257]  ? ktime_get+0x92/0x150
[ 2169.727630]  ? lapic_next_deadline+0x35/0x60
[ 2169.728069]  run_timer_softirq+0x40/0x80
[ 2169.728475]  __do_softirq+0x1a1/0x509
[ 2169.728866]  irq_exit_rcu+0x95/0xc0
[ 2169.729241]  sysvec_apic_timer_interrupt+0x6b/0x80
[ 2169.729718]  </IRQ>
[ 2169.729993]  <TASK>
[ 2169.730259]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 2169.730755] RIP: 0010:default_idle+0x13/0x20
[ 2169.731190] Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 9a 7f 1f 02 85 c0 7e 07 0f 00 2d cf 69 43 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 c0 93 04 00
[ 2169.732759] RSP: 0018:ffff888100dbfe10 EFLAGS: 00000242
[ 2169.733264] RAX: 0000000000000001 RBX: ffff888100d9c200 RCX: ffffffff8241bd62
[ 2169.733925] RDX: ffffed109a848b15 RSI: 0000000000000004 RDI: ffffffff8127ac55
[ 2169.734566] RBP: 0000000000000004 R08: 0000000000000000 R09: ffffed109a848b14
[ 2169.735200] R10: ffff8884d42458a3 R11: 000000000000ba7e R12: ffffffff83d7d3a0
[ 2169.735835] R13: 1ffff110201b7fc6 R14: 0000000000000000 R15: ffff888100d9c200
[ 2169.736478]  ? ct_kernel_exit.constprop.0+0xa2/0xc0
[ 2169.736954]  ? do_idle+0x285/0x290
[ 2169.737323]  default_idle_call+0x63/0x90
[ 2169.737730]  do_idle+0x285/0x290
[ 2169.738089]  ? arch_cpu_idle_exit+0x30/0x30
[ 2169.738511]  ? mark_held_locks+0x1a/0x80
[ 2169.738917]  ? lockdep_hardirqs_on_prepare+0x12e/0x200
[ 2169.739417]  cpu_startup_entry+0x30/0x40
[ 2169.739825]  start_secondary+0x19a/0x1c0
[ 2169.740229]  ? set_cpu_sibling_map+0xbd0/0xbd0
[ 2169.740673]  secondary_startup_64_no_verify+0x15d/0x16b
[ 2169.741179]  </TASK>

[ 2169.741686] Allocated by task 1098:
[ 2169.742058]  kasan_save_stack+0x1c/0x40
[ 2169.742456]  kasan_save_track+0x10/0x30
[ 2169.742852]  __kasan_kmalloc+0x83/0x90
[ 2169.743246]  mlx5_dpll_probe+0xf5/0x3c0 [mlx5_dpll]
[ 2169.743730]  auxiliary_bus_probe+0x62/0xb0
[ 2169.744148]  really_probe+0x127/0x590
[ 2169.744534]  __driver_probe_device+0xd2/0x200
[ 2169.744973]  device_driver_attach+0x6b/0xf0
[ 2169.745402]  bind_store+0x90/0xe0
[ 2169.745761]  kernfs_fop_write_iter+0x1df/0x2a0
[ 2169.746210]  vfs_write+0x41f/0x790
[ 2169.746579]  ksys_write+0xc7/0x160
[ 2169.746947]  do_syscall_64+0x6f/0x140
[ 2169.747333]  entry_SYSCALL_64_after_hwframe+0x46/0x4e

[ 2169.748049] Freed by task 1220:
[ 2169.748393]  kasan_save_stack+0x1c/0x40
[ 2169.748789]  kasan_save_track+0x10/0x30
[ 2169.749188]  kasan_save_free_info+0x3b/0x50
[ 2169.749621]  poison_slab_object+0x106/0x180
[ 2169.750044]  __kasan_slab_free+0x14/0x50
[ 2169.750451]  kfree+0x118/0x330
[ 2169.750792]  mlx5_dpll_remove+0xf5/0x110 [mlx5_dpll]
[ 2169.751271]  auxiliary_bus_remove+0x2e/0x40
[ 2169.751694]  device_release_driver_internal+0x24b/0x2e0
[ 2169.752191]  unbind_store+0xa6/0xb0
[ 2169.752563]  kernfs_fop_write_iter+0x1df/0x2a0
[ 2169.753004]  vfs_write+0x41f/0x790
[ 2169.753381]  ksys_write+0xc7/0x160
[ 2169.753750]  do_syscall_64+0x6f/0x140
[ 2169.754132]  entry_SYSCALL_64_after_hwframe+0x46/0x4e

[ 2169.754847] Last potentially related work creation:
[ 2169.755315]  kasan_save_stack+0x1c/0x40
[ 2169.755709]  __kasan_record_aux_stack+0x9b/0xf0
[ 2169.756165]  __queue_work+0x382/0x8f0
[ 2169.756552]  call_timer_fn+0x126/0x320
[ 2169.756941]  __run_timers.part.0+0x2ea/0x4c0
[ 2169.757376]  run_timer_softirq+0x40/0x80
[ 2169.757782]  __do_softirq+0x1a1/0x509

[ 2169.758387] Second to last potentially related work creation:
[ 2169.758924]  kasan_save_stack+0x1c/0x40
[ 2169.759322]  __kasan_record_aux_stack+0x9b/0xf0
[ 2169.759773]  __queue_work+0x382/0x8f0
[ 2169.760156]  call_timer_fn+0x126/0x320
[ 2169.760550]  __run_timers.part.0+0x2ea/0x4c0
[ 2169.760978]  run_timer_softirq+0x40/0x80
[ 2169.761381]  __do_softirq+0x1a1/0x509

[ 2169.761998] The buggy address belongs to the object at ffff88812b326a00
                which belongs to the cache kmalloc-256 of size 256
[ 2169.763061] The buggy address is located 112 bytes inside of
                freed 256-byte region [ffff88812b326a00, ffff88812b326b00)

[ 2169.764346] The buggy address belongs to the physical page:
[ 2169.764866] page:000000000f2b1e89 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12b324
[ 2169.765731] head:000000000f2b1e89 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 2169.766484] anon flags: 0x200000000000840(slab|head|node=0|zone=2)
[ 2169.767048] page_type: 0xffffffff()
[ 2169.767422] raw: 0200000000000840 ffff888100042b40 0000000000000000 dead000000000001
[ 2169.768183] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[ 2169.768899] page dumped because: kasan: bad access detected

[ 2169.769649] Memory state around the buggy address:
[ 2169.770116]  ffff88812b326900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2169.770805]  ffff88812b326980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2169.771485] >ffff88812b326a00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2169.772173]                                                              ^
[ 2169.772787]  ffff88812b326a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2169.773477]  ffff88812b326b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 2169.774160] ==================================================================
[ 2169.774845] ==================================================================

I didn't manage to reproduce it, though the cause seems obvious.
There is a chance that mlx5_dpll_remove() calls cancel_delayed_work()
while the work is running, and the work then manages to re-arm itself.
In that case, when the delay timer next fires and tries to queue the
work, it operates on freed memory.

Fix this by using cancel_delayed_work_sync() instead which makes sure
that work is done when it returns.

Fixes: 496fd0a26bbf ("mlx5: Implement SyncE support using DPLL infrastructure")
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dpll: fix possible deadlock during netlink dump operation

Recently, I've been hitting the following deadlock warning during dpll
pin dump:

[52804.637962] ======================================================
[52804.638536] WARNING: possible circular locking dependency detected
[52804.639111] 6.8.0-rc2jiri+ #1 Not tainted
[52804.639529] ------------------------------------------------------
[52804.640104] python3/2984 is trying to acquire lock:
[52804.640581] ffff88810e642678 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}, at: netlink_dump+0xb3/0x780
[52804.641417]
               but task is already holding lock:
[52804.642010] ffffffff83bde4c8 (dpll_lock){+.+.}-{3:3}, at: dpll_lock_dumpit+0x13/0x20
[52804.642747]
               which lock already depends on the new lock.

[52804.643551]
               the existing dependency chain (in reverse order) is:
[52804.644259]
               -> #1 (dpll_lock){+.+.}-{3:3}:
[52804.644836]        lock_acquire+0x174/0x3e0
[52804.645271]        __mutex_lock+0x119/0x1150
[52804.645723]        dpll_lock_dumpit+0x13/0x20
[52804.646169]        genl_start+0x266/0x320
[52804.646578]        __netlink_dump_start+0x321/0x450
[52804.647056]        genl_family_rcv_msg_dumpit+0x155/0x1e0
[52804.647575]        genl_rcv_msg+0x1ed/0x3b0
[52804.648001]        netlink_rcv_skb+0xdc/0x210
[52804.648440]        genl_rcv+0x24/0x40
[52804.648831]        netlink_unicast+0x2f1/0x490
[52804.649290]        netlink_sendmsg+0x36d/0x660
[52804.649742]        __sock_sendmsg+0x73/0xc0
[52804.650165]        __sys_sendto+0x184/0x210
[52804.650597]        __x64_sys_sendto+0x72/0x80
[52804.651045]        do_syscall_64+0x6f/0x140
[52804.651474]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
[52804.652001]
               -> #0 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}:
[52804.652650]        check_prev_add+0x1ae/0x1280
[52804.653107]        __lock_acquire+0x1ed3/0x29a0
[52804.653559]        lock_acquire+0x174/0x3e0
[52804.653984]        __mutex_lock+0x119/0x1150
[52804.654423]        netlink_dump+0xb3/0x780
[52804.654845]        __netlink_dump_start+0x389/0x450
[52804.655321]        genl_family_rcv_msg_dumpit+0x155/0x1e0
[52804.655842]        genl_rcv_msg+0x1ed/0x3b0
[52804.656272]        netlink_rcv_skb+0xdc/0x210
[52804.656721]        genl_rcv+0x24/0x40
[52804.657119]        netlink_unicast+0x2f1/0x490
[52804.657570]        netlink_sendmsg+0x36d/0x660
[52804.658022]        __sock_sendmsg+0x73/0xc0
[52804.658450]        __sys_sendto+0x184/0x210
[52804.658877]        __x64_sys_sendto+0x72/0x80
[52804.659322]        do_syscall_64+0x6f/0x140
[52804.659752]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
[52804.660281]
               other info that might help us debug this:

[52804.661077]  Possible unsafe locking scenario:

[52804.661671]        CPU0                    CPU1
[52804.662129]        ----                    ----
[52804.662577]   lock(dpll_lock);
[52804.662924]                                lock(nlk_cb_mutex-GENERIC);
[52804.663538]                                lock(dpll_lock);
[52804.664073]   lock(nlk_cb_mutex-GENERIC);
[52804.664490]

The issue is as follows: __netlink_dump_start() calls control->start(cb)
with nlk->cb_mutex held. In control->start(cb) the dpll_lock is taken.
Then nlk->cb_mutex is released and taken again in netlink_dump(), while
dpll_lock is still held. That leads to an ABBA deadlock when another
CPU races with the same operation.

Fix this by taking dpll_lock inside the dumpit() callback, which ensures
the correct lock ordering.
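
The resulting ordering can be illustrated with a small userspace model
in Python. This is a sketch under stated assumptions, not the kernel
code: the two threading.Lock objects merely stand in for nlk->cb_mutex
and dpll_lock, and dump_fixed() and run_concurrent() are hypothetical
helpers.

```python
import threading

cb_mutex = threading.Lock()    # stands in for nlk->cb_mutex
dpll_lock = threading.Lock()   # stands in for dpll_lock


def dump_fixed(order: list) -> None:
    """Model of the fixed flow: dpll_lock is only ever taken while
    cb_mutex is held, and is released before cb_mutex is released."""
    with cb_mutex:                 # netlink_dump() holds cb_mutex ...
        order.append("cb_mutex")
        with dpll_lock:            # ... and dumpit() nests dpll_lock inside
            order.append("dpll_lock")


def run_concurrent(n_threads: int = 4, rounds: int = 100) -> list:
    """Run several dumpers in parallel and return the acquisition log."""
    order = []

    def worker() -> None:
        for _ in range(rounds):
            dump_fixed(order)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


log = run_concurrent()
print(len(log))  # prints 800
```

Because every path acquires the locks in the same order (cb_mutex, then
dpll_lock), no thread can hold dpll_lock while waiting for cb_mutex, so
the ABBA cycle from the lockdep report cannot form in this model.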

Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions")
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'drm-msm-fixes-2024-02-07' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

Fixes for v6.8-rc4

DPU:
- fix for kernel doc warnings and smatch warnings in dpu_encoder
- fix for smatch warning in dpu_encoder
- fix the bus bandwidth value for SDM670

DP:
- fixes to handle unknown bpc case correctly for DP. The current code was
  spilling over into other bits of DP configuration register, had to be
  fixed to avoid the extra shifts which were causing the spill over
- fix for MISC0 programming in DP driver to program the correct
  colorimetry value

GPU:
- dmabuf vmap fix
- a610 UBWC corruption fix (incorrect hbb)
- revert a commit that was making GPU recovery unreliable

Signed-off-by: Dave Airlie <[email protected]>
From: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGv+tb1+_cp7ftxcMZbbxE9810rvxeaC50eL=msQ+zkm0g@mail.gmail.com

Merge tag 'amd-drm-fixes-6.8-2024-02-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.8-2024-02-08:

amdgpu:
- Misc NULL/bounds check fixes
- ODM pipe policy fix
- Aborted suspend fixes
- JPEG 4.0.5 fix
- DCN 3.5 fixes
- PSP fix
- DP MST fix
- Phantom pipe fix
- VRAM vendor fix
- Clang fix
- SR-IOV fix

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-intel-fixes-2024-02-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Just includes gvt-fixes-2024-02-05

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-xe-fixes-2024-02-08' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix a loop in an error path
- Fix a missing dma-fence reference
- Fix a retry path on userptr REMAP
- Workaround for a false gcc warning
- Fix missing map of the usm batch buffer in the migrate vm.
- Fix a memory leak.
- Fix a bad assumption of used page size
- Fix hitting a BUG() due to zero pages to map.
- Remove some leftover async bind queue relics

Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Hellstrom <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/ZcS2LllawGifubsk@fedora

Merge tag 'drm-misc-fixes-2024-02-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

A null pointer dereference fix for v3d, a TTM pool initialization fix,
several fixes for nouveau around register size, DMA buffer leaks and API
consistency, and multiple fixes for ivpu around MMU setup, initialization
and firmware interactions.

Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/4wsi2i6kgkqdu7nzp4g7hxasbswnrmc5cakgf5zzvnix53u7lr@4rmp7hwblow3

Merge tag 'net-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from WiFi and netfilter.

  Current release - regressions:

   - nic: intel: fix old compiler regressions

   - netfilter: ipset: missing gc cancellations fixed

  Current release - new code bugs:

   - netfilter: ctnetlink: fix filtering for zone 0

  Previous releases - regressions:

   - core: fix from address in memcpy_to_iter_csum()

   - netfilter: nfnetlink_queue: un-break NF_REPEAT

   - af_unix: fix memory leak for dead unix_(sk)->oob_skb in GC.

   - devlink: avoid potential loop in devlink_rel_nested_in_notify_work()

   - iwlwifi:
       - mvm: fix a battery life regression
       - fix double-free bug

   - mac80211: fix waiting for beacons logic

   - nic: nfp: flower: prevent re-adding mac index for bonded port

  Previous releases - always broken:

   - rxrpc: fix generation of serial numbers to skip zero

   - tipc: check the bearer type before calling tipc_udp_nl_bearer_add()

   - tunnels: fix out of bounds access when building IPv6 PMTU error

   - nic: hv_netvsc: register VF in netvsc_probe if NET_DEVICE_REGISTER
     missed

   - nic: atlantic: fix DMA mapping for PTP hwts ring

  Misc:

   - selftests: more fixes to deal with very slow hosts"

* tag 'net-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (80 commits)
  netfilter: nft_set_pipapo: remove scratch_aligned pointer
  netfilter: nft_set_pipapo: add helper to release pcpu scratch area
  netfilter: nft_set_pipapo: store index in scratch maps
  netfilter: nft_set_rbtree: skip end interval element from gc
  netfilter: nfnetlink_queue: un-break NF_REPEAT
  netfilter: nf_tables: use timestamp to check for set element timeout
  netfilter: nft_ct: reject direction for ct id
  netfilter: ctnetlink: fix filtering for zone 0
  s390/qeth: Fix potential loss of L3-IP@ in case of network issues
  netfilter: ipset: Missing gc cancellations fixed
  octeontx2-af: Initialize maps.
  net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio
  net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio
  netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
  netfilter: nft_compat: restrict match/target protocol to u16
  netfilter: nft_compat: reject unused compat flag
  netfilter: nft_compat: narrow down revision to unsigned 8-bits
  net: intel: fix old compiler regressions
  MAINTAINERS: Maintainer change for rds
  selftests: cmsg_ipv6: repeat the exact packet
  ...

Merge tag 'pinctrl-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl fix from Linus Walleij:
"A single fix for the AMD driver which affects developer laptops, the
pinctrl/GPIO driver won't probe on some systems"

* tag 'pinctrl-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: amd: Add IRQF_ONESHOT to the interrupt request

Merge tag 'nvme-6.8-2023-02-08' of git://git.infradead.org/nvme into block-6.8

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.8

- Update a potentially stale firmware attribute (Maurizio)
- Fixes for the recent verbose error logging (Keith, Chaitanya)
- Protection information payload size fix for passthrough (Francis)"

* tag 'nvme-6.8-2023-02-08' of git://git.infradead.org/nvme:
  nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
  nvme-core: fix comment to reflect right functions
  nvme: move passthrough logging attribute to head
  nvme-host: fix the updating of the firmware version

virtio-blk: Ensure no requests in virtqueues before deleting vqs.

Ensure there are no remaining requests in the virtqueues before
resetting the vdev and deleting the virtqueues. Otherwise these requests
will never be completed, which may cause the system to become
unresponsive.

blk_mq_quiesce_queue() can ensure that requests have reached the
in-flight state, but it cannot guarantee that they have been processed
by the device. Virtqueues should never be deleted before all requests
have completed.

blk_mq_freeze_queue() ensures that all requests in the virtqueues have
completed and that no new requests can enter the virtqueues.

Signed-off-by: Yi Sun <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

docs: kernel_feat.py: fix build error for missing files

If the directory passed to the '.. kernel-feat::' directive does not
exist or the get_feat.pl script does not find any files to extract
features from, Sphinx will report the following error:

    Sphinx parallel build error:
    UnboundLocalError: local variable 'fname' referenced before assignment
    make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2

This is due to how I changed the script in c48a7c44a1d0 ("docs:
kernel_feat.py: fix potential command injection"). Before that, the
filename passed along to self.nestedParse() in this case was weirdly
just the whole get_feat.pl invocation.

We can fix it by doing what kernel_abi.py does -- just pass
self.arguments[0] as 'fname'.
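
The failure mode is easy to reproduce outside Sphinx. The sketch below
is a simplified model, not the actual kernel_feat.py code: the "FILE"
marker format and the parse_buggy()/parse_fixed() helpers are
assumptions made for illustration.

```python
import re

# Hypothetical marker format; the real script's parsing is not shown here.
FILE_MARKER = re.compile(r"^FILE (\S+)$")


def parse_buggy(lines):
    """Binds fname only when a marker line is seen."""
    kept = []
    for line in lines:
        match = FILE_MARKER.match(line)
        if match:
            fname = match.group(1)   # only bound if some line matches
        else:
            kept.append(line)
    return kept, fname               # UnboundLocalError if nothing matched


def parse_fixed(lines, directive_arg):
    """Always binds fname from the directive argument, as the fix describes."""
    fname = directive_arg
    kept = [line for line in lines if not FILE_MARKER.match(line)]
    return kept, fname


try:
    parse_buggy([])                  # no output from the helper script
except UnboundLocalError as exc:
    print("buggy:", type(exc).__name__)

print("fixed:", parse_fixed([], "x86"))
```

With empty input the buggy variant never assigns fname before returning
it, while the fixed variant has a value regardless of what the helper
script produced.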

Fixes: c48a7c44a1d0 ("docs: kernel_feat.py: fix potential command injection")
Cc: Justin Forbes <[email protected]>
Cc: Salvatore Bonaccorso <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: [email protected]
Signed-off-by: Vegard Nossum <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>

blk-iocost: Fix an UBSAN shift-out-of-bounds warning

When iocg_kick_delay() is called from a CPU different than the one which set
the delay, @now may be in the past of @iocg->delay_at leading to the
following warning:

  UBSAN: shift-out-of-bounds in block/blk-iocost.c:1359:23
  shift exponent 18446744073709 is too large for 64-bit type 'u64' (aka 'unsigned long long')
  ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x79/0xc0
   __ubsan_handle_shift_out_of_bounds+0x2ab/0x300
   iocg_kick_delay+0x222/0x230
   ioc_rqos_merge+0x1d7/0x2c0
   __rq_qos_merge+0x2c/0x80
   bio_attempt_back_merge+0x83/0x190
   blk_attempt_plug_merge+0x101/0x150
   blk_mq_submit_bio+0x2b1/0x720
   submit_bio_noacct_nocheck+0x320/0x3e0
   __swap_writepage+0x2ab/0x9d0

The underflow itself doesn't really affect the behavior in any meaningful
way; however, the past timestamp may exaggerate the delay amount calculated
later in the code, which shouldn't be a material problem given the nature of
the delay mechanism.

If @now is in the past, this CPU is racing another CPU which recently set up
the delay and there's nothing this CPU can contribute w.r.t. the delay.
Let's bail early from iocg_kick_delay() in such cases.
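
The arithmetic behind the warning can be checked with a small Python
model of unsigned 64-bit subtraction. This is a sketch only: the helper
names are hypothetical, and the one-second period (1,000,000 usec) and
the halving-per-period decay are assumptions, chosen because they
reproduce the shift exponent in the report above.

```python
U64_MASK = (1 << 64) - 1
USEC_PER_SEC = 1_000_000  # assumed decay period, in microseconds


def elapsed_u64(now: int, delay_at: int) -> int:
    """Value of `now - delay_at` when computed in unsigned 64-bit math."""
    return (now - delay_at) & U64_MASK


def decayed_delay(delay: int, now: int, delay_at: int):
    """Halve `delay` once per elapsed period.

    Returns None when `now` is before `delay_at`, modelling the early
    bail-out described in the commit message.
    """
    if now < delay_at:
        return None
    shift = elapsed_u64(now, delay_at) // USEC_PER_SEC
    return delay >> shift


# `now` one microsecond in the past wraps to a huge elapsed time.
print(elapsed_u64(0, 1) // USEC_PER_SEC)    # prints 18446744073709
print(decayed_delay(1024, 3_000_000, 0))    # prints 128
print(decayed_delay(1024, 5, 10))           # prints None
```

The first value matches the "shift exponent 18446744073709" in the UBSAN
output, which is what a wrapped u64 difference divided by one million
produces.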

Reported-by: Breno Leitão <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Fixes: 5160a5a53c0c ("blk-iocost: implement delay adjustment hysteresis")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

smb: client: set correct d_type for reparse points under DFS mounts

Send query dir requests with an info level of
SMB_FIND_FILE_FULL_DIRECTORY_INFO rather than
SMB_FIND_FILE_DIRECTORY_INFO when the client is generating its own
inode numbers (e.g. noserverino), so that reparse tags can still be
parsed directly from the responses while the server does not send
UniqueId (the server inode number).

Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb3: add missing null server pointer check

Address static checker warning in cifs_ses_get_chan_index():
warn: variable dereferenced before check 'server'
To be consistent, and reduce risk, we should add another check
for null server pointer.

Fixes: 88675b22d34e ("cifs: do not search for channel if server is terminating")
Reported-by: Dan Carpenter <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>

Revert "kobject: Remove redundant checks for whether ktype is NULL"

This reverts commit 1b28cb81dab7c1eedc6034206f4e8d644046ad31.

It is reported to cause problems, so revert it for now until the root
cause can be found.

Reported-by: kernel test robot <[email protected]>
Fixes: 1b28cb81dab7 ("kobject: Remove redundant checks for whether ktype is NULL")
Cc: Zhen Lei <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Link: https://lore.kernel.org/r/2024020849-consensus-length-6264@gregkh
Signed-off-by: Greg Kroah-Hartman <[email protected]>

kprobes: Remove unnecessary initial values of variables

ri and sym are assigned before they are first used, so they do not need
initial values.

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Li zeming <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

tracing/probes: Fix to set arg size and fmt after setting type from BTF

Since the BTF type setting updates probe_arg::type, the type size
calculation and setting print-fmt should be done after that.
Without this fix, the argument size and print-fmt can be wrong.

Link: https://lore.kernel.org/all/170602218196.215583.6417859469540955777.stgit@devnote2/
Fixes: b576e09701c7 ("tracing/probes: Support function parameters if BTF is available")
Cc: [email protected]
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

tracing/probes: Fix to show a parse error for bad type for $comm

Fix to show a parse error for bad type (non-string) for $comm/$COMM and
immediate-string. With this fix, error_log file shows appropriate error
message as below.

/sys/kernel/tracing # echo 'p vfs_read $comm:u32' >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/tracing # echo 'p vfs_read \"hoge":u32' >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/tracing # cat error_log

[   30.144183] trace_kprobe: error: $comm and immediate-string only accepts string type
  Command: p vfs_read $comm:u32
                            ^
[   62.618500] trace_kprobe: error: $comm and immediate-string only accepts string type
  Command: p vfs_read \"hoge":u32
                              ^
Link: https://lore.kernel.org/all/170602215411.215583.2238016352271091852.stgit@devnote2/
Fixes: 3dd1f7f24f8c ("tracing: probeevent: Fix to make the type of $comm string")
Cc: [email protected]
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

ASoC: SOF: ipc3-topology: Fix pipeline tear down logic

With the change in the widget free logic to power down the cores only
when the scheduler widgets are freed, we need to ensure that the
scheduler widget is freed only after all the widgets associated with the
scheduler are freed. This is to ensure that the secondary core that the
scheduler is scheduled to run on is kept powered on for as long as any
widget that needs it is in use. While this works well for dynamic pipelines,
in the case of static pipelines the current logic does not take this into
account and frees all widgets in the order they occur in the
widget_list. So, modify this to ensure that the scheduler widgets are freed
only after all other types of widgets in the widget_list are freed.
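
The intended ordering can be sketched as a two-pass walk over the list.
This is a minimal illustration, not the driver code: struct widget,
enum widget_type and free_widgets() are hypothetical names, and "freeing"
is modelled by recording the order in which each entry is visited.

```c
#include <stddef.h>

enum widget_type { WIDGET_SCHEDULER, WIDGET_OTHER };

struct widget {
	enum widget_type type;
	int freed_order;	/* 0 = not freed yet, otherwise 1-based order */
};

/* Free all non-scheduler widgets first, then the scheduler widgets. */
static int free_widgets(struct widget *list, size_t n)
{
	int order = 0;
	size_t i;

	/* Pass 1: everything that still needs the scheduler's core. */
	for (i = 0; i < n; i++)
		if (list[i].type != WIDGET_SCHEDULER)
			list[i].freed_order = ++order;

	/* Pass 2: scheduler widgets go last, so the core stays up until here. */
	for (i = 0; i < n; i++)
		if (list[i].type == WIDGET_SCHEDULER)
			list[i].freed_order = ++order;

	return order;
}
```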

Link: https://github.com/thesofproject/linux/issues/4807
Fixes: 31ed8da1c8e5 ("ASoC: SOF: sof-audio: Modify logic for enabling/disabling topology cores")
Signed-off-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
Reviewed-by: Péter Ujfalusi <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

ASoC: cs35l56: Fix deadlock in ASP1 mixer register initialization

Rewrite the handling of ASP1 TX mixer mux initialization to prevent a
deadlock during component_remove().

The firmware can overwrite the ASP1 TX mixer registers with
system-specific settings. This is mainly for hardware that uses the
ASP as a chip-to-chip link controlled by the firmware. Because of this
the driver cannot know the starting state of the ASP1 mixer muxes until
the firmware has been downloaded and rebooted.

The original workaround for this was to queue a work function from the
dsp_work() job. This work then read the register values (populating the
regmap cache the first time around) and then called
snd_soc_dapm_mux_update_power(). The problem with this is that it was
ultimately triggered by cs35l56_component_probe() queueing dsp_work,
which meant that it would be running in parallel with the rest of the
ASoC component and card initialization. To prevent accessing DAPM before
it was fully initialized the work function took the card mutex. But this
would deadlock if cs35l56_component_remove() was called before the work job
had completed, because ASoC calls component_remove() with the card mutex
held.

This new version removes the work function. Instead the regmap cache and
DAPM mux widgets are initialized the first time any of the associated ALSA
controls is read or written.

Signed-off-by: Richard Fitzgerald <[email protected]>
Fixes: 07f7d6e7a124 ("ASoC: cs35l56: Fix for initializing ASP1 mixer registers")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

wifi: iwlwifi: mvm: fix a crash when we run out of stations

A DoS tool that injects loads of authentication frames made our AP
crash. The iwl_mvm_is_dup() function couldn't find the per-queue
dup_data which was not allocated.

The root cause for that is that we ran out of stations in the firmware
and we didn't really add the station to the firmware, yet we didn't
return an error to mac80211.
Mac80211 was thinking that we have the station and because of that,
sta_info::uploaded was set to 1. This allowed
ieee80211_find_sta_by_ifaddr() to return a valid station object, but
that ieee80211_sta didn't have any iwl_mvm_sta object initialized and
that caused the crash mentioned earlier when we got Rx on that station.

Cc: [email protected]
Fixes: 57974a55d995 ("wifi: iwlwifi: mvm: refactor iwl_mvm_mac_sta_state_common()")
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240206175739.1f76c44b2486.I6a00955e2842f15f0a089db2f834adb9d10fbe35@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()

This is an error path and Smatch complains that "tbl_rev" is uninitialized
on this path. All the other functions follow this same pattern where they
set the error code and goto out_free so that's probably what was intended
here as well.

Fixes: e8e10a37c51c ("iwlwifi: acpi: move ppag code from mvm to fw/acpi")
Signed-off-by: Dan Carpenter <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: Fix some error codes

This saves the error as PTR_ERR(wifi_pkg). The problem is that
"wifi_pkg" is a valid pointer, not an error pointer. Set the error code
to -EINVAL instead.

Fixes: 2a8084147bff ("iwlwifi: acpi: support reading and storing WRDS revision 1 and 2")
Signed-off-by: Dan Carpenter <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: clear link_id in time_event

Before sending a SESSION PROTECTION cmd the driver checks if
the link_id indicated in the time event (and for which the cmd will be
sent) is valid and exists.
Clear the te_data::link_id when FW notifies that a session protection
ended, so the check will actually fail when it should.

Fixes: 135065837310 ("wifi: iwlwifi: support link_id in SESSION_PROTECTION cmd")
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240204235836.c64a6b3606c2.I35cdc08e8a3be282563163690f8ca3edb51a3854@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: use correct address 3 in A-MSDU

As described in IEEE Std 802.11-2020, table 9-30 (Address
field contents), A-MSDU address 3 should contain the BSSID
address.

In TX_CMD we copy the MAC header from skb, and skb address 3
holds the destination address, but it may not be identical to
the BSSID.

Using the wrong destination address appears to work with (most)
receivers without MLO, but in MLO some devices are checking for
it carefully, perhaps as a consequence of link to MLD address
translation.

Replace address 3 in the TX_CMD MAC header with the correct
address while retaining the skb address 3 unchanged.
This ensures that skb address 3 will be utilized later for
constructing the A-MSDU subframes.

Note that we fill in the MLD address, but the firmware will do the
necessary translation to link address after encryption.

Signed-off-by: Daniel Gabay <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240204235836.4583a1bf9188.I3f8e7892bdf8f86b4daa28453771a8c9817b2416@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()

This pointer can change here since the SKB can change, so we
actually later open-coded IEEE80211_SKB_CB() again. Reload
the pointer where needed, so the monitor-mode case using it
gets fixed, and then use info-> later as well.

Cc: [email protected]
Fixes: 531682159092 ("mac80211: fix VLAN handling with TXQs")
Link: https://msgid.link/20240131164910.b54c28d583bc.I29450cec84ea6773cff5d9c16ff92b836c331471@changeid
Signed-off-by: Johannes Berg <[email protected]>

Merge tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Narrow down target/match revision to u8 in nft_compat.

2) Bail out with unused flags in nft_compat.

3) Restrict layer 4 protocol to u16 in nft_compat.

4) Remove static in pipapo get command that slipped through when
   reducing set memory footprint.

5) Follow up incremental fix for the ipset performance regression,
   this includes the missing gc cancellation, from Jozsef Kadlecsik.

6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0
   as no filtering, from Felix Huettner.

7) Reject direction for NFT_CT_ID.

8) Use timestamp to check for set element expiration while transaction
   is handled to prevent garbage collection from removing set elements
   that were just added by this transaction. Packet path and netlink
   dump/get path still use current time to check for expiration.

9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal.

10) map_index needs to be percpu and per-set, not just percpu.
    At this time it's possible for a pipapo set to fill the all-zero part
    with ones and take the 'might have bits set' as 'start-from-zero' area.
    From Florian Westphal. This includes three patches:

    - Change scratchpad area to a structure that provides space for a
      per-set-and-cpu toggle and uses it instead of the percpu one.

    - Add a new free helper to prepare for the next patch.

    - Remove the scratch_aligned pointer and make the AVX2 implementation
      use the exact same memory addresses for read/store of the matching
      state.

netfilter pull request 24-02-08

* tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_set_pipapo: remove scratch_aligned pointer
  netfilter: nft_set_pipapo: add helper to release pcpu scratch area
  netfilter: nft_set_pipapo: store index in scratch maps
  netfilter: nft_set_rbtree: skip end interval element from gc
  netfilter: nfnetlink_queue: un-break NF_REPEAT
  netfilter: nf_tables: use timestamp to check for set element timeout
  netfilter: nft_ct: reject direction for ct id
  netfilter: ctnetlink: fix filtering for zone 0
  netfilter: ipset: Missing gc cancellations fixed
  netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
  netfilter: nft_compat: restrict match/target protocol to u16
  netfilter: nft_compat: reject unused compat flag
  netfilter: nft_compat: narrow down revision to unsigned 8-bits
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

netfilter: nft_set_pipapo: remove scratch_aligned pointer

Use ->scratch for both avx2 and the generic implementation.

After previous change the scratch->map member is always aligned properly
for AVX2, so we can just use scratch->map in AVX2 too.

The alignoff delta is stored in the scratchpad so we can reconstruct
the correct address to free the area again.
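
A user-space sketch of the idea, storing the alignment delta inside the
aligned object so the original allocation can be freed later. The names
struct scratch, align_off, scratch_alloc() and scratch_free() as well as
the 32-byte alignment are assumptions made for illustration; for brevity
the sketch aligns the whole structure rather than only its map member.

```c
#include <stdint.h>
#include <stdlib.h>

#define SCRATCH_ALIGN 32	/* assumed AVX2 alignment in bytes */

struct scratch {
	uint8_t align_off;	/* bytes skipped to reach the aligned address */
	unsigned long map[4];
};

static struct scratch *scratch_alloc(void)
{
	/* Over-allocate so an aligned address always fits in the buffer. */
	uint8_t *raw = malloc(sizeof(struct scratch) + SCRATCH_ALIGN - 1);
	uintptr_t addr, aligned;
	struct scratch *s;

	if (!raw)
		return NULL;

	addr = (uintptr_t)raw;
	aligned = (addr + SCRATCH_ALIGN - 1) & ~(uintptr_t)(SCRATCH_ALIGN - 1);
	s = (struct scratch *)aligned;
	s->align_off = (uint8_t)(aligned - addr);	/* remember the delta */
	return s;
}

static void scratch_free(struct scratch *s)
{
	/* Undo the alignment offset to get back the pointer malloc() returned. */
	if (s)
		free((uint8_t *)s - s->align_off);
}
```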

Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Reviewed-by: Stefano Brivio <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nft_set_pipapo: add helper to release pcpu scratch area

After the next patch a simple kfree() is no longer enough, so add
a helper for it.

Reviewed-by: Stefano Brivio <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nft_set_pipapo: store index in scratch maps

Pipapo needs a scratchpad area to keep state during matching.
This state can be large and thus cannot reside on stack.

Each set preallocates percpu areas for this.

On each match stage, one scratchpad half starts with all-zero and the other
is inited to all-ones.

At the end of each stage, the half that starts with all-ones is
always zero. Before next field is tested, pointers to the two halves
are swapped, i.e. resmap pointer turns into fill pointer and vice versa.

After the last field has been processed, pipapo stashes the
index toggle in a percpu variable, with assumption that next packet
will start with the all-zero half and sets all bits in the other to 1.

This isn't reliable.

There can be multiple sets and we can't be sure that the upper
and lower halves of every set's scratch map are always in sync (lookups
can be conditional), so one set might have swapped while another might
not have been queried.

Thus we need to keep the index per-set-and-cpu, just like the
scratchpad.
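
A reduced model of why the toggle has to live with each set's scratchpad
is sketched below. All names (struct scratch_sketch, struct set_sketch,
lookup_sketch(), NR_CPUS_SKETCH) are hypothetical, and a "lookup" is
simplified to one that always ends on the opposite half.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 2	/* assumed CPU count for the model */

/* Per-set, per-CPU scratch state: the toggle sits next to the maps. */
struct scratch_sketch {
	bool map_index;		/* which half is expected to be all-zero */
};

struct set_sketch {
	struct scratch_sketch scratch[NR_CPUS_SKETCH];
};

/*
 * Model a lookup that ends on the other half: report which half was used
 * as the starting point and flip this set's own toggle for this CPU.
 */
static bool lookup_sketch(struct set_sketch *set, int cpu)
{
	struct scratch_sketch *s = &set->scratch[cpu];
	bool used = s->map_index;

	s->map_index = !s->map_index;
	return used;
}
```

With a single percpu toggle shared by all sets, three lookups on one set
would leave a second, never-queried set starting from the wrong half on
that CPU. Here each set only ever sees its own state.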

Note that this bug fix is incomplete, there is a related issue.

avx2 and normal implementation might use slightly different areas of the
map array space due to the avx2 alignment requirements, so
m->scratch (generic/fallback implementation) and ->scratch_aligned
(avx) may partially overlap. scratch and scratch_aligned are not distinct
objects, the latter is just the aligned address of the former.

After this change, a write to scratch_aligned->map_index may write to
scratch->map, so this issue becomes more prominent: we can set to 1
a bit in the supposedly-all-zero area of scratch->map[].

A followup patch will remove scratch_aligned and make the generic and
avx code use the same (aligned) area.

It's done in a separate change to ease review.

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reviewed-by: Stefano Brivio <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nft_set_rbtree: skip end interval element from gc

rbtree lazy gc on insert might collect an end interval element that has
just been added in this transaction. Skip end interval elements that
are not yet active.

Fixes: f718863aca46 ("netfilter: nft_set_rbtree: fix overlap expiration walk")
Cc: [email protected]
Reported-by: lonial con <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nfnetlink_queue: un-break NF_REPEAT

Only override userspace verdict if the ct hook returns something
other than ACCEPT.

Else, this replaces NF_REPEAT (run all hooks again) with NF_ACCEPT
(move to next hook).
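
The rule can be sketched as a small helper. final_verdict() is a
hypothetical name used only for this illustration; the verdict values
match the uapi netfilter definitions.

```c
#define NF_DROP   0
#define NF_ACCEPT 1
#define NF_REPEAT 4

/*
 * Keep the verdict chosen by userspace unless the ct hook returned
 * something other than NF_ACCEPT, in which case the hook's verdict wins.
 */
static unsigned int final_verdict(unsigned int user_verdict,
				  unsigned int ct_verdict)
{
	if (ct_verdict != NF_ACCEPT)
		return ct_verdict;

	return user_verdict;
}
```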

Fixes: 6291b3a67ad5 ("netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts")
Reported-by: [email protected]
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables: use timestamp to check for set element timeout

Add a timestamp field at the beginning of the transaction, store it
in the nftables per-netns area.

Update set backend .insert, .deactivate and sync gc path to use the
timestamp, this avoids that an element expires while control plane
transaction is still unfinished.

.lookup and .update, which are used from packet path, still use the
current time to check if the element has expired. The .get path and dump
do too, since they run lockless under the rcu read side lock. Then, there
is async gc, which also needs to check the current time since it runs
asynchronously from a workqueue.
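
The difference between the two kinds of callers can be sketched as one
expiry check that takes the reference time as a parameter. struct
elem_sketch and elem_expired_at() are hypothetical names for this
illustration; the control plane would pass the timestamp taken at the
start of the transaction, the packet path the current time.

```c
#include <stdbool.h>
#include <stdint.h>

struct elem_sketch {
	uint64_t expires;	/* absolute expiry time of the element */
};

/* An element counts as expired once @tstamp has reached its expiry time. */
static bool elem_expired_at(const struct elem_sketch *e, uint64_t tstamp)
{
	return tstamp >= e->expires;
}
```

For an element expiring at time 100 and a transaction that started at
time 90, the control plane keeps treating the element as alive for the
whole transaction, even if the wall clock has meanwhile moved to 110 and
the packet path already sees it as expired.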

Fixes: c3e1b005ed1c ("netfilter: nf_tables: add set element timeout support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nft_ct: reject direction for ct id

The direction attribute is ignored; reject it in case this ever needs to be
supported.

Fixes: 3087c3f7c23b ("netfilter: nft_ct: Add ct id support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: ctnetlink: fix filtering for zone 0

Previously, filtering for the default zone would actually skip the zone
filter and flush all zones.
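
One way to tell "no zone filter" apart from "filter on zone 0" is an
explicit flag, as sketched below. This is an assumption about the shape
of such a fix, not a copy of the patch; struct zone_filter_sketch and
zone_match() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

struct zone_filter_sketch {
	bool zone_set;		/* was a zone attribute supplied at all? */
	uint16_t zone;		/* requested zone id, 0 is a valid value */
};

static bool zone_match(const struct zone_filter_sketch *f, uint16_t ct_zone)
{
	if (!f->zone_set)
		return true;	/* no zone filter requested: match all */

	return f->zone == ct_zone;
}
```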

Fixes: eff3c558bb7e ("netfilter: ctnetlink: support filtering by zone")
Reported-by: Ilya Maximets <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Felix Huettner <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

s390/qeth: Fix potential loss of L3-IP@ in case of network issues

Symptom:
In case of a bad cable connection (e.g. dirty optics) a fast sequence of
network DOWN-UP-DOWN-UP could happen. UP triggers recovery of the qeth
interface. In case of a second DOWN while recovery is still ongoing, it
can happen that the IP@ of a Layer3 qeth interface is lost and will not
be recovered by the second UP.

Problem:
When registration of IP addresses with Layer 3 qeth devices fails, (e.g.
because of bad address format) the respective IP address is deleted from
its hash-table in the driver. If registration fails because of a ENETDOWN
condition, the address should stay in the hashtable, so a subsequent
recovery can restore it.

3caa4af834df ("qeth: keep ip-address after LAN_OFFLINE failure")
fixes this for registration failures during normal operation, but not
during recovery.

Solution:
Keep L3-IP address in case of ENETDOWN in qeth_l3_recover_ip(). For
consistency with qeth_l3_add_ip() we also keep it in case of EADDRINUSE,
i.e. for some reason the card already/still has this address registered.
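
The decision can be sketched as a small predicate on the registration
result. keep_addr_after_recover_failure() is a hypothetical helper name
used only for this illustration; the driver's actual control flow may
differ.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether an L3 address stays in the hash table after trying to
 * re-register it during recovery.
 */
static bool keep_addr_after_recover_failure(int rc)
{
	switch (rc) {
	case 0:			/* registered fine */
	case -ENETDOWN:		/* link is down, a later recovery can retry */
	case -EADDRINUSE:	/* card already/still has this address */
		return true;
	default:		/* e.g. bad address format: drop it */
		return false;
	}
}
```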

Fixes: 4a71df50047f ("qeth: new qeth device driver")
Cc: [email protected]
Signed-off-by: Alexandra Winter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

netfilter: ipset: Missing gc cancellations fixed

The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
in swap operation") failed to add the calls to gc cancellations
at the error path of create operations and at module unload. Also,
because half of the destroy operations is now executed by a
function registered by call_rcu(), neither the NFNL_SUBSYS_IPSET mutex
nor the rcu read lock is held, and therefore checking them results
in false warnings.

Fixes: 97f7cf1cd80e ("netfilter: ipset: fix performance regression in swap operation")
Reported-by: [email protected]
Reported-by: Brad Spengler <[email protected]>
Reported-by: Стас Ничипорович <[email protected]>
Tested-by: Brad Spengler <[email protected]>
Tested-by: Стас Ничипорович <[email protected]>
Signed-off-by: Jozsef Kadlecsik <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

octeontx2-af: Initialize maps.

kmalloc_array() without __GFP_ZERO flag does not initialize
memory to zero. This causes issues. Use kcalloc() for maps and
bitmap_zalloc() for bitmaps.
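
A user-space analogue of the difference, with calloc() playing the role
of kcalloc(). alloc_map() is a hypothetical helper for this illustration
only.

```c
#include <stdlib.h>

/*
 * Allocate a map of @n entries that is guaranteed to start out as all
 * zeroes. malloc(n * sizeof(...)) would leave the contents indeterminate,
 * just like kmalloc_array() without __GFP_ZERO.
 */
static unsigned int *alloc_map(size_t n)
{
	return calloc(n, sizeof(unsigned int));
}
```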

Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
Signed-off-by: Ratheesh Kannoth <[email protected]>
Reviewed-by: Brett Creeley <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'cpsw-enable-mac_managed_pm-to-fix-mdio'

Sinthu Raja says:

====================
CPSW: enable mac_managed_pm to fix mdio

This patch fixes the resume/suspend issue on the CPSW interface.

Reference from the following patchwork:
https://lore.kernel.org/netdev/20221014144729.1159257 [email protected]/T/

V1: https://patchwork.kernel.org/project/netdevbpf/patch/20240122083414 [email protected]/
V2: https://patchwork.kernel.org/project/netdevbpf/patch/20240122093326 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio

The commit below introduced a WARN when the phy state is not one of
PHY_HALTED, PHY_READY or PHY_UP:
commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")

When cpsw resumes, there is a port in PHY_NOLINK state, so the warning
below is triggered. Set mac_managed_pm to true to tell mdio that the phy
resume/suspend is managed by the mac, which fixes the following warning:

WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
CPU: 0 PID: 965 Comm: sh Tainted: G           O       6.1.46-g247b2535b2 #1
Hardware name: Generic AM33XX (Flattened Device Tree)
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x24/0x2c
dump_stack_lvl from __warn+0x84/0x15c
__warn from warn_slowpath_fmt+0x1a8/0x1c8
warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
dpm_run_callback from device_resume+0xb8/0x2b8
device_resume from dpm_resume+0x144/0x314
dpm_resume from dpm_resume_end+0x14/0x20
dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
pm_suspend from state_store+0x74/0xd0
state_store from kernfs_fop_write_iter+0x104/0x1ec
kernfs_fop_write_iter from vfs_write+0x1b8/0x358
vfs_write from ksys_write+0x78/0xf8
ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xe094dfa8 to 0xe094dff0)
dfa0:                   00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66

Cc: <[email protected]> # v6.0+
Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Sinthu Raja <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio

The commit below introduced a WARN when the phy state is not one of
PHY_HALTED, PHY_READY or PHY_UP:
commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")

When cpsw_new resumes, there is a port in PHY_NOLINK state, so the warning
below is triggered. Set mac_managed_pm to true to tell mdio that the phy
resume/suspend is managed by the mac, which fixes the following warning:

WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
CPU: 0 PID: 965 Comm: sh Tainted: G           O       6.1.46-g247b2535b2 #1
Hardware name: Generic AM33XX (Flattened Device Tree)
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x24/0x2c
dump_stack_lvl from __warn+0x84/0x15c
__warn from warn_slowpath_fmt+0x1a8/0x1c8
warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
dpm_run_callback from device_resume+0xb8/0x2b8
device_resume from dpm_resume+0x144/0x314
dpm_resume from dpm_resume_end+0x14/0x20
dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
pm_suspend from state_store+0x74/0xd0
state_store from kernfs_fop_write_iter+0x104/0x1ec
kernfs_fop_write_iter from vfs_write+0x1b8/0x358
vfs_write from ksys_write+0x78/0xf8
ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xe094dfa8 to 0xe094dff0)
dfa0:                   00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66

Cc: <[email protected]> # v6.0+
Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Sinthu Raja <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

gpio: remove GPIO device from the list unconditionally in error path

Since commit 48e1b4d369cf ("gpiolib: remove the GPIO device from the list
when it's unregistered") we remove the GPIO device entry from the global
list (used to order devices by their GPIO ranges) when unregistering the
chip, not when releasing the device. It will not happen when the last
reference is put anymore. This means, we need to remove it in error path
in gpiochip_add_data_with_key() unconditionally, without checking if the
device's .release() callback is set.

Fixes: 48e1b4d369cf ("gpiolib: remove the GPIO device from the list when it's unregistered")
Signed-off-by: Bartosz Golaszewski <[email protected]>

netfilter: nft_set_pipapo: remove static in nft_pipapo_get()

This has slipped through when reducing memory footprint for set
elements, remove it.

Fixes: 9dad402b89e8 ("netfilter: nf_tables: expose opaque set element as struct nft_elem_priv")
Reported-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

drm/xe: Remove TEST_VM_ASYNC_OPS_ERROR

TEST_VM_ASYNC_OPS_ERROR is broken and unused. Remove it for now; it will be
brought back at a later time when it is used, fixed, and properly hidden
behind a Kconfig option. Also fix up the supported flags value.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit d9890c028d66a9e1ee3cccaa081ab5aedcbfe431)
Signed-off-by: Thomas Hellström <[email protected]>

drm/xe/vm: don't ignore error when in_kthread

If GUP fails and we are in_kthread, we can have pinned = 0 and ret = 0.
If that happens we call sg_alloc_append_table_from_pages() with n_pages
= 0, which is not well behaved and can trigger:

kernel BUG at include/linux/scatterlist.h:115!

depending on if the pages array happens to be zeroed or not. Even if we
don't hit that it crashes later when trying to dma_map the returned
table.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Matthew Brost <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 8087199cd5951c1eba26003b3e4296dbb2110adf)
Signed-off-by: Thomas Hellström <[email protected]>

drm/xe: Assume large page size if VMA not yet bound

The calculation to determine the max page size of a VMA during a REMAP
operation assumes the VMA has been bound. This assumption is not true
if the VMA is from an earlier operation in an array of binds. If a VMA
has not been bound, use the maximum page size, which will ensure the
previous / next REMAP operations are not incorrectly skipped.

Fixes: 8f33b4f054fc ("drm/xe: Avoid doing rebinds")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 5ad6af5c91e9b942c44b657122270d935db3a813)
Signed-off-by: Thomas Hellström <[email protected]>

drm/xe/display: Fix memleak in display initialization

intel_power_domains_init is called twice in xe_device_probe:

1) intel_power_domains_init()
   xe_display_init_nommio()
   xe_device_probe()

2) intel_power_domains_init()
   intel_display_driver_probe_noirq()
   xe_display_init_noirq()
   xe_device_probe()

One of the calls needs to be removed to avoid allocating
power_domains->power_wells twice.

unreferenced object 0xffff88811150ee00 (size 512):
  comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
  hex dump (first 32 bytes):
    10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff  ................
    ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
    [<ffffffff812c98b2>] __kmalloc+0x52/0x150
    [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
    [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
    [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
    [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
    [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
    [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
    [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
    [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
    [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
    [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
    [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
    [<ffffffff8192e159>] bus_add_driver+0x119/0x220
    [<ffffffff81930d00>] driver_register+0x60/0x120
    [<ffffffffa05e50a0>] 0xffffffffa05e50a0

The call to intel_power_domains_cleanup() needs to stay where it is for
now. The main issue is that while the init is called by the display
side, shared by i915 and xe, the cleanup is called by a non-shared code
path. Fixing that will be done as a separate commit.

Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Signed-off-by: Xiaoming Wang <[email protected]>
[ reword commit message and explain why the fini needs to stay
  where it is ]
Reviewed-by: Lucas De Marchi <[email protected]>
Signed-off-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 86c99abb5f1b6fcd69fb268eeb2e34cb7c4f355c)
Signed-off-by: Thomas Hellström <[email protected]>

drm/xe: Map both mem.kernel_bb_pool and usm.bb_pool

For integrated devices we need to map both mem.kernel_bb_pool and
usm.bb_pool to be able to run batches from both pools.

Fixes: a682b6a42d4d ("drm/xe: Support device page faults on integrated platforms")
Tested-by: Brian Welty <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Brian Welty <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 72f86ed3c88933d6fa09b036de93621ea71097a7)
Signed-off-by: Thomas Hellström <[email protected]>

drm/xe: circumvent bogus stringop-overflow warning

gcc-13 warns about an array overflow that it sees but that is
prevented by the "asid % NUM_PF_QUEUE" calculation:

drivers/gpu/drm/xe/xe_gt_pagefault.c: In function 'xe_guc_pagefault_handler':
include/linux/fortify-string.h:57:33: error: writing 16 bytes into a region of size 0 [-Werror=stringop-overflow=]
include/linux/fortify-string.h:689:26: note: in expansion of macro '__fortify_memcpy_chk'
  689 | #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
      |                          ^~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/xe/xe_gt_pagefault.c:341:17: note: in expansion of macro 'memcpy'
  341 |                 memcpy(pf_queue->data + pf_queue->tail, msg, len * sizeof(u32));
      |                 ^~~~~~
drivers/gpu/drm/xe/xe_gt_types.h:102:25: note: at offset [1144, 265324] into destination object 'tile' of size 8

I found that rewriting the assignment using pointer addition rather than the
equivalent array index calculation prevents the warning, so use that instead.

I sent a bug report against gcc for the false positive warning.
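
As a rough illustration of the rewrite described above, the sketch below selects
a queue with pointer addition rather than the equivalent array index. The
struct layout, the sizes, and the helper names select_queue() and enqueue() are
assumptions made for this example, not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_PF_QUEUE 4      /* illustrative value */
#define PF_QUEUE_DWORDS 64  /* illustrative value */

/* Hypothetical, simplified stand-in for a page-fault queue. */
struct pf_queue {
	uint32_t data[PF_QUEUE_DWORDS];
	size_t tail;
};

struct gt {
	struct pf_queue pf_queue[NUM_PF_QUEUE];
};

/* Select the queue with pointer addition instead of &gt->pf_queue[idx]. */
static struct pf_queue *select_queue(struct gt *gt, uint32_t asid)
{
	return gt->pf_queue + (asid % NUM_PF_QUEUE);
}

/* Append len dwords to the selected queue; returns 0, or -1 if it is full. */
static int enqueue(struct gt *gt, uint32_t asid, const uint32_t *msg, size_t len)
{
	struct pf_queue *q = select_queue(gt, asid);

	assert(len <= PF_QUEUE_DWORDS);
	if (q->tail + len > PF_QUEUE_DWORDS)
		return -1;
	memcpy(q->data + q->tail, msg, len * sizeof(uint32_t));
	q->tail += len;
	return 0;
}
```

Both forms address the same element; the modulo keeps the index within
NUM_PF_QUEUE either way.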

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113214
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 774ef5dfc95578a9079426d5106076dcd59c4dfa)
Signed-off-by: Thomas Hellström <[email protected]>

drm/xe: Pick correct userptr VMA to repin on REMAP op failure

A REMAP op is composed of 3 VMAs: unmap, prev map, and next map. When
op_execute fails with -EAGAIN we need to update the local VMA pointer to
the current op state and then repin the VMA if it is a userptr.

Fixes a failure seen in xe_vm.munmap-style-unbind-userptr-one-partial.

Fixes: b06d47be7c83 ("drm/xe: Port Xe to GPUVA")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 447f74d223b4f6cbab74963bf1099050c15374ce)
Signed-off-by: Thomas Hellström <[email protected]>

drm/xe: Take a reference in xe_exec_queue_last_fence_get()

Take a reference in xe_exec_queue_last_fence_get(). Also fix a reference
counting underflow bug in VM bind and unbind.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit a856b67a84169e065ebbeee50258936b1eacc9eb)
Signed-off-by: Thomas Hellström <[email protected]>

drm/xe: Fix loop in vm_bind_ioctl_ops_unwind

The logic for the unwind loop is incorrect, resulting in an infinite
loop. Fix the unwind to go from the last operations list to the first.
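
A minimal sketch of a last-to-first unwind loop that is guaranteed to
terminate. struct ops_list and unwind_all() are hypothetical names used only
for illustration; the driver's real loop is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-bind operations list. */
struct ops_list {
	int applied;   /* 1 while the operations are in effect */
	int undo_seq;  /* order in which the list was unwound */
};

/* Undo lists[0..num_lists-1], last one first; returns the number unwound. */
static int unwind_all(struct ops_list *lists, size_t num_lists)
{
	int seq = 0;

	/* "i-- > 0" counts down safely even though size_t is unsigned. */
	for (size_t i = num_lists; i-- > 0; ) {
		assert(lists[i].applied);
		lists[i].applied = 0;
		lists[i].undo_seq = seq++;
	}
	return seq;
}
```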

Fixes: 617eebb9c480 ("drm/xe: Fix array of binds")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 3acc1ff1a72fce00cdbd3ef1c27108a967fd5616)
Signed-off-by: Thomas Hellström <[email protected]>

Merge tag 'v6.8-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"Fix regressions in cbc and algif_hash, as well as an older
  NULL-pointer dereference in ccp"

* tag 'v6.8-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: algif_hash - Remove bogus SGL free on zero-length error path
  crypto: cbc - Ensure statesize is zero
  crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked

Merge tag 'percpu-for-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu

Pull percpu fix from Dennis Zhou:

- fix riscv wrong size passed to local_flush_tlb_range_asid()

* tag 'percpu-for-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  riscv: Fix wrong size passed to local_flush_tlb_range_asid()

nilfs2: fix potential bug in end_buffer_async_write

According to a syzbot report, end_buffer_async_write(), which handles the
completion of block device writes, may detect an abnormal condition of the
buffer async_write flag and cause a BUG_ON failure when using nilfs2.

Nilfs2 itself does not use end_buffer_async_write().  But, the async_write
flag is now used as a marker by commit 7f42ec394156 ("nilfs2: fix issue
with race condition of competition between segments for dirty blocks") as
a means of resolving double list insertion of dirty blocks in
nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the
resulting crash.

This modification is safe as long as it is used for file data and b-tree
node blocks where the page caches are independent.  However, it was
irrelevant and redundant to also introduce async_write for segment summary
and super root blocks that share buffers with the backing device.  This
led to the possibility that the BUG_ON check in end_buffer_async_write
would fail as described above, if independent writebacks of the backing
device occurred in parallel.

The use of async_write for segment summary buffers has already been
removed in a previous change.

Fix this issue by removing the manipulation of the async_write flag for
the remaining super root block buffer.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks")
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Closes: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup

DAMON sysfs interface's update_schemes_tried_regions command has a timeout
of two apply intervals of the DAMOS scheme.  Having zero value DAMOS
scheme apply interval means it will use the aggregation interval as the
value.  However, the timeout setup logic is mistakenly using the sampling
interval instead of the aggregation interval for the case.  This could
cause earlier-than-expected timeout of the command.  Fix it.
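
The intended timeout rule can be sketched as follows. The helper name
tried_regions_timeout_us() and the use of microseconds are assumptions for
this example and are not taken from the DAMON sources.

```c
#include <assert.h>

/* Hypothetical helper; both intervals are in microseconds. */
static unsigned long tried_regions_timeout_us(unsigned long apply_interval_us,
					      unsigned long aggr_interval_us)
{
	/* A zero apply interval means "use the aggregation interval". */
	unsigned long effective = apply_interval_us ? apply_interval_us
						    : aggr_interval_us;

	assert(effective > 0);
	/* The command times out after two apply intervals. */
	return 2 * effective;
}
```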

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7d6fa31a2fd7 ("mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> # 6.7.x
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()

Syzbot reported a hang issue in migrate_pages_batch() called by mbind()
and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.

While migrate_pages_batch() locks a folio and waits for the writeback to
complete, the log writer thread that should bring the writeback to
completion picks up the folio being written back in
nilfs_lookup_dirty_data_buffers() that it calls for subsequent log
creation and tries to lock the folio, thus causing a deadlock.

In the first place, it is unexpected that folios/pages in the middle of
writeback will be updated and become dirty.  Nilfs2 adds a checksum to
verify the validity of the log being written and uses it for recovery at
mount, so data changes during writeback are suppressed.  Since this is
broken, an unclean shutdown could potentially cause recovery to fail.

Investigation revealed that the root cause is that the wait for writeback
completion in nilfs_page_mkwrite() is conditional, and if the backing
device does not require stable writes, data may be modified without
waiting.

Fix these issues by making nilfs_page_mkwrite() wait for writeback to
finish regardless of the stable write requirement of the backing device.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it")
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Closes: https://lkml.kernel.org/r/[email protected]
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

MAINTAINERS: Leo Yan has moved

I will lose access to my @linaro.org email address next week, update the
MAINTAINERS file and map it in .mailmap with the new email address.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Leo Yan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/zswap: don't return LRU_SKIP if we have dropped lru lock

LRU_SKIP can only be returned if we never dropped the lru lock; otherwise we
need to return LRU_RETRY to restart from the head of the lru list.

Otherwise, the iteration might continue from a cursor position that was
freed while the locks were dropped.

Actually we may need to introduce another LRU_STOP to really terminate the
ongoing shrinking scan process, when we encounter a warm page already in
the swap cache. The current list_lru implementation doesn't have this
function to early break from __list_lru_walk_one.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Signed-off-by: Chengming Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Cc: Chris Li <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

fs,hugetlb: fix NULL pointer dereference in hugetlbfs_fill_super

When configuring a hugetlb filesystem via the fsconfig() syscall, there is
a possible NULL dereference in hugetlbfs_fill_super() caused by assigning
NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize
is not valid.

E.g: Taking the following steps:

     fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC);
     fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0);
     fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

Given that the requested "pagesize" is invalid, ctx->hstate will be replaced
with NULL, losing its previous value, and we will print an error:

     ...
     ...
     case Opt_pagesize:
             ps = memparse(param->string, &rest);
             ctx->hstate = h;
             if (!ctx->hstate) {
                     pr_err("Unsupported page size %lu MB\n", ps / SZ_1M);
                     return -EINVAL;
             }
             return 0;
     ...
     ...

This is a problem because later on, we will dereference ctx->hstate in
hugetlbfs_fill_super():

     ...
     ...
     sb->s_blocksize = huge_page_size(ctx->hstate);
     ...
     ...

Causing the Oops below.

Fix this by replacing the ctx->hstate value only when the pagesize is known
to be valid.
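
The "validate first, assign afterwards" pattern can be sketched in plain C as
below. struct fs_ctx, lookup_hstate(), set_pagesize() and the table of
supported sizes are hypothetical stand-ins for this example, not the kernel's
definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct hstate and the fs context. */
struct hstate { unsigned long size; };
struct fs_ctx { const struct hstate *hstate; };

static const struct hstate supported[] = { { 2UL << 20 }, { 1UL << 30 } };

/* Return the matching entry, or NULL when the size is unsupported. */
static const struct hstate *lookup_hstate(unsigned long size)
{
	for (size_t i = 0; i < sizeof(supported) / sizeof(supported[0]); i++)
		if (supported[i].size == size)
			return &supported[i];
	return NULL;
}

/* Update ctx->hstate only once the requested size is known to be valid. */
static int set_pagesize(struct fs_ctx *ctx, unsigned long size)
{
	const struct hstate *h = lookup_hstate(size);

	assert(ctx != NULL);
	if (!h)
		return -EINVAL;  /* ctx->hstate keeps its previous value */
	ctx->hstate = h;
	return 0;
}
```

Because the lookup result lives in a local variable until it has been checked,
a failed request leaves the context untouched.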

kernel: hugetlbfs: Unsupported page size 0 MB
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0
kernel: Oops: 0000 [#1] PREEMPT SMP PTI
kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G            E      6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f
kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? __die_body+0x1a/0x60
kernel:  ? page_fault_oops+0x16f/0x4a0
kernel:  ? search_bpf_extables+0x65/0x70
kernel:  ? fixup_exception+0x22/0x310
kernel:  ? exc_page_fault+0x69/0x150
kernel:  ? asm_exc_page_fault+0x22/0x30
kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
kernel:  ? hugetlbfs_fill_super+0xb4/0x1a0
kernel:  ? hugetlbfs_fill_super+0x28/0x1a0
kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
kernel:  vfs_get_super+0x40/0xa0
kernel:  ? __pfx_bpf_lsm_capable+0x10/0x10
kernel:  vfs_get_tree+0x25/0xd0
kernel:  vfs_cmd_create+0x64/0xe0
kernel:  __x64_sys_fsconfig+0x395/0x410
kernel:  do_syscall_64+0x80/0x160
kernel:  ? syscall_exit_to_user_mode+0x82/0x240
kernel:  ? do_syscall_64+0x8d/0x160
kernel:  ? syscall_exit_to_user_mode+0x82/0x240
kernel:  ? do_syscall_64+0x8d/0x160
kernel:  ? exc_page_fault+0x69/0x150
kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
kernel: RIP: 0033:0x7ffbc0cb87c9
kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48
kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af
kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9
kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000
kernel:  </TASK>
kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E)
kernel:  mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E)
kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1
kernel: CR2: 0000000000000028
kernel: ---[ end trace 0000000000000000 ]---
kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 32021982a324 ("hugetlbfs: Convert to fs_context")
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mailmap: switch email address for John Moon

Add current email address as QUIC email is no longer active.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Moon <[email protected]>
Acked-by: Trilok Soni <[email protected]>
Cc: Elliot Berman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: zswap: fix objcg use-after-free in entry destruction

In the per-memcg LRU universe, LRU removal uses entry->objcg to determine
which list count needs to be decreased. Drop the objcg reference after
updating the LRU, to fix a possible use-after-free.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Yosry Ahmed <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()

We need to leave lazy MMU mode before unlocking.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b2f557a21bc8 ("mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()")
Signed-off-by: Sergey Senozhatsky <[email protected]>
Cc: Jiexun Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

arch/arm/mm: fix major fault accounting when retrying under per-VMA lock

The change [1] missed ARM architecture when fixing major fault accounting
for page fault retry under per-VMA lock.

The user-visible effect is that it restores correct major fault
accounting that was broken after [2] was merged in 6.7 kernel. The
more detailed description is in [3] and this patch simply adds the
same fix to ARM architecture which I missed in [3].

Add missing code to fix ARM architecture fault accounting.

[1] 46e714c729c8 ("arch/mm/fault: fix major fault accounting when retrying under per-VMA lock")
[2] https://lore.kernel.org/all/20231006195318.4087158 [email protected]/
[3] https://lore.kernel.org/all/20231226214610 [email protected]/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock")
Reported-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros

The correct header file is needed to get the CLOSE_RANGE_* macros.
Previously this was tested only with a newer glibc, which did not reveal the
need to include the header; that was a mistake.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ec54424923cf ("selftests: core: remove duplicate defines")
Reported-by: Aishwarya TCV <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page

When I ran a soft offline stress test, a machine was observed to crash with
the following message:

  kernel BUG at include/linux/memcontrol.h:554!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 5 PID: 3837 Comm: hwpoison.sh Not tainted 6.7.0-next-20240112-00001-g8ecf3e7fb7c8-dirty #97
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  RIP: 0010:folio_memcg+0xaf/0xd0
  Code: 10 5b 5d c3 cc cc cc cc 48 c7 c6 08 b1 f2 b2 48 89 ef e8 b4 c5 f8 ff 90 0f 0b 48 c7 c6 d0 b0 f2 b2 48 89 ef e8 a2 c5 f8 ff 90 <0f> 0b 48 c7 c6 08 b1 f2 b2 48 89 ef e8 90 c5 f8 ff 90 0f 0b 66 66
  RSP: 0018:ffffb6c043657c98 EFLAGS: 00000296
  RAX: 000000000000004b RBX: ffff932bc1d1e401 RCX: ffff933abfb5c908
  RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff933abfb5c900
  RBP: ffffea6f04019080 R08: ffffffffb3338ce8 R09: 0000000000009ffb
  R10: 00000000000004dd R11: ffffffffb3308d00 R12: ffffea6f04019080
  R13: ffffea6f04019080 R14: 0000000000000001 R15: ffffb6c043657da0
  FS:  00007f6c60f6b740(0000) GS:ffff933abfb40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000559c3bc8b980 CR3: 0000000107f1c000 CR4: 00000000000006f0
  Call Trace:
   <TASK>
   split_huge_page_to_list+0x4d/0x1380
   try_to_split_thp_page+0x3a/0xf0
   soft_offline_page+0x1ea/0x8a0
   soft_offline_page_store+0x52/0x90
   kernfs_fop_write_iter+0x118/0x1b0
   vfs_write+0x30b/0x430
   ksys_write+0x5e/0xe0
   do_syscall_64+0xb0/0x1b0
   entry_SYSCALL_64_after_hwframe+0x6d/0x75
  RIP: 0033:0x7f6c60d14697
  Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
  RSP: 002b:00007ffe9b72b8d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f6c60d14697
  RDX: 000000000000000c RSI: 0000559c3bc8b980 RDI: 0000000000000001
  RBP: 0000559c3bc8b980 R08: 00007f6c60dd1460 R09: 000000007fffffff
  R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
  R13: 00007f6c60e1a780 R14: 00007f6c60e16600 R15: 00007f6c60e15a00

The problem is that page->mapping is overloaded with slab->slab_list or
slabs fields now, so slab pages could be taken as non-LRU movable pages if
field slabs contains PAGE_MAPPING_MOVABLE or slab_list->prev is set to
LIST_POISON2.  These slab pages will be treated as thp later leading to
crash in split_huge_page_to_list().

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Fixes: 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head")
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: memcg: optimize parent iteration in memcg_rstat_updated()

In memcg_rstat_updated(), we iterate the memcg being updated and its
parents to update memcg->vmstats_percpu->stats_updates in the fast path
(i.e. no atomic updates). According to my math, this is 3 memory loads
(and potentially 3 cache misses) per memcg:
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).
- Load the address of the parent memcg.

Avoid most of the cache misses by caching a pointer from each struct
memcg_vmstats_percpu to its parent on the corresponding CPU. In this
case, for the first memcg we have 2 memory loads (same as above):
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).

Then for each additional memcg, we need a single load to get the
parent's stats_updates directly. This reduces the number of loads from
O(3N) to O(2+N) -- where N is the number of memcgs we need to iterate.
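
A single-CPU sketch of the cached parent pointer idea. The struct and the
function stats_updated() are simplified, hypothetical versions written for
this example; the real per-CPU machinery and flush thresholds are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, single-CPU simplification of the per-CPU stats structure. */
struct vmstats_percpu {
	unsigned int stats_updates;
	struct vmstats_percpu *parent;  /* cached pointer to the parent's stats */
};

/* Add val to this level and every ancestor; returns the levels visited. */
static int stats_updated(struct vmstats_percpu *statc, unsigned int val)
{
	int levels = 0;

	assert(statc != NULL);
	/* One pointer chase per level instead of going back through the memcg. */
	for (; statc; statc = statc->parent) {
		statc->stats_updates += val;
		levels++;
	}
	return levels;
}
```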

Additionally, stash a pointer to memcg->vmstats in each struct
memcg_vmstats_percpu such that we can access the atomic counter that all
CPUs fold into, memcg->vmstats->stats_updates.
memcg_should_flush_stats() is changed to memcg_vmstats_needs_flush() to
accept a struct memcg_vmstats pointer accordingly.

In struct memcg_vmstats_percpu, make sure both pointers together with
stats_updates live on the same cacheline. Finally, update
mem_cgroup_alloc() to take in a parent pointer and initialize the new
cache pointers on each CPU. The percpu loop in mem_cgroup_alloc() may
look concerning, but there are multiple similar loops in the cgroup
creation path (e.g. cgroup_rstat_init()), most of which are hidden
within alloc_percpu().

According to Oliver's testing [1], this fixes multiple 30-38%
regressions in vm-scalability, will-it-scale-tlb_flush2, and
will-it-scale-fallocate1. This comes at a cost of 2 more pointers per
CPU (<2KB on a machine with 128 CPUs).

[1] https://lore.kernel.org/lkml/ZbDJsfsZt2ITyo61@xsang-OptiPlex-9020/

[[email protected]: fix struct memcg_vmstats_percpu size and alignment]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Fixes: 8d59d2214c23 ("mm: memcg: make stats flushing threshold per-memcg")
Tested-by: kernel test robot <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Greg Thelen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix data corruption in dsync block recovery for small block sizes

The helper function nilfs_recovery_copy_block() of
nilfs_recovery_dsync_blocks(), which recovers data from logs created by
data sync writes during a mount after an unclean shutdown, incorrectly
calculates the on-page offset when copying repair data to the file's page
cache. In environments where the block size is smaller than the page
size, this flaw can cause data corruption and leak uninitialized memory
bytes during the recovery process.

Fix these issues by correcting this byte offset calculation on the page.
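
The general calculation at issue can be sketched as follows. This message does
not show the actual fix, so the helpers block_pos() and block_offset_in_page()
and the 4096-byte page size are assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size for illustration */

/* Byte offset, within its page, of the data at file position pos. */
static size_t block_offset_in_page(unsigned long long pos)
{
	return (size_t)(pos & (SKETCH_PAGE_SIZE - 1));
}

/* File position of block number blkoff for a given block size. */
static unsigned long long block_pos(unsigned long long blkoff,
				    unsigned int blocksize)
{
	assert(blocksize > 0 && blocksize <= SKETCH_PAGE_SIZE);
	return blkoff * blocksize;
}
```

With 1 KiB blocks, four blocks share one page, so only every fourth block
starts at offset 0; with a block size equal to the page size the offset is
always 0, which is why the problem only appears for small block sizes.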

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()

Commit c33c794828f2 ("mm: ptep_get() conversion") converted all (non-arch)
call sites to use ptep_get() instead of doing a direct dereference of the
pte.  Full rationale can be found in that commit's log.

Since then, UFFDIO_MOVE has been implemented which does 7 direct pte
dereferences.  Let's fix those up to use ptep_get().

I've asserted in the past that there is no reliable automated mechanism to
catch these; I'm relying on a combination of Coccinelle (which throws up a
lot of false positives) and some compiler magic to force a compiler error
on dereference.  But given the frequency with which new issues are coming
up, I'll add it to my todo list to try to find an automated solution.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)

After the recent changes nobody uses siglock to read the values protected
by stats_lock, so we can kill spin_lock_irq(&current->sighand->siglock) and
update the comment.

With this patch only __exit_signal() and thread_group_start_cputime() take
stats_lock under siglock.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats

lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
do_task_stat() at the same time and the process has NR_THREADS, it will
spin with irqs disabled O(NR_CPUS * NR_THREADS) time.

Change do_task_stat() to use sig->stats_lock to gather the statistics
outside of ->siglock protected section, in the likely case this code will
run lockless.
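
A user-space sketch of the sequence-counter read pattern that lets readers run
without taking a lock in the common case. It uses C11 atomics, and struct stats
and both functions are hypothetical names for this example. Unlike the kernel's
read_seqbegin_or_lock(), this simplified reader only retries and never falls
back to taking a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stats guarded by a sequence counter (even value = stable). */
struct stats {
	atomic_uint seq;
	atomic_ulong utime;
	atomic_ulong stime;
};

/* Writer: the counter is odd while an update is in progress. */
static void stats_write(struct stats *s, unsigned long utime, unsigned long stime)
{
	assert(s != NULL);
	atomic_fetch_add(&s->seq, 1);
	atomic_store(&s->utime, utime);
	atomic_store(&s->stime, stime);
	atomic_fetch_add(&s->seq, 1);
}

/* Reader: retry until a snapshot was taken with no writer in between. */
static unsigned long stats_read_total(struct stats *s)
{
	unsigned int start;
	unsigned long total;

	assert(s != NULL);
	do {
		start = atomic_load(&s->seq);
		total = atomic_load(&s->utime) + atomic_load(&s->stime);
	} while ((start & 1) || atomic_load(&s->seq) != start);
	return total;
}
```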

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()

Patch series "fs/proc: do_task_stat: use sig->stats_".

do_task_stat() has the same problem as getrusage() had before "getrusage:
use sig->stats_lock rather than lock_task_sighand()": a hard lockup. If
NR_CPUS threads call lock_task_sighand() at the same time and the process
has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS *
NR_THREADS) time.

This patch (of 3):

thread_group_cputime() does its own locking, we can safely shift
thread_group_cputime_adjusted() which does another for_each_thread loop
outside of ->siglock protected section.

Not only does this remove for_each_thread() from the critical section with
irqs disabled, it also removes another case where stats_lock is taken with
siglock held. We want to remove this dependency, then we can change the
users of stats_lock to not disable irqs.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

getrusage: use sig->stats_lock rather than lock_task_sighand()

lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
getrusage() at the same time and the process has NR_THREADS, spin_lock_irq
will spin with irqs disabled O(NR_CPUS * NR_THREADS) time.

Change getrusage() to use sig->stats_lock, it was specifically designed
for this type of use. This way it runs lockless in the likely case.

TODO:
- Change do_task_stat() to use sig->stats_lock too, then we can
  remove spin_lock_irq(siglock) in wait_task_zombie().

- Turn sig->stats_lock into seqcount_rwlock_t, this way the
  readers in the slow mode won't exclude each other. See
  https://lore.kernel.org/all/20230913154907 [email protected]/

- stats_lock has to disable irqs because ->siglock can be taken
  in irq context, it would be very nice to change __exit_signal()
  to avoid the siglock->stats_lock dependency.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Reported-by: Dylan Hatch <[email protected]>
Tested-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()

Patch series "getrusage: use sig->stats_lock", v2.

This patch (of 2):

thread_group_cputime() does its own locking, we can safely shift
thread_group_cputime_adjusted() which does another for_each_thread loop
outside of ->siglock protected section.

This is also preparation for the next patch which changes getrusage() to
use stats_lock instead of siglock, thread_group_cputime() takes the same
lock. With the current implementation recursive read_seqbegin_or_lock()
is fine, thread_group_cputime() can't enter the slow mode if the caller
holds stats_lock, yet this looks more safe and better performance-wise.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Reported-by: Dylan Hatch <[email protected]>
Tested-by: Dylan Hatch <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE

For shared memory of type SHM_HUGETLB, hugetlb pages are reserved in
shmget() call.  If the SHM_NORESERVE flag is specified then the hugetlb pages
are not reserved.  However when the shared memory is attached with the
shmat() call the hugetlb pages are getting reserved incorrectly for
SHM_HUGETLB shared memory created with SHM_NORESERVE which is a bug.

-------------------------------
Following test shows the issue.

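$ cat shmhtb.c

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Assumed values: the original SKEY and SHMSZ definitions are not shown. */
#define SKEY	0x1234
#define SHMSZ	(5UL * 2 * 1024 * 1024)	/* 5 huge pages, assuming 2 MiB each */

int main()
{
	int shmflags = 0660 | IPC_CREAT | SHM_HUGETLB | SHM_NORESERVE;
	int shmid;

	/* Create the segment; with SHM_NORESERVE nothing should be reserved. */
	shmid = shmget(SKEY, SHMSZ, shmflags);
	if (shmid < 0)
	{
		printf("shmget() failed, %d\n", errno);
		return 1;
	}
	printf("After shmget()\n");
	system("cat /proc/meminfo | grep -i hugepages_");

	/* Attaching must not reserve pages either; the bug shows up here. */
	shmat(shmid, NULL, 0);
	printf("\nAfter shmat()\n");
	system("cat /proc/meminfo | grep -i hugepages_");

	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}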

#sysctl -w vm.nr_hugepages=20
#./shmhtb

After shmget()
HugePages_Total:      20
HugePages_Free:       20
HugePages_Rsvd:        0
HugePages_Surp:        0

After shmat()
HugePages_Total:      20
HugePages_Free:       20
HugePages_Rsvd:        5 <--
HugePages_Surp:        0
--------------------------------

Fix is to ensure that hugetlb pages are not reserved for SHM_HUGETLB shared
memory in the shmat() call.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Prakash Sangappa <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails

ksmbd_iov_pin_rsp_read() doesn't free the provided aux buffer if it
fails. It seems to be the caller's responsibility to clear the buffer in
the error case.

Found by Linux Verification Center (linuxtesting.org).

Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound")
Cc: [email protected]
Signed-off-by: Fedor Pchelkin <[email protected]>
Acked-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: Add kernel-doc for ksmbd_extract_sharename() function

The ksmbd_extract_sharename() function lacked a complete kernel-doc
comment. This patch adds parameter descriptions and detailed function
behavior to improve code readability and maintainability.

Signed-off-by: Yang Li <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Acked-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

nvme: use ns->head->pi_size instead of t10_pi_tuple structure size

Currently the kernel supports 8-byte and 16-byte protection information.
So, use ns->head->pi_size instead of sizeof(struct t10_pi_tuple).
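
As a rough illustration of why a fixed structure size can be wrong, the
sketch below compares a check based on the size of an 8-byte tuple with one
based on a per-namespace PI size. The names struct pi_tuple8, struct
ns_format and both helper functions are hypothetical and are not taken from
the nvme driver; only the 8-byte and 16-byte PI sizes come from the text
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 8-byte PI tuple (guard, application and reference tags). */
struct pi_tuple8 {
	uint16_t guard_tag;
	uint16_t app_tag;
	uint32_t ref_tag;
};

/* Hypothetical per-namespace format description. */
struct ns_format {
	uint16_t ms;		/* metadata bytes per block */
	uint16_t pi_size;	/* PI bytes per block: 8 or 16 */
};

/* Fixed-size check: only correct when PI is exactly 8 bytes. */
static bool metadata_is_pi_only_fixed(const struct ns_format *f)
{
	return f->ms == sizeof(struct pi_tuple8);
}

/* Size-aware check: works for both 8-byte and 16-byte PI. */
static bool metadata_is_pi_only(const struct ns_format *f)
{
	return f->ms == f->pi_size;
}
```

With 16 bytes of metadata that are entirely PI, the fixed-size check reports
false while the size-aware check reports true; for 8-byte PI both agree.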

Signed-off-by: Francis Pravin <[email protected]>
Signed-off-by: Sathyavathi M <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

drm/amdgpu: Fix HDP flush for VFs on nbio v7.9

HDP flush remapping is not done for VFs. Keep the original offsets in VF
environment.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Implement bounds check for stream encoder creation in DCN301

'stream_enc_regs' array is an array of dcn10_stream_enc_registers
structures. The array is initialized with four elements, corresponding
to the four calls to stream_enc_regs() in the array initializer. This
means that valid indices for this array are 0, 1, 2, and 3.

The error message 'stream_enc_regs' 4 <= 5 below indicates an attempt to
access this array with an index of 5, which is out of bounds. This could
lead to undefined behavior.

Here, eng_id is used as an index to access the stream_enc_regs array. If
eng_id is 5, this would result in an out-of-bounds access on the
stream_enc_regs array.

This fixes the buffer overflow error in dcn301_stream_encoder_create()
reported by Smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn301/dcn301_resource.c:1011 dcn301_stream_encoder_create() error: buffer overflow 'stream_enc_regs' 4 <= 5
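
The kind of bounds check described above can be sketched in plain C as
follows. This is only an illustration: struct enc_regs, enc_regs_table and
enc_regs_lookup() are hypothetical names, and the register values are made
up; the only detail taken from the text is a four-element table indexed by
eng_id.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for a per-encoder register block. */
struct enc_regs {
	unsigned int base;
};

/* Four entries, so the valid indices are 0, 1, 2 and 3. */
static const struct enc_regs enc_regs_table[] = {
	{ 0x100 }, { 0x200 }, { 0x300 }, { 0x400 },
};

/*
 * Return the register block for eng_id, or NULL when eng_id falls
 * outside the table, so the caller never indexes past the end.
 */
static const struct enc_regs *enc_regs_lookup(int eng_id)
{
	if (eng_id < 0 || (size_t)eng_id >= ARRAY_SIZE(enc_regs_table))
		return NULL;

	return &enc_regs_table[eng_id];
}
```

A caller that receives NULL can bail out before creating the encoder
instead of reading past the end of the table.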

Fixes: 3a83e4e64bb1 ("drm/amd/display: Add dcn3.01 support to DC (v2)")
Cc: Roman Li <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Increase frame-larger-than for all display_mode_vba files

After a recent change in LLVM, allmodconfig (which has CONFIG_KCSAN=y
and CONFIG_WERROR=y enabled) has a few new instances of
-Wframe-larger-than for the mode support and system configuration
functions:

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20v2.c:3393:6: error: stack frame size (2144) exceeds limit (2048) in 'dml20v2_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
   3393 | void dml20v2_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
        |      ^
  1 error generated.

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:3520:6: error: stack frame size (2192) exceeds limit (2048) in 'dml21_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
   3520 | void dml21_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
        |      ^
  1 error generated.

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3286:6: error: stack frame size (2128) exceeds limit (2048) in 'dml20_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
   3286 | void dml20_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
        |      ^
  1 error generated.

Without the sanitizers enabled, there are no warnings.

This was the catalyst for commit 6740ec97bcdb ("drm/amd/display:
Increase frame warning limit with KASAN or KCSAN in dml2") and that same
change was made to dml in commit 5b750b22530f ("drm/amd/display:
Increase frame warning limit with KASAN or KCSAN in dml") but the
frame_warn_flag variable was not applied to all files. Do so now to
clear up the warnings and make all these files consistent.

Cc: [email protected]
Closes: https://github.com/ClangBuiltLinux/linux/issue/1990
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Clear phantom stream count and plane count

When dc_state_destruct() was refactored the new phantom_stream_count
and phantom_plane_count members weren't cleared.

Fixes: 012a04b1d6af ("drm/amd/display: Refactor phantom resource allocation")
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Avoid fetching VRAM vendor info

The present way to fetch VRAM vendor information turns out to be not
reliable on GFX 9.4.3 dGPUs as well. Avoid using the data.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/display: Disable ODM by default for DCN35

Just ensure that ODM optimization is disabled by default.

Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Update phantom pipe enable / disable sequence

Previously we would call apply_ctx_to_hw to enable and disable
phantom pipes. However, apply_ctx_to_hw can potentially update
non-phantom pipes as well which is undesired. Instead of calling
apply_ctx_to_hw as a whole, call the relevant helpers for each
phantom pipe when enabling / disabling which will avoid us modifying
hardware state for non-phantom pipes unknowingly.

The use case is for an FRL display where FRL_Update is requested
by the display. In this case link_state_valid flag is cleared in
a passive callback thread and should be handled in the next stream /
link update. However, due to the call to apply_ctx_to_hw for the
phantom pipes during a flip, the main pipes were modified outside
of the desired sequence (driver does not handle link_state_valid = 0
on flips).

Cc: [email protected] # 6.6+
Reviewed-by: Samson Tam <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alvin Lee <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix MST Null Ptr for RV

This change tries to fix the error below, which is specific to the RV platform:

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 917 Comm: sway Not tainted 6.3.9-arch1-1 #1 124dc55df4f5272ccb409f39ef4872fc2b3376a2
Hardware name: LENOVO 20NKS01Y00/20NKS01Y00, BIOS R12ET61W(1.31 ) 07/28/2022
RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper]
Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8>
RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224
RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280
RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850
R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000
R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224
FS:  00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0
Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x171/0x4e0
? plist_add+0xbe/0x100
? exc_page_fault+0x7c/0x180
? asm_exc_page_fault+0x26/0x30
? drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026]
? drm_dp_atomic_find_time_slots+0x28/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026]
compute_mst_dsc_configs_for_link+0x2ff/0xa40 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
? fill_plane_buffer_attributes+0x419/0x510 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
compute_mst_dsc_configs_for_state+0x1e1/0x250 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
amdgpu_dm_atomic_check+0xecd/0x1190 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
drm_atomic_check_only+0x5c5/0xa40
drm_mode_atomic_ioctl+0x76e/0xbc0
? _copy_to_user+0x25/0x30
? drm_ioctl+0x296/0x4b0
? __pfx_drm_mode_atomic_ioctl+0x10/0x10
drm_ioctl_kernel+0xcd/0x170
drm_ioctl+0x26d/0x4b0
? __pfx_drm_mode_atomic_ioctl+0x10/0x10
amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054]
__x64_sys_ioctl+0x94/0xd0
do_syscall_64+0x60/0x90
? do_syscall_64+0x6c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f4dad17f76f
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c>
RSP: 002b:00007ffd9ae859f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000055e255a55900 RCX: 00007f4dad17f76f
RDX: 00007ffd9ae85a90 RSI: 00000000c03864bc RDI: 000000000000000b
RBP: 00007ffd9ae85a90 R08: 0000000000000003 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c03864bc
R13: 000000000000000b R14: 000055e255a7fc60 R15: 000055e255a01eb0
</TASK>
Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg joydev mousedev bnep >
typec libphy k10temp ipmi_msghandler roles i2c_scmi acpi_cpufreq mac_hid nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_mas>
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper]
Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8>
RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224
RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280
RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850
R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000
R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224
FS:  00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0

With a second DP monitor connected, the drm_atomic_state in the dm atomic
check sequence does not include the connector state for the
old/existing/first DP monitor. In that case, the dsc determination policy
hits a null pointer when it tries to iterate over the old/existing stream,
which does not have a valid connector state attached to it. When that
happens, dm atomic check should call drm_atomic_get_connector_state() for
a new connector state. Existing dm already does that, except on RV: RV has
no official dsc support, so .num_dsc is not defined in the dcn10 resource
cap, which prevents drm_atomic_get_connector_state() from being called.
So, skip the dsc determination policy for ASICs that don't have DSC support.

Cc: [email protected] # 6.1+
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2314
Reviewed-by: Wayne Lin <[email protected]>
Acked-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Fangzhi Zuo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Fix shared buff copy to user

ta if invoke node buffer
|-------- ta type ----------|
|-------- ta id ----------|
|-------- cmd id ----------|
|------ shared buf len -----|
|------ shared buffer ------|

The ta if invoke node buffer is laid out as above; copy the shared buffer
data to the correct location.
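
A minimal sketch of locating the shared buffer in such a layout is shown
below. It assumes each of the four header fields is 32 bits wide, which the
diagram does not state; struct ta_invoke_hdr and ta_copy_shared_buf() are
hypothetical names and not the driver's own code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: four 32-bit header fields, then the shared buffer. */
struct ta_invoke_hdr {
	uint32_t ta_type;
	uint32_t ta_id;
	uint32_t cmd_id;
	uint32_t shared_buf_len;
};

/*
 * Copy the shared buffer payload out of an invoke node buffer.
 * The payload starts right after the header, not at offset 0.
 * Returns the number of bytes copied, or -1 if the sizes do not fit.
 */
static long ta_copy_shared_buf(const uint8_t *node, size_t node_len,
			       uint8_t *out, size_t out_len)
{
	struct ta_invoke_hdr hdr;

	if (node_len < sizeof(hdr))
		return -1;
	memcpy(&hdr, node, sizeof(hdr));

	if (hdr.shared_buf_len > node_len - sizeof(hdr) ||
	    hdr.shared_buf_len > out_len)
		return -1;

	memcpy(out, node + sizeof(hdr), hdr.shared_buf_len);
	return (long)hdr.shared_buf_len;
}
```

The point of the sketch is the offset: the payload is read from
node + sizeof(hdr), and the length is validated against both buffers first.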

Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>