Git Repo - linux.git/log

net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs

Grafting ingress and clsact Qdiscs does not need a for-loop in
qdisc_graft(). Refactor it. No functional changes intended.

Tested-by: Pedro Tammela <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Peilin Ye <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net/sched: act_ct: Fix promotion of offloaded unreplied tuple

Currently UNREPLIED and UNASSURED connections are added to the nf flow
table. This causes the following connection packets to be processed
by the flow table which then skips conntrack_in(), and thus such the
connections will remain UNREPLIED and UNASSURED even if reply traffic
is then seen. Even still, the unoffloaded reply packets are the ones
triggering hardware update from new to established state, and if
there aren't any to triger an update and/or previous update was
missed, hardware can get out of sync with sw and still mark
packets as new.

Fix the above by:
1) Not skipping conntrack_in() for UNASSURED packets, but still
   refresh for hardware, as before the cited patch.
2) Try and force a refresh by reply-direction packets that update
   the hardware rules from new to established state.
3) Remove any bidirectional flows that didn't failed to update in
   hardware for re-insertion as bidrectional once any new packet
   arrives.

Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections")
Co-developed-by: Vlad Buslov <[email protected]>
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: Paul Blakey <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression

Lockdep on 6.4-rc on ThinkPad X1 Carbon 5th says
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.4.0-rc5 #1 Not tainted
-----------------------------------------------------
kworker/3:1/49 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
ffff8881066fa368 (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}, at: rs_drv_get_rate+0x46/0xe7

and this task is already holding:
ffff8881066f80a8 (&sta->rate_ctrl_lock){+.-.}-{2:2}, at: rate_control_get_rate+0xbd/0x126
which would create a new lock dependency:
(&sta->rate_ctrl_lock){+.-.}-{2:2} -> (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}

but this new dependency connects a SOFTIRQ-irq-safe lock:
(&sta->rate_ctrl_lock){+.-.}-{2:2}
etc. etc. etc.

Changing the spin_lock() in rs_drv_get_rate() to spin_lock_bh() was not
enough to pacify lockdep, but changing them all on pers.lock has worked.

Fixes: a8938bc881d2 ("wifi: iwlwifi: mvm: Add locking to the rate read flow")
Signed-off-by: Hugh Dickins <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

Merge branch 'fix-small-bugs-and-annoyances-in-tc-testing'

Vlad Buslov says:

====================
Fix small bugs and annoyances in tc-testing
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/tc-testing: Remove configs that no longer exist

Some qdiscs and classifiers have recently been retired from kernel.
However, tc-testing config is still cluttered with them which causes noise
when using merge_config.sh script to update existing config for tc-testing
compatibility. Remove the config settings for affected qdiscs and
classifiers.

Fixes: fb38306ceb9e ("net/sched: Retire ATM qdisc")
Fixes: 051d44209842 ("net/sched: Retire CBQ qdisc")
Fixes: bbe77c14ee61 ("net/sched: Retire dsmark qdisc")
Fixes: 265b4da82dbf ("net/sched: Retire rsvp classifier")
Fixes: 8c710f75256b ("net/sched: Retire tcindex classifier")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Pedro Tammela <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/tc-testing: Fix SFB db test

Setting very small value of db like 10ms introduces rounding errors when
converting to/from jiffies on some kernel configs. For example, on 250hz
the actual value will be set to 12ms which causes the test to fail:

# $ sudo ./tdc.py  -d eth2 -e 3410
#  -- ns/SubPlugin.__init__
# Test 3410: Create SFB with db setting
#
# All test results:
#
# 1..1
# not ok 1 3410 - Create SFB with db setting
#         Could not match regex pattern. Verify command output:
# qdisc sfb 1: root refcnt 2 rehash 600s db 12ms limit 1000p max 25p target 20p increment 0.000503548 decrement 4.57771e-05 penalty_rate 10pps penalty_burst 20p

Set the value to 100ms instead which currently seem to work on 100hz,
250hz, 300hz and 1000hz kernel configs.

Fixes: 6ad92dc56fca ("selftests/tc-testing: add selftests for sfb qdisc")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Pedro Tammela <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/tc-testing: Fix Error: failed to find target LOG

Add missing netfilter config dependency.

Fixes following example error when running tests via tdc.sh for all XT
tests:

# $ sudo ./tdc.py -d eth2 -e 2029
# Test 2029: Add xt action with log-prefix
# exit: 255
# exit: 0
#  failed to find target LOG
#
# bad action parsing
# parse_action: bad value (7:xt)!
# Illegal "action"
#
# -----> teardown stage *** Could not execute: "$TC actions flush action xt"
#
# -----> teardown stage *** Error message: "Error: Cannot flush unknown TC action.
# We have an error flushing
# "
# returncode 1; expected [0]
#
# -----> teardown stage *** Aborting test run.
#
# <_io.BufferedReader name=3> *** stdout ***
#
# <_io.BufferedReader name=5> *** stderr ***
# "-----> teardown stage" did not complete successfully
# Exception <class '__main__.PluginMgrTestFail'> ('teardown', ' failed to find target LOG\n\nbad action parsing\nparse_action: bad value (7:xt)!\nIllegal "action"\n', '"-----> teardown stage" did not complete successfully') (caught in test_runner, running test 2 2029 Add xt action with log-prefix stage teardown)
# ---------------
# traceback
#   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 495, in test_runner
#     res = run_one_test(pm, args, index, tidx)
#   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 434, in run_one_test
#     prepare_env(args, pm, 'teardown', '-----> teardown stage', tidx['teardown'], procout)
#   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 245, in prepare_env
#     raise PluginMgrTestFail(
# ---------------
# accumulated output for this test:
#  failed to find target LOG
#
# bad action parsing
# parse_action: bad value (7:xt)!
# Illegal "action"
#
# ---------------
#
# All test results:
#
# 1..1
# ok 1 2029 - Add xt action with log-prefix # skipped - "-----> teardown stage" did not complete successfully

Fixes: 910d504bc187 ("selftests/tc-testings: add selftests for xt action")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Pedro Tammela <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/tc-testing: Fix Error: Specified qdisc kind is unknown.

All TEQL tests assume that sch_teql module is loaded. Load module in tdc.sh
before running qdisc tests.

Fixes following example error when running tests via tdc.sh for all TEQL
tests:

# $ sudo ./tdc.py -d eth2 -e 84a0
#  -- ns/SubPlugin.__init__
# Test 84a0: Create TEQL with default setting
# exit: 2
# exit: 0
# Error: Specified qdisc kind is unknown.
#
# -----> teardown stage *** Could not execute: "$TC qdisc del dev $DUMMY handle 1: root"
#
# -----> teardown stage *** Error message: "Error: Invalid handle.
# "
# returncode 2; expected [0]
#
# -----> teardown stage *** Aborting test run.
#
# <_io.BufferedReader name=3> *** stdout ***
#
# <_io.BufferedReader name=5> *** stderr ***
# "-----> teardown stage" did not complete successfully
# Exception <class '__main__.PluginMgrTestFail'> ('teardown', 'Error: Specified qdisc kind is unknown.\n', '"-----> teardown stage" did not complete successfully') (caught in test_runner, running test 2 84a0 Create TEQL with default setting stage teardown)
# ---------------
# traceback
#   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 495, in test_runner
#     res = run_one_test(pm, args, index, tidx)
#   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 434, in run_one_test
#     prepare_env(args, pm, 'teardown', '-----> teardown stage', tidx['teardown'], procout)
#   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 245, in prepare_env
#     raise PluginMgrTestFail(
# ---------------
# accumulated output for this test:
# Error: Specified qdisc kind is unknown.
#
# ---------------
#
# All test results:
#
# 1..1
# ok 1 84a0 - Create TEQL with default setting # skipped - "-----> teardown stage" did not complete successfully

Fixes: cc62fbe114c9 ("selftests/tc-testing: add selftests for teql qdisc")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Victor Nogueira <[email protected]>
Reviewed-by: Pedro Tammela <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ethernet: ti: am65-cpsw: Call of_node_put() on error path

This code returns directly but it should instead call of_node_put()
to drop some reference counts.

Fixes: dab2b265dd23 ("net: ethernet: ti: am65-cpsw: Add support for SERDES configuration")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Roger Quadros <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

io_uring/net: save msghdr->msg_control for retries

If the application sets ->msg_control and we have to later retry this
command, or if it got queued with IOSQE_ASYNC to begin with, then we
need to retain the original msg_control value. This is due to the net
stack overwriting this field with an in-kernel pointer, to copy it
in. Hitting that path for the second time will now fail the copy from
user, as it's attempting to copy from a non-user address.

Cc: [email protected] # 5.10+
Link: https://github.com/axboe/liburing/issues/880
Reported-and-tested-by: Marek Majkowski <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'nios2_fix_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux

Pull NIOS2 dts fix from Dinh Nguyen:

- Fix tse_mac "max-frame-size" property

* tag 'nios2_fix_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: dts: Fix tse_mac "max-frame-size" property

Merge branch 'bpf: fix NULL dereference during extable search'

Krister Johansen says:

====================
Hi,
Enclosed are a pair of patches for an oops that can occur if an exception is
generated while a bpf subprogram is running.  One of the bpf_prog_aux entries
for the subprograms are missing an extable.  This can lead to an exception that
would otherwise be handled turning into a NULL pointer bug.

These changes were tested via the verifier and progs selftests and no
regressions were observed.

Changes from v4:
- Ensure that num_exentries is copied to prog->aux from func[0] (Feedback from
  Ilya Leoshkevich)

Changes from v3:
- Selftest style fixups (Feedback from Yonghong Song)
- Selftest needs to assert that test bpf program executed (Feedback from
  Yonghong Song)
- Selftest should combine open and load using open_and_load (Feedback from
  Yonghong Song)

Changes from v2:
- Insert only the main program's kallsyms (Feedback from Yonghong Song and
  Alexei Starovoitov)
- Selftest should use ASSERT instead of CHECK (Feedback from Yonghong Song)
- Selftest needs some cleanup (Feedback from Yonghong Song)
- Switch patch order (Feedback from Alexei Starovoitov)

Changes from v1:
- Add a selftest (Feedback From Alexei Starovoitov)
- Move to a 1-line verifier change instead of searching multiple extables
====================

Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: add a test for subprogram extables

In certain situations a program with subprograms may have a NULL
extable entry.  This should not happen, and when it does, it turns a
single trap into multiple.  Add a test case for further debugging and to
prevent regressions.

The test-case contains three essentially identical versions of the same
test because just one program may not be sufficient to trigger the oops.
This is due to the fact that the items are stored in a binary tree and
have identical values so it's possible to sometimes find the ksym with
the extable.  With 3 copies, this has been reliable on this author's
test systems.

When triggered out of this test case, the oops looks like this:

   BUG: kernel NULL pointer dereference, address: 000000000000000c
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   PGD 0 P4D 0
   Oops: 0000 [#1] PREEMPT SMP NOPTI
   CPU: 0 PID: 1132 Comm: test_progs Tainted: G           OE      6.4.0-rc3+ #2
   RIP: 0010:cmp_ex_search+0xb/0x30
   Code: cc cc cc cc e8 36 cb 03 00 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 55 48 89 e5 48 8b 07 <48> 63 0e 48 01 f1 31 d2 48 39 c8 19 d2 48 39 c8 b8 01 00 00 00 0f
   RSP: 0018:ffffb30c4291f998 EFLAGS: 00010006
   RAX: ffffffffc00b49da RBX: 0000000000000002 RCX: 000000000000000c
   RDX: 0000000000000002 RSI: 000000000000000c RDI: ffffb30c4291f9e8
   RBP: ffffb30c4291f998 R08: ffffffffab1a42d0 R09: 0000000000000001
   R10: 0000000000000000 R11: ffffffffab1a42d0 R12: ffffb30c4291f9e8
   R13: 000000000000000c R14: 000000000000000c R15: 0000000000000000
   FS:  00007fb5d9e044c0(0000) GS:ffff92e95ee00000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 000000000000000c CR3: 000000010c3a2005 CR4: 00000000007706f0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   PKRU: 55555554
   Call Trace:
    <TASK>
    bsearch+0x41/0x90
    ? __pfx_cmp_ex_search+0x10/0x10
    ? bpf_prog_45a7907e7114d0ff_handle_fexit_ret_subprogs3+0x2a/0x6c
    search_extable+0x3b/0x60
    ? bpf_prog_45a7907e7114d0ff_handle_fexit_ret_subprogs3+0x2a/0x6c
    search_bpf_extables+0x10d/0x190
    ? bpf_prog_45a7907e7114d0ff_handle_fexit_ret_subprogs3+0x2a/0x6c
    search_exception_tables+0x5d/0x70
    fixup_exception+0x3f/0x5b0
    ? look_up_lock_class+0x61/0x110
    ? __lock_acquire+0x6b8/0x3560
    ? __lock_acquire+0x6b8/0x3560
    ? __lock_acquire+0x6b8/0x3560
    kernelmode_fixup_or_oops+0x46/0x110
    __bad_area_nosemaphore+0x68/0x2b0
    ? __lock_acquire+0x6b8/0x3560
    bad_area_nosemaphore+0x16/0x20
    do_kern_addr_fault+0x81/0xa0
    exc_page_fault+0xd6/0x210
    asm_exc_page_fault+0x2b/0x30
   RIP: 0010:bpf_prog_45a7907e7114d0ff_handle_fexit_ret_subprogs3+0x2a/0x6c
   Code: f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 7f 08 49 bb 00 00 00 00 00 80 00 00 4c 39 df 73 04 31 f6 eb 04 <48> 8b 77 00 49 bb 00 00 00 00 00 80 00 00 48 81 c7 7c 00 00 00 4c
   RSP: 0018:ffffb30c4291fcb8 EFLAGS: 00010282
   RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
   RDX: 00000000cddf1af1 RSI: 000000005315a00d RDI: ffffffffffffffea
   RBP: ffffb30c4291fcb8 R08: ffff92e644bf38a8 R09: 0000000000000000
   R10: 0000000000000000 R11: 0000800000000000 R12: ffff92e663652690
   R13: 00000000000001c8 R14: 00000000000001c8 R15: 0000000000000003
    bpf_trampoline_251255721842_2+0x63/0x1000
    bpf_testmod_return_ptr+0x9/0xb0 [bpf_testmod]
    ? bpf_testmod_test_read+0x43/0x2d0 [bpf_testmod]
    sysfs_kf_bin_read+0x60/0x90
    kernfs_fop_read_iter+0x143/0x250
    vfs_read+0x240/0x2a0
    ksys_read+0x70/0xe0
    __x64_sys_read+0x1f/0x30
    do_syscall_64+0x68/0xa0
    ? syscall_exit_to_user_mode+0x77/0x1f0
    ? do_syscall_64+0x77/0xa0
    ? irqentry_exit+0x35/0xa0
    ? sysvec_apic_timer_interrupt+0x4d/0x90
    entry_SYSCALL_64_after_hwframe+0x72/0xdc
   RIP: 0033:0x7fb5da00a392
   Code: ac 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
   RSP: 002b:00007ffc5b3cab68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
   RAX: ffffffffffffffda RBX: 000055bee7b8b100 RCX: 00007fb5da00a392
   RDX: 00000000000001c8 RSI: 0000000000000000 RDI: 0000000000000009
   RBP: 00007ffc5b3caba0 R08: 0000000000000000 R09: 0000000000000037
   R10: 000055bee7b8c2a7 R11: 0000000000000246 R12: 000055bee78f1f60
   R13: 00007ffc5b3cae90 R14: 0000000000000000 R15: 0000000000000000
    </TASK>
   Modules linked in: bpf_testmod(OE) nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common intel_uncore_frequency_common ppdev nfit crct10dif_pclmul crc32_pclmul psmouse ghash_clmulni_intel sha512_ssse3 aesni_intel parport_pc crypto_simd cryptd input_leds parport rapl ena i2c_piix4 mac_hid serio_raw ramoops reed_solomon pstore_blk drm pstore_zone efi_pstore autofs4 [last unloaded: bpf_testmod(OE)]
   CR2: 000000000000000c

Though there may be some variation, depending on which suprogram
triggers the bug.

Signed-off-by: Krister Johansen <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/4ebf95ec857cd785b81db69f3e408c039ad8408b.1686616663.git.kjlx@templeofstupid.com
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: ensure main program has an extable

When subprograms are in use, the main program is not jit'd after the
subprograms because jit_subprogs sets a value for prog->bpf_func upon
success.  Subsequent calls to the JIT are bypassed when this value is
non-NULL.  This leads to a situation where the main program and its
func[0] counterpart are both in the bpf kallsyms tree, but only func[0]
has an extable.  Extables are only created during JIT.  Now there are
two nearly identical program ksym entries in the tree, but only one has
an extable.  Depending upon how the entries are placed, there's a chance
that a fault will call search_extable on the aux with the NULL entry.

Since jit_subprogs already copies state from func[0] to the main
program, include the extable pointer in this state duplication.
Additionally, ensure that the copy of the main program in func[0] is not
added to the bpf_prog_kallsyms table. Instead, let the main program get
added later in bpf_prog_load().  This ensures there is only a single
copy of the main program in the kallsyms table, and that its tag matches
the tag observed by tooling like bpftool.

Cc: [email protected]
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Krister Johansen <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Tested-by: Ilya Leoshkevich <[email protected]>
Link: https://lore.kernel.org/r/6de9b2f4b4724ef56efbb0339daaa66c8b68b1e7.1686616663.git.kjlx@templeofstupid.com
Signed-off-by: Alexei Starovoitov <[email protected]>

nios2: dts: Fix tse_mac "max-frame-size" property

The given value of 1518 seems to refer to the layer 2 ethernet frame
size without 802.1Q tag. Actual use of the "max-frame-size" including in
the consumer of the "altr,tse-1.0" compatible is the MTU.

Fixes: 95acd4c7b69c ("nios2: Device tree support")
Fixes: 61c610ec61bb ("nios2: Add Max10 device tree")
Cc: <[email protected]>
Signed-off-by: Janne Grunau <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>

drm/amd/display: limit DPIA link rate to HBR3

[Why]
DPIA doesn't support UHBR, driver should not enable UHBR
for dp tunneling

[How]
limit DPIA link rate to HBR3

Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: [email protected]
Acked-by: Stylon Wang <[email protected]>
Signed-off-by: Peichen Huang <[email protected]>
Reviewed-by: Mustapha Ghaddar <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix the system hang while disable PSR

[Why]
When the PSR enabled. If you try to adjust the timing parameters,
it may cause system hang. Because the timing mismatch with the
DMCUB settings.

[How]
Disable the PSR before adjusting timing parameters.

Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: [email protected]
Acked-by: Stylon Wang <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Reviewed-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: edp do not add non-edid timings

[Why] most edp support only timings from edid. applying
non-edid timings, especially those timings out of edp
bandwidth, may damage edp.

[How] do not add non-edid timings for edp.

Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: [email protected]
Acked-by: Stylon Wang <[email protected]>
Signed-off-by: Hersen Wu <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system"

This reverts commit c105518679b6e87232874ffc989ec403bee59664.

This patch disables the TOPDOWN flag for APU and few dGPU cards
which has the VRAM size equal to the BAR size.

When we enable the TOPDOWN flag, we get the free blocks at
the highest available memory region and we don't split the
lower order blocks. This change is required to keep off
the fragmentation related issues particularly in ASIC
which has VRAM space <= 500MiB

Hence, we are reverting this patch.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
Signed-off-by: Arunpravin Paneer Selvam <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1

Only vcn0 can process AV1 codecx. In order to use both vcn0 and
vcn1 in h264/265 transcode to AV1 cases, set vcn0 sched score to 1
at initialization time.

Signed-off-by: Sonny Jiang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.1.x

drm/radeon: Disable outputs when releasing fbdev client

Disable the modesetting pipeline before release the radeon's fbdev
client. Fixes the following error:

[   17.217408] WARNING: CPU: 5 PID: 1464 at drivers/gpu/drm/ttm/ttm_bo.c:326 ttm_bo_release+0x27e/0x2d0 [ttm]
[   17.217418] Modules linked in: edac_mce_amd radeon(+) drm_ttm_helper ttm video drm_suballoc_helper drm_display_helper kvm irqbypass drm_kms_helper syscopyarea crc32_pclmul sysfillrect sha512_ssse3 sysimgblt sha512_generic cfbfillrect cfbimgblt wmi_bmof aesni_intel cfbcopyarea crypto_simd cryptd k10temp acpi_cpufreq wmi dm_mod
[   17.217432] CPU: 5 PID: 1464 Comm: systemd-udevd Not tainted 6.4.0-rc4+ #1
[   17.217436] Hardware name: Micro-Star International Co., Ltd. MS-7A38/B450M PRO-VDH MAX (MS-7A38), BIOS B.G0 07/26/2022
[   17.217438] RIP: 0010:ttm_bo_release+0x27e/0x2d0 [ttm]
[   17.217444] Code: 48 89 43 38 48 89 43 40 48 8b 5c 24 30 48 8b b5 40 08 00 00 48 8b 6c 24 38 48 83 c4 58 e9 7a 49 f7 e0 48 89 ef e9 6c fe ff ff <0f> 0b 48 83 7b 20 00 0f 84 b7 fd ff ff 0f 0b 0f 1f 00 e9 ad fd ff
[   17.217448] RSP: 0018:ffffc9000095fbb0 EFLAGS: 00010202
[   17.217451] RAX: 0000000000000001 RBX: ffff8881052c8de0 RCX: 0000000000000000
[   17.217453] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8881052c8de0
[   17.217455] RBP: ffff888104a66e00 R08: ffff8881052c8de0 R09: ffff888104a7cf08
[   17.217457] R10: ffffc9000095fbe0 R11: ffffc9000095fbe8 R12: ffff8881052c8c78
[   17.217458] R13: ffff8881052c8c78 R14: dead000000000100 R15: ffff88810528b108
[   17.217460] FS:  00007f319fcbb8c0(0000) GS:ffff88881a540000(0000) knlGS:0000000000000000
[   17.217463] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   17.217464] CR2: 000055dc8b0224a0 CR3: 000000010373d000 CR4: 0000000000750ee0
[   17.217466] PKRU: 55555554
[   17.217468] Call Trace:
[   17.217470]  <TASK>
[   17.217472]  ? __warn+0x97/0x160
[   17.217476]  ? ttm_bo_release+0x27e/0x2d0 [ttm]
[   17.217481]  ? report_bug+0x1ec/0x200
[   17.217487]  ? handle_bug+0x3c/0x70
[   17.217490]  ? exc_invalid_op+0x1f/0x90
[   17.217493]  ? preempt_count_sub+0xb5/0x100
[   17.217496]  ? asm_exc_invalid_op+0x16/0x20
[   17.217500]  ? ttm_bo_release+0x27e/0x2d0 [ttm]
[   17.217505]  ? ttm_resource_move_to_lru_tail+0x1ab/0x1d0 [ttm]
[   17.217511]  radeon_bo_unref+0x1a/0x30 [radeon]
[   17.217547]  radeon_gem_object_free+0x20/0x30 [radeon]
[   17.217579]  radeon_fbdev_fb_destroy+0x57/0x90 [radeon]
[   17.217616]  unregister_framebuffer+0x72/0x110
[   17.217620]  drm_client_dev_unregister+0x6d/0xe0
[   17.217623]  drm_dev_unregister+0x2e/0x90
[   17.217626]  drm_put_dev+0x26/0x90
[   17.217628]  pci_device_remove+0x44/0xc0
[   17.217631]  really_probe+0x257/0x340
[   17.217635]  __driver_probe_device+0x73/0x120
[   17.217638]  driver_probe_device+0x2c/0xb0
[   17.217641]  __driver_attach+0xa0/0x150
[   17.217643]  ? __pfx___driver_attach+0x10/0x10
[   17.217646]  bus_for_each_dev+0x67/0xa0
[   17.217649]  bus_add_driver+0x10e/0x210
[   17.217651]  driver_register+0x5c/0x120
[   17.217653]  ? __pfx_radeon_module_init+0x10/0x10 [radeon]
[   17.217681]  do_one_initcall+0x44/0x220
[   17.217684]  ? kmalloc_trace+0x37/0xc0
[   17.217688]  do_init_module+0x64/0x240
[   17.217691]  __do_sys_finit_module+0xb2/0x100
[   17.217694]  do_syscall_64+0x3b/0x90
[   17.217697]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   17.217700] RIP: 0033:0x7f319feaa5a9
[   17.217702] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 08 0d 00 f7 d8 64 89 01 48
[   17.217706] RSP: 002b:00007ffc6bf3e7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   17.217709] RAX: ffffffffffffffda RBX: 00005607204f3170 RCX: 00007f319feaa5a9
[   17.217710] RDX: 0000000000000000 RSI: 00007f31a002eefd RDI: 0000000000000018
[   17.217712] RBP: 00007f31a002eefd R08: 0000000000000000 R09: 00005607204f1860
[   17.217714] R10: 0000000000000018 R11: 0000000000000246 R12: 0000000000020000
[   17.217716] R13: 0000000000000000 R14: 0000560720522450 R15: 0000560720255899
[   17.217718]  </TASK>
[   17.217719] ---[ end trace 0000000000000000 ]---

The buffer object backing the fbdev emulation got pinned twice: by the
fb_probe helper radeon_fbdev_create_pinned_object() and the modesetting
code when the framebuffer got displayed. It only got unpinned once by
the fbdev helper radeon_fbdev_destroy_pinned_object(). Hence TTM's BO-
release function complains about the pin counter. Forcing the outputs
off also undoes the modesettings pin increment.

Tested-by: Borislav Petkov (AMD) <[email protected]>
Reported-by: Borislav Petkov <[email protected]>
Closes: https://lore.kernel.org/dri-devel/20230603174814.GCZHt83pN+wNjf63sC@fat_crate.local/
Signed-off-by: Thomas Zimmermann <[email protected]>
Fixes: e317a69fe891 ("drm/radeon: Implement client-based fbdev emulation")
Cc: Alex Deucher <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: "Pan, Xinhui" <[email protected]>
Cc: [email protected]
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: workaround for compute workload type on some skus

On smu 13.0.0, the compute workload type cannot be set on all the skus
due to some other problems. This workaround is to make sure compute workload type
can also run on some specific skus.

v2: keep the variable consistent

Signed-off-by: Kenneth Feng <[email protected]>
Acked-by: Lijo Lazar <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.1.x

drm/amd: Tighten permissions on VBIOS flashing attributes

Non-root users shouldn't be able to try to trigger a VBIOS flash
or query the flashing status. This should be reserved for users with the
appropriate permissions.

Cc: [email protected]
Fixes: 8424f2ccb3c0 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd: Make sure image is written to trigger VBIOS image update flow

The VBIOS image update flow requires userspace to:
1) Write the image to `psp_vbflash`
2) Read `psp_vbflash`
3) Poll `psp_vbflash_status` to check for completion

If userspace reads `psp_vbflash` before writing an image, it's
possible that it causes problems that can put the dGPU into an invalid
state.

Explicitly check that an image has been written before letting a read
succeed.

Cc: [email protected]
Fixes: 8424f2ccb3c0 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add missing radeon secondary PCI ID

0x5b70 is a missing RV370 secondary id. Add it so
we don't try and probe it with amdgpu.

Cc: [email protected]
Reviewed-by: Michel Dänzer <[email protected]>
Tested-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amdgpu: Implement gfx9 patch functions for resubmission

Patch the packages including CONTEXT_CONTROL and WRITE_DATA for gfx9
during the resubmission scenario.

Signed-off-by: Jiadong Zhu <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.3.x

drm/amdgpu: Modify indirect buffer packages for resubmission

When the preempted IB frame resubmitted to cp, we need to modify the frame
data including:
1. set PRE_RESUME 1 in CONTEXT_CONTROL.
2. use meta data(DE and CE) read from CSA in WRITE_DATA.

Add functions to save the location the first time IBs emitted and callback
to patch the package when resubmission happens.

Signed-off-by: Jiadong Zhu <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.3.x

drm/amdgpu: Program gds backup address as zero if no gds allocated

It is firmware requirement to set gds_backup_addrlo and gds_backup_addrhi
of DE meta both zero if no gds partition is allocated for the frame.

Signed-off-by: Jiadong Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.3.x

drm/nouveau: add nv_encoder pointer check for NULL

Pointer nv_encoder could be dereferenced at nouveau_connector.c
in case it's equal to NULL by jumping to goto label.
This patch adds a NULL-check to avoid it.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 3195c5f9784a ("drm/nouveau: set encoder for lvds")
Signed-off-by: Natalia Petrova <[email protected]>
Reviewed-by: Lyude Paul <[email protected]>
[Fixed patch title]
Signed-off-by: Lyude Paul <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled

When MEC executes unmap_queue for mid command buffer preemption, it will
kick the write pointer of the gfx ring, set CP_VMID_PREEMPT to trigger the
preemption and wait for CP_VMID_PREEMPT becomes zero after the preemption
done. There is a race condition that PFP may excute the resetting command
before MEC set CP_VMID_PREEMPT. As a result, hang happens as
CP_VMID_PREEMPT is always 0xffff.

To avoid this, we send resetting CP_VMID_PREEMPT command after the trailing
fence is siganled and update gfx write pointer explicitly.

Signed-off-by: Jiadong Zhu <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.3.x
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2535

drm/nouveau/dp: check for NULL nv_connector->native_mode

Add checking for NULL before calling nouveau_connector_detect_depth() in
nouveau_connector_get_modes() function because nv_connector->native_mode
could be dereferenced there since connector pointer passed to
nouveau_connector_detect_depth() and the same value of
nv_connector->native_mode is used there.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: d4c2c99bdc83 ("drm/nouveau/dp: remove broken display depth function, use the improved one")
Signed-off-by: Natalia Petrova <[email protected]>
Reviewed-by: Lyude Paul <[email protected]>
Signed-off-by: Lyude Paul <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

spi: dw: Replace incorrect spi_get_chipselect with set

Commit 445164e8c136 ("spi: dw: Replace spi->chip_select references with
function calls") replaced direct access to spi.chip_select with
spi_*_chipselect calls but incorrectly replaced a set instance with a
get instance, replace the incorrect instance.

Fixes: 445164e8c136 ("spi: dw: Replace spi->chip_select references with function calls")
Signed-off-by: Abe Kohandel <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

drm/bridge: ti-sn65dsi86: Avoid possible buffer overflow

Smatch error:buffer overflow 'ti_sn_bridge_refclk_lut' 5 <= 5.

Fixes: cea86c5bb442 ("drm/bridge: ti-sn65dsi86: Implement the pwm_chip")
Signed-off-by: Su Hui <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'devicetree-fixes-for-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Fix missing of_node_put() in init_overlay_changeset()

- Fix schema for qcom,pmic-mpp "qcom,paired" property

- Fix 'additionalProperties' in silvaco,i3c-master binding

- usage-model.rst: Use documented "arm,primecell" compatible string

- Update Damien Le Moal's email address

- Fixes in Realtek Bluetooth binding

* tag 'devicetree-fixes-for-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: pinctrl: qcom,pmic-mpp: Fix schema for "qcom,paired"
  dt-bindings: i3c: silvaco,i3c-master: fix missing schema restriction
  of: overlay: Fix missing of_node_put() in error case of init_overlay_changeset()
  docs: zh_CN/devicetree: sync usage-model fix
  docs: dt: fix documented Primecell compatible string
  dt-bindings: Change Damien Le Moal's contact email
  dt-bindings: net: realtek-bluetooth: Fix double RTL8723CS in desc
  dt-bindings: net: realtek-bluetooth: Fix RTL8821CS binding

gpio: sifive: add missing check for platform_get_irq

Add the missing check for platform_get_irq() and return error code
if it fails.

The returned error code will be dealed with in
builtin_platform_driver(sifive_gpio_driver) and the driver will not
be registered.

Fixes: f52d6d8b43e5 ("gpio: sifive: To get gpio irq offset from device tree data")
Signed-off-by: Jiasheng Jiang <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

mmc: mmci: stm32: fix max busy timeout calculation

The way that the timeout is currently calculated could lead to a u64
timeout value in mmci_start_command(). This value is then cast in a u32
register that leads to mmc erase failed issue with some SD cards.

Fixes: 8266c585f489 ("mmc: mmci: add hardware busy timeout feature")
Signed-off-by: Yann Gautier <[email protected]>
Signed-off-by: Christophe Kerello <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

dt-bindings: pinctrl: qcom,pmic-mpp: Fix schema for "qcom,paired"

The "qcom,paired" schema is all wrong. First, it's a list rather than an
object(dictionary). Second, it is missing a required type. The meta-schema
normally catches this, but schemas under "$defs" was not getting checked.
A fix for that is pending.

Fixes: f9a06b810951 ("dt-bindings: pinctrl: qcom,pmic-mpp: Convert qcom pmic mpp bindings to YAML")
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

gpiolib: Fix GPIO chip IRQ initialization restriction

In case of gpio-regmap, IRQ chip is added by regmap-irq and associated with
GPIO chip by gpiochip_irqchip_add_domain(). The initialization flag was not
added in gpiochip_irqchip_add_domain(), causing gpiochip_to_irq() to return
-EPROBE_DEFER.

Fixes: 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members before initialization")
Signed-off-by: Jiawen Wu <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

regmap: regcache: Don't sync read-only registers

regcache_maple_sync() tries to sync all cached values no matter
whether it's writable or not.  OTOH, regache_sync_val() does care the
wrtability and returns -EIO for a read-only register.  This results in
an error message like:
  snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2f0009. -5
and the sync loop is aborted incompletely.

This patch adds the writable register check to regcache_sync_val() for
addressing the bug above.

Note that, although we may add the check in the caller side
(regcache_maple_sync()), here we put in regcache_sync_val(), so that a
similar case like this can be avoided in future.

Fixes: f033c26de5a5 ("regmap: Add maple tree based register cache")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

ASoC: tegra: Fix Master Volume Control

Commit 3ed2b549b39f ("ALSA: pcm: fix wait_time calculations") corrected
the PCM wait_time calculations and in doing so reduced the calculated
wait_time. This exposed an issue with the Tegra Master Volume Control
(MVC) device where the reduced wait_time caused the MVC to fail. For now
fix this by setting the default wait_time for Tegra to be 500ms.

Fixes: 3ed2b549b39f ("ALSA: pcm: fix wait_time calculations")
Signed-off-by: Jon Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

tty: serial: fsl_lpuart: reduce RX watermark to 0 on LS1028A

LS1028A is using DMA with LPUART. Having RX watermark set to 1, means
DMA transactions are started only after receiving the second character.

On other platforms with newer LPUART IP, Receiver Idle Empty function
initiates the DMA request after the receiver is idling for 4 characters.
But this feature is missing on LS1028A, which is causing a 1-character
delay in the RX direction on this platform.

Set RX watermark to 0 to initiate RX DMA after each character.

Link: https://lore.kernel.org/linux-serial/[email protected]/
Fixes: 9ad9df844754 ("tty: serial: fsl_lpuart: Fix the wrong RXWATER setting for rx dma case")
Cc: stable <[email protected]>
Signed-off-by: Robert Hodaszi <[email protected]>
Message-ID: <20230609121334.1878626 [email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/nouveau: don't detect DSM for non-NVIDIA device

The call site of nouveau_dsm_pci_probe() uses single set of output
variables for all invocations. So, we must not write anything to them
unless it's an NVIDIA device. Otherwise, if we are called with another
device after the NVIDIA device, we'll clober the result of the NVIDIA
device.

For example, if the other device doesn't have _PR3 resources, the
detection later would miss the presence of power resource support, and
the rest of the code will keep using Optimus DSM, breaking power
management for that machine.

Also, because we're detecting NVIDIA's DSM, it doesn't make sense to run
this detection on a non-NVIDIA device anyway. Thus, check at the
beginning of the detection code if this is an NVIDIA card, and just
return if it isn't.

This, together with commit d22915d22ded ("drm/nouveau/devinit/tu102-:
wait for GFW_BOOT_PROGRESS == COMPLETED") developed independently and
landed earlier, fixes runtime power management of the NVIDIA card in
Lenovo Legion 5-15ARH05. Without this patch, the GPU resumption code
will "timeout", sometimes hanging userspace.

As a bonus, we'll also stop preventing _PR3 usage from the bridge for
unrelated devices, which is always nice, I guess.

Fixes: ccfc2d5cdb02 ("drm/nouveau: Use generic helper to check _PR3 presence")
Signed-off-by: Ratchanan Srirattanamet <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/79
Reviewed-by: Karol Herbst <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/DM6PR19MB2780805D4BE1E3F9B3AC96D0BC409@DM6PR19MB2780.namprd19.prod.outlook.com

usb: gadget: udc: core: Prevent soft_connect_store() race

usb_udc_connect_control(), soft_connect_store() and
usb_gadget_deactivate() can potentially race against each other to invoke
usb_gadget_connect()/usb_gadget_disconnect(). To prevent this, guard
udc->started, gadget->allow_connect, gadget->deactivate and
gadget->connect with connect_lock so that ->pullup() is only invoked when
the gadget is bound, started and not deactivated. The routines
usb_gadget_connect_locked(), usb_gadget_disconnect_locked(),
usb_udc_connect_control_locked(), usb_gadget_udc_start_locked(),
usb_gadget_udc_stop_locked() are called with this lock held.

An earlier version of this commit was reverted due to the crash reported in
https://lore.kernel.org/all/[email protected]/.
commit 16737e78d190 ("usb: gadget: udc: core: Offload usb_udc_vbus_handler processing")
addresses the crash reported.

Cc: [email protected]
Fixes: 628ef0d273a6 ("usb: udc: add usb_udc_vbus_handler")
Signed-off-by: Badhri Jagan Sridharan <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Message-ID: <20230609010227 [email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: udc: core: Offload usb_udc_vbus_handler processing

usb_udc_vbus_handler() can be invoked from interrupt context by irq
handlers of the gadget drivers, however, usb_udc_connect_control() has
to run in non-atomic context due to the following:
a. Some of the gadget driver implementations expect the ->pullup
callback to be invoked in non-atomic context.
b. usb_gadget_disconnect() acquires udc_lock which is a mutex.

Hence offload invocation of usb_udc_connect_control()
to workqueue.

UDC should not be pulled up unless gadget driver is bound. The new flag
"allow_connect" is now set by gadget_bind_driver() and cleared by
gadget_unbind_driver(). This prevents work item to pull up the gadget
even if queued when the gadget driver is already unbound.

Cc: [email protected]
Fixes: 1016fc0c096c ("USB: gadget: Fix obscure lockdep violation for udc_mutex")
Signed-off-by: Badhri Jagan Sridharan <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Message-ID: <20230609010227 [email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: typec: Fix fast_role_swap_current show function

The current implementation mistakenly performs a & operation on
the output of sysfs_emit. This patch performs the & operation before
calling sysfs_emit.

Fixes: 662a60102c12 ("usb: typec: Separate USB Power Delivery from USB Type-C")
Cc: stable <[email protected]>
Reported-by: Benson Leung <[email protected]>
Signed-off-by: Pavan Holla <[email protected]>
Reviewed-by: Heikki Krogerus <[email protected]>
Reviewed-by: Benson Leung <[email protected]>
Message-ID: <20230607193328.3359487 [email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: typec: ucsi: Fix command cancellation

The Cancel command was passed to the write callback as the
offset instead of as the actual command which caused NULL
pointer dereference.

Reported-by: Stephan Bolten <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217517
Fixes: 094902bc6a3c ("usb: typec: ucsi: Always cancel the command if PPM reports BUSY condition")
Cc: [email protected]
Signed-off-by: Heikki Krogerus <[email protected]>
Message-ID: <20230606115802 [email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: dwc3: fix use-after-free on core driver unbind

Some dwc3 glue drivers are currently accessing the driver data of the
child core device directly, which is clearly a bad idea as the child may
not have probed yet or may have been unbound from its driver.

As a workaround until the glue drivers have been fixed, clear the driver
data pointer before allowing the glue parent device to runtime suspend
to prevent its driver from accessing data that has been freed during
unbind.

Fixes: 6dd2565989b4 ("usb: dwc3: add imx8mp dwc3 glue layer driver")
Fixes: 6895ea55c385 ("usb: dwc3: qcom: Configure wakeup interrupts during suspend")
Cc: [email protected] # 5.12
Cc: Li Jun <[email protected]>
Cc: Sandeep Maheswaram <[email protected]>
Cc: Krishna Kurapati <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Acked-by: Thinh Nguyen <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Message-ID: <20230607100540 [email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: dwc3: qcom: fix NULL-deref on suspend

The Qualcomm dwc3 glue driver is currently accessing the driver data of
the child core device during suspend and on wakeup interrupts. This is
clearly a bad idea as the child may not have probed yet or could have
been unbound from its driver.

The first such layering violation was part of the initial version of the
driver, but this was later made worse when the hack that accesses the
driver data of the grand child xhci device to configure the wakeup
interrupts was added.

Fixing this properly is not that easily done, so add a sanity check to
make sure that the child driver data is non-NULL before dereferencing it
for now.

Note that this relies on subtleties like the fact that driver core is
making sure that the parent is not suspended while the child is probing.

Reported-by: Manivannan Sadhasivam <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Fixes: d9152161b4bf ("usb: dwc3: Add Qualcomm DWC3 glue layer driver")
Fixes: 6895ea55c385 ("usb: dwc3: qcom: Configure wakeup interrupts during suspend")
Cc: [email protected] # 3.18: a872ab303d5d: "usb: dwc3: qcom: fix use-after-free on runtime-PM wakeup"
Cc: Sandeep Maheswaram <[email protected]>
Cc: Krishna Kurapati <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Acked-by: Thinh Nguyen <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Message-ID: <20230607100540 [email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: dwc3: gadget: Reset num TRBs before giving back the request

Consider a scenario where cable disconnect happens when there is an active
usb reqest queued to the UDC. As part of the disconnect we would issue an
end transfer with no interrupt-on-completion before giving back this
request. Since we are giving back the request without skipping TRBs the
num_trbs field of dwc3_request still holds the stale value previously used.
Function drivers re-use same request for a given bind-unbind session and
hence their dwc3_request context gets preserved across cable
disconnect/connect. When such a request gets re-queued after cable connect,
we would increase the num_trbs field on top of the previous stale value
thus incorrectly representing the number of TRBs used. Fix this by
resetting num_trbs field before giving back the request.

Fixes: 09fe1f8d7e2f ("usb: dwc3: gadget: track number of TRBs per request")
Cc: stable <[email protected]>
Signed-off-by: Elson Roy Serrao <[email protected]>
Acked-by: Thinh Nguyen <[email protected]>
Message-ID: <1685654850 [email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: udc: renesas_usb3: Fix RZ/V2M {modprobe,bind} error

Currently {modprobe, bind} after {rmmod, unbind} results in probe failure.

genirq: Flags mismatch irq 22. 00000004 (85070400.usb3drd) vs. 00000004 (85070400.usb3drd)
renesas_usb3: probe of 85070000.usb3peri failed with error -16

The reason is, it is trying to register an interrupt handler for the same
IRQ twice. The devm_request_irq() was called with the parent device.
So the interrupt handler won't be unregistered when the usb3-peri device
is unbound.

Fix this issue by replacing "parent dev"->"dev" as the irq resource
is managed by this driver.

Fixes: 9cad72dfc556 ("usb: gadget: Add support for RZ/V2M USB3DRD driver")
Cc: stable <[email protected]>
Signed-off-by: Biju Das <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Message-ID: <20230530161720 [email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

clk: clk-loongson2: Zero init clk_init_data

As clk_core_populate_parent_map() checks clk_init_data.num_parents
first, and checks clk_init_data.parent_names[] before
clk_init_data.parent_data[] and clk_init_data.parent_hws[].

Therefore the clk_init_data structure needs to be explicitly initialised
to prevent an unexpected crash if clk_init_data.parent_names[] is a
random value.

CPU 0 Unable to handle kernel paging request at virtual address 0000000000000dc0, era == 9000000002986290, ra == 900000000298624c
Oops[#1]:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-rc2+ #4582
pc 9000000002986290 ra 900000000298624c tp 9000000100094000 sp 9000000100097a60
a0 9000000104541e00 a1 0000000000000000 a2 0000000000000dc0 a3 0000000000000001
a4 90000001000979f0 a5 90000001800977d7 a6 0000000000000000 a7 900000000362a000
t0 90000000034f3548 t1 6f8c2a9cb5ab5f64 t2 0000000000011340 t3 90000000031cf5b0
t4 0000000000000dc0 t5 0000000000000004 t6 0000000000011300 t7 9000000104541e40
t8 000000000005a4f8 u0 9000000104541e00 s9 9000000104541e00 s0 9000000104bc4700
s1 9000000104541da8 s2 0000000000000001 s3 900000000356f9d8 s4 ffffffffffffffff
s5 0000000000000000 s6 0000000000000dc0 s7 90000000030d0a88 s8 0000000000000000
    ra: 900000000298624c __clk_register+0x228/0x84c
   ERA: 9000000002986290 __clk_register+0x26c/0x84c
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 00000004 (PPLV0 +PIE -PWE)
  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
  ECFG: 00071c1c (LIE=2-4,10-12 VS=7)
ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
  BADV: 0000000000000dc0
  PRID: 0014a000 (Loongson-64bit, )
Modules linked in:
Process swapper/0 (pid: 1, threadinfo=(____ptrval____), task=(____ptrval____))
Stack : 90000000031c1810 90000000030d0a88 900000000325bac0 90000000034f3548
         90000001002ab410 9000000104541e00 0000000000000dc0 9000000003150098
         90000000031c1810 90000000031a0460 900000000362a000 90000001002ab410
         900000000362a000 9000000104541da8 9000000104541de8 90000001002ab410
         900000000362a000 9000000002986a68 90000000034f3ed8 90000000030d0aa8
         9000000104541da8 900000000298d3b8 90000000031c1810 0000000000000000
         90000000034f3ed8 90000000030d0aa8 0000000000000dc0 90000000030d0a88
         90000001002ab410 900000000298d401 0000000000000000 6f8c2a9cb5ab5f64
         90000000034f4000 90000000030d0a88 9000000003a48a58 90000001002ab410
         9000000104bd81a8 900000000298d484 9000000100020260 0000000000000000
         ...
Call Trace:
[<9000000002986290>] __clk_register+0x26c/0x84c
[<9000000002986a68>] devm_clk_hw_register+0x5c/0xe0
[<900000000298d3b8>] loongson2_clk_register.constprop.0+0xdc/0x10c
[<900000000298d484>] loongson2_clk_probe+0x9c/0x4ac
[<9000000002a4eba4>] platform_probe+0x68/0xc8
[<9000000002a4bf80>] really_probe+0xbc/0x2f0
[<9000000002a4c23c>] __driver_probe_device+0x88/0x128
[<9000000002a4c318>] driver_probe_device+0x3c/0x11c
[<9000000002a4c5dc>] __driver_attach+0x98/0x18c
[<9000000002a49ca0>] bus_for_each_dev+0x80/0xe0
[<9000000002a4b0dc>] bus_add_driver+0xfc/0x1ec
[<9000000002a4d4a8>] driver_register+0x68/0x134
[<90000000020f0110>] do_one_initcall+0x50/0x188
[<9000000003150f00>] kernel_init_freeable+0x224/0x294
[<90000000030240fc>] kernel_init+0x20/0x110
[<90000000020f1568>] ret_from_kernel_thread+0xc/0xa4

Fixes: acc0ccffec50 ("clk: clk-loongson2: add clock controller driver support")
Cc: [email protected]
Cc: Yinbo Zhu <[email protected]>
Signed-off-by: Binbin Zhou <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>

clk: mediatek: mt8365: Fix inverted topclk operations

The given operations are inverted for the wrong registers which makes
multiple of the mt8365 hardware units unusable. In my setup at least usb
did not work.

Fixed by swapping the operations with the inverted ones.

Reported-by: Alexandre Mergnat <[email protected]>
Fixes: 905b7430d3cc ("clk: mediatek: mt8365: Convert simple_gate to mtk_gate clocks")
Signed-off-by: Markus Schneider-Pargmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Alexandre Mergnat <[email protected]>
Reviewed-by: Alexandre Mergnat <[email protected]>
Reviewed-by: Matthias Brugger <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

clk: composite: Fix handling of high clock rates

ULONG_MAX is used by a few drivers to figure out the highest available
clock rate via clk_round_rate(clk, ULONG_MAX). Since abs() takes a
signed value as input, the current logic effectively calculates with
ULONG_MAX = -1, which results in the worst parent clock being chosen
instead of the best one.

For example on Rockchip RK3588 the eMMC driver tries to figure out
the highest available clock rate. There are three parent clocks
available resulting in the following rate diffs with the existing
logic:

GPLL:   abs(18446744073709551615 - 1188000000) = 1188000001
CPLL:   abs(18446744073709551615 - 1500000000) = 1500000001
XIN24M: abs(18446744073709551615 -   24000000) =   24000001

As a result the clock framework will promote a maximum supported
clock rate of 24 MHz, even though 1.5GHz are possible. With the
updated logic any casting between signed and unsigned is avoided
and the numbers look like this instead:

GPLL:   18446744073709551615 - 1188000000 = 18446744072521551615
CPLL:   18446744073709551615 - 1500000000 = 18446744072209551615
XIN24M: 18446744073709551615 -   24000000 = 18446744073685551615

As a result the parent with the highest acceptable rate is chosen
instead of the parent clock with the lowest one.

Cc: [email protected]
Fixes: 49502408007b ("mmc: sdhci-of-dwcmshc: properly determine max clock on Rockchip")
Tested-by: Christopher Obbard <[email protected]>
Signed-off-by: Sebastian Reichel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

Merge branch 'mptcp-fixes'

Matthieu Baerts says:

====================
selftests: mptcp: skip tests not supported by old kernels (part 3)

After a few years of increasing test coverage in the MPTCP selftests, we
realised [1] the last version of the selftests is supposed to run on old
kernels without issues.

Supporting older versions is not that easy for this MPTCP case: these
selftests are often validating the internals by checking packets that
are exchanged, when some MIB counters are incremented after some
actions, how connections are getting opened and closed in some cases,
etc. In other words, it is not limited to the socket interface between
the userspace and the kernelspace.

In addition to that, the current MPTCP selftests run a lot of different
sub-tests but the TAP13 protocol used in the selftests don't support
sub-tests: one failure in sub-tests implies that the whole selftest is
seen as failed at the end because sub-tests are not tracked. It is then
important to skip sub-tests not supported by old kernels.

To minimise the modifications and reduce the complexity to support old
versions, the idea is to look at external signs and skip the whole
selftest or just some sub-tests before starting them. This cannot be
applied in all cases.

Similar to the second part, this third one focuses on marking different
sub-tests as skipped if some MPTCP features are not supported. This
time, only in "mptcp_join.sh" selftest, the remaining one, is modified.
Several techniques are used here to achieve this task:

- Before starting some tests:

  - Check if a file (sysctl knob) is present: that's what patch 12/17 is
    doing for the userspace PM feature.

  - Check if a required kernel symbol is present in /proc/kallsyms:
    patches 9, 10, 14 and 15/17 are using this technique.

  - Check if it is possible to setup a particular network environment
    requiring Netfilter or TC: if the preparation step fail, the linked
    sub-test is marked as skipped. Patch 5/17 is doing that.

  - Check if a MIB counter is available: patches 7 and 13/17 do that.

  - Check if the kernel version is newer than a specific one: patch 1/17
    adds some helpers in mptcp_lib.sh to ease its use. That's not ideal
    and it is only used as last resort but as mentioned above, it is
    important to skip tests if they are not supported not to have the
    whole selftest always being marked as failed on old kernels. Patches
    11 and 17/17 are checking the kernel version. An alternative would
    be to ignore the results for some sub-tests but that's not ideal
    too. Note that SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK env var can be
    set to 1 not to skip these tests if the running kernel doesn't have
    a supported version.

- After having launched the tests:

  - Adapt the expectations depending on the presence of a kernel symbol
    (patch 6/17) or a kernel version (patch 8/17).

  - Check is a MIB counter is available and skip the verification if
    not. Patch 4/17 is using this technique.

Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var
value is checked: if it is set to 1, the test is marked as "failed"
instead of "skipped". MPTCP public CI expects to have all features
supported and it sets this env var to 1 to catch regressions in these
new checks.

Patch 2/17 uses 'iptables-legacy' if available because it might be
needed when using an older kernel not supporting iptables-nft.

Patch 3/17 adds some helpers used in the other patches mentioned to
easily mark sub-tests as skipped.

Patch 16/17 uniforms MPTCP Join "listener" tests: it was imported code
from userspace_pm.sh but without using the "code style" and ways of
using tools and printing messages from MPTCP Join selftest.

Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrbOMsDZLg@mail.gmail.com/
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
====================

Link: https://lore.kernel.org/r/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip mixed tests if not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

One of them is the support of a mix of subflows in v4 and v6 by the
in-kernel PM introduced by commit b9d69db87fb7 ("mptcp: let the
in-kernel PM use mixed IPv4 and IPv6 addresses").

It looks like there is no external sign we can use to predict the
expected behaviour. Instead of accepting different behaviours and thus
not really checking for the expected behaviour, we are looking here for
a specific kernel version. That's not ideal but it looks better than
removing the test because it cannot support older kernel versions.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: ad3493746ebe ("selftests: mptcp: add test-cases for mixed v4/v6 subflows")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: uniform listener tests

The alignment was different from the other tests because tabs were used
instead of spaces.

While at it, also use 'echo' instead of 'printf' to print the result to
keep the same style as done in the other sub-tests. And, even if it
should be better with, also remove 'stdbuf' and sed's '--unbuffered'
option because they are not used in the other subtests and they are not
available when using a minimal environment with busybox.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 178d023208eb ("selftests: mptcp: listener test for in-kernel PM")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip PM listener tests if not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

One of them is the support of PM listener events introduced by commit
f8c9dfbd875b ("mptcp: add pm listener events").

It is possible to look for "mptcp_event_pm_listener" in kallsyms to know
in advance if the kernel supports this feature.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 178d023208eb ("selftests: mptcp: listener test for in-kernel PM")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip MPC backups tests if not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

One of them is the support of sending an MP_PRIO signal for the initial
subflow, introduced by commit c157bbe776b7 ("mptcp: allow the in kernel
PM to set MPC subflow priority").

It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
it was needed to introduce the mentioned feature. So we can know in
advance if the feature is supported instead of trying and accepting any
results.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 914f6a59b10f ("selftests: mptcp: add MPC backup tests")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip fail tests if not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

One of them is the support of the MP_FAIL / infinite mapping introduced
by commit 1e39e5a32ad7 ("mptcp: infinite mapping sending") and the
following ones.

It is possible to look for one of the infinite mapping counters to know
in advance if the this feature is available.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b6e074e171bc ("selftests: mptcp: add infinite map testcase")
Cc: [email protected]
Fixes: 2ba18161d407 ("selftests: mptcp: add MP_FAIL reset testcase")
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip userspace PM tests if not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

One of them is the support of the userspace PM introduced by commit
4638de5aefe5 ("mptcp: handle local addrs announced by userspace PMs")
and the following ones.

It is possible to look for the MPTCP pm_type's sysctl knob to know in
advance if the userspace PM is available.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip fullmesh flag tests if not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

One of them is the support of the fullmesh flag for the in-kernel PM
introduced by commit 2843ff6f36db ("mptcp: remote addresses fullmesh")
and commit 1a0d6136c5f0 ("mptcp: local addresses fullmesh").

It looks like there is no easy external sign we can use to predict the
expected behaviour. We could add the flag and then check if it has been
added but for that, and for each fullmesh test, we would need to setup a
new environment, do the checks, clean it and then only start the test
from yet another clean environment. To keep it simple and avoid
introducing new issues, we look for a specific kernel version. That's
not ideal but an acceptable solution for this case.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 6a0653b96f5d ("selftests: mptcp: add fullmesh setting tests")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip backup if set flag on ID not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

Commit bccefb762439 ("selftests: mptcp: simplify pm_nl_change_endpoint")
has simplified the way the backup flag is set on an endpoint. Instead of
doing:

./pm_nl_ctl set 10.0.2.1 flags backup

Now we do:

./pm_nl_ctl set id 1 flags backup

The new way is easier to maintain but it is also incompatible with older
kernels not supporting the implicit endpoints putting in place the
infrastructure to set flags per ID, hence the second Fixes tag.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: bccefb762439 ("selftests: mptcp: simplify pm_nl_change_endpoint")
Cc: [email protected]
Fixes: 4cf86ae84c71 ("mptcp: strict local address ID selection")
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip implicit tests if not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

One of them is the support of the implicit endpoints introduced by
commit d045b9eb95a9 ("mptcp: introduce implicit endpoints").

It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
it was needed to introduce the mentioned feature. So we can know in
advance if the feature is supported instead of trying and accepting any
results.

Note that here and in the following commits, we re-do the same check for
each sub-test of the same function for a few reasons. The main one is
not to break the ID assign to each test in order to be able to easily
compare results between different kernel versions. Also, we can still
run a specific test even if it is skipped. Another reason is that it
makes it clear during the review that a specific subtest will be skipped
or not under certain conditions. At the end, it looks OK to call the
exact same helper multiple times: it is not a critical path and it is
the same code that is executed, not really more cases to maintain.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: support RM_ADDR for used endpoints or not

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

At some points, a new feature caused internal behaviour changes we are
verifying in the selftests, see the Fixes tag below. It was not a UAPI
change but because in these selftests, we check some internal
behaviours, it is normal we have to adapt them from time to time after
having added some features.

It looks like there is no external sign we can use to predict the
expected behaviour. Instead of accepting different behaviours and thus
not really checking for the expected behaviour, we are looking here for
a specific kernel version. That's not ideal but it looks better than
removing the test because it cannot support older kernel versions.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 6fa0174a7c86 ("mptcp: more careful RM_ADDR generation")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip Fastclose tests if not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

One of them is the support of MP_FASTCLOSE introduced in commit
f284c0c77321 ("mptcp: implement fastclose xmit path").

If the MIB counter is not available, the test cannot be verified and the
behaviour will not be the expected one. So we can skip the test if the
counter is missing.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 01542c9bf9ab ("selftests: mptcp: add fastclose testcase")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: support local endpoint being tracked or not

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

At some points, a new feature caused internal behaviour changes we are
verifying in the selftests, see the Fixes tag below. It was not a uAPI
change but because in these selftests, we check some internal
behaviours, it is normal we have to adapt them from time to time after
having added some features.

It is possible to look for "mptcp_pm_subflow_check_next" in kallsyms
because it was needed to introduce the mentioned feature. So we can know
in advance what the behaviour we are expecting here instead of
supporting the two behaviours.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip test if iptables/tc cmds fail

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

Some tests are using IPTables and/or TC commands to force some
behaviours. If one of these commands fails -- likely because some
features are not available due to missing kernel config -- we should
intercept the error and skip the tests requiring these features.

Note that if we expect to have these features available and if
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
will be marked as failed instead of skipped.

This patch also replaces the 'exit 1' by 'return 1' not to stop the
selftest in the middle without the conclusion if there is an issue with
NF or TC.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: skip check if MIB counter not supported

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

One of them is the MPTCP MIB counters introduced in commit fc518953bc9c
("mptcp: add and use MIB counter infrastructure") and more later. The
MPTCP Join selftest heavily relies on these counters.

If a counter is not supported by the kernel, it is not displayed when
using 'nstat -z'. We can then detect that and skip the verification. A
new helper (get_counter()) has been added to do the required checks and
return an error if the counter is not available.

Note that if we expect to have these features available and if
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
will be marked as failed instead of skipped.

This new helper also makes sure we get the exact counter we want to
avoid issues we had in the past, e.g. with MPTcpExtRmAddr and
MPTcpExtRmAddrDrop sharing the same prefix. While at it, we uniform the
way we fetch a MIB counter.

Note for the backports: we rarely change these modified blocks so if
there is are conflicts, it is very likely because a counter is not used
in the older kernels and we don't need that chunk.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: helpers to skip tests

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

Here are some helpers that will be used to mark subtests as skipped if a
feature is not supported. Marking as a fix for the commit introducing
this selftest to help with the backports.

While at it, also check if kallsyms feature is available as it will also
be used in the following commits to check if MPTCP features are
available before starting a test.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: join: use 'iptables-legacy' if available

IPTables commands using 'iptables-nft' fail on old kernels, at least
5.15 because it doesn't see the default IPTables chains:

$ iptables -L
iptables/1.8.2 Failed to initialize nft: Protocol not supported

As a first step before switching to NFTables, we can use iptables-legacy
if available.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: lib: skip if not below kernel version

Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.

A new function is now available to easily detect if a feature is
missing by looking at the kernel version. That's clearly not ideal and
this kind of check should be avoided as soon as possible. But sometimes,
there are no external sign that a "feature" is available or not:
internal behaviours can change without modifying the uAPI and these
selftests are verifying the internal behaviours. Sometimes, the only
(easy) way to verify if the feature is present is to run the test but
then the validation cannot determine if there is a failure with the
feature or if the feature is missing. Then it looks better to check the
kernel version instead of having tests that can never fail. In any case,
we need a solution not to have a whole selftest being marked as failed
just because one sub-test has failed.

Note that this env var car be set to 1 not to do such check and run the
linked sub-test: SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK.

This new helper is going to be used in the following commits. In order
to ease the backport of such future patches, it would be good if this
patch is backported up to the introduction of MPTCP selftests, hence the
Fixes tag below: this type of check was supposed to be done from the
beginning.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Cc: [email protected]
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'fixes-for-q-usgmii-speeds-and-autoneg'

Maxime Chevallier says:

====================
fixes for Q-USGMII speeds and autoneg

This is the second version of a small changeset for QUSGMII support,
fixing inconsistencies in reported max speed and control word parsing.

As reported here [1], there are some inconsistencies for the Q-USGMII
mode speeds and configuration. The first patch in this fixup series
makes so that we correctly report the max speed of 1Gbps for this mode.

The second patch uses a dedicated helper to decode the control word.
This is necessary as although USGMII control words are close to USXGMII,
they don't support the same speeds.

[1] : https://lore.kernel.org/netdev/[email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phylink: use a dedicated helper to parse usgmii control word

Q-USGMII is a derivative of USGMII, that uses a specific formatting for
the control word. The layout is close to the USXGMII control word, but
doesn't support speeds over 1Gbps. Use a dedicated decoding logic for
the USGMII control word, re-using USXGMII definitions but only considering
10/100/1000Mbps speeds

Fixes: 5e61fe157a27 ("net: phy: Introduce QUSGMII PHY mode")
Signed-off-by: Maxime Chevallier <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: phylink: report correct max speed for QUSGMII

Q-USGMII is the quad port version of USGMII, and supports a max speed of
1Gbps on each line. Make so that phylink_interface_max_speed() reports
this information correctly.

Fixes: ae0e4bb2a0e0 ("net: phylink: Adjust link settings based on rate matching")
Signed-off-by: Maxime Chevallier <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

btrfs: do not ASSERT() on duplicated global roots

[BUG]
Syzbot reports a reproducible ASSERT() when using rescue=usebackuproot
mount option on a corrupted fs.

The full report can be found here:
https://syzkaller.appspot.com/bug?extid=c4614eae20a166c25bf0

  BTRFS error (device loop0: state C): failed to load root csum
  assertion failed: !tmp, in fs/btrfs/disk-io.c:1103
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3664!
  invalid opcode: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 3608 Comm: syz-executor356 Not tainted 6.0.0-rc7-syzkaller-00029-g3800a713b607 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
  RIP: 0010:assertfail+0x1a/0x1c fs/btrfs/ctree.h:3663
  RSP: 0018:ffffc90003aaf250 EFLAGS: 00010246
  RAX: 0000000000000032 RBX: 0000000000000000 RCX: f21c13f886638400
  RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
  RBP: ffff888021c640a0 R08: ffffffff816bd38d R09: ffffed10173667f1
  R10: ffffed10173667f1 R11: 1ffff110173667f0 R12: dffffc0000000000
  R13: ffff8880229c21f7 R14: ffff888021c64060 R15: ffff8880226c0000
  FS:  0000555556a73300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000055a2637d7a00 CR3: 00000000709c4000 CR4: 00000000003506e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   btrfs_global_root_insert+0x1a7/0x1b0 fs/btrfs/disk-io.c:1103
   load_global_roots_objectid+0x482/0x8c0 fs/btrfs/disk-io.c:2467
   load_global_roots fs/btrfs/disk-io.c:2501 [inline]
   btrfs_read_roots fs/btrfs/disk-io.c:2528 [inline]
   init_tree_roots+0xccb/0x203c fs/btrfs/disk-io.c:2939
   open_ctree+0x1e53/0x33df fs/btrfs/disk-io.c:3574
   btrfs_fill_super+0x1c6/0x2d0 fs/btrfs/super.c:1456
   btrfs_mount_root+0x885/0x9a0 fs/btrfs/super.c:1824
   legacy_get_tree+0xea/0x180 fs/fs_context.c:610
   vfs_get_tree+0x88/0x270 fs/super.c:1530
   fc_mount fs/namespace.c:1043 [inline]
   vfs_kern_mount+0xc9/0x160 fs/namespace.c:1073
   btrfs_mount+0x3d3/0xbb0 fs/btrfs/super.c:1884

[CAUSE]
Since the introduction of global roots, we handle
csum/extent/free-space-tree roots as global roots, even if no
extent-tree-v2 feature is enabled.

So for regular csum/extent/fst roots, we load them into
fs_info::global_root_tree rb tree.

And we should not expect any conflicts in that rb tree, thus we have an
ASSERT() inside btrfs_global_root_insert().

But rescue=usebackuproot can break the assumption, as we will try to
load those trees again and again as long as we have bad roots and have
backup roots slot remaining.

So in that case we can have conflicting roots in the rb tree, and
triggering the ASSERT() crash.

[FIX]
We can safely remove that ASSERT(), as the caller will properly put the
offending root.

To make further debugging easier, also add two explicit error messages:

- Error message for conflicting global roots
- Error message when using backup roots slot

Reported-by: [email protected]
Fixes: abed4aaae4f7 ("btrfs: track the csum, extent, and free space trees in a rb tree")
CC: [email protected] # 6.1+
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

Merge tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"19 hotfixes. 14 are cc:stable and the remainder address issues which
  were introduced during this development cycle or which were considered
  inappropriate for a backport"

* tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  zswap: do not shrink if cgroup may not zswap
  page cache: fix page_cache_next/prev_miss off by one
  ocfs2: check new file size on fallocate call
  mailmap: add entry for John Keeping
  mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()
  epoll: ep_autoremove_wake_function should use list_del_init_careful
  mm/gup_test: fix ioctl fail for compat task
  nilfs2: reject devices with insufficient block count
  ocfs2: fix use-after-free when unmounting read-only filesystem
  lib/test_vmalloc.c: avoid garbage in page array
  nilfs2: fix possible out-of-bounds segment allocation in resize ioctl
  riscv/purgatory: remove PGO flags
  powerpc/purgatory: remove PGO flags
  x86/purgatory: remove PGO flags
  kexec: support purgatories with .text.hot sections
  mm/uffd: allow vma to merge as much as possible
  mm/uffd: fix vma operation where start addr cuts part of vma
  radix-tree: move declarations to header
  nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()

btrfs: can_nocow_file_extent should pass down args->strict from callers

Commit 619104ba453ad0 ("btrfs: move common NOCOW checks against a file
extent into a helper") changed our call to btrfs_cross_ref_exist() to
always pass false for the 'strict' parameter.  We're passing this down
through the stack so that we can do a full check for cross references
during swapfile activation.

With strict always false, this test fails:

  btrfs subvol create swappy
  chattr +C swappy
  fallocate -l1G swappy/swapfile
  chmod 600 swappy/swapfile
  mkswap swappy/swapfile

  btrfs subvol snap swappy swapsnap
  btrfs subvol del -C swapsnap

  btrfs fi sync /
  sync;sync;sync

  swapon swappy/swapfile

The fix is to just use args->strict, and everyone except swapfile
activation is passing false.

Fixes: 619104ba453ad0 ("btrfs: move common NOCOW checks against a file extent into a helper")
CC: [email protected] # 6.1+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: fix iomap_begin length for nocow writes

can_nocow_extent can reduce the len passed in, which needs to be
propagated to btrfs_dio_iomap_begin so that iomap does not submit
more data then is mapped.

This problems exists since the btrfs_get_blocks_direct helper was added
in commit c5794e51784a ("btrfs: Factor out write portion of
btrfs_get_blocks_direct"), but the ordered_extent splitting added in
commit b73a6fd1b1ef ("btrfs: split partial dio bios before submit")
added a WARN_ON that made a syzkaller test fail.

Reported-by: [email protected]
Fixes: c5794e51784a ("btrfs: Factor out write portion of btrfs_get_blocks_direct")
CC: [email protected] # 6.1+
Tested-by: [email protected]
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: David Sterba <[email protected]>

igb: fix nvm.ops.read() error handling

Add error handling into igb_set_eeprom() function, in case
nvm.ops.read() fails just quit with error code asap.

Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

igc: Fix possible system crash when loading module

Guarantee that when probe() is run again, PTM and PCI busmaster will be
in the same state as it was if the driver was never loaded.

Avoid an i225/i226 hardware issue that PTM requests can be made even
though PCI bus mastering is not enabled. These unexpected PTM requests
can crash some systems.

So, "force" disable PTM and busmastering before removing the driver,
so they can be re-enabled in the right order during probe(). This is
more like a workaround and should be applicable for i225 and i226, in
any platform.

Fixes: 1b5d73fb8624 ("igc: Enable PCIe PTM")
Signed-off-by: Vinicius Costa Gomes <[email protected]>
Reviewed-by: Muhammad Husaini Zulkifli <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

igc: Clean the TX buffer and TX descriptor ring

There could be a race condition during link down where interrupt
being generated and igc_clean_tx_irq() been called to perform the
TX completion. Properly clear the TX buffer/descriptor ring and
disable the TX Queue ring in igc_free_tx_resources() to avoid that.

Kernel trace:
[  108.237177] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLIFUI1.R00.4204.A00.2105270302 05/27/2021
[  108.237178] RIP: 0010:refcount_warn_saturate+0x55/0x110
[  108.242143] RSP: 0018:ffff9e7980003db0 EFLAGS: 00010286
[  108.245555] Code: 84 bc 00 00 00 c3 cc cc cc cc 85 f6 74 46 80 3d 20 8c 4d 01 00 75 ee 48 c7 c7 88 f4 03 ab c6 05 10 8c 4d 01 01 e8 0b 10 96 ff <0f> 0b c3 cc cc cc cc 80 3d fc 8b 4d 01 00 75 cb 48 c7 c7 b0 f4 03
[  108.250434]
[  108.250434] RSP: 0018:ffff9e798125f910 EFLAGS: 00010286
[  108.254358] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  108.259325]
[  108.259325] RAX: 0000000000000000 RBX: ffff8ddb935b8000 RCX: 0000000000000027
[  108.261868] RDX: ffff8de250a28800 RSI: ffff8de250a1c580 RDI: ffff8de250a1c580
[  108.265538] RDX: 0000000000000027 RSI: 0000000000000002 RDI: ffff8de250a9c588
[  108.265539] RBP: ffff8ddb935b8000 R08: ffffffffab2655a0 R09: ffff9e798125f898
[  108.267914] RBP: ffff8ddb8a5b8d80 R08: 0000005648eba354 R09: 0000000000000000
[  108.270196] R10: 0000000000000001 R11: 000000002d2d2d2d R12: ffff9e798125f948
[  108.270197] R13: ffff9e798125fa1c R14: ffff8ddb8a5b8d80 R15: 7fffffffffffffff
[  108.273001] R10: 000000002d2d2d2d R11: 000000002d2d2d2d R12: ffff8ddb8a5b8ed4
[  108.276410] FS:  00007f605851b740(0000) GS:ffff8de250a80000(0000) knlGS:0000000000000000
[  108.280597] R13: 00000000000002ac R14: 00000000ffffff99 R15: ffff8ddb92561b80
[  108.282966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  108.282967] CR2: 00007f053c039248 CR3: 0000000185850003 CR4: 0000000000f70ee0
[  108.286206] FS:  0000000000000000(0000) GS:ffff8de250a00000(0000) knlGS:0000000000000000
[  108.289701] PKRU: 55555554
[  108.289702] Call Trace:
[  108.289704]  <TASK>
[  108.293977] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  108.297562]  sock_alloc_send_pskb+0x20c/0x240
[  108.301494] CR2: 00007f053c03a168 CR3: 0000000184394002 CR4: 0000000000f70ef0
[  108.301495] PKRU: 55555554
[  108.306464]  __ip_append_data.isra.0+0x96f/0x1040
[  108.309441] Call Trace:
[  108.309443]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  108.314927]  <IRQ>
[  108.314928]  sock_wfree+0x1c7/0x1d0
[  108.318078]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  108.320276]  skb_release_head_state+0x32/0x90
[  108.324812]  ip_make_skb+0xf6/0x130
[  108.327188]  skb_release_all+0x16/0x40
[  108.330775]  ? udp_sendmsg+0x9f3/0xcb0
[  108.332626]  napi_consume_skb+0x48/0xf0
[  108.334134]  ? xfrm_lookup_route+0x23/0xb0
[  108.344285]  igc_poll+0x787/0x1620 [igc]
[  108.346659]  udp_sendmsg+0x9f3/0xcb0
[  108.360010]  ? ttwu_do_activate+0x40/0x220
[  108.365237]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  108.366744]  ? try_to_wake_up+0x289/0x5e0
[  108.376987]  ? sock_sendmsg+0x81/0x90
[  108.395698]  ? __pfx_process_timeout+0x10/0x10
[  108.395701]  sock_sendmsg+0x81/0x90
[  108.409052]  __napi_poll+0x29/0x1c0
[  108.414279]  ____sys_sendmsg+0x284/0x310
[  108.419507]  net_rx_action+0x257/0x2d0
[  108.438216]  ___sys_sendmsg+0x7c/0xc0
[  108.439723]  __do_softirq+0xc1/0x2a8
[  108.444950]  ? finish_task_switch+0xb4/0x2f0
[  108.452077]  irq_exit_rcu+0xa9/0xd0
[  108.453584]  ? __schedule+0x372/0xd00
[  108.460713]  common_interrupt+0x84/0xa0
[  108.467840]  ? clockevents_program_event+0x95/0x100
[  108.474968]  </IRQ>
[  108.482096]  ? do_nanosleep+0x88/0x130
[  108.489224]  <TASK>
[  108.489225]  asm_common_interrupt+0x26/0x40
[  108.496353]  ? __rseq_handle_notify_resume+0xa9/0x4f0
[  108.503478] RIP: 0010:cpu_idle_poll+0x2c/0x100
[  108.510607]  __sys_sendmsg+0x5d/0xb0
[  108.518687] Code: 05 e1 d9 c8 00 65 8b 15 de 64 85 55 85 c0 7f 57 e8 b9 ef ff ff fb 65 48 8b 1c 25 00 cc 02 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 77 63 cd 00 85 c0 75 ed e8 ce ec ff ff
[  108.525817]  do_syscall_64+0x44/0xa0
[  108.531563] RSP: 0018:ffffffffab203e70 EFLAGS: 00000202
[  108.538693]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  108.546775]
[  108.546777] RIP: 0033:0x7f605862b7f7
[  108.549495] RAX: 0000000000000001 RBX: ffffffffab20c940 RCX: 000000000000003b
[  108.551955] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[  108.554068] RDX: 4000000000000000 RSI: 000000002da97f6a RDI: 00000000002b8ff4
[  108.559816] RSP: 002b:00007ffc99264058 EFLAGS: 00000246
[  108.564178] RBP: 0000000000000000 R08: 00000000002b8ff4 R09: ffff8ddb01554c80
[  108.571302]  ORIG_RAX: 000000000000002e
[  108.571303] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f605862b7f7
[  108.574023] R10: 000000000000015b R11: 000000000000000f R12: ffffffffab20c940
[  108.574024] R13: 0000000000000000 R14: ffff8de26fbeef40 R15: ffffffffab20c940
[  108.578727] RDX: 0000000000000000 RSI: 00007ffc992640a0 RDI: 0000000000000003
[  108.578728] RBP: 00007ffc99264110 R08: 0000000000000000 R09: 175f48ad1c3a9c00
[  108.581187]  do_idle+0x62/0x230
[  108.585890] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc992642d8
[  108.585891] R13: 00005577814ab2ba R14: 00005577814addf0 R15: 00007f605876d000
[  108.587920]  cpu_startup_entry+0x1d/0x20
[  108.591422]  </TASK>
[  108.596127]  rest_init+0xc5/0xd0
[  108.600490] ---[ end trace 0000000000000000 ]---

Test Setup:

DUT:
- Change mac address on DUT Side. Ensure NIC not having same MAC Address
- Running udp_tai on DUT side. Let udp_tai running throughout the test

Example:
./udp_tai -i enp170s0 -P 100000 -p 90 -c 1 -t 0 -u 30004

Host:
- Perform link up/down every 5 second.

Result:
Kernel panic will happen on DUT Side.

Fixes: 13b5b7fd6a4a ("igc: Add support for Tx/Rx rings")
Signed-off-by: Muhammad Husaini Zulkifli <[email protected]>
Tested-by: Naama Meir <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

clk: mediatek: mt8365: Fix index issue

Before the patch [1], the clock probe was done directly in the
clk-mt8365 driver. In this probe function, the array which stores the
data clocks is sized using the higher defined numbers (*_NR_CLOCK) in
the clock lists [2]. Currently, with the patch [1], the specific
clk-mt8365 probe function is replaced by the mtk generic one [3], which
size the clock data array by adding all the clock descriptor array size
provided by the clk-mt8365 driver.

Actually, all clock indexes come from the header file [2], that mean, if
there are more clock (then more index) in the header file [2] than the
number of clock declared in the clock descriptor arrays (which is the
case currently), the clock data array will be undersized and then the
generic probe function will overflow when it will try to write in
"clk_data[CLK_INDEX]". Actually, instead of crashing at boot, the probe
function returns an error in the log which looks like:
"of_clk_hw_onecell_get: invalid index 135", then this clock isn't
enabled.

Solve this issue by adding in the driver the missing clocks declared in
the header clock file [2].

[1]: Commit ffe91cb28f6a ("clk: mediatek: mt8365: Convert to
mtk_clk_simple_{probe,remove}()")
[2]: include/dt-bindings/clock/mediatek,mt8365-clk.h
[3]: drivers/clk/mediatek/clk-mtk.c

Fixes: ffe91cb28f6a ("clk: mediatek: mt8365: Convert to mtk_clk_simple_{probe,remove}()")
Signed-off-by: Alexandre Mergnat <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Markus Schneider-Pargmann <[email protected]>
Reviewed-by: Markus Schneider-Pargmann <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

zswap: do not shrink if cgroup may not zswap

Before storing a page, zswap first checks if the number of stored pages
exceeds the limit specified by memory.zswap.max, for each cgroup in the
hierarchy. If this limit is reached or exceeded, then zswap shrinking is
triggered and short-circuits the store attempt.

However, since the zswap's LRU is not memcg-aware, this can create the
following pathological behavior: the cgroup whose zswap limit is 0 will
evict pages from other cgroups continually, without lowering its own zswap
usage. This means the shrinking will continue until the need for swap
ceases or the pool becomes empty.

As a result of this, we observe a disproportionate amount of zswap
writeback and a perpetually small zswap pool in our experiments, even
though the pool limit is never hit.

More generally, a cgroup might unnecessarily evict pages from other
cgroups before we drive the memcg back below its limit.

This patch fixes the issue by rejecting zswap store attempt without
shrinking the pool when obj_cgroup_may_zswap() returns false.

[[email protected]: fix return of unintialized value]
[[email protected]: s/ENOSPC/ENOMEM/]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f4840ccfca25 ("zswap: memcg accounting")
Signed-off-by: Nhat Pham <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

page cache: fix page_cache_next/prev_miss off by one

Ackerley Tng reported an issue with hugetlbfs fallocate here[1].  The
issue showed up after the conversion of hugetlb page cache lookup code to
use page_cache_next_miss.  Code in hugetlb fallocate, userfaultfd and GUP
is now using page_cache_next_miss to determine if a page is present the
page cache.  The following statement is used.

present = page_cache_next_miss(mapping, index, 1) != index;

There are two issues with page_cache_next_miss when used in this way.
1) If the passed value for index is equal to the 'wrap-around' value,
   the same index will always be returned.  This wrap-around value is 0,
   so 0 will be returned even if page is present at index 0.
2) If there is no gap in the range passed, the last index in the range
   will be returned.  When passed a range of 1 as above, the passed
   index value will be returned even if the page is present.
The end result is the statement above will NEVER indicate a page is
present in the cache, even if it is.

As noted by Ackerley in [1], users can see this by hugetlb fallocate
incorrectly returning EEXIST if pages are already present in the file.  In
addition, hugetlb pages will not be included in core dumps if they need to
be brought in via GUP.  userfaultfd UFFDIO_COPY also uses this code and
will not notice pages already present in the cache.  It may try to
allocate a new page and potentially return ENOMEM as opposed to EEXIST.

Both page_cache_next_miss and page_cache_prev_miss have similar issues.
Fix by:
- Check for index equal to 'wrap-around' value and do not exit early.
- If no gap is found in range, return index outside range.
- Update function description to say 'wrap-around' value could be
  returned if passed as index.

[1] https://lore.kernel.org/linux-mm/cover.1683069252 [email protected]/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
Signed-off-by: Mike Kravetz <[email protected]>
Reported-by: Ackerley Tng <[email protected]>
Reviewed-by: Ackerley Tng <[email protected]>
Tested-by: Ackerley Tng <[email protected]>
Cc: Erdem Aktas <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: Vishal Annapurve <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ocfs2: check new file size on fallocate call

When changing a file size with fallocate() the new size isn't being
checked. In particular, the FSIZE ulimit isn't being checked, which makes
fstest generic/228 fail. Simply adding a call to inode_newsize_ok() fixes
this issue.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Luís Henriques <[email protected]>
Reviewed-by: Mark Fasheh <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mailmap: add entry for John Keeping

Map my corporate address to my personal one, as I am leaving the
company.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Keeping <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()

If 'aggr_interval' is smaller than 'sample_interval', max_nr_accesses in
damon_nr_accesses_to_accesses_bp() becomes zero which leads to divide
error, let's validate the values of them in damon_set_attrs() to fix it,
which similar to others attrs check.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2f5bef5a590b ("mm/damon/core: update monitoring results for new monitoring attributes")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=841a46899768ec7bec67
Link: https://lore.kernel.org/damon/[email protected]/
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Kefeng Wang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

epoll: ep_autoremove_wake_function should use list_del_init_careful

autoremove_wake_function uses list_del_init_careful, so should epoll's
more aggressive variant. It only doesn't because it was copied from an
older wait.c rather than the most recent.

[[email protected]: add comment]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: a16ceb139610 ("epoll: autoremove wakers even more aggressively")
Signed-off-by: Ben Segall <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/gup_test: fix ioctl fail for compat task

When tools/testing/selftests/mm/gup_test.c is compiled as 32bit, then run
on arm64 kernel, it reports "ioctl: Inappropriate ioctl for device".

Fix it by filling compat_ioctl in gup_test_fops

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Haibo Li <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: AngeloGioacchino Del Regno <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: reject devices with insufficient block count

The current sanity check for nilfs2 geometry information lacks checks for
the number of segments stored in superblocks, so even for device images
that have been destructively truncated or have an unusually high number of
segments, the mount operation may succeed.

This causes out-of-bounds block I/O on file system block reads or log
writes to the segments, the latter in particular causing
"a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to
hang.

Fix this issue by checking the number of segments stored in the superblock
and avoiding mounting devices that can cause out-of-bounds accesses. To
eliminate the possibility of overflow when calculating the number of
blocks required for the device from the number of segments, this also adds
a helper function to calculate the upper bound on the number of segments
and inserts a check using it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?extid=7d50f1e54a12ba3aeae2
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ocfs2: fix use-after-free when unmounting read-only filesystem

It's trivial to trigger a use-after-free bug in the ocfs2 quotas code using
fstest generic/452. After a read-only remount, quotas are suspended and
ocfs2_mem_dqinfo is freed through ->ocfs2_local_free_info(). When unmounting
the filesystem, an UAF access to the oinfo will eventually cause a crash.

BUG: KASAN: slab-use-after-free in timer_delete+0x54/0xc0
Read of size 8 at addr ffff8880389a8208 by task umount/669
...
Call Trace:
<TASK>
...
timer_delete+0x54/0xc0
try_to_grab_pending+0x31/0x230
__cancel_work_timer+0x6c/0x270
ocfs2_disable_quotas.isra.0+0x3e/0xf0 [ocfs2]
ocfs2_dismount_volume+0xdd/0x450 [ocfs2]
generic_shutdown_super+0xaa/0x280
kill_block_super+0x46/0x70
deactivate_locked_super+0x4d/0xb0
cleanup_mnt+0x135/0x1f0
...
</TASK>

Allocated by task 632:
kasan_save_stack+0x1c/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x8b/0x90
ocfs2_local_read_info+0xe3/0x9a0 [ocfs2]
dquot_load_quota_sb+0x34b/0x680
dquot_load_quota_inode+0xfe/0x1a0
ocfs2_enable_quotas+0x190/0x2f0 [ocfs2]
ocfs2_fill_super+0x14ef/0x2120 [ocfs2]
mount_bdev+0x1be/0x200
legacy_get_tree+0x6c/0xb0
vfs_get_tree+0x3e/0x110
path_mount+0xa90/0xe10
__x64_sys_mount+0x16f/0x1a0
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 650:
kasan_save_stack+0x1c/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x50
__kasan_slab_free+0xf9/0x150
__kmem_cache_free+0x89/0x180
ocfs2_local_free_info+0x2ba/0x3f0 [ocfs2]
dquot_disable+0x35f/0xa70
ocfs2_susp_quotas.isra.0+0x159/0x1a0 [ocfs2]
ocfs2_remount+0x150/0x580 [ocfs2]
reconfigure_super+0x1a5/0x3a0
path_mount+0xc8a/0xe10
__x64_sys_mount+0x16f/0x1a0
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Luís Henriques <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Tested-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

lib/test_vmalloc.c: avoid garbage in page array

It turns out that alloc_pages_bulk_array() does not treat the page_array
parameter as an output parameter, but rather reads the array and skips any
entries that have already been allocated.

This is somewhat unexpected and breaks this test, as we allocate the pages
array uninitialised on the assumption it will be overwritten.

As a result, the test was referencing uninitialised data and causing the
PFN to not be valid and thus a WARN_ON() followed by a null pointer deref
and panic.

In addition, this is an array of pointers not of struct page objects, so we
need only allocate an array with elements of pointer size.

We solve both problems by simply using kcalloc() and referencing
sizeof(struct page *) rather than sizeof(struct page).

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 869cb29a61a1 ("lib/test_vmalloc.c: add vm_map_ram()/vm_unmap_ram() test case")
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix possible out-of-bounds segment allocation in resize ioctl

Syzbot reports that in its stress test for resize ioctl, the log writing
function nilfs_segctor_do_construct hits a WARN_ON in
nilfs_segctor_truncate_segments().

It turned out that there is a problem with the current implementation of
the resize ioctl, which changes the writable range on the device (the
range of allocatable segments) at the end of the resize process.

This order is necessary for file system expansion to avoid corrupting the
superblock at trailing edge. However, in the case of a file system
shrink, if log writes occur after truncating out-of-bounds trailing
segments and before the resize is complete, segments may be allocated from
the truncated space.

The userspace resize tool was fine as it limits the range of allocatable
segments before performing the resize, but it can run into this issue if
the resize ioctl is called alone.

Fix this issue by changing nilfs_sufile_resize() to update the range of
allocatable segments immediately after successful truncation of segment
space in case of file system shrink.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4e33f9eab07e ("nilfs2: implement resize ioctl")
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Closes: https://lkml.kernel.org/r/[email protected]
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

riscv/purgatory: remove PGO flags

If profile-guided optimization is enabled, the purgatory ends up with
multiple .text sections. This is not supported by kexec and crashes the
system.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
Signed-off-by: Ricardo Ribalda <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Cc: <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Philipp Rudo <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Rix <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

powerpc/purgatory: remove PGO flags

If profile-guided optimization is enabled, the purgatory ends up with
multiple .text sections. This is not supported by kexec and crashes the
system.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
Signed-off-by: Ricardo Ribalda <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Philipp Rudo <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Rix <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

x86/purgatory: remove PGO flags

If profile-guided optimization is enabled, the purgatory ends up with
multiple .text sections. This is not supported by kexec and crashes the
system.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
Signed-off-by: Ricardo Ribalda <[email protected]>
Cc: <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Philipp Rudo <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Rix <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

kexec: support purgatories with .text.hot sections

Patch series "kexec: Fix kexec_file_load for llvm16 with PGO", v7.

When upreving llvm I realised that kexec stopped working on my test
platform.

The reason seems to be that due to PGO there are multiple .text sections
on the purgatory, and kexec does not supports that.

This patch (of 4):

Clang16 links the purgatory text in two sections when PGO is in use:

  [ 1] .text             PROGBITS         0000000000000000  00000040
       00000000000011a1  0000000000000000  AX       0     0     16
  [ 2] .rela.text        RELA             0000000000000000  00003498
       0000000000000648  0000000000000018   I      24     1     8
  ...
  [17] .text.hot.        PROGBITS         0000000000000000  00003220
       000000000000020b  0000000000000000  AX       0     0     1
  [18] .rela.text.hot.   RELA             0000000000000000  00004428
       0000000000000078  0000000000000018   I      24    17     8

And both of them have their range [sh_addr ... sh_addr+sh_size] on the
area pointed by `e_entry`.

This causes that image->start is calculated twice, once for .text and
another time for .text.hot. The second calculation leaves image->start
in a random location.

Because of this, the system crashes immediately after:

kexec_core: Starting new kernel

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
Signed-off-by: Ricardo Ribalda <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Philipp Rudo <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/uffd: allow vma to merge as much as possible

We used to not pass in the pgoff correctly when register/unregister uffd
regions, it caused incorrect behavior on vma merging and can cause
mergeable vmas being separate after ioctls return.

For example, when we have:

  vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)

Then someone unregisters uffd on range (5-9), it should logically become:

  vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)

But with current code we'll have:

  vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)

This patch allows such merge to happen correctly before ioctl returns.

This behavior seems to have existed since the 1st day of uffd.  Since
pgoff for vma_merge() is only used to identify the possibility of vma
merging, meanwhile here what we did was always passing in a pgoff smaller
than what we should, so there should have no other side effect besides not
merging it.  Let's still tentatively copy stable for this, even though I
don't see anything will go wrong besides vma being split (which is mostly
not user visible).

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Lorenzo Stoakes <[email protected]>
Acked-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/uffd: fix vma operation where start addr cuts part of vma

Patch series "mm/uffd: Fix vma merge/split", v2.

This series contains two patches that fix vma merge/split for userfaultfd
on two separate issues.

Patch 1 fixes a regression since 6.1+ due to something we overlooked when
converting to maple tree apis.  The plan is we use patch 1 to replace the
commit "2f628010799e (mm: userfaultfd: avoid passing an invalid range to
vma_merge())" in mm-hostfixes-unstable tree if possible, so as to bring
uffd vma operations back aligned with the rest code again.

Patch 2 fixes a long standing issue that vma can be left unmerged even if
we can for either uffd register or unregister.

Many thanks to Lorenzo on either noticing this issue from the assert
movement patch, looking at this problem, and also provided a reproducer on
the unmerged vma issue [1].

[1] https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e

This patch (of 2):

It seems vma merging with uffd paths is broken with either
register/unregister, where right now we can feed wrong parameters to
vma_merge() and it's found by recent patch which moved asserts upwards in
vma_merge() by Lorenzo Stoakes:

https://lore.kernel.org/all/[email protected]/

It's possible that "start" is contained within vma but not clamped to its
start.  We need to convert this into either "cannot merge" case or "can
merge" case 4 which permits subdivision of prev by assigning vma to prev.
As we loop, each subsequent VMA will be clamped to the start.

This patch will eliminate the report and make sure vma_merge() calls will
become legal again.

One thing to mention is that the "Fixes: 29417d292bd0" below is there only
to help explain where the warning can start to trigger, the real commit to
fix should be 69dbe6daf104.  Commit 29417d292bd0 helps us to identify the
issue, but unfortunately we may want to keep it in Fixes too just to ease
kernel backporters for easier tracking.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Mark Rutland <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Cc: Lorenzo Stoakes <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

radix-tree: move declarations to header

The xarray.c file contains the only call to radix_tree_node_rcu_free(),
and it comes with its own extern declaration for it. This means the
function definition causes a missing-prototype warning:

lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes]

Instead, move the declaration for this function to a new header that can
be included by both, and do the same for the radix_tree_node_cachep
variable that has the same underlying problem but does not cause a warning
with gcc.

[[email protected]: fix building radix tree test suite]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Peng Zhang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>