Git Repo - linux.git/log

]> Git Repo - linux.git/log

projects / linux.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Jan 2025 03:49:26 +0000 (19:49 -0800)]

Merge tag 'drm-fixes-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Final(?) set of fixes for 6.13, I think the holidays finally caught up
  with everyone, the misc changes are 2 weeks worth, otherwise amdgpu
  and xe are most of it. The largest pieces is a new test so I'm not too
  worried about that.

  kunit:
   - Fix W=1 build for kunit tests

  bridge:
   - Handle YCbCr420 better in bridge code, with tests
   - itee-it6263 error handling fix

  amdgpu:
   - SMU 13 fix
   - DP MST fixes
   - DCN 3.5 fix
   - PSR fixes
   - eDP fix
   - VRR fix
   - Enforce isolation fixes
   - GFX 12 fix
   - PSP 14.x fix

  xe:
   - Add steering info support for GuC register lists
   - Add means to wait for reset and synchronous reset
   - Make changing ccs_mode a synchronous action
   - Add missing mux registers
   - Mark ComputeCS read mode as UC on iGPU, unblocking ULLS on iGPU

  i915:
   - Relax clear color alignment to 64 bytes [fb]

  v3d:
   - Fix warn when unloading v3d

  nouveau:
   - Fix cross-device fence handling in nouveau
   - Fix backlight regression for macbooks 5,1

  vmwgfx:
   - Fix BO reservation handling in vmwgfx"

* tag 'drm-fixes-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (33 commits)
  drm/xe: Mark ComputeCS read mode as UC on iGPU
  drm/xe/oa: Add missing VISACTL mux registers
  drm/xe: make change ccs_mode a synchronous action
  drm/xe: introduce xe_gt_reset and xe_gt_wait_for_reset
  drm/xe/guc: Adding steering info support for GuC register lists
  drm/bridge: ite-it6263: Prevent error pointer dereference in probe()
  drm/v3d: Ensure job pointer is set to NULL after job completion
  drm/vmwgfx: Add new keep_resv BO param
  drm/vmwgfx: Remove busy_places
  drm/vmwgfx: Unreserve BO on error
  drm/amdgpu: fix fw attestation for MP0_14_0_{2/3}
  drm/amdgpu: always sync the GFX pipe on ctx switch
  drm/amdgpu: disable gfxoff with the compute workload on gfx12
  drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation
  drm/i915/fb: Relax clear color alignment to 64 bytes
  drm/amd/display: Disable replay and psr while VRR is enabled
  drm/amd/display: Fix PSR-SU not support but still call the amdgpu_dm_psr_enable
  nouveau/fence: handle cross device fences properly
  drm/tests: connector: Add ycbcr_420_allowed tests
  drm/connector: hdmi: Validate supported_formats matches ycbcr_420_allowed
  ...

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Jan 2025 01:02:28 +0000 (17:02 -0800)]

Merge tag 'io_uring-6.13-20250116' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:
"One fix for the error handling in buffer cloning, and one fix for the
  ring resizing.

  Two minor followups for the latter as well.

  Both of these issues only affect 6.13, so not marked for stable"

* tag 'io_uring-6.13-20250116' of git://git.kernel.dk/linux:
  io_uring/register: cache old SQ/CQ head reading for copies
  io_uring/register: document io_register_resize_rings() shared mem usage
  io_uring/register: use stable SQ/CQ ring data during resize
  io_uring/rsrc: fixup io_clone_buffers() error handling

commit | commitdiff | tree

Dave Airlie [Thu, 16 Jan 2025 22:54:06 +0000 (08:54 +1000)]

Merge tag 'drm-xe-fixes-2025-01-16' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Add steering info support for GuC register lists (Jesus Narvaez)
- Add means to wait for reset and synchronous reset (Maciej)
- Make changing ccs_mode a synchronous action (Maciej)
- Add missing mux registers (Ashutosh)
- Mark ComputeCS read mode as UC on iGPU, unblocking ULLS on iGPU (Matt Brost)

Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Hellstrom <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/Z4ll3F1anLEwCvrf@fedora

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Jan 2025 00:19:05 +0000 (16:19 -0800)]

Merge tag 'trace-v6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Fix a regression in the irqsoff and wakeup latency tracing

   The function graph tracer infrastructure has become generic so that
   fprobes and BPF can be based on it. As it use to only handle function
   graph tracing, it would always calculate the time the function
   entered so that it could then calculate the time it exits and give
   the length of time the function executed for. But this is not needed
   for the other users (fprobes and BPF) and reading the clock adds a
   non-negligible overhead, so the calculation was moved into the
   function graph tracer logic.

   But the irqsoff and wakeup latency tracers, when the "display-graph"
   option was set, would use the function graph tracer to calculate the
   times of functions during the latency. The movement of the calltime
   calculation made the value zero for these tracers, and the output no
   longer showed the length of time of each tracer, but instead the
   absolute timestamp of when the function returned (rettime - calltime
   where calltime is now zero).

   Have the irqsoff and wakeup latency tracers also do the calltime
   calculation as the function graph tracer does and report the proper
   length of the function timings.

- Update the tracing display to reflect the new preempt lazy model

   When the system is configured with preempt lazy, the output of the
   trace data would state "unknown" for the current preemption model.
   Because the lazy preemption model was just added, make it known to
   the tracing subsystem too. This is just a one line change.

- Document multiple function graph having slightly different timings

   Now that function graph tracer infrastructure is separate, this also
   allows the function graph tracer to run in multiple instances (it
   wasn't able to do so before). If two instances ran the function graph
   tracer and traced the same functions, the timings for them will be
   slightly different because each does their own timings and collects
   the timestamps differently. Document this to not have people be
   confused by it.

* tag 'trace-v6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Document that multiple function_graph tracing may have different times
  tracing: Print lazy preemption model
  tracing: Fix irqsoff and wakeup latency tracers when using function graph

commit | commitdiff | tree

Dave Airlie [Thu, 16 Jan 2025 22:48:11 +0000 (08:48 +1000)]

Merge tag 'drm-intel-fixes-2025-01-15' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Relax clear color alignment to 64 bytes [fb] (Ville Syrjälä)

Signed-off-by: Dave Airlie <[email protected]>
From: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/Z4fdIVf68qsqIpiN@linux

commit | commitdiff | tree

Matthew Brost [Tue, 14 Jan 2025 00:25:07 +0000 (16:25 -0800)]

drm/xe: Mark ComputeCS read mode as UC on iGPU

RING_CMD_CCTL read index should be UC on iGPU parts due to L3 caching
structure. Having this as WB blocks ULLS from being enabled. Change to
UC to unblock ULLS on iGPU.

v2:
- Drop internal communications commnet, bspec is updated

Cc: Balasubramani Vivekanandan <[email protected]>
Cc: Michal Mrozek <[email protected]>
Cc: Paulo Zanoni <[email protected]>
Cc: José Roberto de Souza <[email protected]>
Cc: [email protected]
Fixes: 328e089bfb37 ("drm/xe: Leverage ComputeCS read L3 caching")
Signed-off-by: Matthew Brost <[email protected]>
Acked-by: Michal Mrozek <[email protected]>
Reviewed-by: Stuart Summers <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 758debf35b9cda5450e40996991a6e4b222899bd)
Signed-off-by: Thomas Hellström <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Thu, 16 Jan 2025 17:09:44 +0000 (09:09 -0800)]

Merge tag 'net-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Notably this includes fixes for a few regressions spotted very
  recently. No known outstanding ones.

  Current release - regressions:

   - core: avoid CFI problems with sock priv helpers

   - xsk: bring back busy polling support

   - netpoll: ensure skb_pool list is always initialized

  Current release - new code bugs:

   - core: make page_pool_ref_netmem work with net iovs

   - ipv4: route: fix drop reason being overridden in
     ip_route_input_slow

   - udp: make rehash4 independent in udp_lib_rehash()

  Previous releases - regressions:

   - bpf: fix bpf_sk_select_reuseport() memory leak

   - openvswitch: fix lockup on tx to unregistering netdev with carrier

   - mptcp: be sure to send ack when mptcp-level window re-opens

   - eth:
      - bnxt: always recalculate features after XDP clearing, fix
        null-deref
      - mlx5: fix sub-function add port error handling
      - fec: handle page_pool_dev_alloc_pages error

  Previous releases - always broken:

   - vsock: some fixes due to transport de-assignment

   - eth:
      - ice: fix E825 initialization
      - mlx5e: fix inversion dependency warning while enabling IPsec
        tunnel
      - gtp: destroy device along with udp socket's netns dismantle.
      - xilinx: axienet: Fix IRQ coalescing packet count overflow"

* tag 'net-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
  netdev: avoid CFI problems with sock priv helpers
  net/mlx5e: Always start IPsec sequence number from 1
  net/mlx5e: Rely on reqid in IPsec tunnel mode
  net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel
  net/mlx5: Clear port select structure when fail to create
  net/mlx5: SF, Fix add port error handling
  net/mlx5: Fix a lockdep warning as part of the write combining test
  net/mlx5: Fix RDMA TX steering prio
  net: make page_pool_ref_netmem work with net iovs
  net: ethernet: xgbe: re-add aneg to supported features in PHY quirks
  net: pcs: xpcs: actively unset DW_VR_MII_DIG_CTRL1_2G5_EN for 1G SGMII
  net: pcs: xpcs: fix DW_VR_MII_DIG_CTRL1_2G5_EN bit being set for 1G SGMII w/o inband
  selftests: net: Adapt ethtool mq tests to fix in qdisc graft
  net: fec: handle page_pool_dev_alloc_pages error
  net: netpoll: ensure skb_pool list is always initialized
  net: xilinx: axienet: Fix IRQ coalescing packet count overflow
  nfp: bpf: prevent integer overflow in nfp_bpf_event_output()
  selftests: mptcp: avoid spurious errors on disconnect
  mptcp: fix spurious wake-up on under memory pressure
  mptcp: be sure to send ack when mptcp-level window re-opens
  ...

commit | commitdiff | tree

Linus Torvalds [Thu, 16 Jan 2025 17:04:10 +0000 (09:04 -0800)]

Merge tag 'pm-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"Update the documentation of cpuidle governors that does not match the
  code any more after previous functional changes (Rafael Wysocki) and
  fix up the cpufreq Kconfig file broken inadvertently by a previous
  update (Viresh Kumar)"

* tag 'pm-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Move endif to the end of Kconfig file
  cpuidle: teo: Update documentation after previous changes
  cpuidle: menu: Update documentation after previous changes

commit | commitdiff | tree

Linus Torvalds [Thu, 16 Jan 2025 17:02:10 +0000 (09:02 -0800)]

Merge tag 'acpi-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"Prevent acpi_video_device_EDID() from returning a pointer to a memory
  region that should not be passed to kfree() which causes one of its
  users to crash randomly on attempts to free it (Chris Bainbridge)"

* tag 'acpi-6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: video: Fix random crashes due to bad kfree()

commit | commitdiff | tree

Linus Torvalds [Thu, 16 Jan 2025 16:54:33 +0000 (08:54 -0800)]

Merge tag 'for-6.13-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:

- handle d_path() errors when canonicalizing device mapper paths during
device scan

* tag 'for-6.13-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: add the missing error handling inside get_canonical_dev_path

commit | commitdiff | tree

Rafael J. Wysocki [Thu, 16 Jan 2025 14:36:41 +0000 (15:36 +0100)]

Merge branch 'pm-cpufreq'

Merge a cpufreq fix for 6.13:

- Fix cpufreq Kconfig breakage after previous changes (Viresh Kumar).

* pm-cpufreq:
cpufreq: Move endif to the end of Kconfig file

commit | commitdiff | tree

Jakub Kicinski [Wed, 15 Jan 2025 16:14:36 +0000 (08:14 -0800)]

netdev: avoid CFI problems with sock priv helpers

Li Li reports that casting away callback type may cause issues
for CFI. Let's generate a small wrapper for each callback,
to make sure compiler sees the anticipated types.

Reported-by: Li Li <[email protected]>
Link: https://lore.kernel.org/CANBPYPjQVqmzZ4J=rVQX87a9iuwmaetULwbK_5_3YWk2eGzkaA@mail.gmail.com
Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Mina Almasry <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Paolo Abeni [Thu, 16 Jan 2025 11:45:50 +0000 (12:45 +0100)]

Merge branch 'mlx5-misc-fixes-2025-01-15'

Tariq Toukan says:

====================
mlx5 misc fixes 2025-01-15

This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Leon Romanovsky [Wed, 15 Jan 2025 11:39:10 +0000 (13:39 +0200)]

net/mlx5e: Always start IPsec sequence number from 1

According to RFC4303, section "3.3.3. Sequence Number Generation",
the first packet sent using a given SA will contain a sequence
number of 1.

This is applicable to both ESN and non-ESN mode, which was not covered
in commit mentioned in Fixes line.

Fixes: 3d42c8cc67a8 ("net/mlx5e: Ensure that IPsec sequence packet number starts from 1")
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Leon Romanovsky [Wed, 15 Jan 2025 11:39:09 +0000 (13:39 +0200)]

net/mlx5e: Rely on reqid in IPsec tunnel mode

All packet offloads SAs have reqid in it to make sure they have
corresponding policy. While it is not strictly needed for transparent
mode, it is extremely important in tunnel mode. In that mode, policy and
SAs have different match criteria.

Policy catches the whole subnet addresses, and SA catches the tunnel gateways
addresses. The source address of such tunnel is not known during egress packet
traversal in flow steering as it is added only after successful encryption.

As reqid is required for packet offload and it is unique for every SA,
we can safely rely on it only.

The output below shows the configured egress policy and SA by strongswan:

[leonro@vm ~]$ sudo ip x s
src 192.169.101.2 dst 192.169.101.1
        proto esp spi 0xc88b7652 reqid 1 mode tunnel
        replay-window 0 flag af-unspec esn
        aead rfc4106(gcm(aes)) 0xe406a01083986e14d116488549094710e9c57bc6 128
        anti-replay esn context:
         seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
         replay_window 1, bitmap-length 1
         00000000
        crypto offload parameters: dev eth2 dir out mode packet

[leonro@064 ~]$ sudo ip x p
src 192.170.0.0/16 dst 192.170.0.0/16
        dir out priority 383615 ptype main
        tmpl src 192.169.101.2 dst 192.169.101.1
                proto esp spi 0xc88b7652 reqid 1 mode tunnel
        crypto offload parameters: dev eth2 mode packet

Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes")
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Leon Romanovsky [Wed, 15 Jan 2025 11:39:08 +0000 (13:39 +0200)]

net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel

Attempt to enable IPsec packet offload in tunnel mode in debug kernel
generates the following kernel panic, which is happening due to two
issues:
1. In SA add section, the should be _bh() variant when marking SA mode.
2. There is not needed flush_workqueue in SA delete routine. It is not
needed as at this stage as it is removed from SADB and the running work
will be canceled later in SA free.

=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.12.0+ #4 Not tainted
-----------------------------------------------------
charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]

and this task is already holding:
ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
which would create a new lock dependency:
  (&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3}

but this new dependency connects a SOFTIRQ-irq-safe lock:
  (&x->lock){+.-.}-{3:3}

... which became SOFTIRQ-irq-safe at:
   lock_acquire+0x1be/0x520
   _raw_spin_lock_bh+0x34/0x40
   xfrm_timer_handler+0x91/0xd70
   __hrtimer_run_queues+0x1dd/0xa60
   hrtimer_run_softirq+0x146/0x2e0
   handle_softirqs+0x266/0x860
   irq_exit_rcu+0x115/0x1a0
   sysvec_apic_timer_interrupt+0x6e/0x90
   asm_sysvec_apic_timer_interrupt+0x16/0x20
   default_idle+0x13/0x20
   default_idle_call+0x67/0xa0
   do_idle+0x2da/0x320
   cpu_startup_entry+0x50/0x60
   start_secondary+0x213/0x2a0
   common_startup_64+0x129/0x138

to a SOFTIRQ-irq-unsafe lock:
  (&xa->xa_lock#24){+.+.}-{3:3}

... which became SOFTIRQ-irq-unsafe at:
...
   lock_acquire+0x1be/0x520
   _raw_spin_lock+0x2c/0x40
   xa_set_mark+0x70/0x110
   mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
   xfrm_dev_state_add+0x3bb/0xd70
   xfrm_add_sa+0x2451/0x4a90
   xfrm_user_rcv_msg+0x493/0x880
   netlink_rcv_skb+0x12e/0x380
   xfrm_netlink_rcv+0x6d/0x90
   netlink_unicast+0x42f/0x740
   netlink_sendmsg+0x745/0xbe0
   __sock_sendmsg+0xc5/0x190
   __sys_sendto+0x1fe/0x2c0
   __x64_sys_sendto+0xdc/0x1b0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

other info that might help us debug this:

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&xa->xa_lock#24);
                                local_irq_disable();
                                lock(&x->lock);
                                lock(&xa->xa_lock#24);
   <Interrupt>
     lock(&x->lock);

  *** DEADLOCK ***

2 locks held by charon/1337:
  #0: ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90
  #1: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30

the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&x->lock){+.-.}-{3:3} ops: 29 {
    HARDIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     xfrm_alloc_spi+0xc0/0xe60
                     xfrm_alloc_userspi+0x5f6/0xbc0
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    IN-SOFTIRQ-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     xfrm_timer_handler+0x91/0xd70
                     __hrtimer_run_queues+0x1dd/0xa60
                     hrtimer_run_softirq+0x146/0x2e0
                     handle_softirqs+0x266/0x860
                     irq_exit_rcu+0x115/0x1a0
                     sysvec_apic_timer_interrupt+0x6e/0x90
                     asm_sysvec_apic_timer_interrupt+0x16/0x20
                     default_idle+0x13/0x20
                     default_idle_call+0x67/0xa0
                     do_idle+0x2da/0x320
                     cpu_startup_entry+0x50/0x60
                     start_secondary+0x213/0x2a0
                     common_startup_64+0x129/0x138
    INITIAL USE at:
                    lock_acquire+0x1be/0x520
                    _raw_spin_lock_bh+0x34/0x40
                    xfrm_alloc_spi+0xc0/0xe60
                    xfrm_alloc_userspi+0x5f6/0xbc0
                    xfrm_user_rcv_msg+0x493/0x880
                    netlink_rcv_skb+0x12e/0x380
                    xfrm_netlink_rcv+0x6d/0x90
                    netlink_unicast+0x42f/0x740
                    netlink_sendmsg+0x745/0xbe0
                    __sock_sendmsg+0xc5/0x190
                    __sys_sendto+0x1fe/0x2c0
                    __x64_sys_sendto+0xdc/0x1b0
                    do_syscall_64+0x6d/0x140
                    entry_SYSCALL_64_after_hwframe+0x4b/0x53
  }
  ... key      at: [<ffffffff87f9cd20>] __key.18+0x0/0x40

the dependencies between the lock to be acquired
  and SOFTIRQ-irq-unsafe lock:
-> (&xa->xa_lock#24){+.+.}-{3:3} ops: 9 {
    HARDIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock_bh+0x34/0x40
                     mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
                     xfrm_dev_state_add+0x3bb/0xd70
                     xfrm_add_sa+0x2451/0x4a90
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    SOFTIRQ-ON-W at:
                     lock_acquire+0x1be/0x520
                     _raw_spin_lock+0x2c/0x40
                     xa_set_mark+0x70/0x110
                     mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core]
                     xfrm_dev_state_add+0x3bb/0xd70
                     xfrm_add_sa+0x2451/0x4a90
                     xfrm_user_rcv_msg+0x493/0x880
                     netlink_rcv_skb+0x12e/0x380
                     xfrm_netlink_rcv+0x6d/0x90
                     netlink_unicast+0x42f/0x740
                     netlink_sendmsg+0x745/0xbe0
                     __sock_sendmsg+0xc5/0x190
                     __sys_sendto+0x1fe/0x2c0
                     __x64_sys_sendto+0xdc/0x1b0
                     do_syscall_64+0x6d/0x140
                     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    INITIAL USE at:
                    lock_acquire+0x1be/0x520
                    _raw_spin_lock_bh+0x34/0x40
                    mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core]
                    xfrm_dev_state_add+0x3bb/0xd70
                    xfrm_add_sa+0x2451/0x4a90
                    xfrm_user_rcv_msg+0x493/0x880
                    netlink_rcv_skb+0x12e/0x380
                    xfrm_netlink_rcv+0x6d/0x90
                    netlink_unicast+0x42f/0x740
                    netlink_sendmsg+0x745/0xbe0
                    __sock_sendmsg+0xc5/0x190
                    __sys_sendto+0x1fe/0x2c0
                    __x64_sys_sendto+0xdc/0x1b0
                    do_syscall_64+0x6d/0x140
                    entry_SYSCALL_64_after_hwframe+0x4b/0x53
  }
  ... key      at: [<ffffffffa078ff60>] __key.48+0x0/0xfffffffffff210a0 [mlx5_core]
  ... acquired at:
    __lock_acquire+0x30a0/0x5040
    lock_acquire+0x1be/0x520
    _raw_spin_lock_bh+0x34/0x40
    mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
    xfrm_dev_state_delete+0x90/0x160
    __xfrm_state_delete+0x662/0xae0
    xfrm_state_delete+0x1e/0x30
    xfrm_del_sa+0x1c2/0x340
    xfrm_user_rcv_msg+0x493/0x880
    netlink_rcv_skb+0x12e/0x380
    xfrm_netlink_rcv+0x6d/0x90
    netlink_unicast+0x42f/0x740
    netlink_sendmsg+0x745/0xbe0
    __sock_sendmsg+0xc5/0x190
    __sys_sendto+0x1fe/0x2c0
    __x64_sys_sendto+0xdc/0x1b0
    do_syscall_64+0x6d/0x140
    entry_SYSCALL_64_after_hwframe+0x4b/0x53

stack backtrace:
CPU: 7 UID: 0 PID: 1337 Comm: charon Not tainted 6.12.0+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
  <TASK>
  dump_stack_lvl+0x74/0xd0
  check_irq_usage+0x12e8/0x1d90
  ? print_shortest_lock_dependencies_backwards+0x1b0/0x1b0
  ? check_chain_key+0x1bb/0x4c0
  ? __lockdep_reset_lock+0x180/0x180
  ? check_path.constprop.0+0x24/0x50
  ? mark_lock+0x108/0x2fb0
  ? print_circular_bug+0x9b0/0x9b0
  ? mark_lock+0x108/0x2fb0
  ? print_usage_bug.part.0+0x670/0x670
  ? check_prev_add+0x1c4/0x2310
  check_prev_add+0x1c4/0x2310
  __lock_acquire+0x30a0/0x5040
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  lock_acquire+0x1be/0x520
  ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? __xfrm_state_delete+0x5f0/0xae0
  ? lock_downgrade+0x6b0/0x6b0
  _raw_spin_lock_bh+0x34/0x40
  ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
  xfrm_dev_state_delete+0x90/0x160
  __xfrm_state_delete+0x662/0xae0
  xfrm_state_delete+0x1e/0x30
  xfrm_del_sa+0x1c2/0x340
  ? xfrm_get_sa+0x250/0x250
  ? check_chain_key+0x1bb/0x4c0
  xfrm_user_rcv_msg+0x493/0x880
  ? copy_sec_ctx+0x270/0x270
  ? check_chain_key+0x1bb/0x4c0
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  netlink_rcv_skb+0x12e/0x380
  ? copy_sec_ctx+0x270/0x270
  ? netlink_ack+0xd90/0xd90
  ? netlink_deliver_tap+0xcd/0xb60
  xfrm_netlink_rcv+0x6d/0x90
  netlink_unicast+0x42f/0x740
  ? netlink_attachskb+0x730/0x730
  ? lock_acquire+0x1be/0x520
  netlink_sendmsg+0x745/0xbe0
  ? netlink_unicast+0x740/0x740
  ? __might_fault+0xbb/0x170
  ? netlink_unicast+0x740/0x740
  __sock_sendmsg+0xc5/0x190
  ? fdget+0x163/0x1d0
  __sys_sendto+0x1fe/0x2c0
  ? __x64_sys_getpeername+0xb0/0xb0
  ? do_user_addr_fault+0x856/0xe30
  ? lock_acquire+0x1be/0x520
  ? __task_pid_nr_ns+0x117/0x410
  ? lock_downgrade+0x6b0/0x6b0
  __x64_sys_sendto+0xdc/0x1b0
  ? lockdep_hardirqs_on_prepare+0x284/0x400
  do_syscall_64+0x6d/0x140
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f7d31291ba4
Code: 7d e8 89 4d d4 e8 4c 42 f7 ff 44 8b 4d d0 4c 8b 45 c8 89 c3 44 8b 55 d4 8b 7d e8 b8 2c 00 00 00 48 8b 55 d8 48 8b 75 e0 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 45 e8 e8 99 42 f7 ff 48 8b 45
RSP: 002b:00007f7d2ccd94f0 EFLAGS: 00000297 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7d31291ba4
RDX: 0000000000000028 RSI: 00007f7d2ccd96a0 RDI: 000000000000000a
RBP: 00007f7d2ccd9530 R08: 00007f7d2ccd9598 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000028
R13: 00007f7d2ccd9598 R14: 00007f7d2ccd96a0 R15: 00000000000000e1
  </TASK>

Fixes: 4c24272b4e2b ("net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode")
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Mark Zhang [Wed, 15 Jan 2025 11:39:07 +0000 (13:39 +0200)]

net/mlx5: Clear port select structure when fail to create

Clear the port select structure on error so no stale values left after
definers are destroyed. That's because the mlx5_lag_destroy_definers()
always try to destroy all lag definers in the tt_map, so in the flow
below lag definers get double-destroyed and cause kernel crash:

  mlx5_lag_port_sel_create()
    mlx5_lag_create_definers()
      mlx5_lag_create_definer()     <- Failed on tt 1
        mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed
  mlx5_lag_port_sel_create()
    mlx5_lag_create_definers()
      mlx5_lag_create_definer()     <- Failed on tt 0
        mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Mem abort info:
   ESR = 0x0000000096000005
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x05: level 1 translation fault
Data abort info:
   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 64k pages, 48-bit VAs, pgdp=0000000112ce2e00
[0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in: iptable_raw bonding ip_gre ip6_gre gre ip6_tunnel tunnel6 geneve ip6_udp_tunnel udp_tunnel ipip tunnel4 ip_tunnel rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_fwctl(OE) fwctl(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlxfw(OE) memtrack(OE) mlx_compat(OE) openvswitch nsh nf_conncount psample xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc netconsole overlay efi_pstore sch_fq_codel zram ip_tables crct10dif_ce qemu_fw_cfg fuse ipv6 crc_ccitt [last unloaded: mlx_compat(OE)]
  CPU: 3 UID: 0 PID: 217 Comm: kworker/u53:2 Tainted: G           OE      6.11.0+ #2
  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
  lr : mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
  sp : ffff800085fafb00
  x29: ffff800085fafb00 x28: ffff0000da0c8000 x27: 0000000000000000
  x26: ffff0000da0c8000 x25: ffff0000da0c8000 x24: ffff0000da0c8000
  x23: ffff0000c31f81a0 x22: 0400000000000000 x21: ffff0000da0c8000
  x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8b0c9350
  x14: 0000000000000000 x13: ffff800081390d18 x12: ffff800081dc3cc0
  x11: 0000000000000001 x10: 0000000000000b10 x9 : ffff80007ab7304c
  x8 : ffff0000d00711f0 x7 : 0000000000000004 x6 : 0000000000000190
  x5 : ffff00027edb3010 x4 : 0000000000000000 x3 : 0000000000000000
  x2 : ffff0000d39b8000 x1 : ffff0000d39b8000 x0 : 0400000000000000
  Call trace:
   mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core]
   mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core]
   mlx5_lag_destroy_definers+0xa0/0x108 [mlx5_core]
   mlx5_lag_port_sel_create+0x2d4/0x6f8 [mlx5_core]
   mlx5_activate_lag+0x60c/0x6f8 [mlx5_core]
   mlx5_do_bond_work+0x284/0x5c8 [mlx5_core]
   process_one_work+0x170/0x3e0
   worker_thread+0x2d8/0x3e0
   kthread+0x11c/0x128
   ret_from_fork+0x10/0x20
  Code: a9025bf5 aa0003f6 a90363f7 f90023f9 (f9400400)
  ---[ end trace 0000000000000000 ]---

Fixes: dc48516ec7d3 ("net/mlx5: Lag, add support to create definers for LAG")
Signed-off-by: Mark Zhang <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Chris Mi [Wed, 15 Jan 2025 11:39:06 +0000 (13:39 +0200)]

net/mlx5: SF, Fix add port error handling

If failed to add SF, error handling doesn't delete the SF from the
SF table. But the hw resources are deleted. So when unload driver,
hw resources will be deleted again. Firmware will report syndrome
0x68def3 which means "SF is not allocated can not deallocate".

Fix it by delete SF from SF table if failed to add SF.

Fixes: 2597ee190b4e ("net/mlx5: Call mlx5_sf_id_erase() once in mlx5_sf_dealloc()")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Shay Drori <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Yishai Hadas [Wed, 15 Jan 2025 11:39:05 +0000 (13:39 +0200)]

net/mlx5: Fix a lockdep warning as part of the write combining test

Fix a lockdep warning [1] observed during the write combining test.

The warning indicates a potential nested lock scenario that could lead
to a deadlock.

However, this is a false positive alarm because the SF lock and its
parent lock are distinct ones.

The lockdep confusion arises because the locks belong to the same object
class (i.e., struct mlx5_core_dev).

To resolve this, the code has been refactored to avoid taking both
locks. Instead, only the parent lock is acquired.

[1]
raw_ethernet_bw/2118 is trying to acquire lock:
[  213.619032] ffff88811dd75e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
               mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.620270]
[  213.620270] but task is already holding lock:
[  213.620943] ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3}, at:
               mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[  213.622045]
[  213.622045] other info that might help us debug this:
[  213.622778]  Possible unsafe locking scenario:
[  213.622778]
[  213.623465]        CPU0
[  213.623815]        ----
[  213.624148]   lock(&dev->wc_state_lock);
[  213.624615]   lock(&dev->wc_state_lock);
[  213.625071]
[  213.625071]  *** DEADLOCK ***
[  213.625071]
[  213.625805]  May be due to missing lock nesting notation
[  213.625805]
[  213.626522] 4 locks held by raw_ethernet_bw/2118:
[  213.627019]  #0: ffff88813f80d578 (&uverbs_dev->disassociate_srcu){.+.+}-{0:0},
                at: ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[  213.628088]  #1: ffff88810fb23930 (&file->hw_destroy_rwsem){.+.+}-{3:3},
                at: ib_init_ucontext+0x2d/0xf0 [ib_uverbs]
[  213.629094]  #2: ffff88810fb23878 (&file->ucontext_lock){+.+.}-{3:3},
                at: ib_init_ucontext+0x49/0xf0 [ib_uverbs]
[  213.630106]  #3: ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3},
                at: mlx5_wc_support_get+0x10c/0x210 [mlx5_core]
[  213.631185]
[  213.631185] stack backtrace:
[  213.631718] CPU: 1 UID: 0 PID: 2118 Comm: raw_ethernet_bw Not tainted
               6.12.0-rc7_internal_net_next_mlx5_89a0ad0 #1
[  213.632722] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
               rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  213.633785] Call Trace:
[  213.634099]
[  213.634393]  dump_stack_lvl+0x7e/0xc0
[  213.634806]  print_deadlock_bug+0x278/0x3c0
[  213.635265]  __lock_acquire+0x15f4/0x2c40
[  213.635712]  lock_acquire+0xcd/0x2d0
[  213.636120]  ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.636722]  ? mlx5_ib_enable_lb+0x24/0xa0 [mlx5_ib]
[  213.637277]  __mutex_lock+0x81/0xda0
[  213.637697]  ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.638305]  ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.638902]  ? rcu_read_lock_sched_held+0x3f/0x70
[  213.639400]  ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.640016]  mlx5_wc_support_get+0x18c/0x210 [mlx5_core]
[  213.640615]  set_ucontext_resp+0x68/0x2b0 [mlx5_ib]
[  213.641144]  ? debug_mutex_init+0x33/0x40
[  213.641586]  mlx5_ib_alloc_ucontext+0x18e/0x7b0 [mlx5_ib]
[  213.642145]  ib_init_ucontext+0xa0/0xf0 [ib_uverbs]
[  213.642679]  ib_uverbs_handler_UVERBS_METHOD_GET_CONTEXT+0x95/0xc0
                [ib_uverbs]
[  213.643426]  ? _copy_from_user+0x46/0x80
[  213.643878]  ib_uverbs_cmd_verbs+0xa6b/0xc80 [ib_uverbs]
[  213.644426]  ? ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x130/0x130
               [ib_uverbs]
[  213.645213]  ? __lock_acquire+0xa99/0x2c40
[  213.645675]  ? lock_acquire+0xcd/0x2d0
[  213.646101]  ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[  213.646625]  ? reacquire_held_locks+0xcf/0x1f0
[  213.647102]  ? do_user_addr_fault+0x45d/0x770
[  213.647586]  ib_uverbs_ioctl+0xe0/0x170 [ib_uverbs]
[  213.648102]  ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs]
[  213.648632]  __x64_sys_ioctl+0x4d3/0xaa0
[  213.649060]  ? do_user_addr_fault+0x4a8/0x770
[  213.649528]  do_syscall_64+0x6d/0x140
[  213.649947]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  213.650478] RIP: 0033:0x7fa179b0737b
[  213.650893] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c
               89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8
               10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d
               7d 2a 0f 00 f7 d8 64 89 01 48
[  213.652619] RSP: 002b:00007ffd2e6d46e8 EFLAGS: 00000246 ORIG_RAX:
               0000000000000010
[  213.653390] RAX: ffffffffffffffda RBX: 00007ffd2e6d47f8 RCX:
               00007fa179b0737b
[  213.654084] RDX: 00007ffd2e6d47e0 RSI: 00000000c0181b01 RDI:
               0000000000000003
[  213.654767] RBP: 00007ffd2e6d47c0 R08: 00007fa1799be010 R09:
               0000000000000002
[  213.655453] R10: 00007ffd2e6d4960 R11: 0000000000000246 R12:
               00007ffd2e6d487c
[  213.656170] R13: 0000000000000027 R14: 0000000000000001 R15:
               00007ffd2e6d4f70

Fixes: d98995b4bf98 ("net/mlx5: Reimplement write combining test")
Signed-off-by: Yishai Hadas <[email protected]>
Reviewed-by: Michael Guralnik <[email protected]>
Reviewed-by: Larysa Zaremba <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Patrisious Haddad [Wed, 15 Jan 2025 11:39:04 +0000 (13:39 +0200)]

net/mlx5: Fix RDMA TX steering prio

User added steering rules at RDMA_TX were being added to the first prio,
which is the counters prio.
Fix that so that they are correctly added to the BYPASS_PRIO instead.

Fixes: 24670b1a3166 ("net/mlx5: Add support for RDMA TX steering")
Signed-off-by: Patrisious Haddad <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Dave Airlie [Thu, 16 Jan 2025 02:33:52 +0000 (12:33 +1000)]

Merge tag 'amd-drm-fixes-6.13-2025-01-15' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.13-2025-01-15:

amdgpu:
- SMU 13 fix
- DP MST fixes
- DCN 3.5 fix
- PSR fixes
- eDP fix
- VRR fix
- Enforce isolation fixes
- GFX 12 fix
- PSP 14.x fix

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit | commitdiff | tree

Pavel Begunkov [Wed, 8 Jan 2025 22:06:22 +0000 (14:06 -0800)]

net: make page_pool_ref_netmem work with net iovs

page_pool_ref_netmem() should work with either netmem representation, but
currently it casts to a page with netmem_to_page(), which will fail with
net iovs. Use netmem_get_pp_ref_count_ref() instead.

Fixes: 8ab79ed50cf1 ("page_pool: devmem support")
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David Wei <[email protected]>
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Dave Airlie [Thu, 16 Jan 2025 01:54:13 +0000 (11:54 +1000)]

Merge tag 'drm-misc-fixes-2025-01-15' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.13:

- itee-it6263 error handling fix.
- Fix warn when unloading v3d.
- Fix W=1 build for kunit tests.
- Fix backlight regression for macbooks 5,1 in nouveau.
- Handle YCbCr420 better in bridge code, with tests.
- Fix cross-device fence handling in nouveau.
- Fix BO reservation handling in vmwgfx.

Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit | commitdiff | tree

Heiner Kallweit [Sun, 12 Jan 2025 21:59:59 +0000 (22:59 +0100)]

net: ethernet: xgbe: re-add aneg to supported features in PHY quirks

In 4.19, before the switch to linkmode bitmaps, PHY_GBIT_FEATURES
included feature bits for aneg and TP/MII ports.

SUPPORTED_TP | \
SUPPORTED_MII)

SUPPORTED_10baseT_Full)

SUPPORTED_100baseT_Full)

SUPPORTED_1000baseT_Full)

PHY_100BT_FEATURES | \
PHY_DEFAULT_FEATURES)

PHY_1000BT_FEATURES)

Referenced commit expanded PHY_GBIT_FEATURES, silently removing
PHY_DEFAULT_FEATURES. The removed part can be re-added by using
the new PHY_GBIT_FEATURES definition.
Not clear to me is why nobody seems to have noticed this issue.

I stumbled across this when checking what it takes to make
phy_10_100_features_array et al private to phylib.

Fixes: d0939c26c53a ("net: ethernet: xgbe: expand PHY_GBIT_FEAUTRES")
Cc: [email protected]
Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Vladimir Oltean [Tue, 14 Jan 2025 16:47:21 +0000 (18:47 +0200)]

net: pcs: xpcs: actively unset DW_VR_MII_DIG_CTRL1_2G5_EN for 1G SGMII

xpcs_config_2500basex() sets DW_VR_MII_DIG_CTRL1_2G5_EN, but
xpcs_config_aneg_c37_sgmii() never unsets it. So, on a protocol change
from 2500base-x to sgmii, the DW_VR_MII_DIG_CTRL1_2G5_EN bit will remain
set.

Fixes: f27abde3042a ("net: pcs: add 2500BASEX support for Intel mGbE controller")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Maxime Chevallier <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Vladimir Oltean [Tue, 14 Jan 2025 16:47:20 +0000 (18:47 +0200)]

net: pcs: xpcs: fix DW_VR_MII_DIG_CTRL1_2G5_EN bit being set for 1G SGMII w/o inband

On a port with SGMII fixed-link at SPEED_1000, DW_VR_MII_DIG_CTRL1 gets
set to 0x2404. This is incorrect, because bit 2 (DW_VR_MII_DIG_CTRL1_2G5_EN)
is set.

It comes from the previous write to DW_VR_MII_AN_CTRL, because the "val"
variable is reused and is dirty. Actually, its value is 0x4, aka
FIELD_PREP(DW_VR_MII_PCS_MODE_MASK, DW_VR_MII_PCS_MODE_C37_SGMII).

Resolve the issue by clearing "val" to 0 when writing to a new register.
After the fix, the register value is 0x2400.

Prior to the blamed commit, when the read-modify-write was open-coded,
the code saved the content of the DW_VR_MII_DIG_CTRL1 register in the
"ret" variable.

Fixes: ce8d6081fcf4 ("net: pcs: xpcs: add _modify() accessors")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Maxime Chevallier <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Jens Axboe [Wed, 15 Jan 2025 15:39:15 +0000 (08:39 -0700)]

io_uring/register: cache old SQ/CQ head reading for copies

The SQ and CQ ring heads are read twice - once for verifying that it's
within bounds, and once inside the loops copying SQE and CQE entries.
This is technically incorrect, in case the values could get modified
in between verifying them and using them in the copy loop. While this
won't lead to anything truly nefarious, it may cause longer loop times
for the copies than expected.

Read the ring head values once, and use the verified value in the copy
loops.

Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Wed, 15 Jan 2025 15:23:55 +0000 (08:23 -0700)]

io_uring/register: document io_register_resize_rings() shared mem usage

It can be a bit hard to tell which parts of io_register_resize_rings()
are operating on shared memory, and which ones are not. And anything
reading or writing to those regions should really use the read/write
once primitives.

Hence add those, ensuring sanity in how this memory is accessed, and
helping document the shared nature of it.

Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Jens Axboe [Wed, 15 Jan 2025 14:39:12 +0000 (07:39 -0700)]

io_uring/register: use stable SQ/CQ ring data during resize

Normally the kernel would not expect an application to modify any of
the data shared with the kernel during a resize operation, but of
course the kernel cannot always assume good intent on behalf of the
application.

As part of resizing the rings, existing SQEs and CQEs are copied over
to the new storage. Resizing uses the masks in the newly allocated
shared storage to index the arrays, however it's possible that malicious
userspace could modify these after they have been sanity checked.

Use the validated and locally stored CQ and SQ ring sizing for masking
to ensure the values are both stable and valid.

Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reported-by: Jann Horn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Ashutosh Dixit [Sat, 11 Jan 2025 02:15:39 +0000 (18:15 -0800)]

drm/xe/oa: Add missing VISACTL mux registers

Add missing VISACTL mux registers required for some OA
config's (e.g. RenderPipeCtrl).

Fixes: cdf02fe1a94a ("drm/xe/oa/uapi: Add/remove OA config perf ops")
Cc: [email protected]
Signed-off-by: Ashutosh Dixit <[email protected]>
Reviewed-by: Umesh Nerlige Ramappa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit c26f22dac3449d8a687237cdfc59a6445eb8f75a)
Signed-off-by: Thomas Hellström <[email protected]>

commit | commitdiff | tree

Maciej Patelczyk [Wed, 11 Dec 2024 11:17:27 +0000 (12:17 +0100)]

drm/xe: make change ccs_mode a synchronous action

If ccs_mode is being modified via
/sys/class/drm/cardX/device/tileY/gtY/ccs_mode
the asynchronous reset is triggered and the write returns immediately.

With that some test receive false information about number of CCS engines
or even fail if they proceed without delay after changing the ccs_mode.

Changing the ccs_mode change from async to sync to prevent failures in
tests.

Signed-off-by: Maciej Patelczyk <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode")
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
(cherry picked from commit 480fb9806e2e073532f7786166287114c696b340)
Signed-off-by: Thomas Hellström <[email protected]>

commit | commitdiff | tree

Maciej Patelczyk [Wed, 11 Dec 2024 11:17:26 +0000 (12:17 +0100)]

drm/xe: introduce xe_gt_reset and xe_gt_wait_for_reset

Add synchronous version gt reset as there are few places where it
is expected.
Also add a wait helper to wait until gt reset is done.

Signed-off-by: Maciej Patelczyk <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode")
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
(cherry picked from commit 155c77f45f63dd58a37eeb0896b0b140ab785836)
Signed-off-by: Thomas Hellström <[email protected]>

commit | commitdiff | tree

Jesus Narvaez [Thu, 12 Dec 2024 19:01:00 +0000 (11:01 -0800)]

drm/xe/guc: Adding steering info support for GuC register lists

The guc_mmio_reg interface supports steering, but it is currently not
implemented. This will allow the GuC to control steering of MMIO
registers after save-restore and avoid reading from fused off MCR
register instances.

Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement")
Signed-off-by: Jesus Narvaez <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Daniele Ceraolo Spurio <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit ee5a1321df90891d59d83b7c9d5b6c5b755d059d)
Signed-off-by: Thomas Hellström <[email protected]>

commit | commitdiff | tree

Victor Nogueira [Sat, 11 Jan 2025 21:15:15 +0000 (18:15 -0300)]

selftests: net: Adapt ethtool mq tests to fix in qdisc graft

Because of patch[1] the graft behaviour changed

So the command:

tcq replace parent 100:1 handle 204:

Is no longer valid and will not delete 100:4 added by command:

tcq replace parent 100:4 handle 204: pfifo_fast

So to maintain the original behaviour, this patch manually deletes 100:4
and grafts 100:1

Note: This change will also work fine without [1]

[1] https://lore.kernel.org/netdev/20250111151455 [email protected]/T/#u

Signed-off-by: Victor Nogueira <[email protected]>
Reviewed-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

commit | commitdiff | tree

Dan Carpenter [Tue, 12 Nov 2024 10:23:03 +0000 (13:23 +0300)]

drm/bridge: ite-it6263: Prevent error pointer dereference in probe()

If devm_i2c_new_dummy_device() fails then we were supposed to return an
error code, but instead the function continues and will crash on the next
line. Add the missing return statement.

Fixes: 049723628716 ("drm/bridge: Add ITE IT6263 LVDS to HDMI converter")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Biju Das <[email protected]>
Reviewed-by: Liu Ying <[email protected]>
Signed-off-by: Liu Ying <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit | commitdiff | tree

Kevin Groeneveld [Mon, 13 Jan 2025 15:48:45 +0000 (10:48 -0500)]

net: fec: handle page_pool_dev_alloc_pages error

The fec_enet_update_cbd function calls page_pool_dev_alloc_pages but did
not handle the case when it returned NULL. There was a WARN_ON(!new_page)
but it would still proceed to use the NULL pointer and then crash.

This case does seem somewhat rare but when the system is under memory
pressure it can happen. One case where I can duplicate this with some
frequency is when writing over a smbd share to a SATA HDD attached to an
imx6q.

Setting /proc/sys/vm/min_free_kbytes to higher values also seems to solve
the problem for my test case. But it still seems wrong that the fec driver
ignores the memory allocation error and can crash.

This commit handles the allocation error by dropping the current packet.

Fixes: 95698ff6177b5 ("net: fec: using page pool to manage RX buffers")
Signed-off-by: Kevin Groeneveld <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Wei Fang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

John Sperbeck [Tue, 14 Jan 2025 01:13:54 +0000 (17:13 -0800)]

net: netpoll: ensure skb_pool list is always initialized

When __netpoll_setup() is called directly, instead of through
netpoll_setup(), the np->skb_pool list head isn't initialized.
If skb_pool_flush() is later called, then we hit a NULL pointer
in skb_queue_purge_reason().  This can be seen with this repro,
when CONFIG_NETCONSOLE is enabled as a module:

    ip tuntap add mode tap tap0
    ip link add name br0 type bridge
    ip link set dev tap0 master br0
    modprobe netconsole [email protected]/br0,[email protected]/
    rmmod netconsole

The backtrace is:

    BUG: kernel NULL pointer dereference, address: 0000000000000008
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    ... ... ...
    Call Trace:
     <TASK>
     __netpoll_free+0xa5/0xf0
     br_netpoll_cleanup+0x43/0x50 [bridge]
     do_netpoll_cleanup+0x43/0xc0
     netconsole_netdev_event+0x1e3/0x300 [netconsole]
     unregister_netdevice_notifier+0xd9/0x150
     cleanup_module+0x45/0x920 [netconsole]
     __se_sys_delete_module+0x205/0x290
     do_syscall_64+0x70/0x150
     entry_SYSCALL_64_after_hwframe+0x76/0x7e

Move the skb_pool list setup and initial skb fill into __netpoll_setup().

Fixes: 221a9c1df790 ("net: netpoll: Individualize the skb pool")
Signed-off-by: John Sperbeck <[email protected]>
Reviewed-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Sean Anderson [Mon, 13 Jan 2025 16:30:00 +0000 (11:30 -0500)]

net: xilinx: axienet: Fix IRQ coalescing packet count overflow

If coalesce_count is greater than 255 it will not fit in the register and
will overflow. This can be reproduced by running

# ethtool -C ethX rx-frames 256

which will result in a timeout of 0us instead. Fix this by checking for
invalid values and reporting an error.

Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Sean Anderson <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Reviewed-by: Radhey Shyam Pandey <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Dan Carpenter [Mon, 13 Jan 2025 06:18:39 +0000 (09:18 +0300)]

nfp: bpf: prevent integer overflow in nfp_bpf_event_output()

The "sizeof(struct cmsg_bpf_event) + pkt_size + data_size" math could
potentially have an integer wrapping bug on 32bit systems. Check for
this and return an error.

Fixes: 9816dd35ecec ("nfp: bpf: perf event output helpers support")
Signed-off-by: Dan Carpenter <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Tue, 14 Jan 2025 22:10:17 +0000 (14:10 -0800)]

Merge tag 'seccomp-v6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fix from Kees Cook:
"Fix a randconfig failure:

- Unconditionally define stub for !CONFIG_SECCOMP (Linus Walleij)"

* tag 'seccomp-v6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Stub for !CONFIG_SECCOMP

commit | commitdiff | tree

Jakub Kicinski [Tue, 14 Jan 2025 21:41:16 +0000 (13:41 -0800)]

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Fix E825 initialization

Grzegorz Nitka says:

E825 products have incorrect initialization procedure, which may lead to
initialization failures and register values.

Fix E825 products initialization by adding correct sync delay, checking
the PHY revision only for current PHY and adding proper destination
device when reading port/quad.

In addition, E825 uses PF ID for indexing per PF registers and as
a primary PHY lane number, which is incorrect.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Add correct PHY lane assignment
  ice: Fix ETH56G FC-FEC Rx offset value
  ice: Fix quad registers read on E825
  ice: Fix E825 initialization
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Jakub Kicinski [Tue, 14 Jan 2025 21:32:13 +0000 (13:32 -0800)]

Merge branch 'mptcp-fixes-for-connect-selftest-flakes'

Matthieu Baerts says:

====================
mptcp: fixes for connect selftest flakes

Last week, Jakub reported [1] that the MPTCP Connect selftest was
unstable. It looked like it started after the introduction of some fixes
[2]. After analysis from Paolo, these patches revealed existing bugs,
that should be fixed by the following patches.

- Patch 1: Make sure ACK are sent when MPTCP-level window re-opens. In
  some corner cases, the other peer was not notified when more data
  could be sent. A fix for v5.11, but depending on a feature introduced
  in v5.19.

- Patch 2: Fix spurious wake-up under memory pressure. In this
  situation, the userspace could be invited to read data not being there
  yet. A fix for v6.7.

- Patch 3: Fix a false positive error when running the MPTCP Connect
  selftest with the "disconnect" cases. The userspace could disconnect
  the socket too soon, which would reset (MP_FASTCLOSE) the connection,
  interpreted as an error by the test. A fix for v5.17.

Link: https://lore.kernel.org/[email protected]
Link: https://lore.kernel.org/[email protected]
====================

Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-0-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Paolo Abeni [Mon, 13 Jan 2025 15:44:58 +0000 (16:44 +0100)]

selftests: mptcp: avoid spurious errors on disconnect

The disconnect test-case generates spurious errors:

  INFO: disconnect
  INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI
  01 ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP (duration 140ms) [FAIL]
  file received by server does not match (in, out):
  Unexpected revents: POLLERR/POLLNVAL(19)
  -rw-r--r-- 1 root root 10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect
  Trailing bytes are:
  ��\��R��!8��u2��5N%
  -rw------- 1 root root 9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1
  Trailing bytes are:
  ��\��R��!8��u2��5N%
  02 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ]
  03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP   (duration  31ms) [ OK ]
  04 ns1 TCP   -> ns1 (dead:beef:1::1:10003) MPTCP (duration  26ms) [ OK ]
  [FAIL] Tests of the full disconnection have failed
  Time: 2 seconds

The root cause is actually in the user-space bits: the test program
currently disconnects as soon as all the pending data has been spooled,
generating an FASTCLOSE. If such option reaches the peer before the
latter has reached the closed status, the msk socket will report an
error to the user-space, as per protocol specification, causing the
above failure.

Address the issue explicitly waiting for all the relevant sockets to
reach a closed status before performing the disconnect.

Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: [email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-3-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Paolo Abeni [Mon, 13 Jan 2025 15:44:57 +0000 (16:44 +0100)]

mptcp: fix spurious wake-up on under memory pressure

The wake-up condition currently implemented by mptcp_epollin_ready()
is wrong, as it could mark the MPTCP socket as readable even when
no data are present and the system is under memory pressure.

Explicitly check for some data being available in the receive queue.

Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love")
Cc: [email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-2-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Paolo Abeni [Mon, 13 Jan 2025 15:44:56 +0000 (16:44 +0100)]

mptcp: be sure to send ack when mptcp-level window re-opens

mptcp_cleanup_rbuf() is responsible to send acks when the user-space
reads enough data to update the receive windows significantly.

It tries hard to avoid acquiring the subflow sockets locks by checking
conditions similar to the ones implemented at the TCP level.

To avoid too much code duplication - the MPTCP protocol can't reuse the
TCP helpers as part of the relevant status is maintained into the msk
socket - and multiple costly window size computation, mptcp_cleanup_rbuf
uses a rough estimate for the most recently advertised window size:
the MPTCP receive free space, as recorded as at last-ack time.

Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect
a zero to non-zero win change in some corner cases, skipping the
tcp_cleanup_rbuf call and leaving the peer stuck.

After commit ea66758c1795 ("tcp: allow MPTCP to update the announced
window"), MPTCP has actually cheap access to the announced window value.
Use it in mptcp_cleanup_rbuf() for a more accurate ack generation.

Fixes: e3859603ba13 ("mptcp: better msk receive window updates")
Cc: [email protected]
Reported-by: Jakub Kicinski <[email protected]>
Closes: https://lore.kernel.org/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-1-0d986ee7b1b6@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Maíra Canal [Mon, 13 Jan 2025 15:47:40 +0000 (12:47 -0300)]

drm/v3d: Ensure job pointer is set to NULL after job completion

After a job completes, the corresponding pointer in the device must
be set to NULL. Failing to do so triggers a warning when unloading
the driver, as it appears the job is still active. To prevent this,
assign the job pointer to NULL after completing the job, indicating
the job has finished.

Fixes: 14d1d1908696 ("drm/v3d: Remove the bad signaled() implementation.")
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit | commitdiff | tree

Viresh Kumar [Fri, 10 Jan 2025 05:53:10 +0000 (11:23 +0530)]

cpufreq: Move endif to the end of Kconfig file

It is possible to enable few cpufreq drivers, without the framework
being enabled. This happened due to a bug while moving the entries
earlier. Fix it.

Fixes: 7ee1378736f0 ("cpufreq: Move CPPC configs to common Kconfig and add RISC-V")
Signed-off-by: Viresh Kumar <[email protected]>
Reviewed-by: Pierre Gondois <[email protected]>
Reviewed-by: Sunil V L <[email protected]>
Reviewed-by: Sudeep Holla <[email protected]>
Link: https://patch.msgid.link/84ac7a8fa72a8fe20487bb0a350a758bce060965.1736488384.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Tue, 14 Jan 2025 19:32:14 +0000 (11:32 -0800)]

Merge tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

- Prevent bwctrl NULL pointer dereference that caused hangs on shutdown
on ASUS ROG Strix SCAR 17 G733PYV (Lukas Wunner)

* tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/bwctrl: Fix NULL pointer deref on unbind and bind

commit | commitdiff | tree

Linus Torvalds [Tue, 14 Jan 2025 18:07:40 +0000 (10:07 -0800)]

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"One iscsi driver fix and one core fix.

  The core fix is an important one because a retry efficiency update is
  now causing some USB devices to get the wrong size on discovery (it
  upset their retry logic for READ_CAPACITY_16)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
  scsi: core: Fix command pass through retry regression

commit | commitdiff | tree

Ian Forbes [Fri, 10 Jan 2025 18:53:35 +0000 (12:53 -0600)]

drm/vmwgfx: Add new keep_resv BO param

Adds a new BO param that keeps the reservation locked after creation.
This removes the need to re-reserve the BO after creation which is a
waste of cycles.

This also fixes a bug in vmw_prime_import_sg_table where the imported
reservation is unlocked twice.

Signed-off-by: Ian Forbes <[email protected]>
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Reviewed-by: Zack Rusin <[email protected]>
Signed-off-by: Zack Rusin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit | commitdiff | tree

Ian Forbes [Wed, 8 Jan 2025 20:13:55 +0000 (14:13 -0600)]

drm/vmwgfx: Remove busy_places

Unused since commit a78a8da51b36
("drm/ttm: replace busy placement with flags v6")

Signed-off-by: Ian Forbes <[email protected]>
Reviewed-by: Martin Krastev <[email protected]>
Signed-off-by: Zack Rusin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit | commitdiff | tree

Ian Forbes [Tue, 10 Dec 2024 19:55:35 +0000 (13:55 -0600)]

drm/vmwgfx: Unreserve BO on error

Unlock BOs in reverse order.
Add an acquire context so that lockdep doesn't complain.

Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
Signed-off-by: Ian Forbes <[email protected]>
Signed-off-by: Zack Rusin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit | commitdiff | tree

Linus Torvalds [Tue, 14 Jan 2025 17:54:57 +0000 (09:54 -0800)]

Merge tag 'sound-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Hopefully the last PR for 6.13. This became bigger than wished due to
  the timing after holiday breaks.

  The only large LOC is the additional document for Cirrus codec which
  is nice for users (and absolutely safe). All the rest are small fixes
  in ASoC Rcar and codecs as well as HD-audio quirks (And no fix for USB
  guitar pedals seen yet :)"

* tag 'sound-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5
  ALSA: hda/realtek: fixup ASUS H7606W
  ALSA: hda/realtek: fixup ASUS GA605W
  ALSA: hda/realtek: Add support for Ayaneo System using CS35L41 HDA
  ASoC: rsnd: check rsnd_adg_clk_enable() return value
  ASoC: cs42l43: Add codec force suspend/resume ops
  ALSA: doc: Add codecs/index.rst to top-level index
  ALSA: doc: cs35l56: Add information about Cirrus Logic CS35L54/56/57
  ASoC: samsung: Add missing depends on I2C
  MAINTAINERS: add missing maintainers for Simple Audio Card
  ASoC: samsung: Add missing selects for MFD_WM8994
  ASoC: codecs: es8316: Fix HW rate calculation for 48Mhz MCLK
  ASoC: wm8994: Add depends on MFD core
  ASoC: tas2781: Fix occasional calibration failture
  ASoC: codecs: ES8326: Adjust ANA_MICBIAS to reduce pop noise

commit | commitdiff | tree

Gui Chengming [Tue, 7 Jan 2025 09:09:08 +0000 (17:09 +0800)]

drm/amdgpu: fix fw attestation for MP0_14_0_{2/3}

FW attestation was disabled on MP0_14_0_{2/3}.

V2:
Move check into is_fw_attestation_support func. (Frank)
Remove DRM_WARN log info. (Alex)
Fix format. (Christian)

Signed-off-by: Gui Chengming <[email protected]>
Reviewed-by: Frank.Min <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 62952a38d9bcf357d5ffc97615c48b12c9cd627c)
Cc: [email protected] # 6.12.x

commit | commitdiff | tree

Christian König [Fri, 20 Dec 2024 15:21:11 +0000 (16:21 +0100)]

drm/amdgpu: always sync the GFX pipe on ctx switch

That is needed to enforce isolation between contexts.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit def59436fb0d3ca0f211d14873d0273d69ebb405)
Cc: [email protected]

commit | commitdiff | tree

Kenneth Feng [Thu, 9 Jan 2025 07:58:23 +0000 (15:58 +0800)]

drm/amdgpu: disable gfxoff with the compute workload on gfx12

Disable gfxoff with the compute workload on gfx12. This is a
workaround for the opencl test failure.

Signed-off-by: Kenneth Feng <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 2affe2bbc997b3920045c2c434e480c81a5f9707)
Cc: [email protected] # 6.12.x

commit | commitdiff | tree

Jens Axboe [Tue, 14 Jan 2025 16:44:21 +0000 (09:44 -0700)]

io_uring/rsrc: fixup io_clone_buffers() error handling

Jann reports he can trigger a UAF if the target ring unregisters
buffers before the clone operation is fully done. And additionally
also an issue related to node allocation failures. Both of those
stemp from the fact that the cleanup logic puts the buffers manually,
rather than just relying on io_rsrc_data_free() doing it. Hence kill
the manual cleanup code and just let io_rsrc_data_free() handle it,
it'll put the nodes appropriately.

Reported-by: Jann Horn <[email protected]>
Fixes: 3597f2786b68 ("io_uring/rsrc: unify file and buffer resource tables")
Signed-off-by: Jens Axboe <[email protected]>

commit | commitdiff | tree

Srinivasan Shanmugam [Thu, 9 Jan 2025 16:03:51 +0000 (21:33 +0530)]

drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation

This commit addresses a circular locking dependency issue within the GFX
isolation mechanism. The problem was identified by a warning indicating
a potential deadlock due to inconsistent lock acquisition order.

- The `amdgpu_gfx_enforce_isolation_ring_begin_use` and
  `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously
  acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`,
  leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is
  called while `enforce_isolation_mutex` is held, and
  `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is
  held, it can create a circular dependency.

By ensuring consistent lock usage, this fix resolves the issue:

[  606.297333] ======================================================
[  606.297343] WARNING: possible circular locking dependency detected
[  606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G           OE
[  606.297365] ------------------------------------------------------
[  606.297375] kworker/u96:3/3825 is trying to acquire lock:
[  606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610
[  606.297413]
               but task is already holding lock:
[  606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu]
[  606.297725]
               which lock already depends on the new lock.

[  606.297738]
               the existing dependency chain (in reverse order) is:
[  606.297749]
               -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}:
[  606.297765]        __mutex_lock+0x85/0x930
[  606.297776]        mutex_lock_nested+0x1b/0x30
[  606.297786]        amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu]
[  606.298007]        amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu]
[  606.298225]        amdgpu_ring_alloc+0x48/0x70 [amdgpu]
[  606.298412]        amdgpu_ib_schedule+0x176/0x8a0 [amdgpu]
[  606.298603]        amdgpu_job_run+0xac/0x1e0 [amdgpu]
[  606.298866]        drm_sched_run_job_work+0x24f/0x430 [gpu_sched]
[  606.298880]        process_one_work+0x21e/0x680
[  606.298890]        worker_thread+0x190/0x350
[  606.298899]        kthread+0xe7/0x120
[  606.298908]        ret_from_fork+0x3c/0x60
[  606.298919]        ret_from_fork_asm+0x1a/0x30
[  606.298929]
               -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}:
[  606.298947]        __mutex_lock+0x85/0x930
[  606.298956]        mutex_lock_nested+0x1b/0x30
[  606.298966]        amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu]
[  606.299190]        process_one_work+0x21e/0x680
[  606.299199]        worker_thread+0x190/0x350
[  606.299208]        kthread+0xe7/0x120
[  606.299217]        ret_from_fork+0x3c/0x60
[  606.299227]        ret_from_fork_asm+0x1a/0x30
[  606.299236]
               -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}:
[  606.299257]        __lock_acquire+0x16f9/0x2810
[  606.299267]        lock_acquire+0xd1/0x300
[  606.299276]        __flush_work+0x250/0x610
[  606.299286]        cancel_delayed_work_sync+0x71/0x80
[  606.299296]        amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu]
[  606.299509]        amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu]
[  606.299723]        amdgpu_ring_alloc+0x48/0x70 [amdgpu]
[  606.299909]        amdgpu_ib_schedule+0x176/0x8a0 [amdgpu]
[  606.300101]        amdgpu_job_run+0xac/0x1e0 [amdgpu]
[  606.300355]        drm_sched_run_job_work+0x24f/0x430 [gpu_sched]
[  606.300369]        process_one_work+0x21e/0x680
[  606.300378]        worker_thread+0x190/0x350
[  606.300387]        kthread+0xe7/0x120
[  606.300396]        ret_from_fork+0x3c/0x60
[  606.300406]        ret_from_fork_asm+0x1a/0x30
[  606.300416]
               other info that might help us debug this:

[  606.300428] Chain exists of:
                 (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex

[  606.300458]  Possible unsafe locking scenario:

[  606.300468]        CPU0                    CPU1
[  606.300476]        ----                    ----
[  606.300484]   lock(&adev->gfx.kfd_sch_mutex);
[  606.300494]                                lock(&adev->enforce_isolation_mutex);
[  606.300508]                                lock(&adev->gfx.kfd_sch_mutex);
[  606.300521]   lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work));
[  606.300536]
                *** DEADLOCK ***

[  606.300546] 5 locks held by kworker/u96:3/3825:
[  606.300555]  #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[  606.300577]  #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  606.300600]  #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu]
[  606.300837]  #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu]
[  606.301062]  #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610
[  606.301083]
               stack backtrace:
[  606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G           OE      6.10.0-amd-mlkd-610-311224-lof #19
[  606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024
[  606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched]
[  606.301140] Call Trace:
[  606.301146]  <TASK>
[  606.301154]  dump_stack_lvl+0x9b/0xf0
[  606.301166]  dump_stack+0x10/0x20
[  606.301175]  print_circular_bug+0x26c/0x340
[  606.301187]  check_noncircular+0x157/0x170
[  606.301197]  ? register_lock_class+0x48/0x490
[  606.301213]  __lock_acquire+0x16f9/0x2810
[  606.301230]  lock_acquire+0xd1/0x300
[  606.301239]  ? __flush_work+0x232/0x610
[  606.301250]  ? srso_alias_return_thunk+0x5/0xfbef5
[  606.301261]  ? mark_held_locks+0x54/0x90
[  606.301274]  ? __flush_work+0x232/0x610
[  606.301284]  __flush_work+0x250/0x610
[  606.301293]  ? __flush_work+0x232/0x610
[  606.301305]  ? __pfx_wq_barrier_func+0x10/0x10
[  606.301318]  ? mark_held_locks+0x54/0x90
[  606.301331]  ? srso_alias_return_thunk+0x5/0xfbef5
[  606.301345]  cancel_delayed_work_sync+0x71/0x80
[  606.301356]  amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu]
[  606.301661]  amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu]
[  606.302050]  ? srso_alias_return_thunk+0x5/0xfbef5
[  606.302069]  amdgpu_ring_alloc+0x48/0x70 [amdgpu]
[  606.302452]  amdgpu_ib_schedule+0x176/0x8a0 [amdgpu]
[  606.302862]  ? drm_sched_entity_error+0x82/0x190 [gpu_sched]
[  606.302890]  amdgpu_job_run+0xac/0x1e0 [amdgpu]
[  606.303366]  drm_sched_run_job_work+0x24f/0x430 [gpu_sched]
[  606.303388]  process_one_work+0x21e/0x680
[  606.303409]  worker_thread+0x190/0x350
[  606.303424]  ? __pfx_worker_thread+0x10/0x10
[  606.303437]  kthread+0xe7/0x120
[  606.303449]  ? __pfx_kthread+0x10/0x10
[  606.303463]  ret_from_fork+0x3c/0x60
[  606.303476]  ? __pfx_kthread+0x10/0x10
[  606.303489]  ret_from_fork_asm+0x1a/0x30
[  606.303512]  </TASK>

v2: Refactor lock handling to resolve circular dependency (Alex)

- Introduced a `sched_work` flag to defer the call to
  `amdgpu_gfx_kfd_sch_ctrl` until after releasing
  `enforce_isolation_mutex`.
- This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside
  the critical section, preventing the circular dependency and deadlock.
- The `sched_work` flag is set within the mutex-protected section if
  conditions are met, and the actual function call is made afterward.
- This approach ensures consistent lock acquisition order.

Fixes: afefd6f24502 ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization")
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Suggested-by: Alex Deucher <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 0b6b2dd38336d5fd49214f0e4e6495e658e3ab44)
Cc: [email protected]

commit | commitdiff | tree

Steven Rostedt [Tue, 14 Jan 2025 15:12:02 +0000 (10:12 -0500)]

ftrace: Document that multiple function_graph tracing may have different times

The function graph tracer now calculates the calltime internally and for
each instance. If there are two instances that are running function graph
tracer and are tracing the same functions, the timings of the length of
those functions may be slightly different:

# trace-cmd record -B foo -p function_graph -B bar -p function_graph sleep 5
# trace-cmd report
[..]
bar:            sleep-981   [000] ...1.  1101.109027: funcgraph_entry:        0.764 us   |          mutex_unlock(); (ret=0xffff8abcc256c300)
foo:            sleep-981   [000] ...1.  1101.109028: funcgraph_entry:        0.748 us   |          mutex_unlock(); (ret=0xffff8abcc256c300)
bar:            sleep-981   [000] .....  1101.109029: funcgraph_exit:         2.456 us   |        } (ret=0xffff8abcc256c300)
foo:            sleep-981   [000] .....  1101.109029: funcgraph_exit:         2.403 us   |        } (ret=0xffff8abcc256c300)
bar:            sleep-981   [000] d..1.  1101.109031: funcgraph_entry:        0.844 us   |  fpregs_assert_state_consistent(); (ret=0x0)
foo:            sleep-981   [000] d..1.  1101.109032: funcgraph_entry:        0.803 us   |  fpregs_assert_state_consistent(); (ret=0x0)

Link: https://lore.kernel.org/all/[email protected]/
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Suggested-by: Masami Hiramatsu <[email protected]>
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Steven Rostedt (Google) <[email protected]>

commit | commitdiff | tree

Shrikanth Hegde [Fri, 3 Jan 2025 09:36:47 +0000 (15:06 +0530)]

tracing: Print lazy preemption model

Print lazy preemption model in ftrace header when latency-format=1.

# cat /sys/kernel/debug/sched/preempt
none voluntary full (lazy)

Without patch:
  latency: 0 us, #232946/232946, CPU#40 | (M:unknown VP:0, KP:0, SP:0 HP:0 #P:80)
                                             ^^^^^^^

With Patch:
  latency: 0 us, #1897938/25566788, CPU#16 | (M:lazy VP:0, KP:0, SP:0 HP:0 #P:80)
                                                ^^^^

Now that lazy preemption is part of the kernel, make sure the tracing
infrastructure reflects that.

Link: https://lore.kernel.org/[email protected]
Signed-off-by: Shrikanth Hegde <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

commit | commitdiff | tree

Steven Rostedt [Mon, 13 Jan 2025 23:31:24 +0000 (18:31 -0500)]

tracing: Fix irqsoff and wakeup latency tracers when using function graph

The function graph tracer has become generic so that kretprobes and BPF
can use it along with function graph tracing itself. Some of the
infrastructure was specific for function graph tracing such as recording
the calltime and return time of the functions. Calling the clock code on a
high volume function does add overhead. The calculation of the calltime
was removed from the generic code and placed into the function graph
tracer itself so that the other users did not incur this overhead as they
did not need that timestamp.

The calltime field was still kept in the generic return entry structure
and the function graph return entry callback filled it as that structure
was passed to other code.

But this broke both irqsoff and wakeup latency tracer as they still
depended on the trace structure containing the calltime when the option
display-graph is set as it used some of those same functions that the
function graph tracer used. But now the calltime was not set and was just
zero. This caused the calculation of the function time to be the absolute
value of the return timestamp and not the length of the function.

# cd /sys/kernel/tracing
# echo 1 > options/display-graph
# echo irqsoff > current_tracer

The tracers went from:

#   REL TIME      CPU  TASK/PID       ||||     DURATION                  FUNCTION CALLS
#      |          |     |    |        ||||      |   |                     |   |   |   |
        0 us |   4)    <idle>-0    |  d..1. |   0.000 us    |  irqentry_enter();
        3 us |   4)    <idle>-0    |  d..2. |               |  irq_enter_rcu() {
        4 us |   4)    <idle>-0    |  d..2. |   0.431 us    |    preempt_count_add();
        5 us |   4)    <idle>-0    |  d.h2. |               |    tick_irq_enter() {
        5 us |   4)    <idle>-0    |  d.h2. |   0.433 us    |      tick_check_oneshot_broadcast_this_cpu();
        6 us |   4)    <idle>-0    |  d.h2. |   2.426 us    |      ktime_get();
        9 us |   4)    <idle>-0    |  d.h2. |               |      tick_nohz_stop_idle() {
       10 us |   4)    <idle>-0    |  d.h2. |   0.398 us    |        nr_iowait_cpu();
       11 us |   4)    <idle>-0    |  d.h1. |   1.903 us    |      }
       11 us |   4)    <idle>-0    |  d.h2. |               |      tick_do_update_jiffies64() {
       12 us |   4)    <idle>-0    |  d.h2. |               |        _raw_spin_lock() {
       12 us |   4)    <idle>-0    |  d.h2. |   0.360 us    |          preempt_count_add();
       13 us |   4)    <idle>-0    |  d.h3. |   0.354 us    |          do_raw_spin_lock();
       14 us |   4)    <idle>-0    |  d.h2. |   2.207 us    |        }
       15 us |   4)    <idle>-0    |  d.h3. |   0.428 us    |        calc_global_load();
       16 us |   4)    <idle>-0    |  d.h3. |               |        _raw_spin_unlock() {
       16 us |   4)    <idle>-0    |  d.h3. |   0.380 us    |          do_raw_spin_unlock();
       17 us |   4)    <idle>-0    |  d.h3. |   0.334 us    |          preempt_count_sub();
       18 us |   4)    <idle>-0    |  d.h1. |   1.768 us    |        }
       18 us |   4)    <idle>-0    |  d.h2. |               |        update_wall_time() {
      [..]

To:

#   REL TIME      CPU  TASK/PID       ||||     DURATION                  FUNCTION CALLS
#      |          |     |    |        ||||      |   |                     |   |   |   |
        0 us |   5)    <idle>-0    |  d.s2. |   0.000 us    |  _raw_spin_lock_irqsave();
        0 us |   5)    <idle>-0    |  d.s3. |   312159583 us |      preempt_count_add();
        2 us |   5)    <idle>-0    |  d.s4. |   312159585 us |      do_raw_spin_lock();
        3 us |   5)    <idle>-0    |  d.s4. |               |      _raw_spin_unlock() {
        3 us |   5)    <idle>-0    |  d.s4. |   312159586 us |        do_raw_spin_unlock();
        4 us |   5)    <idle>-0    |  d.s4. |   312159587 us |        preempt_count_sub();
        4 us |   5)    <idle>-0    |  d.s2. |   312159587 us |      }
        5 us |   5)    <idle>-0    |  d.s3. |               |      _raw_spin_lock() {
        5 us |   5)    <idle>-0    |  d.s3. |   312159588 us |        preempt_count_add();
        6 us |   5)    <idle>-0    |  d.s4. |   312159589 us |        do_raw_spin_lock();
        7 us |   5)    <idle>-0    |  d.s3. |   312159590 us |      }
        8 us |   5)    <idle>-0    |  d.s4. |   312159591 us |      calc_wheel_index();
        9 us |   5)    <idle>-0    |  d.s4. |               |      enqueue_timer() {
        9 us |   5)    <idle>-0    |  d.s4. |               |        wake_up_nohz_cpu() {
       11 us |   5)    <idle>-0    |  d.s4. |               |          native_smp_send_reschedule() {
       11 us |   5)    <idle>-0    |  d.s4. |   312171987 us |            default_send_IPI_single_phys();
    12408 us |   5)    <idle>-0    |  d.s3. |   312171990 us |          }
    12408 us |   5)    <idle>-0    |  d.s3. |   312171991 us |        }
    12409 us |   5)    <idle>-0    |  d.s3. |   312171991 us |      }

Where the calculation of the time for each function was the return time
minus zero and not the time of when the function returned.

Have these tracers also save the calltime in the fgraph data section and
retrieve it again on the return to get the correct timings again.

Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: f1f36e22bee9 ("ftrace: Have calltime be saved in the fgraph storage")
Signed-off-by: Steven Rostedt (Google) <[email protected]>

commit | commitdiff | tree

Paolo Abeni [Tue, 14 Jan 2025 11:29:39 +0000 (12:29 +0100)]

Merge branch 'vsock-some-fixes-due-to-transport-de-assignment'

Stefano Garzarella says:

====================
vsock: some fixes due to transport de-assignment

v1: https://lore.kernel.org/netdev/20250108180617 [email protected]/
v2:
- Added patch 3 to cancel the virtio close delayed work when de-assigning
  the transport
- Added patch 4 to clean the socket state after de-assigning the transport
- Added patch 5 as suggested by Michael and Hyunwoo Kim. It's based on
  Hyunwoo Kim and Wongi Lee patch [1] but using WARN_ON and covering more
  functions
- Added R-b/T-b tags

This series includes two patches discussed in the thread started by
Hyunwoo Kim a few weeks ago [1], plus 3 more patches added after some
discussions on v1 (see changelog). All related to the case where a vsock
socket is de-assigned from a transport (e.g., because the connect fails
or is interrupted by a signal) and then assigned to another transport
or to no-one (NULL).

I tested with usual vsock test suite, plus Michal repro [2]. (Note: the repo
works only if a G2H transport is not loaded, e.g. virtio-vsock driver).

The first patch is a fix more appropriate to the problem reported in
that thread, the second patch on the other hand is a related fix but
of a different problem highlighted by Michal Luczaj. It's present only
in vsock_bpf and already handled in af_vsock.c
The third patch is to cancel the virtio close delayed work when de-assigning
the transport, the fourth patch is to clean the socket state after de-assigning
the transport, the last patch adds warnings and prevents null-ptr-deref in
vsock_*[has_data|has_space].

Hyunwoo Kim, Michal, if you can test and report your Tested-by that
would be great!

[1] https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/
[2] https://lore.kernel.org/netdev/2b3062e3-bdaa-4c94-a3c0-2930595b9670@rbox.co/
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Stefano Garzarella [Fri, 10 Jan 2025 08:35:11 +0000 (09:35 +0100)]

vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]

Recent reports have shown how we sometimes call vsock_*_has_data()
when a vsock socket has been de-assigned from a transport (see attached
links), but we shouldn't.

Previous commits should have solved the real problems, but we may have
more in the future, so to avoid null-ptr-deref, we can return 0
(no space, no data available) but with a warning.

This way the code should continue to run in a nearly consistent state
and have a warning that allows us to debug future problems.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: [email protected]
Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/
Link: https://lore.kernel.org/netdev/[email protected]/
Link: https://lore.kernel.org/netdev/[email protected]/
Co-developed-by: Hyunwoo Kim <[email protected]>
Signed-off-by: Hyunwoo Kim <[email protected]>
Co-developed-by: Wongi Lee <[email protected]>
Signed-off-by: Wongi Lee <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Reviewed-by: Luigi Leonardi <[email protected]>
Reviewed-by: Hyunwoo Kim <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Stefano Garzarella [Fri, 10 Jan 2025 08:35:10 +0000 (09:35 +0100)]

vsock: reset socket state when de-assigning the transport

Transport's release() and destruct() are called when de-assigning the
vsock transport. These callbacks can touch some socket state like
sock flags, sk_state, and peer_shutdown.

Since we are reassigning the socket to a new transport during
vsock_connect(), let's reset these fields to have a clean state with
the new transport.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: [email protected]
Signed-off-by: Stefano Garzarella <[email protected]>
Reviewed-by: Luigi Leonardi <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Stefano Garzarella [Fri, 10 Jan 2025 08:35:09 +0000 (09:35 +0100)]

vsock/virtio: cancel close work in the destructor

During virtio_transport_release() we can schedule a delayed work to
perform the closing of the socket before destruction.

The destructor is called either when the socket is really destroyed
(reference counter to zero), or it can also be called when we are
de-assigning the transport.

In the former case, we are sure the delayed work has completed, because
it holds a reference until it completes, so the destructor will
definitely be called after the delayed work is finished.
But in the latter case, the destructor is called by AF_VSOCK core, just
after the release(), so there may still be delayed work scheduled.

Refactor the code, moving the code to delete the close work already in
the do_close() to a new function. Invoke it during destruction to make
sure we don't leave any pending work.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: [email protected]
Reported-by: Hyunwoo Kim <[email protected]>
Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <[email protected]>
Reviewed-by: Luigi Leonardi <[email protected]>
Tested-by: Hyunwoo Kim <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Stefano Garzarella [Fri, 10 Jan 2025 08:35:08 +0000 (09:35 +0100)]

vsock/bpf: return early if transport is not assigned

Some of the core functions can only be called if the transport
has been assigned.

As Michal reported, a socket might have the transport at NULL,
for example after a failed connect(), causing the following trace:

    BUG: kernel NULL pointer dereference, address: 00000000000000a0
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
    Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
    RIP: 0010:vsock_connectible_has_data+0x1f/0x40
    Call Trace:
     vsock_bpf_recvmsg+0xca/0x5e0
     sock_recvmsg+0xb9/0xc0
     __sys_recvfrom+0xb3/0x130
     __x64_sys_recvfrom+0x20/0x30
     do_syscall_64+0x93/0x180
     entry_SYSCALL_64_after_hwframe+0x76/0x7e

So we need to check the `vsk->transport` in vsock_bpf_recvmsg(),
especially for connected sockets (stream/seqpacket) as we already
do in __vsock_connectible_recvmsg().

Fixes: 634f1a7110b4 ("vsock: support sockmap")
Cc: [email protected]
Reported-by: Michal Luczaj <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Tested-by: Michal Luczaj <[email protected]>
Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/
Tested-by: [email protected]
Reviewed-by: Hyunwoo Kim <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Luigi Leonardi <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Stefano Garzarella [Fri, 10 Jan 2025 08:35:07 +0000 (09:35 +0100)]

vsock/virtio: discard packets if the transport changes

If the socket has been de-assigned or assigned to another transport,
we must discard any packets received because they are not expected
and would cause issues when we access vsk->transport.

A possible scenario is described by Hyunwoo Kim in the attached link,
where after a first connect() interrupted by a signal, and a second
connect() failed, we can find `vsk->transport` at NULL, leading to a
NULL pointer dereference.

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Cc: [email protected]
Reported-by: Hyunwoo Kim <[email protected]>
Reported-by: Wongi Lee <[email protected]>
Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
Signed-off-by: Stefano Garzarella <[email protected]>
Reviewed-by: Hyunwoo Kim <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Paolo Abeni [Tue, 14 Jan 2025 10:20:06 +0000 (11:20 +0100)]

Merge branch 'gtp-pfcp-fix-use-after-free-of-udp-tunnel-socket'

Kuniyuki Iwashima says:

====================
gtp/pfcp: Fix use-after-free of UDP tunnel socket.

Xiao Liang pointed out weird netns usages in ->newlink() of
gtp and pfcp.

This series fixes the issues.

Link: https://lore.kernel.org/netdev/[email protected]/
Changes:
  v2:
    * Patch 1
      * Fix uninit/unused local var

  v1: https://lore.kernel.org/netdev/20250108062834 [email protected]/
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 10 Jan 2025 01:47:54 +0000 (10:47 +0900)]

pfcp: Destroy device along with udp socket's netns dismantle.

pfcp_newlink() links the device to a list in dev_net(dev) instead
of net, where a udp tunnel socket is created.

Even when net is removed, the device stays alive on dev_net(dev).
Then, removing net triggers the splat below. [0]

In this example, pfcp0 is created in ns2, but the udp socket is
created in ns1.

  ip netns add ns1
  ip netns add ns2
  ip -n ns1 link add netns ns2 name pfcp0 type pfcp
  ip netns del ns1

Let's link the device to the socket's netns instead.

Now, pfcp_net_exit() needs another netdev iteration to remove
all pfcp devices in the netns.

pfcp_dev_list is not used under RCU, so the list API is converted
to the non-RCU variant.

pfcp_net_exit() can be converted to .exit_batch_rtnl() in net-next.

[0]:
ref_tracker: net notrefcnt@00000000128b34dc has 1/1 users at
     sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236)
     inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
     __sock_create (net/socket.c:1558)
     udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18)
     pfcp_create_sock (drivers/net/pfcp.c:168)
     pfcp_newlink (drivers/net/pfcp.c:182 drivers/net/pfcp.c:197)
     rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012)
     rtnetlink_rcv_msg (net/core/rtnetlink.c:6922)
     netlink_rcv_skb (net/netlink/af_netlink.c:2542)
     netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347)
     netlink_sendmsg (net/netlink/af_netlink.c:1891)
     ____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583)
     ___sys_sendmsg (net/socket.c:2639)
     __sys_sendmsg (net/socket.c:2669)
     do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
     entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

WARNING: CPU: 1 PID: 11 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
Modules linked in:
CPU: 1 UID: 0 PID: 11 Comm: kworker/u16:0 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179)
Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89
RSP: 0018:ff11000007f3fb60 EFLAGS: 00010286
RAX: 00000000000020ef RBX: ff1100000d6481e0 RCX: 1ffffffff0e40d82
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c
RBP: ff1100000d648230 R08: 0000000000000001 R09: fffffbfff0e395af
R10: 0000000000000001 R11: 0000000000000000 R12: ff1100000d648230
R13: dead000000000100 R14: ff1100000d648230 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005620e1363990 CR3: 000000000eeb2002 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn (kernel/panic.c:748)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:285)
? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158)
? kfree (mm/slub.c:4613 mm/slub.c:4761)
net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467)
cleanup_net (net/core/net_namespace.c:664 (discriminator 3))
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
  </TASK>

Fixes: 76c8764ef36a ("pfcp: add PFCP module")
Reported-by: Xiao Liang <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 10 Jan 2025 01:47:53 +0000 (10:47 +0900)]

gtp: Destroy device along with udp socket's netns dismantle.

gtp_newlink() links the device to a list in dev_net(dev) instead of
src_net, where a udp tunnel socket is created.

Even when src_net is removed, the device stays alive on dev_net(dev).
Then, removing src_net triggers the splat below. [0]

In this example, gtp0 is created in ns2, and the udp socket is created
in ns1.

  ip netns add ns1
  ip netns add ns2
  ip -n ns1 link add netns ns2 name gtp0 type gtp role sgsn
  ip netns del ns1

Let's link the device to the socket's netns instead.

Now, gtp_net_exit_batch_rtnl() needs another netdev iteration to remove
all gtp devices in the netns.

[0]:
ref_tracker: net notrefcnt@000000003d6e7d05 has 1/2 users at
     sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236)
     inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
     __sock_create (net/socket.c:1558)
     udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18)
     gtp_create_sock (./include/net/udp_tunnel.h:59 drivers/net/gtp.c:1423)
     gtp_create_sockets (drivers/net/gtp.c:1447)
     gtp_newlink (drivers/net/gtp.c:1507)
     rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012)
     rtnetlink_rcv_msg (net/core/rtnetlink.c:6922)
     netlink_rcv_skb (net/netlink/af_netlink.c:2542)
     netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347)
     netlink_sendmsg (net/netlink/af_netlink.c:1891)
     ____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583)
     ___sys_sendmsg (net/socket.c:2639)
     __sys_sendmsg (net/socket.c:2669)
     do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)

WARNING: CPU: 1 PID: 60 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
Modules linked in:
CPU: 1 UID: 0 PID: 60 Comm: kworker/u16:2 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179)
Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89
RSP: 0018:ff11000009a07b60 EFLAGS: 00010286
RAX: 0000000000002bd3 RBX: ff1100000f4e1aa0 RCX: 1ffffffff0e40ac6
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c
RBP: ff1100000f4e1af0 R08: 0000000000000001 R09: fffffbfff0e395ae
R10: 0000000000000001 R11: 0000000000036001 R12: ff1100000f4e1af0
R13: dead000000000100 R14: ff1100000f4e1af0 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9b2464bd98 CR3: 0000000005286005 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn (kernel/panic.c:748)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:285)
? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158)
? kfree (mm/slub.c:4613 mm/slub.c:4761)
net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467)
cleanup_net (net/core/net_namespace.c:664 (discriminator 3))
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>

Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Reported-by: Xiao Liang <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Kuniyuki Iwashima [Fri, 10 Jan 2025 01:47:52 +0000 (10:47 +0900)]

gtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().

gtp_newlink() links the gtp device to a list in dev_net(dev).

However, even after the gtp device is moved to another netns,
it stays on the list but should be invisible.

Let's use for_each_netdev_rcu() for netdev traversal in
gtp_genl_dump_pdp().

Note that gtp_dev_list is no longer used under RCU, so list
helpers are converted to the non-RCU variant.

Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Reported-by: Xiao Liang <[email protected]>
Closes: https://lore.kernel.org/netdev/CABAhCOQdBL6h9M2C+kd+bGivRJ9Q72JUxW+-gur0nub_=PmFPA@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Philo Lu [Fri, 10 Jan 2025 01:08:10 +0000 (09:08 +0800)]

udp: Make rehash4 independent in udp_lib_rehash()

As discussed in [0], rehash4 could be missed in udp_lib_rehash() when
udp hash4 changes while hash2 doesn't change. This patch fixes this by
moving rehash4 codes out of rehash2 checking, and then rehash2 and
rehash4 are done separately.

By doing this, we no longer need to call rehash4 explicitly in
udp_lib_hash4(), as the rehash callback in __ip4_datagram_connect takes
it. Thus, now udp_lib_hash4() returns directly if the sk is already
hashed.

Note that uhash4 may fail to work under consecutive connect(<dst
address>) calls because rehash() is not called with every connect(). To
overcome this, connect(<AF_UNSPEC>) needs to be called after the next
connect to a new destination.

[0]
https://lore.kernel.org/all/4761e466ab9f7542c68cdc95f248987d127044d2.1733499715 [email protected]/

Fixes: 78c91ae2c6de ("ipv4/udp: Add 4-tuple hash for connected socket")
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Philo Lu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

commit | commitdiff | tree

Ville Syrjälä [Fri, 29 Nov 2024 06:50:11 +0000 (08:50 +0200)]

drm/i915/fb: Relax clear color alignment to 64 bytes

Mesa changed its clear color alignment from 4k to 64 bytes
without informing the kernel side about the change. This
is now likely to cause framebuffer creation to fail.

The only thing we do with the clear color buffer in i915 is:
1. map a single page
2. read out bytes 16-23 from said page
3. unmap the page

So the only requirement we really have is that those 8 bytes
are all contained within one page. Thus we can deal with the
Mesa regression by reducing the alignment requiment from 4k
to the same 64 bytes in the kernel. We could even go as low as
32 bytes, but IIRC 64 bytes is the hardware requirement on
the 3D engine side so matching that seems sensible.

Note that the Mesa alignment chages were partially undone
so the regression itself was already fixed on userspace
side.

Cc: [email protected]
Cc: Sagar Ghuge <[email protected]>
Cc: Nanley Chery <[email protected]>
Reported-by: Xi Ruoyao <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13057
Closes: https://lore.kernel.org/all/[email protected]/
Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/17f97a69c13832a6c1b0b3aad45b06f07d4b852f
Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/888f63cf1baf34bc95e847a30a041dc7798edddb
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Tested-by: Xi Ruoyao <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
(cherry picked from commit ed3a892e5e3d6b3f6eeb76db7c92a968aeb52f3d)
Signed-off-by: Tvrtko Ursulin <[email protected]>

commit | commitdiff | tree

Heiner Kallweit [Thu, 9 Jan 2025 22:43:12 +0000 (23:43 +0100)]

r8169: remove redundant hwmon support

The temperature sensor is actually part of the integrated PHY and available
also on the standalone versions of the PHY. Therefore hwmon support will
be added to the Realtek PHY driver and can be removed here.

Fixes: 1ffcc8d41306 ("r8169: add support for the temperature sensor being available from RTL8125B")
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Paul Fertser [Thu, 9 Jan 2025 14:50:54 +0000 (17:50 +0300)]

net/ncsi: fix locking in Get MAC Address handling

Obtaining RTNL lock in a response handler is not allowed since it runs
in an atomic softirq context. Postpone setting the MAC address by adding
a dedicated step to the configuration FSM.

Fixes: 790071347a0a ("net/ncsi: change from ndo_set_mac_address to dev_set_mac_address")
Cc: [email protected]
Link: https://lore.kernel.org/20241129-potin-revert-ncsi-set-mac-addr-v1-1-94ea2cb596af@gmail.com
Signed-off-by: Paul Fertser <[email protected]>
Tested-by: Potin Lai <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

commit | commitdiff | tree

Qu Wenruo [Wed, 8 Jan 2025 03:44:04 +0000 (14:14 +1030)]

btrfs: add the missing error handling inside get_canonical_dev_path

Inside function get_canonical_dev_path(), we call d_path() to get the
final device path.

But d_path() can return error, and in that case the next strscpy() call
will trigger an invalid memory access.

Add back the missing error handling for d_path().

Reported-by: Boris Burkov <[email protected]>
Fixes: 7e06de7c83a7 ("btrfs: canonicalize the device path before adding it")
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>

commit | commitdiff | tree

Chris Bainbridge [Sat, 11 Jan 2025 18:59:45 +0000 (18:59 +0000)]

ACPI: video: Fix random crashes due to bad kfree()

Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if
available for eDP") added function dm_helpers_probe_acpi_edid(), which
fetches the EDID from the BIOS by calling acpi_video_get_edid().

acpi_video_get_edid() returns a pointer to the EDID, but this pointer
does not originate from kmalloc() - it is actually the internal
"pointer" field from an acpi_buffer struct (which did come from
kmalloc()).

dm_helpers_probe_acpi_edid() then attempts to kfree() the EDID pointer,
resulting in memory corruption which leads to random, intermittent
crashes (e.g. 4% of boots will fail with some Oops).

Fix this by allocating a new array (which can be safely freed) for the
EDID data, and correctly freeing the acpi_buffer pointer.

The only other caller of acpi_video_get_edid() is nouveau_acpi_edid():
remove the extraneous kmemdup() here as the EDID data is now copied in
acpi_video_device_EDID().

Signed-off-by: Chris Bainbridge <[email protected]>
Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP")
Reviewed-by: Hans de Goede <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Reported-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Borislav Petkov (AMD) <[email protected]>
Closes: https://lore.kernel.org/amd-gfx/20250110175252.GBZ4FedNKqmBRaY4T3@fat_crate.local/T/#m324a23eb4c4c32fa7e89e31f8ba96c781e496fb1
Link: https://patch.msgid.link/[email protected]
[ rjw: Changed function description comment into a kerneldoc one ]
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

commit | commitdiff | tree

Rafael J. Wysocki [Fri, 10 Jan 2025 12:48:10 +0000 (13:48 +0100)]

cpuidle: teo: Update documentation after previous changes

After previous changes, the description of the teo governor in the
documentation comment does not match the code any more, so update it
as appropriate.

Fixes: 449914398083 ("cpuidle: teo: Remove recent intercepts metric")
Fixes: 2662342079f5 ("cpuidle: teo: Gather statistics regarding whether or not to stop the tick")
Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases")
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Christian Loehle <[email protected]>
Link: https://patch.msgid.link/[email protected]
[ rjw: Corrected 3 typos found by Christian ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

commit | commitdiff | tree

Rafael J. Wysocki [Fri, 10 Jan 2025 12:46:43 +0000 (13:46 +0100)]

cpuidle: menu: Update documentation after previous changes

After commit 38f83090f515 ("cpuidle: menu: Remove iowait influence") and
other previous changes, the description of the menu governor in the
documentation does not match the code any more, so update it as
appropriate.

Fixes: 38f83090f515 ("cpuidle: menu: Remove iowait influence")
Fixes: 5484e31bbbff ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases")
Reported-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Christian Loehle <[email protected]>
Link: https://patch.msgid.link/[email protected]

commit | commitdiff | tree

Tom Chung [Thu, 5 Dec 2024 15:20:45 +0000 (23:20 +0800)]

drm/amd/display: Disable replay and psr while VRR is enabled

[Why]
Replay and PSR will cause some video corruption while VRR is enabled.

[How]
1. Disable the Replay and PSR while VRR is enabled.
2. Change the amdgpu_dm_crtc_vrr_active() parameter to const.
Because the function will only read data from dm_crtc_state.

Reviewed-by: Sun peng Li <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Signed-off-by: Roman Li <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit d7879340e987b3056b8ae39db255b6c19c170a0d)
Cc: [email protected]

commit | commitdiff | tree

Tom Chung [Thu, 5 Dec 2024 15:08:28 +0000 (23:08 +0800)]

drm/amd/display: Fix PSR-SU not support but still call the amdgpu_dm_psr_enable

[Why]
The enum DC_PSR_VERSION_SU_1 of psr_version is 1 and
DC_PSR_VERSION_UNSUPPORTED is 0xFFFFFFFF.

The original code may has chance trigger the amdgpu_dm_psr_enable()
while psr version is set to DC_PSR_VERSION_UNSUPPORTED.

[How]
Modify the condition to psr->psr_version == DC_PSR_VERSION_SU_1

Reviewed-by: Sun peng Li <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Signed-off-by: Roman Li <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit f765e7ce0417f8dc38479b4b495047c397c16902)
Cc: [email protected]

commit | commitdiff | tree

Karol Kolacinski [Tue, 5 Nov 2024 12:29:16 +0000 (13:29 +0100)]

ice: Add correct PHY lane assignment

Driver always naively assumes, that for PTP purposes, PHY lane to
configure is corresponding to PF ID.

This is not true for some port configurations, e.g.:
- 2x50G per quad, where lanes used are 0 and 2 on each quad, but PF IDs
are 0 and 1
- 100G per quad on 2 quads, where lanes used are 0 and 4, but PF IDs are
0 and 1

Use correct PHY lane assignment by getting and parsing port options.
This is read from the NVM by the FW and provided to the driver with
the indication of active port split.

Remove ice_is_muxed_topo(), which is no longer needed.

Fixes: 4409ea1726cb ("ice: Adjust PTP init for 2x50G E825C devices")
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Signed-off-by: Grzegorz Nitka <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

commit | commitdiff | tree

Karol Kolacinski [Tue, 5 Nov 2024 12:29:15 +0000 (13:29 +0100)]

ice: Fix ETH56G FC-FEC Rx offset value

Fix ETH56G FC-FEC incorrect Rx offset value by changing it from -255.96
to -469.26 ns.

Those values are derived from HW spec and reflect internal delays.
Hex value is a fixed point representation in Q23.9 format.

Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

commit | commitdiff | tree

Karol Kolacinski [Tue, 5 Nov 2024 12:29:14 +0000 (13:29 +0100)]

ice: Fix quad registers read on E825

Quad registers are read/written incorrectly. E825 devices always use
quad 0 address and differentiate between the PHYs by changing SBQ
destination device (phy_0 or phy_0_peer).

Add helpers for reading/writing PTP registers shared per quad and use
correct quad address and SBQ destination device based on port.

Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Signed-off-by: Grzegorz Nitka <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

commit | commitdiff | tree

Karol Kolacinski [Tue, 5 Nov 2024 12:29:13 +0000 (13:29 +0100)]

ice: Fix E825 initialization

Current implementation checks revision of all PHYs on all PFs, which is
incorrect and may result in initialization failure. Check only the
revision of the current PHY.

Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Arkadiusz Kubalewski <[email protected]>
Signed-off-by: Karol Kolacinski <[email protected]>
Signed-off-by: Grzegorz Nitka <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Mon, 13 Jan 2025 17:03:18 +0000 (09:03 -0800)]

Merge tag 'mm-hotfixes-stable-2025-01-13-00-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"18 hotfixes. 11 are cc:stable. 13 are MM and 5 are non-MM.

  All patches are singletons - please see the relevant changelogs for
  details"

* tag 'mm-hotfixes-stable-2025-01-13-00-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  fs/proc: fix softlockup in __read_vmcore (part 2)
  mm: fix assertion in folio_end_read()
  mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled.
  vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  zram: fix potential UAF of zram table
  selftests/mm: set allocated memory to non-zero content in cow test
  mm: clear uffd-wp PTE/PMD state on mremap()
  module: fix writing of livepatch relocations in ROX text
  mm: zswap: properly synchronize freeing resources during CPU hotunplug
  Revert "mm: zswap: fix race between [de]compression and CPU hotunplug"
  hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode
  mm: fix div by zero in bdi_ratio_from_pages
  x86/execmem: fix ROX cache usage in Xen PV guests
  filemap: avoid truncating 64-bit offset to 32 bits
  tools: fix atomic_set() definition to set the value correctly
  mm/mempolicy: count MPOL_WEIGHTED_INTERLEAVE to "interleave_hit"
  scripts/decode_stacktrace.sh: fix decoding of lines with an additional info
  mm/kmemleak: fix percpu memory leak detection failure

commit | commitdiff | tree

Dave Airlie [Thu, 9 Jan 2025 00:55:53 +0000 (10:55 +1000)]

nouveau/fence: handle cross device fences properly

The fence sync logic doesn't handle a fence sync across devices
as it tries to write to a channel offset from one device into
the fence bo from a different device, which won't work so well.

This patch fixes that to avoid using the sync path in the case
where the fences come from different nouveau drm devices.

This works fine on a single device as the fence bo is shared
across the devices, and mapped into each channels vma space,
the channel offsets are therefore okay to pass between sides,
so one channel can sync on the seqnos from the other by using
the offset into it's vma.

Signed-off-by: Dave Airlie <[email protected]>
Cc: [email protected]
Reviewed-by: Ben Skeggs <[email protected]>
[ Fix compilation issue; remove version log from commit messsage.
- Danilo ]
Signed-off-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

commit | commitdiff | tree

Cristian Ciocaltea [Tue, 24 Dec 2024 18:22:44 +0000 (20:22 +0200)]

drm/tests: connector: Add ycbcr_420_allowed tests

Extend HDMI connector output format tests to verify its registration
succeeds only when the presence of YUV420 in the supported formats
matches the state of ycbcr_420_allowed flag.

Signed-off-by: Cristian Ciocaltea <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>

commit | commitdiff | tree

Cristian Ciocaltea [Tue, 24 Dec 2024 18:22:43 +0000 (20:22 +0200)]

drm/connector: hdmi: Validate supported_formats matches ycbcr_420_allowed

Ensure HDMI connector initialization fails when the presence of
HDMI_COLORSPACE_YUV420 in the given supported_formats bitmask doesn't
match the value of drm_connector->ycbcr_420_allowed.

Suggested-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Cristian Ciocaltea <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>

commit | commitdiff | tree

Cristian Ciocaltea [Tue, 24 Dec 2024 18:22:42 +0000 (20:22 +0200)]

drm/bridge-connector: Sync supported_formats with computed ycbcr_420_allowed

The case of having an HDMI bridge in the pipeline which advertises
YUV420 capability via its ->supported_formats and a non-HDMI one that
didn't enable ->ycbcr_420_allowed, is incorrectly handled because
supported_formats is passed as is to the helper initializing the HDMI
connector.

Ensure HDMI_COLORSPACE_YUV420 is removed from the bitmask passed to
drmm_connector_hdmi_init() when connector's ->ycbcr_420_allowed flag
ends up not being set.

Fixes: 3ced1c687512 ("drm/display: bridge_connector: handle ycbcr_420_allowed")
Signed-off-by: Cristian Ciocaltea <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>

commit | commitdiff | tree

Cristian Ciocaltea [Tue, 24 Dec 2024 18:22:41 +0000 (20:22 +0200)]

drm/bridge: Prioritize supported_formats over ycbcr_420_allowed

Bridges having DRM_BRIDGE_OP_HDMI set in their ->ops are supposed to
rely on the ->supported_formats bitmask to advertise the permitted
colorspaces, including HDMI_COLORSPACE_YUV420.

However, a new flag ->ycbcr_420_allowed has been recently introduced,
which brings the necessity to require redundant and potentially
inconsistent information to be provided on HDMI bridges initialization.

Adjust ->ycbcr_420_allowed for HDMI bridges according to
->supported_formats, right before adding them to the global bridge list.
This keeps the initialization process straightforward and unambiguous,
thereby preventing any further confusion.

Fixes: 3ced1c687512 ("drm/display: bridge_connector: handle ycbcr_420_allowed")
Signed-off-by: Cristian Ciocaltea <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>

commit | commitdiff | tree

Artem Chernyshev [Thu, 9 Jan 2025 08:30:39 +0000 (11:30 +0300)]

pktgen: Avoid out-of-bounds access in get_imix_entries

Passing a sufficient amount of imix entries leads to invalid access to the
pkt_dev->imix_entries array because of the incorrect boundary check.

UBSAN: array-index-out-of-bounds in net/core/pktgen.c:874:24
index 20 is out of range for type 'imix_pkt [20]'
CPU: 2 PID: 1210 Comm: bash Not tainted 6.10.0-rc1 #121
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl lib/dump_stack.c:117
__ubsan_handle_out_of_bounds lib/ubsan.c:429
get_imix_entries net/core/pktgen.c:874
pktgen_if_write net/core/pktgen.c:1063
pde_write fs/proc/inode.c:334
proc_reg_write fs/proc/inode.c:346
vfs_write fs/read_write.c:593
ksys_write fs/read_write.c:644
do_syscall_64 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:130

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 52a62f8603f9 ("pktgen: Parse internet mix (imix) input")
Signed-off-by: Artem Chernyshev <[email protected]>
[ fp: allow to fill the array completely; minor changelog cleanup ]
Signed-off-by: Fedor Pchelkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

commit | commitdiff | tree

Yage Geng [Mon, 13 Jan 2025 08:52:08 +0000 (16:52 +0800)]

ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5

This patch fixes the volume adjustment issue on the Lenovo ThinkBook 16P Gen5
by applying the necessary quirk configuration for the Realtek ALC287 codec.

The issue was caused by incorrect configuration in the driver,
which prevented proper volume control on certain systems.

Signed-off-by: Yage Geng <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

commit | commitdiff | tree

Rik van Riel [Fri, 10 Jan 2025 15:28:21 +0000 (10:28 -0500)]

fs/proc: fix softlockup in __read_vmcore (part 2)

Since commit 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore") the
number of softlockups in __read_vmcore at kdump time have gone down, but
they still happen sometimes.

In a memory constrained environment like the kdump image, a softlockup is
not just a harmless message, but it can interfere with things like RCU
freeing memory, causing the crashdump to get stuck.

The second loop in __read_vmcore has a lot more opportunities for natural
sleep points, like scheduling out while waiting for a data write to
happen, but apparently that is not always enough.

Add a cond_resched() to the second loop in __read_vmcore to (hopefully)
get rid of the softlockups.

Link: https://lkml.kernel.org/r/20250110102821.2a37581b@fangorn
Fixes: 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore")
Signed-off-by: Rik van Riel <[email protected]>
Reported-by: Breno Leitao <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Matthew Wilcox (Oracle) [Fri, 10 Jan 2025 16:32:57 +0000 (16:32 +0000)]

mm: fix assertion in folio_end_read()

We only need to assert that the uptodate flag is clear if we're going to
set it. This hasn't been a problem before now because we have only used
folio_end_read() when completing with an error, but it's convenient to use
it in squashfs if we discover the folio is already uptodate.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Phillip Lougher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Donet Tom [Thu, 9 Jan 2025 06:05:39 +0000 (00:05 -0600)]

mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled.

When MGLRU is enabled, the pgdemote_kswapd, pgdemote_direct, and
pgdemote_khugepaged stats in vmstat are not being updated.

Commit f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA
balancing operations") moved the pgdemote vmstat update from
demote_folio_list() to shrink_inactive_list(), which is in the normal LRU
path. As a result, the pgdemote stats are updated correctly for the
normal LRU but not for MGLRU.

To address this, we have added the pgdemote stat update in the
evict_folios() function, which is in the MGLRU path. With this patch, the
pgdemote stats will now be updated correctly when MGLRU is enabled.

Without this patch vmstat output when MGLRU is enabled
======================================================
pgdemote_kswapd 0
pgdemote_direct 0
pgdemote_khugepaged 0

With this patch vmstat output when MGLRU is enabled
===================================================
pgdemote_kswapd 43234
pgdemote_direct 4691
pgdemote_khugepaged 0

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations")
Signed-off-by: Donet Tom <[email protected]>
Acked-by: Yu Zhao <[email protected]>
Tested-by: Li Zhijian <[email protected]>
Reviewed-by: Li Zhijian <[email protected]>
Cc: Aneesh Kumar K.V (Arm) <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kaiyang Zhao <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Ritesh Harjani (IBM) <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wei Xu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Koichiro Den [Wed, 8 Jan 2025 04:28:07 +0000 (13:28 +0900)]

vmstat: disable vmstat_work on vmstat_cpu_down_prep()

The upstream commit adcfb264c3ed ("vmstat: disable vmstat_work on
vmstat_cpu_down_prep()") introduced another warning during the boot phase
so was soon reverted on upstream by commit cd6313beaeae ("Revert "vmstat:
disable vmstat_work on vmstat_cpu_down_prep()"").  This commit resolves it
and reattempts the original fix.

Even after mm/vmstat:online teardown, shepherd may still queue work for
the dying cpu until the cpu is removed from online mask.  While it's quite
rare, this means that after unbind_workers() unbinds a per-cpu kworker, it
potentially runs vmstat_update for the dying CPU on an irrelevant cpu
before entering atomic AP states.  When CONFIG_DEBUG_PREEMPT=y, it results
in the following error with the backtrace.

  BUG: using smp_processor_id() in preemptible [00000000] code: \
                                               kworker/7:3/1702
  caller is refresh_cpu_vm_stats+0x235/0x5f0
  CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
  Tainted: [N]=TEST
  Workqueue: mm_percpu_wq vmstat_update
  Call Trace:
   <TASK>
   dump_stack_lvl+0x8d/0xb0
   check_preemption_disabled+0xce/0xe0
   refresh_cpu_vm_stats+0x235/0x5f0
   vmstat_update+0x17/0xa0
   process_one_work+0x869/0x1aa0
   worker_thread+0x5e5/0x1100
   kthread+0x29e/0x380
   ret_from_fork+0x2d/0x70
   ret_from_fork_asm+0x1a/0x30
   </TASK>

So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
symmetrically enable it on startup.

For secondary CPUs during CPU hotplug scenarios, ensure the delayed work
is disabled immediately after the initialization.  These CPUs are not yet
online when start_shepherd_timer() runs on boot CPU.  vmstat_cpu_online()
will enable the work for them.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Koichiro Den <[email protected]>
Suggested-by: Huacai Chen <[email protected]>
Tested-by: Charalampos Mitrodimas <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Kairui Song [Tue, 7 Jan 2025 06:54:46 +0000 (14:54 +0800)]

zram: fix potential UAF of zram table

If zram_meta_alloc failed early, it frees allocated zram->table without
setting it NULL. Which will potentially cause zram_meta_free to access
the table if user reset an failed and uninitialized device.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 74363ec674cb ("zram: fix uninitialized ZRAM not releasing backing device")
Signed-off-by: Kairui Song <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Ryan Roberts [Tue, 7 Jan 2025 14:25:53 +0000 (14:25 +0000)]

selftests/mm: set allocated memory to non-zero content in cow test

After commit b1f202060afe ("mm: remap unused subpages to shared zeropage
when splitting isolated thp"), cow test cases involving swapping out THPs
via madvise(MADV_PAGEOUT) started to be skipped due to the subsequent
check via pagemap determining that the memory was not actually swapped
out.  Logs similar to this were emitted:

   ...

   # [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (16 kB)
   ok 2 # SKIP MADV_PAGEOUT did not work, is swap enabled?
   # [RUN] Basic COW after fork() ... with single PTE of swapped-out THP (16 kB)
   ok 3 # SKIP MADV_PAGEOUT did not work, is swap enabled?
   # [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (32 kB)
   ok 4 # SKIP MADV_PAGEOUT did not work, is swap enabled?

   ...

The commit in question introduces the behaviour of scanning THPs and if
their content is predominantly zero, it splits them and replaces the pages
which are wholly zero with the zero page.  These cow test cases were
getting caught up in this.

So let's avoid that by filling the contents of all allocated memory with
a non-zero value. With this in place, the tests are passing again.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Ryan Roberts <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Usama Arif <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

commit | commitdiff | tree

Ryan Roberts [Tue, 7 Jan 2025 14:47:52 +0000 (14:47 +0000)]

mm: clear uffd-wp PTE/PMD state on mremap()

When mremap()ing a memory region previously registered with userfaultfd as
write-protected but without UFFD_FEATURE_EVENT_REMAP, an inconsistency in
flag clearing leads to a mismatch between the vma flags (which have
uffd-wp cleared) and the pte/pmd flags (which do not have uffd-wp
cleared).  This mismatch causes a subsequent mprotect(PROT_WRITE) to
trigger a warning in page_table_check_pte_flags() due to setting the pte
to writable while uffd-wp is still set.

Fix this by always explicitly clearing the uffd-wp pte/pmd flags on any
such mremap() so that the values are consistent with the existing clearing
of VM_UFFD_WP.  Be careful to clear the logical flag regardless of its
physical form; a PTE bit, a swap PTE bit, or a PTE marker.  Cover PTE,
huge PMD and hugetlb paths.

Link: https://lkml.kernel.org/r/[email protected]
Co-developed-by: Mikołaj Lenczewski <[email protected]>
Signed-off-by: Mikołaj Lenczewski <[email protected]>
Signed-off-by: Ryan Roberts <[email protected]>
Closes: https://lore.kernel.org/linux-mm/[email protected]/
Fixes: 63b2d4174c4a ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Cc: David Hildenbrand <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Empty description

This page took 0.167173 seconds and 4 git commands to generate.