Merge branch 'linus' into sched/urgent, to resolve conflict

Conflicts:
kernel/sched/ext.c

There's a context conflict between this upstream commit:

3fdb9ebcec10 sched_ext: Start schedulers with consistent p->scx.slice values

... and this fix in sched/urgent:

98442f0ccd82 sched: Fix delayed_dequeue vs switched_from_fair()

Resolve it.

Signed-off-by: Ingo Molnar <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"Several miscellaneous fixes. A lot of bnxt_re activity, there will be
  more rc patches coming there.

   - Many bnxt_re bug fixes - Memory leaks, KASAN, NULL pointer deref,
     soft lockups, error unwinding and some small functional issues

   - Error unwind bug in rdma netlink

   - Two issues with incorrect VLAN detection for iWarp

   - skb_splice_from_iter() splat in siw

   - Give SRP slab caches unique names to resolve the merge window
     WARN_ON regression"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/bnxt_re: Fix the GID table length
  RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages
  RDMA/bnxt_re: Change the sequence of updating the CQ toggle value
  RDMA/bnxt_re: Fix an error path in bnxt_re_add_device
  RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop
  RDMA/bnxt_re: Fix a possible NULL pointer dereference
  RDMA/bnxt_re: Return more meaningful error
  RDMA/bnxt_re: Fix incorrect dereference of srq in async event
  RDMA/bnxt_re: Fix out of bound check
  RDMA/bnxt_re: Fix the max CQ WQEs for older adapters
  RDMA/srpt: Make slab cache names unique
  RDMA/irdma: Fix misspelling of "accept*"
  RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP
  RDMA/siw: Add sendpage_ok() check to disable MSG_SPLICE_PAGES
  RDMA/core: Fix ENODEV error for iWARP test over vlan
  RDMA/nldev: Fix NULL pointer dereferences issue in rdma_nl_notify_event
  RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode
  RDMA/bnxt_re: Add a check for memory allocation
  RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
  RDMA/bnxt_re: Fix a possible memory leak

Merge tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- regression fix: dirty extents tracked in xarray for qgroups must be
   adjusted for 32bit platforms

- fix potentially freeing uninitialized name in fscrypt structure

- fix warning about unneeded variable in a send callback

* tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix uninitialized pointer free on read_alloc_one_name() error
  btrfs: send: cleanup unneeded return variable in changed_verity()
  btrfs: fix uninitialized pointer free in add_inode_ref()
  btrfs: use sector numbers as keys for the dirty extents xarray

Merge tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- fix race between session setup and session logoff

- add supplementary group support

* tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: add support for supplementary groups
  ksmbd: fix user-after-free from session log off

Merge tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- Remove bogus testmgr ENOENT error messages

- Ensure algorithm is still alive before marking it as tested

- Disable buggy hash algorithms in marvell/cesa

* tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: marvell/cesa - Disable hash algorithms
  crypto: testmgr - Hide ENOENT errors better
  crypto: api - Fix liveliness check in crypto_alg_tested

Merge tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

- More issues reported in the enable/disable paths on large machines
   with many tasks due to scx_tasks_lock being held too long. Break up
   the task iterations

- Remove ops.select_cpu() dependency in bypass mode so that a
   misbehaving implementation can't live-lock the machine by pushing all
   tasks to few CPUs in bypass mode

- Other misc fixes

* tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Remove unnecessary cpu_relax()
  sched_ext: Don't hold scx_tasks_lock for too long
  sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers
  sched_ext: bypass mode shouldn't depend on ops.select_cpu()
  sched_ext: Move scx_buildin_idle_enabled check to scx_bpf_select_cpu_dfl()
  sched_ext: Start schedulers with consistent p->scx.slice values
  Revert "sched_ext: Use shorter slice while bypassing"
  sched_ext: use correct function name in pick_task_scx() warning message
  selftests: sched_ext: Add sched_ext as proper selftest target

Merge tag 'trace-ringbuffer-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer fixes from Steven Rostedt:

- Fix ref counter of buffers assigned at boot up

   A tracing instance can be created from the kernel command line. If it
   maps to memory, it is considered permanent and should not be deleted,
   or bad things can happen. If it is not mapped to memory, then the
   user is free to delete it via rmdir from the instances directory. The
   ref counting assumed that an instance with a count of 0 was free to
   remove and one with a count greater than zero was not, but this was
   not the case. When an instance is created, it should have a ref count
   of 1, and if it should not be removed, its ref count must be greater
   than 1. The boot up code set normal instances with a ref count of 0,
   which could get removed if something accessed it and then released
   it. And memory mapped instances had a ref count of 1, which meant
   they could be deleted, and bad things would happen. Keep the ref
   count of normal instances as 1, and set the ref count of memory
   mapped instances to 2.

- Protect sub buffer size (order) updates from other modifications

   When a ring buffer is changing the size of its sub-buffers, no other
   operations should be performed on the ring buffer. That includes
   reading it. But the locking only grabbed the buffer->mutex, which keeps
   only some operations from touching the ring buffer. It must also hold
   the cpu_buffer->reader_lock when updates happen, as other paths use
   that lock to do some operations on the ring buffer.

* tag 'trace-ringbuffer-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Fix reader locking when changing the sub buffer order
  ring-buffer: Fix refcount setting of boot mapped buffers

Merge tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:

- New metadata version inode_has_child_snapshots

   This fixes bugs with handling of unlinked inodes + snapshots, in
   particular when an inode is reattached after taking a snapshot;
   deleted inodes now get correctly cleaned up across snapshots.

- Disk accounting rewrite fixes
     - validation fixes for when a device has been removed
     - fix journal replay failing with "journal_reclaim_would_deadlock"

- Some more small fixes for erasure coding + device removal

- Assorted small syzbot fixes

* tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs: (27 commits)
  bcachefs: Fix sysfs warning in fstests generic/730,731
  bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev
  bcachefs: Fix kasan splat in new_stripe_alloc_buckets()
  bcachefs: Add missing validation for bch_stripe.csum_granularity_bits
  bcachefs: Fix missing bounds checks in bch2_alloc_read()
  bcachefs: fix uaf in bch2_dio_write_done()
  bcachefs: Improve check_snapshot_exists()
  bcachefs: Fix bkey_nocow_lock()
  bcachefs: Fix accounting replay flags
  bcachefs: Fix invalid shift in member_to_text()
  bcachefs: Fix bch2_have_enough_devs() for BCH_SB_MEMBER_INVALID
  bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entry
  bcachefs: Check if stuck in journal_res_get()
  closures: Add closure_wait_event_timeout()
  bcachefs: Fix state lock involved deadlock
  bcachefs: Fix NULL pointer dereference in bch2_opt_to_text
  bcachefs: Release transaction before wake up
  bcachefs: add check for btree id against max in try read node
  bcachefs: Disk accounting device validation fixes
  bcachefs: bch2_inode_or_descendents_is_open()
  ...

ring-buffer: Fix reader locking when changing the sub buffer order

The function ring_buffer_subbuf_order_set() updates each
ring_buffer_per_cpu and installs new sub buffers that match the requested
page order. This operation may be invoked concurrently with readers that
rely on some of the modified data, such as the head bit (RB_PAGE_HEAD), or
the ring_buffer_per_cpu.pages and reader_page pointers. However, no
exclusive access is acquired by ring_buffer_subbuf_order_set(). Modifying
the mentioned data while a reader also operates on them can then result in
incorrect memory access and various crashes.

Fix the problem by taking the reader_lock when updating a specific
ring_buffer_per_cpu in ring_buffer_subbuf_order_set().
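
The locking rule described above can be illustrated with a small userspace
model. This is only a sketch: PerCpuBuffer, read_page_size() and set_order()
are hypothetical names, and threading.Lock stands in for the kernel's
reader_lock. The point is that the writer publishes the new order and the new
pages together under the same lock the reader takes.

```python
import threading

PAGE_SIZE = 4096  # assumed base page size for the model


class PerCpuBuffer:
    """Toy stand-in for one per-CPU ring buffer (hypothetical model)."""

    def __init__(self, order, nr_pages=4):
        self.reader_lock = threading.Lock()
        self.nr_pages = nr_pages
        self.order = order
        self.pages = self._alloc(order)

    def _alloc(self, order):
        # Each sub buffer is PAGE_SIZE << order bytes.
        return [bytearray(PAGE_SIZE << order) for _ in range(self.nr_pages)]

    def read_page_size(self):
        # Readers take the lock, so order and pages are seen consistently.
        with self.reader_lock:
            return self.order, len(self.pages[0])

    def set_order(self, order):
        new_pages = self._alloc(order)  # allocate before taking the lock
        with self.reader_lock:
            # Swap both fields while no reader can observe a half-update.
            self.order = order
            self.pages = new_pages


buf = PerCpuBuffer(order=0)
buf.set_order(2)
print(buf.read_page_size())
```

Without the lock in set_order(), a reader could pair the new order with the
old pages, which is the kind of inconsistency the fix prevents.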

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: 8e7b58c27b3c ("ring-buffer: Just update the subbuffers when changing their allocation order")
Signed-off-by: Petr Pavlu <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

sched_ext: Remove unnecessary cpu_relax()

As described in commit b07996c7abac ("sched_ext: Don't hold
scx_tasks_lock for too long"), we're doing a cond_resched() every 32
calls to scx_task_iter_next() to avoid RCU and other stalls. That commit
also added a cpu_relax() to the codepath where we drop and reacquire the
lock, but as Waiman described in [0], cpu_relax() should only be
necessary in busy loops to avoid pounding on a cacheline (or to allow a
hypertwin to more fully utilize a core).

Let's remove the unnecessary cpu_relax().

[0]: https://lore.kernel.org/all/35b3889b-904a-4d26-981f-c8aa1557a7c7@redhat.com/
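
The batching pattern referred to above can be sketched in userspace as
follows. This is a simplified model, not the kernel code: iterate_tasks(),
visit and resched are hypothetical names, and a plain iterator is used where
the real iterator has to stay valid across the lock drop. resched() stands in
for cond_resched(); nothing like cpu_relax() is needed because the loop is
not busy-waiting on anything.

```python
import threading

BATCH = 32  # "every 32 calls" from the commit message
_END = object()  # sentinel marking the end of the task list


def iterate_tasks(tasks, lock, visit, resched):
    """Visit all tasks, dropping the lock after every BATCH tasks."""
    it = iter(tasks)
    done = False
    while not done:
        with lock:
            for _ in range(BATCH):
                task = next(it, _END)
                if task is _END:
                    done = True
                    break
                visit(task)
        if not done:
            # Lock is released here, so other lock users can make progress.
            resched()


visited = []
resched_calls = []
iterate_tasks(range(100), threading.Lock(), visited.append,
              lambda: resched_calls.append(1))
print(len(visited), len(resched_calls))
```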

Cc: Waiman Long <[email protected]>
Signed-off-by: David Vernet <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

ring-buffer: Fix refcount setting of boot mapped buffers

A ring buffer which has its buffer mapped at boot up to fixed memory
should not be freed. Other buffers can be. The ref counting setup was
wrong for both. It gave the non-mapped buffers a ref count of zero, and the
boot mapped buffer a ref count of 1. But a normally allocated buffer
should have a ref count of 1, at which point it can be removed.

Keep the ref count of a normal boot buffer with its setup ref count (do
not decrement it), and increment the fixed memory boot mapped buffer's ref
count.
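
As an illustration of the counting rule above, here is a toy model. It
assumes, as a simplification, that an instance is removable exactly when only
its creation reference is left; TraceInstance and its methods are
hypothetical names, not kernel interfaces.

```python
class TraceInstance:
    """Toy model of a tracing instance with a reference count."""

    def __init__(self, name, boot_mapped=False):
        self.name = name
        # Creation holds one reference. A boot mapped instance holds one
        # more, so its count never falls back to the removable value of 1.
        self.ref = 2 if boot_mapped else 1

    def get(self):
        self.ref += 1

    def put(self):
        self.ref -= 1

    def can_remove(self):
        # Only the creation reference remains: nobody else is using it.
        return self.ref == 1


normal = TraceInstance("normal_boot_instance")
mapped = TraceInstance("mapped_boot_instance", boot_mapped=True)

# A temporary user takes and then releases a reference on each instance.
for inst in (normal, mapped):
    inst.get()
    inst.put()

print(normal.can_remove(), mapped.can_remove())
```

With the old setup (0 for normal, 1 for boot mapped) the same get/put pair
would have left the normal instance at 0 and the mapped one removable.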

Cc: Mathieu Desnoyers <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: e645535a954ad ("tracing: Add option to use memmapped memory for trace boot instance")
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

Merge tag 'f2fs-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs fix from Jaegeuk Kim:
"An urgent fix to resolve DIO read performance regression caused by
'f2fs: fix to avoid racing in between read and OPU dio write'"

* tag 'f2fs-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: allow parallel DIO reads

Merge tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
"The main one fixes a syzbot issue due to the invalid inode type out of
  file-backed mounts. The others are minor cleanups without actual logic
  changes.

  Summary:

   - Make sure only regular inodes can be used for file-backed mounts

   - Two minor codebase cleanups"

* tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: get rid of kaddr in `struct z_erofs_maprecorder`
  erofs: get rid of z_erofs_try_to_claim_pcluster()
  erofs: ensure regular inodes for file-backed mounts

bcachefs: Fix sysfs warning in fstests generic/730,731

sysfs warns if we're removing a symlink from a directory that's no
longer in sysfs; this is triggered by fstests generic/730, which
simulates hot removal of a block device.

This patch is however not a correct fix, since checking
kobj->state_in_sysfs on a kobj owned by another subsystem is racy.

A better fix would be to add the appropriate check to
sysfs_remove_link() - and sysfs_create_link() as well.

But kobject_add_internal()/kobject_del() do not as of today have locking
that would support that.

Note that the block/holder.c code appears to be subject to this race as
well.

Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>

sched/fair: Fix external p->on_rq users

Sean noted that ever since commit 152e11f6df29 ("sched/fair: Implement
delayed dequeue") KVM's preemption notifiers have started
mis-classifying preemption vs blocking.

Notably p->on_rq is no longer sufficient to determine if a task is
runnable or blocked -- the aforementioned commit introduces tasks that
remain on the runqueue even though they will not run again, and
should be considered blocked for many cases.

Add the task_is_runnable() helper to classify things and audit all
external users of the p->on_rq state. Also add a few comments.
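
The classification can be sketched as below. This is a userspace model under
the assumption that a delayed-dequeue task is marked by a sched_delayed flag
(the p->se.sched_delayed flag that appears elsewhere in this log); the Task
class is hypothetical and the helper body is a guess at the intent, not a
copy of the kernel implementation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    on_rq: bool                  # still queued on a runqueue
    sched_delayed: bool = False  # queued only because dequeue was delayed


def task_is_runnable(p):
    """A task counts as runnable only if queued and not delayed-dequeued."""
    return p.on_rq and not p.sched_delayed


running = Task(on_rq=True)
delayed = Task(on_rq=True, sched_delayed=True)
sleeping = Task(on_rq=False)

for t in (running, delayed, sleeping):
    print(t, task_is_runnable(t))
```

A check on on_rq alone would report the delayed task as preempted rather
than blocked, which is the mis-classification Sean observed.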

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: Sean Christopherson <[email protected]>
Tested-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched/psi: Fix mistaken CPU pressure indication after corrupted task state bug

Since sched_delayed tasks remain queued even after blocking, the load
balancer can migrate them between runqueues while PSI considers them
to be asleep. As a result, it misreads the migration requeue followed
by a wakeup as a double queue:

psi: inconsistent task state! task=... cpu=... psi_flags=4 clear=. set=4

First, call psi_enqueue() after p->sched_class->enqueue_task(). A
wakeup will clear p->se.sched_delayed while a migration will not, so
psi can use that flag to tell them apart.

Then teach psi to migrate any "sleep" state when delayed-dequeue tasks
are being migrated.

Delayed-dequeue tasks can be revived by ttwu_runnable(), which will
call down with a new ENQUEUE_DELAYED. Instead of further complicating
the wakeup conditional in enqueue_task(), identify migration contexts
instead and default to wakeup handling for all other cases.

It's not just the warning in dmesg, the task state corruption causes a
permanent CPU pressure indication, which messes with workload/machine
health monitoring.

Debugged-by-and-original-fix-by: K Prateek Nayak <[email protected]>
Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Closes: https://lore.kernel.org/lkml/[email protected]/
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: K Prateek Nayak <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev

When creating a new stripe, we may reuse an existing stripe that has
some empty and some nonempty blocks.

Generally, the existing stripe won't change underneath us - except for
block sector counts, which we copy to the new key in
ec_stripe_key_update.

But the device removal path can now invalidate stripe pointers to a
device, and that can race with stripe reuse.

Change ec_stripe_key_update() to check for and resolve this
inconsistency.

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix kasan splat in new_stripe_alloc_buckets()

Update for BCH_SB_MEMBER_INVALID.

Signed-off-by: Kent Overstreet <[email protected]>

Merge tag 'hid-for-linus-2024101301' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- fix for memory corruption regression in amd_sfh driver (Basavaraj
   Natikar)

- fix for mis-reporting of BTN_TOOL_PEN and BTN_TOOL_RUBBER for AES
   sensors tools in Wacom driver (Jason Gerecke)

- fix for uninitialized variable use in intel-ish-hid driver
   (SurajSonawane2415)

- a few device-specific quirks / device ID additions

* tag 'hid-for-linus-2024101301' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: wacom: Hardcode (non-inverted) AES pens as BTN_TOOL_PEN
  HID: amd_sfh: Switch to device-managed dmam_alloc_coherent()
  HID: multitouch: Add quirk for HONOR MagicBook Art 14 touchpad
  HID: multitouch: Add support for B2402FVA track point
  HID: plantronics: Workaround for an unexcepted opposite volume key
  hid: intel-ish-hid: Fix uninitialized variable 'rv' in ish_fw_xfer_direct_dma

bcachefs: Add missing validation for bch_stripe.csum_granularity_bits

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix missing bounds checks in bch2_alloc_read()

We were checking that the alloc key was for a valid device, but not a
valid bucket.

This is the upgrade path from versions prior to bcachefs being mainlined.
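
A minimal sketch of the two checks, with hypothetical names
(alloc_key_valid(), nbuckets_by_dev) and a plain dict standing in for the
filesystem's device table:

```python
def alloc_key_valid(nbuckets_by_dev, dev, bucket):
    """Return True only if both the device and the bucket are in range."""
    nbuckets = nbuckets_by_dev.get(dev)
    if nbuckets is None:
        # Unknown device: this part was already being checked.
        return False
    # The bucket range check is the part that was missing.
    return 0 <= bucket < nbuckets


devices = {0: 1024, 1: 2048}  # device index -> number of buckets (made up)

print(alloc_key_valid(devices, 0, 100))   # valid device, valid bucket
print(alloc_key_valid(devices, 0, 5000))  # valid device, bucket out of range
print(alloc_key_valid(devices, 7, 0))     # no such device
```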

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: fix uaf in bch2_dio_write_done()

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

Linux 6.12-rc3

Merge tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
"Two fixes for Windows symlink handling"

* tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix creating native symlinks pointing to current or parent directory
  cifs: Improve creating native symlinks pointing to directory

Merge tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes for some reported problems for 6.12-rc3.
  Included in here are:

   - fix for a yurex driver regression introduced in -rc1

   - build error fix for usbg network filesystem code

   - onboard_usb_dev build fix

   - dwc3 driver fixes for reported errors

   - gadget driver fix

   - new USB storage driver quirk

   - xhci resume bugfix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  net/9p/usbg: Fix build error
  USB: yurex: kill needless initialization in yurex_read
  Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant"
  usb: xhci: Fix problem with xhci resume from suspend
  usb: misc: onboard_usb_dev: introduce new config symbol for usb5744 SMBus support
  usb: dwc3: core: Stop processing of pending events if controller is halted
  usb: dwc3: re-enable runtime PM after failed resume
  usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip
  usb: gadget: core: force synchronous registration

Merge tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here is a single driver core fix, and a .mailmap update.

  The fix is for the rust driver core bindings: it turned out that the
  from_raw binding wasn't a good idea (we don't want to pass a pointer to
  a reference counted object without actually incrementing the reference
  count). So this change fixes it up, as the from_raw binding was
  introduced in -rc1.

  The other change is a .mailmap update.

  Both have been in linux-next for a while with no reported issues"

* tag 'driver-core-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  mailmap: update mail for Fiona Behrens
  rust: device: change the from_raw() function

Merge tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:

- Fix crash in memcpy on 8xx due to dcbz workaround since recent
   changes

Thanks to Christophe Leroy.

* tag 'powerpc-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/8xx: Fix kernel DTLB miss on dcbz

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Four small fixes, three in drivers and one in the FC transport class
  to add idempotence to state setting"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_transport_fc: Allow setting rport state to current state
  scsi: wd33c93: Don't use stale scsi_pointer value
  scsi: fnic: Move flush_work initialization out of if block
  scsi: ufs: Use pre-calculated offsets in ufshcd_init_lrb()

Merge tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- Add missing dependencies on REGMAP_I2C for several drivers

- Fix memory leak in adt7475 driver

- Relabel Columbiaville temperature sensor in intel-m10-bmc-hwmon
   driver to match other sensor labels

* tag 'hwmon-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (max1668) Add missing dependency on REGMAP_I2C
  hwmon: (ltc2991) Add missing dependency on REGMAP_I2C
  hwmon: (adt7470) Add missing dependency on REGMAP_I2C
  hwmon: (adm9240) Add missing dependency on REGMAP_I2C
  hwmon: (mc34vr500) Add missing dependency on REGMAP_I2C
  hwmon: (tmp513) Add missing dependency on REGMAP_I2C
  hwmon: (adt7475) Fix memory leak in adt7475_fan_pwm_config()
  hwmon: intel-m10-bmc-hwmon: relabel Columbiaville to CVL Die Temperature

bcachefs: Improve check_snapshot_exists()

Check if we have snapshot_trees or subvolumes that refer to the snapshot
node being reconstructed, and use them.

With this, the kill_btree_root test that blows away the snapshots btree
now passes, and we're able to successfully reconstruct.

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix bkey_nocow_lock()

This fixes an assertion pop in nocow_locking.c

00243 kernel BUG at fs/bcachefs/nocow_locking.c:41!
00243 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
00243 Modules linked in:
00243 Hardware name: linux,dummy-virt (DT)
00243 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00244 pc : bch2_bucket_nocow_unlock (/home/testdashboard/linux-7/fs/bcachefs/nocow_locking.c:41)
00244 lr : bkey_nocow_lock (/home/testdashboard/linux-7/fs/bcachefs/data_update.c:79)
00244 sp : ffffff80c82373b0
00244 x29: ffffff80c82373b0 x28: ffffff80e08958c0 x27: ffffff80e0880000
00244 x26: ffffff80c8237a98 x25: 00000000000000a0 x24: ffffff80c8237ab0
00244 x23: 00000000000000c0 x22: 0000000000000008 x21: 0000000000000000
00244 x20: ffffff80c8237a98 x19: 0000000000000018 x18: 0000000000000000
00244 x17: 0000000000000000 x16: 000000000000003f x15: 0000000000000000
00244 x14: 0000000000000008 x13: 0000000000000018 x12: 0000000000000000
00244 x11: 0000000000000000 x10: ffffff80e0880000 x9 : ffffffc0803ac1a4
00244 x8 : 0000000000000018 x7 : ffffff80c8237a88 x6 : ffffff80c8237ab0
00244 x5 : ffffff80e08988d0 x4 : 00000000ffffffff x3 : 0000000000000000
00244 x2 : 0000000000000004 x1 : 0003000000000d1e x0 : ffffff80e08988c0
00244 Call trace:
00244 bch2_bucket_nocow_unlock (/home/testdashboard/linux-7/fs/bcachefs/nocow_locking.c:41)
00245 bch2_data_update_init (/home/testdashboard/linux-7/fs/bcachefs/data_update.c:627 (discriminator 1))
00245 promote_alloc.isra.0 (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:242 /home/testdashboard/linux-7/fs/bcachefs/io_read.c:304)
00245 __bch2_read_extent (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:949)
00246 __bch2_read (/home/testdashboard/linux-7/fs/bcachefs/io_read.c:1215)
00246 bch2_direct_IO_read (/home/testdashboard/linux-7/fs/bcachefs/fs-io-direct.c:132)
00246 bch2_read_iter (/home/testdashboard/linux-7/fs/bcachefs/fs-io-direct.c:201)
00247 aio_read.constprop.0 (/home/testdashboard/linux-7/fs/aio.c:1602)
00247 io_submit_one.constprop.0 (/home/testdashboard/linux-7/fs/aio.c:2003 /home/testdashboard/linux-7/fs/aio.c:2052)
00248 __arm64_sys_io_submit (/home/testdashboard/linux-7/fs/aio.c:2111 /home/testdashboard/linux-7/fs/aio.c:2081 /home/testdashboard/linux-7/fs/aio.c:2081)
00248 invoke_syscall.constprop.0 (/home/testdashboard/linux-7/arch/arm64/include/asm/syscall.h:61 /home/testdashboard/linux-7/arch/arm64/kernel/syscall.c:54)
00248 ========= FAILED TIMEOUT tiering_variable_buckets_replicas in 1200s

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix accounting replay flags

BCH_TRANS_COMMIT_journal_reclaim without BCH_WATERMARK_reclaim means
"return an error if low on journal space" - but accounting replay must
succeed.

Fixes https://github.com/koverstreet/bcachefs/issues/656

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix invalid shift in member_to_text()

Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix bch2_have_enough_devs() for BCH_SB_MEMBER_INVALID

This fixes a kasan splat in the ec device removal tests.

Signed-off-by: Kent Overstreet <[email protected]>

RDMA/bnxt_re: Fix the GID table length

GID table length is reported by FW. The gid index which is passed to the
driver during modify_qp/create_ah is restricted by the sgid_index field of
struct ib_global_route. sgid_index is u8 and the max sgid possible is
256.

Each GID entry in HW will have 2 GID entries in the kernel gid table. So
we can support twice the gid table size reported by FW. Also, restrict the
max GID to 256.
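
The resulting arithmetic, as a small sketch (gid_table_len() and MAX_SGID are
hypothetical names used only for illustration):

```python
MAX_SGID = 256  # cap stated above; sgid_index is a u8


def gid_table_len(fw_reported):
    """Kernel gid table length: twice the FW value, capped at MAX_SGID."""
    return min(2 * fw_reported, MAX_SGID)


for fw in (64, 128, 256):
    print(fw, gid_table_len(fw))
```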

Fixes: 847b97887ed4 ("RDMA/bnxt_re: Restrict the max_gids to 256")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Kalesh AP <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages

Avoid memory corruption while setting up Level-2 PBL pages for the non MR
resources when num_pages > 256K.

There will be a single PDE page address (contiguous pages in the case of >
PAGE_SIZE), but the current logic assumes multiple pages, leading to invalid
memory access after 256K PBL entries in the PDE.

Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Bhargava Chenna Marreddy <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Change the sequence of updating the CQ toggle value

Currently the CQ toggle value in the shared page (read by the userlib) is
updated as part of the cqn_handler. There is a potential race in which the
application calls the CQ ARM doorbell immediately and uses the old
toggle value.

Change the sequence of updating CQ toggle value to update in the
bnxt_qplib_service_nq function immediately after reading the toggle value
to be in sync with the HW updated value.

Fixes: e275919d9669 ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Chandramohan Akula <[email protected]>
Reviewed-by: Selvin Xavier <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Fix an error path in bnxt_re_add_device

In bnxt_re_add_device(), when registering the netdev notifier fails, the
driver does not unregister the IB device in the error cleanup path. Also,
remove the duplicate cleanup in the error path of bnxt_re_probe.

Fixes: 94a9dc6ac8f7 ("RDMA/bnxt_re: Group all operations under add_device and remove_device")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Kalesh AP <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop

Driver waits indefinitely for the fifo occupancy to go below a threshold
as soon as the pacing interrupt is received. This can cause a soft lockup on
one of the processors if the doorbell (DB) rate is very high.

Add a loop count for FPGA and exit __wait_for_fifo_occupancy_below_th
if the loop is taking too long. Pacing will continue until the
occupancy is below the threshold. This is ensured by the checks in
bnxt_re_pacing_timer_exp and further scheduling the work for pacing based
on the fifo occupancy.
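
The bounded wait can be sketched like this. It is a generic model of the
pattern only: wait_for_fifo_below(), read_occupancy and max_loops are
hypothetical names, and the values used are made up.

```python
def wait_for_fifo_below(read_occupancy, threshold, max_loops):
    """Poll the occupancy at most max_loops times.

    Returns True once the occupancy is below the threshold, or False if
    the loop budget runs out, so the caller can retry later instead of
    spinning on the CPU forever.
    """
    for _ in range(max_loops):
        if read_occupancy() < threshold:
            return True
    return False


# Occupancy drains over time: the wait succeeds on the third read.
samples = iter([10, 8, 3])
print(wait_for_fifo_below(lambda: next(samples), threshold=5, max_loops=10))

# Occupancy never drains: the wait gives up after max_loops reads.
print(wait_for_fifo_below(lambda: 100, threshold=5, max_loops=4))
```

Returning False rather than looping is safe here only because, as described
above, the timer and work scheduling keep pacing active until the occupancy
really is below the threshold.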

Fixes: 2ad4e6303a6d ("RDMA/bnxt_re: Implement doorbell pacing algorithm")
Link: https://patch.msgid.link/r/[email protected]
Reviewed-by: Kalesh AP <[email protected]>
Reviewed-by: Chandramohan Akula <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Fix a possible NULL pointer dereference

There is a possibility of a NULL pointer dereference in the failure path
of bnxt_re_add_device(). To address that, moved the update of
"rdev->adev" to bnxt_re_dev_add().

Fixes: dee3da3422d5 ("RDMA/bnxt_re: Change aux driver data to en_info to hold more information")
Link: https://patch.msgid.link/r/[email protected]
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/linux-rdma/CAH-L+nMCwymKGqf5pd8-FZNhxEkDD=kb6AoCaE6fAVi7b3e5Qw@mail.gmail.com/T/#t
Signed-off-by: Kalesh AP <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Return more meaningful error

When the HWRM command fails, driver currently returns -EFAULT (Bad
address). This does not look correct.

Modified to return -EIO (I/O error).

Fixes: cc1ec769b87c ("RDMA/bnxt_re: Fixing the Control path command and response handling")
Fixes: 65288a22ddd8 ("RDMA/bnxt_re: use shadow qd while posting non blocking rcfw command")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Kalesh AP <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Fix incorrect dereference of srq in async event

Currently driver is not getting correct srq. Dereference only if qplib has
a valid srq.

Fixes: b02fd3f79ec3 ("RDMA/bnxt_re: Report async events and errors")
Link: https://patch.msgid.link/r/[email protected]
Reviewed-by: Saravanan Vajravel <[email protected]>
Reviewed-by: Chandramohan Akula <[email protected]>
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Fix out of bound check

Driver exports pacing stats only on GenP5 and P7 adapters. But while
parsing the pacing stats, driver has a check for "rdev->dbr_pacing". This
caused a trace when KASAN is enabled.

BUG: KASAN: slab-out-of-bounds in bnxt_re_get_hw_stats+0x2b6a/0x2e00 [bnxt_re]
Write of size 8 at addr ffff8885942a6340 by task modprobe/4809

Fixes: 8b6573ff3420 ("bnxt_re: Update the debug counters for doorbell pacing")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Kalesh AP <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/bnxt_re: Fix the max CQ WQEs for older adapters

Older adapters don't support the MAX CQ WQEs reported by older FW. So
restrict the value reported to 1M always for older adapters.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Abhishek Mohapatra <[email protected]>
Reviewed-by: Chandramohan Akula <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

Merge tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
"Fixes for build, run-time errors, and reporting errors:

   - ftrace: regression test for a kernel crash when running function
     graph tracing and then enabling function profiler.

   - rseq: fix for mm_cid test failure.

   - vDSO:
      - fixes to reporting skip and other error conditions
      - changes to unconditionally build chacha and getrandom tests on all
        architectures to make it easier for them to run in CIs
      - fix build error by including sched.h to bring in CLONE_NEWTIME define"

* tag 'linux_kselftest-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  ftrace/selftest: Test combination of function_graph tracer and function profiler
  selftests/rseq: Fix mm_cid test failure
  selftests: vDSO: Explicitly include sched.h
  selftests: vDSO: improve getrandom and chacha error messages
  selftests: vDSO: unconditionally build getrandom test
  selftests: vDSO: unconditionally build chacha test

Merge tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Disable kunit tests for arm64+ACPI

- Fix refcount issue in kunit tests

- Drop constraints on non-conformant 'interrupt-map' in fsl,ls-extirq

- Drop type ref on 'msi-parent' in fsl,qoriq-mc binding

- Move elgin,jg10309-01 to its own binding from trivial-devices

* tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: Skip kunit tests when arm64+ACPI doesn't populate root node
  of: Fix unbalanced of node refcount and memory leaks
  dt-bindings: interrupt-controller: fsl,ls-extirq: workaround wrong interrupt-map number
  dt-bindings: misc: fsl,qoriq-mc: remove ref for msi-parent
  dt-bindings: display: elgin,jg10309-01: Add own binding

Merge tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev platform driver fix from Helge Deller:
"Switch fbdev drivers back to struct platform_driver::remove()

  Now that 'remove()' has been converted to the sane new API, there's
  no reason for the 'remove_new()' use, so this converts back to the
  traditional and simpler name.

  See commits

     5c5a7680e67b ("platform: Provide a remove callback that returns no value")
     0edb555a65d1 ("platform: Make platform_driver::remove() return void")

  for background to this all"

* tag 'fbdev-for-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: Switch back to struct platform_driver::remove()

Merge tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix clock handle leak in probe() error path in gpio-aspeed

- add a dummy register read to ensure the write actually completed

* tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: aspeed: Use devm_clk api to manage clock source
  gpio: aspeed: Add the flush write to ensure the write complete.

Merge tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
"Localio Bugfixes:
   - remove duplicated include in localio.c
   - fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
   - fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
   - fix nfsd_file tracepoints to handle NULL rqstp pointers

  Other Bugfixes:
   - fix program selection loop in svc_process_common
   - fix integer overflow in decode_rc_list()
   - prevent NULL-pointer dereference in nfs42_complete_copies()
   - fix CB_RECALL performance issues when using a large number of
     delegations"

* tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: remove revoked delegation from server's delegation list
  nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstp
  nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
  nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
  NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()
  SUNRPC: Fix integer overflow in decode_rc_list()
  sunrpc: fix prog selection loop in svc_process_common
  nfs: Remove duplicated include in localio.c

Merge tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU fix from Neeraj Upadhyay:
"Fix rcuog kthread wakeup invocation from softirq context on a CPU
  which has been marked offline.

  This can happen when new callbacks are enqueued from a softirq on an
  offline CPU before it calls rcutree_report_cpu_dead(). When this
  happens on NOCB configuration, the rcuog wake-up is deferred through
  an IPI to an online CPU. This is done to avoid call into the scheduler
  which can risk arming the RT-bandwidth after hrtimers have been
  migrated out and disabled.

  However, doing IPI call from softirq is not allowed: Fix this by
  forcing deferred rcuog wakeup through the NOCB timer when the CPU is
  offline"

* tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  rcu/nocb: Fix rcuog wake-up from offline softirq

Merge tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
"A fix for topology information of Xen PV guests"

* tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: mark boot CPU of PV guest in MSR_IA32_APICBASE

ftrace/selftest: Test combination of function_graph tracer and function profiler

Masami reported a bug when running function graph tracing then the
function profiler. The following commands would cause a kernel crash:

  # cd /sys/kernel/tracing/
  # echo function_graph > current_tracer
  # echo 1 > function_profile_enabled

In that order. Create a test that exercises the two together to make sure
this does not come back as a regression.

Link: https://lore.kernel.org/172398528350.293426.8347220120333730248.stgit@devnote2
Link: https://lore.kernel.org/all/[email protected]/
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>

selftests/rseq: Fix mm_cid test failure

Adapt the rseq.c/rseq.h code to follow GNU C library changes introduced by:

glibc commit 2e456ccf0c34 ("Linux: Make __rseq_size useful for feature detection (bug 31965)")

Without this fix, rseq selftests for mm_cid fail:

./run_param_test.sh
Default parameters
Running test spinlock
Running compare-twice test spinlock
Running mm_cid test spinlock
Error: cpu id getter unavailable

Fixes: 18c2355838e7 ("selftests/rseq: Implement rseq mm_cid field support")
Signed-off-by: Mathieu Desnoyers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
CC: Boqun Feng <[email protected]>
CC: "Paul E. McKenney" <[email protected]>
Cc: Shuah Khan <[email protected]>
CC: Carlos O'Donell <[email protected]>
CC: Florian Weimer <[email protected]>
CC: [email protected]
CC: [email protected]
Signed-off-by: Shuah Khan <[email protected]>

Merge tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Explicitly have a mshot_finished condition for IORING_OP_RECV in
   multishot mode, similarly to what IORING_OP_RECVMSG has. This doesn't
   fix a bug right now, but it makes it harder to actually have a bug
   here if a request takes multiple iterations to finish.

- Fix handling of retry of read/write of !FMODE_NOWAIT files. If they
   are pollable, that's all we need.

* tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux:
  io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT
  io_uring/rw: fix cflags posting for single issue multishot read

Merge tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These address two issues in the TPMI module of the Intel RAPL power
  capping driver and one issue in the processor part of the Intel
  int340x thermal driver, update a CPU ID list and register definitions
  needed for RAPL PL4 support and remove some unused code.

  Specifics:

   - Fix the TPMI_RAPL_REG_DOMAIN_INFO register offset in the TPMI part
     of the Intel RAPL power capping driver, make it ignore minor
     hardware version mismatches (which only indicate exposing
     additional features) and update register definitions in it to
     enable PL4 support (Zhang Rui)

   - Add Arrow Lake-U to the list of processors supporting PL4 in the
     MSR part of the Intel RAPL power capping driver (Sumeet Pawnikar)

   - Remove excess pci_disable_device() calls from the processor part of
     the int340x thermal driver to address a warning triggered during
     module unload and remove unused CPU hotplug code related to RAPL
     support from it (Zhang Rui)"

* tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: int340x: processor: Add MMIO RAPL PL4 support
  thermal: intel: int340x: processor: Remove MMIO RAPL CPU hotplug support
  powercap: intel_rapl_msr: Add PL4 support for Arrowlake-U
  powercap: intel_rapl_tpmi: Ignore minor version change
  thermal: intel: int340x: processor: Fix warning during module unload
  powercap: intel_rapl_tpmi: Fix bogus register reading

Merge tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
"Address possible use-after-free scenarios during the processing of
  thermal netlink commands and during thermal zone removal (Rafael
  Wysocki)"

* tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Free tzp copy along with the thermal zone
  thermal: core: Reference count the zone in thermal_zone_get_by_id()

Merge tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"Reduce the number of ACPI IRQ override DMI quirks by combining quirks
  that cover similar systems while making them cover additional models
  at the same time (Hans de Goede)"

* tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: resource: Fold Asus Vivobook Pro N6506M* DMI quirks together
  ACPI: resource: Fold Asus ExpertBook B1402C* and B1502C* DMI quirks together
  ACPI: resource: Make Asus ExpertBook B2502 matches cover more models
  ACPI: resource: Make Asus ExpertBook B2402 matches cover more models

Merge tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:
"pmdomain core:
   - Fix alloc/free in dev_pm_domain_attach|detach_list()

  pmdomain providers:
   - qcom: Fix the return of uninitialized variable

  pmdomain consumers:
   - drm/tegra/gr3d: Revert conversion to dev_pm_domain_attach|detach_list()

  OPP core:
   - Fix error code in dev_pm_opp_set_config()"

* tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  PM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()
  Revert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"
  pmdomain: qcom-cpr: Fix the return of uninitialized variable
  OPP: fix error code in dev_pm_opp_set_config()

Merge tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"MMC core:
   - Prevent splat from warning when setting maximum DMA segment

  MMC host:
   - mvsdio: Drop sg_miter support for PIO as it didn't work
   - sdhci-of-dwcmshc: Prevent stale interrupt for the T-Head 1520
     variant"

* tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling
  Revert "mmc: mvsdio: Use sg_miter for PIO"
  mmc: core: Only set maximum DMA segment size if DMA is supported

Merge tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fixes from Niklas Cassel:

- Fix a hibernate regression where the disk was needlessly spun down
   and then immediately spun up both when entering and when resuming
   from hibernation (me)

- Update the MAINTAINERS file to remove remnants of Jens'
   maintainership of libata (Damien)

* tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata: Update MAINTAINERS file
  ata: libata: avoid superfluous disk spin down + spin up during hibernation

Merge tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes haul for drm, lots of small fixes all over, amdgpu, xe
  lead the way, some minor nouveau and radeon fixes, and then a bunch of
  misc all over.

  Nothing too scary or out of the ordinary.

  sched:
   - Avoid leaking lockdep map

  fbdev-dma:
   - Only clean up deferred I/O if instantiated

  amdgpu:
   - Fix invalid UBSAN warnings
   - Fix artifacts in MPO transitions
   - Hibernation fix

  amdkfd:
   - Fix an eviction fence leak

  radeon:
   - Add late register for connectors
   - Always set GEM function pointers

  i915:
   - HDCP refcount fix

  nouveau:
   - dmem: Fix privileged error in copy engine channel; Fix possible
     data leak in migrate_to_ram()
   - gsp: Fix coding style

  v3d:
   - Stop active perfmon before destroying it

  vc4:
   - Stop active perfmon before destroying it

  xe:
   - Drop GuC submit_wq pool
   - Fix error checking with xa_store()
   - Fix missing freq restore on GSC load error
   - Fix wedged_mode file permission
   - Fix use-after-free in ct communication"

* tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel:
  drm/fbdev-dma: Only cleanup deferred I/O if necessary
  drm/xe: Make wedged_mode debugfs writable
  drm/xe: Restore GT freq on GSC load error
  drm/xe/guc_submit: fix xa_store() error checking
  drm/xe/ct: fix xa_store() error checking
  drm/xe/ct: prevent UAF in send_recv()
  drm/radeon: always set GEM function pointer
  nouveau/dmem: Fix vulnerability in migrate_to_ram upon copy error
  nouveau/dmem: Fix privileged error in copy engine channel
  drm/amd/display: fix hibernate entry for DCN35+
  drm/amd/display: Clear update flags after update has been applied
  drm/amdgpu: partially revert powerplay `__counted_by` changes
  drm/radeon: add late_register for connector
  drm/amdkfd: Fix an eviction fence leak
  drm/vc4: Stop the active perfmon before being destroyed
  drm/v3d: Stop the active perfmon before being destroyed
  drm/i915/hdcp: fix connector refcounting
  drm/nouveau/gsp: remove extraneous ; after mutex
  drm/xe: Drop GuC submit_wq pool
  drm/sched: Use drm sched lockdep map for submit_wq

btrfs: fix uninitialized pointer free on read_alloc_one_name() error

The function read_alloc_one_name() does not initialize the name field of
the passed fscrypt_str struct if kmalloc fails to allocate the
corresponding buffer. Thus, it is not guaranteed that
fscrypt_str.name is initialized when freeing it.

This is a follow-up to the linked patch that fixes the remaining
instances of the bug introduced by commit e43eec81c516 ("btrfs: use
struct qstr instead of name and namelen pairs").

Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Fixes: e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
CC: [email protected] # 6.1+
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Roi Martin <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: send: cleanup unneeded return variable in changed_verity()

As all changed_* functions need to return something, just return 0
directly here, as the verity status is passed via the context.

Reported by LKP: fs/btrfs/send.c:6877:5-8: Unneeded variable: "ret". Return "0" on line 6883

Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Christian Heusel <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: fix uninitialized pointer free in add_inode_ref()

The add_inode_ref() function does not initialize the "name" struct when
it is declared. If any of the following calls to read_one_inode()
returns NULL,

	dir = read_one_inode(root, parent_objectid);
	if (!dir) {
		ret = -ENOENT;
		goto out;
	}

	inode = read_one_inode(root, inode_objectid);
	if (!inode) {
		ret = -EIO;
		goto out;
	}

then "name.name" would be freed on "out" before being initialized.

out:
	...
	kfree(name.name);

This issue was reported by Coverity with CID 1526744.
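
The failure mode and one common remedy can be shown with a small userspace
sketch. It is not the btrfs code: "struct name_buf", lookup_and_copy() and the
"fail_early" switch are hypothetical names made up for the example, and
malloc()/free() stand in for kmalloc()/kfree(). The sketch assumes the remedy
is to zero-initialize the struct at declaration so that the shared "out" path
only ever frees NULL or a valid buffer.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's name/length pair. */
struct name_buf {
	char *name;
	size_t len;
};

/*
 * Returns 0 on success or a negative errno. "fail_early" simulates a lookup
 * that fails before the name buffer has been allocated.
 */
static int lookup_and_copy(const char *src, int fail_early)
{
	struct name_buf name = { 0 };	/* zero-init: name.name starts as NULL */
	int ret = 0;

	if (fail_early) {
		ret = -ENOENT;
		goto out;
	}

	name.len = strlen(src);
	name.name = malloc(name.len + 1);
	if (!name.name) {
		ret = -ENOMEM;
		goto out;
	}
	memcpy(name.name, src, name.len + 1);

out:
	free(name.name);	/* free(NULL) is a no-op, as is kfree(NULL) */
	return ret;
}
```

Without the "= { 0 }" initializer, the early "goto out" would pass an
indeterminate pointer to free(), which is the pattern described above.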

Fixes: e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
CC: [email protected] # 6.6+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Roi Martin <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

RDMA/srpt: Make slab cache names unique

Since commit 4c39529663b9 ("slab: Warn on duplicate cache names when
DEBUG_VM=y"), slab complains about duplicate cache names. Hence this
patch. The approach is as follows:
- Maintain an xarray with the slab size as index and a reference count
  and a kmem_cache pointer as contents. Use srpt-${slab_size} as kmem
  cache name.
- Use 512-byte alignment for all slabs instead of only for some of the
  slabs.
- Increment the reference count instead of calling kmem_cache_create().
- Decrement the reference count instead of calling kmem_cache_destroy().
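
The reference-counting scheme in the list above can be sketched in userspace
C. This is only an illustration under stated assumptions: a small fixed table
replaces the xarray, and "struct cache_entry", cache_get() and cache_put() are
hypothetical names, not the srpt driver's functions. Only the "srpt-<size>"
naming pattern is taken from the description.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_CACHES 8

/* Hypothetical userspace model of one shared cache. */
struct cache_entry {
	size_t size;		/* lookup key: object size */
	unsigned int refs;	/* number of users; 0 means the slot is free */
	char name[32];		/* unique name derived from the size */
};

static struct cache_entry caches[MAX_CACHES];

/* Return the entry for this size, creating it on first use. NULL if full. */
static struct cache_entry *cache_get(size_t size)
{
	struct cache_entry *free_slot = NULL;

	for (int i = 0; i < MAX_CACHES; i++) {
		if (caches[i].refs && caches[i].size == size) {
			caches[i].refs++;	/* reuse, no duplicate name */
			return &caches[i];
		}
		if (!caches[i].refs && !free_slot)
			free_slot = &caches[i];
	}
	if (!free_slot)
		return NULL;

	free_slot->size = size;
	free_slot->refs = 1;
	snprintf(free_slot->name, sizeof(free_slot->name), "srpt-%zu", size);
	return free_slot;
}

/* Drop one reference; the slot becomes reusable when the count hits zero. */
static void cache_put(struct cache_entry *e)
{
	if (e && e->refs)
		e->refs--;
}
```

Two users asking for the same size share one entry, so a name such as
"srpt-512" is only ever registered once.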

Fixes: 5dabcd0456d7 ("RDMA/srpt: Add support for immediate data")
Link: https://patch.msgid.link/r/[email protected]
Reported-by: Shinichiro Kawasaki <[email protected]>
Closes: https://lore.kernel.org/linux-block/xpe6bea7rakpyoyfvspvin2dsozjmjtjktpph7rep3h25tv7fb@ooz4cu5z6bq6/
Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Tested-by: Shin'ichiro Kawasaki <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/irdma: Fix misspelling of "accept*"

There is "accept*" misspelled as "accpet*" in the comments. Fix the
spelling.

Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Alexander Zubkov <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP

ip_dev_find() always returns the real net_device address, whether traffic is
running on a vlan or on a real device. If traffic is over a vlan, filling the
endpoint structure with the real ndev and then attempting to send a connect
request results in an RDMA_CM_EVENT_UNREACHABLE error. This patch fixes the
issue by using vlan_dev_real_dev().

Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Anumula Murali Mohan Reddy <[email protected]>
Signed-off-by: Potnuri Bharat Teja <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/siw: Add sendpage_ok() check to disable MSG_SPLICE_PAGES

While running ISER over SIW, the initiator machine encounters a warning
from skb_splice_from_iter() indicating that a slab page is being used in
send_page. To address this, it is better to add a sendpage_ok() check
within the driver itself, and if it returns 0, then MSG_SPLICE_PAGES flag
should be disabled before entering the network stack.
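
The check described above can be modelled in a few lines of userspace C. This
is a sketch, not the siw change itself: "struct page_model",
SKETCH_MSG_SPLICE_PAGES and both helpers are hypothetical, and the flag value
is a placeholder rather than the kernel's. The predicate assumes the usual
sendpage_ok() rule, namely that a page is acceptable when it is not a slab
page and its reference count is at least 1.

```c
#include <stdbool.h>

/* Hypothetical model of the page properties that matter here. */
struct page_model {
	bool is_slab;
	int refcount;
};

#define SKETCH_MSG_SPLICE_PAGES 0x1u	/* placeholder bit, not the kernel value */

/* Models the sendpage_ok() idea: not a slab page and refcount >= 1. */
static bool page_ok_to_splice(const struct page_model *p)
{
	return !p->is_slab && p->refcount >= 1;
}

/* Clear the splice flag if any page in the batch is unsuitable. */
static unsigned int fixup_send_flags(unsigned int flags,
				     const struct page_model *pages, int n)
{
	for (int i = 0; i < n; i++) {
		if (!page_ok_to_splice(&pages[i]))
			return flags & ~SKETCH_MSG_SPLICE_PAGES;
	}
	return flags;
}
```

Doing the check before the send call means the network stack never sees a
slab page with the splice flag set; the data is then copied instead.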

A similar issue has been discussed for NVMe in this thread:
https://lore.kernel.org/all/20240530142417 [email protected]/

  WARNING: CPU: 0 PID: 5342 at net/core/skbuff.c:7140 skb_splice_from_iter+0x173/0x320
  Call Trace:
   tcp_sendmsg_locked+0x368/0xe40
   siw_tx_hdt+0x695/0xa40 [siw]
   siw_qp_sq_process+0x102/0xb00 [siw]
   siw_sq_resume+0x39/0x110 [siw]
   siw_run_sq+0x74/0x160 [siw]
   kthread+0xd2/0x100
   ret_from_fork+0x34/0x40
   ret_from_fork_asm+0x1a/0x30

Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Showrya M N <[email protected]>
Signed-off-by: Potnuri Bharat Teja <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

btrfs: use sector numbers as keys for the dirty extents xarray

We are using the logical address ("bytenr") of an extent as the key for
qgroup records in the dirty extents xarray. This is a problem because the
xarrays use "unsigned long" for keys/indices, meaning that on a 32-bit
platform any extent starting at or beyond 4G is truncated. That limit is far
too low, as virtually everyone is using storage with more than 4G of
space. This means a "bytenr" of 4G gets truncated to 0, and so do 8G and
16G for example, resulting in incorrect qgroup accounting.

Fix this by using sector numbers as keys instead, that is, using keys that
match the logical address right shifted by fs_info->sectorsize_bits, which
is what we do for the fs_info->buffer_radix that tracks extent buffers
(radix trees also use an "unsigned long" type for keys). This also makes
the index space more dense which helps optimize the xarray (as mentioned
at Documentation/core-api/xarray.rst).
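
The truncation and the effect of the fix can be illustrated with a small
userspace sketch. It is not the btrfs code: "key32_t" stands in for a 32-bit
"unsigned long", and the helper names and the 4 KiB sector size used in the
note below (sectorsize_bits = 12) are assumptions made for the example.

```c
#include <stdint.h>

/* Stand-in for "unsigned long" on a 32-bit platform (assumption). */
typedef uint32_t key32_t;

/* Old scheme: the byte address is the key, so the upper 32 bits are lost. */
static key32_t key_from_bytenr(uint64_t bytenr)
{
	return (key32_t)bytenr;
}

/* New scheme: the key is the sector number, bytenr >> sectorsize_bits. */
static key32_t key_from_sector(uint64_t bytenr, unsigned int sectorsize_bits)
{
	return (key32_t)(bytenr >> sectorsize_bits);
}
```

With byte addresses as keys, 4G and 8G both map to key 0. With sector numbers
and 4 KiB sectors they map to 1048576 and 2097152, and a 32-bit key does not
wrap until the logical address reaches 2^44 bytes (16 TiB).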

Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

ksmbd: add support for supplementary groups

Even though a system user has a supplementary group, it gets
NT_STATUS_ACCESS_DENIED when attempting to create a file or directory.
This patch adds KSMBD_EVENT_LOGIN_REQUEST_EXT/RESPONSE_EXT netlink events
to get the supplementary groups list. The new netlink events don't break
backward compatibility when using old ksmbd-tools.

Co-developed-by: Atte Heikkilä <[email protected]>
Signed-off-by: Atte Heikkilä <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

f2fs: allow parallel DIO reads

This fixes a regression which prevents parallel DIO reads.

Fixes: 0cac51185e65 ("f2fs: fix to avoid racing in between read and OPU dio write")
Reviewed-by: Daeho Jeong <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

HID: wacom: Hardcode (non-inverted) AES pens as BTN_TOOL_PEN

Unlike EMR tools which encode type information in their tool ID, tools
for AES sensors are all "generic pens". It is inappropriate to make use
of the wacom_intuos_get_tool_type function when dealing with these kinds
of devices. Instead, we should only ever report BTN_TOOL_PEN or
BTN_TOOL_RUBBER, depending on the state of the Eraser and Invert
bits.

Reported-by: Daniel Jutz <[email protected]>
Closes: https://lore.kernel.org/linux-input/[email protected]/
Bisected-by: Christian Heusel <[email protected]>
Fixes: 9c2913b962da ("HID: wacom: more appropriate tool type categorization")
Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/1041
Link: https://github.com/linuxwacom/input-wacom/issues/440
Signed-off-by: Jason Gerecke <[email protected]>
Cc: [email protected]
Acked-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

sched/core: Dequeue PSI signals for blocked tasks that are delayed

psi_dequeue() for a blocked task expects psi_sched_switch() to clear
the TSK_.*RUNNING PSI flags and set the TSK_IOWAIT flags. However,
psi_sched_switch() uses "!task_on_rq_queued(prev)" to detect if the task
is blocked or still runnable, which is no longer valid with DELAY_DEQUEUE
since a blocking task can be left queued on the runqueue.

This can lead to PSI splats similar to:

psi: inconsistent task state! task=... cpu=... psi_flags=4 clear=0 set=4

when the task is requeued since the TSK_RUNNING flag was not cleared
when the task was blocked.

Explicitly communicate that the task was blocked to psi_sched_switch()
even if it was delayed and is still on the runqueue.

[ prateek: Broke off the relevant part from [1], commit message ]

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Closes: https://lore.kernel.org/lkml/[email protected]/
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Not-yet-signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: K Prateek Nayak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Johannes Weiner <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/

sched: Fix delayed_dequeue vs switched_from_fair()

Commit 2e0199df252a ("sched/fair: Prepare exit/cleanup paths for delayed_dequeue")
and its follow up fixes try to deal with a rather unfortunate
situation where a task is enqueued in a new class, even though it
shouldn't have been. Mostly because the existing ->switched_to/from()
hooks are in the wrong place for this case.

This all led to Paul being able to trigger failures at something like
once per 10k CPU hours of RCU torture.

For now, do the ugly thing and move the code to the right place by
ignoring the switch hooks.

Note: Clean up the whole sched_class::switch*_{to,from}() thing.

Fixes: 2e0199df252a ("sched/fair: Prepare exit/cleanup paths for delayed_dequeue")
Reported-by: Paul E. McKenney <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched/core: Disable page allocation in task_tick_mm_cid()

With KASAN and PREEMPT_RT enabled, calling task_work_add() in
task_tick_mm_cid() may cause the following splat.

[   63.696416] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[   63.696416] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 610, name: modprobe
[   63.696416] preempt_count: 10001, expected: 0
[   63.696416] RCU nest depth: 1, expected: 1

This problem is caused by the following call trace.

  sched_tick() [ acquire rq->__lock ]
   -> task_tick_mm_cid()
    -> task_work_add()
     -> __kasan_record_aux_stack()
      -> kasan_save_stack()
       -> stack_depot_save_flags()
        -> alloc_pages_mpol_noprof()
         -> __alloc_pages_noprof()
          -> get_page_from_freelist()
           -> rmqueue()
            -> rmqueue_pcplist()
             -> __rmqueue_pcplist()
              -> rmqueue_bulk()
               -> rt_spin_lock()
The rq lock is a raw_spinlock_t. We can't sleep while holding
it. IOW, we can't call alloc_pages() in stack_depot_save_flags().

The task_tick_mm_cid() function with its task_work_add() call was
introduced by commit 223baf9d17f2 ("sched: Fix performance regression
introduced by mm_cid") in v6.4 kernel.

Fortunately, there is a kasan_record_aux_stack_noalloc() variant that
calls stack_depot_save_flags() while not allowing it to allocate
new pages.  To allow task_tick_mm_cid() to use task_work without
page allocation, a new TWAF_NO_ALLOC flag is added to enable calling
kasan_record_aux_stack_noalloc() instead of kasan_record_aux_stack()
if set. The task_tick_mm_cid() function is modified to add this new flag.
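
How a flag bit carried alongside the notification mode can select the
non-allocating path is sketched below. The encoding is an assumption made for
illustration: SKETCH_MODE_MASK, SKETCH_NO_ALLOC, "struct record_result" and
split_notify() are hypothetical, and the kernel's actual values and helpers
may differ.

```c
#include <stdbool.h>

/* Placeholder encodings for this sketch; the kernel's values may differ. */
#define SKETCH_MODE_MASK	0x0fu
#define SKETCH_NO_ALLOC		0x10u

struct record_result {
	unsigned int mode;	/* notification mode with flag bits stripped */
	bool may_allocate;	/* whether stack recording may allocate pages */
};

/* Separate the mode from the flag bits and decide which recorder to use. */
static struct record_result split_notify(unsigned int notify)
{
	struct record_result r;

	r.may_allocate = !(notify & SKETCH_NO_ALLOC);
	r.mode = notify & SKETCH_MODE_MASK;
	return r;
}
```

A caller that holds a raw spinlock would OR the no-alloc bit into its usual
mode; every other caller is unaffected because the bit defaults to clear.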

The possible downside is a missing stack trace in a KASAN report in the
rare case where recording it would have required a new page allocation
while task_work_add() is called with TWAF_NO_ALLOC set.

Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched/deadline: Use hrtick_enabled_dl() before start_hrtick_dl()

The deadline server code moved one of the start_hrtick_dl() calls
but dropped the dl specific hrtick_enabled check. This causes hrticks
to get armed even when sched_feat(HRTICK_DL) is false. Fix it.

Fixes: 63ba8422f876 ("sched/deadline: Introduce deadline servers")
Signed-off-by: Phil Auld <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

erofs: get rid of kaddr in `struct z_erofs_maprecorder`

`kaddr` becomes useless after switching to metabuf.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

erofs: get rid of z_erofs_try_to_claim_pcluster()

Just fold it into the caller for simplicity.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

erofs: ensure regular inodes for file-backed mounts

Only regular inodes are allowed for file-backed mounts, not directories
(as seen in the original syzbot case) or special inodes.

Also ensure that .read_folio() is implemented on the underlying fs
for the primary device.

Fixes: fb176750266a ("erofs: add file-backed mount support")
Reported-by: [email protected]
Closes: https://lore.kernel.org/r/[email protected]
Tested-by: [email protected]
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/8xx: Fix kernel DTLB miss on dcbz

The following Oops is encountered while loading the test_bpf module
on powerpc 8xx:

[  218.835567] BUG: Unable to handle kernel data access on write at 0xcb000000
[  218.842473] Faulting instruction address: 0xc0017a80
[  218.847451] Oops: Kernel access of bad area, sig: 11 [#1]
[  218.852854] BE PAGE_SIZE=16K PREEMPT CMPC885
[  218.857207] SAF3000 DIE NOTIFICATION
[  218.860713] Modules linked in: test_bpf(+) test_module
[  218.865867] CPU: 0 UID: 0 PID: 527 Comm: insmod Not tainted 6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty #1280
[  218.875546] Hardware name: MIAE 8xx 0x500000 CMPC885
[  218.880521] NIP:  c0017a80 LR: beab859c CTR: 000101d4
[  218.885584] REGS: cac2bc90 TRAP: 0300   Not tainted  (6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty)
[  218.894308] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 55005555  XER: a0007100
[  218.901290] DAR: cb000000 DSISR: c2000000
[  218.901290] GPR00: 000185d1 cac2bd50 c21b9580 caf7c030 c3883fcc 00000008 cafffffc 00000000
[  218.901290] GPR08: 00040000 18300000 20000000 00000004 99005555 100d815e ca669d08 00000369
[  218.901290] GPR16: ca730000 00000000 ca2c004c 00000000 00000000 0000035d 00000311 00000369
[  218.901290] GPR24: ca732240 00000001 00030ba3 c3800000 00000000 00185d48 caf7c000 ca2c004c
[  218.941087] NIP [c0017a80] memcpy+0x88/0xec
[  218.945277] LR [beab859c] test_bpf_init+0x22c/0x3c90 [test_bpf]
[  218.951476] Call Trace:
[  218.953916] [cac2bd50] [beab8570] test_bpf_init+0x200/0x3c90 [test_bpf] (unreliable)
[  218.962034] [cac2bde0] [c0004c04] do_one_initcall+0x4c/0x1fc
[  218.967706] [cac2be40] [c00a2ec4] do_init_module+0x68/0x360
[  218.973292] [cac2be60] [c00a5194] init_module_from_file+0x8c/0xc0
[  218.979401] [cac2bed0] [c00a5568] sys_finit_module+0x250/0x3f0
[  218.985248] [cac2bf20] [c000e390] system_call_exception+0x8c/0x15c
[  218.991444] [cac2bf30] [c00120a8] ret_from_syscall+0x0/0x28

This happens in the main loop of memcpy()

  ==> c0017a80: 7c 0b 37 ec dcbz    r11,r6
      c0017a84: 80 e4 00 04 lwz     r7,4(r4)
      c0017a88: 81 04 00 08 lwz     r8,8(r4)
      c0017a8c: 81 24 00 0c lwz     r9,12(r4)
      c0017a90: 85 44 00 10 lwzu    r10,16(r4)
      c0017a94: 90 e6 00 04 stw     r7,4(r6)
      c0017a98: 91 06 00 08 stw     r8,8(r6)
      c0017a9c: 91 26 00 0c stw     r9,12(r6)
      c0017aa0: 95 46 00 10 stwu    r10,16(r6)
      c0017aa4: 42 00 ff dc bdnz    c0017a80 <memcpy+0x88>

Commit ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in
DTLB misses") relies on re-reading DAR register to know if an error is
due to a missing copy of a PMD entry in task's PGDIR, although DAR
was already read in the exception prolog and copied into thread
struct. This is because it is done very early in the exception and
there are not enough registers available to keep a pointer to thread
struct.

However, the dcbz instruction is buggy and doesn't update the DAR
register on fault. That is detected and generates a call to the FixupDAR
workaround, which updates the DAR copy in the thread struct but doesn't
fix the DAR register.

Let's fix DAR in addition to the update of DAR copy in thread struct.

Fixes: ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/2b851399bd87e81c6ccb87ea3a7a6b32c7aa04d7.1728118396.git.christophe.leroy@csgroup.eu

Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix error checking with xa_store() (Matthew Auld)
- Fix missing freq restore on GSC load error (Vinay)
- Fix wedged_mode file permission (Matt Roper)
- Fix use-after-free in ct communication (Matthew Auld)

Signed-off-by: Dave Airlie <[email protected]>
From: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/jri65tmv3bjbhqhxs5smv45nazssxzhtwphojem4uufwtjuliy@gsdhlh6kzsdy

Merge tag 'drm-misc-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

fbdev-dma:
- Only clean up deferred I/O if instantiated

nouveau:
- dmem: Fix privileged error in copy engine channel; Fix possible
data leak in migrate_to_ram()
- gsp: Fix coding style

sched:
- Avoid leaking lockdep map

v3d:
- Stop active perfmon before destroying it

vc4:
- Stop active perfmon before destroying it

xe:
- Drop GuC submit_wq pool

Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-intel-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- HDCP refcount fix

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

sched_ext: Don't hold scx_tasks_lock for too long

While enabling and disabling a BPF scheduler, every task is iterated a
couple times by walking scx_tasks. Except for one, all iterations keep
holding scx_tasks_lock. On multi-socket systems under heavy rq lock
contention and with a high number of threads, this can lead to RCU and
other stalls.

The following is triggered on a 2 x AMD EPYC 7642 system (192 logical CPUs)
running `stress-ng --workload 150 --workload-threads 10` with >400k idle
threads and RCU stall period reduced to 5s:

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:     91-...!: (10 ticks this GP) idle=0754/1/0x4000000000000000 softirq=18204/18206 fqs=17
  rcu:     186-...!: (17 ticks this GP) idle=ec54/1/0x4000000000000000 softirq=25863/25866 fqs=17
  rcu:     (detected by 80, t=10042 jiffies, g=89305, q=33 ncpus=192)
  Sending NMI from CPU 80 to CPUs 91:
  NMI backtrace for cpu 91
  CPU: 91 UID: 0 PID: 284038 Comm: sched_ext_ops_h Kdump: loaded Not tainted 6.12.0-rc2-work-g6bf5681f7ee2-dirty #471
  Hardware name: Supermicro Super Server/H11DSi, BIOS 2.8 12/14/2023
  Sched_ext: simple (disabling+all)
  RIP: 0010:queued_spin_lock_slowpath+0x17b/0x2f0
  Code: 02 c0 10 03 00 83 79 08 00 75 08 f3 90 83 79 08 00 74 f8 48 8b 11 48 85 d2 74 09 0f 0d 0a eb 0a 31 d2 eb 06 31 d2 eb 02 f3 90 <8b> 07 66 85 c0 75 f7 39 d8 75 0d be 01 00 00 00 89 d8 f0 0f b1 37
  RSP: 0018:ffffc9000fadfcb8 EFLAGS: 00000002
  RAX: 0000000001700001 RBX: 0000000001700000 RCX: ffff88bfcaaf10c0
  RDX: 0000000000000000 RSI: 0000000000000101 RDI: ffff88bfca8f0080
  RBP: 0000000001700000 R08: 0000000000000090 R09: ffffffffffffffff
  R10: ffff88a74761b268 R11: 0000000000000000 R12: ffff88a6b6765460
  R13: ffffc9000fadfd60 R14: ffff88bfca8f0080 R15: ffff88bfcaac0000
  FS:  0000000000000000(0000) GS:ffff88bfcaac0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f5c55f526a0 CR3: 0000000afd474000 CR4: 0000000000350eb0
  Call Trace:
   <NMI>
   </NMI>
   <TASK>
   do_raw_spin_lock+0x9c/0xb0
   task_rq_lock+0x50/0x190
   scx_task_iter_next_locked+0x157/0x170
   scx_ops_disable_workfn+0x2c2/0xbf0
   kthread_worker_fn+0x108/0x2a0
   kthread+0xeb/0x110
   ret_from_fork+0x36/0x40
   ret_from_fork_asm+0x1a/0x30
   </TASK>
  Sending NMI from CPU 80 to CPUs 186:
  NMI backtrace for cpu 186
  CPU: 186 UID: 0 PID: 51248 Comm: fish Kdump: loaded Not tainted 6.12.0-rc2-work-g6bf5681f7ee2-dirty #471

scx_task_iter can safely drop locks while iterating. Make
scx_task_iter_next() drop scx_tasks_lock every 32 iterations to avoid
stalls.
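
As a rough userspace illustration of the batching idea (not the kernel
code), the sketch below walks a linked list under a pthread mutex and
briefly releases the lock after every 32 nodes. The names task_node,
walk_tasks() and ITER_BATCH are hypothetical, and the sketch assumes the
list is not modified while the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITER_BATCH 32	/* hypothetical name; the message above says 32 */

struct task_node {
	struct task_node *next;
	int visited;
};

static pthread_mutex_t tasks_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Visit every node, releasing tasks_lock after each batch of ITER_BATCH
 * nodes so that other lock users can make progress. Returns how many
 * times the lock was dropped. Assumes the list is not modified while
 * the lock is released.
 */
static unsigned int walk_tasks(struct task_node *head)
{
	unsigned int cnt = 0, drops = 0;
	struct task_node *pos;

	pthread_mutex_lock(&tasks_lock);
	for (pos = head; pos; pos = pos->next) {
		pos->visited = 1;
		if (++cnt % ITER_BATCH == 0) {
			pthread_mutex_unlock(&tasks_lock);
			/* window in which waiters can take the lock */
			pthread_mutex_lock(&tasks_lock);
			drops++;
		}
	}
	pthread_mutex_unlock(&tasks_lock);
	return drops;
}
```

With a 100-node list, walk_tasks() drops the lock three times (after
nodes 32, 64 and 96) and still visits every node.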

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Vernet <[email protected]>

sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers

Iterating with scx_task_iter involves scx_tasks_lock and optionally the rq
lock of the task being iterated. Both locks can be released during iteration
and the iteration can be continued after re-grabbing scx_tasks_lock.
Currently, all lock handling is pushed to the caller which is a bit
cumbersome and makes it difficult to add lock-aware behaviors. Make the
scx_task_iter helpers handle scx_tasks_lock.

- scx_task_iter_init/scx_task_iter_exit() now grab and release
  scx_tasks_lock, respectively. Renamed to
  scx_task_iter_start/scx_task_iter_stop() to more clearly indicate that
  there are non-trivial side-effects.

- Add __ prefix to scx_task_iter_rq_unlock() to indicate that the function
  is internal.

- Add scx_task_iter_unlock/relock(). The former drops both rq lock (if held)
  and scx_tasks_lock and the latter re-locks only scx_tasks_lock.

This doesn't cause behavior changes and will be used to implement stall
avoidance.
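
The shape of such helpers can be sketched in userspace C as follows. This
is only a model under stated assumptions: item, item_iter and the
item_iter_*() functions are hypothetical names, a pthread mutex stands in
for scx_tasks_lock, the optional rq lock is left out, and the list is
assumed not to change while the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

struct item {
	struct item *next;
	int val;
};

/* Hypothetical iterator; it tracks whether it holds list_lock. */
struct item_iter {
	struct item *pos;
	bool locked;
};

/* Take the lock and position the iterator at the head of the list. */
static void item_iter_start(struct item_iter *it, struct item *head)
{
	pthread_mutex_lock(&list_lock);
	it->pos = head;
	it->locked = true;
}

/* Drop the lock mid-iteration; safe to call when already unlocked. */
static void item_iter_unlock(struct item_iter *it)
{
	if (it->locked) {
		pthread_mutex_unlock(&list_lock);
		it->locked = false;
	}
}

/* Re-take the lock so that iteration can continue. */
static void item_iter_relock(struct item_iter *it)
{
	if (!it->locked) {
		pthread_mutex_lock(&list_lock);
		it->locked = true;
	}
}

/* Return the next item or NULL at the end; the lock must be held. */
static struct item *item_iter_next(struct item_iter *it)
{
	struct item *cur = it->pos;

	assert(it->locked);
	if (cur)
		it->pos = cur->next;
	return cur;
}

/* Finish the iteration and release the lock if it is still held. */
static void item_iter_stop(struct item_iter *it)
{
	item_iter_unlock(it);
	it->pos = NULL;
}
```

A caller only ever brackets the loop with item_iter_start() and
item_iter_stop(), and may call item_iter_unlock()/item_iter_relock()
inside the loop body without touching the mutex directly.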

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Vernet <[email protected]>

sched_ext: bypass mode shouldn't depend on ops.select_cpu()

Bypass mode depended on ops.select_cpu() which, like the rest of the BPF
scheduler, can't be trusted. Always enable and use scx_select_cpu_dfl() in
bypass mode.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Vernet <[email protected]>

sched_ext: Move scx_builtin_idle_enabled check to scx_bpf_select_cpu_dfl()

Move the sanity check from the inner function scx_select_cpu_dfl() to the
exported kfunc scx_bpf_select_cpu_dfl(). This doesn't cause behavior
differences and will allow using scx_select_cpu_dfl() in bypass mode
regardless of scx_builtin_idle_enabled.
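
The general pattern (check in the exported wrapper, none in the inner
helper) can be modelled in plain C as below. Everything here is a
hypothetical stand-in: the model_*() names, the idle[] array, the
first-idle-CPU policy and the -1 error value are assumptions made for
illustration and do not reflect the kernel's actual signatures or
behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the scx_builtin_idle_enabled flag. */
static bool model_builtin_idle_enabled;

/*
 * Inner helper: no sanity check, so internal callers (for example a
 * bypass path) can use it regardless of the flag. The policy is a
 * placeholder: keep prev_cpu if idle, else take the first idle CPU.
 */
static int model_select_cpu_dfl(int prev_cpu, const bool *idle, int nr_cpus)
{
	int cpu;

	if (idle[prev_cpu])
		return prev_cpu;
	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (idle[cpu])
			return cpu;
	return prev_cpu;
}

/*
 * Exported entry point: the sanity check lives here only.
 * Returns -1 (hypothetical error value) when the flag is off.
 */
static int model_exported_select_cpu_dfl(int prev_cpu, const bool *idle,
					 int nr_cpus)
{
	if (!model_builtin_idle_enabled)
		return -1;
	return model_select_cpu_dfl(prev_cpu, idle, nr_cpus);
}
```

With the flag off, the exported wrapper refuses the call while the inner
helper still returns a CPU, which is the property that makes the inner
function usable from an internal fallback path.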

Signed-off-by: Tejun Heo <[email protected]>

sched_ext: Start schedulers with consistent p->scx.slice values

The disable path caps p->scx.slice to SCX_SLICE_DFL. As the field is already
being ignored at this stage during disable, the only effect this has is that
when the next BPF scheduler is loaded, it won't see unreasonable left-over
slices. Ultimately, this shouldn't matter but it's better to start in a
known state. Drop p->scx.slice capping from the disable path and instead
reset it to SCX_SLICE_DFL in the enable path.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Vernet <[email protected]>

Revert "sched_ext: Use shorter slice while bypassing"

This reverts commit 6f34d8d382d64e7d8e77f5a9ddfd06f4c04937b0.

Slice length is ignored while bypassing and tasks are switched on every tick
and thus the patch does not make any difference. The perceived difference
was from test noise.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Vernet <[email protected]>

Merge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter.

  Current release - regressions:

   - dsa: sja1105: fix reception from VLAN-unaware bridges

   - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is
     enabled"

   - eth: fec: don't save PTP state if PTP is unsupported

  Current release - new code bugs:

   - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref

   - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop

   - phy: aquantia: AQR115c fix up PMA capabilities

  Previous releases - regressions:

   - tcp: 3 fixes for retrans_stamp and undo logic

  Previous releases - always broken:

   - net: do not delay dst_entries_add() in dst_release()

   - netfilter: restrict xtables extensions to families that are safe,
     syzbot found a way to combine ebtables with extensions that are
     never used by userspace tools

   - sctp: ensure sk_state is set to CLOSED if hashing fails in
     sctp_listen_start

   - mptcp: handle consistently DSS corruption, and prevent corruption
     due to large pmtu xmit"

* tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  MAINTAINERS: Add headers and mailing list to UDP section
  MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
  slip: make slhc_remember() more robust against malicious packets
  net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
  ppp: fix ppp_async_encode() illegal access
  docs: netdev: document guidance on cleanup patches
  phonet: Handle error of rtnl_register_module().
  mpls: Handle error of rtnl_register_module().
  mctp: Handle error of rtnl_register_module().
  bridge: Handle error of rtnl_register_module().
  vxlan: Handle error of rtnl_register_module().
  rtnetlink: Add bulk registration helpers for rtnetlink message handlers.
  net: do not delay dst_entries_add() in dst_release()
  mptcp: pm: do not remove closing subflows
  mptcp: fallback when MPTCP opts are dropped after 1st data
  tcp: fix mptcp DSS corruption due to large pmtu xmit
  mptcp: handle consistently DSS corruption
  net: netconsole: fix wrong warning
  net: dsa: refuse cross-chip mirroring operations
  net: fec: don't save PTP state if PTP is unsupported
  ...

Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
"Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug
  callbacks

  When a ring buffer is mapped to memory assigned at boot, it also
  splits it up evenly between the possible CPUs. But the allocation code
  still attached a CPU notifier callback to this ring buffer. When a CPU
  is added, the callback will happen and another per-cpu buffer is
  created for the ring buffer.

  But for boot mapped buffers, there is no room to add another one (as
  they were all created already). The result of calling the CPU hotplug
  notifier on a boot mapped ring buffer is unpredictable and could lead
  to a system crash.

  If the ring buffer is boot mapped simply do not attach the CPU
  notifier to it"

* tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Do not have boot mapped buffers hook to CPU hotplug

of: Skip kunit tests when arm64+ACPI doesn't populate root node

A root node is required to apply DT overlays. A root node is usually
present after commit 7b937cc243e5 ("of: Create of_root if no dtb
provided by firmware"), except on arm64 systems booted with ACPI
tables. In that case, the root node is intentionally not populated
because it would "allow DT devices to be instantiated atop an ACPI base
system"[1].

Introduce an OF function that skips the kunit test if the root node
isn't populated. Limit the test to when both CONFIG_ARM64 and
CONFIG_ACPI are set, because otherwise the lack of a root node is a bug.
Make the function private and take a kunit test parameter so that it
can't be abused to test for the presence of the root node in non-test
code.

Use this function to skip tests that require the root node. Currently
that's the DT tests and any tests that apply overlays.

Reported-by: Guenter Roeck <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Fixes: 893ecc6d2d61 ("of: Add KUnit test to confirm DTB is loaded")
Signed-off-by: Stephen Boyd <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring (Arm) <[email protected]>

Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- update fstrim loop and add more cancellation points, fix reported
   delayed or blocked suspend if there's a huge chunk queued

- fix error handling in recent qgroup xarray conversion

- in zoned mode, fix warning printing device path without RCU
   protection

- again fix invalid extent xarray state (6252690f7e1b), lost due to
   refactoring

* tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
  btrfs: zoned: fix missing RCU locking in error message when loading zone info
  btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  btrfs: add cancellation points to trim loops
  btrfs: split remaining space to discard in chunks

Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Fix NFSD bring-up / shutdown

- Fix a UAF when releasing a stateid

* tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix possible badness in FREE_STATEID
  nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
  NFSD: Mark filecache "down" if init fails

rcu/nocb: Fix rcuog wake-up from offline softirq

After a CPU has set itself offline and before it eventually calls
rcutree_report_cpu_dead(), there are still opportunities for callbacks
to be enqueued, for example from a softirq. When that happens on NOCB,
the rcuog wake-up is deferred through an IPI to an online CPU in order
not to call into the scheduler and risk arming the RT-bandwidth after
hrtimers have been migrated out and disabled.

But performing a synchronized IPI from a softirq is buggy as reported in
the following scenario:

        WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
        Modules linked in: rcutorture torture
        CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
        Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
        RIP: 0010:smp_call_function_single
        <IRQ>
        swake_up_one_online
        __call_rcu_nocb_wake
        __call_rcu_common
        ? rcu_torture_one_read
        call_timer_fn
        __run_timers
        run_timer_softirq
        handle_softirqs
        irq_exit_rcu
        ? tick_handle_periodic
        sysvec_apic_timer_interrupt
        </IRQ>

Fix this by forcing the deferred rcuog wake-up through the NOCB timer when
the CPU is offline. The actual wake-up will happen from
rcutree_report_cpu_dead().

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU")
Reviewed-by: "Joel Fernandes (Google)" <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>

Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

- A few small typo fixes

- fstests xfs/538 DEBUG-only fix

- Performance fix on blockgc on COW'ed files, by skipping trims on
   cowblock inodes currently opened for write

- Prevent cowblocks from being freed under dirty pagecache during unshare

- Update MAINTAINERS file to quote the new maintainer

* tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix a typo
  xfs: don't free cowblocks from under dirty pagecache on unshare
  xfs: skip background cowblock trims on inodes open for write
  xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
  xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
  xfs: don't ifdef around the exact minlen allocations
  xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
  xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
  xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
  xfs: return bool from xfs_attr3_leaf_add
  xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
  xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
  xfs: scrub: convert comma to semicolon
  xfs: Remove empty declartion in header file
  MAINTAINERS: add Carlos Maiolino as XFS release manager

Merge branch 'maintainers-networking-file-coverage-updates'

Simon Horman says:

====================
MAINTAINERS: Networking file coverage updates

The aim of this proposal is to make the handling of some files
related to Networking and Wireless more consistent. It does so by:

1. Adding some more headers to the UDP section, making it consistent
   with the TCP section.

2. Excluding some files relating to Wireless from NETWORKING [GENERAL],
   making their handling consistent with other files related to
   Wireless.

The aim of this is to make things more consistent, and for MAINTAINERS
to better reflect the situation on the ground. I am more than happy to
be told that the current state of affairs is fine, or for other ideas to
be discussed.

v1: https://lore.kernel.org/20241004-maint-net-hdrs-v1-0-41fd555aacc5@kernel.org
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: Add headers and mailing list to UDP section

Add netdev mailing list and some more udp.h headers to the UDP section.
This is now more consistent with the TCP section.

Acked-by: Willem de Bruijn <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]

We already exclude wireless drivers from the netdev@ traffic, to
delegate it to linux-wireless@, and avoid overwhelming netdev@.

Many of the following wireless-related sections of MAINTAINERS
are already excluded from the NETWORKING [GENERAL] section.
For consistency, exclude those that are still included.

* 802.11 (including CFG80211/NL80211)
* MAC80211
* RFKILL

Acked-by: Johannes Berg <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

sched_ext: use correct function name in pick_task_scx() warning message

pick_next_task_scx() was turned into pick_task_scx() since
commit 753e2836d139 ("sched_ext: Unify regular and core-sched pick
task paths"). Update the outdated message.

Signed-off-by: Honglei Wang <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>