Git Repo - linux.git/log

Merge tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"There are a number of updates for devicetree files for Qualcomm,
  Rockchips, and NXP i.MX platforms, addressing mistakes in the DT
  contents:

   - Wrong GPIO polarity on some boards

   - Lower SD card interface speed for better stability

   - Incorrect power supply, clock, pmic, cache properties

   - Disable broken hbr3 on sc7280-herobrine

   - Devicetree warning fixes

  The only other changes are:

   - A regression fix for the Amlogic performance monitoring unit
     driver, along with two related DT changes.

   - imx_v6_v7_defconfig enables PCI support again.

   - Trivial fixes for tee, optee and psci firmware drivers, addressing
     compiler warning and error output"

* tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
  firmware/psci: demote suspend-mode warning to info level
  arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards
  ARM: imx_v6_v7_defconfig: Fix unintentional disablement of PCI
  arm64: dts: rockchip: correct panel supplies on some rk3326 boards
  arm64: dts: rockchip: use just "port" in panel on RockPro64
  arm64: dts: rockchip: use just "port" in panel on Pinebook Pro
  ARM: dts: imx6ull-colibri: Remove unnecessary #address-cells/#size-cells
  ARM: dts: imx7d-remarkable2: Remove unnecessary #address-cells/#size-cells
  arm64: dts: imx8mp-verdin: correct off-on-delay
  arm64: dts: imx8mm-verdin: correct off-on-delay
  arm64: dts: imx8mm-evk: correct pmic clock source
  arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers
  arm64: dts: rockchip: Remove non-existing pwm-delay-us property
  arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices
  tee: Pass a pointer to virt_to_page()
  perf/amlogic: adjust register offsets
  arm64: dts: meson-g12-common: resolve conflict between canvas & pmu
  arm64: dts: meson-g12-common: specify full DMC range
  arm64: dts: imx8mp: fix address length for LCDIF2
  riscv: dts: canaan: drop invalid spi-max-frequency
  ...

LoongArch: module: set section addresses to 0x0

These got*, plt* and .text.ftrace_trampoline sections specified for
LoongArch have non-zero addressses. Non-zero section addresses in a
relocatable ELF would confuse GDB when it tries to compute the section
offsets and it ends up printing wrong symbol addresses. Therefore, set
them to zero, which mirrors the change in commit 5d8591bc0fbaeb6ded
("arm64 module: set plt* section addresses to 0x0").

Cc: [email protected]
Reviewed-by: Guo Ren <[email protected]>
Signed-off-by: Chong Qiao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Mark 3 symbol exports as non-GPL

vm_map_base, empty_zero_page and invalid_pmd_table could be accessed
widely by some out-of-tree non-GPL but important file systems or drivers
(e.g. OpenZFS). Let's use EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL()
to export them, so as to avoid build errors.

1, Details about vm_map_base:

This is a LoongArch-specific symbol and may be referenced through macros
PCI_IOBASE, VMALLOC_START and VMALLOC_END.

2, Details about empty_zero_page:

As it stands today, only 3 architectures export empty_zero_page as a GPL
symbol: IA64, LoongArch and MIPS. LoongArch gets the GPL export by
inheriting from MIPS, and the MIPS export was first introduced in commit
497d2adcbf50b ("[MIPS] Export empty_zero_page for sake of the ext4
module."). The IA64 export was similar: commit a7d57ecf4216e ("[IA64]
Export three symbols for module use") did so for kvm.

In both IA64 and MIPS, the export of empty_zero_page was done for
satisfying some in-kernel component built as module (kvm and ext4
respectively), and given its reasonably low-level nature, GPL is a
reasonable choice. But looking at the bigger picture it is evident most
other architectures do not regard it as GPL, so in effect the symbol
probably should not be treated as such, in favor of consistency.

3, Details about invalid_pmd_table:

Keep consistency with invalid_pte_table and make it be possible by some
modules.

Cc: [email protected]
Reviewed-by: WANG Xuerui <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Enable PG when wakeup from suspend

Some firmwares don't enable PG when wakeup from suspend, so do it in
kernel. This can improve code compatibility for boot kernel.

Signed-off-by: Baoqi Zhang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Fix _CONST64_(x) as unsigned

Addresses should all be of unsigned type to avoid unnecessary conversions.

Signed-off-by: Qing Zhang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Fix build error if CONFIG_SUSPEND is not set

We can see the following build error on LoongArch if CONFIG_SUSPEND is
not set:

  ld: drivers/acpi/sleep.o: in function 'acpi_pm_prepare':
  sleep.c:(.text+0x2b8): undefined reference to 'loongarch_wakeup_start'

Here is the call trace:

  acpi_pm_prepare()
    __acpi_pm_prepare()
      acpi_sleep_prepare()
        acpi_get_wakeup_address()
          loongarch_wakeup_start()

Root cause: loongarch_wakeup_start() is defined in arch/loongarch/power/
suspend_asm.S which is only built under CONFIG_SUSPEND. In order to fix
the build error, just let acpi_get_wakeup_address() return 0 if CONFIG_
SUSPEND is not set.

Fixes: 366bb35a8e48 ("LoongArch: Add suspend (ACPI S3) support")
Reviewed-by: WANG Xuerui <[email protected]>
Reported-by: Randy Dunlap <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Fix probing of the CRC32 feature

Not all LoongArch processors support CRC32 instructions. This feature
is indicated by CPUCFG1.CRC32 (Bit25) but it is wrongly defined in the
previous versions of the ISA manual (and so does in loongarch.h). The
CRC32 feature is set unconditionally now, so fix it.

BTW, expose the CRC32 feature in /proc/cpuinfo.

Cc: [email protected]
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Make WriteCombine configurable for ioremap()

LoongArch maintains cache coherency in hardware, but when paired with
LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
to WriteCombine) is out of the scope of cache coherency machanism for
PCIe devices (this is a PCIe protocol violation, which may be fixed in
newer chipsets).

This means WUC can only used for write-only memory regions now, so this
option is disabled by default, making WUC silently fallback to SUC for
ioremap(). You can enable this option if the kernel is ensured to run on
hardware without this bug.

Kernel parameter writecombine=on/off can be used to override the Kconfig
option.

Cc: [email protected]
Suggested-by: WANG Xuerui <[email protected]>
Reviewed-by: WANG Xuerui <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()

Function mlxfw_mfa2_tlv_multi_get() returns NULL if 'tlv' in
question does not pass checks in mlxfw_mfa2_tlv_payload_get(). This
behaviour may lead to NULL pointer dereference in 'multi->total_len'.
Fix this issue by testing mlxfw_mfa2_tlv_multi_get()'s return value
against NULL.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process")
Co-developed-by: Natalia Petrova <[email protected]>
Signed-off-by: Nikita Zhandarovich <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'bnxt_en-bug-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes

This small series contains 2 fixes. The first one fixes the PTP
initialization logic on older chips to avoid logging a warning. The
second one fixes a potenial NULL pointer dereference in the driver's
aux bus unload path.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

bnxt_en: Fix a possible NULL pointer dereference in unload path

In the driver unload path, the driver currently checks the valid
BNXT_FLAG_ROCE_CAP flag in bnxt_rdma_aux_device_uninit() before
proceeding.  This is flawed because the flag may not be set initially
during driver load.  It may be set later after the NVRAM setting is
changed followed by a firmware reset.  Relying on the
BNXT_FLAG_ROCE_CAP flag may crash in bnxt_rdma_aux_device_uninit() if
the aux device was never initialized:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 8ae6aa067 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 39 PID: 42558 Comm: rmmod Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-348.el8.x86_64 #1
Hardware name: Dell Inc. PowerEdge R750/0WT8Y6, BIOS 1.5.4 12/17/2021
RIP: 0010:device_del+0x1b/0x410
Code: 89 a5 50 03 00 00 4c 89 a5 58 03 00 00 eb 89 0f 1f 44 00 00 41 56 41 55 41 54 4c 8d a7 80 00 00 00 55 53 48 89 fb 48 83 ec 18 <48> 8b 2f 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0
RSP: 0018:ff7f82bf469a7dc8 EFLAGS: 00010292
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000000
RBP: ff31b7cd114b0ac0 R08: 0000000000000000 R09: ffffffff935c3400
R10: ff31b7cd45bc3440 R11: 0000000000000001 R12: 0000000000000080
R13: ffffffffc1069f40 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fc9903ce740(0000) GS:ff31b7d4ffac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000992fee004 CR4: 0000000000773ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
bnxt_rdma_aux_device_uninit+0x1f/0x30 [bnxt_en]
bnxt_remove_one+0x2f/0x1f0 [bnxt_en]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x103/0x1f0
driver_detach+0x54/0x88
bus_remove_driver+0x77/0xc9
pci_unregister_driver+0x2d/0xb0
bnxt_exit+0x16/0x2c [bnxt_en]
__x64_sys_delete_module+0x139/0x280
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7fc98f3af71b

Fix this by modifying the check inside bnxt_rdma_aux_device_uninit()
to check for bp->aux_priv instead.  We also need to make some changes
in bnxt_rdma_aux_device_init() to make sure that bp->aux_priv is set
only when the aux device is fully initialized.

Fixes: d80d88b0dfff ("bnxt_en: Add auxiliary driver support")
Reviewed-by: Ajit Khaparde <[email protected]>
Signed-off-by: Kalesh AP <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

bnxt_en: Do not initialize PTP on older P3/P4 chips

The driver does not support PTP on these older chips and it is assuming
that firmware on these older chips will not return the
PORT_MAC_PTP_QCFG_RESP_FLAGS_HWRM_ACCESS flag in __bnxt_hwrm_ptp_qcfg(),
causing the function to abort quietly.

But newer firmware now sets this flag and so __bnxt_hwrm_ptp_qcfg()
will proceed further. Eventually it will fail in bnxt_ptp_init() ->
bnxt_map_ptp_regs() because there is no code to support the older chips.
The driver will then complain:

"PTP initialization failed.\n"

Fix it so that we abort quietly earlier without going through the
unnecessary steps and alarming the user with the warning log.

Fixes: ae5c42f0b92c ("bnxt_en: Get PTP hardware capability from firmware")
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

x86/alternatives: Do not use integer constant suffixes in inline asm

The usage of the BIT() macro in inline asm code was introduced in 6.3 by
the commit in the Fixes tag. However, this macro uses "1UL" for integer
constant suffixes in its shift operation, while gas before 2.28 does not
support the "L" suffix after a number, and gas before 2.27 does not
support the "U" suffix, resulting in build errors such as the following
with such versions:

  ./arch/x86/include/asm/uaccess_64.h:124: Error: found 'L', expected: ')'
  ./arch/x86/include/asm/uaccess_64.h:124: Error: junk at end of line,
  first unrecognized character is `L'

However, the currently minimal binutils version the kernel supports is
2.25.

There's a single use of this macro here, revert to (1 << 0) that works
with such older binutils.

As an additional info, the binutils PRs which add support for those
suffixes are:

  https://sourceware.org/bugzilla/show_bug.cgi?id=19910
  https://sourceware.org/bugzilla/show_bug.cgi?id=20732

  [ bp: Massage and extend commit message. ]

Fixes: 5d1dd961e743 ("x86/alternatives: Add alt_instr.flags")
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Jingbo Xu <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/

netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements

If NFT_SET_ELEM_CATCHALL is set on, then userspace provides no set element
key. Otherwise, bail out with -EINVAL.

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

cxgb4: fix use after free bugs caused by circular dependency problem

The flower_stats_timer can schedule flower_stats_work and
flower_stats_work can also arm the flower_stats_timer. The
process is shown below:

----------- timer schedules work ------------
ch_flower_stats_cb() //timer handler
  schedule_work(&adap->flower_stats_work);

----------- work arms timer ------------
ch_flower_stats_handler() //workqueue callback function
  mod_timer(&adap->flower_stats_timer, ...);

When the cxgb4 device is detaching, the timer and workqueue
could still be rearmed. The process is shown below:

  (cleanup routine)           | (timer and workqueue routine)
remove_one()                  |
  free_some_resources()       | ch_flower_stats_cb() //timer
    cxgb4_cleanup_tc_flower() |   schedule_work()
      del_timer_sync()        |
                              | ch_flower_stats_handler() //workqueue
                              |   mod_timer()
      cancel_work_sync()      |
  kfree(adapter) //FREE       | ch_flower_stats_cb() //timer
                              |   adap->flower_stats_work //USE

This patch changes del_timer_sync() to timer_shutdown_sync(),
which could prevent rearming of the timer from the workqueue.

Fixes: e0f911c81e93 ("cxgb4: fetch stats for offloaded tc flower flows")
Signed-off-by: Duoming Zhou <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

netfilter: nf_tables: validate catch-all set elements

catch-all set element might jump/goto to chain that uses expressions
that require validation.

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

ice: document RDMA devlink parameters

Commit e523af4ee560 ("net/ice: Add support for enable_iwarp and enable_roce
devlink param") added support for the enable_roce and enable_iwarp
parameters in the ice driver. It didn't document these parameters in the
ice devlink documentation file. Add this documentation, including a note
about the mutual exclusion between the two modes.

Signed-off-by: Jacob Keller <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

drm/rockchip: vop2: Use regcache_sync() to fix suspend/resume

afa965a45e01 ("drm/rockchip: vop2: fix suspend/resume") uses
regmap_reinit_cache() to fix the suspend/resume issue with the VOP2
driver. During discussion it came up that we should rather use
regcache_sync() instead. As the original patch is already applied
fix this up in this follow-up patch.

Fixes: afa965a45e01 ("drm/rockchip: vop2: fix suspend/resume")
Cc: [email protected]
Signed-off-by: Sascha Hauer <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

i40e: fix i40e_setup_misc_vector() error handling

Add error handling of i40e_setup_misc_vector() in i40e_rebuild().
In case interrupt vectors setup fails do not re-open vsi-s and
do not bring up vf-s, we have no interrupts to serve a traffic
anyway.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

i40e: fix accessing vsi->active_filters without holding lock

Fix accessing vsi->active_filters without holding the mac_filter_hash_lock.
Move vsi->active_filters = 0 inside critical section and
move clear_bit(__I40E_VSI_OVERFLOW_PROMISC, vsi->state) after the critical
section to ensure the new filters from other threads can be added only after
filters cleaning in the critical section is finished.

Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

SUNRPC: Fix failures of checksum Kunit tests

Scott reports that when the new GSS krb5 Kunit tests are built as
a separate module and loaded, the RFC 6803 and RFC 8009 checksum
tests all fail, even though they pass when run under kunit.py.

It appears that passing a buffer backed by static const memory to
gss_krb5_checksum() is a problem. A printk in checksum_case() shows
the correct plaintext, but by the time the buffer has been converted
to a scatterlist and arrives at checksummer(), it contains all
zeroes.

Replacing this buffer with one that is dynamically allocated fixes
the issue.

Reported-by: Scott Mayhew <[email protected]>
Fixes: 02142b2ca8fc ("SUNRPC: Add checksum KUnit tests for the RFC 6803 encryption types")
Tested-by: Scott Mayhew <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>

drm/nouveau: fix incorrect conversion to dma_resv_wait_timeout()

Commit 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
converted from ttm_bo_wait_ctx() to dma_resv_wait_timeout().
However, dma_resv_wait_timeout() returns greater than zero on
success as opposed to ttm_bo_wait_ctx(). As a result, relocs
will fail and log errors even when it was a success.

Change the return code handling to match that of
nouveau_gem_ioctl_cpu_prep(), which was already using
dma_resv_wait_timeout() correctly.

Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
Reported-by: Tanmay Bhushan <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

netfilter: nf_tables: fix ifdef to also consider nf_tables=m

nftables can be built as a module, so fix the preprocessor conditional
accordingly.

Fixes: 478b360a47b7 ("netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n")
Reported-by: Florian Fainelli <[email protected]>
Reported-by: Jakub Kicinski <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

drm/rockchip: vop2: fix suspend/resume

During a suspend/resume cycle the VO power domain will be disabled and
the VOP2 registers will reset to their default values. After that the
cached register values will be out of sync and the read/modify/write
operations we do on the window registers will result in bogus values
written. Fix this by re-initializing the register cache each time we
enable the VOP2. With this the VOP2 will show a picture after a
suspend/resume cycle whereas without this the screen stays dark.

Fixes: 604be85547ce4 ("drm/rockchip: Add VOP2 driver")
Cc: [email protected]
Signed-off-by: Sascha Hauer <[email protected]>
Tested-by: Chris Morgan <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

net/sched: clear actions pointer in miss cookie init fail

Palash reports a UAF when using a modified version of syzkaller[1].

When 'tcf_exts_miss_cookie_base_alloc()' fails in 'tcf_exts_init_ex()'
a call to 'tcf_exts_destroy()' is made to free up the tcf_exts
resources.
In flower, a call to '__fl_put()' when 'tcf_exts_init_ex()' fails is made;
Then calling 'tcf_exts_destroy()', which triggers an UAF since the
already freed tcf_exts action pointer is lingering in the struct.

Before the offending patch, this was not an issue since there was no
case where the tcf_exts action pointer could linger. Therefore, restore
the old semantic by clearing the action pointer in case of a failure to
initialize the miss_cookie.

[1] https://github.com/cmu-pasta/linux-kernel-enriched-corpus

v1->v2: Fix compilation on configs without tc actions (kernel test robot)

Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action")
Reported-by: Palash Oswal <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/i915: Fix fast wake AUX sync len

Fast wake should use 8 SYNC pulses for the preamble
and 10-16 SYNC pulses for the precharge. Reduce our
fast wake SYNC count to match the maximum value.
We also use the maximum precharge length for normal
AUX transactions.

Cc: [email protected]
Cc: Jouni Högander <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Jouni Högander <[email protected]>
(cherry picked from commit 605f7c73133341d4b762cbd9a22174cc22d4c38b)
Signed-off-by: Jani Nikula <[email protected]>

sfc: Fix use-after-free due to selftest_work

There is a use-after-free scenario that is:

When the NIC is down, user set mac address or vlan tag to VF,
the xxx_set_vf_mac() or xxx_set_vf_vlan() will invoke efx_net_stop()
and efx_net_open(), since netif_running() is false, the port will not
start and keep port_enabled false, but selftest_work is scheduled
in efx_net_open().

If we remove the device before selftest_work run, the efx_stop_port()
will not be called since the NIC is down, and then efx is freed,
we will soon get a UAF in run_timer_softirq() like this:

[ 1178.907941] ==================================================================
[ 1178.907948] BUG: KASAN: use-after-free in run_timer_softirq+0xdea/0xe90
[ 1178.907950] Write of size 8 at addr ff11001f449cdc80 by task swapper/47/0
[ 1178.907950]
[ 1178.907953] CPU: 47 PID: 0 Comm: swapper/47 Kdump: loaded Tainted: G           O     --------- -t - 4.18.0 #1
[ 1178.907954] Hardware name: SANGFOR X620G40/WI2HG-208T1061A, BIOS SPYH051032-U01 04/01/2022
[ 1178.907955] Call Trace:
[ 1178.907956]  <IRQ>
[ 1178.907960]  dump_stack+0x71/0xab
[ 1178.907963]  print_address_description+0x6b/0x290
[ 1178.907965]  ? run_timer_softirq+0xdea/0xe90
[ 1178.907967]  kasan_report+0x14a/0x2b0
[ 1178.907968]  run_timer_softirq+0xdea/0xe90
[ 1178.907971]  ? init_timer_key+0x170/0x170
[ 1178.907973]  ? hrtimer_cancel+0x20/0x20
[ 1178.907976]  ? sched_clock+0x5/0x10
[ 1178.907978]  ? sched_clock_cpu+0x18/0x170
[ 1178.907981]  __do_softirq+0x1c8/0x5fa
[ 1178.907985]  irq_exit+0x213/0x240
[ 1178.907987]  smp_apic_timer_interrupt+0xd0/0x330
[ 1178.907989]  apic_timer_interrupt+0xf/0x20
[ 1178.907990]  </IRQ>
[ 1178.907991] RIP: 0010:mwait_idle+0xae/0x370

If the NIC is not actually brought up, there is no need to schedule
selftest_work, so let's move invoking efx_selftest_async_start()
into efx_start_all(), and it will be canceled by broughting down.

Fixes: dd40781e3a4e ("sfc: Run event/IRQ self-test asynchronously when interface is brought up")
Fixes: e340be923012 ("sfc: add ndo_set_vf_mac() function for EF10")
Debugged-by: Huang Cun <[email protected]>
Cc: Donglin Peng <[email protected]>
Suggested-by: Martin Habets <[email protected]>
Signed-off-by: Ding Hui <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

virtio_net: bugfix overflow inside xdp_linearize_page()

Here we copy the data from the original buf to the new page. But we
not check that it may be overflow.

As long as the size received(including vnethdr) is greater than 3840
(PAGE_SIZE -VIRTIO_XDP_HEADROOM). Then the memcpy will overflow.

And this is completely possible, as long as the MTU is large, such
as 4096. In our test environment, this will cause crash. Since crash is
caused by the written memory, it is meaningless, so I do not include it.

Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers")
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cifs: avoid dup prefix path in dfs_get_automount_devname()

@server->origin_fullpath already contains the tree name + optional
prefix, so avoid calling __build_path_from_dentry_optional_prefix() as
it might end up duplicating prefix path from @cifs_sb->prepath into
final full path.

Instead, generate DFS full path by simply merging
@server->origin_fullpath with dentry's path.

This fixes the following case

mount.cifs //root/dfs/dir /mnt/ -o ...
ls /mnt/link

where cifs_dfs_do_automount() will call smb3_parse_devname() with
@devname set to "//root/dfs/dir/link" instead of
"//root/dfs/dir/dir/link".

Fixes: 7ad54b98fc1f ("cifs: use origin fullpath for automounts")
Cc: <[email protected]> # 6.2+
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

Linux 6.3-rc7

Revert "userfaultfd: don't fail on unrecognized features"

This is a proposal to revert commit 914eedcb9ba0ff53c33808.

I found this when writing a simple UFFDIO_API test to be the first unit
test in this set.  Two things breaks with the commit:

  - UFFDIO_API check was lost and missing.  According to man page, the
  kernel should reject ioctl(UFFDIO_API) if uffdio_api.api != 0xaa.  This
  check is needed if the api version will be extended in the future, or
  user app won't be able to identify which is a new kernel.

  - Feature flags checks were removed, which means UFFDIO_API with a
  feature that does not exist will also succeed.  According to the man
  page, we should (and it makes sense) to reject ioctl(UFFDIO_API) if
  unknown features passed in.

Link: https://lore.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 914eedcb9ba0 ("userfaultfd: don't fail on unrecognized features")
Signed-off-by: Peter Xu <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Zach O'Keefe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs

KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================

The race that causes the above issue is as follows:

           cpu1                     cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
  inode_switch_wbs_work_fn
   wb_put_many(old_wb, nr_switched)
    percpu_ref_put_many
     ref->data->release(ref)
     cgwb_release
      queue_work(cgwb_release_wq, &wb->release_work)
      // queue_work async
       &wb->release_work
       cgwb_release_workfn
                            ksys_sync
                             iterate_supers
                              sync_inodes_one_sb
                               sync_inodes_sb
                                bdi_split_work_to_wbs
                                 kmalloc(sizeof(*work), GFP_ATOMIC)
                                 // alloc memory failed
        percpu_ref_exit
         ref->data = NULL
         kfree(data)
                                 wb_get(wb)
                                  percpu_ref_get(&wb->refcnt)
                                   percpu_ref_get_many(ref, 1)
                                    atomic_long_add(nr, &ref->data->count)
                                     atomic64_add(i, v)
                                     // trigger null-ptr-deref

bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs.  If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.

This issue was introduced in v4.3-rc7 (see fix tag1).  Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue.  For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.

To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Hou Tao <[email protected]>
Cc: yangerkun <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug

In mas_alloc_nodes(), "node->node_count = 0" means to initialize the
node_count field of the new node, but the node may not be a new node.  It
may be a node that existed before and node_count has a value, setting it
to 0 will cause a memory leak.  At this time, mas->alloc->total will be
greater than the actual number of nodes in the linked list, which may
cause many other errors.  For example, out-of-bounds access in
mas_pop_node(), and mas_pop_node() may return addresses that should not be
used.  Fix it by initializing node_count only for new nodes.

Also, by the way, an if-else statement was removed to simplify the code.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used

When using cull option with 'tg' flag, the fprintf is using pid instead
of tgid. It should use tgid instead.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9c8a0a8e599f4a ("tools/vm/page_owner_sort.c: support for user-defined culling rules")
Signed-off-by: Steve Chou <[email protected]>
Cc: Jiajian Ye <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mailmap: update jtoppins' entry to reference correct email

Link: https://lkml.kernel.org/r/d79bc6eaf65e68bd1c2a1e1510ab6291ce5926a6.1681162487.git.jtoppins@redhat.com
Signed-off-by: Jonathan Toppins <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Konrad Dybcio <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/mempolicy: fix use-after-free of VMA iterator

set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA.  Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree.  This can result in a UAF as reported by syzbot.

Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().

mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs.  Simplify
mbind_range() to only handle merging and splitting of the VMAs.

Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function.  This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).

Link: https://lore.kernel.org/linux-mm/[email protected]/
Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Tested-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO

split_huge_page_to_list() WARNs when called for huge zero pages, which
sounds to me too harsh because it does not imply a kernel bug, but just
notifies the event to admins. On the other hand, this is considered as
critical by syzkaller and makes its testing less efficient, which seems to
me harmful.

So replace the VM_WARN_ON_ONCE_FOLIO with pr_warn_ratelimited.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Naoya Horiguchi <[email protected]>
Reported-by: [email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Reviewed-by: Yang Shi <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Xu Yu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/mprotect: fix do_mprotect_pkey() return on error

When the loop over the VMA is terminated early due to an error, the return
code could be overwritten with ENOMEM. Fix the return code by only
setting the error on early loop termination when the error is not set.

User-visible effects include: attempts to run mprotect() against a
special mapping or with a poorly-aligned hugetlb address should return
-EINVAL, but they presently return -ENOMEM. In other cases an -EACCESS
should be returned.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2286a6914c77 ("mm: change mprotect_fixup to vma iterator")
Signed-off-by: Liam R. Howlett <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/khugepaged: check again on anon uffd-wp during isolation

Khugepaged collapse an anonymous thp in two rounds of scans.  The 2nd
round done in __collapse_huge_page_isolate() after
hpage_collapse_scan_pmd(), during which all the locks will be released
temporarily.  It means the pgtable can change during this phase before 2nd
round starts.

It's logically possible some ptes got wr-protected during this phase, and
we can errornously collapse a thp without noticing some ptes are
wr-protected by userfault.  e1e267c7928f wanted to avoid it but it only
did that for the 1st phase, not the 2nd phase.

Since __collapse_huge_page_isolate() happens after a round of small page
swapins, we don't need to worry on any !present ptes - if it existed
khugepaged will already bail out.  So we only need to check present ptes
with uffd-wp bit set there.

This is something I found only but never had a reproducer, I thought it
was one caused a bug in Muhammad's recent pagemap new ioctl work, but it
turns out it's not the cause of that but an userspace bug.  However this
seems to still be a real bug even with a very small race window, still
worth to have it fixed and copy stable.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e1e267c7928f ("khugepaged: skip collapse if uffd-wp detected")
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/userfaultfd: fix uffd-wp handling for THP migration entries

Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in
hugetlb_change_protection()") similarly applies to THP.

Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.

We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().

Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: <[email protected]>
Cc: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: swap: fix performance regression on sparsetruncate-tiny

The ->percpu_pvec_drained was originally introduced by commit d9ed0d08b6c6
("mm: only drain per-cpu pagevecs once per pagevec usage") to drain
per-cpu pagevecs only once per pagevec usage.  But after converting the
swap code to be more folio-based, the commit c2bc16817aa0 ("mm/swap: add
folio_batch_move_lru()") breaks this logic, which would cause
->percpu_pvec_drained to be reset to false, that means per-cpu pagevecs
will be drained multiple times per pagevec usage.

In theory, there should be no functional changes when converting code to
be more folio-based.  We should call folio_batch_reinit() in
folio_batch_move_lru() instead of folio_batch_init().  And to verify that
we still need ->percpu_pvec_drained, I ran mmtests/sparsetruncate-tiny and
got the following data:

                             baseline                   with
                            baseline/                 patch/
Min       Time      326.00 (   0.00%)      328.00 (  -0.61%)
1st-qrtle Time      334.00 (   0.00%)      336.00 (  -0.60%)
2nd-qrtle Time      338.00 (   0.00%)      341.00 (  -0.89%)
3rd-qrtle Time      343.00 (   0.00%)      347.00 (  -1.17%)
Max-1     Time      326.00 (   0.00%)      328.00 (  -0.61%)
Max-5     Time      327.00 (   0.00%)      330.00 (  -0.92%)
Max-10    Time      328.00 (   0.00%)      331.00 (  -0.91%)
Max-90    Time      350.00 (   0.00%)      357.00 (  -2.00%)
Max-95    Time      395.00 (   0.00%)      390.00 (   1.27%)
Max-99    Time      508.00 (   0.00%)      434.00 (  14.57%)
Max       Time      547.00 (   0.00%)      476.00 (  12.98%)
Amean     Time      344.61 (   0.00%)      345.56 *  -0.28%*
Stddev    Time       30.34 (   0.00%)       19.51 (  35.69%)
CoeffVar  Time        8.81 (   0.00%)        5.65 (  35.87%)
BAmean-99 Time      342.38 (   0.00%)      344.27 (  -0.55%)
BAmean-95 Time      338.58 (   0.00%)      341.87 (  -0.97%)
BAmean-90 Time      336.89 (   0.00%)      340.26 (  -1.00%)
BAmean-75 Time      335.18 (   0.00%)      338.40 (  -0.96%)
BAmean-50 Time      332.54 (   0.00%)      335.42 (  -0.87%)
BAmean-25 Time      329.30 (   0.00%)      332.00 (  -0.82%)

From the above it can be seen that we get similar data to when
->percpu_pvec_drained was introduced, so we still need it.  Let's call
folio_batch_reinit() in folio_batch_move_lru() to restore the original
logic.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: c2bc16817aa0 ("mm/swap: add folio_batch_move_lru()")
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Merge tag 'sched_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:

- Do not pull tasks to the local scheduling group if its average load
is higher than the average system load

* tag 'sched_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix imbalance overflow

Merge tag 'x86_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

- Drop __init annotation from two rtc functions which get called after
boot is done, in order to prevent a crash

* tag 'x86_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/rtc: Remove __init for runtime functions

Merge tag 'powerpc-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:

- A fix for NUMA distance handling in the pseries SCM (pmem) driver.

Thanks to Aneesh Kumar K.V.

* tag 'powerpc-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Update the NUMA distance table for the target node

Merge tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Drop debug info from purgatory objects again

- Document that kernel.org provides prebuilt LLVM toolchains

- Give up handling untracked files for source package builds

- Avoid creating corrupted cpio when KBUILD_BUILD_TIMESTAMP is given
   with a pre-epoch data.

- Change panic_show_mem() to a macro to handle variable-length argument

- Compress tarballs on-the-fly again

* tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: do not create intermediate *.tar for tar packages
  kbuild: do not create intermediate *.tar for source tarballs
  kbuild: merge cmd_archive_linux and cmd_archive_perf
  init/initramfs: Fix argument forwarding to panic() in panic_show_mem()
  initramfs: Check negative timestamp to prevent broken cpio archive
  kbuild: give up untracked files for source package builds
  Documentation/llvm: Add a note about prebuilt kernel.org toolchains
  purgatory: fix disabling debug info

Merge tag '6.3-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd

Pull ksmbd server fix from Steve French:
"smb311 server preauth integrity negotiate context parsing fix (check
for out of bounds access)"

* tag '6.3-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd:
ksmbd: avoid out of bounds access in decode_preauth_ctxt()

PCI/MSI: Remove over-zealous hardware size check in pci_msix_validate_entries()

pci_msix_validate_entries() validates the entries array which is handed in
by the caller for a MSI-X interrupt allocation. Aside of consistency
failures it also detects a failure when the size of the MSI-X hardware table
in the device is smaller than the size of the entries array.

That's wrong for the case of range allocations where the caller provides
the minimum and the maximum number of vectors to allocate, when the
hardware size is greater or equal than the mininum, but smaller than the
maximum.

Remove the hardware size check completely from that function and just
ensure that the entires array up to the maximum size is consistent.

The limitation and range checking versus the hardware size happens
independently of that afterwards anyway because the entries array is
optional.

Fixes: 4644d22eb673 ("PCI/MSI: Validate MSI-X contiguous restriction early")
Reported-by: David Laight <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/87v8i3sg62.ffs@tglx

kbuild: do not create intermediate *.tar for tar packages

Commit 05e96e96a315 ("kbuild: use git-archive for source package
creation") split the compression as a separate step to factor out
the common build rules.

With the previous commit, we got back to the situation where source
tarballs are compressed on-the-fly.
There is no reason to keep the separate compression rules.

Generate the comressed tar packages directly.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>

kbuild: do not create intermediate *.tar for source tarballs

Since commit 05e96e96a315 ("kbuild: use git-archive for source package
creation"), a source tarball is created in two steps; create *.tar file
then compress it. I split the compression as a separate rule because I
just thought 'git archive' supported only gzip.

For other compression algorithms, I could pipe the two commands:

$ git archive HEAD | xz > linux.tar.xz

I read git-archive(1) carefully, and I realized GIT had provided a
more elegant way:

$ git -c tar.tar.xz.command=xz archive -o linux.tar.xz HEAD

This commit uses 'tar.tar.*.command' configuration to specify the
compression backend so we can compress a source tarball on-the-fly.

GIT commit 767cf4579f0e ("archive: implement configurable tar filters")
is more than a decade old, so it should be available on almost all build
environments.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>

kbuild: merge cmd_archive_linux and cmd_archive_perf

The two commands, cmd_archive_linux and cmd_archive_perf, are similar.
Merge them to make it easier to add more changes to the git-archive
command.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>

init/initramfs: Fix argument forwarding to panic() in panic_show_mem()

Forwarding variadic argument lists can't be done by passing a va_list
to a function with signature foo(...) (as panic() has). It ends up
interpreting the va_list itself as a single argument instead of
iterating it. printf() happily accepts it of course, leading to corrupt
output.

Convert panic_show_mem() to a macro to allow forwarding the arguments.
The function is trivial enough that it's easier than trying to introduce
a vpanic() variant.

Signed-off-by: Benjamin Gray <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

initramfs: Check negative timestamp to prevent broken cpio archive

Similar to commit 4c9d410f32b3 ("initramfs: Check timestamp to prevent
broken cpio archive"), except asserts that the timestamp is
non-negative. This can happen when the KBUILD_BUILD_TIMESTAMP is a value
before UNIX epoch, which may be set when making reproducible builds that
don't want to look like they use a valid date.

While support for dates before 1970 might not be supported, this is more
about preventing undetected CPIO corruption. The printf's use a minimum
length format specifier, and will happily make the field longer than 8
characters if they need to.

Signed-off-by: Benjamin Gray <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Tested-by: Andrew Donnellan <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

Merge tag '6.3-rc6-smb311-client-negcontext-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fix from Steve French:
"Small client fix for better checking for smb311 negotiate context
overflows, also marked for stable"

* tag '6.3-rc6-smb311-client-negcontext-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix negotiate context parsing

Merge tag 'ubifs-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI fixes from Richard Weinberger:

- Fix failure to attach when vid_hdr offset equals the (sub)page size

- Fix for a deadlock in UBI's worker thread

* tag 'ubifs-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
ubi: Fix deadlock caused by recursively holding work_sem

cifs: fix negotiate context parsing

smb311_decode_neg_context() doesn't properly check against SMB packet
boundaries prior to accessing individual negotiate context entries. This
is due to the length check omitting the eight byte smb2_neg_context
header, as well as incorrect decrementing of len_of_ctxts.

Fixes: 5100d8a3fe03 ("SMB311: Improve checking of negotiate security contexts")
Reported-by: Volker Lendecke <[email protected]>
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: David Disseldorp <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge tag 'i2c-for-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Just two driver fixes"

* tag 'i2c-for-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ocores: generate stop condition after timeout in polling mode
i2c: mchp-pci1xxxx: Update Timing registers

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
"One small fix to SCSI Enclosure Services to fix a regression caused by
another recent fix"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ses: Handle enclosure with just a primary component gracefully

Merge tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linux

Pull block fix from Jens Axboe:
"A single NVMe quirk entry addition"

* tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linux:
nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD

Merge tag 'io_uring-6.3-2023-04-14' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
"Just a small tweak to when task_work needs redirection, marked for
stable as well"

* tag 'io_uring-6.3-2023-04-14' of git://git.kernel.dk/linux:
io_uring: complete request via task work in case of DEFER_TASKRUN

Merge tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- A fix for a missing fence when generating the NOMMU sigreturn
   trampoline

- A set of fixes for early DTB handling of reserved memory nodes

* tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: No need to relocate the dtb as it lies in the fixmap region
  riscv: Do not set initial_boot_params to the linear address of the dtb
  riscv: Move early dtb mapping into the fixmap region
  riscv: add icache flush for nommu sigreturn trampoline

Merge tag 'acpi-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These add two ACPI-related quirks:

   - Add a quirk to force StorageD3Enable on AMD Picasso systems (Mario
     Limonciello)

   - Add an ACPI IRQ override quirk for ASUS ExpertBook B1502CBA (Paul
     Menzel)"

* tag 'acpi-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CBA
  ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable

Merge tag 'pm-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Make the amd-pstate cpufreq driver take all of the possible
  combinations of the 'old' and 'new' status values correctly while
  changing the operation mode via sysfs (Wyes Karny)"

* tag 'pm-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  amd-pstate: Fix amd_pstate mode switch

Merge tag 'thermal-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fix from Rafael Wysocki:
"Modify the Intel thermal throttling code to avoid updating unsupported
  status clearing mask bits which causes the kernel to complain about
  unchecked MSR access (Srinivas Pandruvada)"

* tag 'thermal-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: Avoid updating unsupported THERM_STATUS_CLEAR mask bits

Merge tag 'sound-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes.

  At this time, quite a few fixes for the old PCI drivers are found.
  Although they are not regression fixes, I took these as they are
  materials for stable kernels.

  In addition, a couple of regression fixes and another couple of
  HD-audio quirks are included"

* tag 'sound-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/hdmi: disable KAE for Intel DG2
  ALSA: hda/realtek: Add quirks for Lenovo Z13/Z16 Gen2
  ALSA: hda: patch_realtek: add quirk for Asus N7601ZM
  ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
  ALSA: emu10k1: don't create old pass-through playback device on Audigy
  ALSA: emu10k1: fix capture interrupt handler unlinking
  ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
  ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
  ALSA: i2c/cs8427: fix iec958 mixer control deactivation

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"We had a fairly slow cycle on the rc side this time, here are the
  accumulated fixes, mostly in drivers:

   - irdma should not generate extra completions during flushing

   - Fix several memory leaks

   - Do not get confused in irdma's iwarp mode if IPv6 is present

   - Correct a link speed calculation in mlx5

   - Increase the EQ/WQ limits on erdma as they are too small for big
     applications

   - Use the right math for erdma's inline mtt feature

   - Make erdma probing more robust to boot time ordering differences

   - Fix a KMSAN crash in CMA due to uninitialized qkey"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/core: Fix GID entry ref leak when create_ah fails
  RDMA/cma: Allow UD qp_type to join multicast only
  RDMA/erdma: Defer probing if netdevice can not be found
  RDMA/erdma: Inline mtt entries into WQE if supported
  RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192
  RDMA/erdma: Fix some typos
  IB/mlx5: Add support for 400G_8X lane speed
  RDMA/irdma: Add ipv4 check to irdma_find_listener()
  RDMA/irdma: Increase iWARP CM default rexmit count
  RDMA/irdma: Fix memory leak of PBLE objects
  RDMA/irdma: Do not generate SW completions for NOPs

s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL

Thomas Richter reported a crash in linux-next with a backtrace similar
to the following one:

[<0000000000000000>] 0x0
([<000000000031a182>] bpf_trace_run4+0xc2/0x218)
[<00000000001d59f4>] __bpf_trace_sched_switch+0x1c/0x28
[<0000000000c44a3a>] __schedule+0x43a/0x890
[<0000000000c44ef8>] schedule+0x68/0x110
[<0000000000c4e5ca>] do_nanosleep+0xa2/0x168
[<000000000026e7fe>] hrtimer_nanosleep+0xf6/0x1c0
[<000000000026eb6e>] __s390x_sys_nanosleep+0xb6/0xf0
[<0000000000c3b81c>] __do_syscall+0x1e4/0x208
[<0000000000c50510>] system_call+0x70/0x98
Last Breaking-Event-Address:
[<000003ff7fda1814>] bpf_prog_65e887c70a835bbf_on_switch+0x1a4/0x1f0

The problem is that bpf_arch_text_poke() with new_addr == NULL is
susceptible to the following race condition:

T1                 T2
        -----------------  -------------------
plt.target = NULL
                   entry: brcl 0xf,plt
entry.mask = 0
                   lgrl %r1,plt.target
                   br %r1

Fix by setting PLT target to the instruction following `brcl 0xf,plt`
instead of 0. This way T2 will simply resume the execution of the eBPF
program, which is the desired effect of passing new_addr == NULL.

Fixes: f1d5df84cd8c ("s390/bpf: Implement bpf_arch_text_poke()")
Reported-by: Thomas Richter <[email protected]>
Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge branch 'acpi-x86'

Merge a quirk to force StorageD3Enable on AMD Picasso systems (Mario
Limonciello).

* acpi-x86:
ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable

io_uring: complete request via task work in case of DEFER_TASKRUN

So far io_req_complete_post() only covers DEFER_TASKRUN by completing
request via task work when the request is completed from IOWQ.

However, uring command could be completed from any context, and if io
uring is setup with DEFER_TASKRUN, the command is required to be
completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
can't be wakeup, and may hang forever.

The issue can be observed on removing ublk device, but turns out it is
one generic issue for uring command & DEFER_TASKRUN, so solve it in
io_uring core code.

Fixes: e6aeb2721d3b ("io_uring: complete all requests in task context")
Cc: [email protected]
Link: https://lore.kernel.org/linux-block/[email protected]/
Reported-by: Jens Axboe <[email protected]>
Cc: Kanchan Joshi <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'nvme-6.3' of git://git.infradead.org/nvme into block-6.3

Pull NVMe fix from Christoph.

* 'nvme-6.3' of git://git.infradead.org/nvme:
nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD

Merge tag 'qcom-arm64-fixes-for-6.3-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

A few more Qualcomm ARM64 DeviceTree fixes for 6.3

The GPIO polarity of the WSA881x shutdown GPIO was inconsistent and had
to be corrected in the driver, this fixes the polarity in the DeviceTree
for QRB5165 RB5, SM8250 MTP, Samsung Galaxy Book 2 and Lenovo Yoga C630.

The recent rearrangement of nodes among the IPQ8074 accidentally enabled
the PCIe PHYs, rather than the PCIe controllers. This is being
corrected, to restore PCIe functionality.

PMK8280 PON node has the wrong compatible, which recently caused the
driver to stop probing. This is corrected and the required "pbs" region
is added.

With support for HBR3 introduced, it's noted that SC7280 Herobrine
devices are having trouble running at this rate. This drops the claim
that it's supported, until further analysis can be done.

* tag 'qcom-arm64-fixes-for-6.3-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards
  arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers
  arm64: dts: qcom: ipq8074-hk10: enable QMP device, not the PHY node
  arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY node
  arm64: dts: qcom: qrb5165-rb5: Use proper WSA881x shutdown GPIO polarity
  arm64: dts: qcom: sm8250-mtp: Use proper WSA881x shutdown GPIO polarity
  arm64: dts: qcom: sdm850-samsung-w737: Use proper WSA881x shutdown GPIO polarity
  arm64: dts: qcom: sdm850-lenovo-yoga-c630: Use proper WSA881x shutdown GPIO polarity

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'v6.3-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes

Lower sd card speeds for two boards to make them run more reliable,
missing 32k clock definition for Anbric xx3 devices, missing cache-levels
for rk3588, fixed rk3326-board display supplies and more dt-schema fixes.

* tag 'v6.3-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: correct panel supplies on some rk3326 boards
  arm64: dts: rockchip: use just "port" in panel on RockPro64
  arm64: dts: rockchip: use just "port" in panel on Pinebook Pro
  arm64: dts: rockchip: Remove non-existing pwm-delay-us property
  arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices
  arm64: dts: rockchip: add rk3588 cache level information
  arm64: dts: rockchip: Lower SD card speed on rk3399 Pinebook Pro
  arm64: dts: rockchip: Lower sd speed on rk3566-soquartz
  ARM: dts: rockchip: fix a typo error for rk3288 spdif node
  arm64: dts: rockchip: Fix rk3399 GICv3 ITS node name

Link: https://lore.kernel.org/r/10559306.CDJkKcVGEf@phil
Signed-off-by: Arnd Bergmann <[email protected]>

firmware/psci: demote suspend-mode warning to info level

On some Qualcomm platforms, like SC8280XP, the attempt to set PC mode
during boot fails with PSCI_RET_DENIED and since commit 998fcd001feb
("firmware/psci: Print a warning if PSCI doesn't accept PC mode") this
is now logged at warning level:

psci: failed to set PC mode: -3

As there is nothing users can do about the firmware behaving this way,
demote the warning to info level and clearly mark it as a firmware bug:

psci: [Firmware Bug]: failed to set PC mode: -3

Reviewed-by: Ulf Hansson <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Acked-by: Sudeep Holla <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Acked-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg

If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device.
The MTU of the loopback device can be set up to 2^31-1.
As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX.

Due to the invalid lmax value, an index is generated that exceeds the QFQ_MAX_INDEX(=24) value, causing out-of-bounds read/write errors.

The following reports a oob access:

[   84.582666] BUG: KASAN: slab-out-of-bounds in qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313)
[   84.583267] Read of size 4 at addr ffff88810f676948 by task ping/301
[   84.583686]
[   84.583797] CPU: 3 PID: 301 Comm: ping Not tainted 6.3.0-rc5 #1
[   84.584164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   84.584644] Call Trace:
[   84.584787]  <TASK>
[   84.584906] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[   84.585108] print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
[   84.585570] kasan_report (mm/kasan/report.c:538)
[   84.585988] qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313)
[   84.586599] qfq_enqueue (net/sched/sch_qfq.c:1255)
[   84.587607] dev_qdisc_enqueue (net/core/dev.c:3776)
[   84.587749] __dev_queue_xmit (./include/net/sch_generic.h:186 net/core/dev.c:3865 net/core/dev.c:4212)
[   84.588763] ip_finish_output2 (./include/net/neighbour.h:546 net/ipv4/ip_output.c:228)
[   84.589460] ip_output (net/ipv4/ip_output.c:430)
[   84.590132] ip_push_pending_frames (./include/net/dst.h:444 net/ipv4/ip_output.c:126 net/ipv4/ip_output.c:1586 net/ipv4/ip_output.c:1606)
[   84.590285] raw_sendmsg (net/ipv4/raw.c:649)
[   84.591960] sock_sendmsg (net/socket.c:724 net/socket.c:747)
[   84.592084] __sys_sendto (net/socket.c:2142)
[   84.593306] __x64_sys_sendto (net/socket.c:2150)
[   84.593779] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[   84.593902] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[   84.594070] RIP: 0033:0x7fe568032066
[   84.594192] Code: 0e 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c09[ 84.594796] RSP: 002b:00007ffce388b4e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c

Code starting with the faulting instruction
===========================================
[   84.595047] RAX: ffffffffffffffda RBX: 00007ffce388cc70 RCX: 00007fe568032066
[   84.595281] RDX: 0000000000000040 RSI: 00005605fdad6d10 RDI: 0000000000000003
[   84.595515] RBP: 00005605fdad6d10 R08: 00007ffce388eeec R09: 0000000000000010
[   84.595749] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[   84.595984] R13: 00007ffce388cc30 R14: 00007ffce388b4f0 R15: 0000001d00000001
[   84.596218]  </TASK>
[   84.596295]
[   84.596351] Allocated by task 291:
[   84.596467] kasan_save_stack (mm/kasan/common.c:46)
[   84.596597] kasan_set_track (mm/kasan/common.c:52)
[   84.596725] __kasan_kmalloc (mm/kasan/common.c:384)
[   84.596852] __kmalloc_node (./include/linux/kasan.h:196 mm/slab_common.c:967 mm/slab_common.c:974)
[   84.596979] qdisc_alloc (./include/linux/slab.h:610 ./include/linux/slab.h:731 net/sched/sch_generic.c:938)
[   84.597100] qdisc_create (net/sched/sch_api.c:1244)
[   84.597222] tc_modify_qdisc (net/sched/sch_api.c:1680)
[   84.597357] rtnetlink_rcv_msg (net/core/rtnetlink.c:6174)
[   84.597495] netlink_rcv_skb (net/netlink/af_netlink.c:2574)
[   84.597627] netlink_unicast (net/netlink/af_netlink.c:1340 net/netlink/af_netlink.c:1365)
[   84.597759] netlink_sendmsg (net/netlink/af_netlink.c:1942)
[   84.597891] sock_sendmsg (net/socket.c:724 net/socket.c:747)
[   84.598016] ____sys_sendmsg (net/socket.c:2501)
[   84.598147] ___sys_sendmsg (net/socket.c:2557)
[   84.598275] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2586)
[   84.598399] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[   84.598520] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[   84.598688]
[   84.598744] The buggy address belongs to the object at ffff88810f674000
[   84.598744]  which belongs to the cache kmalloc-8k of size 8192
[   84.599135] The buggy address is located 2664 bytes to the right of
[   84.599135]  allocated 7904-byte region [ffff88810f674000, ffff88810f675ee0)
[   84.599544]
[   84.599598] The buggy address belongs to the physical page:
[   84.599777] page:00000000e638567f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10f670
[   84.600074] head:00000000e638567f order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   84.600330] flags: 0x200000000010200(slab|head|node=0|zone=2)
[   84.600517] raw: 0200000000010200 ffff888100043180 dead000000000122 0000000000000000
[   84.600764] raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000
[   84.601009] page dumped because: kasan: bad access detected
[   84.601187]
[   84.601241] Memory state around the buggy address:
[   84.601396]  ffff88810f676800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   84.601620]  ffff88810f676880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   84.601845] >ffff88810f676900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   84.602069]                                               ^
[   84.602243]  ffff88810f676980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   84.602468]  ffff88810f676a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   84.602693] ==================================================================
[   84.602924] Disabling lock debugging due to kernel taint

Fixes: 3015f3d2a3cd ("pkt_sched: enable QFQ to support TSO/GSO")
Reported-by: Gwangun Jung <[email protected]>
Signed-off-by: Gwangun Jung <[email protected]>
Acked-by: Jamal Hadi Salim<[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Input: cyttsp5 - fix sensing configuration data structure

Prior to this patch, the sensing configuration data was not parsed
correctly, breaking detection of max_tch. The vendor driver includes
this field. This change informs the driver about the correct maximum
number of simultaneous touch inputs.

Tested on a Pine64 PineNote with a modified touch screen controller
firmware.

Signed-off-by: hrdl <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

ALSA: hda/hdmi: disable KAE for Intel DG2

Use of keep-alive (KAE) has resulted in loss of audio on some A750/770
cards as the transition from keep-alive to stream playback is not
working as expected. As there is limited benefit of the new KAE mode
on discrete cards, revert back to older silent-stream implementation
on these systems.

Cc: [email protected]
Fixes: 15175a4f2bbb ("ALSA: hda/hdmi: add keep-alive support for ADL-P and DG2")
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8307
Signed-off-by: Kai Vehmanen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD

Added a quirk to fix the TeamGroup T-Force Cardea Zero Z330 SSDs reporting
duplicate NGUIDs.

Signed-off-by: Duy Truong <[email protected]>
Cc: [email protected]
Signed-off-by: Christoph Hellwig <[email protected]>

riscv: No need to relocate the dtb as it lies in the fixmap region

We used to access the dtb via its linear mapping address but now that the
dtb early mapping was moved in the fixmap region, we can keep using this
address since it is present in swapper_pg_dir, and remove the dtb
relocation.

Note that the relocation was wrong anyway since early_memremap() is
restricted to 256K whereas the maximum fdt size is 2MB.

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Tested-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: Do not set initial_boot_params to the linear address of the dtb

early_init_dt_verify() is already called in parse_dtb() and since the dtb
address does not change anymore (it is now in the fixmap region), no need
to reset initial_boot_params by calling early_init_dt_verify() again.

Signed-off-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: Move early dtb mapping into the fixmap region

riscv establishes 2 virtual mappings:

- early_pg_dir maps the kernel which allows to discover the system
memory
- swapper_pg_dir installs the final mapping (linear mapping included)

We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
mapping was not carried over in swapper_pg_dir. It happens that
early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is
setup otherwise we could allocate reserved memory defined in the dtb.
And this function initializes reserved_mem variable with addresses that
lie in the early_pg_dir dtb mapping: when those addresses are reused
with swapper_pg_dir, this mapping does not exist and then we trap.

The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem()
must be called before swapper_pg_dir is set up otherwise we could
allocate in reserved memory defined in the dtb.

So move the dtb mapping in the fixmap region which is established in
early_pg_dir and handed over to swapper_pg_dir.

Fixes: 922b0375fc93 ("riscv: Fix memblock reservation for device tree blob")
Fixes: 8f3a2b4a96dc ("RISC-V: Move DT mapping outof fixmap")
Fixes: 50e63dd8ed92 ("riscv: fix reserved memory setup")
Reported-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Tested-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag 'cgroup-for-6.3-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"This is a relatively big pull request this late in the cycle but the
  major contributor is the cpuset bug which is rather significant:

   - Fix several cpuset bugs including one where it wasn't applying the
     target cgroup when tasks are created with CLONE_INTO_CGROUP

  With a few smaller fixes:

   - Fix inversed locking order in cgroup1 freezer implementation

   - Fix garbage cpu.stat::core_sched.forceidle_usec reporting in the
     root cgroup"

* tag 'cgroup-for-6.3-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Make cpuset_attach_task() skip subpartitions CPUs for top_cpuset
  cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods
  cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
  cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
  cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
  cgroup/cpuset: Fix partition root's cpuset.cpus update bug
  cgroup: fix display of forceidle time at root

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"A few more clk driver fixes:

   - Set the max_register member of the spreadtrum regmap so that reads
     don't go off the end of the I/O space

   - Avoid a clk parent error in the i.MX imx6ul driver when the
     selector is unknown

   - Fix an oops due to REGCACHE_NONE usage by the Renesas 9-series
     driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: rs9: Fix suspend/resume
  clk: imx6ul: fix "failed to get parent" error
  clk: sprd: set max_register according to mapping range

Merge tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, and bluetooth.

  Not all that quiet given spring celebrations, but "current" fixes are
  thinning out, which is encouraging. One outstanding regression in the
  mlx5 driver when using old FW, not blocking but we're pushing for a
  fix.

  Current release - new code bugs:

   - eth: enetc: workaround for unresponsive pMAC after receiving
     express traffic

  Previous releases - regressions:

   - rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the
     pid/seq fields 0 for backward compatibility

  Previous releases - always broken:

   - sctp: fix a potential overflow in sctp_ifwdtsn_skip

   - mptcp:
      - use mptcp_schedule_work instead of open-coding it and make the
        worker check stricter, to avoid scheduling work on closed
        sockets
      - fix NULL pointer dereference on fastopen early fallback

   - skbuff: fix memory corruption due to a race between skb coalescing
     and releasing clones confusing page_pool reference counting

   - bonding: fix neighbor solicitation validation on backup slaves

   - bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp

   - bpf: arm64: fixed a BTI error on returning to patched function

   - openvswitch: fix race on port output leading to inf loop

   - sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
     returning a different errno than expected

   - phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove

   - Bluetooth: fix printing errors if LE Connection times out

   - Bluetooth: assorted UaF, deadlock and data race fixes

   - eth: macb: fix memory corruption in extended buffer descriptor mode

  Misc:

   - adjust the XDP Rx flow hash API to also include the protocol layers
     over which the hash was computed"

* tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
  mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
  veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
  mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
  xdp: rss hash types representation
  selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
  skbuff: Fix a race between coalescing and releasing SKBs
  net: macb: fix a memory corruption in extended buffer descriptor mode
  selftests: add the missing CONFIG_IP_SCTP in net config
  udp6: fix potential access to stale information
  selftests: openvswitch: adjust datapath NL message declaration
  selftests: mptcp: userspace pm: uniform verify events
  mptcp: fix NULL pointer dereference on fastopen early fallback
  mptcp: stricter state check in mptcp_worker
  mptcp: use mptcp_schedule_work instead of open-coding it
  net: enetc: workaround for unresponsive pMAC after receiving express traffic
  sctp: fix a potential overflow in sctp_ifwdtsn_skip
  net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
  rtnetlink: Restore RTM_NEW/DELLINK notification behavior
  net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes
  ...

drm/sched: Check scheduler ready before calling timeout handling

During an IGT GPU reset test we see the following oops,

[  +0.000003] ------------[ cut here ]------------
[  +0.000000] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:1656 __queue_delayed_work+0x6d/0xa0
[  +0.000004] Modules linked in: iptable_filter bpfilter amdgpu(OE) nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr ledtrig_audio snd_hda_codec_hdmi intel_rapl_common snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core iommu_v2 gpu_sched(OE) kvm_amd drm_buddy snd_hwdep kvm video drm_ttm_helper snd_pcm ttm snd_seq_midi drm_display_helper snd_seq_midi_event snd_rawmidi cec crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_seq aesni_intel rc_core crypto_simd cryptd binfmt_misc drm_kms_helper rapl snd_seq_device input_leds joydev snd_timer i2c_algo_bit syscopyarea snd ccp sysfillrect sysimgblt wmi_bmof k10temp soundcore mac_hid sch_fq_codel msr parport_pc ppdev drm lp parport ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid r8169 ahci xhci_pci gpio_amdpt realtek i2c_piix4 wmi crc32_pclmul xhci_pci_renesas libahci gpio_generic
[  +0.000070] CPU: 9 PID: 0 Comm: swapper/9 Tainted: G        W OE      6.1.11+ #2
[  +0.000003] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming 3/AB350-Gaming 3-CF, BIOS F7 06/16/2017
[  +0.000001] RIP: 0010:__queue_delayed_work+0x6d/0xa0
[  +0.000003] Code: 7a 50 48 01 c1 48 89 4a 30 81 ff 00 20 00 00 75 38 4c 89 cf e8 64 3e 0a 00 5d e9 1e c5 11 01 e8 99 f7 ff ff 5d e9 13 c5 11 01 <0f> 0b eb c1 0f 0b 48 81 7a 38 70 5c 0e 81 74 9f 0f 0b 48 8b 42 28
[  +0.000002] RSP: 0018:ffffc90000398d60 EFLAGS: 00010007
[  +0.000002] RAX: ffff88810d589c60 RBX: 0000000000000000 RCX: 0000000000000000
[  +0.000002] RDX: ffff88810d589c58 RSI: 0000000000000000 RDI: 0000000000002000
[  +0.000001] RBP: ffffc90000398d60 R08: 0000000000000000 R09: ffff88810d589c78
[  +0.000002] R10: 72705f305f39765f R11: 7866673a6d72645b R12: ffff88810d589c58
[  +0.000001] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000000000
[  +0.000002] FS:  0000000000000000(0000) GS:ffff8887fee40000(0000) knlGS:0000000000000000
[  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000002] CR2: 00005562c4797fa0 CR3: 0000000110da0000 CR4: 00000000003506e0
[  +0.000002] Call Trace:
[  +0.000001]  <IRQ>
[  +0.000001]  mod_delayed_work_on+0x5e/0xa0
[  +0.000004]  drm_sched_fault+0x23/0x30 [gpu_sched]
[  +0.000007]  gfx_v9_0_fault.isra.0+0xa6/0xd0 [amdgpu]
[  +0.000258]  gfx_v9_0_priv_reg_irq+0x29/0x40 [amdgpu]
[  +0.000254]  amdgpu_irq_dispatch+0x1ac/0x2b0 [amdgpu]
[  +0.000243]  amdgpu_ih_process+0x89/0x130 [amdgpu]
[  +0.000245]  amdgpu_irq_handler+0x24/0x60 [amdgpu]
[  +0.000165]  __handle_irq_event_percpu+0x4f/0x1a0
[  +0.000003]  handle_irq_event_percpu+0x15/0x50
[  +0.000001]  handle_irq_event+0x39/0x60
[  +0.000002]  handle_edge_irq+0xa8/0x250
[  +0.000003]  __common_interrupt+0x7b/0x150
[  +0.000002]  common_interrupt+0xc1/0xe0
[  +0.000003]  </IRQ>
[  +0.000000]  <TASK>
[  +0.000001]  asm_common_interrupt+0x27/0x40
[  +0.000002] RIP: 0010:native_safe_halt+0xb/0x10
[  +0.000003] Code: 46 ff ff ff cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d 69 f2 5e 00 f4 e9 f1 3b 3e 00 90 eb 07 0f 00 2d 59 f2 5e 00 fb f4 <e9> e0 3b 3e 00 0f 1f 44 00 00 55 48 89 e5 53 e8 b1 d4 fe ff 66 90
[  +0.000002] RSP: 0018:ffffc9000018fdc8 EFLAGS: 00000246
[  +0.000002] RAX: 0000000000004000 RBX: 000000000002e5a8 RCX: 000000000000001f
[  +0.000001] RDX: 0000000000000001 RSI: ffff888101298800 RDI: ffff888101298864
[  +0.000001] RBP: ffffc9000018fdd0 R08: 000000527f64bd8b R09: 000000000001dc90
[  +0.000001] R10: 000000000001dc90 R11: 0000000000000003 R12: 0000000000000001
[  +0.000001] R13: ffff888101298864 R14: ffffffff832d9e20 R15: ffff888193aa8c00
[  +0.000003]  ? acpi_idle_do_entry+0x5e/0x70
[  +0.000002]  acpi_idle_enter+0xd1/0x160
[  +0.000003]  cpuidle_enter_state+0x9a/0x6e0
[  +0.000003]  cpuidle_enter+0x2e/0x50
[  +0.000003]  call_cpuidle+0x23/0x50
[  +0.000002]  do_idle+0x1de/0x260
[  +0.000002]  cpu_startup_entry+0x20/0x30
[  +0.000002]  start_secondary+0x120/0x150
[  +0.000003]  secondary_startup_64_no_verify+0xe5/0xeb
[  +0.000004]  </TASK>
[  +0.000000] ---[ end trace 0000000000000000 ]---
[  +0.000003] BUG: kernel NULL pointer dereference, address: 0000000000000102
[  +0.006233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=3, emitted seq=4
[  +0.000734] #PF: supervisor read access in kernel mode
[  +0.009670] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process amd_deadlock pid 2002 thread amd_deadlock pid 2002
[  +0.005135] #PF: error_code(0x0000) - not-present page
[  +0.000002] PGD 0 P4D 0
[  +0.000002] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  +0.000002] CPU: 9 PID: 0 Comm: swapper/9 Tainted: G        W OE      6.1.11+ #2
[  +0.000002] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming 3/AB350-Gaming 3-CF, BIOS F7 06/16/2017
[  +0.012101] amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
[  +0.005136] RIP: 0010:__queue_work+0x1f/0x4e0
[  +0.000004] Code: 87 cd 11 01 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 49 89 d5 41 54 49 89 f4 53 48 83 ec 10 89 7d d4 <f6> 86 02 01 00 00 01 0f 85 6c 03 00 00 e8 7f 36 08 00 8b 45 d4 48

For gfx_rings the schedulers may not be initialized by
amdgpu_device_init_schedulers() due to ring->no_scheduler flag being set to
true and thus the timeout_wq is NULL. As a result, since all ASICs call
drm_sched_fault() unconditionally even for schedulers which have not been
initialized, it is simpler to use the ready condition which indicates whether
the given scheduler worker thread runs and whether the timeout_wq of the reset
domain has been initialized.

Signed-off-by: Vitaly Prosyak <[email protected]>
Cc: Christian König <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Fix interaction between fw_devlink and DT overlays causing devices to
   not be probed

- Fix the compatible string for loongson,cpu-interrupt-controller

* tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  treewide: Fix probing of devices in DT overlays
  dt-bindings: interrupt-controller: loongarch: Fix mismatched compatible

Merge tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fix from Linus Walleij:
"This is just a revert of the AMD fix, because the fix broke some
laptops. We are working on a proper solution"

* tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
Revert "pinctrl: amd: Disable and mask interrupts on resume"

Merge tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:

- two fbcon regressions

- amdgpu: dp mst, smu13

- i915: dual link dsi for tgl+

- armada, nouveau, drm/sched, fbmem

* tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm:
  fbcon: set_con2fb_map needs to set con2fb_map!
  fbcon: Fix error paths in set_con2fb_map
  drm/amd/pm: correct the pcie link state check for SMU13
  drm/amd/pm: correct SMU13.0.7 max shader clock reporting
  drm/amd/pm: correct SMU13.0.7 pstate profiling clock settings
  drm/amd/display: Pass the right info to drm_dp_remove_payload
  drm/armada: Fix a potential double free in an error handling path
  fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
  drm/nouveau/fb: add missing sysmen flush callbacks
  drm/i915/dsi: fix DSS CTL register offsets for TGL+
  drm/scheduler: Fix UAF race in drm_sched_entity_push_job()

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-04-13

We've added 6 non-merge commits during the last 1 day(s) which contain
a total of 14 files changed, 205 insertions(+), 38 deletions(-).

The main changes are:

1) One late straggler fix on the XDP hints side which fixes
   bpf_xdp_metadata_rx_hash kfunc API before the release goes out
   in order to provide information on the RSS hash type,
   from Jesper Dangaard Brouer.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
  mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
  veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
  mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
  xdp: rss hash types representation
  selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ksmbd: avoid out of bounds access in decode_preauth_ctxt()

Confirm that the accessed pneg_ctxt->HashAlgorithms address sits within
the SMB request boundary; deassemble_neg_contexts() only checks that the
eight byte smb2_neg_context header + (client controlled) DataLength are
within the packet boundary, which is insufficient.

Checking for sizeof(struct smb2_preauth_neg_context) is overkill given
that the type currently assumes SMB311_SALT_SIZE bytes of trailing Salt.

Signed-off-by: David Disseldorp <[email protected]>
Acked-by: Namjae Jeon <[email protected]>
Cc: <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge tag 'drm-misc-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

* armada: Fix double free
* fb: Clear FB_ACTIVATE_KD_TEXT in ioctl
* nouveau: Add missing callbacks
* scheduler: Fix use-after-free error

Signed-off-by: Daniel Vetter <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20230413184233.GA8148@linux-uq9g

Merge branch 'XDP-hints: change RX-hash kfunc bpf_xdp_metadata_rx_hash'

Jesper Dangaard Brouer says:

====================

Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value,
but doesn't provide information on the RSS hash type (part of 6.3-rc).

This patchset proposal is to change the function call signature via adding
a pointer value argument for providing the RSS hash type.

Patchset also removes all bpf_printk's from xdp_hw_metadata program
that we expect driver developers to use. Instead counters are introduced
for relaying e.g. skip and fail info.
====================

Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg

Update BPF selftests to use the new RSS type argument for kfunc
bpf_xdp_metadata_rx_hash.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/168132894068.340624.8914711185697163690.stgit@firesoul
Signed-off-by: Alexei Starovoitov <[email protected]>

mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type

Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via matching individual Completion Queue Entry (CQE) status bits.

Fixes: ab46182d0dcb ("net/mlx4_en: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/168132893562.340624.12779118462402031248.stgit@firesoul
Signed-off-by: Alexei Starovoitov <[email protected]>

veth: bpf_xdp_metadata_rx_hash add xdp rss hash type

Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type.

The veth driver currently only support XDP-hints based on SKB code path.
The SKB have lost information about the RSS hash type, by compressing
the information down to a single bitfield skb->l4_hash, that only knows
if this was a L4 hash value.

In preparation for veth, the xdp_rss_hash_type have an L4 indication
bit that allow us to return a meaningful L4 indication when working
with SKB based packets.

Fixes: 306531f0249f ("veth: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/168132893055.340624.16209448340644513469.stgit@firesoul
Signed-off-by: Alexei Starovoitov <[email protected]>

mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type

Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via mapping table.

The mlx5 hardware can also identify and RSS hash IPSEC. This indicate
hash includes SPI (Security Parameters Index) as part of IPSEC hash.

Extend xdp core enum xdp_rss_hash_type with IPSEC hash type.

Fixes: bc8d405b1ba9 ("net/mlx5e: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/168132892548.340624.11185734579430124869.stgit@firesoul
Signed-off-by: Alexei Starovoitov <[email protected]>

xdp: rss hash types representation

The RSS hash type specifies what portion of packet data NIC hardware used
when calculating RSS hash value. The RSS types are focused on Internet
traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
primarily TCP vs UDP, but some hardware supports SCTP.

Hardware RSS types are differently encoded for each hardware NIC. Most
hardware represent RSS hash type as a number. Determining L3 vs L4 often
requires a mapping table as there often isn't a pattern or sorting
according to ISO layer.

The patch introduce a XDP RSS hash type (enum xdp_rss_hash_type) that
contains both BITs for the L3/L4 types, and combinations to be used by
drivers for their mapping tables. The enum xdp_rss_type_bits get exposed
to BPF via BTF, and it is up to the BPF-programmer to match using these
defines.

This proposal change the kfunc API bpf_xdp_metadata_rx_hash() adding
a pointer value argument for provide the RSS hash type.
Change signature for all xmo_rx_hash calls in drivers to make it compile.

The RSS type implementations for each driver comes as separate patches.

Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters

The tool xdp_hw_metadata can be used by driver developers
implementing XDP-hints metadata kfuncs.

Remove all bpf_printk calls, as the tool already transfers all the
XDP-hints related information via metadata area to AF_XDP
userspace process.

Add counters for providing remaining information about failure and
skipped packet events.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul
Signed-off-by: Alexei Starovoitov <[email protected]>

fbcon: set_con2fb_map needs to set con2fb_map!

I got really badly confused in d443d9386472 ("fbcon: move more common
code into fb_open()") because we set the con2fb_map before the failure
points, which didn't look good.

But in trying to fix that I moved the assignment into the wrong path -
we need to do it for _all_ vc we take over, not just the first one
(which additionally requires the call to con2fb_acquire_newinfo).

I've figured this out because of a KASAN bug report, where the
fbcon_registered_fb and fbcon_display arrays went out of sync in
fbcon_mode_deleted() because the con2fb_map pointed at the old
fb_info, but the modes and everything was updated for the new one.

Signed-off-by: Daniel Vetter <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Acked-by: Helge Deller <[email protected]>
Tested-by: Xingyuan Mo <[email protected]>
Fixes: d443d9386472 ("fbcon: move more common code into fb_open()")
Reported-by: Xingyuan Mo <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Xingyuan Mo <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: <[email protected]> # v5.19+

fbcon: Fix error paths in set_con2fb_map

This is a regressoin introduced in b07db3958485 ("fbcon: Ditch error
handling for con2fb_release_oldinfo"). I failed to realize what the if
(!err) checks. The mentioned commit was dropping the
con2fb_release_oldinfo() return value but the if (!err) was also
checking whether the con2fb_acquire_newinfo() function call above
failed or not.

Fix this with an early return statement.

Note that there's still a difference compared to the orginal state of
the code, the below lines are now also skipped on error:

if (!search_fb_in_map(info_idx))
info_idx = newidx;

These are only needed when we've actually thrown out an old fb_info
from the console mappings, which only happens later on.

Also move the fbcon_add_cursor_work() call into the same if block,
it's all protected by console_lock so doesn't matter when we set up
the blinking cursor delayed work anyway. This further simplifies the
control flow and allows us to ditch the found local variable.

v2: Clarify commit message (Javier)

Signed-off-by: Daniel Vetter <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Acked-by: Helge Deller <[email protected]>
Tested-by: Xingyuan Mo <[email protected]>
Fixes: b07db3958485 ("fbcon: Ditch error handling for con2fb_release_oldinfo")
Cc: Thomas Zimmermann <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Xingyuan Mo <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: <[email protected]> # v5.19+

skbuff: Fix a race between coalescing and releasing SKBs

Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
recycling") allowed coalescing to proceed with non page pool page and page
pool page when @from is cloned, i.e.

to->pp_recycle    --> false
from->pp_recycle  --> true
skb_cloned(from)  --> true

However, it actually requires skb_cloned(@from) to hold true until
coalescing finishes in this situation. If the other cloned SKB is
released while the merging is in process, from_shinfo->nr_frags will be
set to 0 toward the end of the function, causing the increment of frag
page _refcount to be unexpectedly skipped resulting in inconsistent
reference counts. Later when SKB(@to) is released, it frees the page
directly even though the page pool page is still in use, leading to
use-after-free or double-free errors. So it should be prohibited.

The double-free error message below prompted us to investigate:
BUG: Bad page state in process swapper/1  pfn:0e0d1
page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
index:0x2 pfn:0xe0d1
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount

CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
Call Trace:
<IRQ>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_pcp_prepare+0x260/0x2f0
free_unref_page+0x20/0x1c0
skb_release_data+0x10b/0x1a0
napi_consume_skb+0x56/0x150
net_rx_action+0xf0/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xc8/0x2b1
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x82/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0xb/0x20

Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Signed-off-by: Liang Chen <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: macb: fix a memory corruption in extended buffer descriptor mode

For quite some time we were chasing a bug which looked like a sudden
permanent failure of networking and mmc on some of our devices.
The bug was very sensitive to any software changes and even more to
any kernel debug options.

Finally we got a setup where the problem was reproducible with
CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma:

[   16.992082] ------------[ cut here ]------------
[   16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes]
[   17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900
[   17.018977] Modules linked in: xxxxx
[   17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28
[   17.045345] Hardware name: xxxxx
[   17.049528] pstate: 60000005 (nZCv daif -PAN -UAO)
[   17.054322] pc : check_unmap+0x6a0/0x900
[   17.058243] lr : check_unmap+0x6a0/0x900
[   17.062163] sp : ffffffc010003c40
[   17.065470] x29: ffffffc010003c40 x28: 000000004000c03c
[   17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800
[   17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8
[   17.081407] x23: 0000000000000000 x22: ffffffc010a08750
[   17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000
[   17.092032] x19: 0000000875e3e244 x18: 0000000000000010
[   17.097343] x17: 0000000000000000 x16: 0000000000000000
[   17.102647] x15: ffffff8879e4a988 x14: 0720072007200720
[   17.107959] x13: 0720072007200720 x12: 0720072007200720
[   17.113261] x11: 0720072007200720 x10: 0720072007200720
[   17.118565] x9 : 0720072007200720 x8 : 000000000000022d
[   17.123869] x7 : 0000000000000015 x6 : 0000000000000098
[   17.129173] x5 : 0000000000000000 x4 : 0000000000000000
[   17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370
[   17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000
[   17.145082] Call trace:
[   17.147524]  check_unmap+0x6a0/0x900
[   17.151091]  debug_dma_unmap_page+0x88/0x90
[   17.155266]  gem_rx+0x114/0x2f0
[   17.158396]  macb_poll+0x58/0x100
[   17.161705]  net_rx_action+0x118/0x400
[   17.165445]  __do_softirq+0x138/0x36c
[   17.169100]  irq_exit+0x98/0xc0
[   17.172234]  __handle_domain_irq+0x64/0xc0
[   17.176320]  gic_handle_irq+0x5c/0xc0
[   17.179974]  el1_irq+0xb8/0x140
[   17.183109]  xiic_process+0x5c/0xe30
[   17.186677]  irq_thread_fn+0x28/0x90
[   17.190244]  irq_thread+0x208/0x2a0
[   17.193724]  kthread+0x130/0x140
[   17.196945]  ret_from_fork+0x10/0x20
[   17.200510] ---[ end trace 7240980785f81d6f ]---

[  237.021490] ------------[ cut here ]------------
[  237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b
[  237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240
[  237.041802] Modules linked in: xxxxx
[  237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.4.0 #28
[  237.068941] Hardware name: xxxxx
[  237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO)
[  237.077900] pc : add_dma_entry+0x214/0x240
[  237.081986] lr : add_dma_entry+0x214/0x240
[  237.086072] sp : ffffffc010003c30
[  237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00
[  237.094683] x27: 0000000000000180 x26: ffffff8878e387c0
[  237.099987] x25: 0000000000000002 x24: 0000000000000000
[  237.105290] x23: 000000000000003b x22: ffffffc010a0fa00
[  237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600
[  237.115897] x19: 00000000ffffffef x18: 0000000000000010
[  237.121201] x17: 0000000000000000 x16: 0000000000000000
[  237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720
[  237.131807] x13: 0720072007200720 x12: 0720072007200720
[  237.137111] x11: 0720072007200720 x10: 0720072007200720
[  237.142415] x9 : 0720072007200720 x8 : 0000000000000259
[  237.147718] x7 : 0000000000000001 x6 : 0000000000000000
[  237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001
[  237.158325] x3 : 0000000000000006 x2 : 0000000000000007
[  237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000
[  237.168932] Call trace:
[  237.171373]  add_dma_entry+0x214/0x240
[  237.175115]  debug_dma_map_page+0xf8/0x120
[  237.179203]  gem_rx_refill+0x190/0x280
[  237.182942]  gem_rx+0x224/0x2f0
[  237.186075]  macb_poll+0x58/0x100
[  237.189384]  net_rx_action+0x118/0x400
[  237.193125]  __do_softirq+0x138/0x36c
[  237.196780]  irq_exit+0x98/0xc0
[  237.199914]  __handle_domain_irq+0x64/0xc0
[  237.204000]  gic_handle_irq+0x5c/0xc0
[  237.207654]  el1_irq+0xb8/0x140
[  237.210789]  arch_cpu_idle+0x40/0x200
[  237.214444]  default_idle_call+0x18/0x30
[  237.218359]  do_idle+0x200/0x280
[  237.221578]  cpu_startup_entry+0x20/0x30
[  237.225493]  rest_init+0xe4/0xf0
[  237.228713]  arch_call_rest_init+0xc/0x14
[  237.232714]  start_kernel+0x47c/0x4a8
[  237.236367] ---[ end trace 7240980785f81d70 ]---

Lars was fast to find an explanation: according to the datasheet
bit 2 of the rx buffer descriptor entry has a different meaning in the
extended mode:
  Address [2] of beginning of buffer, or
  in extended buffer descriptor mode (DMA configuration register [28] = 1),
  indicates a valid timestamp in the buffer descriptor entry.

The macb driver didn't mask this bit while getting an address and it
eventually caused a memory corruption and a dma failure.

The problem is resolved by explicitly clearing the problematic bit
if hw timestamping is used.

Fixes: 7b4296148066 ("net: macb: Add support for PTP timestamps in DMA descriptors")
Signed-off-by: Roman Gushchin <[email protected]>
Co-developed-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Lars-Peter Clausen <[email protected]>
Acked-by: Nicolas Ferre <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>