Git Repo - linux.git/log

selftests: vxlan_mdb: Fix failures with old libnet

Locally generated IP multicast packets (such as the ones used in the
test) do not perform routing and simply egress the bound device.

However, as explained in commit 8bcfb4ae4d97 ("selftests: forwarding:
Fix failing tests with old libnet"), old versions of libnet (used by
mausezahn) do not use the "SO_BINDTODEVICE" socket option. Specifically,
the library started using the option for IPv6 sockets in version 1.1.6
and for IPv4 sockets in version 1.2. This explains why on Ubuntu - which
uses version 1.1.6 - the IPv4 overlay tests are failing whereas the IPv6
ones are passing.

Fix by specifying the source and destination MAC of the packets which
will cause mausezahn to use a packet socket instead of an IP socket.

Fixes: 62199e3f1658 ("selftests: net: Add VXLAN MDB test")
Reported-by: Mirsad Todorovac <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Tested-by: Mirsad Todorovac <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

MAINTAINERS: split Renesas Ethernet drivers entry

Since the Renesas Ethernet Switch driver was added by Yoshihiro Shimoda,
I started receiving the patches to review for it -- which I was unable to
do, as I don't know this hardware and don't even have the manuals for it.
Fortunately, Shimoda-san has volunteered to be a reviewer for this new
driver, thus let's now split the single entry into 3 per-driver entries,
each with its own reviewer...

Signed-off-by: Sergey Shtylyov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Yoshihiro Shimoda <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530

The MT7530 switch after reset initialises with a core clock frequency that
works with a 25MHz XTAL connected to it. For 40MHz XTAL, the core clock
frequency must be set to 500MHz.

The mt7530_pll_setup() function is responsible of setting the core clock
frequency. Currently, it runs on MT7530 with 25MHz and 40MHz XTAL. This
causes MT7530 switch with 25MHz XTAL to egress and ingress frames
improperly.

Introduce a check to run it only on MT7530 with 40MHz XTAL.

The core clock frequency is set by writing to a switch PHY's register.
Access to the PHY's register is done via the MDIO bus the switch is also
on. Therefore, it works only when the switch makes switch PHYs listen on
the MDIO bus the switch is on. This is controlled either by the state of
the ESW_P1_LED_1 pin after reset deassertion or modifying bit 5 of the
modifiable trap register.

When ESW_P1_LED_1 is pulled high, PHY indirect access is used. That means
accessing PHY registers via the PHY indirect access control register of the
switch.

When ESW_P1_LED_1 is pulled low, PHY direct access is used. That means
accessing PHY registers via the MDIO bus the switch is on.

For MT7530 switch with 40MHz XTAL on a board with ESW_P1_LED_1 pulled high,
the core clock frequency won't be set to 500MHz, causing the switch to
egress and ingress frames improperly.

Run mt7530_pll_setup() after PHY direct access is set on the modifiable
trap register.

With these two changes, all MT7530 switches with 25MHz and 40MHz, and
P1_LED_1 pulled high or low, will egress and ingress frames properly.

Link: https://github.com/BPI-SINOVOIP/BPI-R2-bsp/blob/4a5dd143f2172ec97a2872fa29c7c4cd520f45b5/linux-mt/drivers/net/ethernet/mediatek/gsw_mt7623.c#L1039
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Arınç ÜNAL <[email protected]>
Link: https://lore.kernel.org/r/20240320-for-net-mt7530-fix-25mhz-xtal-with-direct-phy-access-v1-1-d92f605f1160@arinc9.com
Signed-off-by: Paolo Abeni <[email protected]>

net: wwan: t7xx: Split 64bit accesses to fix alignment issues

Some of the registers are aligned on a 32bit boundary, causing
alignment faults on 64bit platforms.

Unable to handle kernel paging request at virtual address ffffffc084a1d004
Mem abort info:
ESR = 0x0000000096000061
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x21: alignment fault
Data abort info:
ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000
CM = 0, WnR = 1, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000046ad6000
[ffffffc084a1d004] pgd=100000013ffff003, p4d=100000013ffff003, pud=100000013ffff003, pmd=0068000020a00711
Internal error: Oops: 0000000096000061 [#1] SMP
Modules linked in: mtk_t7xx(+) qcserial pppoe ppp_async option nft_fib_inet nf_flow_table_inet mt7921u(O) mt7921s(O) mt7921e(O) mt7921_common(O) iwlmvm(O) iwldvm(O) usb_wwan rndis_host qmi_wwan pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7996e(O) mt792x_usb(O) mt792x_lib(O) mt7915e(O) mt76_usb(O) mt76_sdio(O) mt76_connac_lib(O) mt76(O) mac80211(O) iwlwifi(O) huawei_cdc_ncm cfg80211(O) cdc_ncm cdc_ether wwan usbserial usbnet slhc sfp rtc_pcf8563 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mt6577_auxadc mdio_i2c libcrc32c compat(O) cdc_wdm cdc_acm at24 crypto_safexcel pwm_fan i2c_gpio i2c_smbus industrialio i2c_algo_bit i2c_mux_reg i2c_mux_pca954x i2c_mux_pca9541 i2c_mux_gpio i2c_mux dummy oid_registry tun sha512_arm64 sha1_ce sha1_generic seqiv
md5 geniv des_generic libdes cbc authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd nvme nvme_core gpio_button_hotplug(O) dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax usbcore usb_common ptp aquantia pps_core mii tpm encrypted_keys trusted
CPU: 3 PID: 5266 Comm: kworker/u9:1 Tainted: G O 6.6.22 #0
Hardware name: Bananapi BPI-R4 (DT)
Workqueue: md_hk_wq t7xx_fsm_uninit [mtk_t7xx]
pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx]
lr : t7xx_cldma_start+0xac/0x13c [mtk_t7xx]
sp : ffffffc085d63d30
x29: ffffffc085d63d30 x28: 0000000000000000 x27: 0000000000000000
x26: 0000000000000000 x25: ffffff80c804f2c0 x24: ffffff80ca196c05
x23: 0000000000000000 x22: ffffff80c814b9b8 x21: ffffff80c814b128
x20: 0000000000000001 x19: ffffff80c814b080 x18: 0000000000000014
x17: 0000000055c9806b x16: 000000007c5296d0 x15: 000000000f6bca68
x14: 00000000dbdbdce4 x13: 000000001aeaf72a x12: 0000000000000001
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : ffffff80ca1ef6b4 x7 : ffffff80c814b818 x6 : 0000000000000018
x5 : 0000000000000870 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 000000010a947000 x1 : ffffffc084a1d004 x0 : ffffffc084a1d004
Call trace:
t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx]
t7xx_fsm_uninit+0x578/0x5ec [mtk_t7xx]
process_one_work+0x154/0x2a0
worker_thread+0x2ac/0x488
kthread+0xe0/0xec
ret_from_fork+0x10/0x20
Code: f9400800 91001000 8b214001 d50332bf (f9000022)
---[ end trace 0000000000000000 ]---

The inclusion of io-64-nonatomic-lo-hi.h indicates that all 64bit
accesses can be replaced by pairs of nonatomic 32bit access. Fix
alignment by forcing all accesses to be 32bit on 64bit platforms.

Link: https://forum.openwrt.org/t/fibocom-fm350-gl-support/142682/72
Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
Signed-off-by: Bjørn Mork <[email protected]>
Reviewed-by: Sergey Ryazanov <[email protected]>
Tested-by: Liviu Dudau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: properly terminate timers for kernel sockets

We had various syzbot reports about tcp timers firing after
the corresponding netns has been dismantled.

Fortunately Josef Bacik could trigger the issue more often,
and could test a patch I wrote two years ago.

When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
to 'stop' the timers.

inet_csk_clear_xmit_timers() can be called from any context,
including when socket lock is held.
This is the reason it uses sk_stop_timer(), aka del_timer().
This means that ongoing timers might finish much later.

For user sockets, this is fine because each running timer
holds a reference on the socket, and the user socket holds
a reference on the netns.

For kernel sockets, we risk that the netns is freed before
timer can complete, because kernel sockets do not hold
reference on the netns.

This patch adds inet_csk_clear_xmit_timers_sync() function
that using sk_stop_timer_sync() to make sure all timers
are terminated before the kernel socket is released.
Modules using kernel sockets close them in their netns exit()
handler.

Also add sock_not_owned_by_me() helper to get LOCKDEP
support : inet_csk_clear_xmit_timers_sync() must not be called
while socket lock is held.

It is very possible we can revert in the future commit
3a58f13a881e ("net: rds: acquire refcount on TCP sockets")
which attempted to solve the issue in rds only.
(net/smc/af_smc.c and net/mptcp/subflow.c have similar code)

We probably can remove the check_net() tests from
tcp_out_of_resources() and __tcp_close() in the future.

Reported-by: Josef Bacik <[email protected]>
Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/
Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes: 8a68173691f0 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
Signed-off-by: Eric Dumazet <[email protected]>
Tested-by: Josef Bacik <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: hsr: hsr_slave: Fix the promiscuous mode in offload mode

commit e748d0fd66ab ("net: hsr: Disable promiscuous mode in
offload mode") disables promiscuous mode of slave devices
while creating an HSR interface. But while deleting the
HSR interface, it does not take care of it. It decreases the
promiscuous mode count, which eventually enables promiscuous
mode on the slave devices when creating HSR interface again.

Fix this by not decrementing the promiscuous mode count while
deleting the HSR interface when offload is enabled.

Fixes: e748d0fd66ab ("net: hsr: Disable promiscuous mode in offload mode")
Signed-off-by: Ravi Gunasekaran <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ll_temac: platform_get_resource replaced by wrong function

The function platform_get_resource was replaced with
devm_platform_ioremap_resource_byname and is called using 0 as name.

This eventually ends up in platform_get_resource_byname in the call
stack, where it causes a null pointer in strcmp.

if (type == resource_type(r) && !strcmp(r->name, name))

It should have been replaced with devm_platform_ioremap_resource.

Fixes: bd69058f50d5 ("net: ll_temac: Use devm_platform_ioremap_resource_byname()")
Signed-off-by: Claus Hansen Ries <[email protected]>
Cc: [email protected]
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

s390/qeth: handle deferred cc1

The IO subsystem expects a driver to retry a ccw_device_start, when the
subsequent interrupt response block (irb) contains a deferred
condition code 1.

Symptoms before this commit:
On the read channel we always trigger the next read anyhow, so no
different behaviour here.
On the write channel we may experience timeout errors, because the
expected reply will never be received without the retry.
Other callers of qeth_send_control_data() may wrongly assume that the ccw
was successful, which may cause problems later.

Note that since
commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
and
commit 5ef1dc40ffa6 ("s390/cio: fix invalid -EBUSY on ccw_device_start")
deferred CC1s are much more likely to occur. See the commit message of the
latter for more background information.

Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
Signed-off-by: Alexandra Winter <[email protected]>
Co-developed-by: Thorsten Winkler <[email protected]>
Signed-off-by: Thorsten Winkler <[email protected]>
Reviewed-by: Peter Oberparleiter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dpll: indent DPLL option type by a tab

Indent config option type by a tab. It helps Kconfig parsers
to read file without error.

Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
Signed-off-by: Prasad Pandit <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abi

We encountered a failing case when running selftest in no_alu32 mode:

The failure case is `kfunc_call/kfunc_call_test4` and its source code is
like bellow:
```
long bpf_kfunc_call_test4(signed char a, short b, int c, long d) __ksym;
int kfunc_call_test4(struct __sk_buff *skb)
{
...
tmp = bpf_kfunc_call_test4(-3, -30, -200, -1000);
...
}
```

And its corresponding asm code is:
```
0: r1 = -3
1: r2 = -30
2: r3 = 0xffffff38 # opcode: 18 03 00 00 38 ff ff ff 00 00 00 00 00 00 00 00
4: r4 = -1000
5: call bpf_kfunc_call_test4
```

insn 2 is parsed to ld_imm64 insn to emit 0x00000000ffffff38 imm, and
converted to int type and then send to bpf_kfunc_call_test4. But since
it is zero-extended in the bpf calling convention, riscv jit will
directly treat it as an unsigned 32-bit int value, and then fails with
the message "actual 4294966063 != expected -1234".

The reason is the incompatibility between bpf and riscv abi, that is,
bpf will do zero-extension on uint, but riscv64 requires sign-extension
on int or uint. We can solve this problem by sign extending the 32-bit
parameters in kfunc.

The issue is related to [0], and thanks to Yonghong and Alexei.

Link: https://github.com/llvm/llvm-project/pull/84874
Fixes: d40c3847b485 ("riscv, bpf: Add kfunc support for RV64")
Signed-off-by: Pu Lehui <[email protected]>
Tested-by: Puranjay Mohan <[email protected]>
Reviewed-by: Puranjay Mohan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

- Fix boundary check in punch_hole

* tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix invalid metadata access in punch_hole

Merge tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"This fixes a regression that broke iwd as well as a divide by zero in
  iaa"

* tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: iaa - Fix nr_cpus < nr_iaa case
  Revert "crypto: pkcs7 - remove sha1 support"

igc: Remove stale comment about Tx timestamping

The initial igc Tx timestamping implementation used only one register for
retrieving Tx timestamps. Commit 3ed247e78911 ("igc: Add support for
multiple in-flight TX timestamps") added support for utilizing all four of
them e.g., for multiple domain support. Remove the stale comment/FIXME.

Fixes: 3ed247e78911 ("igc: Add support for multiple in-flight TX timestamps")
Signed-off-by: Kurt Kanzenbach <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Naama Meir <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()

Change kzalloc() flags used in ixgbe_ipsec_vf_add_sa() to GFP_ATOMIC, to
avoid sleeping in IRQ context.

Dan Carpenter, with the help of Smatch, has found following issue:
The patch eda0333ac293: "ixgbe: add VF IPsec management" from Aug 13,
2018 (linux-next), leads to the following Smatch static checker
warning: drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c:917 ixgbe_ipsec_vf_add_sa()
warn: sleeping in IRQ context

The call tree that Smatch is worried about is:
ixgbe_msix_other() <- IRQ handler
-> ixgbe_msg_task()
-> ixgbe_rcv_msg_from_vf()
-> ixgbe_ipsec_vf_add_sa()

Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
Reported-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/intel-wired-lan/[email protected]
Reviewed-by: Michal Kubiak <[email protected]>
Signed-off-by: Przemek Kitszel <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

ice: fix memory corruption bug with suspend and rebuild

The ice driver would previously panic after suspend. This is caused
from the driver *only* calling the ice_vsi_free_q_vectors() function by
itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent
NULL pointer deref during reload") the driver has zeroed out
num_q_vectors, and only restored it in ice_vsi_cfg_def().

This further causes the ice_rebuild() function to allocate a zero length
buffer, after which num_q_vectors is updated, and then the new value of
num_q_vectors is used to index into the zero length buffer, which
corrupts memory.

The fix entails making sure all the code referencing num_q_vectors only
does so after it has been reset via ice_vsi_cfg_def().

I didn't perform a full bisect, but I was able to test against 6.1.77
kernel and that ice driver works fine for suspend/resume with no panic,
so sometime since then, this problem was introduced.

Also clean up an un-needed init of a local variable in the function
being modified.

PANIC from 6.8.0-rc1:

[1026674.915596] PM: suspend exit
[1026675.664697] ice 0000:17:00.1: PTP reset successful
[1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time
[1026675.667660] ice 0000:b1:00.0: PTP reset successful
[1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time
[1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None
[1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010
[1026677.192753] ice 0000:17:00.0: PTP reset successful
[1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time
[1026677.197928] #PF: supervisor read access in kernel mode
[1026677.197933] #PF: error_code(0x0000) - not-present page
[1026677.197937] PGD 1557a7067 P4D 0
[1026677.212133] ice 0000:b1:00.1: PTP reset successful
[1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time
[1026677.212575]
[1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G        W          6.8.0-rc1+ #1
[1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[1026677.269367] Workqueue: ice ice_service_task [ice]
[1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6
[1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202
[1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000
[1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828
[1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010
[1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0
[1026677.344472] FS:  0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000
[1026677.353000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0
[1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1026677.381952] PKRU: 55555554
[1026677.385116] Call Trace:
[1026677.388023]  <TASK>
[1026677.390589]  ? __die+0x20/0x70
[1026677.394105]  ? page_fault_oops+0x82/0x160
[1026677.398576]  ? do_user_addr_fault+0x65/0x6a0
[1026677.403307]  ? exc_page_fault+0x6a/0x150
[1026677.407694]  ? asm_exc_page_fault+0x22/0x30
[1026677.412349]  ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.418614]  ice_vsi_rebuild+0x34b/0x3c0 [ice]
[1026677.423583]  ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[1026677.429147]  ice_rebuild+0x18b/0x520 [ice]
[1026677.433746]  ? delay_tsc+0x8f/0xc0
[1026677.437630]  ice_do_reset+0xa3/0x190 [ice]
[1026677.442231]  ice_service_task+0x26/0x440 [ice]
[1026677.447180]  process_one_work+0x174/0x340
[1026677.451669]  worker_thread+0x27e/0x390
[1026677.455890]  ? __pfx_worker_thread+0x10/0x10
[1026677.460627]  kthread+0xee/0x120
[1026677.464235]  ? __pfx_kthread+0x10/0x10
[1026677.468445]  ret_from_fork+0x2d/0x50
[1026677.472476]  ? __pfx_kthread+0x10/0x10
[1026677.476671]  ret_from_fork_asm+0x1b/0x30
[1026677.481050]  </TASK>

Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload")
Reported-by: Robert Elliott <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

ice: Refactor FW data type and fix bitmap casting issue

According to the datasheet, the recipe association data is an 8-byte
little-endian value. It is described as 'Bitmap of the recipe indexes
associated with this profile', it is from 24 to 31 byte area in FW.
Therefore, it is defined to '__le64 recipe_assoc' in struct
ice_aqc_recipe_to_profile. And then fix the bitmap casting issue, as we
must never ever use castings for bitmap type.

Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface")
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Andrii Staikov <[email protected]>
Reviewed-by: Jan Sokolowski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Steven Zou <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

kunit: fix wireless test dependencies

For the wireless tests, CONFIG_WLAN and CONFIG_NETDEVICES are
needed, though seem to be available by default on ARCH=um, so
we didn't notice this before. Add them to fix kunit running
on other architectures.

Fixes: 28b3df1fe6ba ("kunit: add wireless unit tests")
Reported-by: Mark Brown <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Johannes Berg <[email protected]>

net: mark racy access on sk->sk_rcvbuf

sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be
changed by other threads. Mark this as benign using READ_ONCE().

This patch is aimed at reducing the number of benign races reported by
KCSAN in order to focus future debugging effort on harmful races.

Signed-off-by: linke li <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

wifi: iwlwifi: mvm: include link ID when releasing frames

When releasing frames from the reorder buffer, the link ID was not
included in the RX status information. This subsequently led mac80211 to
drop the frame. Change it so that the link information is set
immediately when possible so that it doesn't not need to be filled in
anymore when submitting the frame to mac80211.

Fixes: b8a85a1d42d7 ("wifi: iwlwifi: mvm: rxmq: report link ID to mac80211")
Signed-off-by: Benjamin Berg <[email protected]>
Tested-by: Emmanuel Grumbach <[email protected]>
Reviewed-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240320232419.bbbd5e9bfe80.Iec1bf5c884e371f7bc5ea2534ed9ea8d3f2c0bf6@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: handle debugfs names more carefully

With debugfs=off, we can get here with the dbgfs_dir being
an ERR_PTR(). Instead of checking for all this, which is
often flagged as a mistake, simply handle the names here
more carefully by printing them, then we don't need extra
checks.

Also, while checking, I noticed theoretically 'buf' is too
small, so fix that size as well.

Cc: [email protected]
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218422
Fixes: c36235acb34f ("wifi: iwlwifi: mvm: rework debugfs handling")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240320232419.4dc1eb3dd015.I32f308b0356ef5bcf8d188dd98ce9b210e3ab9fd@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: guard against invalid STA ID on removal

Guard against invalid station IDs in iwl_mvm_mld_rm_sta_id as that would
result in out-of-bounds array accesses. This prevents issues should the
driver get into a bad state during error handling.

Signed-off-by: Benjamin Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240320232419.d523167bda9c.I1cffd86363805bf86a95d8bdfd4b438bb54baddc@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: read txq->read_ptr under lock

If we read txq->read_ptr without lock, we can read the same
value twice, then obtain the lock, and reclaim from there
to two different places, but crucially reclaim the same
entry twice, resulting in the WARN_ONCE() a little later.
Fix that by reading txq->read_ptr under lock.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240319100755.bf4c62196504.I978a7ca56c6bd6f1bf42c15aa923ba03366a840b@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: fw: don't always use FW dump trig

Since the dump_data (struct iwl_fwrt_dump_data) is a union,
it's not safe to unconditionally access and use the 'trig'
member, it might be 'desc' instead. Access it only if it's
known to be 'trig' rather than 'desc', i.e. if ini-debug
is present.

Cc: [email protected]
Fixes: 0eb50c674a1e ("iwlwifi: yoyo: send hcmd to fw after dump collection completes.")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240319100755.e2976bc58b29.I72fbd6135b3623227de53d8a2bb82776066cb72b@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: rfi: fix potential response leaks

If the rx payload length check fails, or if kmemdup() fails,
we still need to free the command response. Fix that.

Fixes: 21254908cbe9 ("iwlwifi: mvm: add RFI-M support")
Co-authored-by: Anjaneyulu <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240319100755.db2fa0196aa7.I116293b132502ac68a65527330fa37799694b79c@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: correctly set active links upon TTLM

Fix ieee80211_ttlm_set_links() to not set all active links,
but instead let the driver know that valid links status changed
and select the active links properly.

Fixes: 8f500fbc6c65 ("wifi: mac80211: process and save negotiated TID to Link mapping request")
Signed-off-by: Ayala Beker <[email protected]>
Reviewed-by: Ilan Peer <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240318184907.acddbbf39584.Ide858f95248fcb3e483c97fcaa14b0cd4e964b10@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: Configure the link mapping for non-MLD FW

In the non MLD firmware flows, although the deflink is used, the mapping
of link ID to BSS configuration was missing, which causes flows that need
this mapping to crash.

Fix this by adding the link ID to BSS configuration mapping to non MLD
flows as well.

Signed-off-by: Ilan Peer <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240311081938.0b5c361e8f0c.Ib11f41815d2efa5d1ec57f855de4c8563142987b@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: consider having one active link

Do not call iwl_mvm_mld_get_primary_link if only one link
is active.
In that case, the sole active link should be used.

iwl_mvm_mld_get_primary_link returns -1 if only one link
is active causing a warning.

Fixes: 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO")
Signed-off-by: Shaul Triebitz <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240311081938.6c50061bf69b.I05b0ac7fa7149eabaa5570a6f65b0d9bfb09a6f1@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: pick the version of SESSION_PROTECTION_NOTIF

When we want to know whether we should look for the mac_id or the
link_id in struct iwl_mvm_session_prot_notif, we should look at the
version of SESSION_PROTECTION_NOTIF.

This causes WARNINGs:

WARNING: CPU: 0 PID: 11403 at drivers/net/wireless/intel/iwlwifi/mvm/time-event.c:959 iwl_mvm_rx_session_protect_notif+0x333/0x340 [iwlmvm]
RIP: 0010:iwl_mvm_rx_session_protect_notif+0x333/0x340 [iwlmvm]
Code: 00 49 c7 84 24 48 07 00 00 00 00 00 00 41 c6 84 24 78 07 00 00 ff 4c 89 f7 e8 e9 71 54 d9 e9 7d fd ff ff 0f 0b e9 23 fe ff ff <0f> 0b e9 1c fe ff ff 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffb4bb00003d40 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff9ae63a361000 RCX: ffff9ae4a98b60d4
RDX: ffff9ae4588499c0 RSI: 0000000000000305 RDI: ffff9ae4a98b6358
RBP: ffffb4bb00003d68 R08: 0000000000000003 R09: 0000000000000010
R10: ffffb4bb00003d00 R11: 000000000000000f R12: ffff9ae441399050
R13: ffff9ae4761329e8 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff9ae7af400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055fb75680018 CR3: 00000003dae32006 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
<IRQ>
? show_regs+0x69/0x80
? __warn+0x8d/0x150
? iwl_mvm_rx_session_protect_notif+0x333/0x340 [iwlmvm]
? report_bug+0x196/0x1c0
? handle_bug+0x45/0x80
? exc_invalid_op+0x1c/0xb0
? asm_exc_invalid_op+0x1f/0x30
? iwl_mvm_rx_session_protect_notif+0x333/0x340 [iwlmvm]
iwl_mvm_rx_common+0x115/0x340 [iwlmvm]
iwl_mvm_rx_mq+0xa6/0x100 [iwlmvm]
iwl_pcie_rx_handle+0x263/0xa10 [iwlwifi]
iwl_pcie_napi_poll_msix+0x32/0xd0 [iwlwifi]

Fixes: 085d33c53012 ("wifi: iwlwifi: support link id in SESSION_PROTECTION_NOTIF")
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240311081938.39d5618f7b9d.I564d863e53c6cbcb49141467932ecb6a9840b320@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: fix prep_connection error path

If prep_channel fails in prep_connection, the code releases
the deflink's chanctx, which is wrong since we may be using
a different link. It's already wrong to even do that always
though, since we might still have the station. Remove it
only if prep_channel succeeded and later updates fail.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240318184907.2780c1f08c3d.I033c9b15483933088f32a2c0789612a33dd33d82@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: cfg80211: fix rdev_dump_mpp() arguments order

Fix the order of arguments in the TP_ARGS macro
for the rdev_dump_mpp tracepoint event.

Found by Linux Verification Center (linuxtesting.org).

Signed-off-by: Igor Artemiev <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: disable MLO for the time being

MLO ended up not really fully stable yet, we want to make
sure it works well with the ecosystem before enabling it.
Thus, remove the flag, but set WIPHY_FLAG_DISABLE_WEXT so
we don't get wireless extensions back until we enable MLO
for this hardware.

Cc: [email protected]
Reviewed-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240314110951.d6ad146df98d.I47127e4fdbdef89e4ccf7483641570ee7871d4e6@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: cfg80211: add a flag to disable wireless extensions

Wireless extensions are already disabled if MLO is enabled,
given that we cannot support MLO there with all the hard-
coded assumptions about BSSID etc.

However, the WiFi7 ecosystem is still stabilizing, and some
devices may need MLO disabled while that happens. In that
case, we might end up with a device that supports wext (but
not MLO) in one kernel, and then breaks wext in the future
(by enabling MLO), which is not desirable.

Add a flag to let such drivers/devices disable wext even if
MLO isn't yet enabled.

Cc: [email protected]
Link: https://msgid.link/20240314110951.b50f1dc4ec21.I656ddd8178eedb49dc5c6c0e70f8ce5807afb54f@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: fix ieee80211_bss_*_flags kernel-doc

Running kernel-doc on ieee80211_i.h flagged the following:
net/mac80211/ieee80211_i.h:145: warning: expecting prototype for enum ieee80211_corrupt_data_flags. Prototype was for enum ieee80211_bss_corrupt_data_flags instead
net/mac80211/ieee80211_i.h:162: warning: expecting prototype for enum ieee80211_valid_data_flags. Prototype was for enum ieee80211_bss_valid_data_flags instead

Fix these warnings.

Signed-off-by: Jeff Johnson <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes

When moving a station out of a VLAN and deleting the VLAN afterwards, the
fast_rx entry still holds a pointer to the VLAN's netdev, which can cause
use-after-free bugs. Fix this by immediately calling ieee80211_check_fast_rx
after the VLAN change.

Cc: [email protected]
Reported-by: [email protected]
Signed-off-by: Felix Fietkau <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: fix mlme_link_id_dbg()

Make sure that the new mlme_link_id_dbg() macro honours
CONFIG_MAC80211_MLME_DEBUG as intended to avoid spamming the log with
messages like:

wlan0: no EHT support, limiting to HE
wlan0: determined local STA to be HE, BW limited to 160 MHz
wlan0: determined AP xx:xx:xx:xx:xx:xx to be VHT
wlan0: connecting with VHT mode, max bandwidth 160 MHz

Fixes: 310c8387c638 ("wifi: mac80211: clean up connection process")
Signed-off-by: Johan Hovold <[email protected]>
Link: https://msgid.link/[email protected]
Tested-by: Kalle Valo <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

fs/9p: fix uninitialized values during inode evict

If an iget fails due to not being able to retrieve information
from the server then the inode structure is only partially
initialized. When the inode gets evicted, references to
uninitialized structures (like fscache cookies) were being
made.

This patch checks for a bad_inode before doing anything other
than clearing the inode from the cache. Since the inode is
bad, it shouldn't have any state associated with it that needs
to be written back (and there really isn't a way to complete
those anyways).

Reported-by: [email protected]
Signed-off-by: Eric Van Hensbergen <[email protected]>

mlxbf_gige: stop PHY during open() error paths

The mlxbf_gige_open() routine starts the PHY as part of normal
initialization. The mlxbf_gige_open() routine must stop the
PHY during its error paths.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: David Thompson <[email protected]>
Reviewed-by: Asmaa Mnebhi <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tracing: probes: Fix to zero initialize a local variable

Fix to initialize 'val' local variable with zero.
Dan reported that Smatch static code checker reports an error that a local
'val' variable needs to be initialized. Actually, the 'val' is expected to
be initialized by FETCH_OP_ARG in the same loop, but it is not obvious. So
initialize it with zero.

Link: https://lore.kernel.org/all/171092223833.237219.17304490075697026697.stgit@devnote2/
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: 25f00e40ce79 ("tracing/probes: Support $argN in return probe (kprobe and fprobe)")
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

pwm: img: fix pwm clock lookup

22e8e19 has introduced a regression in the imgchip->pwm_clk lookup, whereas
the clock name has also been renamed to "imgchip". This causes the driver
failing to load:

[ 0.546905] img-pwm 18101300.pwm: failed to get imgchip clock
[ 0.553418] img-pwm: probe of 18101300.pwm failed with error -2

Fix this lookup by reverting the clock name back to "pwm".

Signed-off-by: Zoltan HERPAI <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 22e8e19a46f7 ("pwm: img: Rename variable pointing to driver private data")
Signed-off-by: Uwe Kleine-König <[email protected]>

MAINTAINERS: erofs: add myself as reviewer

I have been contributing to erofs for sometime and I would like to help
with code reviews as well.

Signed-off-by: Sandeep Dhavale <[email protected]>
Acked-by: Chao Yu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Gao Xiang <[email protected]>

erofs: drop experimental warning for FSDAX

As EXT4/XFS filesystems, FSDAX functionality is considered to be stable.
Let's drop this warning.

Reviewed-by: Jingbo Xu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

fs/9p: remove redundant pointer v9ses

Pointer v9ses is being assigned the value from the return of inlined
function v9fs_inode2v9ses (which just returns inode->i_sb->s_fs_info).
The pointer is not used after the assignment, so the variable is
redundant and can be removed.

Cleans up clang scan warnings such as:
fs/9p/vfs_inode_dotl.c:300:28: warning: variable 'v9ses' set but not
used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Dominique Martinet <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>

fs/9p: fix uaf in in v9fs_stat2inode_dotl

The incorrect logical order of accessing the st object code in v9fs_fid_iget_dotl
is causing this uaf.

Fixes: 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths")
Reported-and-tested-by: [email protected]
Signed-off-by: Lizhi Xu <[email protected]>
Tested-by: Breno Leitao <[email protected]>
Reviewed-by: Dominique Martinet <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>

Linux 6.9-rc1

Merge tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

- Fix logic that is supposed to prevent placement of the kernel image
   below LOAD_PHYSICAL_ADDR

- Use the firmware stack in the EFI stub when running in mixed mode

- Clear BSS only once when using mixed mode

- Check efi.get_variable() function pointer for NULL before trying to
   call it

* tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: fix panic in kdump kernel
  x86/efistub: Don't clear BSS twice in mixed mode
  x86/efistub: Call mixed mode boot services on the firmware's stack
  efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address

Merge tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

- Ensure that the encryption mask at boot is properly propagated on
   5-level page tables, otherwise the PGD entry is incorrectly set to
   non-encrypted, which causes system crashes during boot.

- Undo the deferred 5-level page table setup as it cannot work with
   memory encryption enabled.

- Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
   to the default value but the cached variable is not, so subsequent
   comparisons might yield the wrong result and as a consequence the
   result prevents updating the MSR.

- Register the local APIC address only once in the MPPARSE enumeration
   to prevent triggering the related WARN_ONs() in the APIC and topology
   code.

- Handle the case where no APIC is found gracefully by registering a
   fake APIC in the topology code. That makes all related topology
   functions work correctly and does not affect the actual APIC driver
   code at all.

- Don't evaluate logical IDs during early boot as the local APIC IDs
   are not yet enumerated and the invoked function returns an error
   code. Nothing requires the logical IDs before the final CPUID
   enumeration takes place, which happens after the enumeration.

- Cure the fallout of the per CPU rework on UP which misplaced the
   copying of boot_cpu_data to per CPU data so that the final update to
   boot_cpu_data got lost which caused inconsistent state and boot
   crashes.

- Use copy_from_kernel_nofault() in the kprobes setup as there is no
   guarantee that the address can be safely accessed.

- Reorder struct members in struct saved_context to work around another
   kmemleak false positive

- Remove the buggy code which tries to update the E820 kexec table for
   setup_data as that is never passed to the kexec kernel.

- Update the resource control documentation to use the proper units.

- Fix a Kconfig warning observed with tinyconfig

* tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/64: Move 5-level paging global variable assignments back
  x86/boot/64: Apply encryption mask to 5-level pagetable update
  x86/cpu: Add model number for another Intel Arrow Lake mobile processor
  x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
  Documentation/x86: Document that resctrl bandwidth control units are MiB
  x86/mpparse: Register APIC address only once
  x86/topology: Handle the !APIC case gracefully
  x86/topology: Don't evaluate logical IDs during early boot
  x86/cpu: Ensure that CPU info updates are propagated on UP
  kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
  x86/pm: Work around false positive kmemleak report in msr_build_context()
  x86/kexec: Do not update E820 kexec table for setup_data
  x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'

Merge tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler doc clarification from Thomas Gleixner:
"A single update for the documentation of the base_slice_ns tunable to
  clarify that any value which is less than the tick slice has no effect
  because the scheduler tick is not guaranteed to happen within the set
  time slice"

* tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation

Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
"This has a set of swiotlb alignment fixes for sometimes very long
  standing bugs from Will. We've been discussion them for a while and
  they should be solid now"

* tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
  iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
  swiotlb: Fix alignment checks when both allocation and DMA masks are present
  swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
  swiotlb: Enforce page alignment in swiotlb_alloc()
  swiotlb: Fix double-allocation of slots due to broken alignment handling

efi: fix panic in kdump kernel

Check if get_next_variable() is actually valid pointer before
calling it. In kdump kernel this method is set to NULL that causes
panic during the kexec-ed kernel boot.

Tested with QEMU and OVMF firmware.

Fixes: bad267f9e18f ("efi: verify that variable services are supported")
Signed-off-by: Oleksandr Tymoshenko <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>

x86/efistub: Don't clear BSS twice in mixed mode

Clearing BSS should only be done once, at the very beginning.
efi_pe_entry() is the entrypoint from the firmware, which may not clear
BSS and so it is done explicitly. However, efi_pe_entry() is also used
as an entrypoint by the mixed mode startup code, in which case BSS will
already have been cleared, and doing it again at this point will corrupt
global variables holding the firmware's GDT/IDT and segment selectors.

So make the memset() conditional on whether the EFI stub is running in
native mode.

Fixes: b3810c5a2cc4a666 ("x86/efistub: Clear decompressor BSS in native EFI entrypoint")
Signed-off-by: Ard Biesheuvel <[email protected]>

x86/efistub: Call mixed mode boot services on the firmware's stack

Normally, the EFI stub calls into the EFI boot services using the stack
that was live when the stub was entered. According to the UEFI spec,
this stack needs to be at least 128k in size - this might seem large but
all asynchronous processing and event handling in EFI runs from the same
stack and so quite a lot of space may be used in practice.

In mixed mode, the situation is a bit different: the bootloader calls
the 32-bit EFI stub entry point, which calls the decompressor's 32-bit
entry point, where the boot stack is set up, using a fixed allocation
of 16k. This stack is still in use when the EFI stub is started in
64-bit mode, and so all calls back into the EFI firmware will be using
the decompressor's limited boot stack.

Due to the placement of the boot stack right after the boot heap, any
stack overruns have gone unnoticed. However, commit

5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code")

moved the definition of the boot heap into C code, and now the boot
stack is placed right at the base of BSS, where any overruns will
corrupt the end of the .data section.

While it would be possible to work around this by increasing the size of
the boot stack, doing so would affect all x86 systems, and mixed mode
systems are a tiny (and shrinking) fraction of the x86 installed base.

So instead, record the firmware stack pointer value when entering from
the 32-bit firmware, and switch to this stack every time a EFI boot
service call is made.

Cc: <[email protected]> # v6.1+
Signed-off-by: Ard Biesheuvel <[email protected]>

x86/boot/64: Move 5-level paging global variable assignments back

Commit 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging
global variables") moved assignment of 5-level global variables to later
in the boot in order to avoid having to use RIP relative addressing in
order to set them. However, when running with 5-level paging and SME
active (mem_encrypt=on), the variables are needed as part of the page
table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(),
etc.). Since the variables haven't been set, the page table manipulation
is done as if 4-level paging is active, causing the system to crash on
boot.

While only a subset of the assignments that were moved need to be set
early, move all of the assignments back into check_la57_support() so that
these assignments aren't spread between two locations. Instead of just
reverting the fix, this uses the new RIP_REL_REF() macro when assigning
the variables.

Fixes: 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables")
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com

x86/boot/64: Apply encryption mask to 5-level pagetable update

When running with 5-level page tables, the kernel mapping PGD entry is
updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC,
which, when SME is active (mem_encrypt=on), results in a page table
entry without the encryption mask set, causing the system to crash on
boot.

Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so
that the encryption mask is set for the PGD entry.

Fixes: 533568e06b15 ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]")
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com

x86/cpu: Add model number for another Intel Arrow Lake mobile processor

This one is the regular laptop CPU.

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD

Commit 672365477ae8 ("x86/fpu: Update XFD state where required") and
commit 8bf26758ca96 ("x86/fpu: Add XFD state to fpstate") introduced a
per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in
order to avoid unnecessary writes to the MSR.

On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which
wipes out any stale state. But the per CPU cached xfd value is not
reset, which brings them out of sync.

As a consequence a subsequent xfd_update_state() might fail to update
the MSR which in turn can result in XRSTOR raising a #NM in kernel
space, which crashes the kernel.

To fix this, introduce xfd_set_state() to write xfd_state together
with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD.

Fixes: 672365477ae8 ("x86/fpu: Update XFD state where required")
Signed-off-by: Adamos Ttofari <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Closes: https://lore.kernel.org/lkml/[email protected]

Documentation/x86: Document that resctrl bandwidth control units are MiB

The memory bandwidth software controller uses 2^20 units rather than
10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M"
Linux define for 0x00100000.

Update the documentation to use MiB when describing this feature.
It's too late to fix the mount option "mba_MBps" as that is now an
established user interface.

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"Two regression fixes for the timer and timer migration code:

   - Prevent endless timer requeuing which is caused by two CPUs racing
     out of idle. This happens when the last CPU goes idle and therefore
     has to ensure to expire the pending global timers and some other
     CPU come out of idle at the same time and the other CPU wins the
     race and expires the global queue. This causes the last CPU to
     chase ghost timers forever and reprogramming it's clockevent device
     endlessly.

     Cure this by re-evaluating the wakeup time unconditionally.

   - The split into local (pinned) and global timers in the timer wheel
     caused a regression for NOHZ full as it broke the idle tracking of
     global timers. On NOHZ full this prevents an self IPI being sent
     which in turn causes the timer to be not programmed and not being
     expired on time.

     Restore the idle tracking for the global timer base so that the
     self IPI condition for NOHZ full is working correctly again"

* tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix removed self-IPI on global timer's enqueue in nohz_full
  timers/migration: Fix endless timer requeue after idle interrupts

Merge tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more clocksource updates from Thomas Gleixner:
"A set of updates for clocksource and clockevent drivers:

   - A fix for the prescaler of the ARM global timer where the prescaler
     mask define only covered 4 bits while it is actully 8 bits wide.
     This obviously restricted the possible range of prescaler
     adjustments

   - A fix for the RISC-V timer which prevents a timer interrupt being
     raised while the timer is initialized

   - A set of device tree updates to support new system on chips in
     various drivers

   - Kernel-doc and other cleanups all over the place"

* tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/timer-riscv: Clear timer interrupt on timer initialization
  dt-bindings: timer: Add support for cadence TTC PWM
  clocksource/drivers/arm_global_timer: Simplify prescaler register access
  clocksource/drivers/arm_global_timer: Guard against division by zero
  clocksource/drivers/arm_global_timer: Make gt_target_rate unsigned long
  dt-bindings: timer: add Ralink SoCs system tick counter
  clocksource: arm_global_timer: fix non-kernel-doc comment
  clocksource/drivers/arm_global_timer: Remove stray tab
  clocksource/drivers/arm_global_timer: Fix maximum prescaler value
  clocksource/drivers/imx-sysctr: Add i.MX95 support
  clocksource/drivers/imx-sysctr: Drop use global variables
  dt-bindings: timer: nxp,sysctr-timer: support i.MX95
  dt-bindings: timer: renesas: ostm: Document RZ/Five SoC
  dt-bindings: timer: renesas,tmu: Document input capture interrupt
  clocksource/drivers/ti-32K: Fix misuse of "/**" comment
  clocksource/drivers/stm32: Fix all kernel-doc warnings
  dt-bindings: timer: exynos4210-mct: Add google,gs101-mct compatible
  clocksource/drivers/imx: Fix -Wunused-but-set-variable warning

Merge tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"A series of fixes for the Renesas RZG21 interrupt chip driver to
  prevent spurious and misrouted interrupts.

   - Ensure that posted writes are flushed in the eoi() callback

   - Ensure that interrupts are masked at the chip level when the
     trigger type is changed

   - Clear the interrupt status register when setting up edge type
     trigger modes

   - Ensure that the trigger type and routing information is set before
     the interrupt is enabled"

* tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/renesas-rzg2l: Do not set TIEN and TINT source at the same time
  irqchip/renesas-rzg2l: Prevent spurious interrupts when setting trigger type
  irqchip/renesas-rzg2l: Rename rzg2l_irq_eoi()
  irqchip/renesas-rzg2l: Rename rzg2l_tint_eoi()
  irqchip/renesas-rzg2l: Flush posted write in irq_eoi()

Merge tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core entry fix from Thomas Gleixner:
"A single fix for the generic entry code:

  The trace_sys_enter() tracepoint can modify the syscall number via
  kprobes or BPF in pt_regs, but that requires that the syscall number
  is re-evaluted from pt_regs after the tracepoint.

  A seccomp fix in that area removed the re-evaluation so the change
  does not take effect as the code just uses the locally cached number.

  Restore the original behaviour by re-evaluating the syscall number
  after the tracepoint"

* tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Respect changes to system call number by trace_sys_enter()

Merge tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:

- Handle errors in mark_rodata_ro() and mark_initmem_nx()

- Make struct crash_mem available without CONFIG_CRASH_DUMP

Thanks to Christophe Leroy and Hari Bathini.

* tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency
  powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP
  kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP
  powerpc: Handle error in mark_rodata_ro() and mark_initmem_nx()

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM updates from Russell King:

- remove a misuse of kernel-doc comment

- use "Call trace:" for backtraces like other architectures

- implement copy_from_kernel_nofault_allowed() to fix a LKDTM test

- add a "cut here" line for prefetch aborts

- remove unnecessary Kconfing entry for FRAME_POINTER

- remove iwmmxy support for PJ4/PJ4B cores

- use bitfield helpers in ptrace to improve readabililty

- check if folio is reserved before flushing

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9359/1: flush: check if the folio is reserved for no-mapping addresses
  ARM: 9354/1: ptrace: Use bitfield helpers
  ARM: 9352/1: iwmmxt: Remove support for PJ4/PJ4B cores
  ARM: 9353/1: remove unneeded entry for CONFIG_FRAME_POINTER
  ARM: 9351/1: fault: Add "cut here" line for prefetch aborts
  ARM: 9350/1: fault: Implement copy_from_kernel_nofault_allowed()
  ARM: 9349/1: unwind: Add missing "Call trace:" line
  ARM: 9334/1: mm: init: remove misuse of kernel-doc comment

Merge tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull more hardening updates from Kees Cook:

- CONFIG_MEMCPY_SLOW_KUNIT_TEST is no longer needed (Guenter Roeck)

- Fix needless UTF-8 character in arch/Kconfig (Liu Song)

- Improve __counted_by warning message in LKDTM (Nathan Chancellor)

- Refactor DEFINE_FLEX() for default use of __counted_by

- Disable signed integer overflow sanitizer on GCC < 8

* tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lkdtm/bugs: Improve warning message for compilers without counted_by support
  overflow: Change DEFINE_FLEX to take __counted_by member
  Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST"
  arch/Kconfig: eliminate needless UTF-8 character in Kconfig help
  ubsan: Disable signed integer overflow sanitizer on GCC < 8

x86/mpparse: Register APIC address only once

The APIC address is registered twice. First during the early detection and
afterwards when actually scanning the table for APIC IDs. The APIC and
topology core warn about the second attempt.

Restrict it to the early detection call.

Fixes: 81287ad65da5 ("x86/apic: Sanitize APIC address setup")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/topology: Handle the !APIC case gracefully

If there is no local APIC enumerated and registered then the topology
bitmaps are empty. Therefore, topology_init_possible_cpus() will die with
a division by zero exception.

Prevent this by registering a fake APIC id to populate the topology
bitmap. This also allows to use all topology query interfaces
unconditionally. It does not affect the actual APIC code because either
the local APIC address was not registered or no local APIC could be
detected.

Fixes: f1f758a80516 ("x86/topology: Add a mechanism to track topology via APIC IDs")
Reported-by: Guenter Roeck <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/topology: Don't evaluate logical IDs during early boot

The local APICs have not yet been enumerated so the logical ID evaluation
from the topology bitmaps does not work and would return an error code.

Skip the evaluation during the early boot CPUID evaluation and only apply
it on the final run.

Fixes: 380414be78bf ("x86/cpu/topology: Use topology logical mapping mechanism")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/cpu: Ensure that CPU info updates are propagated on UP

The boot sequence evaluates CPUID information twice:

  1) During early boot

  2) When finalizing the early setup right before
     mitigations are selected and alternatives are patched.

In both cases the evaluation is stored in boot_cpu_data, but on UP the
copying of boot_cpu_data to the per CPU info of the boot CPU happens
between #1 and #2. So any update which happens in #2 is never propagated to
the per CPU info instance.

Consolidate the whole logic and copy boot_cpu_data right before applying
alternatives as that's the point where boot_cpu_data is in it's final
state and not supposed to change anymore.

This also removes the voodoo mb() from smp_prepare_cpus_common() which
had absolutely no purpose.

Fixes: 71eb4893cfaf ("x86/percpu: Cure per CPU madness on UP")
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

bpf: verifier: reject addr_space_cast insn without arena

The verifier allows using the addr_space_cast instruction in a program
that doesn't have an associated arena. This was caught in the form an
invalid memory access in do_misc_fixups() when while converting
addr_space_cast to a normal 32-bit mov, env->prog->aux->arena was
dereferenced to check for BPF_F_NO_USER_CONV flag.

Reject programs that include the addr_space_cast instruction but don't
have an associated arena.

root@rv-tester:~# ./reproducer
Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000030
Oops [#1]
[<ffffffff8017eeaa>] do_misc_fixups+0x43c/0x1168
[<ffffffff801936d6>] bpf_check+0xda8/0x22b6
[<ffffffff80174b32>] bpf_prog_load+0x486/0x8dc
[<ffffffff80176566>] __sys_bpf+0xbd8/0x214e
[<ffffffff80177d14>] __riscv_sys_bpf+0x22/0x2a
[<ffffffff80d2493a>] do_trap_ecall_u+0x102/0x17c
[<ffffffff80d3048c>] ret_from_exception+0x0/0x64

Fixes: 6082b6c328b5 ("bpf: Recognize addr_space_cast instruction in the verifier.")
Reported-by: xingwei lee <[email protected]>
Reported-by: yue sun <[email protected]>
Closes: https://lore.kernel.org/bpf/CABOYnLz09O1+2gGVJuCxd_24a-7UueXzV-Ff+Fr+h5EKFDiYCQ@mail.gmail.com/
Signed-off-by: Puranjay Mohan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: verifier_arena: fix mmap address for arm64

The arena_list selftest uses (1ull << 32) in the mmap address
computation for arm64. Use the same in the verifier_arena selftest.

This makes the selftest pass for arm64 on the CI[1].

[1] https://github.com/kernel-patches/bpf/pull/6622

Signed-off-by: Puranjay Mohan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: verifier: fix addr_space_cast from as(1) to as(0)

The verifier currently converts addr_space_cast from as(1) to as(0) that
is: BPF_ALU64 | BPF_MOV | BPF_X with off=1 and imm=1
to
BPF_ALU | BPF_MOV | BPF_X with imm=1 (32-bit mov)

Because of this imm=1, the JITs that have bpf_jit_needs_zext() == true,
interpret the converted instruction as BPF_ZEXT_REG(DST) which is a
special form of mov32, used for doing explicit zero extension on dst.
These JITs will just zero extend the dst reg and will not move the src to
dst before the zext.

Fix do_misc_fixups() to set imm=0 when converting addr_space_cast to a
normal mov32.

The JITs that have bpf_jit_needs_zext() == true rely on the verifier to
emit zext instructions. Mark dst_reg as subreg when doing cast from
as(1) to as(0) so the verifier emits a zext instruction after the mov.

Fixes: 6082b6c328b5 ("bpf: Recognize addr_space_cast instruction in the verifier.")
Signed-off-by: Puranjay Mohan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

ipv6: Fix address dump when IPv6 is disabled on an interface

Cited commit started returning an error when user space requests to dump
the interface's IPv6 addresses and IPv6 is disabled on the interface.
Restore the previous behavior and do not return an error.

Before cited commit:

# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff
     inet6 fe80::1852:2ff:fe5a:c26e/64 scope link proto kernel_ll
        valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 1000
# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff

After cited commit:

# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 1e:9b:94:00:ac:e8 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::1c9b:94ff:fe00:ace8/64 scope link proto kernel_ll
        valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 1000
# ip address show dev dummy1
RTNETLINK answers: No such device
Dump terminated

With this patch:

# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff
     inet6 fe80::4035:fcff:fe53:66cf/64 scope link proto kernel_ll
        valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 1000
# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff

Fixes: 9cc4cc329d30 ("ipv6: use xa_array iterator to implement inet6_dump_addr()")
Reported-by: Gal Pressman <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Tested-by: Gal Pressman <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

nexthop: fix uninitialized variable in nla_put_nh_group_stats()

The "*hw_stats_used" value needs to be set on the success paths to prevent
an uninitialized variable bug in the caller, nla_put_nh_group_stats().

Fixes: 5072ae00aea4 ("net: nexthop: Expose nexthop group HW stats to user space")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl: fix setting presence bits in simple nests

When we set members of simple nested structures in requests
we need to set "presence" bits for all the nesting layers
below. This has nothing to do with the presence type of
the last layer.

Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink")
Reviewed-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

lkdtm/bugs: Improve warning message for compilers without counted_by support

The current message for telling the user that their compiler does not
support the counted_by attribute in the FAM_BOUNDS test does not make
much sense either grammatically or semantically. Fix it to make it
correct in both aspects.

Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Link: https://lore.kernel.org/r/20240321-lkdtm-improve-lack-of-counted_by-msg-v1-1-0fbf7481a29c@kernel.org
Signed-off-by: Kees Cook <[email protected]>

overflow: Change DEFINE_FLEX to take __counted_by member

The norm should be flexible array structures with __counted_by
annotations, so DEFINE_FLEX() is updated to expect that. Rename
the non-annotated version to DEFINE_RAW_FLEX(), and update the
few existing users. Additionally add selftests for the macros.

Reviewed-by: Gustavo A. R. Silva <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Kees Cook <[email protected]>

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
"The vfs has long had a write lifetime hint mechanism that gives the
  expected longevity on storage of the data being written. f2fs was the
  original consumer of this and used the hint for flash data placement
  (mostly to avoid write amplification by placing objects with similar
  lifetimes in the same erase block).

  More recently the SCSI based UFS (Universal Flash Storage) drivers
  have wanted to take advantage of this as well, for the same reasons as
  f2fs, necessitating plumbing the write hints through the block layer
  and then adding it to the SCSI core.

  The vfs write_hints already taken plumbs this as far as block and this
  completes the SCSI core enabling based on a recently agreed reuse of
  the old write command group number. The additions to the scsi_debug
  driver are for emulating this property so we can run tests on it in
  the absence of an actual UFS device"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_debug: Maintain write statistics per group number
  scsi: scsi_debug: Implement GET STREAM STATUS
  scsi: scsi_debug: Implement the IO Advice Hints Grouping mode page
  scsi: scsi_debug: Allocate the MODE SENSE response from the heap
  scsi: scsi_debug: Rework subpage code error handling
  scsi: scsi_debug: Rework page code error handling
  scsi: scsi_debug: Support the block limits extension VPD page
  scsi: scsi_debug: Reduce code duplication
  scsi: sd: Translate data lifetime information
  scsi: scsi_proto: Add structures and constants related to I/O groups and streams
  scsi: core: Query the Block Limits Extension VPD page

Merge tag 'block-6.9-20240322' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:

- NVMe pull request via Keith:
     - Make an informative message less ominous (Keith)
     - Enhanced trace decoding (Guixin)
     - TCP updates (Hannes, Li)
     - Fabrics connect deadlock fix (Chunguang)
     - Platform API migration update (Uwe)
     - A new device quirk (Jiawei)

- Remove dead assignment in fd (Yufeng)

* tag 'block-6.9-20240322' of git://git.kernel.dk/linux:
  nvmet-rdma: remove NVMET_RDMA_REQ_INVALIDATE_RKEY flag
  nvme: remove redundant BUILD_BUG_ON check
  floppy: remove duplicated code in redo_fd_request()
  nvme/tcp: Add wq_unbound modparam for nvme_tcp_wq
  nvme-tcp: Export the nvme_tcp_wq to sysfs
  drivers/nvme: Add quirks for device 126f:2262
  nvme: parse format command's lbafu when tracing
  nvme: add tracing of reservation commands
  nvme: parse zns command's zsa and zrasf to string
  nvme: use nvme_disk_is_ns_head helper
  nvme: fix reconnection fail due to reserved tag allocation
  nvmet: add tracing of zns commands
  nvmet: add tracing of authentication commands
  nvme-apple: Convert to platform remove callback returning void
  nvmet-tcp: do not continue for invalid icreq
  nvme: change shutdown timeout setting message

Merge tag 'io_uring-6.9-20240322' of git://git.kernel.dk/linux

Pull more io_uring updates from Jens Axboe:
"One patch just missed the initial pull, the rest are either fixes or
  small cleanups that make our life easier for the next kernel:

   - Fix a potential leak in error handling of pinned pages, and clean
     it up (Gabriel, Pavel)

   - Fix an issue with how read multishot returns retry (me)

   - Fix a problem with waitid/futex removals, if we hit the case of
     needing to remove all of them at exit time (me)

   - Fix for a regression introduced in this merge window, where we
     don't always have sr->done_io initialized if the ->prep_async()
     path is used (me)

   - Fix for SQPOLL setup error handling (me)

   - Fix for a poll removal request being delayed (Pavel)

   - Rename of a struct member which had a confusing name (Pavel)"

* tag 'io_uring-6.9-20240322' of git://git.kernel.dk/linux:
  io_uring/sqpoll: early exit thread if task_context wasn't allocated
  io_uring: clear opcode specific data for an early failure
  io_uring/net: ensure async prep handlers always initialize ->done_io
  io_uring/waitid: always remove waitid entry for cancel all
  io_uring/futex: always remove futex entry for cancel all
  io_uring: fix poll_remove stalled req completion
  io_uring: Fix release of pinned pages when __io_uaddr_map fails
  io_uring/kbuf: rename is_mapped
  io_uring: simplify io_pages_free
  io_uring: clean rings on NO_MMAP alloc fail
  io_uring/rw: return IOU_ISSUE_SKIP_COMPLETE for multishot retry
  io_uring: don't save/restore iowait state

Merge tag 'for-6.9/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- Fix a memory leak in DM integrity recheck code that was added during
   the 6.9 merge. Also fix the recheck code to ensure it issues bios
   with proper alignment.

- Fix DM snapshot's dm_exception_table_exit() to schedule while
   handling an large exception table during snapshot device shutdown.

* tag 'for-6.9/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-integrity: align the outgoing bio in integrity_recheck
  dm snapshot: fix lockup in dm_exception_table_exit
  dm-integrity: fix a memory leak when rechecking the data

Merge tag 'ceph-for-6.9-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
"A patch to minimize blockage when processing very large batches of
  dirty caps and two fixes to better handle EOF in the face of multiple
  clients performing reads and size-extending writes at the same time"

* tag 'ceph-for-6.9-rc1' of https://github.com/ceph/ceph-client:
  ceph: set correct cap mask for getattr request for read
  ceph: stop copying to iter at EOF on sync reads
  ceph: remove SLAB_MEM_SPREAD flag usage
  ceph: break the check delayed cap loop every 5s

Merge tag 'xfs-6.9-merge-9' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

- Fix invalid pointer dereference by initializing xmbuf before
   tracepoint function is invoked

- Use memalloc_nofs_save() when inserting into quota radix tree

* tag 'xfs-6.9-merge-9' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: quota radix tree allocations need to be NOFS on insert
  xfs: fix dev_t usage in xmbuf tracepoints

Merge tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

- Support for various vector-accelerated crypto routines

- Hibernation is now enabled for portable kernel builds

- mmap_rnd_bits_max is larger on systems with larger VAs

- Support for fast GUP

- Support for membarrier-based instruction cache synchronization

- Support for the Andes hart-level interrupt controller and PMU

- Some cleanups around unaligned access speed probing and Kconfig
   settings

- Support for ACPI LPI and CPPC

- Various cleanus related to barriers

- A handful of fixes

* tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits)
  riscv: Fix syscall wrapper for >word-size arguments
  crypto: riscv - add vector crypto accelerated AES-CBC-CTS
  crypto: riscv - parallelize AES-CBC decryption
  riscv: Only flush the mm icache when setting an exec pte
  riscv: Use kcalloc() instead of kzalloc()
  riscv/barrier: Add missing space after ','
  riscv/barrier: Consolidate fence definitions
  riscv/barrier: Define RISCV_FULL_BARRIER
  riscv/barrier: Define __{mb,rmb,wmb}
  RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ
  cpufreq: Move CPPC configs to common Kconfig and add RISC-V
  ACPI: RISC-V: Add CPPC driver
  ACPI: Enable ACPI_PROCESSOR for RISC-V
  ACPI: RISC-V: Add LPI driver
  cpuidle: RISC-V: Move few functions to arch/riscv
  riscv: Introduce set_compat_task() in asm/compat.h
  riscv: Introduce is_compat_thread() into compat.h
  riscv: add compile-time test into is_compat_task()
  riscv: Replace direct thread flag check with is_compat_task()
  riscv: Improve arch_get_mmap_end() macro
  ...

Merge tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch updates from Huacai Chen:

- Add objtool support for LoongArch

- Add ORC stack unwinder support for LoongArch

- Add kernel livepatching support for LoongArch

- Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig

- Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig

- Some bug fixes and other small changes

* tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch/crypto: Clean up useless assignment operations
  LoongArch: Define the __io_aw() hook as mmiowb()
  LoongArch: Remove superfluous flush_dcache_page() definition
  LoongArch: Move {dmw,tlb}_virt_to_page() definition to page.h
  LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
  LoongArch: Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
  LoongArch: Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
  LoongArch: Add kernel livepatching support
  LoongArch: Add ORC stack unwinder support
  objtool: Check local label in read_unwind_hints()
  objtool: Check local label in add_dead_ends()
  objtool/LoongArch: Enable orc to be built
  objtool/x86: Separate arch-specific and generic parts
  objtool/LoongArch: Implement instruction decoder
  objtool/LoongArch: Enable objtool to be built

Merge tag 'fbdev-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev updates from Helge Deller:

- Allow console fonts up to 64x128 pixels (Samuel Thibault)

- Prevent division-by-zero in fb monitor code (Roman Smirnov)

- Drop Renesas ARM platforms from Mobile LCDC framebuffer driver (Geert
   Uytterhoeven)

- Various code cleanups in viafb, uveafb and mb862xxfb drivers by
   Aleksandr Burakov, Li Zhijian and Michael Ellerman

* tag 'fbdev-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: panel-tpo-td043mtea1: Convert sprintf() to sysfs_emit()
  fbmon: prevent division by zero in fb_videomode_from_videomode()
  fbcon: Increase maximum font width x height to 64 x 128
  fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2
  fbdev: mb862xxfb: Fix defined but not used error
  fbdev: uvesafb: Convert sprintf/snprintf to sysfs_emit
  fbdev: Restrict FB_SH_MOBILE_LCDC to SuperH

Merge tag 'spi-fix-v6.9-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A small collection of fixes that came in since the merge window. Most
  of it is relatively minor driver specific fixes, there's also fixes
  for error handling with SPI flash devices and a fix restoring delay
  control functionality for non-GPIO chip selects managed by the core"

* tag 'spi-fix-v6.9-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-mt65xx: Fix NULL pointer access in interrupt handler
  spi: docs: spidev: fix echo command format
  spi: spi-imx: fix off-by-one in mx51 CPU mode burst length
  spi: lm70llp: fix links in doc and comments
  spi: Fix error code checking in spi_mem_exec_op()
  spi: Restore delays for non-GPIO chip select
  spi: lpspi: Avoid potential use-after-free in probe()

Merge tag 'regulator-fix-v6.9-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"One fix that came in during the merge window, fixing a problem with
  bootstrapping the state of exclusive regulators which have a parent
  regulator"

* tag 'regulator-fix-v6.9-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: core: Propagate the regulator state in case of exclusive get

Merge tag 'sound-fix2-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull more sound fixes from Takashi Iwai:
"The remaining fixes for 6.9-rc1 that have been gathered in this week.

  More about ASoC at this time (one long-standing fix for compress
  offload, SOF, AMD ACP, Rockchip, Cirrus and tlv320 stuff) while
  another regression fix in ALSA core and a couple of HD-audio quirks as
  usual are included"

* tag 'sound-fix2-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: control: Fix unannotated kfree() cleanup
  ALSA: hda/realtek: Add quirks for some Clevo laptops
  ALSA: hda/realtek: Add quirk for HP Spectre x360 14 eu0000
  ALSA: hda/realtek: fix the hp playback volume issue for LG machines
  ASoC: soc-compress: Fix and add DPCM locking
  ASoC: SOF: amd: Skip IRAM/DRAM size modification for Steam Deck OLED
  ASoC: SOF: amd: Move signed_fw_image to struct acp_quirk_entry
  ASoC: amd: yc: Revert "add new YC platform variant (0x63) support"
  ASoC: amd: yc: Revert "Fix non-functional mic on Lenovo 21J2"
  ASoC: soc-core.c: Skip dummy codec when adding platforms
  ASoC: rockchip: i2s-tdm: Fix inaccurate sampling rates
  ASoC: dt-bindings: cirrus,cs42l43: Fix 'gpio-ranges' schema
  ASoC: amd: yc: Fix non-functional mic on ASUS M7600RE
  ASoC: tlv320adc3xxx: Don't strip remove function when driver is builtin

Merge tag 'i2c-for-6.9-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:
"Some more I2C updates after the dependencies have been merged now.

  Plus a DT binding fix"

* tag 'i2c-for-6.9-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  dt-bindings: i2c: qcom,i2c-cci: Fix OV7251 'data-lanes' entries
  i2c: muxes: pca954x: Allow sharing reset GPIO
  i2c: nomadik: sort includes
  i2c: nomadik: support Mobileye EyeQ5 I2C controller
  i2c: nomadik: fetch i2c-transfer-timeout-us property from devicetree
  i2c: nomadik: replace jiffies by ktime for FIFO flushing timeout
  i2c: nomadik: support short xfer timeouts using waitqueue & hrtimer
  i2c: nomadik: use bitops helpers
  i2c: nomadik: simplify IRQ masking logic
  i2c: nomadik: rename private struct pointers from dev to priv
  dt-bindings: i2c: nomadik: add mobileye,eyeq5-i2c bindings and example

efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address

Following warning is sometimes observed while booting my servers:
  [    3.594838] DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations
  [    3.602918] swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-1
  ...
  [    3.851862] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation

If 'nokaslr' boot option is set, the warning always happens.

On x86, ZONE_DMA is small zone at the first 16MB of physical address
space. When this problem happens, most of that space seems to be used by
decompressed kernel. Thereby, there is not enough space at DMA_ZONE to
meet the request of DMA pool allocation.

The commit 2f77465b05b1 ("x86/efistub: Avoid placing the kernel below
LOAD_PHYSICAL_ADDR") tried to fix this problem by introducing lower
bound of allocation.

But the fix is not complete.

efi_random_alloc() allocates pages by following steps.
1. Count total available slots ('total_slots')
2. Select a slot ('target_slot') to allocate randomly
3. Calculate a starting address ('target') to be included target_slot
4. Allocate pages, which starting address is 'target'

In step 1, 'alloc_min' is used to offset the starting address of memory
chunk. But in step 3 'alloc_min' is not considered at all.  As the
result, 'target' can be miscalculated and become lower than 'alloc_min'.

When KASLR is disabled, 'target_slot' is always 0 and the problem
happens everytime if the EFI memory map of the system meets the
condition.

Fix this problem by calculating 'target' considering 'alloc_min'.

Cc: [email protected]
Cc: Tom Englund <[email protected]>
Cc: [email protected]
Fixes: 2f77465b05b1 ("x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR")
Signed-off-by: Kazuma Kondo <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>

crypto: iaa - Fix nr_cpus < nr_iaa case

If nr_cpus < nr_iaa, the calculated cpus_per_iaa will be 0, which
causes a divide-by-0 in rebalance_wq_table().

Make sure cpus_per_iaa is 1 in that case, and also in the nr_iaa == 0
case, even though cpus_per_iaa is never used if nr_iaa == 0, for
paranoia.

Cc: <[email protected]> # v6.8+
Reported-by: Jerry Snitselaar <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

Revert "crypto: pkcs7 - remove sha1 support"

This reverts commit 16ab7cb5825fc3425c16ad2c6e53d827f382d7c6 because it
broke iwd.  iwd uses the KEYCTL_PKEY_* UAPIs via its dependency libell,
and apparently it is relying on SHA-1 signature support.  These UAPIs
are fairly obscure, and their documentation does not mention which
algorithms they support.  iwd really should be using a properly
supported userspace crypto library instead.  Regardless, since something
broke we have to revert the change.

It may be possible that some parts of this commit can be reinstated
without breaking iwd (e.g. probably the removal of MODULE_SIG_SHA1), but
for now this just does a full revert to get things working again.

Reported-by: Karel Balej <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Cc: Dimitri John Ledkov <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
Tested-by: Karel Balej <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address

Read from an unsafe address with copy_from_kernel_nofault() in
arch_adjust_kprobe_addr() because this function is used before checking
the address is in text or not. Syzcaller bot found a bug and reported
the case if user specifies inaccessible data area,
arch_adjust_kprobe_addr() will cause a kernel panic.

[ mingo: Clarified the comment. ]

Fixes: cc66bb914578 ("x86/ibt,kprobes: Cure sym+0 equals fentry woes")
Reported-by: Qiang Zhang <[email protected]>
Tested-by: Jinghao Jia <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/171042945004.154897.2221804961882915806.stgit@devnote2

x86/pm: Work around false positive kmemleak report in msr_build_context()

Since:

  7ee18d677989 ("x86/power: Make restore_processor_context() sane")

kmemleak reports this issue:

  unreferenced object 0xf68241e0 (size 32):
    comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s)
    hex dump (first 32 bytes):
      00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00  ....)...........
      00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc  .B..............
    backtrace:
      [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260
      [<ea65e13b>] __kmalloc+0x54/0x160
      [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100
      [<46635aff>] pm_check_save_msr+0x63/0x80
      [<6b6bb938>] do_one_initcall+0x41/0x1f0
      [<3f3add60>] kernel_init_freeable+0x199/0x1e8
      [<3b538fde>] kernel_init+0x1a/0x110
      [<938ae2b2>] ret_from_fork+0x1c/0x28

Which is a false positive.

Reproducer:

  - Run rsync of whole kernel tree (multiple times if needed).
  - start a kmemleak scan
  - Note this is just an example: a lot of our internal tests hit these.

The root cause is similar to the fix in:

  b0b592cf0836 x86/pm: Fix false positive kmemleak report in msr_build_context()

ie. the alignment within the packed struct saved_context
which has everything unaligned as there is only "u16 gs;" at start of
struct where in the past there were four u16 there thus aligning
everything afterwards.  The issue is with the fact that Kmemleak only
searches for pointers that are aligned (see how pointers are scanned in
kmemleak.c) so when the struct members are not aligned it doesn't see
them.

Testing:

We run a lot of tests with our CI, and after applying this fix we do not
see any kmemleak issues any more whilst without it we see hundreds of
the above report. From a single, simple test run consisting of 416 individual test
cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this,
which is quite a lot. With this fix applied we get zero kmemleak related failures.

Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane")
Signed-off-by: Anton Altaparmakov <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: "Rafael J. Wysocki" <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet

syzbot reported the following uninit-value access issue [1][2]:

nci_rx_work() parses and processes received packet. When the payload
length is zero, each message type handler reads uninitialized payload
and KMSAN detects this issue. The receipt of a packet with a zero-size
payload is considered unexpected, and therefore, such packets should be
silently discarded.

This patch resolved this issue by checking payload size before calling
each message type handler codes.

Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Reported-and-tested-by: [email protected]
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=7ea9413ea6749baf5574 [1]
Closes: https://syzkaller.appspot.com/bug?extid=29b5ca705d2e0f4a44d2 [2]
Signed-off-by: Ryosuke Yasuoka <[email protected]>
Reviewed-by: Jeremy Cline <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

x86/kexec: Do not update E820 kexec table for setup_data

crashkernel reservation failed on a Thinkpad t440s laptop recently.
Actually the memblock reservation succeeded, but later insert_resource()
failed.

Test steps:
  kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
    kexec reboot ->
        dmesg|grep "crashkernel reserved";
            crashkernel memory range like below reserved successfully:
              0x00000000d0000000 - 0x00000000da000000
        But no such "Crash kernel" region in /proc/iomem

The background story:

Currently the E820 code reserves setup_data regions for both the current
kernel and the kexec kernel, and it inserts them into the resources list.

Before the kexec kernel reboots nobody passes the old setup_data, and
kexec only passes fresh SETUP_EFI/SETUP_IMA/SETUP_RNG_SEED if needed.
Thus the old setup data memory is not used at all.

Due to old kernel updates the kexec e820 table as well so kexec kernel
sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
regions are inserted into resources list in the kexec kernel by
e820__reserve_resources().

Note, due to no setup_data is passed in for those old regions they are not
early reserved (by function early_reserve_memory), and the crashkernel
memblock reservation will just treat them as usable memory and it could
reserve the crashkernel region which overlaps with the old setup_data
regions. And just like the bug I noticed here, kdump insert_resource
failed because e820__reserve_resources has added the overlapped chunks
in /proc/iomem already.

Finally, looking at the code, the old setup_data regions are not used
at all as no setup_data is passed in by the kexec boot loader. Although
something like SETUP_PCI etc could be needed, kexec should pass
the info as new setup_data so that kexec kernel can take care of them.
This should be taken care of in other separate patches if needed.

Thus drop the useless buggy code here.

Signed-off-by: Dave Young <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Jiri Bohac <[email protected]>
Cc: Eric DeVolder <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

binfmt: replace deprecated strncpy

strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

There is a _nearly_ identical implementation of fill_psinfo present in
binfmt_elf.c -- except that one uses get_task_comm over strncpy(). Let's
mirror that in binfmt_elf_fdpic.c

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://github.com/KSPP/linux/issues/90
Cc: <[email protected]>
Signed-off-by: Justin Stitt <[email protected]>
Link: https://lore.kernel.org/r/20240321-strncpy-fs-binfmt_elf_fdpic-c-v2-1-0b6daec6cc56@google.com
Signed-off-by: Kees Cook <[email protected]>

Merge tag '6.9-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Various get_inode_info_fixes

- Fix for querying xattrs of cached dirs

- Four minor cleanup fixes (including adding some header corrections
   and a missing flag)

- Performance improvement for deferred close

- Two query interface fixes

* tag '6.9-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  smb311: additional compression flag defined in updated protocol spec
  smb311: correct incorrect offset field in compression header
  cifs: Move some extern decls from .c files to .h
  cifs: remove redundant variable assignment
  cifs: fixes for get_inode_info
  cifs: open_cached_dir(): add FILE_READ_EA to desired access
  cifs: reduce warning log level for server not advertising interfaces
  cifs: make sure server interfaces are requested only for SMB3+
  cifs: defer close file handles having RH lease

Merge tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Fixes from the last week (or 3 weeks in amdgpu case), after amdgpu,
  it's xe and nouveau then a few scattered core fixes.

  core:
   - fix rounding in drm_fixp2int_round()

  bridge:
   - fix documentation for DRM_BRIDGE_OP_EDID

  sun4i:
   - fix 64-bit division on 32-bit architectures

  tests:
   - fix dependency on DRM_KMS_HELPER

  probe-helper:
   - never return negative values from .get_modes() plus driver fixes

  xe:
   - invalidate userptr vma on page pin fault
   - fail early on sysfs file creation error
   - skip VMA pinning on xe_exec if no batches

  nouveau:
   - clear bo resource bus after eviction
   - documentation fixes
   - don't check devinit disable on GSP

  amdgpu:
   - Freesync fixes
   - UAF IOCTL fixes
   - Fix mmhub client ID mapping
   - IH 7.0 fix
   - DML2 fixes
   - VCN 4.0.6 fix
   - GART bind fix
   - GPU reset fix
   - SR-IOV fix
   - OD table handling fixes
   - Fix TA handling on boards without display hardware
   - DML1 fix
   - ABM fix
   - eDP panel fix
   - DPPCLK fix
   - HDCP fix
   - Revert incorrect error case handling in ioremap
   - VPE fix
   - HDMI fixes
   - SDMA 4.4.2 fix
   - Other misc fixes

  amdkfd:
   - Fix duplicate BO handling in process restore"

* tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel: (50 commits)
  drm/amdgpu/pm: Don't use OD table on Arcturus
  drm/amdgpu: drop setting buffer funcs in sdma442
  drm/amd/display: Fix noise issue on HDMI AV mute
  drm/amd/display: Revert Remove pixle rate limit for subvp
  Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode"
  Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()"
  drm/amd/display: Add a dc_state NULL check in dc_state_release
  drm/amd/display: Return the correct HDCP error code
  drm/amd/display: Implement wait_for_odm_update_pending_complete
  drm/amd/display: Lock all enabled otg pipes even with no planes
  drm/amd/display: Amend coasting vtotal for replay low hz
  drm/amd/display: Fix idle check for shared firmware state
  drm/amd/display: Update odm when ODM combine is changed on an otg master pipe with no plane
  drm/amd/display: Init DPPCLK from SMU on dcn32
  drm/amd/display: Add monitor patch for specific eDP
  drm/amd/display: Allow dirty rects to be sent to dmub when abm is active
  drm/amd/display: Override min required DCFCLK in dml1_validate
  drm/amdgpu: Bypass display ta if display hw is not available
  drm/amdgpu: correct the KGQ fallback message
  drm/amdgpu/pm: Check the validity of overdiver power limit
  ...

Merge tag 'amd-drm-fixes-6.9-2024-03-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-fixes-6.9-2024-03-21:

amdgpu:
- Freesync fixes
- UAF IOCTL fixes
- Fix mmhub client ID mapping
- IH 7.0 fix
- DML2 fixes
- VCN 4.0.6 fix
- GART bind fix
- GPU reset fix
- SR-IOV fix
- OD table handling fixes
- Fix TA handling on boards without display hardware
- DML1 fix
- ABM fix
- eDP panel fix
- DPPCLK fix
- HDCP fix
- Revert incorrect error case handling in ioremap
- VPE fix
- HDMI fixes
- SDMA 4.4.2 fix
- Other misc fixes

amdkfd:
- Fix duplicate BO handling in process restore

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- Re-instate the CPUMASK_OFFSTACK option for arm64 when NR_CPUS > 256.
   The bug that led to the initial revert was the cpufreq-dt code not
   using zalloc_cpumask_var().

- Make the STARFIVE_STARLINK_PMU config option depend on 64BIT to
   prevent compile-test failures on 32-bit architectures due to missing
   writeq().

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  perf: starfive: fix 64-bit only COMPILE_TEST condition
  ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512