Linus Torvalds [Thu, 18 Jan 2024 21:41:48 +0000 (13:41 -0800)]
Merge tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TDX updates from Dave Hansen:
"This contains the initial support for host-side TDX support so that
KVM can run TDX-protected guests. This does not include the actual
KVM-side support which will come from the KVM folks. The TDX host
interactions with kexec also needs to be ironed out before this is
ready for prime time, so this code is currently Kconfig'd off when
kexec is on.
The majority of the code here is the kernel telling the TDX module
which memory to protect and handing some additional memory over to it
to use to store TDX module metadata. That sounds pretty simple, but
the TDX architecture is rather flexible and it takes quite a bit of
back-and-forth to say, "just protect all memory, please."
There is also some code tacked on near the end of the series to handle
a hardware erratum. The erratum can make software bugs such as a
kernel write to TDX-protected memory cause a machine check and
masquerade as a real hardware failure. The erratum handling watches
out for these and tries to provide nicer user errors"
* tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/virt/tdx: Make TDX host depend on X86_MCE
x86/virt/tdx: Disable TDX host support when kexec is enabled
Documentation/x86: Add documentation for TDX host support
x86/mce: Differentiate real hardware #MCs from TDX erratum ones
x86/cpu: Detect TDX partial write machine check erratum
x86/virt/tdx: Handle TDX interaction with sleep and hibernation
x86/virt/tdx: Initialize all TDMRs
x86/virt/tdx: Configure global KeyID on all packages
x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID
x86/virt/tdx: Designate reserved areas for all TDMRs
x86/virt/tdx: Allocate and set up PAMTs for TDMRs
x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions
x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions
x86/virt/tdx: Get module global metadata for module initialization
x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory
x86/virt/tdx: Add skeleton to enable TDX on demand
x86/virt/tdx: Add SEAMCALL error printing for module initialization
x86/virt/tdx: Handle SEAMCALL no entropy error in common code
x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC
x86/virt/tdx: Define TDX supported page sizes as macros
...
Ma Jun [Fri, 12 Jan 2024 05:33:24 +0000 (13:33 +0800)]
drm/amdgpu: Fix the null pointer when load rlc firmware
If the RLC firmware is invalid because of wrong header size,
the pointer to the rlc firmware is released in function
amdgpu_ucode_request. There will be a null pointer error
in subsequent use. So skip validation to fix it.
Wayne Lin [Tue, 2 Jan 2024 06:20:37 +0000 (14:20 +0800)]
drm/amd/display: Align the returned error code with legacy DP
[Why]
For usb4 connector, AUX transaction is handled by dmub utilizing a differnt
code path comparing to legacy DP connector. If the usb4 DP connector is
disconnected, AUX access will report EBUSY and cause igt@kms_dp_aux_dev
fail.
[How]
Align the error code with the one reported by legacy DP as EIO.
Ilya Bakoulin [Wed, 3 Jan 2024 14:42:04 +0000 (09:42 -0500)]
drm/amd/display: Clear OPTC mem select on disable
[Why]
Not clearing the memory select bits prior to OPTC disable can cause DSC
corruption issues when attempting to reuse a memory instance for another
OPTC that enables ODM.
[How]
Clear the memory select bits prior to disabling an OPTC.
Dillon Varone [Fri, 29 Dec 2023 02:36:39 +0000 (21:36 -0500)]
drm/amd/display: Init link enc resources in dc_state only if res_pool presents
[Why & How]
res_pool is not initialized in all situations such as virtual
environments, and therefore link encoder resources should not be
initialized if res_pool is NULL.
drm/amd/display: Fix late derefrence 'dsc' check in 'link_set_dsc_pps_packet()'
In link_set_dsc_pps_packet(), 'struct display_stream_compressor *dsc'
was dereferenced in a DC_LOGGER_INIT(dsc->ctx->logger); before the 'dsc'
NULL pointer check.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_dpms.c:905 link_set_dsc_pps_packet() warn: variable dereferenced before check 'dsc' (see line 903)
Linus Torvalds [Thu, 18 Jan 2024 21:23:53 +0000 (13:23 -0800)]
Merge tag 'x86_sgx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Dave Hansen:
"This time, these are entirely confined to SGX selftests fixes.
The mini SGX enclave built by the selftests has garnered some
attention because it stands alone and does not need the sizable
infrastructure of the official SGX SDK. I think that's why folks are
suddently interested in cleaning it up.
- Clean up selftest compilation issues, mostly from non-gcc compilers
- Avoid building selftests when not on x86"
* tag 'x86_sgx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/sgx: Skip non X86_64 platform
selftests/sgx: Remove incomplete ABI sanitization code in test enclave
selftests/sgx: Discard unsupported ELF sections
selftests/sgx: Ensure expected location of test enclave buffer
selftests/sgx: Ensure test enclave buffer is entirely preserved
selftests/sgx: Fix linker script asserts
selftests/sgx: Handle relocations in test enclave
selftests/sgx: Produce static-pie executable for test enclave
selftests/sgx: Remove redundant enclave base address save/restore
selftests/sgx: Specify freestanding environment for enclave compilation
selftests/sgx: Separate linker options
selftests/sgx: Include memory clobber for inline asm in test enclave
selftests/sgx: Fix uninitialized pointer dereferences in encl_get_entry
selftests/sgx: Fix uninitialized pointer dereference in error path
Jakub Kicinski [Thu, 18 Jan 2024 20:45:04 +0000 (12:45 -0800)]
Merge tag 'nf-24-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net. Slightly larger
than usual because this batch includes several patches to tighten the
nf_tables control plane to reject inconsistent configuration:
1) Restrict NFTA_SET_POLICY to NFT_SET_POL_PERFORMANCE and
NFT_SET_POL_MEMORY.
2) Bail out if a nf_tables expression registers more than 16 netlink
attributes which is what struct nft_expr_info allows.
3) Bail out if NFT_EXPR_STATEFUL provides no .clone interface, remove
existing fallback to memcpy() when cloning which might accidentally
duplicate memory reference to the same object.
4) Fix br_netfilter interaction with neighbour layer. This requires
three preparation patches:
- Use nf_bridge_get_physinif() in nfnetlink_log
- Use nf_bridge_info_exists() to check in br_netfilter context
is available in nf_queue.
- Pass net to nf_bridge_get_physindev()
And finally, the fix which replaces physindev with physinif
in nf_bridge_info.
Patches from Pavel Tikhomirov.
5) Catch-all deactivation happens in the transaction, hence this
oneliner to check for the next generation. This bug uncovered after
the removal of the _BUSY bit, which happened in set elements back in
summer 2023.
6) Ensure set (total) key length size and concat field length description
is consistent, otherwise bail out.
7) Skip set element with the _DEAD flag on from the netlink dump path.
A tests occasionally shows that dump is mismatching because GC might
lose race to get rid of this element while a netlink dump is in
progress.
8) Reject NFT_SET_CONCAT for field_count < 1.
9) Use IP6_INC_STATS in ipvs to fix preemption BUG splat, patch
from Fedor Pchelkin.
* tag 'nf-24-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
ipvs: avoid stat macros calls from preemptible context
netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description
netfilter: nf_tables: skip dead set elements in netlink dump
netfilter: nf_tables: do not allow mismatch field size and set key length
netfilter: nf_tables: check if catch-all set element is active in next generation
netfilter: bridge: replace physindev with physinif in nf_bridge_info
netfilter: propagate net to nf_bridge_get_physindev
netfilter: nf_queue: remove excess nf_bridge variable
netfilter: nfnetlink_log: use proper helper for fetching physinif
netfilter: nft_limit: do not ignore unsupported flags
netfilter: nf_tables: bail out if stateful expression provides no .clone
netfilter: nf_tables: validate .maxattr at expression registration
netfilter: nf_tables: reject invalid set policy
====================
Kees Cook [Wed, 10 Jan 2024 23:54:41 +0000 (15:54 -0800)]
bcachefs: Replace strlcpy() with strscpy()
strlcpy() reads the entire source buffer first. This read may exceed
the destination size limit. This is both inefficient and can lead
to linear read overflows if a source string is not NUL-terminated[1].
Additionally, it returns the size of the source string, not the
resulting size of the destination string. In an effort to remove strlcpy()
completely[2], replace strlcpy() here with strscpy().
Nothing checks the return value here, so a direct replacement with
strspy() is possible.
Alain Volmat [Fri, 15 Dec 2023 17:06:10 +0000 (18:06 +0100)]
i2c: stm32f7: add support for stm32mp25 soc
The stm32mp25 has only a single interrupt line used for both
events and errors. In order to cope with that, reorganise the
error handling code so that it can be called either from the
common handler (used in case of SoC having only a single IT line)
and the error handler for others.
The CR1 register also embeds a new FMP bit, necessary when running
at Fast Mode Plus frequency. This bit should be used instead of
the SYSCFG bit used on other platforms.
Add a new compatible to distinguish between the SoCs and two
boolean within the setup structure in order to know if the
platform has a single/multiple IT lines and if the FMP bit
within CR1 is available or not.
Add a new compatible st,stm32mp25-i2c for the STM32MP25 series which
has only one interrupt line for both events and errors and differs in
term of handling of FastModePlus.
Alain Volmat [Fri, 15 Dec 2023 17:06:07 +0000 (18:06 +0100)]
i2c: stm32f7: simplify status messages in case of errors
Avoid usage of __func__ when reporting an error message
since dev_err/dev_dbg are already providing enough details
to identify the source of the message.
Alain Volmat [Fri, 15 Dec 2023 17:06:06 +0000 (18:06 +0100)]
i2c: stm32f7: perform most of irq job in threaded handler
The irq handling is currently split between the irq handler
and the threaded irq handler. Some of the handling (such as
dma related stuffs) done within the irq handler might sleep or
take some time leading to issues if the kernel is built with
realtime constraints. In order to fix that, perform an overall
rework to perform most of the job within the threaded handler
and only keep fifo access in the non threaded handler.
Paul Menzel [Wed, 20 Dec 2023 16:10:02 +0000 (17:10 +0100)]
i2c: i801: Add lis3lv02d for Dell XPS 15 7590
On the Dell XPS 15 7590/0VYV0G, BIOS 1.24.0 09/11/2023, Linux prints the
warning below.
i801_smbus 0000:00:1f.4: Accelerometer lis3lv02d is present on SMBus but its address is unknown, skipping registration
Following the same suggestions by Wolfram Sang as for the Dell Precision
3540 [1], the accelerometer can be successfully found on I2C bus 2 at
address 0x29.
$ echo lis3lv02d 0x29 | sudo tee /sys/bus/i2c/devices/i2c-2/new_device
lis3lv02d 0x29
$ dmesg | tail -5
[ 549.522876] lis3lv02d_i2c 2-0029: supply Vdd not found, using dummy regulator
[ 549.522904] lis3lv02d_i2c 2-0029: supply Vdd_IO not found, using dummy regulator
[ 549.542486] lis3lv02d: 8 bits 3DC sensor found
[ 549.630022] input: ST LIS3LV02DL Accelerometer as /devices/platform/lis3lv02d/input/input35
[ 549.630586] i2c i2c-2: new_device: Instantiated device lis3lv02d at 0x29
So, the device has that accelerometer. Add the I2C address to the
mapping list, and test it successfully on the device.
Heiner Kallweit [Wed, 8 Nov 2023 06:38:07 +0000 (07:38 +0100)]
i2c: mux: reg: Remove class-based device auto-detection support
Legacy class-based device auto-detection shouldn't be used in new code.
Therefore remove support in i2c-mux-reg as long as we don't have a
user of this feature yet.
Now that the driver core can properly handle constant struct bus_type,
move the i2c_bus_type variable to be a constant structure as well, placing
it into read-only memory which can not be modified at runtime.
Note, the sound/soc/rockchip/rk3399_gru_sound.c also needed tweaking as
it decided to save off a pointer to a bus type for internal stuff, and
it was using the i2c_bus_type as well.
Alexander Stein [Thu, 30 Nov 2023 09:57:51 +0000 (10:57 +0100)]
i2c: imx: Make SDA actually optional for bus recovering
Both i2c_generic_scl_recovery() and the debug output indicate that SDA is
purely optional for bus recovery. But devm_gpiod_get() never returns NULL
making it mandatory. Fix this by calling devm_gpiod_get_optional instead.
Jean Delvare [Tue, 14 Nov 2023 14:13:28 +0000 (15:13 +0100)]
i2c: smbus: Support up to 8 SPD EEPROMs
I originally restricted i2c_register_spd() to only support systems
with up to 4 memory slots, so that we can experiment with it on
a limited numbers of systems. It's been more than 3 years and it
seems to work just fine, so the time has come to lift this arbitrary
limitation.
The maximum number of memory slots which can be connected to a single
I2C segment is 8, so support that many SPD EEPROMs. Any system with
more than 8 memory slots would have either multiple SMBus channels
or SMBus multiplexing, so it would need dedicated care. We'll get to
that later as needed.
Add support for atomic transfers using polling mode with interrupts
intentionally disabled to get rid of the following warning introduced by
commit 63b96983a5dd ("i2c: core: introduce callbacks for atomic
transfers") during system reboot and power-off:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1518 at drivers/i2c/i2c-core.h:40 i2c_transfer+0xe8/0xf4
No atomic I2C transfer handler for 'i2c-0'
...
---[ end trace 0000000000000000 ]---
i2c: s3c24xx: fix transferring more than one message in polling mode
To properly handle ACK on the bus when transferring more than one
message in polling mode, move the polling handling loop from
s3c24xx_i2c_message_start() to s3c24xx_i2c_doxfer(). This way
i2c_s3c_irq_nextbyte() is always executed till the end, properly
acknowledging the IRQ bits and no recursive calls to
i2c_s3c_irq_nextbyte() are made.
While touching this, also fix finishing transfers in polling mode by
using common code path and always waiting for the bus to become idle
and disabled.
To properly handle read transfers in polling mode, no waiting for the ACK
state is needed as it will never come. Just wait a bit to ensure start
state is on the bus and continue processing next bytes.
Wolfram Sang [Thu, 14 Dec 2023 07:43:58 +0000 (08:43 +0100)]
i2c: rcar: add FastMode+ support for Gen4
To support FM+, we mainly need to turn the SMD constant into a parameter
and set it accordingly. That also means we can finally fix SMD to our
needs instead of bailing out. A sanity check for SMD then becomes a
sanity check for 'x == 0'. After all that, activating the enable bit for
FM+ is all we need to do. Tested with a Renesas Falcon board using R-Car
V3U.
Wolfram Sang [Sun, 12 Nov 2023 22:54:41 +0000 (17:54 -0500)]
i2c: create debugfs entry per adapter
Two drivers already implement custom debugfs handling for their
i2c_adapter and more will come. So, let the core create a debugfs
directory per adapter and pass that to drivers for their debugfs files.
Heiner Kallweit [Fri, 24 Nov 2023 10:16:11 +0000 (11:16 +0100)]
staging: greybus: Don't let i2c adapters declare I2C_CLASS_SPD support if they support I2C_CLASS_HWMON
After removal of the legacy eeprom driver the only remaining I2C
client device driver supporting I2C_CLASS_SPD is jc42. Because this
driver also supports I2C_CLASS_HWMON, adapters don't have to
declare support for I2C_CLASS_SPD if they support I2C_CLASS_HWMON.
It's one step towards getting rid of I2C_CLASS_SPD mid-term.
Heiner Kallweit [Fri, 24 Nov 2023 10:16:17 +0000 (11:16 +0100)]
media: netup_unidvb: Don't let i2c adapters declare I2C_CLASS_SPD support if they support I2C_CLASS_HWMON
After removal of the legacy eeprom driver the only remaining I2C
client device driver supporting I2C_CLASS_SPD is jc42. Because this
driver also supports I2C_CLASS_HWMON, adapters don't have to
declare support for I2C_CLASS_SPD if they support I2C_CLASS_HWMON.
It's one step towards getting rid of I2C_CLASS_SPD mid-term.
Heiner Kallweit [Fri, 24 Nov 2023 10:16:14 +0000 (11:16 +0100)]
i2c: stub: Don't let i2c adapters declare I2C_CLASS_SPD support if they support I2C_CLASS_HWMON
After removal of the legacy eeprom driver the only remaining I2C
client device driver supporting I2C_CLASS_SPD is jc42. Because this
driver also supports I2C_CLASS_HWMON, adapters don't have to
declare support for I2C_CLASS_SPD if they support I2C_CLASS_HWMON.
It's one step towards getting rid of I2C_CLASS_SPD mid-term.
Heiner Kallweit [Fri, 24 Nov 2023 10:16:10 +0000 (11:16 +0100)]
i2c: Don't let i2c adapters declare I2C_CLASS_SPD support if they support I2C_CLASS_HWMON
After removal of the legacy eeprom driver the only remaining I2C
client device driver supporting I2C_CLASS_SPD is jc42. Because this
driver also supports I2C_CLASS_HWMON, adapters don't have to
declare support for I2C_CLASS_SPD if they support I2C_CLASS_HWMON.
It's one step towards getting rid of I2C_CLASS_SPD mid-term.
Heiner Kallweit [Mon, 13 Nov 2023 11:37:15 +0000 (12:37 +0100)]
drm/amd/pm: Remove I2C_CLASS_SPD support
I2C_CLASS_SPD was used to expose the EEPROM content to user space,
via the legacy eeprom driver. Now that this driver has been removed,
we can remove I2C_CLASS_SPD support. at24 driver with explicit
instantiation should be used instead.
Heiner Kallweit [Thu, 23 Nov 2023 09:40:40 +0000 (10:40 +0100)]
include/linux/i2c.h: remove I2C_CLASS_DDC support
After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.
Heiner Kallweit [Thu, 23 Nov 2023 09:40:25 +0000 (10:40 +0100)]
fbdev: remove I2C_CLASS_DDC support
After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.
Heiner Kallweit [Thu, 23 Nov 2023 09:40:21 +0000 (10:40 +0100)]
drm: remove I2C_CLASS_DDC support
After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.
Linus Torvalds [Thu, 18 Jan 2024 19:57:33 +0000 (11:57 -0800)]
Merge tag 'sched-urgent-2024-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a cpufreq related performance regression on certain systems, where
the CPU would remain at the lowest frequency, degrading performance
substantially"
* tag 'sched-urgent-2024-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix frequency selection for non-invariant case
Linus Torvalds [Thu, 18 Jan 2024 19:43:55 +0000 (11:43 -0800)]
Merge tag 'usb-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
"Here is the big set of USB and Thunderbolt changes for 6.8-rc1.
Included in here are the following:
- Thunderbolt subsystem and driver updates for USB 4 hardware and
issues reported by real devices
- xhci driver updates
- dwc3 driver updates
- uvc_video gadget driver updates
- typec driver updates
- gadget string functions cleaned up
- other small changes
All of these have been in the linux-next tree for a while with no
reported issues"
* tag 'usb-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (169 commits)
usb: typec: tipd: fix use of device-specific init function
usb: typec: tipd: Separate reset for TPS6598x
usb: mon: Fix atomicity violation in mon_bin_vma_fault
usb: gadget: uvc: Remove nested locking
usb: gadget: uvc: Fix use are free during STREAMOFF
usb: typec: class: fix typec_altmode_put_partner to put plugs
dt-bindings: usb: dwc3: Limit num-hc-interrupters definition
dt-bindings: usb: xhci: Add num-hc-interrupters definition
xhci: add support to allocate several interrupters
USB: core: Use device_driver directly in struct usb_driver and usb_device_driver
arm64: dts: mediatek: mt8195: Add 'rx-fifo-depth' for cherry
usb: xhci-mtk: fix a short packet issue of gen1 isoc-in transfer
dt-bindings: usb: mtk-xhci: add a property for Gen1 isoc-in transfer issue
arm64: dts: qcom: msm8996: Remove PNoC clock from MSS
arm64: dts: qcom: msm8996: Remove AGGRE2 clock from SLPI
arm64: dts: qcom: msm8998: Remove AGGRE2 clock from SLPI
arm64: dts: qcom: msm8939: Drop RPM bus clocks
arm64: dts: qcom: sdm630: Drop RPM bus clocks
arm64: dts: qcom: qcs404: Drop RPM bus clocks
arm64: dts: qcom: msm8996: Drop RPM bus clocks
...
Linus Torvalds [Thu, 18 Jan 2024 19:37:24 +0000 (11:37 -0800)]
Merge tag 'tty-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty and serial driver changes for 6.8-rc1.
As usual, Jiri has a bunch of refactoring and cleanups for the tty
core and drivers in here, along with the usual set of rs485 updates
(someday this might work properly...)
Along with those, in here are changes for:
- sc16is7xx serial driver updates
- platform driver removal api updates
- amba-pl011 driver updates
- tty driver binding updates
- other small tty/serial driver updates and changes
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (197 commits)
serial: sc16is7xx: refactor EFR lock
serial: sc16is7xx: reorder code to remove prototype declarations
serial: sc16is7xx: refactor FIFO access functions to increase commonality
serial: sc16is7xx: drop unneeded MODULE_ALIAS
serial: sc16is7xx: replace hardcoded divisor value with BIT() macro
serial: sc16is7xx: add explicit return for some switch default cases
serial: sc16is7xx: add macro for max number of UART ports
serial: sc16is7xx: add driver name to struct uart_driver
serial: sc16is7xx: use i2c_get_match_data()
serial: sc16is7xx: use spi_get_device_match_data()
serial: sc16is7xx: use DECLARE_BITMAP for sc16is7xx_lines bitfield
serial: sc16is7xx: improve do/while loop in sc16is7xx_irq()
serial: sc16is7xx: remove obsolete loop in sc16is7xx_port_irq()
serial: sc16is7xx: set safe default SPI clock frequency
serial: sc16is7xx: add check for unsupported SPI modes during probe
serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error
serial: 8250_exar: Set missing rs485_supported flag
serial: omap: do not override settings for RS485 support
serial: core, imx: do not set RS485 enabled if it is not supported
serial: core: make sure RS485 cannot be enabled when it is not supported
...
Michael Ellerman [Fri, 15 Dec 2023 12:44:49 +0000 (23:44 +1100)]
powerpc/64s: Increase default stack size to 32KB
There are reports of kernels crashing due to stack overflow while
running OpenShift (Kubernetes). The primary contributor to the stack
usage seems to be openvswitch, which is used by OVN-Kubernetes (based on
OVN (Open Virtual Network)), but NFS also contributes in some stack
traces.
There may be some opportunities to reduce stack usage in the openvswitch
code, but doing so potentially require tradeoffs vs performance, and
also requires testing across architectures.
Looking at stack usage across the kernel (using -fstack-usage), shows
that ppc64le stack frames are on average 50-100% larger than the
equivalent function built for x86-64. Which is not surprising given the
minimum stack frame size is 32 bytes on ppc64le vs 16 bytes on x86-64.
So increase the default stack size to 32KB for the modern 64-bit Book3S
platforms, ie. pseries (virtualised) and powernv (bare metal). That
leaves the older systems like G5s, and the AmigaOne (pasemi) with a 16KB
stack which should be sufficient on those machines.
Linus Torvalds [Thu, 18 Jan 2024 18:30:48 +0000 (10:30 -0800)]
Merge tag 'staging-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
"Here is the "big" set of staging driver changes for 6.8-rc1. It's not
really that big this release cycle, not much happened except for 186
patches of coding style cleanups. The majority was in the rtl8192e
driver, but there are other smaller changes in a few other staging
drivers, full details in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (186 commits)
Staging: rtl8192e: Rename variable OpMode
Staging: rtl8192e: Rename variable bIsAggregateFrame
Staging: rtl8192e: Rename function rtllib_EnableNetMonitorMode()
Staging: rtl8192e: Rename variable NumRxOkInPeriod
Staging: rtl8192e: Rename variable NumTxOkInPeriod
Staging: rtl8192e: Rename variable bUsed
staging: vme_user: print more detailed infomation when an error occurs
Staging: rtl8192e: Rename function rtllib_DisableNetMonitorMode()
Staging: rtl8192e: Rename variable bInitState
Staging: rtl8192e: Rename variable skb_waitQ
Staging: rtl8192e: Rename variable BasicRate
Staging: rtl8192e: Rename variable QueryRate
Staging: rtl8192e: Rename function rtllib_TURBO_Info()
Staging: rtl8192e: Rename function rtllib_WMM_Info()
Staging: rtl8192e: Rename function rtllib_MFIE_Grate()
Staging: rtl8192e: Rename function rtllib_MFIE_Brate()
Staging: rtl8192e: Fixup statement broken across 2 lines in rtllib_softmac_new_net()
Staging: rtl8192e: Fixup statement broken across 2 lines in rtllib_softmac_xmit()
Staging: rtl8192e: Fix function definition broken across multiple lines
Staging: rtl8192e: Fix statement broken across 2 lines in rtllib_rx_assoc_resp()
...
Steve French [Wed, 17 Jan 2024 22:15:18 +0000 (16:15 -0600)]
smb3: show beginning time for per share stats
In analyzing problems, one missing piece of debug data is when the
mount occurred. A related problem is when collecting stats we don't
know the period of time the stats covered, ie when this set of stats
for the tcon started to be collected. To make debugging easier track
the stats begin time. Set it when the mount occurred at mount time,
and reset it to current time whenever stats are reset. For example,
...
1) \\localhost\test
SMBs: 14 since 2024-01-17 22:17:30 UTC
Bytes read: 0 Bytes written: 0
Open files: 0 total (local), 0 open on server
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
...
2) \\localhost\scratch
SMBs: 24 since 2024-01-17 22:16:04 UTC
Bytes read: 0 Bytes written: 0
Open files: 0 total (local), 0 open on server
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
...
Note the time "since ... UTC" is now displayed in /proc/fs/cifs/Stats
for each share that is mounted.
Jakub Kicinski [Thu, 18 Jan 2024 17:54:24 +0000 (09:54 -0800)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-01-18
We've added 10 non-merge commits during the last 5 day(s) which contain
a total of 12 files changed, 806 insertions(+), 51 deletions(-).
The main changes are:
1) Fix an issue in bpf_iter_udp under backward progress which prevents
user space process from finishing iteration, from Martin KaFai Lau.
2) Fix BPF verifier to reject variable offset alu on registers with a type
of PTR_TO_FLOW_KEYS to prevent oob access, from Hao Sun.
3) Follow up fixes for kernel- and libbpf-side logic around handling
arg:ctx tagged arguments of BPF global subprogs, from Andrii Nakryiko.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
libbpf: warn on unexpected __arg_ctx type when rewriting BTF
selftests/bpf: add tests confirming type logic in kernel for __arg_ctx
bpf: enforce types for __arg_ctx-tagged arguments in global subprogs
bpf: extract bpf_ctx_convert_map logic and make it more reusable
libbpf: feature-detect arg:ctx tag support in kernel
selftests/bpf: Add test for alu on PTR_TO_FLOW_KEYS
bpf: Reject variable offset alu on PTR_TO_FLOW_KEYS
selftests/bpf: Test udp and tcp iter batching
bpf: Avoid iter->offset making backward progress in bpf_iter_udp
bpf: iter_udp: Retry with a larger batch size without going back to the previous bucket
====================
Tony Nguyen [Wed, 17 Jan 2024 17:25:32 +0000 (09:25 -0800)]
i40e: Include types.h to some headers
Commit 56df345917c0 ("i40e: Remove circular header dependencies and fix
headers") redistributed a number of includes from one large header file
to the locations they were needed. In some environments, types.h is not
included and causing compile issues. The driver should not rely on
implicit inclusion from other locations; explicitly include it to these
files.
Snippet of issue. Entire log can be seen through the Closes: link.
In file included from drivers/net/ethernet/intel/i40e/i40e_diag.h:7,
from drivers/net/ethernet/intel/i40e/i40e_diag.c:4:
drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h:33:9: error: unknown type name '__le16'
33 | __le16 flags;
| ^~~~~~
drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h:34:9: error: unknown type name '__le16'
34 | __le16 opcode;
| ^~~~~~
...
drivers/net/ethernet/intel/i40e/i40e_diag.h:22:9: error: unknown type name 'u32'
22 | u32 elements; /* number of elements if array */
| ^~~
drivers/net/ethernet/intel/i40e/i40e_diag.h:23:9: error: unknown type name 'u32'
23 | u32 stride; /* bytes between each element */
ipv6: mcast: fix data-race in ipv6_mc_down / mld_ifc_work
idev->mc_ifc_count can be written over without proper locking.
Originally found by syzbot [1], fix this issue by encapsulating calls
to mld_ifc_stop_work() (and mld_gq_stop_work() for good measure) with
mutex_lock() and mutex_unlock() accordingly as these functions
should only be called with mc_lock per their declarations.
[1]
BUG: KCSAN: data-race in ipv6_mc_down / mld_ifc_work
write to 0xffff88813a80c832 of 1 bytes by task 22 on cpu 1:
mld_ifc_work+0x54c/0x7b0 net/ipv6/mcast.c:2653
process_one_work kernel/workqueue.c:2627 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2700
worker_thread+0x525/0x730 kernel/workqueue.c:2781
...
Linus Torvalds [Thu, 18 Jan 2024 17:48:40 +0000 (09:48 -0800)]
Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here are the set of driver core and kernfs changes for 6.8-rc1.
Nothing major in here this release cycle, just lots of small cleanups
and some tweaks on kernfs that in the very end, got reverted and will
come back in a safer way next release cycle.
Included in here are:
- more driver core 'const' cleanups and fixes
- fw_devlink=rpm is now the default behavior
- kernfs tiny changes to remove some string functions
- cpu handling in the driver core is updated to work better on many
systems that add topologies and cpus after booting
- other minor changes and cleanups
All of the cpu handling patches have been acked by the respective
maintainers and are coming in here in one series. Everything has been
in linux-next for a while with no reported issues"
* tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits)
Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
kernfs: convert kernfs_idr_lock to an irq safe raw spinlock
class: fix use-after-free in class_register()
PM: clk: make pm_clk_add_notifier() take a const pointer
EDAC: constantify the struct bus_type usage
kernfs: fix reference to renamed function
driver core: device.h: fix Excess kernel-doc description warning
driver core: class: fix Excess kernel-doc description warning
driver core: mark remaining local bus_type variables as const
driver core: container: make container_subsys const
driver core: bus: constantify subsys_register() calls
driver core: bus: make bus_sort_breadthfirst() take a const pointer
kernfs: d_obtain_alias(NULL) will do the right thing...
driver core: Better advertise dev_err_probe()
kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()
kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()
kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()
initramfs: Expose retained initrd as sysfs file
fs/kernfs/dir: obey S_ISGID
kernel/cgroup: use kernfs_create_dir_ns()
...
Amit Cohen [Wed, 17 Jan 2024 15:04:21 +0000 (16:04 +0100)]
selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
'qos_pfc' test checks PFC behavior. The idea is to limit the traffic
using a shaper somewhere in the flow of the packets. In this area, the
buffer is smaller than the buffer at the beginning of the flow, so it fills
up until there is no more space left. The test configures there PFC
which is supposed to notice that the headroom is filling up and send PFC
Xoff to indicate the transmitter to stop sending traffic for the priorities
sharing this PG.
The Xon/Xoff threshold is auto-configured and always equal to
2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet,
traffic will keep arriving until the transmitter receives and processes
the PFC packet. This amount of traffic is known as the PFC delay allowance.
Currently the buffer for the delay traffic is configured as 100KB. The
MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB.
This allows 80KB extra to be stored in this buffer.
8-lane ports use two buffers among which the configured buffer is split,
the Xoff threshold then applies to each buffer in parallel.
The test does not take into account the behavior of 8-lane ports, when the
ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes,
packets are dropped and the test fails.
Check if the relevant ports use 8 lanes, in such case double the size of
the buffer, as the headroom is split half-half.
Petr Machata [Wed, 17 Jan 2024 15:04:19 +0000 (16:04 +0100)]
mlxsw: spectrum_router: Register netdevice notifier before nexthop
If there are IPIP nexthops at the time when the driver is loaded (or the
devlink instance reloaded), the driver looks up the corresponding IPIP
entry. But IPIP entries are only created as a result of netdevice
notifications. Since the netdevice notifier is registered after the nexthop
notifier, mlxsw_sp_nexthop_type_init() never finds the IPIP entry,
registers the nexthop MLXSW_SP_NEXTHOP_TYPE_ETH, and fails to assign a CRIF
to the nexthop. Later on when the CRIF is necessary, the WARN_ON in
mlxsw_sp_nexthop_rif() triggers, causing the splat [1].
In order to fix the issue, reorder the netdevice notifier to be registered
before the nexthop one.
Ido Schimmel [Wed, 17 Jan 2024 15:04:18 +0000 (16:04 +0100)]
mlxsw: spectrum_acl_tcam: Fix stack corruption
When tc filters are first added to a net device, the corresponding local
port gets bound to an ACL group in the device. The group contains a list
of ACLs. In turn, each ACL points to a different TCAM region where the
filters are stored. During forwarding, the ACLs are sequentially
evaluated until a match is found.
One reason to place filters in different regions is when they are added
with decreasing priorities and in an alternating order so that two
consecutive filters can never fit in the same region because of their
key usage.
In Spectrum-2 and newer ASICs the firmware started to report that the
maximum number of ACLs in a group is more than 16, but the layout of the
register that configures ACL groups (PAGT) was not updated to account
for that. It is therefore possible to hit stack corruption [1] in the
rare case where more than 16 ACLs in a group are required.
Fix by limiting the maximum ACL group size to the minimum between what
the firmware reports and the maximum ACLs that fit in the PAGT register.
Add a test case to make sure the machine does not crash when this
condition is hit.
Ido Schimmel [Wed, 17 Jan 2024 15:04:17 +0000 (16:04 +0100)]
mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path
When calling mlxsw_sp_acl_tcam_region_destroy() from an error path after
failing to attach the region to an ACL group, we hit a NULL pointer
dereference upon 'region->group->tcam' [1].
Fix by retrieving the 'tcam' pointer using mlxsw_sp_acl_to_tcam().
Amit Cohen [Wed, 17 Jan 2024 15:04:16 +0000 (16:04 +0100)]
mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
Lately, a bug was found when many TC filters are added - at some point,
several bugs are printed to dmesg [1] and the switch is crashed with
segmentation fault.
The issue starts when gen_pool_free() fails because of unexpected
behavior - a try to free memory which is already freed, this leads to BUG()
call which crashes the switch and makes many other bugs.
Trying to track down the unexpected behavior led to a bug in eRP code. The
function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
index, sets the value and returns an error code. When gen_pool_alloc()
fails it returns address 0, we track it and return -ENOBUFS outside, BUT
the call for gen_pool_alloc() already override the index in erp_table
structure. This is a problem when such allocation is done as part of
table expansion. This is not a new table, which will not be used in case
of allocation failure. We try to expand eRP table and override the
current index (non-zero) with zero. Then, it leads to an unexpected
behavior when address 0 is freed twice. Note that address 0 is valid in
erp_table->base_index and indeed other tables use it.
gen_pool_alloc() fails in case that there is no space left in the
pre-allocated pool, in our case, the pool is limited to
ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max
erp entries are required, we exceed the limit and return an error, this
error leads to "Failed to migrate vregion" print.
Fix this by changing erp_table->base_index only in case of a successful
allocation.
Add a test case for such a scenario. Without this fix it causes
segmentation fault:
$ TESTS="max_erp_entries_test" ./tc_flower.sh
./tc_flower.sh: line 988: 1560 Segmentation fault tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null
loop: fix the the direct I/O support check when used on top of block devices
__loop_update_dio only checks the alignment requirement for block backed
file systems, but misses them for the case where the loop device is
created directly on top of another block device. Due to this creating
a loop device with default option plus the direct I/O flag on a > 512 byte
sector size file system will lead to incorrect I/O being submitted to the
lower block device and a lot of error from the lock layer. This can
be seen with xfstests generic/563.
Fix the code in __loop_update_dio by factoring the alignment check into
a helper, and calling that also for the struct block_device of a block
device inode.
Also remove the TODO comment talking about dynamically switching between
buffered and direct I/O, which is a would be a recipe for horrible
performance and occasional data loss.
Nathan Lynch [Tue, 16 Jan 2024 14:09:25 +0000 (08:09 -0600)]
seq_buf: Make DECLARE_SEQ_BUF() usable
Using the address operator on the array doesn't work:
./include/linux/seq_buf.h:27:27: error: initialization of ‘char *’
from incompatible pointer type ‘char (*)[128]’
[-Werror=incompatible-pointer-types]
27 | .buffer = &__ ## NAME ## _buffer, \
| ^
Apart from fixing that, we can improve DECLARE_SEQ_BUF() by using a
compound literal to define the buffer array without attaching a name
to it. This makes the macro a single statement, allowing constructs
such as:
Felix Fietkau [Thu, 4 Jan 2024 18:10:59 +0000 (19:10 +0100)]
wifi: mac80211: fix race condition on enabling fast-xmit
fast-xmit must only be enabled after the sta has been uploaded to the driver,
otherwise it could end up passing the not-yet-uploaded sta via drv_tx calls
to the driver, leading to potential crashes because of uninitialized drv_priv
data.
Add a missing sta->uploaded check and re-check fast xmit after inserting a sta.
iwl_fw_ini_trigger_tlv::data is a pointer to a __le32, which means that
if we copy to iwl_fw_ini_trigger_tlv::data + offset while offset is in
bytes, we'll write past the buffer.
Johannes Berg [Thu, 11 Jan 2024 16:17:44 +0000 (18:17 +0200)]
wifi: mac80211: fix potential sta-link leak
When a station is allocated, links are added but not
set to valid yet (e.g. during connection to an AP MLD),
we might remove the station without ever marking links
valid, and leak them. Fix that.
Lukas Bulwahn [Thu, 18 Jan 2024 08:25:45 +0000 (09:25 +0100)]
wifi: cfg80211/mac80211: remove dependency on non-existing option
Commit ffbd0c8c1e7f ("wifi: mac80211: add an element parsing unit test")
and commit 730eeb17bbdd ("wifi: cfg80211: add first kunit tests, for
element defrag") add new configs that depend on !KERNEL_6_2, but the config
option KERNEL_6_2 does not exist in the tree. This dependency is used for
handling backporting to restrict the option to certain kernels but this
really should not be carried around the mainline kernel tree.
Clean up this needless dependency on the non-existing option KERNEL_6_2.
dump(cb=[0, 1]):
if_start=cb[1] (=1)
send rdev0.wdev1 -> ok
// since if_start=1 the rdev0.wdev0 got skipped
// through if_idx < if_start
send rdev1.wdev1 -> ok
The if_start needs to be reset back to 0 upon wdev
loop end.
The problem is actually hard to hit on a desktop,
and even on most routers. The prerequisites for
this manifesting was:
- more than 1 wiphy
- a few handful of interfaces
- dump without rdev or wdev filter
I was seeing this with 4 wiphys 9 interfaces each.
It'd miss 6 interfaces from the last wiphy
reported to userspace.
Accessing an ethernet device that is powered off or clock gated might
cause the CPU to hang. Add ethnl_ops_begin/complete in
ethnl_set_features() to protect against this.
Robin Murphy [Wed, 17 Jan 2024 17:05:45 +0000 (17:05 +0000)]
arm64: Fix silcon-errata.rst formatting
Remove the errant blank lines to make the desired empty row separators
around the Fujitsu and ASR entries in the main table, rather than them
being their own separate tables which then look odd in the HTML view.
Mark Brown [Mon, 15 Jan 2024 20:15:46 +0000 (20:15 +0000)]
arm64/sme: Always exit sme_alloc() early with existing storage
When sme_alloc() is called with existing storage and we are not flushing we
will always allocate new storage, both leaking the existing storage and
corrupting the state. Fix this by separating the checks for flushing and
for existing storage as we do for SVE.
Callers that reallocate (eg, due to changing the vector length) should
call sme_free() themselves.
Mark Brown [Mon, 15 Jan 2024 19:53:01 +0000 (19:53 +0000)]
arm64/fpsimd: Remove spurious check for SVE support
There is no need to check for SVE support when changing vector lengths,
even if the system is SME only we still need SVE storage for the streaming
SVE state.
Mark Brown [Mon, 15 Jan 2024 18:42:38 +0000 (18:42 +0000)]
arm64/ptrace: Don't flush ZA/ZT storage when writing ZA via ptrace
When writing ZA we currently unconditionally flush the buffer used to store
it as part of ensuring that it is allocated. Since this buffer is shared
with ZT0 this means that a write to ZA when PSTATE.ZA is already set will
corrupt the value of ZT0 on a SME2 system. Fix this by only flushing the
backing storage if PSTATE.ZA was not previously set.
This will mean that short or failed writes may leave stale data in the
buffer, this seems as correct as our current behaviour and unlikely to be
something that userspace will rely on.
This expands to the same code, but is simpler for a human to follow as
it avoids duplicates the restore of LR+SP, and makes it clear that the
ERET is associated with the SB.
There should be no functional change as a result of this patch.
Currently the ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD workaround isn't
quite right, as it is supposed to be applied after the last explicit
memory access, but is immediately followed by an LDR.
The ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD workaround is used to
handle Cortex-A520 erratum 2966298 and Cortex-A510 erratum 3117295,
which are described in:
| If pagetable isolation is disabled, the context switch logic in the
| kernel can be updated to execute the following sequence on affected
| cores before exiting to EL0, and after all explicit memory accesses:
|
| 1. A non-shareable TLBI to any context and/or address, including
| unused contexts or addresses, such as a `TLBI VALE1 Xzr`.
|
| 2. A DSB NSH to guarantee completion of the TLBI.
The important part being that the TLBI+DSB must be placed "after all
explicit memory accesses".
Unfortunately, as-implemented, the TLBI+DSB is immediately followed by
an LDR, as we have:
This patch fixes this by reworking the logic to place the TLBI+DSB
immediately before the ERET, after all explicit memory accesses.
The ERET is currently in a separate alternative block, and alternatives
cannot be nested. To account for this, the alternative block for
ARM64_UNMAP_KERNEL_AT_EL0 is replaced with a single alternative branch
to skip the KPTI logic, with the new shape of the logic being:
The new structure means that the workaround is only applied when KPTI is
not in use; this is fine as noted in the documented implications of the
erratum:
| Pagetable isolation between EL0 and higher level ELs prevents the
| issue from occurring.
... and as per the workaround description quoted above, the workaround
is only necessary "If pagetable isolation is disabled".
Benjamin Poirier [Tue, 16 Jan 2024 15:49:26 +0000 (10:49 -0500)]
selftests: bonding: Add more missing config options
As a followup to commit 03fb8565c880 ("selftests: bonding: add missing
build configs"), add more networking-specific config options which are
needed for bonding tests.
For testing, I used the minimal config generated by virtme-ng and I added
the options in the config file. All bonding tests passed.
Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6 Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Benjamin Poirier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
Jakub Kicinski [Tue, 16 Jan 2024 15:43:11 +0000 (07:43 -0800)]
selftests: netdevsim: add a config file
netdevsim tests aren't very well integrated with kselftest,
which has its advantages and disadvantages. But regardless
of the intended integration - a config file to know what kernel
to build is very useful, add one.
Benjamin Berg [Mon, 15 Jan 2024 10:18:05 +0000 (11:18 +0100)]
wifi: ath11k: rely on mac80211 debugfs handling for vif
mac80211 started to delete debugfs entries in certain cases, causing a
ath11k to crash when it tried to delete the entries later. Fix this by
relying on mac80211 to delete the entries when appropriate and adding
them from the vif_add_debugfs handler.
====================
Tighten up arg:ctx type enforcement
Follow up fixes for kernel-side and libbpf-side logic around handling arg:ctx
(__arg_ctx) tagged arguments of BPF global subprogs.
Patch #1 adds libbpf feature detection of kernel-side __arg_ctx support to
avoid unnecessary rewriting BTF types. With stricter kernel-side type
enforcement this is now mandatory to avoid problems with using `struct
bpf_user_pt_regs_t` instead of actual typedef. For __arg_ctx tagged arguments
verifier is now supporting either `bpf_user_pt_regs_t` typedef or resolves it
down to the actual struct (pt_regs/user_pt_regs/user_regs_struct), depending
on architecture), but for old kernels without __arg_ctx support it's more
backwards compatible for libbpf to use `struct bpf_user_pt_regs_t` rewrite
which will work on wider range of kernels. So feature detection prevent libbpf
accidentally breaking global subprogs on new kernels.
We also adjust selftests to do similar feature detection (much simpler, but
potentially breaking due to kernel source code refactoring, which is fine for
selftests), and skip tests expecting libbpf's BTF type rewrites.
Patch #2 is preparatory refactoring for patch #3 which adds type enforcement
for arg:ctx tagged global subprog args. See the patch for specifics.
Patch #4 adds many new cases to ensure type logic works as expected.
Finally, patch #5 adds a relevant subset of kernel-side type checks to
__arg_ctx cases that libbpf supports rewrite of. In libbpf's case, type
violations are reported as warnings and BTF rewrite is not performed, which
will eventually lead to BPF verifier complaining at program verification time.
Good care was taken to avoid conflicts between bpf and bpf-next tree (which
has few follow up refactorings in the same code area). Once trees converge
some of the code will be moved around a bit (and some will be deleted), but
with no change to functionality or general shape of the code.
v2->v3:
- support `bpf_user_pt_regs_t` typedef for KPROBE and PERF_EVENT (CI);
v1->v2:
- add user_pt_regs and user_regs_struct support for PERF_EVENT (CI);
- drop FEAT_ARG_CTX_TAG enum leftover from patch #1;
- fix warning about default: without break in the switch (CI).
====================
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:43 +0000 (19:31 -0800)]
libbpf: warn on unexpected __arg_ctx type when rewriting BTF
On kernel that don't support arg:ctx tag, before adjusting global
subprog BTF information to match kernel's expected canonical type names,
make sure that types used by user are meaningful, and if not, warn and
don't do BTF adjustments.
This is similar to checks that kernel performs, but narrower in scope,
as only a small subset of BPF program types can be accommodated by
libbpf using canonical type names.
Libbpf unconditionally allows `struct pt_regs *` for perf_event program
types, unlike kernel, which supports that conditionally on architecture.
This is done to keep things simple and not cause unnecessary false
positives. This seems like a minor and harmless deviation, which in
real-world programs will be caught by kernels with arg:ctx tag support
anyways. So KISS principle.
This logic is hard to test (especially on latest kernels), so manual
testing was performed instead. Libbpf emitted the following warning for
perf_event program with wrong context argument type:
libbpf: prog 'arg_tag_ctx_perf': subprog 'subprog_ctx_tag' arg#0 is expected to be of `struct bpf_perf_event_data *` type
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:41 +0000 (19:31 -0800)]
bpf: enforce types for __arg_ctx-tagged arguments in global subprogs
Add enforcement of expected types for context arguments tagged with
arg:ctx (__arg_ctx) tag.
First, any program type will accept generic `void *` context type when
combined with __arg_ctx tag.
Besides accepting "canonical" struct names and `void *`, for a bunch of
program types for which program context is actually a named struct, we
allows a bunch of pragmatic exceptions to match real-world and expected
usage:
- for both kprobes and perf_event we allow `bpf_user_pt_regs_t *` as
canonical context argument type, where `bpf_user_pt_regs_t` is a
*typedef*, not a struct;
- for kprobes, we also always accept `struct pt_regs *`, as that's what
actually is passed as a context to any kprobe program;
- for perf_event, we resolve typedefs (unless it's `bpf_user_pt_regs_t`)
down to actual struct type and accept `struct pt_regs *`, or
`struct user_pt_regs *`, or `struct user_regs_struct *`, depending
on the actual struct type kernel architecture points `bpf_user_pt_regs_t`
typedef to; otherwise, canonical `struct bpf_perf_event_data *` is
expected;
- for raw_tp/raw_tp.w programs, `u64/long *` are accepted, as that's
what's expected with BPF_PROG() usage; otherwise, canonical
`struct bpf_raw_tracepoint_args *` is expected;
- tp_btf supports both `struct bpf_raw_tracepoint_args *` and `u64 *`
formats, both are coded as expections as tp_btf is actually a TRACING
program type, which has no canonical context type;
- iterator programs accept `struct bpf_iter__xxx *` structs, currently
with no further iterator-type specific enforcement;
- fentry/fexit/fmod_ret/lsm/struct_ops all accept `u64 *`;
- classic tracepoint programs, as well as syscall and freplace
programs allow any user-provided type.
In all other cases kernel will enforce exact match of struct name to
expected canonical type. And if user-provided type doesn't match that
expectation, verifier will emit helpful message with expected type name.
Note a bit unnatural way the check is done after processing all the
arguments. This is done to avoid conflict between bpf and bpf-next
trees. Once trees converge, a small follow up patch will place a simple
btf_validate_prog_ctx_type() check into a proper ARG_PTR_TO_CTX branch
(which bpf-next tree patch refactored already), removing duplicated
arg:ctx detection logic.
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:40 +0000 (19:31 -0800)]
bpf: extract bpf_ctx_convert_map logic and make it more reusable
Refactor btf_get_prog_ctx_type() a bit to allow reuse of
bpf_ctx_convert_map logic in more than one places. Simplify interface by
returning btf_type instead of btf_member (field reference in BTF).
To do the above we need to touch and start untangling
btf_translate_to_vmlinux() implementation. We do the bare minimum to
not regress anything for btf_translate_to_vmlinux(), but its
implementation is very questionable for what it claims to be doing.
Mapping kfunc argument types to kernel corresponding types conceptually
is quite different from recognizing program context types. Fixing this
is out of scope for this change though.
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:39 +0000 (19:31 -0800)]
libbpf: feature-detect arg:ctx tag support in kernel
Add feature detector of kernel-side arg:ctx (__arg_ctx) tag support. If
this is detected, libbpf will avoid doing any __arg_ctx-related BTF
rewriting and checks in favor of letting kernel handle this completely.
test_global_funcs/ctx_arg_rewrite subtest is adjusted to do the same
feature detection (albeit in much simpler, though round-about and
inefficient, way), and skip the tests. This is done to still be able to
execute this test on older kernels (like in libbpf CI).
Note, BPF token series ([0]) does a major refactor and code moving of
libbpf-internal feature detection "framework", so to avoid unnecessary
conflicts we keep newly added feature detection stand-alone with ad-hoc
result caching. Once things settle, there will be a small follow up to
re-integrate everything back and move code into its final place in
newly-added (by BPF token series) features.c file.
Maxim Kochetkov [Thu, 14 Dec 2023 06:39:06 +0000 (09:39 +0300)]
riscv: optimize ELF relocation function in riscv
The patch can optimize the running times of insmod command by modify ELF
relocation function.
In the 5.10 and latest kernel, when install the riscv ELF drivers which
contains multiple symbol table items to be relocated, kernel takes a lot
of time to execute the relocation. For example, we install a 3+MB driver
need 180+s.
We focus on the riscv architecture handle R_RISCV_HI20 and R_RISCV_LO20
type items relocation function in the arch\riscv\kernel\module.c and
find that there are two-loops in the function. If we modify the begin
number in the second for-loops iteration, we could save significant time
for installation. We install the same 3+MB driver could just need 2s.
Xiao Wang [Sun, 12 Nov 2023 09:52:44 +0000 (17:52 +0800)]
riscv: Optimize hweight API with Zbb extension
The Hamming Weight of a number is the total number of bits set in it, so
the cpop/cpopw instruction from Zbb extension can be used to accelerate
hweight() API.
This series includes a three ftrace improvements for RISC-V:
1. Do not require to run recordmcount at build time (patch 1)
2. Simplification of the function graph functionality (patch 2)
3. Enable DYNAMIC_FTRACE_WITH_DIRECT_CALLS (patch 3 and 4)
The series has been tested on Qemu/rv64 virt/Debian sid with the
following test configs:
CONFIG_FTRACE_SELFTEST=y
CONFIG_FTRACE_STARTUP_TEST=y
CONFIG_SAMPLE_FTRACE_DIRECT=m
CONFIG_SAMPLE_FTRACE_DIRECT_MULTI=m
CONFIG_SAMPLE_FTRACE_OPS=m
All tests pass.
* b4-shazam-merge:
samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI]
riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
riscv: ftrace: Make function graph use ftrace directly
riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY