Add support for atomic transfers using polling mode with interrupts
intentionally disabled to get rid of the following warning introduced by
commit 63b96983a5dd ("i2c: core: introduce callbacks for atomic
transfers") during system reboot and power-off:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1518 at drivers/i2c/i2c-core.h:40 i2c_transfer+0xe8/0xf4
No atomic I2C transfer handler for 'i2c-0'
...
---[ end trace 0000000000000000 ]---
i2c: s3c24xx: fix transferring more than one message in polling mode
To properly handle ACK on the bus when transferring more than one
message in polling mode, move the polling handling loop from
s3c24xx_i2c_message_start() to s3c24xx_i2c_doxfer(). This way
i2c_s3c_irq_nextbyte() is always executed till the end, properly
acknowledging the IRQ bits and no recursive calls to
i2c_s3c_irq_nextbyte() are made.
While touching this, also fix finishing transfers in polling mode by
using common code path and always waiting for the bus to become idle
and disabled.
To properly handle read transfers in polling mode, no waiting for the ACK
state is needed as it will never come. Just wait a bit to ensure start
state is on the bus and continue processing next bytes.
Wolfram Sang [Thu, 14 Dec 2023 07:43:58 +0000 (08:43 +0100)]
i2c: rcar: add FastMode+ support for Gen4
To support FM+, we mainly need to turn the SMD constant into a parameter
and set it accordingly. That also means we can finally fix SMD to our
needs instead of bailing out. A sanity check for SMD then becomes a
sanity check for 'x == 0'. After all that, activating the enable bit for
FM+ is all we need to do. Tested with a Renesas Falcon board using R-Car
V3U.
Wolfram Sang [Sun, 12 Nov 2023 22:54:41 +0000 (17:54 -0500)]
i2c: create debugfs entry per adapter
Two drivers already implement custom debugfs handling for their
i2c_adapter and more will come. So, let the core create a debugfs
directory per adapter and pass that to drivers for their debugfs files.
Heiner Kallweit [Fri, 24 Nov 2023 10:16:11 +0000 (11:16 +0100)]
staging: greybus: Don't let i2c adapters declare I2C_CLASS_SPD support if they support I2C_CLASS_HWMON
After removal of the legacy eeprom driver the only remaining I2C
client device driver supporting I2C_CLASS_SPD is jc42. Because this
driver also supports I2C_CLASS_HWMON, adapters don't have to
declare support for I2C_CLASS_SPD if they support I2C_CLASS_HWMON.
It's one step towards getting rid of I2C_CLASS_SPD mid-term.
Heiner Kallweit [Fri, 24 Nov 2023 10:16:17 +0000 (11:16 +0100)]
media: netup_unidvb: Don't let i2c adapters declare I2C_CLASS_SPD support if they support I2C_CLASS_HWMON
After removal of the legacy eeprom driver the only remaining I2C
client device driver supporting I2C_CLASS_SPD is jc42. Because this
driver also supports I2C_CLASS_HWMON, adapters don't have to
declare support for I2C_CLASS_SPD if they support I2C_CLASS_HWMON.
It's one step towards getting rid of I2C_CLASS_SPD mid-term.
Heiner Kallweit [Fri, 24 Nov 2023 10:16:14 +0000 (11:16 +0100)]
i2c: stub: Don't let i2c adapters declare I2C_CLASS_SPD support if they support I2C_CLASS_HWMON
After removal of the legacy eeprom driver the only remaining I2C
client device driver supporting I2C_CLASS_SPD is jc42. Because this
driver also supports I2C_CLASS_HWMON, adapters don't have to
declare support for I2C_CLASS_SPD if they support I2C_CLASS_HWMON.
It's one step towards getting rid of I2C_CLASS_SPD mid-term.
Heiner Kallweit [Fri, 24 Nov 2023 10:16:10 +0000 (11:16 +0100)]
i2c: Don't let i2c adapters declare I2C_CLASS_SPD support if they support I2C_CLASS_HWMON
After removal of the legacy eeprom driver the only remaining I2C
client device driver supporting I2C_CLASS_SPD is jc42. Because this
driver also supports I2C_CLASS_HWMON, adapters don't have to
declare support for I2C_CLASS_SPD if they support I2C_CLASS_HWMON.
It's one step towards getting rid of I2C_CLASS_SPD mid-term.
Heiner Kallweit [Mon, 13 Nov 2023 11:37:15 +0000 (12:37 +0100)]
drm/amd/pm: Remove I2C_CLASS_SPD support
I2C_CLASS_SPD was used to expose the EEPROM content to user space,
via the legacy eeprom driver. Now that this driver has been removed,
we can remove I2C_CLASS_SPD support. at24 driver with explicit
instantiation should be used instead.
Heiner Kallweit [Thu, 23 Nov 2023 09:40:40 +0000 (10:40 +0100)]
include/linux/i2c.h: remove I2C_CLASS_DDC support
After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.
Heiner Kallweit [Thu, 23 Nov 2023 09:40:25 +0000 (10:40 +0100)]
fbdev: remove I2C_CLASS_DDC support
After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.
Heiner Kallweit [Thu, 23 Nov 2023 09:40:21 +0000 (10:40 +0100)]
drm: remove I2C_CLASS_DDC support
After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.
Linus Torvalds [Thu, 18 Jan 2024 19:57:33 +0000 (11:57 -0800)]
Merge tag 'sched-urgent-2024-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a cpufreq related performance regression on certain systems, where
the CPU would remain at the lowest frequency, degrading performance
substantially"
* tag 'sched-urgent-2024-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix frequency selection for non-invariant case
Linus Torvalds [Thu, 18 Jan 2024 19:43:55 +0000 (11:43 -0800)]
Merge tag 'usb-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
"Here is the big set of USB and Thunderbolt changes for 6.8-rc1.
Included in here are the following:
- Thunderbolt subsystem and driver updates for USB 4 hardware and
issues reported by real devices
- xhci driver updates
- dwc3 driver updates
- uvc_video gadget driver updates
- typec driver updates
- gadget string functions cleaned up
- other small changes
All of these have been in the linux-next tree for a while with no
reported issues"
* tag 'usb-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (169 commits)
usb: typec: tipd: fix use of device-specific init function
usb: typec: tipd: Separate reset for TPS6598x
usb: mon: Fix atomicity violation in mon_bin_vma_fault
usb: gadget: uvc: Remove nested locking
usb: gadget: uvc: Fix use are free during STREAMOFF
usb: typec: class: fix typec_altmode_put_partner to put plugs
dt-bindings: usb: dwc3: Limit num-hc-interrupters definition
dt-bindings: usb: xhci: Add num-hc-interrupters definition
xhci: add support to allocate several interrupters
USB: core: Use device_driver directly in struct usb_driver and usb_device_driver
arm64: dts: mediatek: mt8195: Add 'rx-fifo-depth' for cherry
usb: xhci-mtk: fix a short packet issue of gen1 isoc-in transfer
dt-bindings: usb: mtk-xhci: add a property for Gen1 isoc-in transfer issue
arm64: dts: qcom: msm8996: Remove PNoC clock from MSS
arm64: dts: qcom: msm8996: Remove AGGRE2 clock from SLPI
arm64: dts: qcom: msm8998: Remove AGGRE2 clock from SLPI
arm64: dts: qcom: msm8939: Drop RPM bus clocks
arm64: dts: qcom: sdm630: Drop RPM bus clocks
arm64: dts: qcom: qcs404: Drop RPM bus clocks
arm64: dts: qcom: msm8996: Drop RPM bus clocks
...
Linus Torvalds [Thu, 18 Jan 2024 19:37:24 +0000 (11:37 -0800)]
Merge tag 'tty-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty and serial driver changes for 6.8-rc1.
As usual, Jiri has a bunch of refactoring and cleanups for the tty
core and drivers in here, along with the usual set of rs485 updates
(someday this might work properly...)
Along with those, in here are changes for:
- sc16is7xx serial driver updates
- platform driver removal api updates
- amba-pl011 driver updates
- tty driver binding updates
- other small tty/serial driver updates and changes
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (197 commits)
serial: sc16is7xx: refactor EFR lock
serial: sc16is7xx: reorder code to remove prototype declarations
serial: sc16is7xx: refactor FIFO access functions to increase commonality
serial: sc16is7xx: drop unneeded MODULE_ALIAS
serial: sc16is7xx: replace hardcoded divisor value with BIT() macro
serial: sc16is7xx: add explicit return for some switch default cases
serial: sc16is7xx: add macro for max number of UART ports
serial: sc16is7xx: add driver name to struct uart_driver
serial: sc16is7xx: use i2c_get_match_data()
serial: sc16is7xx: use spi_get_device_match_data()
serial: sc16is7xx: use DECLARE_BITMAP for sc16is7xx_lines bitfield
serial: sc16is7xx: improve do/while loop in sc16is7xx_irq()
serial: sc16is7xx: remove obsolete loop in sc16is7xx_port_irq()
serial: sc16is7xx: set safe default SPI clock frequency
serial: sc16is7xx: add check for unsupported SPI modes during probe
serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error
serial: 8250_exar: Set missing rs485_supported flag
serial: omap: do not override settings for RS485 support
serial: core, imx: do not set RS485 enabled if it is not supported
serial: core: make sure RS485 cannot be enabled when it is not supported
...
Michael Ellerman [Fri, 15 Dec 2023 12:44:49 +0000 (23:44 +1100)]
powerpc/64s: Increase default stack size to 32KB
There are reports of kernels crashing due to stack overflow while
running OpenShift (Kubernetes). The primary contributor to the stack
usage seems to be openvswitch, which is used by OVN-Kubernetes (based on
OVN (Open Virtual Network)), but NFS also contributes in some stack
traces.
There may be some opportunities to reduce stack usage in the openvswitch
code, but doing so potentially require tradeoffs vs performance, and
also requires testing across architectures.
Looking at stack usage across the kernel (using -fstack-usage), shows
that ppc64le stack frames are on average 50-100% larger than the
equivalent function built for x86-64. Which is not surprising given the
minimum stack frame size is 32 bytes on ppc64le vs 16 bytes on x86-64.
So increase the default stack size to 32KB for the modern 64-bit Book3S
platforms, ie. pseries (virtualised) and powernv (bare metal). That
leaves the older systems like G5s, and the AmigaOne (pasemi) with a 16KB
stack which should be sufficient on those machines.
Linus Torvalds [Thu, 18 Jan 2024 18:30:48 +0000 (10:30 -0800)]
Merge tag 'staging-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
"Here is the "big" set of staging driver changes for 6.8-rc1. It's not
really that big this release cycle, not much happened except for 186
patches of coding style cleanups. The majority was in the rtl8192e
driver, but there are other smaller changes in a few other staging
drivers, full details in the shortlog.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (186 commits)
Staging: rtl8192e: Rename variable OpMode
Staging: rtl8192e: Rename variable bIsAggregateFrame
Staging: rtl8192e: Rename function rtllib_EnableNetMonitorMode()
Staging: rtl8192e: Rename variable NumRxOkInPeriod
Staging: rtl8192e: Rename variable NumTxOkInPeriod
Staging: rtl8192e: Rename variable bUsed
staging: vme_user: print more detailed infomation when an error occurs
Staging: rtl8192e: Rename function rtllib_DisableNetMonitorMode()
Staging: rtl8192e: Rename variable bInitState
Staging: rtl8192e: Rename variable skb_waitQ
Staging: rtl8192e: Rename variable BasicRate
Staging: rtl8192e: Rename variable QueryRate
Staging: rtl8192e: Rename function rtllib_TURBO_Info()
Staging: rtl8192e: Rename function rtllib_WMM_Info()
Staging: rtl8192e: Rename function rtllib_MFIE_Grate()
Staging: rtl8192e: Rename function rtllib_MFIE_Brate()
Staging: rtl8192e: Fixup statement broken across 2 lines in rtllib_softmac_new_net()
Staging: rtl8192e: Fixup statement broken across 2 lines in rtllib_softmac_xmit()
Staging: rtl8192e: Fix function definition broken across multiple lines
Staging: rtl8192e: Fix statement broken across 2 lines in rtllib_rx_assoc_resp()
...
Steve French [Wed, 17 Jan 2024 22:15:18 +0000 (16:15 -0600)]
smb3: show beginning time for per share stats
In analyzing problems, one missing piece of debug data is when the
mount occurred. A related problem is when collecting stats we don't
know the period of time the stats covered, ie when this set of stats
for the tcon started to be collected. To make debugging easier track
the stats begin time. Set it when the mount occurred at mount time,
and reset it to current time whenever stats are reset. For example,
...
1) \\localhost\test
SMBs: 14 since 2024-01-17 22:17:30 UTC
Bytes read: 0 Bytes written: 0
Open files: 0 total (local), 0 open on server
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
...
2) \\localhost\scratch
SMBs: 24 since 2024-01-17 22:16:04 UTC
Bytes read: 0 Bytes written: 0
Open files: 0 total (local), 0 open on server
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
...
Note the time "since ... UTC" is now displayed in /proc/fs/cifs/Stats
for each share that is mounted.
Jakub Kicinski [Thu, 18 Jan 2024 17:54:24 +0000 (09:54 -0800)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-01-18
We've added 10 non-merge commits during the last 5 day(s) which contain
a total of 12 files changed, 806 insertions(+), 51 deletions(-).
The main changes are:
1) Fix an issue in bpf_iter_udp under backward progress which prevents
user space process from finishing iteration, from Martin KaFai Lau.
2) Fix BPF verifier to reject variable offset alu on registers with a type
of PTR_TO_FLOW_KEYS to prevent oob access, from Hao Sun.
3) Follow up fixes for kernel- and libbpf-side logic around handling
arg:ctx tagged arguments of BPF global subprogs, from Andrii Nakryiko.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
libbpf: warn on unexpected __arg_ctx type when rewriting BTF
selftests/bpf: add tests confirming type logic in kernel for __arg_ctx
bpf: enforce types for __arg_ctx-tagged arguments in global subprogs
bpf: extract bpf_ctx_convert_map logic and make it more reusable
libbpf: feature-detect arg:ctx tag support in kernel
selftests/bpf: Add test for alu on PTR_TO_FLOW_KEYS
bpf: Reject variable offset alu on PTR_TO_FLOW_KEYS
selftests/bpf: Test udp and tcp iter batching
bpf: Avoid iter->offset making backward progress in bpf_iter_udp
bpf: iter_udp: Retry with a larger batch size without going back to the previous bucket
====================
Tony Nguyen [Wed, 17 Jan 2024 17:25:32 +0000 (09:25 -0800)]
i40e: Include types.h to some headers
Commit 56df345917c0 ("i40e: Remove circular header dependencies and fix
headers") redistributed a number of includes from one large header file
to the locations they were needed. In some environments, types.h is not
included and causing compile issues. The driver should not rely on
implicit inclusion from other locations; explicitly include it to these
files.
Snippet of issue. Entire log can be seen through the Closes: link.
In file included from drivers/net/ethernet/intel/i40e/i40e_diag.h:7,
from drivers/net/ethernet/intel/i40e/i40e_diag.c:4:
drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h:33:9: error: unknown type name '__le16'
33 | __le16 flags;
| ^~~~~~
drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h:34:9: error: unknown type name '__le16'
34 | __le16 opcode;
| ^~~~~~
...
drivers/net/ethernet/intel/i40e/i40e_diag.h:22:9: error: unknown type name 'u32'
22 | u32 elements; /* number of elements if array */
| ^~~
drivers/net/ethernet/intel/i40e/i40e_diag.h:23:9: error: unknown type name 'u32'
23 | u32 stride; /* bytes between each element */
ipv6: mcast: fix data-race in ipv6_mc_down / mld_ifc_work
idev->mc_ifc_count can be written over without proper locking.
Originally found by syzbot [1], fix this issue by encapsulating calls
to mld_ifc_stop_work() (and mld_gq_stop_work() for good measure) with
mutex_lock() and mutex_unlock() accordingly as these functions
should only be called with mc_lock per their declarations.
[1]
BUG: KCSAN: data-race in ipv6_mc_down / mld_ifc_work
write to 0xffff88813a80c832 of 1 bytes by task 22 on cpu 1:
mld_ifc_work+0x54c/0x7b0 net/ipv6/mcast.c:2653
process_one_work kernel/workqueue.c:2627 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2700
worker_thread+0x525/0x730 kernel/workqueue.c:2781
...
Linus Torvalds [Thu, 18 Jan 2024 17:48:40 +0000 (09:48 -0800)]
Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here are the set of driver core and kernfs changes for 6.8-rc1.
Nothing major in here this release cycle, just lots of small cleanups
and some tweaks on kernfs that in the very end, got reverted and will
come back in a safer way next release cycle.
Included in here are:
- more driver core 'const' cleanups and fixes
- fw_devlink=rpm is now the default behavior
- kernfs tiny changes to remove some string functions
- cpu handling in the driver core is updated to work better on many
systems that add topologies and cpus after booting
- other minor changes and cleanups
All of the cpu handling patches have been acked by the respective
maintainers and are coming in here in one series. Everything has been
in linux-next for a while with no reported issues"
* tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits)
Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
kernfs: convert kernfs_idr_lock to an irq safe raw spinlock
class: fix use-after-free in class_register()
PM: clk: make pm_clk_add_notifier() take a const pointer
EDAC: constantify the struct bus_type usage
kernfs: fix reference to renamed function
driver core: device.h: fix Excess kernel-doc description warning
driver core: class: fix Excess kernel-doc description warning
driver core: mark remaining local bus_type variables as const
driver core: container: make container_subsys const
driver core: bus: constantify subsys_register() calls
driver core: bus: make bus_sort_breadthfirst() take a const pointer
kernfs: d_obtain_alias(NULL) will do the right thing...
driver core: Better advertise dev_err_probe()
kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()
kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()
kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()
initramfs: Expose retained initrd as sysfs file
fs/kernfs/dir: obey S_ISGID
kernel/cgroup: use kernfs_create_dir_ns()
...
Amit Cohen [Wed, 17 Jan 2024 15:04:21 +0000 (16:04 +0100)]
selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
'qos_pfc' test checks PFC behavior. The idea is to limit the traffic
using a shaper somewhere in the flow of the packets. In this area, the
buffer is smaller than the buffer at the beginning of the flow, so it fills
up until there is no more space left. The test configures there PFC
which is supposed to notice that the headroom is filling up and send PFC
Xoff to indicate the transmitter to stop sending traffic for the priorities
sharing this PG.
The Xon/Xoff threshold is auto-configured and always equal to
2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet,
traffic will keep arriving until the transmitter receives and processes
the PFC packet. This amount of traffic is known as the PFC delay allowance.
Currently the buffer for the delay traffic is configured as 100KB. The
MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB.
This allows 80KB extra to be stored in this buffer.
8-lane ports use two buffers among which the configured buffer is split,
the Xoff threshold then applies to each buffer in parallel.
The test does not take into account the behavior of 8-lane ports, when the
ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes,
packets are dropped and the test fails.
Check if the relevant ports use 8 lanes, in such case double the size of
the buffer, as the headroom is split half-half.
Petr Machata [Wed, 17 Jan 2024 15:04:19 +0000 (16:04 +0100)]
mlxsw: spectrum_router: Register netdevice notifier before nexthop
If there are IPIP nexthops at the time when the driver is loaded (or the
devlink instance reloaded), the driver looks up the corresponding IPIP
entry. But IPIP entries are only created as a result of netdevice
notifications. Since the netdevice notifier is registered after the nexthop
notifier, mlxsw_sp_nexthop_type_init() never finds the IPIP entry,
registers the nexthop MLXSW_SP_NEXTHOP_TYPE_ETH, and fails to assign a CRIF
to the nexthop. Later on when the CRIF is necessary, the WARN_ON in
mlxsw_sp_nexthop_rif() triggers, causing the splat [1].
In order to fix the issue, reorder the netdevice notifier to be registered
before the nexthop one.
Ido Schimmel [Wed, 17 Jan 2024 15:04:18 +0000 (16:04 +0100)]
mlxsw: spectrum_acl_tcam: Fix stack corruption
When tc filters are first added to a net device, the corresponding local
port gets bound to an ACL group in the device. The group contains a list
of ACLs. In turn, each ACL points to a different TCAM region where the
filters are stored. During forwarding, the ACLs are sequentially
evaluated until a match is found.
One reason to place filters in different regions is when they are added
with decreasing priorities and in an alternating order so that two
consecutive filters can never fit in the same region because of their
key usage.
In Spectrum-2 and newer ASICs the firmware started to report that the
maximum number of ACLs in a group is more than 16, but the layout of the
register that configures ACL groups (PAGT) was not updated to account
for that. It is therefore possible to hit stack corruption [1] in the
rare case where more than 16 ACLs in a group are required.
Fix by limiting the maximum ACL group size to the minimum between what
the firmware reports and the maximum ACLs that fit in the PAGT register.
Add a test case to make sure the machine does not crash when this
condition is hit.
Ido Schimmel [Wed, 17 Jan 2024 15:04:17 +0000 (16:04 +0100)]
mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path
When calling mlxsw_sp_acl_tcam_region_destroy() from an error path after
failing to attach the region to an ACL group, we hit a NULL pointer
dereference upon 'region->group->tcam' [1].
Fix by retrieving the 'tcam' pointer using mlxsw_sp_acl_to_tcam().
Amit Cohen [Wed, 17 Jan 2024 15:04:16 +0000 (16:04 +0100)]
mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
Lately, a bug was found when many TC filters are added - at some point,
several bugs are printed to dmesg [1] and the switch is crashed with
segmentation fault.
The issue starts when gen_pool_free() fails because of unexpected
behavior - a try to free memory which is already freed, this leads to BUG()
call which crashes the switch and makes many other bugs.
Trying to track down the unexpected behavior led to a bug in eRP code. The
function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
index, sets the value and returns an error code. When gen_pool_alloc()
fails it returns address 0, we track it and return -ENOBUFS outside, BUT
the call for gen_pool_alloc() already override the index in erp_table
structure. This is a problem when such allocation is done as part of
table expansion. This is not a new table, which will not be used in case
of allocation failure. We try to expand eRP table and override the
current index (non-zero) with zero. Then, it leads to an unexpected
behavior when address 0 is freed twice. Note that address 0 is valid in
erp_table->base_index and indeed other tables use it.
gen_pool_alloc() fails in case that there is no space left in the
pre-allocated pool, in our case, the pool is limited to
ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max
erp entries are required, we exceed the limit and return an error, this
error leads to "Failed to migrate vregion" print.
Fix this by changing erp_table->base_index only in case of a successful
allocation.
Add a test case for such a scenario. Without this fix it causes
segmentation fault:
$ TESTS="max_erp_entries_test" ./tc_flower.sh
./tc_flower.sh: line 988: 1560 Segmentation fault tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null
loop: fix the the direct I/O support check when used on top of block devices
__loop_update_dio only checks the alignment requirement for block backed
file systems, but misses them for the case where the loop device is
created directly on top of another block device. Due to this creating
a loop device with default option plus the direct I/O flag on a > 512 byte
sector size file system will lead to incorrect I/O being submitted to the
lower block device and a lot of error from the lock layer. This can
be seen with xfstests generic/563.
Fix the code in __loop_update_dio by factoring the alignment check into
a helper, and calling that also for the struct block_device of a block
device inode.
Also remove the TODO comment talking about dynamically switching between
buffered and direct I/O, which is a would be a recipe for horrible
performance and occasional data loss.
Nathan Lynch [Tue, 16 Jan 2024 14:09:25 +0000 (08:09 -0600)]
seq_buf: Make DECLARE_SEQ_BUF() usable
Using the address operator on the array doesn't work:
./include/linux/seq_buf.h:27:27: error: initialization of ‘char *’
from incompatible pointer type ‘char (*)[128]’
[-Werror=incompatible-pointer-types]
27 | .buffer = &__ ## NAME ## _buffer, \
| ^
Apart from fixing that, we can improve DECLARE_SEQ_BUF() by using a
compound literal to define the buffer array without attaching a name
to it. This makes the macro a single statement, allowing constructs
such as:
Accessing an ethernet device that is powered off or clock gated might
cause the CPU to hang. Add ethnl_ops_begin/complete in
ethnl_set_features() to protect against this.
Robin Murphy [Wed, 17 Jan 2024 17:05:45 +0000 (17:05 +0000)]
arm64: Fix silcon-errata.rst formatting
Remove the errant blank lines to make the desired empty row separators
around the Fujitsu and ASR entries in the main table, rather than them
being their own separate tables which then look odd in the HTML view.
Mark Brown [Mon, 15 Jan 2024 20:15:46 +0000 (20:15 +0000)]
arm64/sme: Always exit sme_alloc() early with existing storage
When sme_alloc() is called with existing storage and we are not flushing we
will always allocate new storage, both leaking the existing storage and
corrupting the state. Fix this by separating the checks for flushing and
for existing storage as we do for SVE.
Callers that reallocate (eg, due to changing the vector length) should
call sme_free() themselves.
Mark Brown [Mon, 15 Jan 2024 19:53:01 +0000 (19:53 +0000)]
arm64/fpsimd: Remove spurious check for SVE support
There is no need to check for SVE support when changing vector lengths,
even if the system is SME only we still need SVE storage for the streaming
SVE state.
Mark Brown [Mon, 15 Jan 2024 18:42:38 +0000 (18:42 +0000)]
arm64/ptrace: Don't flush ZA/ZT storage when writing ZA via ptrace
When writing ZA we currently unconditionally flush the buffer used to store
it as part of ensuring that it is allocated. Since this buffer is shared
with ZT0 this means that a write to ZA when PSTATE.ZA is already set will
corrupt the value of ZT0 on a SME2 system. Fix this by only flushing the
backing storage if PSTATE.ZA was not previously set.
This will mean that short or failed writes may leave stale data in the
buffer, this seems as correct as our current behaviour and unlikely to be
something that userspace will rely on.
This expands to the same code, but is simpler for a human to follow as
it avoids duplicates the restore of LR+SP, and makes it clear that the
ERET is associated with the SB.
There should be no functional change as a result of this patch.
Currently the ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD workaround isn't
quite right, as it is supposed to be applied after the last explicit
memory access, but is immediately followed by an LDR.
The ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD workaround is used to
handle Cortex-A520 erratum 2966298 and Cortex-A510 erratum 3117295,
which are described in:
| If pagetable isolation is disabled, the context switch logic in the
| kernel can be updated to execute the following sequence on affected
| cores before exiting to EL0, and after all explicit memory accesses:
|
| 1. A non-shareable TLBI to any context and/or address, including
| unused contexts or addresses, such as a `TLBI VALE1 Xzr`.
|
| 2. A DSB NSH to guarantee completion of the TLBI.
The important part being that the TLBI+DSB must be placed "after all
explicit memory accesses".
Unfortunately, as-implemented, the TLBI+DSB is immediately followed by
an LDR, as we have:
This patch fixes this by reworking the logic to place the TLBI+DSB
immediately before the ERET, after all explicit memory accesses.
The ERET is currently in a separate alternative block, and alternatives
cannot be nested. To account for this, the alternative block for
ARM64_UNMAP_KERNEL_AT_EL0 is replaced with a single alternative branch
to skip the KPTI logic, with the new shape of the logic being:
The new structure means that the workaround is only applied when KPTI is
not in use; this is fine as noted in the documented implications of the
erratum:
| Pagetable isolation between EL0 and higher level ELs prevents the
| issue from occurring.
... and as per the workaround description quoted above, the workaround
is only necessary "If pagetable isolation is disabled".
Benjamin Poirier [Tue, 16 Jan 2024 15:49:26 +0000 (10:49 -0500)]
selftests: bonding: Add more missing config options
As a followup to commit 03fb8565c880 ("selftests: bonding: add missing
build configs"), add more networking-specific config options which are
needed for bonding tests.
For testing, I used the minimal config generated by virtme-ng and I added
the options in the config file. All bonding tests passed.
Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6 Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Benjamin Poirier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
Jakub Kicinski [Tue, 16 Jan 2024 15:43:11 +0000 (07:43 -0800)]
selftests: netdevsim: add a config file
netdevsim tests aren't very well integrated with kselftest,
which has its advantages and disadvantages. But regardless
of the intended integration - a config file to know what kernel
to build is very useful, add one.
====================
Tighten up arg:ctx type enforcement
Follow up fixes for kernel-side and libbpf-side logic around handling arg:ctx
(__arg_ctx) tagged arguments of BPF global subprogs.
Patch #1 adds libbpf feature detection of kernel-side __arg_ctx support to
avoid unnecessary rewriting BTF types. With stricter kernel-side type
enforcement this is now mandatory to avoid problems with using `struct
bpf_user_pt_regs_t` instead of actual typedef. For __arg_ctx tagged arguments
verifier is now supporting either `bpf_user_pt_regs_t` typedef or resolves it
down to the actual struct (pt_regs/user_pt_regs/user_regs_struct), depending
on architecture), but for old kernels without __arg_ctx support it's more
backwards compatible for libbpf to use `struct bpf_user_pt_regs_t` rewrite
which will work on wider range of kernels. So feature detection prevent libbpf
accidentally breaking global subprogs on new kernels.
We also adjust selftests to do similar feature detection (much simpler, but
potentially breaking due to kernel source code refactoring, which is fine for
selftests), and skip tests expecting libbpf's BTF type rewrites.
Patch #2 is preparatory refactoring for patch #3 which adds type enforcement
for arg:ctx tagged global subprog args. See the patch for specifics.
Patch #4 adds many new cases to ensure type logic works as expected.
Finally, patch #5 adds a relevant subset of kernel-side type checks to
__arg_ctx cases that libbpf supports rewrite of. In libbpf's case, type
violations are reported as warnings and BTF rewrite is not performed, which
will eventually lead to BPF verifier complaining at program verification time.
Good care was taken to avoid conflicts between bpf and bpf-next tree (which
has few follow up refactorings in the same code area). Once trees converge
some of the code will be moved around a bit (and some will be deleted), but
with no change to functionality or general shape of the code.
v2->v3:
- support `bpf_user_pt_regs_t` typedef for KPROBE and PERF_EVENT (CI);
v1->v2:
- add user_pt_regs and user_regs_struct support for PERF_EVENT (CI);
- drop FEAT_ARG_CTX_TAG enum leftover from patch #1;
- fix warning about default: without break in the switch (CI).
====================
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:43 +0000 (19:31 -0800)]
libbpf: warn on unexpected __arg_ctx type when rewriting BTF
On kernel that don't support arg:ctx tag, before adjusting global
subprog BTF information to match kernel's expected canonical type names,
make sure that types used by user are meaningful, and if not, warn and
don't do BTF adjustments.
This is similar to checks that kernel performs, but narrower in scope,
as only a small subset of BPF program types can be accommodated by
libbpf using canonical type names.
Libbpf unconditionally allows `struct pt_regs *` for perf_event program
types, unlike kernel, which supports that conditionally on architecture.
This is done to keep things simple and not cause unnecessary false
positives. This seems like a minor and harmless deviation, which in
real-world programs will be caught by kernels with arg:ctx tag support
anyways. So KISS principle.
This logic is hard to test (especially on latest kernels), so manual
testing was performed instead. Libbpf emitted the following warning for
perf_event program with wrong context argument type:
libbpf: prog 'arg_tag_ctx_perf': subprog 'subprog_ctx_tag' arg#0 is expected to be of `struct bpf_perf_event_data *` type
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:41 +0000 (19:31 -0800)]
bpf: enforce types for __arg_ctx-tagged arguments in global subprogs
Add enforcement of expected types for context arguments tagged with
arg:ctx (__arg_ctx) tag.
First, any program type will accept generic `void *` context type when
combined with __arg_ctx tag.
Besides accepting "canonical" struct names and `void *`, for a bunch of
program types for which program context is actually a named struct, we
allows a bunch of pragmatic exceptions to match real-world and expected
usage:
- for both kprobes and perf_event we allow `bpf_user_pt_regs_t *` as
canonical context argument type, where `bpf_user_pt_regs_t` is a
*typedef*, not a struct;
- for kprobes, we also always accept `struct pt_regs *`, as that's what
actually is passed as a context to any kprobe program;
- for perf_event, we resolve typedefs (unless it's `bpf_user_pt_regs_t`)
down to actual struct type and accept `struct pt_regs *`, or
`struct user_pt_regs *`, or `struct user_regs_struct *`, depending
on the actual struct type kernel architecture points `bpf_user_pt_regs_t`
typedef to; otherwise, canonical `struct bpf_perf_event_data *` is
expected;
- for raw_tp/raw_tp.w programs, `u64/long *` are accepted, as that's
what's expected with BPF_PROG() usage; otherwise, canonical
`struct bpf_raw_tracepoint_args *` is expected;
- tp_btf supports both `struct bpf_raw_tracepoint_args *` and `u64 *`
formats, both are coded as expections as tp_btf is actually a TRACING
program type, which has no canonical context type;
- iterator programs accept `struct bpf_iter__xxx *` structs, currently
with no further iterator-type specific enforcement;
- fentry/fexit/fmod_ret/lsm/struct_ops all accept `u64 *`;
- classic tracepoint programs, as well as syscall and freplace
programs allow any user-provided type.
In all other cases kernel will enforce exact match of struct name to
expected canonical type. And if user-provided type doesn't match that
expectation, verifier will emit helpful message with expected type name.
Note a bit unnatural way the check is done after processing all the
arguments. This is done to avoid conflict between bpf and bpf-next
trees. Once trees converge, a small follow up patch will place a simple
btf_validate_prog_ctx_type() check into a proper ARG_PTR_TO_CTX branch
(which bpf-next tree patch refactored already), removing duplicated
arg:ctx detection logic.
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:40 +0000 (19:31 -0800)]
bpf: extract bpf_ctx_convert_map logic and make it more reusable
Refactor btf_get_prog_ctx_type() a bit to allow reuse of
bpf_ctx_convert_map logic in more than one places. Simplify interface by
returning btf_type instead of btf_member (field reference in BTF).
To do the above we need to touch and start untangling
btf_translate_to_vmlinux() implementation. We do the bare minimum to
not regress anything for btf_translate_to_vmlinux(), but its
implementation is very questionable for what it claims to be doing.
Mapping kfunc argument types to kernel corresponding types conceptually
is quite different from recognizing program context types. Fixing this
is out of scope for this change though.
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:39 +0000 (19:31 -0800)]
libbpf: feature-detect arg:ctx tag support in kernel
Add feature detector of kernel-side arg:ctx (__arg_ctx) tag support. If
this is detected, libbpf will avoid doing any __arg_ctx-related BTF
rewriting and checks in favor of letting kernel handle this completely.
test_global_funcs/ctx_arg_rewrite subtest is adjusted to do the same
feature detection (albeit in much simpler, though round-about and
inefficient, way), and skip the tests. This is done to still be able to
execute this test on older kernels (like in libbpf CI).
Note, BPF token series ([0]) does a major refactor and code moving of
libbpf-internal feature detection "framework", so to avoid unnecessary
conflicts we keep newly added feature detection stand-alone with ad-hoc
result caching. Once things settle, there will be a small follow up to
re-integrate everything back and move code into its final place in
newly-added (by BPF token series) features.c file.
Maxim Kochetkov [Thu, 14 Dec 2023 06:39:06 +0000 (09:39 +0300)]
riscv: optimize ELF relocation function in riscv
The patch can optimize the running times of insmod command by modify ELF
relocation function.
In the 5.10 and latest kernel, when install the riscv ELF drivers which
contains multiple symbol table items to be relocated, kernel takes a lot
of time to execute the relocation. For example, we install a 3+MB driver
need 180+s.
We focus on the riscv architecture handle R_RISCV_HI20 and R_RISCV_LO20
type items relocation function in the arch\riscv\kernel\module.c and
find that there are two-loops in the function. If we modify the begin
number in the second for-loops iteration, we could save significant time
for installation. We install the same 3+MB driver could just need 2s.
Xiao Wang [Sun, 12 Nov 2023 09:52:44 +0000 (17:52 +0800)]
riscv: Optimize hweight API with Zbb extension
The Hamming Weight of a number is the total number of bits set in it, so
the cpop/cpopw instruction from Zbb extension can be used to accelerate
hweight() API.
This series includes a three ftrace improvements for RISC-V:
1. Do not require to run recordmcount at build time (patch 1)
2. Simplification of the function graph functionality (patch 2)
3. Enable DYNAMIC_FTRACE_WITH_DIRECT_CALLS (patch 3 and 4)
The series has been tested on Qemu/rv64 virt/Debian sid with the
following test configs:
CONFIG_FTRACE_SELFTEST=y
CONFIG_FTRACE_STARTUP_TEST=y
CONFIG_SAMPLE_FTRACE_DIRECT=m
CONFIG_SAMPLE_FTRACE_DIRECT_MULTI=m
CONFIG_SAMPLE_FTRACE_OPS=m
All tests pass.
* b4-shazam-merge:
samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI]
riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
riscv: ftrace: Make function graph use ftrace directly
riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
Song Shuai [Thu, 30 Nov 2023 12:15:30 +0000 (13:15 +0100)]
riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
Select the DYNAMIC_FTRACE_WITH_DIRECT_CALLS to provide the
register_ftrace_direct[_multi] interfaces allowing users to register
the customed trampoline (direct_caller) as the mcount for one or more
target functions. And modify_ftrace_direct[_multi] are also provided
for modifying direct_caller.
To make the direct_caller and the other ftrace hooks (e.g.
function/fgraph tracer, k[ret]probes) co-exist, a temporary register
is nominated to store the address of direct_caller in
ftrace_regs_caller. After the setting of the address direct_caller by
direct_ops->func and the RESTORE_REGS in ftrace_regs_caller,
direct_caller will be jumped to by the `jr` inst.
Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support for RISC-V.
Song Shuai [Thu, 30 Nov 2023 12:15:29 +0000 (13:15 +0100)]
riscv: ftrace: Make function graph use ftrace directly
Similar to commit 0c0593b45c9b ("x86/ftrace: Make function graph use
ftrace directly") and commit c4a0ebf87ceb ("arm64/ftrace: Make
function graph use ftrace directly"), RISC-V has no need for a special
graph tracer hook. The graph_ops::func function can be used to install
the return_hooker.
This cleanup only changes the FTRACE_WITH_REGS implementation, leaving
the mcount-based implementation is unaffected.
Perform the simplification, and also cleanup the register save/restore
macros.
In commit afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead
of MCOUNT") RISC-V added support for -fpatchable-function-entry, which
removes the need for recordmcount.
Select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY to tell the build
system not to run recordmcount.
This series disables DWARF5 for LLVM versions where it is known to be
broken due to linker relaxation.
* b4-shazam-merge:
lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name
riscv: Restrict DWARF5 when building with LLVM to known working versions
riscv: Hoist linker relaxation disabling logic into Kconfig
lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name
Fangrui noted that the comment around CONFIG_AS_HAS_NON_CONST_LEB128
could be made more accurate because explicit .sleb128 directives are not
emitted, only .uleb128 directives are. Rename the symbol to
CONFIG_AS_HAS_NON_CONST_ULEB128 as a result.
Further clarifications include replacing "symbol deltas" with the more
accurate "label differences", noting that this issue has been resolved
in newer binutils (2.41+), and it only occurs when a port uses RISC-V
style linker relaxation.
riscv: Restrict DWARF5 when building with LLVM to known working versions
LLVM prior to 18.0.0 would generate incorrect debug info for DWARF5 due
to linker relaxation, which was worked around in clang by defaulting
RISC-V to DWARF4 [1]. Unfortunately, this workaround does not work for
the kernel because the DWARF version can be independently changed from
the default in Kconfig.
Do not allow DWARF5 to be selected for RISC-V when using linker
relaxation (ld.lld >= 15.0.0) and a version of LLVM that does not have
the fixes (the integrated assembler [2] and ld.lld [3] < 18.0.0)
necessary to generate the correct debug info.
riscv: Hoist linker relaxation disabling logic into Kconfig
Certain configurations may need to be disabled if linker relaxation is
in use, such as DWARF5 with ld.lld < 18. Hoist the logic of whether or
not linker relaxation is in use into Kconfig so decisions can be made at
configuration time.
Each architecture generally implements fine-tuned checksum functions to
leverage the instruction set. This patch adds the main checksum
functions that are used in networking. Tested on QEMU, this series
allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.
This patch takes heavy use of the Zbb extension using alternatives
patching.
To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT.
I have attempted to make these functions as optimal as possible, but I
have not ran anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.
ip_fast_csum is a relatively small function so even though it is
possible to read 64 bits at a time on compatible hardware, the
bottleneck becomes the clean up and setup code so loading 32 bits at a
time is actually faster.
* b4-shazam-merge:
kunit: Add tests for csum_ipv6_magic and ip_fast_csum
riscv: Add checksum library
riscv: Add checksum header
riscv: Add static key for misaligned accesses
asm-generic: Improve csum_fold
Charlie Jenkins [Mon, 8 Jan 2024 23:57:05 +0000 (15:57 -0800)]
riscv: Add checksum library
Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
will load from the buffer in groups of 32 bits, and when compiled for
64-bit will load in groups of 64 bits.
Additionally provide riscv optimized implementation of csum_ipv6_magic.
Charlie Jenkins [Mon, 8 Jan 2024 23:57:04 +0000 (15:57 -0800)]
riscv: Add checksum header
Provide checksum algorithms that have been designed to leverage riscv
instructions such as rotate. In 64-bit, can take advantage of the larger
register to avoid some overflow checking.
Charlie Jenkins [Mon, 8 Jan 2024 23:57:03 +0000 (15:57 -0800)]
riscv: Add static key for misaligned accesses
Support static branches depending on the value of misaligned accesses.
This will be used by a later patch in the series. At any point in time,
this static branch will only be enabled if all online CPUs are
considered "fast".
Charlie Jenkins [Mon, 8 Jan 2024 23:57:02 +0000 (15:57 -0800)]
asm-generic: Improve csum_fold
This csum_fold implementation introduced into arch/arc by Vineet Gupta
is better than the default implementation on at least arc, x86, and
riscv. Using GCC trunk and compiling non-inlined version, this
implementation has 41.6667%, 25% fewer instructions on riscv64, x86-64
respectively with -O3 optimization. Most implmentations override this
default in asm, but this should be more performant than all of those
other implementations except for arm which has barrel shifting and
sparc32 which has a carry flag.
Andrew Jones [Wed, 17 Jan 2024 13:09:34 +0000 (14:09 +0100)]
RISC-V: selftests: cbo: Ensure asm operands match constraints
The 'i' constraint expects a constant operand, which fn and its
constant derivative MK_CBO(fn) are, but passing fn through a function
as a parameter and using a local variable for MK_CBO(fn) allow the
compiler to lose sight of that when no optimization is done. Use
a macro instead of a function and skip the local variable to ensure
the compiler uses constants, matching the asm constraints.
Linus Torvalds [Thu, 18 Jan 2024 00:23:17 +0000 (16:23 -0800)]
Merge tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Reserve ECAM so we don't assign it to PCI BARs; this works around
bugs where BIOS included ECAM in a PNP0A03 host bridge window,
didn't reserve it via a PNP0C02 motherboard device, and didn't
allocate space for SR-IOV VF BARs (Bjorn Helgaas)
- Add MMCONFIG/ECAM debug logging (Bjorn Helgaas)
- Rename 'MMCONFIG' to 'ECAM' to match spec usage (Bjorn Helgaas)
- Log device type (Root Port, Switch Port, etc) during enumeration
(Bjorn Helgaas)
- Log bridges before downstream devices so the dmesg order is more
logical (Bjorn Helgaas)
- Log resource names (BAR 0, VF BAR 0, bridge window, etc)
consistently instead of a mix of names and "reg 0x10" (Puranjay
Mohan, Bjorn Helgaas)
- Fix 64GT/s effective data rate calculation to use 1b/1b encoding
rather than the 8b/10b or 128b/130b used by lower rates (Ilpo
Järvinen)
- Use PCI_HEADER_TYPE_* instead of literals in x86, powerpc, SCSI
lpfc (Ilpo Järvinen)
- Clean up open-coded PCIBIOS return code mangling (Ilpo Järvinen)
Resource management:
- Restructure pci_dev_for_each_resource() to avoid computing the
address of an out-of-bounds array element (the bounds check was
performed later so the element was never actually *read*, but it's
nicer to avoid even computing an out-of-bounds address) (Andy
Shevchenko)
Driver binding:
- Convert pci-host-common.c platform .remove() callback to
.remove_new() returning 'void' since it's not useful to return
error codes here (Uwe Kleine-König)
- Convert exynos, keystone, kirin from .remove() to .remove_new(),
which returns void instead of int (Uwe Kleine-König)
- Drop unused struct pci_driver.node member (Mathias Krause)
Virtualization:
- Add ACS quirk for more Zhaoxin Root Ports (LeoLiuoc)
Error handling:
- Log AER errors as "Correctable" (not "Corrected") or
"Uncorrectable" to match spec terminology (Bjorn Helgaas)
- Decode Requester ID when no error info found instead of printing
the raw hex value (Bjorn Helgaas)
Endpoint framework:
- Use a unique test pattern for each BAR in the pci_endpoint_test to
make it easier to debug address translation issues (Niklas Cassel)
Broadcom STB PCIe controller driver:
- Add DT property "brcm,clkreq-mode" and driver support for different
CLKREQ# modes to make ASPM L1.x states possible (Jim Quinlan)
Freescale Layerscape PCIe controller driver:
- Add suspend/resume support for Layerscape LS1043a and LS1021a,
including software-managed PME_Turn_Off and transitions between L0,
L2/L3_Ready Link states (Frank Li)
MediaTek PCIe controller driver:
- Clear MSI interrupt status before handler to avoid missing MSIs
that occur after the handler (qizhong cheng)
MediaTek PCIe Gen3 controller driver:
- Update mediatek-gen3 translation window setup to handle MMIO space
that is not a power of two in size (Jianjun Wang)
Qualcomm PCIe controller driver:
- Increase qcom iommu-map maxItems to accommodate SDX55 (five
entries) and SDM845 (sixteen entries) (Krzysztof Kozlowski)
- Describe qcom,pcie-sc8180x clocks and resets accurately (Krzysztof
Kozlowski)
- Describe qcom,pcie-sm8150 clocks and resets accurately (Krzysztof
Kozlowski)
- Correct the qcom "reset-name" property, previously incorrectly
called "reset-names" (Krzysztof Kozlowski)
- Document qcom,pcie-sm8650, based on qcom,pcie-sm8550 (Neil
Armstrong)
Renesas R-Car PCIe controller driver:
- Replace of_device.h with explicit of.h include to untangle header
usage (Rob Herring)
- Add DT and driver support for optional miniPCIe 1.5v and 3.3v
regulators on KingFisher (Wolfram Sang)
SiFive FU740 PCIe controller driver:
- Convert fu740 CONFIG_PCIE_FU740 dependency from SOC_SIFIVE to
ARCH_SIFIVE (Conor Dooley)
Synopsys DesignWare PCIe controller driver:
- Align iATU mapping for endpoint MSI-X (Niklas Cassel)
- Drop "host_" prefix from struct dw_pcie_host_ops members (Yoshihiro
Shimoda)
- Drop "ep_" prefix from struct dw_pcie_ep_ops members (Yoshihiro
Shimoda)
- Rename struct dw_pcie_ep_ops.func_conf_select() to
.get_dbi_offset() to be more descriptive (Yoshihiro Shimoda)
- Add j721e DT and driver support for 'num-lanes' for devices that
support x1, x2, or x4 Links (Matt Ranostay)
- Add j721e DT compatible strings and driver support for j784s4 (Matt
Ranostay)
- Make TI J721E Kconfig depend on ARCH_K3 since the hardware is
specific to those TI SoC parts (Peter Robinson)
TI Keystone PCIe controller driver:
- Hold power management references to all PHYs while enabling them to
avoid a race when one provides clocks to others (Siddharth
Vadapalli)
Xilinx XDMA PCIe controller driver:
- Remove redundant dev_err(), since platform_get_irq() and
platform_get_irq_byname() already log errors (Yang Li)
- Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
(Krzysztof Wilczyński)
- Fix xilinx_pl_dma_pcie_init_irq_domain() error return when
irq_domain_add_linear() fails (Harshit Mogalapalli)
MicroSemi Switchtec management driver:
- Do dma_mrpc cleanup during switchtec_pci_remove() to match its devm
ioremapping in switchtec_pci_probe(). Previously the cleanup was
done in stdev_release(), which used stale pointers if stdev->cdev
happened to be open when the PCI device was removed (Daniel
Stodden)
Miscellaneous:
- Convert interrupt terminology from "legacy" to "INTx" to be more
specific and match spec terminology (Damien Le Moal)
- In dw-xdata-pcie, pci_endpoint_test, and vmd, replace usage of
deprecated ida_simple_*() API with ida_alloc() and ida_free()
(Christophe JAILLET)"
* tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
PCI: Fix kernel-doc issues
PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream device
dt-bindings: PCI: brcmstb: Add property "brcm,clkreq-mode"
PCI: mediatek-gen3: Fix translation window size calculation
PCI: mediatek: Clear interrupt status before dispatching handler
PCI: keystone: Fix race condition when initializing PHYs
PCI: xilinx-xdma: Fix error code in xilinx_pl_dma_pcie_init_irq_domain()
PCI: xilinx-xdma: Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
PCI: rcar-gen4: Fix -Wvoid-pointer-to-enum-cast error
PCI: iproc: Fix -Wvoid-pointer-to-enum-cast warning
PCI: dwc: Add dw_pcie_ep_{read,write}_dbi[2] helpers
PCI: dwc: Rename .func_conf_select to .get_dbi_offset in struct dw_pcie_ep_ops
PCI: dwc: Rename .ep_init to .init in struct dw_pcie_ep_ops
PCI: dwc: Drop host prefix from struct dw_pcie_host_ops members
misc: pci_endpoint_test: Use a unique test pattern for each BAR
PCI: j721e: Make TI J721E depend on ARCH_K3
PCI: j721e: Add TI J784S4 PCIe configuration
PCI/AER: Use explicit register sizes for struct members
PCI/AER: Decode Requester ID when no error info found
PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
...
Mia Lin [Mon, 13 Nov 2023 10:38:07 +0000 (18:38 +0800)]
rtc: nuvoton: Compatible with NCT3015Y-R and NCT3018Y-R
The NCT3015Y-R and NCT3018Y-R use the same datasheet
but have different topologies as follows.
- Topology (Only 1st i2c can set TWO bit and HF bit)
In NCT3015Y-R,
rtc 1st i2c is connected to a host CPU
rtc 2nd i2c is connected to a BMC
In NCT3018Y-R,
rtc 1st i2c is connected to a BMC
rtc 2nd i2c is connected to a host CPU
In order to be compatible with NCT3015Y-R and NCT3018Y-R,
- In probe,
If part number is NCT3018Y-R, only set HF bit to 24-Hour format.
Else, do nothing
- In set_time,
If part number is NCT3018Y-R && TWO bit is 0,
change TWO bit to 1, and restore TWO bit after updating time.
Linus Torvalds [Wed, 17 Jan 2024 23:55:33 +0000 (15:55 -0800)]
Merge tag 'pinctrl-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"For this kernel cycle I managed an immutable branch for the PEF2256
WAN framer that has some pin control portions. It already landed in
your tree through the net pull request but here it is mentioned again.
The most interesting is perhaps the Samsung Exynos subdrivers for the
Tensor SoC used in Google Pixel 6 and the ExynosAuto subdriver for
automotive. Along with the earlier merged Tesla FSD subdriver it shows
some of the versatile uses of the Samsung Exynos silicon. It is also
used in the latest version of Axis Communications ARTPEC chips so it
is a very widely deployed SoC family.
We also have the Intel Meteor Lake SoC which I think is for laptops.
It's a pretty interesting chip with Xe graphics and integrated PCH.
Core changes:
- A new PINCTRL_GROUP_DESC() infrastructure macro is added and used
in different drivers, generic group description struct group_desc
is now used all over the place.
New drivers:
- New driver for the Texas Instruments TPS6494 Power Management IC.
- New driver for the Lantic PEF2256 framer pin multiplexer. This IC
has some pins that can be reconfigured in different ways. The
actual driver comes on an immutable branch with the net WAN parts,
the IC is some latest-and-greatest serial line funnel for e.g.
wireless access points.
- New subdriver for the Samsung Exynos Auto V920 pin controller, used
for automotive applications.
- New subdriver for the Samsung "GS101" SoC pin controller, this is
the Google "Tensor" SoC used in the Google Pixel 6.
- New subdriver for the Intel Meteor Point SoC pin controller.
- New subdriver for the Qualcomm SM8650 top level (TLMM) and LPASS
pin controllers.
- New subdriver for the Qualcomm X1E80100 top level (TLMM) pin
controller.
- New subdriver for the Qualcomm SM4450 top level (TLMM) pin
controller.
- The "single" pin controller now supports the Texas Instruments
J7200 SoC.
Improvements:
- Intel has created a new (Intel-)generic pin controller driver that
is now used by all contemporary Intel platforms.
- Intel is now also making use of some cleanup helpers.
- Enble 910 Ohm bias in the Intel Tangier driver.
- The Samsung driver now suppors irq_set_affinity() in it's IRQ chip
giving support for non wake up external gpio interrupts"
* tag 'pinctrl-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (112 commits)
pinctrl: samsung: constify iomem pointers
pinctrl: cy8c95x0: Cache muxed registers
dt-bindings: pinctrl: xilinx: Rename *gpio to *gpio-grp
pinctrl: qcom: lpass-lpi: remove duplicated include
dt-bindings: pinctrl: qcom: drop common properties and allow wakeup-parent
dt-bindings: pinctrl: qcom: drop common properties
dt-bindings: pinctrl: qcom,ipq5018-tlmm: use common TLMM bindings
dt-bindings: pinctrl: qcom,x1e80100-tlmm: restrict number of interrupts
dt-bindings: pinctrl: qcom,sm8650-tlmm: restrict number of interrupts
dt-bindings: pinctrl: qcom,sm8550-tlmm: restrict number of interrupts
dt-bindings: pinctrl: qcom,sdx75-tlmm: restrict number of interrupts
dt-bindings: pinctrl: qcom,sa8775p-tlmm: restrict number of interrupts
dt-bindings: pinctrl: qcom,qdu1000-tlmm: restrict number of interrupts
dt-bindings: pinctrl: qcom: create common LPASS LPI schema
pinctrl: qcom: sm4450: dd SM4450 pinctrl driver
dt-bindings: pinctrl: qcom: Add SM4450 pinctrl
dt-bindings: pinctrl: qcom,pmic-mpp: clean up example
pinctrl: intel: Add Intel Meteor Point pin controller and GPIO support
pinctrl: renesas: rzg2l: Add input enable to the Ethernet pins
pinctrl: renesas: rzg2l: Add output enable support
...
Linus Torvalds [Wed, 17 Jan 2024 23:25:27 +0000 (15:25 -0800)]
Merge tag 'leds-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
"New Drivers:
- Add support for Allwinner A100 RGB LED controller
- Add support for Maxim 5970 Dual Hot-swap controller
New Device Support:
- Add support for AW20108 to Awinic LED driver
New Functionality:
- Extend support for Net speeds to include; 2.5G, 5G and 10G
- Allow tx/rx and cts/dsr/dcd/rng TTY LEDS to be turned on and off
via sysfs if required
- Add support for hardware control in AW200xx
Fix-ups:
- Use safer methods for string handling
- Improve error handling; return proper error values, simplify,
avoid duplicates, etc
- Replace Mutex use with the Completion mechanism
- Fix include lists; alphabetise, remove unused, explicitly add used
- Use generic platform device properties
- Use/convert to new/better APIs/helpers/MACROs instead of
hand-rolling implementations
- Device Tree binding adaptions/conversions/creation
- Continue work to remove superfluous platform .remove() call-backs
- Remove superfluous/defunct code
- Trivial; whitespace, unused variables, spelling, clean-ups, etc
- Avoid unnecessary duplicate locks
Bug Fixes:
- Repair Kconfig based dependency lists
- Ensure unused dynamically allocated data is freed after use
- Fix support for brightness control
- Add missing sufficient delays during reset to ensure correct
operation
- Avoid division-by-zero issues"
* tag 'leds-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (45 commits)
leds: trigger: netdev: Add core support for hw not supporting fallback to LED sw control
leds: trigger: panic: Don't register panic notifier if creating the trigger failed
leds: sun50i-a100: Convert to be agnostic to property provider
leds: max5970: Add missing headers
leds: max5970: Make use of dev_err_probe()
leds: max5970: Make use of device properties
leds: max5970: Remove unused variable
leds: rgb: Drop obsolete dependency on COMPILE_TEST
leds: sun50i-a100: Avoid division-by-zero warning
leds: trigger: Remove unused function led_trigger_rename_static()
leds: qcom-lpg: Introduce a wrapper for getting driver data from a pwm chip
leds: gpio: Add kernel log if devm_fwnode_gpiod_get() fails
dt-bindings: leds: qcom,spmi-flash-led: Fix example node name
dt-bindings: leds: aw200xx: Fix led pattern and add reg constraints
dt-bindings: leds: awinic,aw200xx: Add AW20108 device
leds: aw200xx: Add support for aw20108 device
leds: aw200xx: Improve autodim calculation method
leds: aw200xx: Enable disable_locking flag in regmap config
leds: aw200xx: Add delay after software reset
dt-bindings: leds: aw200xx: Remove property "awinic,display-rows"
...
Linus Torvalds [Wed, 17 Jan 2024 23:21:21 +0000 (15:21 -0800)]
Merge tag 'mfd-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull mfd updates from Lee Jones:
"New Device Support:
- Add support for Qualcomm PM8937 PMIC to QCOM SPMI PMIC
Fix-ups:
- Use/convert to new/better APIs/helpers/MACROs instead of
hand-rolling implementations
- Device Tree binding adaptions/conversions/creation
- Improve error handling; return proper error values, simplify,
avoid duplicates, etc
- Continue work to remove superfluous platform .remove() call-backs
- Move some exported symbols into private namespaces
- Clean-up and staticify PM related operations
- Trivial; spelling, whitespace, clean-ups, etc
- Fix include lists; alphabetise, remove unused, explicitly add used
Bug Fixes:
- Use PLATFORM_DEVID_AUTO to ensure multiple duplicate devices can
co-exist
- Ensure debugfs register view is correctly presented
- Fix ordering and value issues in current use of
clk_register_fractional_divider()
- Repair Kconfig based dependency lists"
* tag 'mfd-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (50 commits)
mfd: ti_am335x_tscadc: Fix TI SoC dependencies
dt-bindings: mfd: sprd: Add support for UMS9620
mfd: ab8500-sysctrl: Drop ancient charger
mfd: intel-lpss: Fix the fractional clock divider flags
mfd: tps6594: Add null pointer check to tps6594_device_init()
dt-bindings: mfd: pm8008: Clean up example node names
dt-bindings: mfd: hisilicon,hi6421-spmi-pmic: Clean up example
dt-bindings: mfd: hisilicon,hi6421-spmi-pmic: Fix regulator binding
dt-bindings: mfd: hisilicon,hi6421-spmi-pmic: Fix up binding reference
mfd: da9062: Simplify obtaining I2C match data
mfd: syscon: Fix null pointer dereference in of_syscon_register()
mfd: intel-lpss: Don't fail probe on success of pci_alloc_irq_vectors()
mfd: twl6030-irq: Revert to use of_match_device()
mfd: cs42l43: Correct order of include files to be alphabetical
mfd: cs42l43: Correct SoundWire port list
mfd: Fix a few spelling mistakes in PMIC header file comments
mfd: intel-lpss: Provide Intel LPSS PM ops structure
mfd: intel-lpss: Move exported symbols to INTEL_LPSS namespace
mfd: intel-lpss: Adjust header inclusions
mfd: intel-lpss: Use device_get_match_data()
...
Linus Torvalds [Wed, 17 Jan 2024 23:09:12 +0000 (15:09 -0800)]
Merge tag 'rproc-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull remoteproc updates from Bjorn Andersson:
- The i.MX DSP remoteproc driver adds support for providing a resource
table, in order to enable IPC with the core
- The TI K3 DSP driver is transitioned to remove_new, error messages
are changed to use symbolic error codes, and dev_err_probe() is used
where applicable
- Support for the Qualcomm SC7280 audio, compute and WiFi co-processors
are added to the Qualcomm TrustZone based remoteproc driver
* tag 'rproc-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
remoteproc: qcom_q6v5_pas: Add SC7280 ADSP, CDSP & WPSS
dt-bindings: remoteproc: qcom: sc7180-pas: Add SC7280 compatibles
dt-bindings: remoteproc: qcom: sc7180-pas: Fix SC7280 MPSS PD-names
remoteproc: k3-dsp: Convert to platform remove callback returning void
remoteproc: k3-dsp: Use symbolic error codes in error messages
remoteproc: k3-dsp: Suppress duplicate error message in .remove()
arm64: dts: imx8mp: Add reserve-memory nodes for DSP
remoteproc: imx_dsp_rproc: Add mandatory find_loaded_rsc_table op
Linus Torvalds [Wed, 17 Jan 2024 23:05:27 +0000 (15:05 -0800)]
Merge tag 'rpmsg-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull rpmsg updates from Bjorn Andersson:
"This make virtio free driver_override upon removal. It also updates
the rpmsg documentation after earlier API updates"
* tag 'rpmsg-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
rpmsg: virtio: Free driver_override when rpmsg_remove()
doc: rmpsg: Update with rpmsg_endpoint
Linus Torvalds [Wed, 17 Jan 2024 22:59:05 +0000 (14:59 -0800)]
Merge tag 'drm-next-2024-01-15-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This is just a wrap up of fixes from the last few days. It has the
proper fix to the i915/xe collision, we can clean up what you did
later once rc1 lands.
Otherwise it's a few other i915, a v3d, rockchip and a nouveau fix to
make GSP load on some original Turing GPUs.
i915:
- Fixes for kernel-doc warnings enforced in linux-next
- Another build warning fix for string formatting of intel_wakeref_t
- Display fixes for DP DSC BPC and C20 PLL state verification
v3d:
- register readout fix
rockchip:
- two build warning fixes
nouveau:
- fix GSP loading on Turing with different nvdec configuration"
* tag 'drm-next-2024-01-15-1' of git://anongit.freedesktop.org/drm/drm:
nouveau/gsp: handle engines in runl without nonstall interrupts.
drm/i915/perf: reconcile Excess struct member kernel-doc warnings
drm/i915/guc: reconcile Excess struct member kernel-doc warnings
drm/i915/gt: reconcile Excess struct member kernel-doc warnings
drm/i915/gem: reconcile Excess struct member kernel-doc warnings
drm/i915/dp: Fix the max DSC bpc supported by source
drm/i915: don't make assumptions about intel_wakeref_t type
drm/i915/dp: Fix the PSR debugfs entries wrt. MST connectors
drm/i915/display: Fix C20 pll selection for state verification
drm/v3d: Fix support for register debugging on the RPi 4
drm/rockchip: vop2: Drop unused if_dclk_rate variable
drm/rockchip: vop2: Drop superfluous include
Linus Torvalds [Wed, 17 Jan 2024 22:47:33 +0000 (14:47 -0800)]
Merge tag 'thermal-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"These add support for debugfs-based diagnostics to the thermal core,
simplify the thermal netlink API, fix system-wide PM support in the
Intel HFI driver and clean up some code.
Specifics:
- Add debugfs-based diagnostics support to the thermal core (Daniel
Lezcano, Dan Carpenter)
- Fix a power allocator thermal governor issue preventing it from
resetting cooling devices sometimes (Di Shen)
- Simplify the thermal netlink API and clean up related code (Rafael
J. Wysocki)
- Make the Intel HFI driver support hibernation and deep suspend
properly (Ricardo Neri)"
* tag 'thermal-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/debugfs: Unlock on error path in thermal_debug_tz_trip_up()
thermal: intel: hfi: Add syscore callbacks for system-wide PM
thermal: gov_power_allocator: avoid inability to reset a cdev
thermal: helpers: Rearrange thermal_cdev_set_cur_state()
thermal: netlink: Rework notify API for cooling devices
thermal: core: Use kstrdup_const() during cooling device registration
thermal/debugfs: Add thermal debugfs information for mitigation episodes
thermal/debugfs: Add thermal cooling device debugfs information
thermal: netlink: Pass thermal zone pointer to notify routines
thermal: netlink: Drop thermal_notify_tz_trip_add/delete()
thermal: netlink: Pass pointers to thermal_notify_tz_trip_up/down()
thermal: netlink: Pass pointers to thermal_notify_tz_trip_change()
Linus Torvalds [Wed, 17 Jan 2024 22:37:40 +0000 (14:37 -0800)]
Merge tag 'acpi-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These add support for new MADT flags to ACPICA, constify the PNP bus
type structure and add new ACPI IRQ management quirks.
Specifics:
- Make pnp_bus_type const (Greg Kroah-Hartman)
- Add ACPI IRQ management quirks for ASUS ExpertBook B1502CGA and
ASUS Vivobook E1504GA and E1504GAB (Ben Mayo, Michael Maltsev)
- Add new MADT GICC/GICR/ITS non-coherent flags and GICC online
capable bit handling to ACPICA (Lorenzo Pieralisi)"
* tag 'acpi-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: MADT: Add new MADT GICC/GICR/ITS non-coherent flags handling
ACPICA: MADT: Add GICC online capable bit handling
ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CGA
ACPI: resource: Add DMI quirks for ASUS Vivobook E1504GA and E1504GAB
PNP: make pnp_bus_type const
Linus Torvalds [Wed, 17 Jan 2024 22:07:14 +0000 (14:07 -0800)]
Merge tag 'pm-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These restore the asynchronous device resume optimization removed by
the previous PM merge, make the intel_pstate driver work better on
Meteor Lake systems, optimize the PM QoS core code slightly and fix up
typos in admin-guide.
Specifics:
- Restore the system-wide asynchronous device resume optimization
removed by a recent concurrency fix (Rafael J. Wysocki)
- Make the intel_pstate cpufreq driver allow Meteor Lake systems to
run at somewhat higher frequencies (Srinivas Pandruvada)
- Make the PM QoS core code use kcalloc() for array allocation (Erick
Archer)
- Fix two PM-related typos in admin-guide (Erwan Velu)"
* tag 'pm-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Restore asynchronous device resume optimization
Documentation: admin-guide: PM: Fix two typos
cpufreq: intel_pstate: Update hybrid scaling factor for Meteor Lake
PM: QoS: Use kcalloc() instead of kzalloc()
Linus Torvalds [Wed, 17 Jan 2024 21:03:37 +0000 (13:03 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Generic:
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be
resized. guest_memfd files do however support PUNCH_HOLE, which can
be used to switch a memory area between guest_memfd and regular
anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that
guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
the case of pKVM).
x86:
- Support for "software-protected VMs" that can use the new
guest_memfd and page attributes infrastructure. This is mostly
useful for testing, since there is no pKVM-like infrastructure to
provide a meaningfully reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages
during CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in
non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually
care about whether the caller is a reader or a writer.
- let Xen guests opt out of having PV clock reported as "based on a
stable TSC", because some of them don't expect the "TSC stable" bit
(added to the pvclock ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for
TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM
always flushes on nested transitions, i.e. always satisfies flush
requests. This allows running bleeding edge versions of VMware
Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV
support.
- On AMD machines with vNMI, always rely on hardware instead of
intercepting IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fail to stop/reset counters
and other state prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events
using a dedicated field instead of snapshotting the "previous"
counter. If the hardware PMC count triggers overflow that is
recognized in the same VM-Exit that KVM manually bumps an event
count, KVM would pend PMIs for both the hardware-triggered overflow
and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertantly enabled by non-KVM developers, which can be
problematic for subsystems that require no regressions for W=1
builds.
- Advertise all of the host-supported CPUID bits that enumerate
IA32_SPEC_CTRL "features".
- Don't force a masterclock update when a vCPU synchronizes to the
current TSC generation, as updating the masterclock can cause
kvmclock's time to "jump" unexpectedly, e.g. when userspace
hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter
fault paths, partly as a super minor optimization, but mostly to
make KVM play nice with position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the
code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
"emulation" at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with a prefix
branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargetting the NV support to
that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Loongarch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list
selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is delete/moved due to a missing
flag in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix
the various bugs that were lurking due to lack of said annotation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
x86/kvm: Do not try to disable kvmclock if it was not enabled
KVM: x86: add missing "depends on KVM"
KVM: fix direction of dependency on MMU notifiers
KVM: introduce CONFIG_KVM_COMMON
KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
RISC-V: KVM: selftests: Add get-reg-list test for STA registers
RISC-V: KVM: selftests: Add steal_time test support
RISC-V: KVM: selftests: Add guest_sbi_probe_extension
RISC-V: KVM: selftests: Move sbi_ecall to processor.c
RISC-V: KVM: Implement SBI STA extension
RISC-V: KVM: Add support for SBI STA registers
RISC-V: KVM: Add support for SBI extension registers
RISC-V: KVM: Add SBI STA info to vcpu_arch
RISC-V: KVM: Add steal-update vcpu request
RISC-V: KVM: Add SBI STA extension skeleton
RISC-V: paravirt: Implement steal-time support
RISC-V: Add SBI STA extension definitions
RISC-V: paravirt: Add skeleton for pv-time support
RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
...