ftrace/scripts: Update the instructions for ftrace-bisect.sh
The instructions for the ftrace-bisect.sh script, which is used to find
what function is being traced that is causing a kernel crash, and possibly
a triple fault reboot, uses the old method. In 5.1, a new feature was
added that let the user write in the index into available_filter_functions
that maps to the function a user wants to set in set_ftrace_filter (or
set_ftrace_notrace). This takes O(1) to set, as suppose to writing a
function name, which takes O(n) (where n is the number of functions in
available_filter_functions).
The ftrace-bisect.sh requires setting half of the functions in
available_filter_functions, which is O(n^2) using the name method to enable
and can take several minutes to complete. The number method is O(n) which
takes less than a second to complete. Using the number method for any
kernel 5.1 and after is the proper way to do the bisect.
Update the usage to reflect the new change, as well as using the
/sys/kernel/tracing path instead of the obsolete debugfs path.
tracing: Make sure trace_printk() can output as soon as it can be used
Currently trace_printk() can be used as soon as early_trace_init() is
called from start_kernel(). But if a crash happens, and
"ftrace_dump_on_oops" is set on the kernel command line, all you get will
be:
[ 0.456075] <idle>-0 0dN.2. 347519us : Unknown type 6
[ 0.456075] <idle>-0 0dN.2. 353141us : Unknown type 6
[ 0.456075] <idle>-0 0dN.2. 358684us : Unknown type 6
This is because the trace_printk() event (type 6) hasn't been registered
yet. That gets done via an early_initcall(), which may be early, but not
early enough.
Instead of registering the trace_printk() event (and other ftrace events,
which are not trace events) via an early_initcall(), have them registered at
the same time that trace_printk() can be used. This way, if there is a
crash before early_initcall(), then the trace_printk()s will actually be
useful.
Mark Rutland [Tue, 3 Jan 2023 12:49:11 +0000 (12:49 +0000)]
ftrace: Export ftrace_free_filter() to modules
Setting filters on an ftrace ops results in some memory being allocated
for the filter hashes, which must be freed before the ops can be freed.
This can be done by removing every individual element of the hash by
calling ftrace_set_filter_ip() or ftrace_set_filter_ips() with `remove`
set, but this is somewhat error prone as it's easy to forget to remove
an element.
Make it easier to clean this up by exporting ftrace_free_filter(), which
can be used to clean up all of the filter hashes after an ftrace_ops has
been unregistered.
Using this, fix the ftrace-direct* samples to free hashes prior to being
unloaded. All other code either removes individual filters explicitly or
is built-in and already calls ftrace_free_filter().
This cycle we ported all filesystems to the new posix acl api. While
looking at further simplifications in this area to remove the last
remnants of the generic dummy posix acl handlers we realized that we
regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use
of posix acls.
With the change to a dedicated posix acl api interacting with posix acls
doesn't go through the old xattr codepaths anymore and instead only
relies the get acl and set acl inode operations.
Before this change fuse daemons that don't set FUSE_POSIX_ACL were able
to get and set posix acl albeit with two caveats. First, that posix acls
aren't cached. And second, that they aren't used for permission checking
in the vfs.
We regressed that use-case as we currently refuse to retrieve any posix
acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons
would see a change in behavior.
We can restore the old behavior in multiple ways. We could change the
new posix acl api and look for a dedicated xattr handler and if we find
one prefer that over the dedicated posix acl api. That would break the
consistency of the new posix acl api so we would very much prefer not to
do that.
We could introduce a new ACL_*_CACHE sentinel that would instruct the
vfs permission checking codepath to not call into the filesystem and
ignore acls.
But a more straightforward fix for v6.2 is to do the same thing that
Overlayfs does and give fuse a separate get acl method for permission
checking. Overlayfs uses this to express different needs for vfs
permission lookup and acl based retrieval via the regular system call
path as well. Let fuse do the same for now. This way fuse can continue
to refuse to retrieve posix acls for daemons that don't set
FUSE_POSXI_ACL for permission checking while allowing a fuse server to
retrieve it via the usual system calls.
In the future, we could extend the get acl inode operation to not just
pass a simple boolean to indicate rcu lookup but instead make it a flag
argument. Then in addition to passing the information that this is an
rcu lookup to the filesystem we could also introduce a flag that tells
the filesystem that this is a request from the vfs to use these acls for
permission checking. Then fuse could refuse the get acl request for
permission checking when the daemon doesn't have FUSE_POSIX_ACL set in
the same get acl method. This would also help Overlayfs and allow us to
remove the second method for it as well.
But since that change is more invasive as we need to update the get acl
inode operation for multiple filesystems we should not do this as a fix
for v6.2. Instead we will do this for the v6.3 merge window.
Fwiw, since posix acls are now always correctly translated in the new
posix acl api we could also allow them to be used for daemons without
FUSE_POSIX_ACL that are not mounted on the host. But this is behavioral
change and again if dones should be done for v6.3. For now, let's just
restore the original behavior.
A nice side-effect of this change is that for fuse daemons with and
without FUSE_POSIX_ACL the same code is used for posix acls in a
backwards compatible way. This also means we can remove the legacy xattr
handlers completely. We've also added comments to explain the expected
behavior for daemons without FUSE_POSIX_ACL into the code.
Hans de Goede [Tue, 24 Jan 2023 10:57:54 +0000 (11:57 +0100)]
ACPI: video: Fix apple gmux detection
Some apple laptop models have an ACPI device with a HID of APP000B
and that device has an IO resource (so it does not describe the new
unsupported MMIO based gmux type), but there actually is no gmux
in the laptop at all.
The gmux_probe() function of the actual apple-gmux driver has code
to detect this, this code has been factored out into a new
apple_gmux_detect() helper in apple-gmux.h.
Use this new function to fix acpi_video_get_backlight_type() wrongly
returning apple_gmux as type on the following laptops:
Add a new (static inline) apple_gmux_detect() helper to apple-gmux.h
which can be used for gmux detection instead of apple_gmux_present().
The latter is not really reliable since an ACPI device with a HID
of APP000B is present on some devices without a gmux at all, as well
as on devices with a newer (unsupported) MMIO based gmux model.
This causes apple_gmux_present() to return false-positives on
a number of different Apple laptop models.
This new helper uses the same probing as the actual apple-gmux
driver, so that it does not return false positives.
To avoid code duplication the gmux_probe() function of the actual
driver is also moved over to using the new apple_gmux_detect() helper.
This avoids false positives (vs _HID + IO region detection) on:
Hans de Goede [Tue, 24 Jan 2023 10:57:52 +0000 (11:57 +0100)]
platform/x86: apple-gmux: Move port defines to apple-gmux.h
This is a preparation patch for adding a new static inline
apple_gmux_detect() helper which actually checks a supported
gmux is present, rather then only checking an ACPI device with
the HID is there as apple_gmux_present() does.
platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN
By default when the system is configured for low power idle in the FADT
the keyboard is set up as a wake source. This matches the behavior that
Windows uses for Modern Standby as well.
It has been reported that a variety of AMD based designs there are
spurious wakeups are happening where two IRQ sources are active.
For example:
```
PM: Triggering wakeup from IRQ 9
PM: Triggering wakeup from IRQ 1
```
In these designs IRQ 9 is the ACPI SCI and IRQ 1 is the keyboard.
One way to trigger this problem is to suspend the laptop and then unplug
the AC adapter. The SOC will be in a hardware sleep state and plugging
in the AC adapter returns control to the kernel's s2idle loop.
Normally if just IRQ 9 was active the s2idle loop would advance any EC
transactions and no other IRQ being active would cause the s2idle loop
to put the SOC back into hardware sleep state.
When this bug occurred IRQ 1 is also active even if no keyboard activity
occurred. This causes the s2idle loop to break and the system to wake.
This is a platform firmware bug triggering IRQ1 without keyboard activity.
This occurs in Windows as well, but Windows will enter "SW DRIPS" and
then with no activity enters back into "HW DRIPS" (hardware sleep state).
This issue affects Renoir, Lucienne, Cezanne, and Barcelo platforms. It
does not happen on newer systems such as Mendocino or Rembrandt.
It's been fixed in newer platform firmware. To avoid triggering the bug
on older systems check the SMU F/W version and adjust the policy at suspend
time for s2idle wakeup from keyboard on these systems. A lot of thought
and experimentation has been given around the timing of disabling IRQ1,
and to make it work the "suspend" PM callback is restored.
Commit 1ea0d3b46798 ("platform/x86: asus-wmi: Simplify tablet-mode-switch
handling") unified the asus-wmi tablet-switch handling, but it did not take
into account that the value returned for the kbd_dock_devid WMI method is
inverted where as the other ones are not inverted.
This causes asus-wmi to report an inverted tablet-switch state for devices
which use the kbd_dock_devid, which causes libinput to ignore touchpad
events while the affected T10x model 2-in-1s are docked.
Add inverting of the return value in the kbd_dock_devid case to fix this.
Kevin Kuriakose [Thu, 19 Jan 2023 15:09:25 +0000 (20:39 +0530)]
platform/x86: gigabyte-wmi: add support for B450M DS3H WIFI-CF
To the best of my knowledge this is the same board as the B450M DS3H-CF,
but with an added WiFi card. Name obtained using dmidecode, tested
with force_load on v6.1.6
Rishit Bansal [Fri, 20 Jan 2023 22:12:14 +0000 (03:42 +0530)]
platform/x86: hp-wmi: Handle Omen Key event
Add support to map the "HP Omen Key" to KEY_PROG2. Laptops in the HP
Omen Series open the HP Omen Command Center application on windows. But,
on linux it fails with the following message from the hp-wmi driver:
Also adds support to map Fn+Esc to KEY_FN_ESC. This currently throws the
following message on the hp-wmi driver:
[ 6082.143785] hp_wmi: Unknown key code - 0x21a7
There is also a "Win-Lock" key on HP Omen Laptops which supports
Enabling and Disabling the Windows key, which trigger commands 0x21a4
and 0x121a4 respectively, but I wasn't able to find any KEY in input.h
to map this to.
syzbot reported a use-after-free in do_accept(), precisely nr_accept()
as sk_prot_alloc() allocated the memory and sock_put() frees it. [0]
The issue could happen if the heartbeat timer is fired and
nr_heartbeat_expiry() calls nr_destroy_socket(), where a socket
has SOCK_DESTROY or a listening socket has SOCK_DEAD.
In this case, the first condition cannot be true. SOCK_DESTROY is
flagged in nr_release() only when the file descriptor is close()d,
but accept() is being called for the listening socket, so the second
condition must be true.
Usually, the AF_NETROM listener neither starts timers nor sets
SOCK_DEAD. However, the condition is met if connect() fails before
listen(). connect() starts the t1 timer and heartbeat timer, and
t1timer calls nr_disconnect() when timeout happens. Then, SOCK_DEAD
is set, and if we call listen(), the heartbeat timer calls
nr_destroy_socket().
This path seems expected, and nr_destroy_socket() is called to clean
up resources. Initially, there was sock_hold() before nr_destroy_socket()
so that the socket would not be freed, but the commit 517a16b1a88b
("netrom: Decrease sock refcount when sock timers expire") accidentally
removed it.
To fix use-after-free, let's add sock_hold().
[0]:
BUG: KASAN: use-after-free in do_accept+0x483/0x510 net/socket.c:1848
Read of size 8 at addr ffff88807978d398 by task syz-executor.3/5315
drm/fb-helper: Use a per-driver FB deferred I/O handler
The DRM fbdev emulation layer sets the struct fb_info .fbdefio field to
a struct fb_deferred_io pointer, that is shared across all drivers that
use the generic drm_fbdev_generic_setup() helper function.
It is a problem because the fbdev core deferred I/O logic assumes that
the struct fb_deferred_io data is not shared between devices, and it's
stored there state such as the list of pages touched and a mutex that
is use to synchronize between the fb_deferred_io_track_page() function
that track the dirty pages and fb_deferred_io_work() workqueue handler
doing the actual deferred I/O.
The latter can lead to the following error, since it may happen that two
drivers are probed and then one is removed, which causes the mutex bo be
destroyed and not existing anymore by the time the other driver tries to
grab it for the fbdev deferred I/O logic:
netfilter: conntrack: unify established states for SCTP paths
An SCTP endpoint can start an association through a path and tear it
down over another one. That means the initial path will not see the
shutdown sequence, and the conntrack entry will remain in ESTABLISHED
state for 5 days.
By merging the HEARTBEAT_ACKED and ESTABLISHED states into one
ESTABLISHED state, there remains no difference between a primary or
secondary path. The timeout for the merged ESTABLISHED state is set to
210 seconds (hb_interval * max_path_retrans + rto_max). So, even if a
path doesn't see the shutdown sequence, it will expire in a reasonable
amount of time.
With this change in place, there is now more than one state from which
we can transition to ESTABLISHED, COOKIE_ECHOED and HEARTBEAT_SENT, so
handle the setting of ASSURED bit whenever a state change has happened
and the new state is ESTABLISHED. Removed the check for dir==REPLY since
the transition to ESTABLISHED can happen only in the reply direction.
This reverts commit (bff3d0534804: "netfilter: conntrack: add sctp
DATA_SENT state")
Using DATA/SACK to detect a new connection on secondary/alternate paths
works only on new connections, while a HEARTBEAT is required on
connection re-use. It is probably consistent to wait for HEARTBEAT to
create a secondary connection in conntrack.
netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
RFC 9260, Sec 8.5.1 states that for ABORT/SHUTDOWN_COMPLETE, the chunk
MUST be accepted if the vtag of the packet matches its own tag and the
T bit is not set OR if it is set to its peer's vtag and the T bit is set
in chunk flags. Otherwise the packet MUST be silently dropped.
Update vtag verification for ABORT/SHUTDOWN_COMPLETE based on the above
description.
Jakub Kicinski [Tue, 24 Jan 2023 06:36:58 +0000 (22:36 -0800)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-01-20 (iavf)
This series contains updates to iavf driver only.
Michal Schmidt converts single iavf workqueue to per adapter to avoid
deadlock issues.
Marcin moves setting of VLAN related netdev features to watchdog task to
avoid RTNL deadlock.
Stefan Assmann schedules immediate watchdog task execution on changing
primary MAC to avoid excessive delay.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: schedule watchdog immediately when changing primary MAC
iavf: Move netdev_update_features() into watchdog task
iavf: fix temporary deadlock and failure to set MAC address
====================
1) Fix overlap detection in rbtree set backend: Detect overlap by going
through the ordered list of valid tree nodes. To shorten the number of
visited nodes in the list, this algorithm descends the tree to search
for an existing element greater than the key value to insert that is
greater than the new element.
2) Fix for the rbtree set garbage collector: Skip inactive and busy
elements when checking for expired elements to avoid interference
with an ongoing transaction from control plane.
This is a rather large fix coming at this stage of the 6.2-rc. Since 33c7aba0b4ff ("netfilter: nf_tables: do not set up extensions for end
interval"), bogus overlap errors in the rbtree set occur more frequently.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_set_rbtree: skip elements in transaction from garbage collection
netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
====================
Mat Martineau [Fri, 20 Jan 2023 23:11:21 +0000 (15:11 -0800)]
MAINTAINERS: Update MPTCP maintainer list and CREDITS
My responsibilities at Intel have changed, so I'm handing off exclusive
MPTCP subsystem maintainer duties to Matthieu. It has been a privilege
to see MPTCP through its initial upstreaming and first few years in the
upstream kernel!
Driver marked broadcast/multicast frames as offloaded incorrectly.
Mark them as offloaded only when HW offloading has been enabled.
This should happen only for ADIN2111 when both ports are bridged
by the software.
Ahmad Fatoum [Fri, 20 Jan 2023 11:09:32 +0000 (12:09 +0100)]
net: dsa: microchip: fix probe of I2C-connected KSZ8563
Starting with commit eee16b147121 ("net: dsa: microchip: perform the
compatibility check for dev probed"), the KSZ switch driver now bails
out if it thinks the DT compatible doesn't match the actual chip ID
read back from the hardware:
ksz9477-switch 1-005f: Device tree specifies chip KSZ9893 but found
KSZ8563, please fix it!
For the KSZ8563, which used ksz_switch_chips[KSZ9893], this was fine
at first, because it indeed shares the same chip id as the KSZ9893.
Commit b44908095612 ("net: dsa: microchip: add separate struct
ksz_chip_data for KSZ8563 chip") started differentiating KSZ9893
compatible chips by consulting the 0x1F register. The resulting breakage
was fixed for the SPI driver in the same commit by introducing the
appropriate ksz_switch_chips[KSZ8563], but not for the I2C driver.
Fix this for I2C-connected KSZ8563 now to get it probing again.
Eric Dumazet [Fri, 20 Jan 2023 12:59:53 +0000 (12:59 +0000)]
netlink: annotate data races around nlk->portid
syzbot reminds us netlink_getname() runs locklessly [1]
This first patch annotates the race against nlk->portid.
Following patches take care of the remaining races.
[1]
BUG: KCSAN: data-race in netlink_getname / netlink_insert
write to 0xffff88814176d310 of 4 bytes by task 2315 on cpu 1:
netlink_insert+0xf1/0x9a0 net/netlink/af_netlink.c:583
netlink_autobind+0xae/0x180 net/netlink/af_netlink.c:856
netlink_sendmsg+0x444/0x760 net/netlink/af_netlink.c:1895
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
____sys_sendmsg+0x38f/0x500 net/socket.c:2476
___sys_sendmsg net/socket.c:2530 [inline]
__sys_sendmsg+0x19a/0x230 net/socket.c:2559
__do_sys_sendmsg net/socket.c:2568 [inline]
__se_sys_sendmsg net/socket.c:2566 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2566
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88814176d310 of 4 bytes by task 2316 on cpu 0:
netlink_getname+0xcd/0x1a0 net/netlink/af_netlink.c:1144
__sys_getsockname+0x11d/0x1b0 net/socket.c:2026
__do_sys_getsockname net/socket.c:2041 [inline]
__se_sys_getsockname net/socket.c:2038 [inline]
__x64_sys_getsockname+0x3e/0x50 net/socket.c:2038
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00000000 -> 0xc9a49780
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 2316 Comm: syz-executor.2 Not tainted 6.2.0-rc3-syzkaller-00030-ge8f60cd7db24-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
netfilter: nft_set_rbtree: skip elements in transaction from garbage collection
Skip interference with an ongoing transaction, do not perform garbage
collection on inactive elements. Reset annotated previous end interval
if the expired element is marked as busy (control plane removed the
element right before expiration).
netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
...instead of a tree descent, which became overly complicated in an
attempt to cover cases where expired or inactive elements would affect
comparisons with the new element being inserted.
Further, it turned out that it's probably impossible to cover all those
cases, as inactive nodes might entirely hide subtrees consisting of a
complete interval plus a node that makes the current insertion not
overlap.
To speed up the overlap check, descent the tree to find a greater
element that is closer to the key value to insert. Then walk down the
node list for overlap detection. Starting the overlap check from
rb_first() unconditionally is slow, it takes 10 times longer due to the
full linear traversal of the list.
Moreover, perform garbage collection of expired elements when walking
down the node list to avoid bogus overlap reports.
For the insertion operation itself, this essentially reverts back to the
implementation before commit 7c84d41416d8 ("netfilter: nft_set_rbtree:
Detect partial overlaps on insertion"), except that cases of complete
overlap are already handled in the overlap detection phase itself, which
slightly simplifies the loop to find the insertion point.
Based on initial patch from Stefano Brivio, including text from the
original patch description too.
Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Reviewed-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Hans de Goede [Thu, 19 Jan 2023 17:24:41 +0000 (18:24 +0100)]
ACPI: video: Add backlight=native DMI quirk for Asus U46E
The Asus U46E backlight tables have a set of interesting problems:
1. Its ACPI tables do make _OSI ("Windows 2012") checks, so
acpi_osi_is_win8() should return true.
But the tables have 2 sets of _OSI calls, one from the usual global
_INI method setting a global OSYS variable and a second set of _OSI
calls from a MSOS method and the MSOS method is the only one calling
_OSI ("Windows 2012").
The MSOS method only gets called in the following cases:
1. From some Asus specific WMI methods
2. From _DOD, which only runs after acpi_video_get_backlight_type()
has already been called by the i915 driver
3. From other ACPI video bus methods which never run (see below)
4. From some EC query callbacks
So when i915 calls acpi_video_get_backlight_type() MSOS has never run
and acpi_osi_is_win8() returns false, so acpi_video_get_backlight_type()
returns acpi_video as the desired backlight type, which causes
the intel_backlight device to not register.
2. _DOD effectively does this:
Return (Package (0x01)
{
0x0400
})
causing acpi_video_device_in_dod() to return false, which causes
the acpi_video backlight device to not register.
Leaving the user with no backlight device at all. Note that before 6.1.y
the i915 driver would register the intel_backlight device unconditionally
and since that then was the only backlight device userspace would use that.
Add a backlight=native DMI quirk for this special laptop to restore
the old (and working) behavior of the intel_backlight device registering.
Fixes: fb1836c91317 ("ACPI: video: Prefer native over vendor") Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
Hans de Goede [Thu, 19 Jan 2023 16:37:44 +0000 (17:37 +0100)]
ACPI: video: Add backlight=native DMI quirk for HP EliteBook 8460p
The HP EliteBook 8460p predates Windows 8, so it defaults to using
acpi_video# for backlight control.
Starting with the 6.1.y kernels the native radeon_bl0 backlight is hidden
in this case instead of relying on userspace preferring acpi_video# over
native backlight devices.
It turns out that for the acpi_video# interface to work on
the HP EliteBook 8460p, the brightness needs to be set at least once
through the native interface, which now no longer is done breaking
backlight control.
The native interface however always works without problems, so add
a quirk to use native backlight on the EliteBook 8460p to fix this.
Linus Torvalds [Mon, 23 Jan 2023 19:56:07 +0000 (11:56 -0800)]
Merge tag 'vfio-v6.2-rc6' of https://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- Honor reserved regions when testing for IOMMU find grained super page
support, avoiding a regression on s390 for a firmware device where
the existence of the mapping, even if unused can trigger an error
state. (Niklas Schnelle)
- Fix a deadlock in releasing KVM references by using the alternate
.release() rather than .destroy() callback for the kvm-vfio device.
(Yi Liu)
* tag 'vfio-v6.2-rc6' of https://github.com/awilliam/linux-vfio:
kvm/vfio: Fix potential deadlock on vfio group_lock
vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()
Linus Torvalds [Mon, 23 Jan 2023 19:46:19 +0000 (11:46 -0800)]
Merge tag 'efi-fixes-for-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"Another couple of EFI fixes, of which the first two were already in
-next when I sent out the previous PR, but they caused some issues on
non-EFI boots so I let them simmer for a bit longer.
- ensure the EFI ResetSystem and ACPI PRM calls are recognized as
users of the EFI runtime, and therefore protected against
exceptions
- account for the EFI runtime stack in the stacktrace code
- remove Matthew Garrett's MAINTAINERS entry for efivarfs"
* tag 'efi-fixes-for-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: Remove Matthew Garrett as efivarfs maintainer
arm64: efi: Account for the EFI runtime stack in stack unwinder
arm64: efi: Avoid workqueue to check whether EFI runtime is live
Lucas De Marchi [Thu, 19 Jan 2023 01:52:39 +0000 (17:52 -0800)]
drm/i915/mtl: Fix bcs default context
Commit 0d0e7d1eea9e ("drm/i915/mtl: Define engine context layouts")
added the engine context for Meteor Lake. In a second revision of the
patch it was believed the xcs offsets were wrong due to a tagging
issue in the spec. The first version was actually correct, as shown
by the intel_lrc_live_selftests/live_lrc_layout test:
gpiolib-acpi: Don't set GPIOs for wakeup in S3 mode
commit 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable")
adjusted the policy to enable wakeup by default if the ACPI tables
indicated that a device was wake capable.
It was reported however that this broke suspend on at least two System76
systems in S3 mode and two Lenovo Gen2a systems, but only with S3.
When the machines are set to s2idle, wakeup behaves properly.
Configuring the GPIOs for wakeup with S3 doesn't work properly, so only
set it when the system supports low power idle.
Jeff Layton [Fri, 20 Jan 2023 19:52:14 +0000 (14:52 -0500)]
nfsd: don't free files unconditionally in __nfsd_file_cache_purge
nfsd_file_cache_purge is called when the server is shutting down, in
which case, tearing things down is generally fine, but it also gets
called when the exports cache is flushed.
Instead of walking the cache and freeing everything unconditionally,
handle it the same as when we have a notification of conflicting access.
ARM: 9287/1: Reduce __thumb2__ definition to crypto files that require it
Commit 1d2e9b67b001 ("ARM: 9265/1: pass -march= only to compiler") added
a __thumb2__ define to ASFLAGS to avoid build errors in the crypto code,
which relies on __thumb2__ for preprocessing. Commit 59e2cf8d21e0 ("ARM:
9275/1: Drop '-mthumb' from AFLAGS_ISA") followed up on this by removing
-mthumb from AFLAGS so that __thumb2__ would not be defined when the
default target was ARMv7 or newer.
Unfortunately, the second commit's fix assumes that the toolchain
defaults to -mno-thumb / -marm, which is not the case for Debian's
arm-linux-gnueabihf target, which defaults to -mthumb:
This target is used by several CI systems, which will still see
redefined macro warnings, despite '-mthumb' not being present in the
flags:
<command-line>: warning: "__thumb2__" redefined
<built-in>: note: this is the location of the previous definition
Remove the global AFLAGS __thumb2__ define and move it to the crypto
folder where it is required by the imported OpenSSL algorithms; the rest
of the kernel should use the internal CONFIG_THUMB2_KERNEL symbol to
know whether or not Thumb2 is being used or not. Be sure that __thumb2__
is undefined first so that there are no macro redefinition warnings.
Jens Axboe [Sun, 22 Jan 2023 17:02:55 +0000 (10:02 -0700)]
io_uring/net: cache provided buffer group value for multishot receives
If we're using ring provided buffers with multishot receive, and we end
up doing an io-wq based issue at some points that also needs to select
a buffer, we'll lose the initially assigned buffer group as
io_ring_buffer_select() correctly clears the buffer group list as the
issue isn't serialized by the ctx uring_lock. This is fine for normal
receives as the request puts the buffer and finishes, but for multishot,
we will re-arm and do further receives. On the next trigger for this
multishot receive, the receive will try and pick from a buffer group
whose value is the same as the buffer ID of the las receive. That is
obviously incorrect, and will result in a premature -ENOUFS error for
the receive even if we had available buffers in the correct group.
Cache the buffer group value at prep time, so we can restore it for
future receives. This only needs doing for the above mentioned case, but
just do it by default to keep it easier to read.
Gergely Risko [Thu, 19 Jan 2023 13:40:41 +0000 (14:40 +0100)]
ipv6: fix reachability confirmation with proxy_ndp
When proxying IPv6 NDP requests, the adverts to the initial multicast
solicits are correct and working. On the other hand, when later a
reachability confirmation is requested (on unicast), no reply is sent.
This causes the neighbor entry expiring on the sending node, which is
mostly a non-issue, as a new multicast request is sent. There are
routers, where the multicast requests are intentionally delayed, and in
these environments the current implementation causes periodic packet
loss for the proxied endpoints.
The root cause is the erroneous decrease of the hop limit, as this
is checked in ndisc.c and no answer is generated when it's 254 instead
of the correct 255.
David S. Miller [Mon, 23 Jan 2023 10:58:12 +0000 (10:58 +0000)]
Merge branch 'ethtool-mac-merge'
Vladimir Oltean say:
====================
ethtool support for IEEE 802.3 MAC Merge layer
Change log
----------
v3->v4:
- add missing opening bracket in ocelot_port_mm_irq()
- moved cfg.verify_time range checking so that it actually takes place
for the updated rather than old value
v3 at:
https://patchwork.kernel.org/project/netdevbpf/cover/20230117085947.2176464[email protected]/
v2->v3:
- made get_mm return int instead of void
- deleted ETHTOOL_A_MM_SUPPORTED
- renamed ETHTOOL_A_MM_ADD_FRAG_SIZE to ETHTOOL_A_MM_TX_MIN_FRAG_SIZE
- introduced ETHTOOL_A_MM_RX_MIN_FRAG_SIZE
- cleaned up documentation
- rebased on top of PLCA changes
- renamed ETHTOOL_STATS_SRC_* to ETHTOOL_MAC_STATS_SRC_*
v2 at:
https://patchwork.kernel.org/project/netdevbpf/cover/20230111161706.1465242[email protected]/
v1->v2:
I've decided to focus just on the MAC Merge layer for now, which is why
I am able to submit this patch set as non-RFC.
v1 (RFC) at:
https://patchwork.kernel.org/project/netdevbpf/cover/20220816222920.1952936[email protected]/
What is being introduced
------------------------
TL;DR: a MAC Merge layer as defined by IEEE 802.3-2018, clause 99
(interspersing of express traffic). This is controlled through ethtool
netlink (ETHTOOL_MSG_MM_GET, ETHTOOL_MSG_MM_SET). The raw ethtool
commands are posted here:
https://patchwork.kernel.org/project/netdevbpf/cover/20230111153638.1454687[email protected]/
The MAC Merge layer has its own statistics counters
(ethtool --include-statistics --show-mm swp0) as well as two member
MACs, the statistics of which can be queried individually, through a new
ethtool netlink attribute, corresponding to:
The core properties of the MAC Merge layer are described in great detail
in patches 02/12 and 03/12. They can be viewed in "make htmldocs" format.
Devices for which the API is supported
--------------------------------------
I decided to start with the Ethernet switch on NXP LS1028A (Felix)
because of the smaller patch set. I also have support for the ENETC
controller pending.
I would like to get confirmation that the UAPI being proposed here will
not restrict any use cases known by other hardware vendors.
Why is support for preemptible traffic classes not here?
--------------------------------------------------------
There is legitimate concern whether the 802.1Q portion of the standard
(which traffic classes go to the eMAC and which to the pMAC) should be
modeled in Linux using tc or using another UAPI. I think that is
stalling the entire series, but should be discussed separately instead.
Removing FP adminStatus support makes me confident enough to submit this
patch set without an RFC tag (meaning: I wouldn't mind if it was merged
as is).
What is submitted here is sufficient for an LLDP daemon to do its job.
I've patched openlldp to advertise and configure frame preemption:
https://github.com/vladimiroltean/openlldp/tree/frame-preemption-v3
In case someone wants to try it out, here are some commands I've used.
# Configure the interfaces to receive and transmit LLDP Data Units
lldptool -L -i eno0 adminStatus=rxtx
lldptool -L -i swp0 adminStatus=rxtx
# Enable the transmission of certain TLVs on switch's interface
lldptool -T -i eno0 -V addEthCap enableTx=yes
lldptool -T -i swp0 -V addEthCap enableTx=yes
# Query LLDP statistics on switch's interface
lldptool -S -i swp0
# Query the received neighbor TLVs
lldptool -i swp0 -t -n -V addEthCap
Additional Ethernet Capabilities TLV
Preemption capability supported
Preemption capability enabled
Preemption capability active
Additional fragment size: 60 octets
So using this patch set, lldpad will be able to advertise and configure
frame preemption, but still, no data packet will be sent as preemptible
over the link, because there is no UAPI to control which traffic classes
are sent as preemptible and which as express.
Preemptable or preemptible?
---------------------------
IEEE 802.3 uses "preemptable" throughout. IEEE 802.1Q uses "preemptible"
throughout. Because the definition of "preemptible" falls under 802.1Q's
jurisdiction and 802.3 just references it, I went with the 802.1Q naming
even where supporting an 802.3 feature. Also, checkpatch agrees with this.
====================
Due to the fact that the kernel-side data structures have been carried
over from the ioctl-based ethtool, we are now in the situation where we
have an ethnl_update_bool32() function, but the plain function that
operates on a boolean value kept in an actual u8 netlink attribute
doesn't exist.
With new ethtool features that are exposed solely over netlink, the
kernel data structures will use the "bool" type, so we will need this
kind of helper. Introduce it now; it's needed for things like
verify-disabled for the MAC merge configuration.
Wei Fang [Thu, 19 Jan 2023 04:37:47 +0000 (12:37 +0800)]
net: fec: Use page_pool_put_full_page when freeing rx buffers
The page_pool_release_page was used when freeing rx buffers, and this
function just unmaps the page (if mapped) and does not recycle the page.
So after hundreds of down/up the eth0, the system will out of memory.
For more details, please refer to the following reproduce steps and
bug logs. To solve this issue and refer to the doc of page pool, the
page_pool_put_full_page should be used to replace page_pool_release_page.
Because this API will try to recycle the page if the page refcnt equal to
1. After testing 20000 times, the issue can not be reproduced anymore
(about testing 391 times the issue will occur on i.MX8MN-EVK before).
Reproduce steps:
Create the test script and run the script. The script content is as
follows:
LOOPS=20000
i=1
while [ $i -le $LOOPS ]
do
echo "TINFO:ENET $curface up and down test $i times"
org_macaddr=$(cat /sys/class/net/eth0/address)
ifconfig eth0 down
ifconfig eth0 hw ether $org_macaddr up
i=$(expr $i + 1)
done
sleep 5
if cat /sys/class/net/eth0/operstate | grep 'up';then
echo "TEST PASS"
else
echo "TEST FAIL"
fi
Linus Torvalds [Sun, 22 Jan 2023 20:14:58 +0000 (12:14 -0800)]
Merge tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Make sure the scheduler doesn't use stale frequency scaling values
when latter get disabled due to a value error
- Fix a NULL pointer access on UP configs
- Use the proper locking when updating CPU capacity
* tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs
sched/fair: Fixes for capacity inversion detection
sched/uclamp: Fix a uninitialized variable warnings
Linus Torvalds [Sun, 22 Jan 2023 20:10:47 +0000 (12:10 -0800)]
Merge tag 'edac_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Respect user-supplied polling value in the EDAC device code
- Fix a use-after-free issue in qcom_edac
* tag 'edac_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_info
EDAC/device: Respect any driver-supplied workqueue polling value
Linus Torvalds [Sun, 22 Jan 2023 19:56:33 +0000 (11:56 -0800)]
Merge tag 'gfs2-v6.2-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 writepage fix from Andreas Gruenbacher:
- Fix a regression introduced by commit "gfs2: stop using
generic_writepages in gfs2_ail1_start_one".
* tag 'gfs2-v6.2-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"
Vipin Sharma [Wed, 11 Jan 2023 18:34:08 +0000 (10:34 -0800)]
KVM: selftests: Make reclaim_period_ms input always be positive
reclaim_period_ms used to be positive only but the commit 0001725d0f9b
("KVM: selftests: Add atoi_positive() and atoi_non_negative() for input
validation") incorrectly changed it to non-negative validation.
KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
When serializing and deserializing kvm_sregs, attributes of the segment
descriptors are stored by user space. For unusable segments,
vmx_segment_access_rights skips all attributes and sets them to 0.
This means we zero out the DPL (Descriptor Privilege Level) for unusable
entries.
Unusable segments are - contrary to their name - usable in 64bit mode and
are used by guests to for example create a linear map through the
NULL selector.
VMENTER checks if SS.DPL is correct depending on the CS segment type.
For types 9 (Execute Only) and 11 (Execute Read), CS.DPL must be equal to
SS.DPL [1].
We have seen real world guests setting CS to a usable segment with DPL=3
and SS to an unusable segment with DPL=3. Once we go through an sregs
get/set cycle, SS.DPL turns to 0. This causes the virtual machine to crash
reproducibly.
This commit changes the attribute logic to always preserve attributes for
unusable segments. According to [2] SS.DPL is always saved on VM exits,
regardless of the unusable bit so user space applications should have saved
the information on serialization correctly.
[3] specifies that besides SS.DPL the rest of the attributes of the
descriptors are undefined after VM entry if unusable bit is set. So, there
should be no harm in setting them all to the previous state.
Paolo Bonzini [Sun, 22 Jan 2023 09:04:50 +0000 (04:04 -0500)]
selftests: kvm: move declaration at the beginning of main()
Placing a declaration of evt_reset is pedantically invalid
according to the C standard. While GCC does not really care
and only warns with -Wpedantic, clang ignores the declaration
altogether with an error:
Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"
Commit b2b0a5e97855 switched from generic_writepages() to
filemap_fdatawrite_wbc() in gfs2_ail1_start_one() on the path to
replacing ->writepage() with ->writepages() and eventually eliminating
the former. Function gfs2_ail1_start_one() is called from
gfs2_log_flush(), our main function for flushing the filesystem log.
Unfortunately, at least as implemented today, ->writepage() and
->writepages() are entirely different operations for journaled data
inodes: while the former creates and submits transactions covering the
data to be written, the latter flushes dirty buffers out to disk.
With gfs2_ail1_start_one() now calling ->writepages(), we end up
creating filesystem transactions while we are in the course of a log
flush, which immediately deadlocks on the sdp->sd_log_flush_lock
semaphore.
Work around that by going back to how things used to work before commit b2b0a5e97855 for now; figuring out a superior solution will take time we
don't have available right now. However ...
Since the removal of generic_writepages() is imminent, open-code it
here. We're already inside a blk_start_plug() ... blk_finish_plug()
section here, so skip that part of the original generic_writepages().
Linus Torvalds [Sun, 22 Jan 2023 00:21:56 +0000 (16:21 -0800)]
Merge tag 'io_uring-6.2-2023-01-21' of git://git.kernel.dk/linux
Pull another io_uring fix from Jens Axboe:
"Just a single fix for a regression that happened in this release due
to a poll change. Normally I would've just deferred it to next week,
but since the original fix got picked up by stable, I think it's
better to just send this one off separately.
The issue is around the poll race fix, and how it mistakenly also got
applied to multishot polling. Those don't need the race fix, and we
should not be doing any reissues for that case. Exhaustive test cases
were written and committed to the liburing regression suite for the
reported issue, and additions for similar issues"
* tag 'io_uring-6.2-2023-01-21' of git://git.kernel.dk/linux:
io_uring/poll: don't reissue in case of poll race on multishot request
Linus Torvalds [Sat, 21 Jan 2023 19:20:55 +0000 (11:20 -0800)]
Merge tag 'char-misc-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc and other subsystem driver fixes for
6.2-rc5 to resolve a few reported issues. They include:
- long time pending fastrpc fixes (should have gone into 6.1, my
fault)
- mei driver/bus fixes and new device ids
- interconnect driver fixes for reported problems
- vmci bugfix
- w1 driver bugfixes for reported problems
Almost all of these have been in linux-next with no reported problems,
the rest have all passed 0-day bot testing in my tree and on the
mailing lists where they have sat too long due to me taking a long
time to catch up on my pending patch queue"
* tag 'char-misc-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
VMCI: Use threaded irqs instead of tasklets
misc: fastrpc: Pass bitfield into qcom_scm_assign_mem
gsmi: fix null-deref in gsmi_get_variable
misc: fastrpc: Fix use-after-free race condition for maps
misc: fastrpc: Don't remove map on creater_process and device_release
misc: fastrpc: Fix use-after-free and race in fastrpc_map_find
misc: fastrpc: fix error code in fastrpc_req_mmap()
mei: me: add meteor lake point M DID
mei: bus: fix unlink on bus in error path
w1: fix WARNING after calling w1_process()
w1: fix deadloop in __w1_remove_master_device()
comedi: adv_pci1760: Fix PWM instruction handling
interconnect: qcom: rpm: Use _optional func for provider clocks
interconnect: qcom: msm8996: Fix regmap max_register values
interconnect: qcom: msm8996: Provide UFS clocks to A2NoC
dt-bindings: interconnect: Add UFS clocks to MSM8996 A2NoC
Linus Torvalds [Sat, 21 Jan 2023 19:17:23 +0000 (11:17 -0800)]
Merge tag 'driver-core-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are three small driver and kernel core fixes for 6.2-rc5. They
include:
- potential gadget fixup in do_prlimit
- device property refcount leak fix
- test_async_probe bugfix for reported problem"
* tag 'driver-core-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
prlimit: do_prlimit needs to have a speculation check
driver core: Fix test_async_probe_init saves device in wrong array
device property: fix of node refcount leak in fwnode_graph_get_next_endpoint()
Linus Torvalds [Sat, 21 Jan 2023 19:15:21 +0000 (11:15 -0800)]
Merge tag 'staging-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fix from Greg KH:
"Here is a single staging driver fix for 6.2-rc5. It resolves a build
issue reported and Fixed by Arnd in the vc04_services driver. It's
been in linux-next this week with no reported problems"
* tag 'staging-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vchiq_arm: fix enum vchiq_status return types
Linus Torvalds [Sat, 21 Jan 2023 19:12:42 +0000 (11:12 -0800)]
Merge tag 'tty-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.2-rc5 that
resolve a number of tiny reported issues and some new device ids. They
include:
- new device id for the exar serial driver
- speakup tty driver bugfix
- atmel serial driver baudrate fixup
- stm32 serial driver bugfix and then revert as the bugfix broke the
build. That will come back in a later pull request once it is all
worked out properly.
- amba-pl011 serial driver rs486 mode bugfix
- qcom_geni serial driver bugfix
Most of these have been in linux-next with no reported problems (well,
other than the build breakage which generated the revert), the new
device id passed 0-day testing"
* tag 'tty-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: exar: Add support for Sealevel 7xxxC serial cards
Revert "serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler"
tty: serial: qcom_geni: avoid duplicate struct member init
serial: atmel: fix incorrect baudrate setup
tty: fix possible null-ptr-defer in spk_ttyio_release
serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler
serial: amba-pl011: fix high priority character transmission in rs486 mode
serial: pch_uart: Pass correct sg to dma_unmap_sg()
tty: serial: qcom-geni-serial: fix slab-out-of-bounds on RX FIFO buffer
Linus Torvalds [Sat, 21 Jan 2023 19:10:03 +0000 (11:10 -0800)]
Merge tag 'usb-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are a number of small USB and Thunderbolt driver fixes and new
device id changes for 6.2-rc5. Included in here are:
- thunderbolt bugfixes for reported problems
- new usb-serial driver ids added
- onboard_hub usb driver fixes for much-reported problems
- xhci bugfixes
- typec bugfixes
- ehci-fsl driver module alias fix
- iowarrior header size fix
- usb gadget driver fixes
All of these, except for the iowarrior fix, have been in linux-next
with no reported issues. The iowarrior fix passed the 0-day testing
and is a one digit change based on a reported problem in the driver
(which was written to a spec, not the real device that is now
available)"
* tag 'usb-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (40 commits)
USB: misc: iowarrior: fix up header size for USB_DEVICE_ID_CODEMERCS_IOW100
usb: host: ehci-fsl: Fix module alias
usb: dwc3: fix extcon dependency
usb: core: hub: disable autosuspend for TI TUSB8041
USB: fix misleading usb_set_intfdata() kernel doc
usb: gadget: f_ncm: fix potential NULL ptr deref in ncm_bitrate()
USB: gadget: Add ID numbers to configfs-gadget driver names
usb: typec: tcpm: Fix altmode re-registration causes sysfs create fail
usb: gadget: g_webcam: Send color matching descriptor per frame
usb: typec: altmodes/displayport: Use proper macro for pin assignment check
usb: typec: altmodes/displayport: Fix pin assignment calculation
usb: typec: altmodes/displayport: Add pin assignment helper
usb: gadget: f_fs: Ensure ep0req is dequeued before free_request
usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait
usb: misc: onboard_hub: Move 'attach' work to the driver
usb: misc: onboard_hub: Invert driver registration order
usb: ucsi: Ensure connector delayed work items are flushed
usb: musb: fix error return code in omap2430_probe()
usb: chipidea: core: fix possible constant 0 if use IS_ERR(ci->role_switch)
xhci: Detect lpm incapable xHC USB3 roothub ports from ACPI tables
...
Linus Torvalds [Sat, 21 Jan 2023 18:56:37 +0000 (10:56 -0800)]
Merge tag 'kbuild-fixes-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Hide LDFLAGS_vmlinux from decompressor Makefiles to fix error
messages when GNU Make 4.4 is used.
- Fix 'make modules' build error when CONFIG_DEBUG_INFO_BTF_MODULES=y.
- Fix warnings emitted by GNU Make 4.4 in scripts/kconfig/Makefile.
- Support GNU Make 4.4 for scripts/jobserver-exec.
- Show clearer error message when kernel/gen_kheaders.sh fails due to
missing cpio.
* tag 'kbuild-fixes-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kheaders: explicitly validate existence of cpio command
scripts: support GNU make 4.4 in jobserver-exec
kconfig: Update all declared targets
scripts: rpm: make clear that mkspec script contains 4.13 feature
init/Kconfig: fix LOCALVERSION_AUTO help text
kbuild: fix 'make modules' error when CONFIG_DEBUG_INFO_BTF_MODULES=y
kbuild: export top-level LDFLAGS_vmlinux only to scripts/Makefile.vmlinux
init/version-timestamp.c: remove unneeded #include <linux/version.h>
docs: kbuild: remove mention to dropped $(objtree) feature
Linus Torvalds [Wed, 18 Jan 2023 04:27:23 +0000 (20:27 -0800)]
ext4: deal with legacy signed xattr name hash values
We potentially have old hashes of the xattr names generated on systems
with signed 'char' types. Now that everybody uses '-funsigned-char',
those hashes will no longer match.
This only happens if you use xattrs names that have the high bit set,
which probably doesn't happen in practice, but the xfstest generic/454
shows it.
Instead of adding a new "signed xattr hash filesystem" bit and having to
deal with all the possible combinations, just calculate the hash both
ways if the first one fails, and always generate new hashes with the
proper unsigned char version.
prlimit: do_prlimit needs to have a speculation check
do_prlimit() adds the user-controlled resource value to a pointer that
will subsequently be dereferenced. In order to help prevent this
codepath from being used as a spectre "gadget" a barrier needs to be
added after checking the range.
Marc Zyngier [Thu, 19 Jan 2023 11:07:59 +0000 (11:07 +0000)]
KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
To save the vgic LPI pending state with GICv4.1, the VPEs must all be
unmapped from the ITSs so that the sGIC caches can be flushed.
The opposite is done once the state is saved.
This is all done by using the activate/deactivate irqdomain callbacks
directly from the vgic code. Crutially, this is done without holding
the irqdesc lock for the interrupts that represent the VPE. And these
callbacks are changing the state of the irqdesc. What could possibly
go wrong?
If a doorbell fires while we are messing with the irqdesc state,
it will acquire the lock and change the interrupt state concurrently.
Since we don't hole the lock, curruption occurs in on the interrupt
state. Oh well.
While acquiring the lock would fix this (and this was Shanker's
initial approach), this is still a layering violation we could do
without. A better approach is actually to free the VPE interrupt,
do what we have to do, and re-request it.
It is more work, but this usually happens only once in the lifetime
of the VM and we don't really care about this sort of overhead.
Catalin Marinas [Thu, 19 Jan 2023 17:09:02 +0000 (17:09 +0000)]
KVM: arm64: Pass the actual page address to mte_clear_page_tags()
Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag
initialisation") added a call to mte_clear_page_tags() in case a
prior mte_copy_tags_from_user() failed in order to avoid stale tags in
the guest page (it should have really been a separate commit).
Unfortunately, the argument passed to this function was the address of
the struct page rather than the actual page address. Fix this function
call.
Paolo Abeni [Thu, 19 Jan 2023 18:55:45 +0000 (19:55 +0100)]
net: fix UaF in netns ops registration error path
If net_assign_generic() fails, the current error path in ops_init() tries
to clear the gen pointer slot. Anyway, in such error path, the gen pointer
itself has not been modified yet, and the existing and accessed one is
smaller than the accessed index, causing an out-of-bounds error:
BUG: KASAN: slab-out-of-bounds in ops_init+0x2de/0x320
Write of size 8 at addr ffff888109124978 by task modprobe/1018
Eric Dumazet [Thu, 19 Jan 2023 11:01:50 +0000 (11:01 +0000)]
netlink: prevent potential spectre v1 gadgets
Most netlink attributes are parsed and validated from
__nla_validate_parse() or validate_nla()
u16 type = nla_type(nla);
if (type == 0 || type > maxtype) {
/* error or continue */
}
@type is then used as an array index and can be used
as a Spectre v1 gadget.
array_index_nospec() can be used to prevent leaking
content of kernel memory to malicious users.
This should take care of vast majority of netlink uses,
but an audit is needed to take care of others where
validation is not yet centralized in core netlink functions.
Linus Torvalds [Sat, 21 Jan 2023 01:13:55 +0000 (17:13 -0800)]
Merge tag 'gpio-fixes-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix a potential race condition and always set GPIOs used as interrupt
source to input in gpio-mxc
- fix a GPIO ACPI-related issue with system suspend on Clevo NL5xRU
* tag 'gpio-fixes-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: acpi: Add a ignore wakeup quirk for Clevo NL5xRU
gpiolib: acpi: Allow ignoring wake capability on pins that aren't in _AEI
gpio: mxc: Always set GPIOs used as interrupt source to INPUT mode
gpio: mxc: Protect GPIO irqchip RMW with bgpio spinlock
Linus Torvalds [Fri, 20 Jan 2023 22:28:49 +0000 (14:28 -0800)]
Merge tag '6.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
- important fix for packet signature calculation error
- three fixes to correct DFS deadlock, and DFS refresh problem
- remove an unused DFS function, and duplicate tcon refresh code
- DFS cache lookup fix
- uninitialized rc fix
* tag '6.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: remove unused function
cifs: do not include page data when checking signature
cifs: fix return of uninitialized rc in dfs_cache_update_tgthint()
cifs: handle cache lookup errors different than -ENOENT
cifs: remove duplicate code in __refresh_tcon()
cifs: don't take exclusive lock for updating target hints
cifs: avoid re-lookups in dfs_cache_find()
cifs: fix potential deadlock in cache_refresh_path()
Linus Torvalds [Fri, 20 Jan 2023 22:22:56 +0000 (14:22 -0800)]
Merge tag 'pinctrl-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Compilation fix for Sunplus sp7021
- Add some missing headers after a cleanup to the Nomadik driver
- Fix pull type and mux routes on Rockchip RK3568
* tag 'pinctrl-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: rockchip: fix mux route data for rk3568
pinctrl: rockchip: fix reading pull type on rk3568
pinctrl: nomadik: Add missing header(s)
pinctrl: sp7021: fix unused function warning
Jens Axboe [Fri, 20 Jan 2023 22:08:21 +0000 (15:08 -0700)]
io_uring/poll: don't reissue in case of poll race on multishot request
A previous commit fixed a poll race that can occur, but it's only
applicable for multishot requests. For a multishot request, we can safely
ignore a spurious wakeup, as we never leave the waitqueue to begin with.
A blunt reissue of a multishot armed request can cause us to leak a
buffer, if they are ring provided. While this seems like a bug in itself,
it's not really defined behavior to reissue a multishot request directly.
It's less efficient to do so as well, and not required to rearm anything
like it is for singleshot poll requests.
ksmbd: do not sign response to session request for guest login
If ksmbd.mountd is configured to assign unknown users to the guest account
("map to guest = bad user" in the config), ksmbd signs the response.
This is wrong according to MS-SMB2 3.3.5.5.3:
12. If the SMB2_SESSION_FLAG_IS_GUEST bit is not set in the SessionFlags
field, and Session.IsAnonymous is FALSE, the server MUST sign the
final session setup response before sending it to the client, as
follows:
[...]
This fixes libsmb2 based applications failing to establish a session
("Wrong signature in received").
Linus Torvalds [Fri, 20 Jan 2023 20:44:41 +0000 (12:44 -0800)]
Merge tag 'block-6.2-2023-01-20' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Various little tweaks all over the place:
- NVMe pull request via Christoph:
- fix controller shutdown regression in nvme-apple (Janne Grunau)
- fix a polling on timeout regression in nvme-pci (Keith Busch)
- Fix a bug in the read request side request allocation caching
(Pavel)
- pktcdvd was brought back after we configured a NULL return on bio
splits, make it consistent with the others (me)
- BFQ refcount fix (Yu)
- Block cgroup policy activation fix (Yu)
- Fix for an md regression introduced in the 6.2 cycle (Adrian)"
* tag 'block-6.2-2023-01-20' of git://git.kernel.dk/linux:
nvme-pci: fix timeout request state check
nvme-apple: only reset the controller when RTKit is running
nvme-apple: reset controller during shutdown
block: fix hctx checks for batch allocation
block/rnbd-clt: fix wrong max ID in ida_alloc_max
blk-cgroup: fix missing pd_online_fn() while activating policy
pktcdvd: check for NULL returna fter calling bio_split_to_limits()
block, bfq: switch 'bfqg->ref' to use atomic refcount apis
md: fix incorrect declaration about claim_rdev in md_import_device
Linus Torvalds [Fri, 20 Jan 2023 20:39:45 +0000 (12:39 -0800)]
Merge tag 'io_uring-6.2-2023-01-20' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"Fixes for the MSG_RING opcode. Nothing really major:
- Fix an overflow missing serialization around posting CQEs to the
target ring (me)
- Disable MSG_RING on a ring that isn't enabled yet. There's nothing
really wrong with allowing it, but 1) it's somewhat odd as nobody
can receive them yet, and 2) it means that using the right delivery
mechanism might change. As nobody should be sending CQEs to a ring
that isn't enabled yet, let's just disable it (Pavel)
- Tweak to when we decide to post remotely or not for MSG_RING
(Pavel)"
* tag 'io_uring-6.2-2023-01-20' of git://git.kernel.dk/linux:
io_uring/msg_ring: fix remote queue to disabled ring
io_uring/msg_ring: fix flagging remote execution
io_uring/msg_ring: fix missing lock on overflow for IOPOLL
io_uring/msg_ring: move double lock/unlock helpers higher up
Linus Torvalds [Fri, 20 Jan 2023 19:59:01 +0000 (11:59 -0800)]
Merge tag 'for-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix potential out-of-bounds access to leaf data when seeking in an
inline file
- fix potential crash in quota when rescan races with disable
- reimplement super block signature scratching by marking page/folio
dirty and syncing block device, allow removing write_one_page
* tag 'for-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix race between quota rescan and disable leading to NULL pointer deref
btrfs: fix invalid leaf access due to inline extent during lseek
btrfs: stop using write_one_page in btrfs_scratch_superblock
btrfs: factor out scratching of one regular super block
Linus Torvalds [Fri, 20 Jan 2023 19:35:21 +0000 (11:35 -0800)]
Merge tag 'linux-kselftest-fixes-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fix from Shuah Khan:
"Fix an error seen during unconfigured LLVM builds"
* tag 'linux-kselftest-fixes-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest: Fix error message for unconfigured LLVM builds
Linus Torvalds [Fri, 20 Jan 2023 19:14:41 +0000 (11:14 -0800)]
Merge tag 'thermal-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Modify __thermal_cooling_device_register() to make it call
put_device() after invoking device_register() and fix up a few error
paths calling thermal_cooling_device_destroy_sysfs() unnecessarily
(Viresh Kumar)"
* tag 'thermal-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: call put_device() only after device_register() fails
Linus Torvalds [Fri, 20 Jan 2023 19:11:35 +0000 (11:11 -0800)]
Merge tag 'acpi-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These update the ACPICA entry in MAINTAINERS, add a backlight handling
quirk and fix the ACPI PRM (platform runtime) mechanism support.
Specifics:
- Update the ACPICA development list address in MAINTAINERS to the
new one that does not bounce (Rafael Wysocki)
- Check whether EFI runtime is available when registering the ACPI
PRM address space handler and when running it (Ard Biesheuvel)
- Add backlight=native DMI quirk for Acer Aspire 4810T to the ACPI
video driver (Hans de Goede)"
* tag 'acpi-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PRM: Check whether EFI runtime is available
ACPI: video: Add backlight=native DMI quirk for Acer Aspire 4810T
MAINTAINERS: Update the ACPICA development list address
Linus Torvalds [Fri, 20 Jan 2023 19:04:59 +0000 (11:04 -0800)]
Merge tag 'mmc-v6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- sunxi-mmc: Fix clock refcount imbalance during unbind
- sdhci-esdhc-imx: Fix some tuning settings
* tag 'mmc-v6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sunxi-mmc: Fix clock refcount imbalance during unbind
mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting
EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_info
The memory for llcc_driv_data is allocated by the LLCC driver. But when
it is passed as the private driver info to the EDAC core, it will get freed
during the qcom_edac driver release. So when the qcom_edac driver gets probed
again, it will try to use the freed data leading to the use-after-free bug.
Hence, do not pass llcc_driv_data as pvt_info but rather reference it
using the platform_data pointer in the qcom_edac driver.
amdgpu:
- Fix display scaling
- Fix RN/CZN power reporting on some firmware versions
- Colorspace fixes
- Fix resource freeing in error case in CS IOCTL
- Fix warning on driver unload
- GC11 fixes
- DCN 3.1.4/5 S/G display workarounds"
* tag 'drm-fixes-2023-01-20' of git://anongit.freedesktop.org/drm/drm: (24 commits)
drm/amd/display: disable S/G display on DCN 3.1.4
drm/amd/display: disable S/G display on DCN 3.1.5
drm/amdgpu: allow multipipe policy on ASICs with one MEC
drm/amdgpu: correct MEC number for gfx11 APUs
drm/amd/display: fix issues with driver unload
drm/amdgpu: fix amdgpu_job_free_resources v2
drm/amd/display: Fix COLOR_SPACE_YCBCR2020_TYPE matrix
drm/amd/display: Calculate output_color_space after pixel encoding adjustment
drm/amdgpu: fix cleaning up reserved VMID on release
drm/amdgpu: Correct the power calcultion for Renior/Cezanne.
drm/amd/display: Fix set scaling doesn's work
drm/i915: Remove unused variable
drm/i915/dg2: Introduce Wa_18019271663
drm/i915/dg2: Introduce Wa_18018764978
drm/fb-helper: Set framebuffer for vga-switcheroo clients
drm/i915: Allow switching away via vga-switcheroo if uninitialized
drm/i915/selftests: Unwind hugepages to drop wakeref on error
drm/i915: re-disable RC6p on Sandy Bridge
drm/panfrost: fix GENERIC_ATOMIC64 dependency
drm/i915/display: Check source height is > 0
...
Linus Torvalds [Fri, 20 Jan 2023 18:23:14 +0000 (10:23 -0800)]
Merge tag 'dmaengine-fix-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
- email address Update for Jie Hai
- fix double increment of client_count in dma_chan_get()
- idxd driver fixes: use after free, probe error handling and callback
on wq disable
- fix for qcom gpi driver GO tre
- ptdma locking fix
- tegra & imx-sdma mem leak fix
* tag 'dmaengine-fix-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
ptdma: pt_core_execute_cmd() should use spinlock
dmaengine: tegra: Fix memory leak in terminate_all()
dmaengine: xilinx_dma: call of_node_put() when breaking out of for_each_child_of_node()
dmaengine: imx-sdma: Fix a possible memory leak in sdma_transfer_init
dmaengine: Fix double increment of client_count in dma_chan_get()
dmaengine: tegra210-adma: fix global intr clear
Add exception protection processing for vd in axi_chan_handle_err function
dmaengine: lgm: Move DT parsing after initialization
MAINTAINERS: update Jie Hai's email address
dmaengine: ti: k3-udma: Do conditional decrement of UDMA_CHAN_RT_PEER_BCNT_REG
dmaengine: idxd: Do not call DMX TX callbacks during workqueue disable
dmaengine: idxd: Prevent use after free on completion memory
dmaengine: idxd: Let probe fail when workqueue cannot be enabled
dmaengine: qcom: gpi: Set link_rx bit on GO TRE for rx operation
Linus Torvalds [Fri, 20 Jan 2023 17:58:44 +0000 (09:58 -0800)]
Merge tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, bluetooth, bpf and netfilter.
Current release - regressions:
- Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
addrconf", fix nsna_ping mode of team
- wifi: mt76: fix bugs in Rx queue handling and DMA mapping
- eth: mlx5:
- add missing mutex_unlock in error reporter
- protect global IPsec ASO with a lock
Current release - new code bugs:
- rxrpc: fix wrong error return in rxrpc_connect_call()
Previous releases - regressions:
- bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2
- wifi:
- mac80211: fix crashes on Rx due to incorrect initialization of
rx->link and rx->link_sta
- mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
aggregation handling, crashes
- brcmfmac: fix regression for Broadcom PCIe wifi devices
- rndis_wlan: prevent buffer overflow in rndis_query_oid
- netfilter: conntrack: handle tcp challenge acks during connection
reuse
- sched: avoid grafting on htb_destroy_class_offload when destroying
- virtio-net: correctly enable callback during start_xmit, fix stalls
- tcp: avoid the lookup process failing to get sk in ehash table
- ipa: disable ipa interrupt during suspend
- eth: stmmac: enable all safety features by default
Previous releases - always broken:
- bpf:
- fix pointer-leak due to insufficient speculative store bypass
mitigation (Spectre v4)
- skip task with pid=1 in send_signal_common() to avoid a splat
- fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
PERF_BPF_EVENT_PROG_UNLOAD events
- fix potential deadlock in htab_lock_bucket from same bucket
index but different map_locked index
- bluetooth:
- fix a buffer overflow in mgmt_mesh_add()
- hci_qca: fix driver shutdown on closed serdev
- ISO: fix possible circular locking dependency
- CIS: hci_event: fix invalid wait context
- wifi: brcmfmac: fixes for survey dump handling
- mptcp: explicitly specify sock family at subflow creation time
- netfilter: nft_payload: incorrect arithmetics when fetching VLAN
header bits
- tcp: fix rate_app_limited to default to 1
- l2tp: close all race conditions in l2tp_tunnel_register()
- eth: mlx5: fixes for QoS config and eswitch configuration
- eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()
- eth: stmmac: fix invalid call to mdiobus_get_phy()
Misc:
- ethtool: add netlink attr in rss get reply only if the value is not
empty"
* tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
Revert "Merge branch 'octeontx2-af-CPT'"
tcp: fix rate_app_limited to default to 1
bnxt: Do not read past the end of test names
net: stmmac: enable all safety features by default
octeontx2-af: add mbox to return CPT_AF_FLT_INT info
octeontx2-af: update cpt lf alloc mailbox
octeontx2-af: restore rxc conf after teardown sequence
octeontx2-af: optimize cpt pf identification
octeontx2-af: modify FLR sequence for CPT
octeontx2-af: add mbox for CPT LF reset
octeontx2-af: recover CPT engine when it gets fault
net: dsa: microchip: ksz9477: port map correction in ALU table entry register
selftests/net: toeplitz: fix race on tpacket_v3 block close
net/ulp: use consistent error code when blocking ULP
octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
tcp: avoid the lookup process failing to get sk in ehash table
Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf"
MAINTAINERS: add networking entries for Willem
net: sched: gred: prevent races when adding offloads to stats
l2tp: prevent lockdep issue in l2tp_tunnel_register()
...