When IO requests are made continuously and the target block device
handles requests faster than request arrival, the request dispatch loop
keeps on repeating to dispatch the arriving requests very long time,
more than a minute. Since the loop runs as a workqueue worker task, the
very long loop duration triggers workqueue watchdog timeout and BUG [1].
To avoid the very long loop duration, break the loop periodically. When
opportunity to dispatch requests still exists, check need_resched(). If
need_resched() returns true, the dispatch loop already consumed its time
slice, then reschedule the dispatch work and break the loop. With heavy
IO load, need_resched() does not return true for 20~30 seconds. To cover
such case, check time spent in the dispatch loop with jiffies. If more
than 1 second is spent, reschedule the dispatch work and break the loop.
Workloads using provided buffers benefit from using and returning buffers
in the right order, and so does TLBs for that matter. Manage the internal
buffer list in a straight list, rather than use the head buffer as the
insertion node. Use a hashed list for the buffer group IDs instead of
xarray, the overhead is much lower this way. xarray provides internal
locking and other trickery that is handy for some uses cases, but
io_uring already locks internally for the buffer manipulation and needs
none of that.
This is good for about a 2% reduction in overhead, combination of the
improved management and the fact that the workload has an easier time
bundling back provided buffers.
Linus Torvalds [Thu, 17 Mar 2022 19:40:59 +0000 (12:40 -0700)]
Merge tag 'acpi-5.17-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Revert recent commit that caused multiple systems to misbehave due to
firmware issues"
* tag 'acpi-5.17-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: scan: Do not add device IDs from _CID if _HID is not valid"
Rework to add the header files to LOCAL_HDRS before including ../lib.mk,
since the dependency is evaluated in '$(OUTPUT)/%:%.c $(LOCAL_HDRS)' in
file lib.mk.
Joseph Qi [Wed, 16 Mar 2022 23:15:09 +0000 (16:15 -0700)]
ocfs2: fix crash when initialize filecheck kobj fails
Once s_root is set, genric_shutdown_super() will be called if
fill_super() fails. That means, we will call ocfs2_dismount_volume()
twice in such case, which can lead to kernel crash.
Fix this issue by initializing filecheck kobj before setting s_root.
Qian Cai [Wed, 16 Mar 2022 23:15:06 +0000 (16:15 -0700)]
configs/debug: restore DEBUG_INFO=y for overriding
Previously, I failed to realize that Kees' patch [1] has not been merged
into the mainline yet, and dropped DEBUG_INFO=y too eagerly from the
mainline. As the results, "make debug.config" won't be able to flip
DEBUG_INFO=n from the existing .config. This should close the gaps of a
few weeks before Kees' patch is there, and work regardless of their
merging status anyway.
The reason for the livelock is that swapcache_prepare() always returns
EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so that it
cannot jump out of the loop. We suspect that the task that clears the
SWAP_HAS_CACHE flag never gets a chance to run. We try to lower the
priority of the task stuck in a livelock so that the task that clears
the SWAP_HAS_CACHE flag will run. The results show that the system
returns to normal after the priority is lowered.
In our testing, multiple real-time tasks are bound to the same core, and
the task in the livelock is the highest priority task of the core, so
the livelocked task cannot be preempted.
Although cond_resched() is used by __read_swap_cache_async, it is an
empty function in the preemptive system and cannot achieve the purpose
of releasing the CPU. A high-priority task cannot release the CPU
unless preempted by a higher-priority task. But when this task is
already the highest priority task on this core, other tasks will not be
able to be scheduled. So we think we should replace cond_resched() with
schedule_timeout_uninterruptible(1), schedule_timeout_interruptible will
call set_current_state first to set the task state, so the task will be
removed from the running queue, so as to achieve the purpose of giving
up the CPU and prevent it from running in kernel mode for too long.
(akpm: ugly hack becomes uglier. But it fixes the issue in a
backportable-to-stable fashion while we hopefully work on something
better)
Biju Das [Wed, 16 Mar 2022 17:53:17 +0000 (17:53 +0000)]
spi: Fix erroneous sgs value with min_t()
While computing sgs in spi_map_buf(), the data type
used in min_t() for max_seg_size is 'unsigned int' where
as that of ctlr->max_dma_len is 'size_t'.
min_t(unsigned int,x,y) gives wrong results if one of x/y is
'size_t'
Consider the below examples on a 64-bit machine (ie size_t is
64-bits, and unsigned int is 32-bit).
case 1) min_t(unsigned int, 5, 0x100000001);
case 2) min_t(size_t, 5, 0x100000001);
Case 1 returns '1', where as case 2 returns '5'. As you can see
the result from case 1 is wrong.
This patch fixes the above issue by using the data type of the
parameters that are used in min_t with maximum data length.
Ivan Vecera [Thu, 17 Mar 2022 10:45:24 +0000 (11:45 +0100)]
iavf: Fix hang during reboot/shutdown
Recent commit 974578017fc1 ("iavf: Add waiting so the port is
initialized in remove") adds a wait-loop at the beginning of
iavf_remove() to ensure that port initialization is finished
prior unregistering net device. This causes a regression
in reboot/shutdown scenario because in this case callback
iavf_shutdown() is called and this callback detaches the device,
makes it down if it is running and sets its state to __IAVF_REMOVE.
Later shutdown callback of associated PF driver (e.g. ice_shutdown)
is called. That callback calls among other things sriov_disable()
that calls indirectly iavf_remove() (see stack trace below).
As the adapter state is already __IAVF_REMOVE then the mentioned
loop is end-less and shutdown process hangs.
The patch fixes this by checking adapter's state at the beginning
of iavf_remove() and skips the rest of the function if the adapter
is already in remove state (shutdown is in progress).
Reproducer:
1. Create VF on PF driven by ice or i40e driver
2. Ensure that the VF is bound to iavf driver
3. Reboot
Vladimir Oltean [Wed, 16 Mar 2022 19:21:17 +0000 (21:21 +0200)]
net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload
ACL rules can be offloaded to VCAP IS2 either through chain 0, or, since
the blamed commit, through a chain index whose number encodes a specific
PAG (Policy Action Group) and lookup number.
The chain number is translated through ocelot_chain_to_pag() into a PAG,
and through ocelot_chain_to_lookup() into a lookup number.
The problem with the blamed commit is that the above 2 functions don't
have special treatment for chain 0. So ocelot_chain_to_pag(0) returns
filter->pag = 224, which is in fact -32, but the "pag" field is an u8.
So we end up programming the hardware with VCAP IS2 entries having a PAG
of 224. But the way in which the PAG works is that it defines a subset
of VCAP IS2 filters which should match on a packet. The default PAG is
0, and previous VCAP IS1 rules (which we offload using 'goto') can
modify it. So basically, we are installing filters with a PAG on which
no packet will ever match. This is the hardware equivalent of adding
filters to a chain which has no 'goto' to it.
Restore the previous functionality by making ACL filters offloaded to
chain 0 go to PAG 0 and lookup number 0. The choice of PAG is clearly
correct, but the choice of lookup number isn't "as before" (which was to
leave the lookup a "don't care"). However, lookup 0 should be fine,
since even though there are ACL actions (policers) which have a
requirement to be used in a specific lookup, that lookup is 0.
Doug Berger [Thu, 17 Mar 2022 01:28:12 +0000 (18:28 -0700)]
net: bcmgenet: skip invalid partial checksums
The RXCHK block will return a partial checksum of 0 if it encounters
a problem while receiving a packet. Since a 1's complement sum can
only produce this result if no bits are set in the received data
stream it is fair to treat it as an invalid partial checksum and
not pass it up the stack.
Manish Chopra [Wed, 16 Mar 2022 21:46:13 +0000 (14:46 -0700)]
bnx2x: fix built-in kernel driver load failure
Commit b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0")
added request_firmware() logic in probe() which caused
load failure when firmware file is not present in initrd (below),
as access to firmware file is not feasible during probe.
Direct firmware load for bnx2x/bnx2x-e2-7.13.15.0.fw failed with error -2
Direct firmware load for bnx2x/bnx2x-e2-7.13.21.0.fw failed with error -2
This patch fixes this issue by -
1. Removing request_firmware() logic from the probe()
such that .ndo_open() handle it as it used to handle
it earlier
2. Given request_firmware() is removed from probe(), so
driver has to relax FW version comparisons a bit against
the already loaded FW version (by some other PFs of same
adapter) to allow different compatible/close enough FWs with which
multiple PFs may run with (in different environments), as the
given PF who is in probe flow has no idea now with which firmware
file version it is going to initialize the device in ndo_open()
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.
Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.
The powernow-k8 driver will do checks at startup that the current
active driver is acpi-cpufreq and show a warning when they're not
expected.
Because of this the following warning comes up on systems that
support amd-pstate and compiled in both drivers:
`WTF driver: amd-pstate`
The systems that support powernow-k8 will not support amd-pstate,
so re-order the checks to validate the CPU model number first to
avoid this warning being displayed on modern SOCs.
ACPI: bus: Avoid using CPPC if not supported by firmware
If the platform firmware indicates that it does not support CPPC by
clearing the OSC_SB_CPC_SUPPORT and OSC_SB_CPCV2_SUPPORT bits in the
platform _OSC capabilities mask, avoid attempting to evaluate _CPC
which may fail in that case.
Because the OSC_SB_CPC_SUPPORT and OSC_SB_CPCV2_SUPPORT bits are only
added to the supported platform capabilities mask on x86, when
X86_FEATURE_HWP is supported, allow _CPC to be evaluated regardless
in the other cases.
Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
Revert commit 159d8c274fd9 ("ACPI: Pass the same capabilities to the
_OSC regardless of the query flag") which caused legitimate usage
scenarios (when the platform firmware does not want the OS to control
certain platform features controlled by the system bus scope _OSC) to
break and was misguided by some misleading language in the _OSC
definition in the ACPI specification (in particular, Section 6.2.11.1.3
"Sequence of _OSC Calls" that contradicts other perts of the _OSC
definition).
Commit bf9282dc26e7 ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic")
moved the leave_mm() call away from intel_idle(), but it didn't update
its kerneldoc comment accordingly, so do that now.
Fixes: bf9282dc26e7 ("cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic") Signed-off-by: Rafael J. Wysocki <[email protected]>
Werner Sembach [Tue, 15 Mar 2022 19:02:28 +0000 (20:02 +0100)]
ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
Clevo NL5xRU and NL5xNU/TUXEDO Aura 15 Gen1 and Gen2 have both a working
native and video interface. However the default detection mechanism first
registers the video interface before unregistering it again and switching
to the native interface during boot. This results in a dangling SBIOS
request for backlight change for some reason, causing the backlight to
switch to ~2% once per boot on the first power cord connect or disconnect
event. Setting the native interface explicitly circumvents this buggy
behaviour by avoiding the unregistering process.
drm: Don't make DRM_PANEL_BRIDGE dependent on DRM_KMS_HELPERS
Fix a number of undefined references to drm_kms_helper.ko in
drm_dp_helper.ko:
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_mst_duplicate_state':
drm_dp_mst_topology.c:(.text+0x2df0): undefined reference to `__drm_atomic_helper_private_obj_duplicate_state'
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_delayed_destroy_work':
drm_dp_mst_topology.c:(.text+0x370c): undefined reference to `drm_kms_helper_hotplug_event'
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_mst_up_req_work':
drm_dp_mst_topology.c:(.text+0x7938): undefined reference to `drm_kms_helper_hotplug_event'
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_mst_link_probe_work':
drm_dp_mst_topology.c:(.text+0x82e0): undefined reference to `drm_kms_helper_hotplug_event'
This happens if panel-edp.ko has been configured with
DRM_PANEL_EDP=y
DRM_DP_HELPER=y
DRM_KMS_HELPER=m
which builds DP helpers into the kernel and KMS helpers sa a module.
Making DRM_PANEL_EDP select DRM_KMS_HELPER resolves this problem.
To avoid a resulting cyclic dependency with DRM_PANEL_BRIDGE, don't
make the latter depend on DRM_KMS_HELPER and fix the one DRM bridge
drivers that doesn't already select DRM_KMS_HELPER. As KMS helpers
cannot be selected directly by the user, config symbols should avoid
depending on it anyway.
Steve French [Thu, 17 Mar 2022 03:08:43 +0000 (22:08 -0500)]
smb3: fix incorrect session setup check for multiuser mounts
A recent change to how the SMB3 server (socket) and session status
is managed regressed multiuser mounts by changing the check
for whether session setup is needed to the socket (TCP_Server_info)
structure instead of the session struct (cifs_ses). Add additional
check in cifs_setup_sesion to fix this.
Pavel Begunkov [Thu, 17 Mar 2022 02:03:42 +0000 (02:03 +0000)]
io_uring: fold evfd signalling under a slower path
Add ->has_evfd flag, which is true IFF there is an eventfd attached, and
use it to hide io_eventfd_signal() into __io_commit_cqring_flush() and
combine fast checks in a single if. Also, gcc 11.2 wasn't inlining
io_cqring_ev_posted() without this change, so helps with that as well.
Pavel Begunkov [Thu, 17 Mar 2022 02:03:41 +0000 (02:03 +0000)]
io_uring: thin down io_commit_cqring()
io_commit_cqring() is currently always under spinlock section, so it's
always better to keep it as slim as possible. Move
__io_commit_cqring_flush() out of it into ev_posted*(). If fast checks
do fail and this post-processing is required, we'll reacquire
->completion_lock, which is fine as we don't care about performance of
draining and offset timeouts.
Pavel Begunkov [Thu, 17 Mar 2022 02:03:40 +0000 (02:03 +0000)]
io_uring: shuffle io_eventfd_signal() bits around
A preparation patch, which moves a fast ->io_ev_fd check out of
io_eventfd_signal() into ev_posted*(). Compilers are smart enough for it
to not change anything, but will need it later.
Pavel Begunkov [Thu, 17 Mar 2022 02:03:39 +0000 (02:03 +0000)]
io_uring: remove extra barrier for non-sqpoll iopoll
smp_mb() in io_cqring_ev_posted_iopoll() is only there because of
waitqueue_active(). However, non-SQPOLL IOPOLL ring doesn't wake the CQ
and so the barrier there is useless. Kill it, it's usually pretty
expensive.
Pavel Begunkov [Thu, 17 Mar 2022 02:03:38 +0000 (02:03 +0000)]
io_uring: extend provided buf return to fails
It's never a good idea to put provided buffers without notifying the
userspace, it'll lead to userspace leaks, so add io_put_kbuf() in
io_req_complete_failed(). The fail helper is called by all sorts of
requests, but it's still safe to do as io_put_kbuf() will return 0 in
for all requests that don't support and so don't expect provided buffers.
io_fill_cqe*() is not always the best way to post CQEs just because
there is enough of infrastructure on top. Replace a raw call to a
variant of it inside of io_timeout_cancel(), which also saves us some
bloating and might help with batching later.
Jens Axboe [Wed, 16 Mar 2022 22:59:10 +0000 (16:59 -0600)]
io_uring: cache poll/double-poll state with a request flag
With commit "io_uring: cache req->apoll->events in req->cflags" applied,
we now have just io_poll_remove_entries() dipping into req->apoll when
it isn't strictly necessary.
Mark poll and double-poll with a flag, so we know if we need to look
at apoll->double_poll. This avoids pulling in those cachelines if we
don't need them. The common case is that the poll wake handler already
removed these entries while hot off the completion path.
Jens Axboe [Wed, 16 Mar 2022 22:55:05 +0000 (16:55 -0600)]
io_uring: cache req->apoll->events in req->cflags
When we arm poll on behalf of a different type of request, like a network
receive, then we allocate req->apoll as our poll entry. Running network
workloads shows io_poll_check_events() as the most expensive part of
io_uring, and it's all due to having to pull in req->apoll instead of
just the request which we have hot already.
Cache poll->events in req->cflags, which isn't used until the request
completes anyway. This isn't strictly needed for regular poll, where
req->poll.events is used and thus already hot, but for the sake of
unification we do it all around.
This saves 3-4% of overhead in certain request workloads.
Jens Axboe [Wed, 16 Mar 2022 18:53:23 +0000 (12:53 -0600)]
io_uring: move req->poll_refs into previous struct hole
This serves two purposes:
- We now have the last cacheline mostly unused for generic workloads,
instead of having to pull in the poll refs explicitly for workloads
that rely on poll arming.
This reverts commit 869f0ec048dc8fd88c0b2003373bd985795179fb. That
updated the expected device tree binding format for the ls-extirq
driver, without also updating the parsing code (ls_extirq_parse_map)
to the new format.
The context is that the ls-extirq driver uses the standard
"interrupt-map" OF property in a non-standard way, as suggested by
Rob Herring during review:
https://lore.kernel.org/lkml/20190927161118.GA19333@bogus/
This has turned out to be problematic, as Marc Zyngier discovered
through commit 041284181226 ("of/irq: Allow matching of an interrupt-map
local to an interrupt controller"), later fixed through commit de4adddcbcc2 ("of/irq: Add a quirk for controllers with their own
definition of interrupt-map"). Marc's position, expressed on multiple
opportunities, is that:
(a) [ making private use of the reserved "interrupt-map" name in a
driver ] "is wrong, by the very letter of what an interrupt-map
means. If the interrupt map points to an interrupt controller,
that's the target for the interrupt."
https://lore.kernel.org/lkml/[email protected]/
(b) [ updating the driver's bindings to accept a non-reserved name for
this property, as an alternative, is ] "is totally pointless. These
machines have been in the wild for years, and existing DTs will be
there *forever*."
https://lore.kernel.org/lkml/[email protected]/
Considering the above, the Linux kernel has quirks in place to deal with
the ls-extirq's non-standard use of the "interrupt-map". These quirks
may be needed in other operating systems that consume this device tree,
yet this is seen as the only viable solution.
Therefore, the premise of the patch being reverted here is invalid.
It doesn't matter whether the driver, in its non-standard use of the
property, complies to the standard format or not, since this property
isn't expected to be used for interrupt translation by the core.
This change restores LS1088A, LS2088A/LS2085A and LX2160A to their
previous bindings, which allows these systems to continue to use
external interrupt lines with the correct polarity.
thermal: int340x: Update OS policy capability handshake
Update the firmware with OS supported policies mask, so that firmware can
relinquish its internal controls. Without this update several Tiger Lake
laptops gets performance limited with in few seconds of executing in
turbo region.
The existing way of enumerating firmware policies via IDSP method and
selecting policy by directly writing those policy UUIDS via _OSC method
is not supported in newer generation of hardware.
There is a new UUID "B23BA85D-C8B7-3542-88DE-8DE2FFCFD698" is defined for
updating policy capabilities. As part of ACPI _OSC method:
DWORD1: As defined in the ACPI 5.0 Specification
- Bit 0: Query Flag
- Bits 1-3: Always 0
- Bits 4-31: Reserved
DWORD2 and beyond:
- Bit0: set to 1 to indicate Intel(R) Dynamic Tuning is active, 0 to
indicate it is disabled and legacy thermal mechanism should
be enabled.
- Bit1: set to 1 to indicate Intel(R) Dynamic Tuning is controlling
active cooling, 0 to indicate bios shall enable legacy thermal
zone with active trip point.
- Bit2: set to 1 to indicate Intel(R) Dynamic Tuning is controlling
passive cooling, 0 to indicate bios shall enable legacy thermal
zone with passive trip point.
- Bit3: set to 1 to indicate Intel(R) Dynamic Tuning is handling
critical trip point, 0 to indicate bios shall enable legacy
thermal zone with critical trip point.
- Bits 4:31: Reserved
From sysfs interface, there is an existing interface to update policy
UUID using attribute "current_uuid". User space can write the same UUID
for ACTIVE, PASSIVE and CRITICAL policy. Driver converts these UUIDs to
DWORD2 Bit 1 to Bit 3. When any of the policy is activated by user
space it is assumed that dynamic tuning is active.
For example
$cd /sys/bus/platform/devices/INTC1040:00/uuids
To support active policy
$echo "3A95C389-E4B8-4629-A526-C52C88626BAE" > current_uuid
To support passive policy
$echo "42A441D6-AE6A-462b-A84B-4A8CE79027D3" > current_uuid
To support critical policy
$echo "97C68AE7-15FA-499c-B8C9-5DA81D606E0A" > current_uuid
To match the bit format for DWORD2, rearranged enum int3400_thermal_uuid
and int3400_thermal_uuids[] by swapping current INT3400_THERMAL_ACTIVE
and INT3400_THERMAL_PASSIVE_1.
If the policies are enumerated via IDSP method then legacy method is
used, if not the new method is used to update policy support.
1) Fix a kernel-info-leak in pfkey.
From Haimin Zhang.
2) Fix an incorrect check of the return value of ipv6_skip_exthdr.
From Sabrina Dubroca.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
esp6: fix check on ipv6_skip_exthdr's return value
af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
====================
David Woodhouse [Fri, 11 Mar 2022 19:20:17 +0000 (19:20 +0000)]
PM: hibernate: Honour ACPI hardware signature by default for virtual guests
The ACPI specification says that OSPM should refuse to restore from
hibernate if the hardware signature changes, and should boot from
scratch. However, real BIOSes often vary the hardware signature in cases
where we *do* want to resume from hibernate, so Linux doesn't follow the
spec by default.
However, in a virtual environment there's no reason for the VMM to vary
the hardware signature *unless* it wants to trigger a clean reboot as
defined by the ACPI spec. So enable the check by default if a hypervisor
is detected.
For some specific platforms (E.g. AlderLake) the balance performance
EPP is updated from the hard coded value in the driver. This acts as
the default and balance_performance EPP. The purpose of this EPP
update is to reach maximum 1 core turbo frequency (when possible) out
of the box.
Although we can achieve the objective by using hard coded value in the
driver, there can be other EPP which can be better in terms of power.
But that will be very subjective based on platform and use cases.
This is not practical to have a per platform specific default hard coded
in the driver.
If a platform wants to specify default EPP, it can be set in the firmware.
If this EPP is not the chipset default of 0x80 (balance_perf_epp unless
driver changed it) and more performance oriented but not 0, the driver
can use this as the default and balanced_perf EPP. In this case no driver
update is required every time there is some new platform and default EPP.
If the firmware didn't update the EPP from the chipset default then
the hard coded value is used as per existing implementation.
Jakub Kicinski [Wed, 16 Mar 2022 18:08:09 +0000 (11:08 -0700)]
Merge tag 'wireless-2022-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v5.17
Third set of fixes for v5.17. We have only one revert to fix an ath10k
regression.
* tag 'wireless-2022-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
Revert "ath10k: drop beacon and probe response which leak from other channel"
====================
drm/imx: parallel-display: Remove bus flags check in imx_pd_bridge_atomic_check()
If display timings were read from the devicetree using
of_get_display_timing() and pixelclk-active is defined
there, the flag DISPLAY_FLAGS_SYNC_POSEDGE/NEGEDGE is
automatically generated. Through the function
drm_bus_flags_from_videomode() e.g. called in the
panel-simple driver this flag got into the bus flags,
but then in imx_pd_bridge_atomic_check() the bus flag
check failed and will not initialize the display. The
original commit fe141cedc433 does not explain why this
check was introduced. So remove the bus flags check,
because it stops the initialization of the display with
valid bus flags.
I was reported privately that this commit breaks AP and mesh mode on QCA9984
(firmware 10.4-3.9.0.2-00156). So revert the commit to fix the regression.
There was a conflict due to cfg80211 API changes but that was easy to fix.
Revert "ACPI: scan: Do not add device IDs from _CID if _HID is not valid"
Revert commit e38f9ff63e6d ("ACPI: scan: Do not add device IDs from _CID
if _HID is not valid"), because it has introduced regressions on
multiple systems, even though it only has effect on clearly invalid
firmware.
Jiri Kosina [Mon, 14 Mar 2022 08:25:18 +0000 (09:25 +0100)]
x86/nmi: Remove the 'strange power saving mode' hint from unknown NMI handler
The
Do you have a strange power saving mode enabled?
hint when unknown NMI happens dates back to i386 stone age, and isn't
currently really helpful.
Unknown NMIs are coming for many different reasons (broken firmware,
faulty hardware, ...) and rarely have anything to do with 'strange power
saving mode' (whatever that even is).
A bug in legacy U-Boot causes a crash during SDRAM boot if ECC is not
enabled in the bitstream but enabled in the Linux config.
Memory mapped read of the ECC Enabled bit was only enabled if U-Boot
determined ECC was enabled in the bitstream.
The Linux driver checks the ECC enable bit using a memory map read.
In the ECC disabled bitstream case, U-Boot didn't enable ECC register
memory map reads and since they are not allowed this results in a crash.
Always read the ECC Enable register through a SMC call which is always
allowed and it works with legacy and current U-Boot.
Jiasheng Jiang [Mon, 14 Mar 2022 02:01:25 +0000 (10:01 +0800)]
hv_netvsc: Add check for kvmalloc_array
As the potential failure of the kvmalloc_array(),
it should be better to check and restore the 'data'
if fails in order to avoid the dereference of the
NULL pointer.
Lukas Bulwahn [Mon, 14 Mar 2022 15:03:21 +0000 (16:03 +0100)]
sr: simplify the local variable initialization in sr_block_open()
Commit 01d0c698536f ("sr: implement ->free_disk to simplify refcounting")
refactored sr_block_open(), initialized one variable with a duplicate
assignment (probably an unintended copy & paste duplication) and turned one
error case into an early return, which makes the initialization of the
return variable needless.
So, simplify the local variable initialization in sr_block_open() to make
the code a bit more clear.
No functional change. No change in resulting object code.
Fix double free possibility in iavf_disable_vf, as crit_lock is
freed in caller, iavf_reset_task. Add kernel-doc for iavf_disable_vf.
Remove mutex_unlock in iavf_disable_vf.
Without this patch there is double free scenario, when calling
iavf_reset_task.
ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()
It is possible to do NULL pointer dereference in routine that updates
Tx ring stats. Currently only stats and bytes are updated when ring
pointer is valid, but later on ring is accessed to propagate gathered Tx
stats onto VSI stats.
Change the existing logic to move to next ring when ring is NULL.
counter: Stop using dev_get_drvdata() to get the counter device
dev_get_drvdata() returns NULL since commit b56346ddbd82 ("counter: Use
container_of instead of drvdata to track counter_device") which wrongly
claimed there were no users of drvdata. Convert to container_of() to
fix a null pointer dereference.
Jann Horn [Mon, 14 Mar 2022 18:59:53 +0000 (19:59 +0100)]
pstore: Don't use semaphores in always-atomic-context code
pstore_dump() is *always* invoked in atomic context (nowadays in an RCU
read-side critical section, before that under a spinlock).
It doesn't make sense to try to use semaphores here.
This is mostly a revert of commit ea84b580b955 ("pstore: Convert buf_lock
to semaphore"), except that two parts aren't restored back exactly as they
were:
- keep the lock initialization in pstore_register
- in efi_pstore_write(), always set the "block" flag to false
- omit "is_locked", that was unnecessary since
commit 959217c84c27 ("pstore: Actually give up during locking failure")
- fix the bailout message
The actual problem that the buggy commit was trying to address may have
been that the use of preemptible() in efi_pstore_write() was wrong - it
only looks at preempt_count() and the state of IRQs, but __rcu_read_lock()
doesn't touch either of those under CONFIG_PREEMPT_RCU.
(Sidenote: CONFIG_PREEMPT_RCU means that the scheduler can preempt tasks in
RCU read-side critical sections, but you're not allowed to actively
block/reschedule.)
Lockdep probably never caught the problem because it's very rare that you
actually hit the contended case, so lockdep always just sees the
down_trylock(), not the down_interruptible(), and so it can't tell that
there's a problem.
David Jeffery [Fri, 11 Mar 2022 18:43:59 +0000 (13:43 -0500)]
scsi: fnic: Finish scsi_cmnd before dropping the spinlock
When aborting a SCSI command through fnic, there is a race with the fnic
interrupt handler which can result in the SCSI command and its request
being completed twice. If the interrupt handler claims the command by
setting CMD_SP to NULL first, the abort handler assumes the interrupt
handler has completed the command and returns SUCCESS, causing the request
for the scsi_cmnd to be re-queued.
But the interrupt handler may not have finished the command yet. After it
drops the spinlock protecting CMD_SP, it does memory cleanup before finally
calling scsi_done() to complete the scsi_cmnd. If the call to scsi_done
occurs after the abort handler finishes and re-queues the request, the
completion of the scsi_cmnd will advance and try to double complete a
request already queued for retry.
This patch fixes the issue by moving scsi_done() and any other use of
scsi_cmnd to before the spinlock is released by the interrupt handler.
Although the bug manifested in the driver core, the real cause was a
race with the gadget core. dev_uevent() does:
if (dev->driver)
add_uevent_var(env, "DRIVER=%s", dev->driver->name);
and between the test and the dereference of dev->driver, the gadget
core sets dev->driver to NULL.
The race wouldn't occur if the gadget core registered its devices on
a real bus, using the standard synchronization techniques of the
driver core. However, it's not necessary to make such a large change
in order to fix this bug; all we need to do is make sure that
udc->dev.driver is always NULL.
In fact, there is no reason for udc->dev.driver ever to be set to
anything, let alone to the value it currently gets: the address of the
gadget's driver. After all, a gadget driver only knows how to manage
a gadget, not how to manage a UDC.
This patch simply removes the statements in the gadget core that touch
udc->dev.driver.
This commit - while attempting to fix a regression - has caused a number
of other problems. As the fallout from it is more significant than the
initial problem itself, revert it for now before we find a correct
solution.
Jens Axboe [Tue, 15 Mar 2022 16:54:08 +0000 (10:54 -0600)]
io_uring: recycle apoll_poll entries
Particularly for networked workloads, io_uring intensively uses its
poll based backend to get a notification when data/space is available.
Profiling workloads, we see 3-4% of alloc+free that is directly attributed
to just the apoll allocation and free (and the rest being skb alloc+free).
For the fast path, we have ctx->uring_lock held already for both issue
and the inline completions, and we can utilize that to avoid any extra
locking needed to have a basic recycling cache for the apoll entries on
both the alloc and free side.
Double poll still requires an allocation. But those are rare and not
a fast path item.
With the simple cache in place, we see a 3-4% reduction in overhead for
the workload.
MAINTAINERS: Mark VMware mailing list entries as email aliases
VMware mailing lists in the MAINTAINERS file are private lists meant
for VMware-internal review/notification for patches to the respective
subsystems. Anyone can post to these addresses, but there is no public
read access like open mailing lists, which makes them more like email
aliases instead (to reach out to reviewers).
So update all the VMware mailing list references in the MAINTAINERS
file to mark them as such, using "R: [email protected]".
MAINTAINERS: Update maintainers for paravirt ops and VMware hypervisor interface
Deep has decided to transfer the joint-maintainership of paravirt ops
to Srivatsa, and the maintainership of the VMware hypervisor interface
to Srivatsa and Alexey. Update the MAINTAINERS file to reflect this
change, and also add Alexey as a reviewer for paravirt ops.
partially Revert "usb: musb: Set the DT node on the child device"
This reverts the omap2430 changes of
commit cf081d009c44 ("usb: musb: Set the DT node on the child device")
Since v5.17-rc1, musb is broken on the gta04 and openpandora devices
(omap3530/dm3730). BeagleBone Black (am335x) seems to work.
Symptoms of this bug are
a) main symptom
[ 21.336517] using random host ethernet address
[ 21.341430] using host ethernet address: 32:70:05:18:ff:78
[ 21.341461] using self ethernet address: 46:10:3a:b3:af:d9
[ 21.358184] usb0: HOST MAC 32:70:05:18:ff:78
[ 21.376678] usb0: MAC 46:10:3a:b3:af:d9
[ 21.388305] using random self ethernet address
[ 21.393371] using random host ethernet address
[ 21.398162] g_ether gadget: Ethernet Gadget, version: Memorial Day 2008
[ 21.421081] g_ether gadget: g_ether ready
[ 21.492156] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 21.691345] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 21.803192] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 21.819427] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 22.124450] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 22.168518] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 22.179382] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.213592] musb-hdrc musb-hdrc.1.auto: pm runtime get failed in musb_gadget_queue
[ 23.221832] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.227905] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.239440] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.401000] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.407073] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.426361] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.734466] musb-hdrc musb-hdrc.1.auto: pm runtime get failed in musb_gadget_queue
[ 23.742462] musb-hdrc musb-hdrc.1.auto: pm runtime get failed in musb_gadget_queue
[ 23.750396] musb-hdrc musb-hdrc.1.auto: pm runtime get failed in musb_gadget_queue
... (repeats with high frequency)
This stops if the USB cable is unplugged and restarts if it is plugged in again.
b) also found in the log
[ 6.498107] ------------[ cut here ]------------
[ 6.502960] WARNING: CPU: 0 PID: 868 at arch/arm/mach-omap2/omap_hwmod.c:1885 _enable+0x50/0x234
[ 6.512207] omap_hwmod: usb_otg_hs: enabled state can only be entered from initialized, idle, or disabled state
[ 6.522766] Modules linked in: omap2430(+) bmp280_i2c bmp280 itg3200 at24 tsc2007 leds_tca6507 bma180 hmc5843_i2c hmc5843_core industrialio_triggered_buffer lis3lv02d_i2c kfifo_buf lis3lv02d phy_twl4030_usb snd_soc_omap_mcbsp snd_soc_ti_sdma musb_hdrc snd_soc_twl4030 gnss_sirf twl4030_vibra twl4030_madc twl4030_charger twl4030_pwrbutton gnss industrialio ehci_omap omapdrm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm drm_panel_orientation_quirks cec
[ 6.566436] CPU: 0 PID: 868 Comm: udevd Not tainted 5.16.0-rc5-letux+ #8251
[ 6.573730] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[ 6.580322] [<c010ed30>] (unwind_backtrace) from [<c010a1d0>] (show_stack+0x10/0x14)
[ 6.588470] [<c010a1d0>] (show_stack) from [<c0897c14>] (dump_stack_lvl+0x40/0x4c)
[ 6.596405] [<c0897c14>] (dump_stack_lvl) from [<c0130cc4>] (__warn+0xb4/0xdc)
[ 6.604003] [<c0130cc4>] (__warn) from [<c0130d5c>] (warn_slowpath_fmt+0x70/0x9c)
[ 6.611846] [<c0130d5c>] (warn_slowpath_fmt) from [<c011f4d4>] (_enable+0x50/0x234)
[ 6.619903] [<c011f4d4>] (_enable) from [<c012081c>] (omap_hwmod_enable+0x28/0x40)
[ 6.627838] [<c012081c>] (omap_hwmod_enable) from [<c0120ff4>] (omap_device_enable+0x4c/0x78)
[ 6.636779] [<c0120ff4>] (omap_device_enable) from [<c0121030>] (_od_runtime_resume+0x10/0x3c)
[ 6.645812] [<c0121030>] (_od_runtime_resume) from [<c05c688c>] (__rpm_callback+0x3c/0xf4)
[ 6.654510] [<c05c688c>] (__rpm_callback) from [<c05c6994>] (rpm_callback+0x50/0x54)
[ 6.662628] [<c05c6994>] (rpm_callback) from [<c05c66b0>] (rpm_resume+0x448/0x4e4)
[ 6.670593] [<c05c66b0>] (rpm_resume) from [<c05c6784>] (__pm_runtime_resume+0x38/0x50)
[ 6.678985] [<c05c6784>] (__pm_runtime_resume) from [<bf14ab20>] (musb_init_controller+0x350/0xa5c [musb_hdrc])
[ 6.689727] [<bf14ab20>] (musb_init_controller [musb_hdrc]) from [<c05bccb8>] (platform_probe+0x58/0xa8)
[ 6.699737] [<c05bccb8>] (platform_probe) from [<c05badf0>] (really_probe+0x170/0x2fc)
[ 6.708068] [<c05badf0>] (really_probe) from [<c05bb040>] (__driver_probe_device+0xc4/0xd8)
[ 6.716827] [<c05bb040>] (__driver_probe_device) from [<c05bb084>] (driver_probe_device+0x30/0xac)
[ 6.726226] [<c05bb084>] (driver_probe_device) from [<c05bb3d0>] (__device_attach_driver+0x94/0xb4)
[ 6.735717] [<c05bb3d0>] (__device_attach_driver) from [<c05b93f8>] (bus_for_each_drv+0xa0/0xb4)
[ 6.744934] [<c05b93f8>] (bus_for_each_drv) from [<c05bb248>] (__device_attach+0xc0/0x134)
[ 6.753631] [<c05bb248>] (__device_attach) from [<c05b9fcc>] (bus_probe_device+0x28/0x80)
[ 6.762207] [<c05b9fcc>] (bus_probe_device) from [<c05b7e40>] (device_add+0x5fc/0x788)
[ 6.770507] [<c05b7e40>] (device_add) from [<c05bd240>] (platform_device_add+0x70/0x1bc)
[ 6.779022] [<c05bd240>] (platform_device_add) from [<bf177830>] (omap2430_probe+0x260/0x2d4 [omap2430])
[ 6.789001] [<bf177830>] (omap2430_probe [omap2430]) from [<c05bccb8>] (platform_probe+0x58/0xa8)
[ 6.798309] [<c05bccb8>] (platform_probe) from [<c05badf0>] (really_probe+0x170/0x2fc)
[ 6.806610] [<c05badf0>] (really_probe) from [<c05bb040>] (__driver_probe_device+0xc4/0xd8)
[ 6.815399] [<c05bb040>] (__driver_probe_device) from [<c05bb084>] (driver_probe_device+0x30/0xac)
[ 6.824798] [<c05bb084>] (driver_probe_device) from [<c05bb4b4>] (__driver_attach+0xc4/0xd8)
[ 6.833648] [<c05bb4b4>] (__driver_attach) from [<c05b9308>] (bus_for_each_dev+0x64/0xa0)
[ 6.842224] [<c05b9308>] (bus_for_each_dev) from [<c05ba248>] (bus_add_driver+0x148/0x1a4)
[ 6.850891] [<c05ba248>] (bus_add_driver) from [<c05bbd1c>] (driver_register+0xb4/0xf8)
[ 6.859313] [<c05bbd1c>] (driver_register) from [<c0101f54>] (do_one_initcall+0x90/0x1c8)
[ 6.867889] [<c0101f54>] (do_one_initcall) from [<c0893968>] (do_init_module+0x4c/0x204)
[ 6.876373] [<c0893968>] (do_init_module) from [<c01b4c30>] (load_module+0x13f0/0x1928)
[ 6.884796] [<c01b4c30>] (load_module) from [<c01b53a0>] (sys_finit_module+0xa0/0xc0)
[ 6.893005] [<c01b53a0>] (sys_finit_module) from [<c0100080>] (ret_fast_syscall+0x0/0x54)
[ 6.901580] Exception stack(0xc2807fa8 to 0xc2807ff0)
[ 6.906890] 7fa0: b6e517d40005206800000006b6e509f800000000b6e5131c
[ 6.915466] 7fc0: b6e517d400052068cd7180000000017b0002000000037f780005004800063368
[ 6.924011] 7fe0: bed8fef0bed8fee0b6e4ac4bb6f55a42
[ 6.929321] ---[ end trace d715ff121b58763c ]---
c) git bisect result on testing for "musb-hdrc" in the console log:
The musb glue drivers just copy the glue resources to the musb child device.
Instead, set the musb child device's DT node pointer to the parent device's
node so that platform_get_irq_byname() can find the resources in the DT.
This removes the need for statically populating the IRQ resources from the
DT which has been deprecated for some time.
nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
nvmet_ns_changed states via lockdep that the ns->subsys->lock must be
held. The only caller of nvmet_ns_changed which does not acquire that
lock is nvmet_ns_revalidate. nvmet_ns_revalidate has 3 callers,
of which 2 do not acquire that lock: nvmet_execute_identify_cns_cs_ns
and nvmet_execute_identify_ns. The other caller
nvmet_ns_revalidate_size_store does acquire the lock.
Move the call to nvmet_ns_changed from nvmet_ns_revalidate to the callers
so that they can perform the correct locking as needed.
This issue was found using a static type-based analyser and manually
verified.
Eric Dumazet [Sat, 12 Mar 2022 23:29:58 +0000 (15:29 -0800)]
net/packet: fix slab-out-of-bounds access in packet_recvmsg()
syzbot found that when an AF_PACKET socket is using PACKET_COPY_THRESH
and mmap operations, tpacket_rcv() is queueing skbs with
garbage in skb->cb[], triggering a too big copy [1]
Presumably, users of af_packet using mmap() already gets correct
metadata from the mapped buffer, we can simply make sure
to clear 12 bytes that might be copied to user space later.
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/fortify-string.h:225 [inline]
BUG: KASAN: stack-out-of-bounds in packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489
Write of size 165 at addr ffffc9000385fb78 by task syz-executor233/3631
Michael Walle [Sat, 12 Mar 2022 22:41:40 +0000 (23:41 +0100)]
net: mdio: mscc-miim: fix duplicate debugfs entry
This driver can have up to two regmaps. If the second one is registered
its debugfs entry will have the same name as the first one and the
following error will be printed:
[ 3.833521] debugfs: Directory 'e200413c.mdio' with parent 'regmap' already present!
This is caused by mpt3sas_base_sync_reply_irqs() using an invalid reply_q
pointer outside of the list_for_each_entry() loop. At the end of the full
list traversal the pointer is invalid.
Move the _base_process_reply_queue() call inside of the loop.