Juri Lelli [Thu, 14 Jul 2022 15:19:08 +0000 (17:19 +0200)]
sched/deadline: Fix BUG_ON condition for deboosted tasks
Tasks the are being deboosted from SCHED_DEADLINE might enter
enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
since they are not boosted anymore, the if (is_dl_boosted()) branch is
not taken, but the else if (!dl_prio) is and inside this one we
BUG_ON(!is_dl_boosted), which is of course false (BUG_ON triggered)
otherwise we had entered the if branch above. Long story short, the
current condition doesn't make sense and always leads to triggering of a
BUG.
Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
to be present, but additional flags are not a problem.
drm/amdgpu: Protect the amdgpu_bo_list list with a mutex v2
Protect the struct amdgpu_bo_list with a mutex. This is used during command
submission in order to avoid buffer object corruption as recorded in
the link below.
v2 (chk): Keep the mutex looked for the whole CS to avoid using the
list from multiple CS threads at the same time.
watchqueue: make sure to serialize 'wqueue->defunct' properly
When the pipe is closed, we mark the associated watchqueue defunct by
calling watch_queue_clear(). However, while that is protected by the
watchqueue lock, new watchqueue entries aren't actually added under that
lock at all: they use the pipe->rd_wait.lock instead, and looking up
that pipe happens without any locking.
The watchqueue code uses the RCU read-side section to make sure that the
wqueue entry itself hasn't disappeared, but that does not protect the
pipe_info in any way.
So make sure to actually hold the wqueue lock when posting watch events,
properly serializing against the pipe being torn down.
On a machine with the LBR format LBR_FORMAT_EIP_FLAGS2, when the TSX is
disabled, a TSX quirk is required to access LBR from registers.
The lbr_from_signext_quirk_needed() is introduced to determine whether
the TSX quirk should be applied. However, the
lbr_from_signext_quirk_needed() is invoked before the
intel_pmu_lbr_init(), which parses the LBR format information. Without
the correct LBR format information, the TSX quirk never be applied.
Move the lbr_from_signext_quirk_needed() into the intel_pmu_lbr_init().
Checking x86_pmu.lbr_has_tsx in the lbr_from_signext_quirk_needed() is
not required anymore.
Both LBR_FORMAT_EIP_FLAGS2 and LBR_FORMAT_INFO have LBR_TSX flag, but
only the LBR_FORMAT_EIP_FLAGS2 requirs the quirk. Update the comments
accordingly.
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:557 apply_returns (arch/x86/kernel/alternative.c:557 (discriminator 1))
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc4-00008-gee88d363d156 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
RIP: 0010:apply_returns (arch/x86/kernel/alternative.c:557 (discriminator 1))
Code: ff ff 74 cb 48 83 c5 04 49 39 ee 0f 87 81 fe ff ff e9 22 ff ff ff 0f 0b 48 83 c5 04 49 39 ee 0f 87 6d fe ff ff e9 0e ff ff ff <0f> 0b 48 83 c5 04 49 39 ee 0f 87 59 fe ff ff e9 fa fe ff ff 48 89
The warning happened when apply_returns() failed to convert "JMP
__x86_return_thunk" to RET. It was instead a JMP to nowhere, due to the
thunk relocation not getting resolved.
That rodata.o code is objcopy'd to .rodata, and later memcpy'd, so
relocations don't work (and are apparently silently ignored).
LKDTM is only used for testing, so the naked RET should be fine. So
just disable return thunks for that file.
While at it, disable objtool and KCSAN for the file.
x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
IBRS mitigation for spectre_v2 forces write to MSR_IA32_SPEC_CTRL at
every kernel entry/exit. On Enhanced IBRS parts setting
MSR_IA32_SPEC_CTRL[IBRS] only once at boot is sufficient. MSR writes at
every kernel entry/exit incur unnecessary performance loss.
When Enhanced IBRS feature is present, print a warning about this
unnecessary performance loss.
Eric Snowberg [Wed, 20 Jul 2022 16:40:27 +0000 (12:40 -0400)]
lockdown: Fix kexec lockdown bypass with ima policy
The lockdown LSM is primarily used in conjunction with UEFI Secure Boot.
This LSM may also be used on machines without UEFI. It can also be
enabled when UEFI Secure Boot is disabled. One of lockdown's features
is to prevent kexec from loading untrusted kernels. Lockdown can be
enabled through a bootparam or after the kernel has booted through
securityfs.
If IMA appraisal is used with the "ima_appraise=log" boot param,
lockdown can be defeated with kexec on any machine when Secure Boot is
disabled or unavailable. IMA prevents setting "ima_appraise=log" from
the boot param when Secure Boot is enabled, but this does not cover
cases where lockdown is used without Secure Boot.
To defeat lockdown, boot without Secure Boot and add ima_appraise=log to
the kernel command line; then:
spi: bcm2835: bcm2835_spi_handle_err(): fix NULL pointer deref for non DMA transfers
In case a IRQ based transfer times out the bcm2835_spi_handle_err()
function is called. Since commit 1513ceee70f2 ("spi: bcm2835: Drop
dma_pending flag") the TX and RX DMA transfers are unconditionally
canceled, leading to NULL pointer derefs if ctlr->dma_tx or
ctlr->dma_rx are not set.
Fix the NULL pointer deref by checking that ctlr->dma_tx and
ctlr->dma_rx are valid pointers before accessing them.
Kent Gibson [Tue, 19 Jul 2022 11:06:01 +0000 (19:06 +0800)]
selftests: gpio: fix include path to kernel headers for out of tree builds
When building selftests out of the kernel tree the gpio.h the include
path is incorrect and the build falls back to the system includes
which may be outdated.
Add the KHDR_INCLUDES to the CFLAGS to include the gpio.h from the
build tree.
mlxsw needs to distinguish nexthops with a gateway from connected
nexthops in order to write the former to the adjacency table of the
device. The check used to rely on the fact that nexthops with a gateway
have a 'link' scope whereas connected nexthops have a 'host' scope. This
is no longer correct after commit 747c14307214 ("ip: fix dflt addr
selection for connected nexthop").
Fix that by instead checking the address family of the gateway IP. This
is a more direct way and also consistent with the IPv6 counterpart in
mlxsw_sp_rt6_is_gateway().
The cited commit refactored the flow action initialization sequence to
use an interface method when translating tc action instances to flow
offload objects. The refactored version skips the initialization of the
generic flow action attributes for tc actions, such as pedit, that allocate
more than one offload entry. This can cause potential issues for drivers
mapping flow action ids.
Populate the generic flow action fields for all the flow action entries.
Fixes: c54e1d920f04 ("flow_offload: add ops to tc_action_ops for flow action setup") Signed-off-by: Oz Shlomo <[email protected]> Reviewed-by: Roi Dayan <[email protected]>
----
v1 -> v2:
- coalese the generic flow action fields initialization to a single loop Reviewed-by: Baowen Zheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
tcp: Fix data-races around sysctl_tcp_max_reordering.
While reading sysctl_tcp_max_reordering, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: dca145ffaa8d ("tcp: allow for bigger reordering level") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
While reading sysctl_tcp_thin_linear_timeouts, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 36e31b0af587 ("net: TCP thin linear timeouts") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
While reading sysctl_tcp_recovery, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 4f41b1c58a32 ("tcp: use RACK to detect losses") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
udp: Fix a data-race around sysctl_udp_l3mdev_accept.
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 63a6fff353d0 ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
sysctl_ip_prot_sock is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
Fixes: 4548b683b781 ("Introduce a sysctl that modifies the value of PROT_SOCK.") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.
While reading sysctl_fib_multipath_hash_fields, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: ce5c9c20d364 ("ipv4: Add a sysctl to control multipath hash fields") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.
While reading sysctl_fib_multipath_hash_policy, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: bf4e0a3db97e ("net: ipv4: add support for ECMP hash policy choice") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
In commit c6f2a617a0a8 ("can: mcp251xfd: add support for mcp251863")
support for the mcp251863 was added. However it was not taken into
account that the auto detection of the chip model cannot distinguish
between mcp2518fd and mcp251863 and would lead to a warning message if
the firmware specifies a mcp251863.
Fix auto detection: If a mcp2518fd compatible chip is found, keep the
mcp251863 if specified by firmware, use mcp2518fd instead.
Liang He [Thu, 14 Jul 2022 08:13:37 +0000 (16:13 +0800)]
drm/imx/dcss: Add missing of_node_put() in fail path
In dcss_dev_create() and dcss_dev_destroy(), we should call of_node_put()
in fail path or before the dcss's destroy as of_graph_get_port_by_id() has
increased the refcount.
This patch re-introduces support for GuC v69 in parallel to v70. As this
is a quick fix, v69 has been re-introduced as the single "fallback" guc
version in case v70 is not available on disk and only for platforms that
are out of force_probe and require the GuC by default. All v69 specific
code has been labeled as such for easy identification, and the same was
done for all v70 functions for which there is a separate v69 version,
to avoid accidentally calling the wrong version via the unlabeled name.
When the fallback mode kicks in, a drm_notice message is printed in
dmesg to inform the user of the required update. The existing
logging of the fetch function has also been updated so that we no
longer complain immediately if we can't find a fw and we only throw an
error if the fetch of both the base and fallback blobs fails.
The plan is to follow this up with a more complex rework to allow for
multiple different GuC versions to be supported at the same time.
v2: reduce the fallback to platform that require it, switch to
firmware_request_nowarn(), improve logs.
Matthew Brost [Wed, 4 May 2022 23:46:36 +0000 (16:46 -0700)]
drm/i915/guc: Support programming the EU priority in the GuC descriptor
In GuC submission mode the EU priority must be updated by the GuC rather
than the driver as the GuC owns the programming of the context descriptor.
Given that the GuC code uses the GuC priorities, we can't use a generic
function using i915 priorities for both execlists and GuC submission.
The existing function has therefore been pushed to the execlists
back-end while a new one has been added for GuC.
Jakub Kicinski [Wed, 20 Jul 2022 00:43:02 +0000 (17:43 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-07-18
This series contains updates to iavf driver only.
Przemyslaw fixes handling of multiple VLAN requests to account for
individual errors instead of rejecting them all. He removes incorrect
implementations of ETHTOOL_COALESCE_MAX_FRAMES and
ETHTOOL_COALESCE_MAX_FRAMES_IRQ.
He also corrects an issue with NULL pointer caused by improper handling of
dummy receive descriptors. Finally, he corrects debug prints reporting an
unknown state.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: Fix missing state logs
iavf: Fix handling of dummy receive descriptors
iavf: Disallow changing rx/tx-frames and rx/tx-frames-irq
iavf: Fix VLAN_V2 addition/rejection
====================
Xin Long [Mon, 18 Jul 2022 17:56:59 +0000 (13:56 -0400)]
Documentation: fix udp_wmem_min in ip-sysctl.rst
UDP doesn't support tx memory accounting, and sysctl udp_wmem_min
is not really used anywhere. So we should fix the description in
ip-sysctl.rst accordingly.
Lorenzo Bianconi [Mon, 18 Jul 2022 09:51:53 +0000 (11:51 +0200)]
net: ethernet: mtk_ppe: fix possible NULL pointer dereference in mtk_flow_get_wdma_info
odev pointer can be NULL in mtk_flow_offload_replace routine according
to the flower action rules. Fix possible NULL pointer dereference in
mtk_flow_get_wdma_info.
Hayes Wang [Mon, 18 Jul 2022 08:21:20 +0000 (16:21 +0800)]
r8152: fix a WOL issue
This fixes that the platform is waked by an unexpected packet. The
size and range of FIFO is different when the device enters S3 state,
so it is necessary to correct some settings when suspending.
Regardless of jumbo frame, set RMS to 1522 and MTPS to MTPS_DEFAULT.
Besides, enable MCU_BORW_EN to update the method of calculating the
pointer of data. Then, the hardware could get the correct data.
srcu: Make expedited RCU grace periods block even less frequently
The purpose of commit 282d8998e997 ("srcu: Prevent expedited GPs
and blocking readers from consuming CPU") was to prevent a long
series of never-blocking expedited SRCU grace periods from blocking
kernel-live-patching (KLP) progress. Although it was successful, it also
resulted in excessive boot times on certain embedded workloads running
under qemu with the "-bios QEMU_EFI.fd" command line. Here "excessive"
means increasing the boot time up into the three-to-four minute range.
This increase in boot time was due to the more than 6000 back-to-back
invocations of synchronize_rcu_expedited() within the KVM host OS, which
in turn resulted from qemu's emulation of a long series of MMIO accesses.
Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace
periods") did not significantly help this particular use case.
Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the
value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values
of non-sleeping per phase counts on a system with preemption enabled,
and observed the following boot times:
Analysis on the experiment results show additional improvements with
CPU-bound delays approaching one jiffy in duration. This improvement was
also seen when number of per-phase iterations were scaled to one jiffy.
This commit therefore scales per-grace-period phase number of non-sleeping
polls so that non-sleeping polls extend for about one jiffy. In addition,
the delay-calculation call to srcu_get_delay() in srcu_gp_end() is
replaced with a simple check for an expedited grace period. This change
schedules callback invocation immediately after expedited grace periods
complete, which results in greatly improved boot times. Testing done
by Marc and Zhangfei confirms that this change recovers most of the
performance degradation in boottime; for CONFIG_HZ_250 configuration,
specifically, boot times improve from 3m50s to 41s on Marc's setup;
and from 2m40s to ~9.7s on Zhangfei's setup.
In addition to the changes to default per phase delays, this
change adds 3 new kernel parameters - srcutree.srcu_max_nodelay,
srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay.
This allows users to configure the srcu grace period scanning delays in
order to more quickly react to additional use cases.
Paul E. McKenney [Sun, 12 Jun 2022 22:00:06 +0000 (15:00 -0700)]
srcu: Block less aggressively for expedited grace periods
Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
from consuming CPU") fixed a problem where a long-running expedited SRCU
grace period could block kernel live patching. It did so by giving up
on expediting once a given SRCU expedited grace period grew too old.
Unfortunately, this added excessive delays to boots of virtual embedded
systems specifying "-bios QEMU_EFI.fd" to qemu. This commit therefore
makes the transition away from expediting less aggressive, increasing
the per-grace-period phase number of non-sleeping polls of readers from
one to three and increasing the required grace-period age from one jiffy
(actually from zero to one jiffies) to two jiffies (actually from one
to two jiffies).
Aaron Lewis [Thu, 14 Jul 2022 16:13:15 +0000 (16:13 +0000)]
KVM: x86: Protect the unused bits in MSR exiting flags
The flags for KVM_CAP_X86_USER_SPACE_MSR and KVM_X86_SET_MSR_FILTER
have no protection for their unused bits. Without protection, future
development for these features will be difficult. Add the protection
needed to make it possible to extend these features in the future.
Paolo Bonzini [Tue, 19 Jul 2022 13:16:53 +0000 (09:16 -0400)]
tools headers UAPI: Sync linux/kvm.h with the kernel sources
Silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
KVM: selftests: Fix target thread to be migrated in rseq_test
In rseq_test, there are two threads, which are vCPU thread and migration
worker separately. Unfortunately, the test has the wrong PID passed to
sched_setaffinity() in the migration worker. It forces migration on the
migration worker because zeroed PID represents the calling thread, which
is the migration worker itself. It means the vCPU thread is never enforced
to migration and it can migrate at any time, which eventually leads to
failure as the following logs show.
host# uname -r
5.19.0-rc6-gavin+
host# # cat /proc/cpuinfo | grep processor | tail -n 1
processor : 223
host# pwd
/home/gavin/sandbox/linux.main/tools/testing/selftests/kvm
host# for i in `seq 1 100`; do \
echo "--------> $i"; ./rseq_test; done
--------> 1
--------> 2
--------> 3
--------> 4
--------> 5
--------> 6
==== Test Assertion Failure ====
rseq_test.c:265: rseq_cpu == cpu
pid=3925 tid=3925 errno=4 - Interrupted system call
1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2)
2 0x0000ffffb044affb: ?? ??:0
3 0x0000ffffb044b0c7: ?? ??:0
4 0x0000000000401a6f: _start at ??:?
rseq CPU = 4, sched CPU = 27
Fix the issue by passing correct parameter, TID of the vCPU thread, to
sched_setaffinity() in the migration worker.
====================
amt: fix validation and synchronization bugs
There are some synchronization issues in the amt module.
Especially, an amt gateway doesn't well synchronize its own variables
and status(amt->status).
It tries to use a workqueue for handles in a single thread.
A global lock is also good, but it would occur complex locking complex.
In this patchset, only the gateway uses workqueue.
The reason why only gateway interface uses workqueue is that gateway
should manage its own states and variables a little bit statefully.
But relay doesn't need to manage tunnels statefully, stateless is okay.
So, relay side message handlers are okay to be called concurrently.
But it doesn't mean that no lock is needed.
Only amt multicast data message type will not be processed by the work
queue because It contains actual multicast data.
So, it should be processed immediately.
When any amt gateway events are triggered(sending discovery message by
delayed_work, sending request message by delayed_work and receiving
messages), it stores event and skb into the event queue(amt->events[16]).
Then, workqueue processes these events one by one.
The first patch is to use the work queue.
The second patch is to remove unnecessary lock due to a previous patch.
The third patch is to use READ_ONCE() in the amt module.
Even if the amt module uses a single thread, some variables (ready4,
ready6, amt->status) can be accessed concurrently.
The fourth patch is to add missing nonce generation logic when it sends a
new request message.
The fifth patch is to drop unexpected advertisement messages.
advertisement message should be received only after the gateway sends
a discovery message first.
So, the gateway should drop advertisement messages if it has never
sent a discovery message and it also should drop duplicate advertisement
messages.
Using nonce is good to distinguish whether a received message is an
expected message or not.
The sixth patch is to drop unexpected query messages.
This is the same behavior as the fourth patch.
Query messages should be received only after the gateway sends a request
message first.
The nonce variable is used to distinguish whether it is a reply to a
previous request message or not.
amt->ready4 and amt->ready6 are used to distinguish duplicate messages.
The seventh patch is to drop unexpected multicast data.
AMT gateway should not receive multicast data message type before
establish between gateway and relay.
In order to drop unexpected multicast data messages, it checks amt->status.
The last patch is to fix a locking problem on the relay side.
amt->nr_tunnels variable is protected by amt->lock.
But amt_request_handler() doesn't protect this variable.
v2:
- Use local_bh_disable() instead of rcu_read_lock_bh() in
amt_membership_query_handler.
- Fix using uninitialized variables.
- Fix unexpectedly start the event_wq after stopping.
- Fix possible deadlock in amt_event_work().
- Add a limit variable in amt_event_work() to prevent infinite working.
- Rename amt_queue_events() to amt_queue_event().
====================
amt->nr_tunnels is protected by amt->lock.
But, amt_request_handler() has been using this variable without the
amt->lock.
So, it expands context of amt->lock in the amt_request_handler() to
protect amt->nr_tunnels variable.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
AMT gateway interface should not receive unexpected multicast data.
Multicast data message type should be received after sending an update
message, which means all establishment between gateway and relay is
finished.
So, amt_multicast_data_handler() checks amt->status.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
AMT gateway interface should not receive unexpected query messages.
In order to drop unexpected query messages, it checks nonce.
And it also checks ready4 and ready6 variables to drop duplicated messages.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
AMT gateway interface should not receive unexpected advertisement messages.
In order to drop these packets, it should check nonce and amt->status.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
amt: add missing regeneration nonce logic in request logic
When AMT gateway starts sending a new request message, it should
regenerate the nonce variable.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
There are some data races in the amt module.
amt->ready4, amt->ready6, and amt->status can be accessed concurrently
without locks.
So, it uses READ_ONCE() and WRITE_ONCE().
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
By the previous patch, amt gateway handlers are changed to worked by
a single thread.
So, most locks for gateway are not needed.
So, it removes.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
amt: use workqueue for gateway side message handling
There are some synchronization issues(amt->status, amt->req_cnt, etc)
if the interface is in gateway mode because gateway message handlers
are processed concurrently.
This applies a work queue for processing these messages instead of
expanding the locking context.
So, the purposes of this patch are to fix exist race conditions and to make
gateway to be able to validate a gateway status more correctly.
When the AMT gateway interface is created, it tries to establish to relay.
The establishment step looks stateless, but it should be managed well.
In order to handle messages in the gateway, it saves the current
status(i.e. AMT_STATUS_XXX).
This patch makes gateway code to be worked with a single thread.
Now, all messages except the multicast are triggered(received or
delay expired), and these messages will be stored in the event
queue(amt->events).
Then, the single worker processes stored messages asynchronously one
by one.
The multicast data message type will be still processed immediately.
Now, amt->lock is only needed to access the event queue(amt->events)
if an interface is the gateway mode.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
Add spi_device_id entries to silent following warnings:
SPI driver sja1105 has no spi_device_id for nxp,sja1105e
SPI driver sja1105 has no spi_device_id for nxp,sja1105t
SPI driver sja1105 has no spi_device_id for nxp,sja1105p
SPI driver sja1105 has no spi_device_id for nxp,sja1105q
SPI driver sja1105 has no spi_device_id for nxp,sja1105r
SPI driver sja1105 has no spi_device_id for nxp,sja1105s
SPI driver sja1105 has no spi_device_id for nxp,sja1110a
SPI driver sja1105 has no spi_device_id for nxp,sja1110b
SPI driver sja1105 has no spi_device_id for nxp,sja1110c
SPI driver sja1105 has no spi_device_id for nxp,sja1110d
be2net: Fix buffer overflow in be_get_module_eeprom
be_cmd_read_port_transceiver_data assumes that it is given a buffer that
is at least PAGE_DATA_LEN long, or twice that if the module supports SFF
8472. However, this is not always the case.
Fix this by passing the desired offset and length to
be_cmd_read_port_transceiver_data so that we only copy the bytes once.
gpio: pca953x: use the correct range when do regmap sync
regmap will sync a range of registers, here use the correct range
to make sure the sync do not touch other unexpected registers.
Find on pca9557pw on imx8qxp/dxl evk board, this device support
8 pin, so only need one register(8 bits) to cover all the 8 pins's
property setting. But when sync the output, we find it actually
update two registers, output register and the following register.
gpio: pca953x: only use single read/write for No AI mode
For the device use NO AI mode(not support auto address increment),
only use the single read/write when config the regmap.
We meet issue on PCA9557PW on i.MX8QXP/DXL evk board, this device
do not support AI mode, but when do the regmap sync, regmap will
sync 3 byte data to register 1, logically this means write first
data to register 1, write second data to register 2, write third data
to register 3. But this device do not support AI mode, finally, these
three data write only into register 1 one by one. the reault is the
value of register 1 alway equal to the latest data, here is the third
data, no operation happened on register 2 and register 3. This is
not what we expect.
clk: lan966x: Fix the lan966x clock gate register address
The register address used for the clock gate register is the base
register address coming from first reg map (ie. the generic
clock registers) instead of the second reg map defining the clock
gate register.
Disable is done in stmmac_init_eee() on the event of MAC link down.
Since setting enable/disable EEE via ethtool will eventually trigger
a MAC down, removing this redunctant call in stmmac_ethtool.c to avoid
calling xpcs_config_eee() twice.
====================
Fix 2 DSA issues with vlan_filtering_is_global
This patch set fixes 2 issues with vlan_filtering_is_global switches.
Both are regressions introduced by refactoring commit d0004a020bb5
("net: dsa: remove the "dsa_to_port in a loop" antipattern from the
core"), which wasn't tested on a wide enough variety of switches.
Tested on the sja1105 driver.
====================
Vladimir Oltean [Fri, 15 Jul 2022 15:16:59 +0000 (18:16 +0300)]
net: dsa: fix NULL pointer dereference in dsa_port_reset_vlan_filtering
The "ds" iterator variable used in dsa_port_reset_vlan_filtering() ->
dsa_switch_for_each_port() overwrites the "dp" received as argument,
which is later used to call dsa_port_vlan_filtering() proper.
As a result, switches which do enter that code path (the ones with
vlan_filtering_is_global=true) will dereference an invalid dp in
dsa_port_reset_vlan_filtering() after leaving a VLAN-aware bridge.
Use a dedicated "other_dp" iterator variable to avoid this from
happening.
Fixes: d0004a020bb5 ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Vladimir Oltean [Fri, 15 Jul 2022 15:16:58 +0000 (18:16 +0300)]
net: dsa: fix dsa_port_vlan_filtering when global
The blamed refactoring commit changed a "port" iterator with "other_dp",
but still looked at the slave_dev of the dp outside the loop, instead of
other_dp->slave from the loop.
As a result, dsa_port_vlan_filtering() would not call
dsa_slave_manage_vlan_filtering() except for the port in cause, and not
for all switch ports as expected.
Fixes: d0004a020bb5 ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Reported-by: Lucian Banu <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
i40e: Fix erroneous adapter reinitialization during recovery process
Fix an issue when driver incorrectly detects state
of recovery process and erroneously reinitializes interrupts,
which results in a kernel error and call trace message.
The issue was caused by a combination of two factors:
1. Assuming the EMP reset issued after completing
firmware recovery means the whole recovery process is complete.
2. Erroneous reinitialization of interrupt vector after detecting
the above mentioned EMP reset.
Fixes (1) by changing how recovery state change is detected
and (2) by adjusting the conditional expression to ensure using proper
interrupt reinitialization method, depending on the situation.
====================
net: lan966x: Fix issues with MAC table
The patch series fixes 2 issues:
- when an entry was forgotten the irq thread was holding a spin lock and then
was talking also rtnl_lock.
- the access to the HW MAC table is indirect, so the access to the HW MAC
table was not synchronized, which means that there could be race conditions.
====================
net: lan966x: Fix usage of lan966x->mac_lock when used by FDB
When the SW bridge was trying to add/remove entries to/from HW, the
access to HW was not protected by any lock. In this way, it was
possible to have race conditions.
Fix this by using the lan966x->mac_lock to protect parallel access to HW
for this cases.
Fixes: 25ee9561ec622 ("net: lan966x: More MAC table functionality") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
net: lan966x: Fix usage of lan966x->mac_lock inside lan966x_mac_irq_handler
The problem with this spin lock is that it was just protecting the list
of the MAC entries in SW and not also the access to the MAC entries in HW.
Because the access to HW is indirect, then it could happen to have race
conditions.
For example when SW introduced an entry in MAC table and the irq mac is
trying to read something from the MAC.
Update such that also the access to MAC entries in HW is protected by
this lock.
Fixes: 5ccd66e01cbef ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
net: lan966x: Fix usage of lan966x->mac_lock when entry is removed
To remove an entry to the MAC table, it is required first to setup the
entry and then issue a command for the MAC to forget the entry.
So if it happens for two threads to remove simultaneously an entry
in MAC table then it would be a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.
net: lan966x: Fix usage of lan966x->mac_lock when entry is added
To add an entry to the MAC table, it is required first to setup the
entry and then issue a command for the MAC to learn the entry.
So if it happens for two threads to add simultaneously an entry in MAC
table then it would be a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.
Fixes: fc0c3fe7486f2 ("net: lan966x: Add function lan966x_mac_ip_learn()") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
net: lan966x: Fix taking rtnl_lock while holding spin_lock
When the HW deletes an entry in MAC table then it generates an
interrupt. The SW will go through it's own list of MAC entries and if it
is not found then it would notify the listeners about this. The problem
is that when the SW will go through it's own list it would take a spin
lock(lan966x->mac_lock) and when it notifies that the entry is deleted.
But to notify the listeners it taking the rtnl_lock which is illegal.
This is fixed by instead of notifying right away that the entry is
deleted, move the entry on a temp list and once, it checks all the
entries then just notify that the entries from temp list are deleted.
Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Two bug fixes for irdma:
- x722 does not support 1GB pages, trying to configure them will
corrupt the dma mapping
- Fix a sleep while holding a spinlock"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/irdma: Fix sleep from invalid context BUG
RDMA/irdma: Do not advertise 1GB page size for x722
Vladimir Oltean [Sat, 16 Jul 2022 23:37:45 +0000 (02:37 +0300)]
pinctrl: armada-37xx: use raw spinlocks for regmap to avoid invalid wait context
The irqchip->irq_set_type method is called by __irq_set_trigger() under
the desc->lock raw spinlock.
The armada-37xx implementation, armada_37xx_irq_set_type(), uses an MMIO
regmap created by of_syscon_register(), which uses plain spinlocks
(the kind that are sleepable on RT).
Therefore, this is an invalid locking scheme for which we get a kernel
splat stating just that ("[ BUG: Invalid wait context ]"), because the
context in which the plain spinlock may sleep is atomic due to the raw
spinlock. We need to go raw spinlocks all the way.
Make this driver create its own MMIO regmap, with use_raw_spinlock=true,
and stop relying on syscon to provide it.
This patch depends on commit 67021f25d952 ("regmap: teach regmap to use
raw spinlocks if requested in the config").
Vladimir Oltean [Sat, 16 Jul 2022 23:37:44 +0000 (02:37 +0300)]
pinctrl: armada-37xx: make irq_lock a raw spinlock to avoid invalid wait context
The irqchip->irq_set_type method is called by __irq_set_trigger() under
the desc->lock raw spinlock.
The armada-37xx implementation, armada_37xx_irq_set_type(), takes a
plain spinlock, the kind that becomes sleepable on RT.
Therefore, this is an invalid locking scheme for which we get a kernel
splat stating just that ("[ BUG: Invalid wait context ]"), because the
context in which the plain spinlock may sleep is atomic due to the raw
spinlock. We need to go raw spinlocks all the way.
Replace the driver's irq_lock with a raw spinlock, to disable preemption
even on RT.
ACPI: CPPC: Don't require flexible address space if X86_FEATURE_CPPC is supported
Commit 0651ab90e4ad ("ACPI: CPPC: Check _OSC for flexible address space")
changed _CPC probing to require flexible address space to be negotiated
for CPPC to work.
However it was observed that this caused a regression for Arek's ROG
Zephyrus G15 GA503QM which previously CPPC worked, but now it stopped
working.
To avoid causing a regression waive this failure when the CPU is known
to support CPPC.
Extend iavf_state_str by strings for __IAVF_INIT_EXTENDED_CAPS and
__IAVF_INIT_CONFIG_ADAPTER.
Without this patch, when enabling debug prints for iavf.h, user will
see:
iavf 0000:06:0e.0: state transition from:__IAVF_INIT_GET_RESOURCES to:__IAVF_UNKNOWN_STATE
iavf 0000:06:0e.0: state transition from:__IAVF_UNKNOWN_STATE to:__IAVF_UNKNOWN_STATE
Fix memory leak caused by not handling dummy receive descriptor properly.
iavf_get_rx_buffer now sets the rx_buffer return value for dummy receive
descriptors. Without this patch, when the hardware writes a dummy
descriptor, iavf would not free the page allocated for the previous receive
buffer. This is an unlikely event but can still happen.
iavf: Disallow changing rx/tx-frames and rx/tx-frames-irq
Remove from supported_coalesce_params ETHTOOL_COALESCE_MAX_FRAMES
and ETHTOOL_COALESCE_MAX_FRAMES_IRQ. As tx-frames-irq allowed
user to change budget for iavf_clean_tx_irq, remove work_limit
and use define for budget.
Without this patch there would be possibility to change rx/tx-frames
and rx/tx-frames-irq, which for rx/tx-frames did nothing, while for
rx/tx-frames-irq it changed rx/tx-frames and only changed budget
for cleaning NAPI poll.
Fix VLAN addition, so that PF driver does not reject whole VLAN batch.
Add VLAN reject handling, so rejected VLANs, won't litter VLAN filter
list. Fix handling of active_(c/s)vlans, so it will be possible to
re-add VLAN filters for user.
Without this patch, after changing trust to off, with VLAN filters
saturated, no VLAN is added, due to PF rejecting addition.
Fixes: 92fc50859872 ("iavf: Restrict maximum VLAN filters for VIRTCHNL_VF_OFFLOAD_VLAN_V2") Signed-off-by: Przemyslaw Patynowski <[email protected]> Signed-off-by: Jedrzej Jagielski <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
Peter Zijlstra [Mon, 18 Jul 2022 11:41:37 +0000 (13:41 +0200)]
x86/amd: Use IBPB for firmware calls
On AMD IBRS does not prevent Retbleed; as such use IBPB before a
firmware call to flush the branch history state.
And because in order to do an EFI call, the kernel maps a whole lot of
the kernel page table into the EFI page table, do an IBPB just in case
in order to prevent the scenario of poisoning the BTB and causing an EFI
call using the unprotected RET there.
David S. Miller [Mon, 18 Jul 2022 11:44:37 +0000 (12:44 +0100)]
Merge branch 'dsa-docs'
Vladimir Oltean says:
====================
Update DSA documentation
These are some updates of dsa.rst, since it hasn't kept up with
development (in some cases, even since 2017). I've added Fixes: tags as
I thought was appropriate.
====================
Vladimir Oltean [Sat, 16 Jul 2022 18:53:44 +0000 (21:53 +0300)]
docs: net: dsa: mention that VLANs are now refcounted on shared ports
The blamed commit updated the way in which VLANs are handled at the
cross-chip notifier layer and didn't update the documentation to say
that. Fix it.
Fixes: 134ef2388e7f ("net: dsa: add explicit support for host bridge VLANs") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:43 +0000 (21:53 +0300)]
docs: net: dsa: delete misinformation about -EOPNOTSUPP for FDB/MDB/VLAN
Returning -EOPNOTSUPP does *NOT* mean anything special.
port_vlan_add() is actually called from 2 code paths, one is
vlan_vid_add() from 8021q module and the other is
br_switchdev_port_vlan_add() from switchdev.
The bridge has a wrapper __vlan_vid_add() which first tries via
switchdev, then if that returns -EOPNOTSUPP, tries again via the VLAN RX
filters in the 8021q module. But DSA doesn't distinguish between one
call path and the other when calling the driver's port_vlan_add(), so if
the driver returns -EOPNOTSUPP to switchdev, it also returns -EOPNOTSUPP
to the 8021q module. And the latter is a hard error.
port_fdb_add() is called from the deferred dsa_owq only, so obviously
its return code isn't propagated anywhere, and cannot be interpreted in
any way.
The return code from port_mdb_add() is propagated to the bridge, but
again, this doesn't do anything special when -EOPNOTSUPP is returned,
but rather, br_switchdev_mdb_notify() returns void.
Vladimir Oltean [Sat, 16 Jul 2022 18:53:41 +0000 (21:53 +0300)]
docs: net: dsa: add a section for address databases
The given definition for what VID 0 represents in the current
port_fdb_add and port_mdb_add is blatantly wrong. Delete it and explain
the concepts surrounding DSA's understanding of FDB isolation.
Vladimir Oltean [Sat, 16 Jul 2022 18:53:39 +0000 (21:53 +0300)]
docs: net: dsa: remove port_vlan_dump
This was deleted in 2017, delete the obsolete documentation.
Fixes: c069fcd82c57 ("net: dsa: Remove support for bypass bridge port attributes/vlan set") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:38 +0000 (21:53 +0300)]
docs: net: dsa: remove port_bridge_tx_fwd_offload
We've changed the API through which we can offload the bridge TX
forwarding process. Update the documentation in light of the removal of
2 DSA switch ops.
Fixes: b079922ba2ac ("net: dsa: add a "tx_fwd_offload" argument to ->port_bridge_join") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>