AMT gateway interface should not receive unexpected advertisement messages.
In order to drop these packets, it should check nonce and amt->status.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
amt: add missing regeneration nonce logic in request logic
When AMT gateway starts sending a new request message, it should
regenerate the nonce variable.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
There are some data races in the amt module.
amt->ready4, amt->ready6, and amt->status can be accessed concurrently
without locks.
So, it uses READ_ONCE() and WRITE_ONCE().
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
By the previous patch, amt gateway handlers are changed to worked by
a single thread.
So, most locks for gateway are not needed.
So, it removes.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
amt: use workqueue for gateway side message handling
There are some synchronization issues(amt->status, amt->req_cnt, etc)
if the interface is in gateway mode because gateway message handlers
are processed concurrently.
This applies a work queue for processing these messages instead of
expanding the locking context.
So, the purposes of this patch are to fix exist race conditions and to make
gateway to be able to validate a gateway status more correctly.
When the AMT gateway interface is created, it tries to establish to relay.
The establishment step looks stateless, but it should be managed well.
In order to handle messages in the gateway, it saves the current
status(i.e. AMT_STATUS_XXX).
This patch makes gateway code to be worked with a single thread.
Now, all messages except the multicast are triggered(received or
delay expired), and these messages will be stored in the event
queue(amt->events).
Then, the single worker processes stored messages asynchronously one
by one.
The multicast data message type will be still processed immediately.
Now, amt->lock is only needed to access the event queue(amt->events)
if an interface is the gateway mode.
Fixes: cbc21dc1cfe9 ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
Add spi_device_id entries to silent following warnings:
SPI driver sja1105 has no spi_device_id for nxp,sja1105e
SPI driver sja1105 has no spi_device_id for nxp,sja1105t
SPI driver sja1105 has no spi_device_id for nxp,sja1105p
SPI driver sja1105 has no spi_device_id for nxp,sja1105q
SPI driver sja1105 has no spi_device_id for nxp,sja1105r
SPI driver sja1105 has no spi_device_id for nxp,sja1105s
SPI driver sja1105 has no spi_device_id for nxp,sja1110a
SPI driver sja1105 has no spi_device_id for nxp,sja1110b
SPI driver sja1105 has no spi_device_id for nxp,sja1110c
SPI driver sja1105 has no spi_device_id for nxp,sja1110d
be2net: Fix buffer overflow in be_get_module_eeprom
be_cmd_read_port_transceiver_data assumes that it is given a buffer that
is at least PAGE_DATA_LEN long, or twice that if the module supports SFF
8472. However, this is not always the case.
Fix this by passing the desired offset and length to
be_cmd_read_port_transceiver_data so that we only copy the bytes once.
gpio: pca953x: use the correct range when do regmap sync
regmap will sync a range of registers, here use the correct range
to make sure the sync do not touch other unexpected registers.
Find on pca9557pw on imx8qxp/dxl evk board, this device support
8 pin, so only need one register(8 bits) to cover all the 8 pins's
property setting. But when sync the output, we find it actually
update two registers, output register and the following register.
gpio: pca953x: only use single read/write for No AI mode
For the device use NO AI mode(not support auto address increment),
only use the single read/write when config the regmap.
We meet issue on PCA9557PW on i.MX8QXP/DXL evk board, this device
do not support AI mode, but when do the regmap sync, regmap will
sync 3 byte data to register 1, logically this means write first
data to register 1, write second data to register 2, write third data
to register 3. But this device do not support AI mode, finally, these
three data write only into register 1 one by one. the reault is the
value of register 1 alway equal to the latest data, here is the third
data, no operation happened on register 2 and register 3. This is
not what we expect.
clk: lan966x: Fix the lan966x clock gate register address
The register address used for the clock gate register is the base
register address coming from first reg map (ie. the generic
clock registers) instead of the second reg map defining the clock
gate register.
Disable is done in stmmac_init_eee() on the event of MAC link down.
Since setting enable/disable EEE via ethtool will eventually trigger
a MAC down, removing this redunctant call in stmmac_ethtool.c to avoid
calling xpcs_config_eee() twice.
====================
Fix 2 DSA issues with vlan_filtering_is_global
This patch set fixes 2 issues with vlan_filtering_is_global switches.
Both are regressions introduced by refactoring commit d0004a020bb5
("net: dsa: remove the "dsa_to_port in a loop" antipattern from the
core"), which wasn't tested on a wide enough variety of switches.
Tested on the sja1105 driver.
====================
Vladimir Oltean [Fri, 15 Jul 2022 15:16:59 +0000 (18:16 +0300)]
net: dsa: fix NULL pointer dereference in dsa_port_reset_vlan_filtering
The "ds" iterator variable used in dsa_port_reset_vlan_filtering() ->
dsa_switch_for_each_port() overwrites the "dp" received as argument,
which is later used to call dsa_port_vlan_filtering() proper.
As a result, switches which do enter that code path (the ones with
vlan_filtering_is_global=true) will dereference an invalid dp in
dsa_port_reset_vlan_filtering() after leaving a VLAN-aware bridge.
Use a dedicated "other_dp" iterator variable to avoid this from
happening.
Fixes: d0004a020bb5 ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Vladimir Oltean [Fri, 15 Jul 2022 15:16:58 +0000 (18:16 +0300)]
net: dsa: fix dsa_port_vlan_filtering when global
The blamed refactoring commit changed a "port" iterator with "other_dp",
but still looked at the slave_dev of the dp outside the loop, instead of
other_dp->slave from the loop.
As a result, dsa_port_vlan_filtering() would not call
dsa_slave_manage_vlan_filtering() except for the port in cause, and not
for all switch ports as expected.
Fixes: d0004a020bb5 ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Reported-by: Lucian Banu <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
i40e: Fix erroneous adapter reinitialization during recovery process
Fix an issue when driver incorrectly detects state
of recovery process and erroneously reinitializes interrupts,
which results in a kernel error and call trace message.
The issue was caused by a combination of two factors:
1. Assuming the EMP reset issued after completing
firmware recovery means the whole recovery process is complete.
2. Erroneous reinitialization of interrupt vector after detecting
the above mentioned EMP reset.
Fixes (1) by changing how recovery state change is detected
and (2) by adjusting the conditional expression to ensure using proper
interrupt reinitialization method, depending on the situation.
====================
net: lan966x: Fix issues with MAC table
The patch series fixes 2 issues:
- when an entry was forgotten the irq thread was holding a spin lock and then
was talking also rtnl_lock.
- the access to the HW MAC table is indirect, so the access to the HW MAC
table was not synchronized, which means that there could be race conditions.
====================
net: lan966x: Fix usage of lan966x->mac_lock when used by FDB
When the SW bridge was trying to add/remove entries to/from HW, the
access to HW was not protected by any lock. In this way, it was
possible to have race conditions.
Fix this by using the lan966x->mac_lock to protect parallel access to HW
for this cases.
Fixes: 25ee9561ec622 ("net: lan966x: More MAC table functionality") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
net: lan966x: Fix usage of lan966x->mac_lock inside lan966x_mac_irq_handler
The problem with this spin lock is that it was just protecting the list
of the MAC entries in SW and not also the access to the MAC entries in HW.
Because the access to HW is indirect, then it could happen to have race
conditions.
For example when SW introduced an entry in MAC table and the irq mac is
trying to read something from the MAC.
Update such that also the access to MAC entries in HW is protected by
this lock.
Fixes: 5ccd66e01cbef ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
net: lan966x: Fix usage of lan966x->mac_lock when entry is removed
To remove an entry to the MAC table, it is required first to setup the
entry and then issue a command for the MAC to forget the entry.
So if it happens for two threads to remove simultaneously an entry
in MAC table then it would be a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.
net: lan966x: Fix usage of lan966x->mac_lock when entry is added
To add an entry to the MAC table, it is required first to setup the
entry and then issue a command for the MAC to learn the entry.
So if it happens for two threads to add simultaneously an entry in MAC
table then it would be a race condition.
Fix this by using lan966x->mac_lock to protect the HW access.
Fixes: fc0c3fe7486f2 ("net: lan966x: Add function lan966x_mac_ip_learn()") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
net: lan966x: Fix taking rtnl_lock while holding spin_lock
When the HW deletes an entry in MAC table then it generates an
interrupt. The SW will go through it's own list of MAC entries and if it
is not found then it would notify the listeners about this. The problem
is that when the SW will go through it's own list it would take a spin
lock(lan966x->mac_lock) and when it notifies that the entry is deleted.
But to notify the listeners it taking the rtnl_lock which is illegal.
This is fixed by instead of notifying right away that the entry is
deleted, move the entry on a temp list and once, it checks all the
entries then just notify that the entries from temp list are deleted.
Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Two bug fixes for irdma:
- x722 does not support 1GB pages, trying to configure them will
corrupt the dma mapping
- Fix a sleep while holding a spinlock"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/irdma: Fix sleep from invalid context BUG
RDMA/irdma: Do not advertise 1GB page size for x722
Vladimir Oltean [Sat, 16 Jul 2022 23:37:45 +0000 (02:37 +0300)]
pinctrl: armada-37xx: use raw spinlocks for regmap to avoid invalid wait context
The irqchip->irq_set_type method is called by __irq_set_trigger() under
the desc->lock raw spinlock.
The armada-37xx implementation, armada_37xx_irq_set_type(), uses an MMIO
regmap created by of_syscon_register(), which uses plain spinlocks
(the kind that are sleepable on RT).
Therefore, this is an invalid locking scheme for which we get a kernel
splat stating just that ("[ BUG: Invalid wait context ]"), because the
context in which the plain spinlock may sleep is atomic due to the raw
spinlock. We need to go raw spinlocks all the way.
Make this driver create its own MMIO regmap, with use_raw_spinlock=true,
and stop relying on syscon to provide it.
This patch depends on commit 67021f25d952 ("regmap: teach regmap to use
raw spinlocks if requested in the config").
Vladimir Oltean [Sat, 16 Jul 2022 23:37:44 +0000 (02:37 +0300)]
pinctrl: armada-37xx: make irq_lock a raw spinlock to avoid invalid wait context
The irqchip->irq_set_type method is called by __irq_set_trigger() under
the desc->lock raw spinlock.
The armada-37xx implementation, armada_37xx_irq_set_type(), takes a
plain spinlock, the kind that becomes sleepable on RT.
Therefore, this is an invalid locking scheme for which we get a kernel
splat stating just that ("[ BUG: Invalid wait context ]"), because the
context in which the plain spinlock may sleep is atomic due to the raw
spinlock. We need to go raw spinlocks all the way.
Replace the driver's irq_lock with a raw spinlock, to disable preemption
even on RT.
This commit introduced a regression that can cause mount hung. The
changes in __ocfs2_find_empty_slot causes that any node with none-zero
node number can grab the slot that was already taken by node 0, so node 1
will access the same journal with node 0, when it try to grab journal
cluster lock, it will hung because it was already acquired by node 0.
It's very easy to reproduce this, in one cluster, mount node 0 first, then
node 1, you will see the following call trace from node 1.
To fix it, we can just fix __ocfs2_find_empty_slot. But original commit
introduced the feature to mount ocfs2 locally even it is cluster based,
that is a very dangerous, it can easily cause serious data corruption,
there is no way to stop other nodes mounting the fs and corrupting it.
Setup ha or other cluster-aware stack is just the cost that we have to
take for avoiding corruption, otherwise we have to do it in kernel.
==================================================================
BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130
Read of size 2 at addr ffff8880751acee8 by task a.out/879
Eric Biggers suggested that this happens when
secretmem_setattr()->simple_setattr() races with secretmem_fault() so that
a page that is faulted in by secretmem_fault() (and thus removed from the
direct map) is zeroed by inode truncation right afterwards.
Use mapping->invalidate_lock to make secretmem_fault() and
secretmem_setattr() mutually exclusive.
mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()
Originally copy_hugetlb_page_range() handles migration entries and
hwpoisoned entries in similar manner. But recently the related code path
has more code for migration entries, and when
is_writable_migration_entry() was converted to
!is_readable_migration_entry(), hwpoison entries on source processes got
to be unexpectedly updated (which is legitimate for migration entries, but
not for hwpoison entries). This results in unexpected serious issues like
kernel panic when forking processes with hwpoison entries in pmd.
Separate the if branch into one for hwpoison entries and one for migration
entries.
Muchun Song [Tue, 5 Jul 2022 12:35:32 +0000 (20:35 +0800)]
mm: fix missing wake-up event for FSDAX pages
FSDAX page refcounts are 1-based, rather than 0-based: if refcount is
1, then the page is freed. The FSDAX pages can be pinned through GUP,
then they will be unpinned via unpin_user_page() using a folio variant
to put the page, however, folio variants did not consider this special
case, the result will be to miss a wakeup event (like the user of
__fuse_dax_break_layouts()). This results in a task being permanently
stuck in TASK_INTERRUPTIBLE state.
Since FSDAX pages are only possibly obtained by GUP users, so fix GUP
instead of folio_put() to lower overhead.
Josef Bacik [Tue, 5 Jul 2022 20:00:36 +0000 (16:00 -0400)]
mm: fix page leak with multiple threads mapping the same page
We have an application with a lot of threads that use a shared mmap backed
by tmpfs mounted with -o huge=within_size. This application started
leaking loads of huge pages when we upgraded to a recent kernel.
Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts from
the page fault path, but when it came to unmap time we wouldn't drop the
number of refs we had added from the faults.
I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
huge=always, and then spawned 20 threads all looping faulting random
offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
page aligned ranges. This very quickly reproduced the problem.
The problem here is that we check for the case that we have multiple
threads faulting in a range that was previously unmapped. One thread maps
the PMD, the other thread loses the race and then returns 0. However at
this point we already have the page, and we are no longer putting this
page into the processes address space, and so we leak the page. We
actually did the correct thing prior to f9ce0be71d1f, however it looks
like Kirill copied what we do in the anonymous page case. In the
anonymous page case we don't yet have a page, so we don't have to drop a
reference on anything. Previously we did the correct thing for file based
faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
the page we faulted in.
Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
case, this makes us drop the ref on the page properly, and now my
reproducer no longer leaks the huge pages.
ZhaoLong Wang [Wed, 29 Jun 2022 12:43:24 +0000 (20:43 +0800)]
tmpfs: fix the issue that the mount and remount results are inconsistent.
An undefined-behavior issue has not been completely fixed since commit d14f5efadd84 ("tmpfs: fix undefined-behaviour in shmem_reconfigure()").
In the commit, check in the shmem_reconfigure() is added in remount
process to avoid the Ubsan problem. However, the check is not added to
the mount process. It causes inconsistent results between mount and
remount. The operations to reproduce the problem in user mode as follows:
If nr_blocks is set to 0x8000000000000000, the mounting is successful.
# mount tmpfs /dev/shm/ -t tmpfs -o nr_blocks=0x8000000000000000
However, when -o remount is used, the mount fails because of the
check in the shmem_reconfigure()
# mount tmpfs /dev/shm/ -t tmpfs -o remount,nr_blocks=0x8000000000000000
mount: /dev/shm: mount point not mounted or bad option.
Therefore, add checks in the shmem_parse_one() function and remove the
check in shmem_reconfigure() to avoid this problem.
Yee Lee [Tue, 28 Jun 2022 11:37:11 +0000 (19:37 +0800)]
mm: kfence: apply kmemleak_ignore_phys on early allocated pool
This patch solves two issues.
(1) The pool allocated by memblock needs to unregister from
kmemleak scanning. Apply kmemleak_ignore_phys to replace the
original kmemleak_free as its address now is stored in the phys tree.
(2) The pool late allocated by page-alloc doesn't need to unregister.
Move out the freeing operation from its call path.
ACPI: CPPC: Don't require flexible address space if X86_FEATURE_CPPC is supported
Commit 0651ab90e4ad ("ACPI: CPPC: Check _OSC for flexible address space")
changed _CPC probing to require flexible address space to be negotiated
for CPPC to work.
However it was observed that this caused a regression for Arek's ROG
Zephyrus G15 GA503QM which previously CPPC worked, but now it stopped
working.
To avoid causing a regression waive this failure when the CPU is known
to support CPPC.
Extend iavf_state_str by strings for __IAVF_INIT_EXTENDED_CAPS and
__IAVF_INIT_CONFIG_ADAPTER.
Without this patch, when enabling debug prints for iavf.h, user will
see:
iavf 0000:06:0e.0: state transition from:__IAVF_INIT_GET_RESOURCES to:__IAVF_UNKNOWN_STATE
iavf 0000:06:0e.0: state transition from:__IAVF_UNKNOWN_STATE to:__IAVF_UNKNOWN_STATE
Fix memory leak caused by not handling dummy receive descriptor properly.
iavf_get_rx_buffer now sets the rx_buffer return value for dummy receive
descriptors. Without this patch, when the hardware writes a dummy
descriptor, iavf would not free the page allocated for the previous receive
buffer. This is an unlikely event but can still happen.
iavf: Disallow changing rx/tx-frames and rx/tx-frames-irq
Remove from supported_coalesce_params ETHTOOL_COALESCE_MAX_FRAMES
and ETHTOOL_COALESCE_MAX_FRAMES_IRQ. As tx-frames-irq allowed
user to change budget for iavf_clean_tx_irq, remove work_limit
and use define for budget.
Without this patch there would be possibility to change rx/tx-frames
and rx/tx-frames-irq, which for rx/tx-frames did nothing, while for
rx/tx-frames-irq it changed rx/tx-frames and only changed budget
for cleaning NAPI poll.
Fix VLAN addition, so that PF driver does not reject whole VLAN batch.
Add VLAN reject handling, so rejected VLANs, won't litter VLAN filter
list. Fix handling of active_(c/s)vlans, so it will be possible to
re-add VLAN filters for user.
Without this patch, after changing trust to off, with VLAN filters
saturated, no VLAN is added, due to PF rejecting addition.
Fixes: 92fc50859872 ("iavf: Restrict maximum VLAN filters for VIRTCHNL_VF_OFFLOAD_VLAN_V2") Signed-off-by: Przemyslaw Patynowski <[email protected]> Signed-off-by: Jedrzej Jagielski <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
Peter Zijlstra [Mon, 18 Jul 2022 11:41:37 +0000 (13:41 +0200)]
x86/amd: Use IBPB for firmware calls
On AMD IBRS does not prevent Retbleed; as such use IBPB before a
firmware call to flush the branch history state.
And because in order to do an EFI call, the kernel maps a whole lot of
the kernel page table into the EFI page table, do an IBPB just in case
in order to prevent the scenario of poisoning the BTB and causing an EFI
call using the unprotected RET there.
David S. Miller [Mon, 18 Jul 2022 11:44:37 +0000 (12:44 +0100)]
Merge branch 'dsa-docs'
Vladimir Oltean says:
====================
Update DSA documentation
These are some updates of dsa.rst, since it hasn't kept up with
development (in some cases, even since 2017). I've added Fixes: tags as
I thought was appropriate.
====================
Vladimir Oltean [Sat, 16 Jul 2022 18:53:44 +0000 (21:53 +0300)]
docs: net: dsa: mention that VLANs are now refcounted on shared ports
The blamed commit updated the way in which VLANs are handled at the
cross-chip notifier layer and didn't update the documentation to say
that. Fix it.
Fixes: 134ef2388e7f ("net: dsa: add explicit support for host bridge VLANs") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:43 +0000 (21:53 +0300)]
docs: net: dsa: delete misinformation about -EOPNOTSUPP for FDB/MDB/VLAN
Returning -EOPNOTSUPP does *NOT* mean anything special.
port_vlan_add() is actually called from 2 code paths, one is
vlan_vid_add() from 8021q module and the other is
br_switchdev_port_vlan_add() from switchdev.
The bridge has a wrapper __vlan_vid_add() which first tries via
switchdev, then if that returns -EOPNOTSUPP, tries again via the VLAN RX
filters in the 8021q module. But DSA doesn't distinguish between one
call path and the other when calling the driver's port_vlan_add(), so if
the driver returns -EOPNOTSUPP to switchdev, it also returns -EOPNOTSUPP
to the 8021q module. And the latter is a hard error.
port_fdb_add() is called from the deferred dsa_owq only, so obviously
its return code isn't propagated anywhere, and cannot be interpreted in
any way.
The return code from port_mdb_add() is propagated to the bridge, but
again, this doesn't do anything special when -EOPNOTSUPP is returned,
but rather, br_switchdev_mdb_notify() returns void.
Vladimir Oltean [Sat, 16 Jul 2022 18:53:41 +0000 (21:53 +0300)]
docs: net: dsa: add a section for address databases
The given definition for what VID 0 represents in the current
port_fdb_add and port_mdb_add is blatantly wrong. Delete it and explain
the concepts surrounding DSA's understanding of FDB isolation.
Vladimir Oltean [Sat, 16 Jul 2022 18:53:39 +0000 (21:53 +0300)]
docs: net: dsa: remove port_vlan_dump
This was deleted in 2017, delete the obsolete documentation.
Fixes: c069fcd82c57 ("net: dsa: Remove support for bypass bridge port attributes/vlan set") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:38 +0000 (21:53 +0300)]
docs: net: dsa: remove port_bridge_tx_fwd_offload
We've changed the API through which we can offload the bridge TX
forwarding process. Update the documentation in light of the removal of
2 DSA switch ops.
Fixes: b079922ba2ac ("net: dsa: add a "tx_fwd_offload" argument to ->port_bridge_join") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:37 +0000 (21:53 +0300)]
docs: net: dsa: document port_fast_age
The provided information about FDB flushing is not really up to date.
The DSA core automatically calls port_fast_age() when necessary, and
drivers should just implement that rather than hooking it to
port_bridge_leave, port_stp_state_set and others.
Vladimir Oltean [Sat, 16 Jul 2022 18:53:36 +0000 (21:53 +0300)]
docs: net: dsa: document port_setup and port_teardown
These methods were added without being documented, fix that.
Fixes: fd292c189a97 ("net: dsa: tear down devlink port regions when tearing down the devlink port on error") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:34 +0000 (21:53 +0300)]
docs: net: dsa: document change_tag_protocol
Support for changing the tagging protocol was added without this
operation being documented; do so now.
Fixes: 53da0ebaad10 ("net: dsa: allow changing the tag protocol via the "tagging" device attribute") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:33 +0000 (21:53 +0300)]
docs: net: dsa: add more info about the other arguments to get_tag_protocol
Changes were made to the prototype of get_tag_protocol without
describing at a high level what they are about. Update the documentation
to explain that.
Fixes: 5ed4e3eb0217 ("net: dsa: Pass a port to get_tag_protocol()") Fixes: 4d776482ecc6 ("net: dsa: Get information about stacked DSA protocol") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:32 +0000 (21:53 +0300)]
docs: net: dsa: rename tag_protocol to get_tag_protocol
Since the blamed commit, the enum was turned into a function pointer and
also renamed. Update the documentation.
Fixes: 7b314362a234 ("net: dsa: Allow the DSA driver to indicate the tag protocol") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:31 +0000 (21:53 +0300)]
docs: net: dsa: document the shutdown behavior
Document the changes that took place in the DSA core in the blamed
commit.
Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Sat, 16 Jul 2022 18:53:30 +0000 (21:53 +0300)]
docs: net: dsa: update probing documentation
Since the blamed commit we don't have register_switch_driver() and
unregister_switch_driver() anymore. Additionally, the expected
dsa_register_switch() and dsa_unregister_switch() calls aren't
documented.
Update the probing section with the details of how things are currently
done.
David S. Miller [Mon, 18 Jul 2022 11:21:54 +0000 (12:21 +0100)]
Merge branch 'net-ipv4-sysctl-races-part-3'
Kuniyuki Iwashima says:
====================
sysctl: Fix data-races around ipv4_net_table (Round 3).
This series fixes data-races around 21 knobs after
igmp_link_local_mcast_reports in ipv4_net_table.
These 4 knobs are skipped because they are safe.
- tcp_congestion_control: Safe with RCU and xchg().
- tcp_available_congestion_control: Read only.
- tcp_allowed_congestion_control: Safe with RCU and spinlock().
- tcp_fastopen_key: Safe with RCU and xchg()
So, round 4 will start with fib_multipath_use_neigh.
====================
tcp: Fix data-races around sysctl_tcp_fastopen_blackhole_timeout.
While reading sysctl_tcp_fastopen_blackhole_timeout, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: cf1ef3f0719b ("net/tcp_fastopen: Disable active side TFO in certain scenarios") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
igmp: Fix data-races around sysctl_igmp_llm_reports.
While reading sysctl_igmp_llm_reports, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
This test can be packed into a helper, so such changes will be in the
follow-up series after net is merged into net-next.
if (ipv4_is_local_multicast(pmc->multiaddr) &&
!READ_ONCE(net->ipv4.sysctl_igmp_llm_reports))
Fixes: df2cf4a78e48 ("IGMP: Inhibit reports for local multicast groups") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
It was observed that by allowing pinctrl_amd to be loaded
later in the boot process that interrupts sent to the GPIO
controller early in the boot are not serviced. The kernel treats
these as a spurious IRQ and disables the IRQ.
This problem was exacerbated because it happened on a system with
an encrypted partition so the kernel object was not accesssible for
an extended period of time while waiting for a passphrase.
To avoid this situation from occurring, stop allowing pinctrl-amd
from being built as a module and instead require it to be built-in
or disabled.
net: prestera: acl: use proper mask for port selector
Adjusted as per packet processor documentation.
This allows to properly match 'indev' for clsact rules.
Fixes: 47327e198d42 ("net: prestera: acl: migrate to new vTCAM api") Signed-off-by: Maksym Glubokiy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.
In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
all the way until it flushes the gc work, and returns without freeing
the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
work and frees the context's device resources.
Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization. This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.
Wong Vee Khee [Thu, 14 Jul 2022 07:54:27 +0000 (15:54 +0800)]
net: stmmac: switch to use interrupt for hw crosstimestamping
Using current implementation of polling mode, there is high chances we
will hit into timeout error when running phc2sys. Hence, update the
implementation of hardware crosstimestamping to use the MAC interrupt
service routine instead of polling for TSIS bit in the MAC Timestamp
Interrupt Status register to be set.
The blamed commit changed to use regmaps instead of __iomem. But it
didn't update the register offsets to be at word offset, so it uses byte
offset.
Another issue with the same commit is that it has a limit of 32 registers
which is incorrect. The sparx5 has 64 while lan966x has 77.
The blamed commit introduce support for lan966x which use the same
pinconf_ops as sparx5. The problem is that pinconf_ops is specific to
sparx5. More precisely the offset of the bits in the pincfg register are
different and also lan966x doesn't have support for
PIN_CONFIG_INPUT_SCHMITT_ENABLE.
Fix this by making pinconf_ops more generic such that it can be also
used by lan966x. This is done by introducing 'ocelot_pincfg_data' which
contains the offset and what is supported for each SOC.
Christian König [Fri, 15 Jul 2022 07:57:22 +0000 (09:57 +0200)]
drm/ttm: fix locking in vmap/vunmap TTM GEM helpers
I've stumbled over this while reviewing patches for DMA-buf and it looks
like we completely messed the locking up here.
In general most TTM function should only be called while holding the
appropriate BO resv lock. Without this we could break the internal
buffer object state here.
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix SIGSEGV when processing syscall args in perf.data files in 'perf
trace'
- Sync kvm, msr-index and cpufeatures headers with the kernel sources
- Fix 'convert perf time to TSC' 'perf test':
- No need to open events twice
- Fix finding correct event on hybrid systems
* tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf trace: Fix SIGSEGV when processing syscall args
perf tests: Fix Convert perf time to TSC test for hybrid
perf tests: Stop Convert perf time to TSC test opening events twice
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
Matthew Auld [Tue, 12 Jul 2022 17:40:50 +0000 (18:40 +0100)]
drm/i915/ttm: fix 32b build
Since segment_pages is no longer a compile time constant, it looks the
DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest
is just to use the ULL variant, but really we should need not need more
than u32 for the page alignment (also we are limited by that due to the
sg->length type), so also make it all u32.
Merge tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- A single data race fix on the perf event cleanup path to avoid
endless loops due to insufficient locking
* tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()
Merge tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Improve the check whether the kernel supports WP mappings so that it
can accomodate a XenPV guest due to how the latter is setting up the
PAT machinery
- Now that the retbleed nightmare is public, here's the first round of
fallout fixes:
* Fix a build failure on 32-bit due to missing include
* Remove an untraining point in espfix64 return path
* other small cleanups
* tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Remove apostrophe typo
um: Add missing apply_returns()
x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
x86/bugs: Mark retbleed_strings static
x86/pat: Fix x86_has_pat_wp()
x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit
Merge tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- fix Goodix driver to properly behave on the Aya Neo Next
- some more sanity checks in usbtouchscreen driver
- a tweak in wm97xx driver in preparation for remove() to return void
- a clarification in input core regarding units of measurement for
resolution on touch events.
* tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: document the units for resolution of size axes
Input: goodix - call acpi_device_fix_up_power() in some cases
Input: wm97xx - make .remove() obviously always return 0
Input: usbtouchscreen - add driver_info sanity check
perf trace: Fix SIGSEGV when processing syscall args
On powerpc, 'perf trace' is crashing with a SIGSEGV when trying to
process a perf.data file created with 'perf trace record -p':
#0 0x00000001225b8988 in syscall_arg__scnprintf_augmented_string <snip> at builtin-trace.c:1492
#1 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1492
#2 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1486
#3 0x00000001225bdd9c in syscall_arg_fmt__scnprintf_val <snip> at builtin-trace.c:1973
#4 syscall__scnprintf_args <snip> at builtin-trace.c:2041
#5 0x00000001225bff04 in trace__sys_enter <snip> at builtin-trace.c:2319
That points to the below code in tools/perf/builtin-trace.c:
/*
* If this is raw_syscalls.sys_enter, then it always comes with the 6 possible
* arguments, even if the syscall being handled, say "openat", uses only 4 arguments
* this breaks syscall__augmented_args() check for augmented args, as we calculate
* syscall->args_size using each syscalls:sys_enter_NAME tracefs format file,
* so when handling, say the openat syscall, we end up getting 6 args for the
* raw_syscalls:sys_enter event, when we expected just 4, we end up mistakenly
* thinking that the extra 2 u64 args are the augmented filename, so just check
* here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
*/
if (evsel != trace->syscalls.events.sys_enter)
augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);
As the comment points out, we should not be trying to augment the args
for raw_syscalls. However, when processing a perf.data file, we are not
initializing those properly. Fix the same.
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
$
Just silences this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h