Jakub Kicinski [Thu, 29 Feb 2024 00:59:11 +0000 (16:59 -0800)]
selftests: kselftest_harness: generate test name once
Since we added variant support, generating the full test case
name takes 4 string arguments. We're about to need it
in another two places. Stop the duplication and print
once into a temporary buffer.
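As a rough illustration of printing the name once into a temporary buffer, here is a minimal userspace sketch. The helper name format_test_name() and the exact "fixture.variant.test" layout are assumptions made for the example, not taken from the harness source.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the full test case name once.
 * The four string arguments to snprintf() are the fixture name, an
 * optional separator, the variant name and the test name.
 */
static int format_test_name(char *buf, size_t len, const char *fixture,
			    const char *variant, const char *test)
{
	/* An empty variant contributes neither a name nor a separator. */
	return snprintf(buf, len, "%s%s%s.%s", fixture,
			variant[0] ? "." : "", variant, test);
}
```

Callers can then pass the buffer to every print site instead of repeating the four arguments.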
Mickaël Salaün [Thu, 29 Feb 2024 00:59:09 +0000 (16:59 -0800)]
selftests/harness: Merge TEST_F_FORK() into TEST_F()
Replace Landlock-specific TEST_F_FORK() with an improved TEST_F() which
brings four related changes:
Run TEST_F()'s tests in a grandchild process to make it possible to
drop privileges and delegate teardown to the parent.
Compared to TEST_F_FORK(), simplify handling of the test grandchild
process thanks to vfork(2), and make it generic (e.g. no explicit
conversion between exit code and _metadata).
Compared to TEST_F_FORK(), run teardown even when tests failed with an
assert thanks to commit 63e6b2a42342 ("selftests/harness: Run TEARDOWN
for ASSERT failures").
Simplify the test harness code by removing the no_print and step fields
which are not used. I added this feature just after I made
kselftest_harness.h more broadly available but this step counter
remained even though it wasn't needed after all. See commit 369130b63178
("selftests: Enhance kselftest_harness.h to print which assert failed").
Replace spaces with tabs in one line of __TEST_F_IMPL().
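The first change listed above, running the test body in a separate process so that teardown stays with the parent, can be sketched roughly as follows. This is a simplified model, not the harness code: it uses a single fork() level instead of a vfork(2) grandchild, and run_isolated() and both callbacks are hypothetical names.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int teardown_calls;

/* Example callbacks, hypothetical. */
static int failing_test(void)
{
	return 3;	/* a nonzero exit status models a failed test */
}

static void count_teardown(void)
{
	teardown_calls++;
}

/* Run test() in a child process, then teardown() in the caller. */
static int run_isolated(int (*test)(void), void (*teardown)(void))
{
	int status = 0;
	bool reaped;
	pid_t pid = fork();	/* assumption: fork() stands in for vfork(2) */

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(test());	/* child: free to drop privileges or abort */

	reaped = waitpid(pid, &status, 0) == pid;
	teardown();		/* runs in the parent even if the test failed */

	if (!reaped || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Because teardown runs in the process that never dropped privileges, it can still clean up resources the test itself could no longer touch.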
Mickaël Salaün [Thu, 29 Feb 2024 00:59:08 +0000 (16:59 -0800)]
selftests/landlock: Redefine TEST_F() as TEST_F_FORK()
This has the effect of creating a new test process for either TEST_F()
or TEST_F_FORK(), which doesn't change tests but will ease potential
backports. See next commit for the TEST_F_FORK() merge into TEST_F().
Justin Chen [Wed, 28 Feb 2024 22:53:59 +0000 (14:53 -0800)]
net: bcmasp: Keep buffers through power management
There is no advantage in freeing and re-allocating buffers through
suspend and resume. This wastes cycles and makes suspend/resume time
longer. We also open ourselves to failed allocations in systems with
heavy memory fragmentation.
David S. Miller [Fri, 1 Mar 2024 08:56:39 +0000 (08:56 +0000)]
Merge branch 'qcom-phy-possible'
Robert Marko says:
====================
net: phy: qcom: qca808x: fill in possible_interfaces
QCA808x does not currently fill in the possible_interfaces.
This leads to Phylink not being aware that it supports 2500Base-X as well,
so in cases where it is connected to a DSA switch like MV88E6393 it will
limit that port to the phy-mode set in the DTS.
That means that if SGMII is used you are limited to 1G only while if
2500Base-X was set you are limited to 2.5G only.
Populating the possible_interfaces fixes this.
Changes in v2:
* Get rid of the if/else in the helper, per Russell's suggestion
====================
Robert Marko [Wed, 28 Feb 2024 17:24:10 +0000 (18:24 +0100)]
net: phy: qcom: qca808x: fill in possible_interfaces
Currently the QCA808x driver does not fill in possible_interfaces.
The 2.5G QCA808x supports SGMII and 2500Base-X while the 1G model only
supports SGMII, so fill in possible_interfaces accordingly.
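A minimal sketch of the rule described above, assuming a plain bitmask in place of the kernel's interface-mode bitmap. The enum values, struct phy_model and fill_possible_interfaces() are illustrative names, not the driver's.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's interface-mode bit numbers. */
enum iface_mode { MODE_SGMII, MODE_2500BASEX };

struct phy_model {
	bool one_gig_only;	/* assumed result of a 1G-only model check */
	unsigned long possible;	/* bitmask of supported interface modes */
};

static void fill_possible_interfaces(struct phy_model *phy)
{
	/* Both variants support SGMII. */
	phy->possible |= 1UL << MODE_SGMII;

	/* Only the 2.5G-capable variant can also run 2500Base-X. */
	if (!phy->one_gig_only)
		phy->possible |= 1UL << MODE_2500BASEX;
}
```

Setting the common mode unconditionally and the extra mode behind a single negated check avoids an if/else pair.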
Robert Marko [Wed, 28 Feb 2024 17:24:09 +0000 (18:24 +0100)]
net: phy: qcom: qca808x: add helper for checking for 1G only model
There are 2 versions of QCA808x, one 2.5G capable and one 1G capable.
Currently, this matters only in the .get_features call; however, it will
be required for filling supported interface modes, so let's add a helper
that can be reused.
Eric Dumazet [Wed, 28 Feb 2024 13:54:39 +0000 (13:54 +0000)]
ipv6: use xa_array iterator to implement inet6_netconf_dump_devconf()
1) inet6_netconf_dump_devconf() can run under RCU protection
instead of RTNL.
2) properly return 0 at the end of a dump, avoiding
an extra recvmsg() system call.
3) Do not use inet6_base_seq() anymore, for_each_netdev_dump()
has nice properties. Restarting a GETDEVCONF dump if a device has
been added/removed or if net->ipv6.dev_addr_genid has changed is moot.
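Point 2 can be illustrated with a small userspace model of a resumable dump. This is only a sketch: the cursor walks a flat array rather than an xarray of devices, and struct dump_ctx and dump_some() are made-up names.

```c
#include <stddef.h>

/* Hypothetical dump context: the cursor survives between calls. */
struct dump_ctx {
	size_t next;
};

/*
 * Emit entries of ids[0..n) into out[], starting at the saved cursor.
 * Returns 1 if the output filled up and the caller must call again,
 * 0 once the last entry has been emitted. *written reports the count.
 */
static int dump_some(struct dump_ctx *ctx, const int *ids, size_t n,
		     int *out, size_t room, size_t *written)
{
	*written = 0;
	while (ctx->next < n) {
		if (*written == room)
			return 1;	/* out of space: resume at ctx->next */
		out[(*written)++] = ids[ctx->next++];
	}
	return 0;	/* finished in this call, no extra round trip */
}
```

When everything fits, the first call already reports completion, so the caller does not need a second call just to learn that the dump is over.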
net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields")
https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/
Adjacent changes:
drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order")
drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers")
drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists")
net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately")
Linus Torvalds [Thu, 29 Feb 2024 20:40:20 +0000 (12:40 -0800)]
Merge tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, WiFi and netfilter.
We have one outstanding issue with the stmmac driver, which may be a
LOCKDEP false positive, not a blocker.
Current release - regressions:
- netfilter: nf_tables: re-allow NFPROTO_INET in
nft_(match/target)_validate()
- eth: ionic: fix error handling in PCI reset code
Current release - new code bugs:
- eth: stmmac: complete meta data only when enabled, fix null-deref
- kunit: fix again checksum tests on big endian CPUs
Previous releases - regressions:
- veth: try harder when allocating queue memory
- Bluetooth:
- hci_bcm4377: do not mark valid bd_addr as invalid
- hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST
Previous releases - always broken:
- info leak in __skb_datagram_iter() on netlink socket
- mptcp:
- map v4 address to v6 when destroying subflow
- fix potential wake-up event loss due to sndbuf auto-tuning
- fix double-free on socket dismantle
- wifi: nl80211: reject iftype change with mesh ID change
- fix small out-of-bound read when validating netlink be16/32 types
- rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
- ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()
- ip_tunnel: prevent perpetual headroom growth with huge number of
tunnels on top of each other
- mctp: fix skb leaks on error paths of mctp_local_output()
- eth: ice: fixes for DPLL state reporting
- dpll: rely on rcu for netdev_dpll_pin() to prevent UaF
- eth: dpaa: accept phy-interface-type = '10gbase-r' in the device
tree"
* tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
dpll: fix build failure due to rcu_dereference_check() on unknown type
kunit: Fix again checksum tests on big endian CPUs
tls: fix use-after-free on failed backlog decryption
tls: separate no-async decryption request handling from async
tls: fix peeking with sync+async decryption
tls: decrement decrypt_pending if no async completion will be called
gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
igb: extend PTP timestamp adjustments to i211
rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
tools: ynl: fix handling of multiple mcast groups
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Bluetooth: qca: Fix triggering coredump implementation
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Bluetooth: qca: Fix wrong event type for patch config command
Bluetooth: Enforce validation on max value of connection interval
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Bluetooth: mgmt: Fix limited discoverable off timeout
...
Christophe Leroy [Fri, 23 Feb 2024 10:41:52 +0000 (11:41 +0100)]
kunit: Fix again checksum tests on big endian CPUs
Commit b38460bc463c ("kunit: Fix checksum tests on big endian CPUs")
fixed endianness issues with kunit checksum tests, but then
commit 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and
ip_fast_csum") introduced new issues on big endian CPUs. Those issues
are once again reflected by the warnings reported by sparse.
So, fix them with the same approach, perform proper conversion in
order to support both little and big endian CPUs. Once the conversions
are properly done and the right types used, the sparse warnings are
cleared as well.
Jakub Kicinski [Thu, 29 Feb 2024 17:10:24 +0000 (09:10 -0800)]
Merge tag 'for-net-2024-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- mgmt: Fix limited discoverable off timeout
- hci_qca: Set BDA quirk bit if fwnode exists in DT
- hci_bcm4377: do not mark valid bd_addr as invalid
- hci_sync: Check the correct flag before starting a scan
- Enforce validation on max value of connection interval
- hci_sync: Fix accept_list when attempting to suspend
- hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
- Avoid potential use-after-free in hci_error_reset
- rfcomm: Fix null-ptr-deref in rfcomm_check_security
- hci_event: Fix wrongly recorded wakeup BD_ADDR
- qca: Fix wrong event type for patch config command
- qca: Fix triggering coredump implementation
* tag 'for-net-2024-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: qca: Fix triggering coredump implementation
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Bluetooth: qca: Fix wrong event type for patch config command
Bluetooth: Enforce validation on max value of connection interval
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Bluetooth: mgmt: Fix limited discoverable off timeout
Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR
Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
Bluetooth: hci_sync: Fix accept_list when attempting to suspend
Bluetooth: Avoid potential use-after-free in hci_error_reset
Bluetooth: hci_sync: Check the correct flag before starting a scan
Bluetooth: hci_bcm4377: do not mark valid bd_addr as invalid
====================
====================
tls: a few more fixes for async decrypt
The previous patchset [1] took care of "full async". This adds a few
fixes for cases where only part of the crypto operations go the async
route, found by extending my previous debug patch [2] to do N
synchronous operations followed by M asynchronous ops (with N and M
configurable).
Sabrina Dubroca [Wed, 28 Feb 2024 22:44:00 +0000 (23:44 +0100)]
tls: fix use-after-free on failed backlog decryption
When the decrypt request goes to the backlog and crypto_aead_decrypt
returns -EBUSY, tls_do_decryption will wait until all async
decryptions have completed. If one of them fails, tls_do_decryption
will return -EBADMSG and tls_decrypt_sg jumps to the error path,
releasing all the pages. But the pages have been passed to the async
callback, and have already been released by tls_decrypt_done.
The only true async case is when crypto_aead_decrypt returns
-EINPROGRESS. With -EBUSY, we already waited so we can tell
tls_sw_recvmsg that the data is available for immediate copy, but we
need to notify tls_decrypt_sg (via the new ->async_done flag) that the
memory has already been released.
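The ownership hand-off behind the ->async_done flag can be modelled in userspace as below. This is a sketch under assumptions: struct dec_req, release_pages() and both functions are hypothetical, and malloc()/free() stand in for page references.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical decrypt request; only the ownership flag matters here. */
struct dec_req {
	void *pages;
	bool async_done;	/* set once the callback released the pages */
};

static int release_count;

static void release_pages(struct dec_req *req)
{
	free(req->pages);
	release_count++;
}

/* Models the completion callback: it releases the pages itself. */
static void decrypt_done(struct dec_req *req)
{
	release_pages(req);
	req->async_done = true;
}

/* Models the caller's error path: release only what it still owns. */
static void decrypt_error_path(struct dec_req *req)
{
	if (req->async_done)
		return;	/* the callback already released the memory */
	release_pages(req);
}
```

Without the flag check, the error path would release the same pages a second time.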
Sabrina Dubroca [Wed, 28 Feb 2024 22:43:59 +0000 (23:43 +0100)]
tls: separate no-async decryption request handling from async
If we're not doing async, the handling is much simpler. There's no
reference counting, we just need to wait for the completion to wake us
up and return its result.
We should preferably also use a separate crypto_wait. I'm not seeing a
UAF as I did in the past, I think aec7961916f3 ("tls: fix race between
async notify and socket close") took care of it.
Sabrina Dubroca [Wed, 28 Feb 2024 22:43:58 +0000 (23:43 +0100)]
tls: fix peeking with sync+async decryption
If we peek from 2 records with a currently empty rx_list, and the
first record is decrypted synchronously but the second record is
decrypted async, the following happens:
1. decrypt record 1 (sync)
2. copy from record 1 to the userspace's msg
3. queue the decrypted record to rx_list for future read(!PEEK)
4. decrypt record 2 (async)
5. queue record 2 to rx_list
6. call process_rx_list to copy data from the 2nd record
We currently pass copied=0 as skip offset to process_rx_list, so we
end up copying once again from the first record. We should skip over
the data we've already copied.
Seen with selftest tls.12_aes_gcm.recv_peek_large_buf_mult_recs
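To make the skip offset concrete, here is a simplified model of copying from a list of decrypted records. It is a sketch only: struct record and copy_from_rx_list() are illustrative names and do not mirror the signature of process_rx_list().

```c
#include <stddef.h>
#include <string.h>

struct record {
	const char *data;
	size_t len;
};

/*
 * Copy up to `room` bytes from the record list into dst, first skipping
 * `skip` bytes that an earlier step already delivered to the caller.
 * Returns the number of bytes copied.
 */
static size_t copy_from_rx_list(const struct record *recs, size_t nrecs,
				size_t skip, char *dst, size_t room)
{
	size_t copied = 0;

	for (size_t i = 0; i < nrecs && copied < room; i++) {
		size_t off, chunk;

		if (skip >= recs[i].len) {
			skip -= recs[i].len;	/* whole record already seen */
			continue;
		}
		off = skip;
		skip = 0;
		chunk = recs[i].len - off;
		if (chunk > room - copied)
			chunk = room - copied;
		memcpy(dst + copied, recs[i].data + off, chunk);
		copied += chunk;
	}
	return copied;
}
```

With two 4-byte records and 4 bytes already copied, passing skip=4 yields only the second record, while skip=0 would hand out the first record again.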
Sabrina Dubroca [Wed, 28 Feb 2024 22:43:57 +0000 (23:43 +0100)]
tls: decrement decrypt_pending if no async completion will be called
With mixed sync/async decryption, or failures of crypto_aead_decrypt,
we increment decrypt_pending but we never do the corresponding
decrement since tls_decrypt_done will not be called. In this case, we
should decrement decrypt_pending immediately to avoid getting stuck.
For example, the prequeue test gets stuck with mixed
modes (one async decrypt + one sync decrypt).
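The counting rule can be sketched with C11 atomics. This is a simplified model: submit_decrypt() is a hypothetical wrapper, its argument stands for the crypto layer's return value, and the -EBUSY backlog case is deliberately left out.

```c
#include <errno.h>
#include <stdatomic.h>

static atomic_int decrypt_pending;

/*
 * Hypothetical submit wrapper. In this model only -EINPROGRESS means a
 * completion callback will run later and do the matching decrement.
 * The -EBUSY backlog case is not modelled.
 */
static int submit_decrypt(int ret)
{
	atomic_fetch_add(&decrypt_pending, 1);
	if (ret == -EINPROGRESS)
		return 0;	/* the callback will decrement */

	/* Synchronous success or failure: no callback, so undo it now. */
	atomic_fetch_sub(&decrypt_pending, 1);
	return ret;
}
```

A waiter that sleeps until the counter reaches zero can only make progress if every increment has exactly one matching decrement.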
The commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf") added a field in struct net_device, which tells what
type of statistics the driver supports.
That field is used primarily to allocate stats structures automatically,
but it could also be leveraged to simplify the drivers even further: if
the driver relies on the default stats collection, then it doesn't need
to assign .ndo_get_stats64. That means that drivers only
assign functions to .ndo_get_stats64 if they are using something
special.
I started to move some of these drivers[1][2][3] to use the core
allocation, and with this change in, I just need to touch the driver
once, and be able to simplify the whole stats allocation and collection
for generic case.
There are 44 devices today that could benefit from this simplification.
Breno Leitao [Wed, 28 Feb 2024 11:31:22 +0000 (03:31 -0800)]
net: sit: Do not set .ndo_get_stats64
If the driver is using the network core allocation mechanism, by setting
NETDEV_PCPU_STAT_TSTATS, as this driver is, then it doesn't need to set
the generic dev_get_tstats64() as its .ndo_get_stats64 function pointer:
the network core calls it automatically, and .ndo_get_stats64 should
only be set if the driver needs special treatment.
This simplifies the driver, since all the generic statistics are now
handled by core.
Breno Leitao [Wed, 28 Feb 2024 11:31:21 +0000 (03:31 -0800)]
net: get stats64 if device if driver is configured
If the network driver is relying on the net core to do stats allocation,
then we want to call dev_get_tstats64() instead of netdev_stats_to_stats64(),
since there are per-cpu stats that need to be taken into consideration.
This will also simplify the drivers in regard to statistics. Once the
driver sets NETDEV_PCPU_STAT_TSTATS, it does not need to allocate the
stats, nor does it need to set `.ndo_get_stats64 = dev_get_tstats64`
for the generic stats collection function anymore.
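The selection order described above might look like this in a userspace model. It is a sketch under assumptions: struct dev_model, its fields, and the fixed NCPUS are invented for the example, and a single rx_packets counter stands in for the full stats structure.

```c
#include <stddef.h>
#include <stdint.h>

#define NCPUS 4	/* arbitrary for the sketch */

enum pcpu_stat_type { PCPU_STAT_NONE, PCPU_STAT_TSTATS };

struct dev_model {
	/* Driver-specific hook; NULL when nothing special is needed. */
	uint64_t (*get_stats64)(const struct dev_model *dev);
	enum pcpu_stat_type stat_type;
	uint64_t pcpu_rx_packets[NCPUS];	/* core-allocated counters */
	uint64_t legacy_rx_packets;		/* plain device-wide counter */
};

static uint64_t get_rx_packets(const struct dev_model *dev)
{
	uint64_t sum = 0;

	if (dev->get_stats64)
		return dev->get_stats64(dev);	/* driver override wins */

	if (dev->stat_type == PCPU_STAT_TSTATS) {
		/* Generic path: fold the per-CPU counters together. */
		for (size_t cpu = 0; cpu < NCPUS; cpu++)
			sum += dev->pcpu_rx_packets[cpu];
		return sum;
	}
	return dev->legacy_rx_packets;
}
```

A driver that only declares its stats type gets the per-CPU sum without installing any hook of its own.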
Paolo Abeni [Thu, 29 Feb 2024 11:16:07 +0000 (12:16 +0100)]
Merge tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin.
Patch #2 fixes an issue with bridge netfilter and broadcast/multicast
packets.
There is a day 0 bug in br_netfilter when used with connection tracking.
Conntrack assumes that an nf_conn structure that is not yet added to
hash table ("unconfirmed"), is only visible by the current cpu that is
processing the sk_buff.
For bridge this isn't true, sk_buff can get cloned in between, and
clones can be processed in parallel on different cpu.
This patch disables NAT and conntrack helpers for multicast packets.
Patch #3 adds a selftest to cover for the br_netfilter bug.
netfilter pull request 24-02-29
* tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
====================
The current code adds two extra bytes (i.e. sizeof(struct hsr_sup_tlv))
when the offset for skb_pull() is calculated.
This is wrong, as both 'struct hsrv1_ethhdr_sp' and 'hsrv0_ethhdr_sp'
already have 'struct hsr_sup_tag' defined in them, so there is no need
to add two extra bytes.
This code was working correctly as with no RedBox support, the check for
HSR_TLV_EOT (0x00) was off by two bytes, which were corresponding to
zeroed padded bytes for minimal packet size.
====================
net: dsa: mv88e6xxx: add Amethyst specific SMI GPIO function
Amethyst family (MV88E6191X/6193X/6393X) has a simplified SMI GPIO setting
via the Scratch and Misc register so it requires family specific function.
In the v1 review, Andrew pointed out that it would make sense to rename the
existing mv88e6xxx_g2_scratch_gpio_set_smi as it only works on the MV6390
family.
Changes in v2:
* Add rename of mv88e6xxx_g2_scratch_gpio_set_smi to
mv88e6390_g2_scratch_gpio_set_smi
====================
Robert Marko [Tue, 27 Feb 2024 17:54:22 +0000 (18:54 +0100)]
net: dsa: mv88e6xxx: add Amethyst specific SMI GPIO function
The existing mv88e6390_g2_scratch_gpio_set_smi() cannot be used on the
88E6393X as it requires a certain P0_MODE; it also checks the CPU mode
as it impacts the bit setting value.
This is all irrelevant for Amethyst (MV88E6191X/6193X/6393X) as only
the default value of the SMI_PHY Config bit is set to CPU_MGD bootstrap
pin value but it can be changed without restrictions so that GPIO pins
9 and 10 are used as SMI pins.
So, introduce an Amethyst-specific function and call it if the Amethyst
family wants to set up the external PHY.
The name mv88e6xxx_g2_scratch_gpio_set_smi is a bit ambiguous as it appears
to only be applicable to the 6390 family, so let's rename it to
mv88e6390_g2_scratch_gpio_set_smi to make it more obvious.
Eric Dumazet [Tue, 27 Feb 2024 22:22:59 +0000 (22:22 +0000)]
inet6: expand rcu_read_lock() scope in inet6_dump_addr()
I missed that inet6_dump_addr() is calling in6_dump_addrs()
from two points.
First one under RTNL protection, and second one under rcu_read_lock().
Since we want to remove RTNL use from inet6_dump_addr() very soon,
no longer assume in6_dump_addrs() is protected by RTNL (even
if this is still the case).
Use rcu_read_lock() earlier to fix this lockdep splat:
Eric Dumazet [Tue, 27 Feb 2024 21:01:04 +0000 (21:01 +0000)]
net: call skb_defer_free_flush() from __napi_busy_loop()
skb_defer_free_flush() is currently called from net_rx_action()
and napi_threaded_poll().
We should also call it from __napi_busy_loop() otherwise
there is the risk the percpu queue can grow until an IPI
is forced from skb_attempt_defer_free() adding a latency spike.
Oleksij Rempel [Tue, 27 Feb 2024 18:49:41 +0000 (10:49 -0800)]
igb: extend PTP timestamp adjustments to i211
The i211 requires the same PTP timestamp adjustments as the i210,
according to its datasheet. To ensure consistent timestamping across
different platforms, this change extends the existing adjustments to
include the i211.
The adjustment results were tested and are comparable for i210 and i211 based
systems.
Breno Leitao [Tue, 27 Feb 2024 18:23:36 +0000 (10:23 -0800)]
net: bridge: Do not allocate stats in the driver
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Remove the allocation in the bridge driver and leverage the network
core allocation.
Lin Ma [Tue, 27 Feb 2024 12:11:28 +0000 (20:11 +0800)]
rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
In the commit d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks
IFLA_BRIDGE_MODE length"), an adjustment was made to the old loop logic
in the function `rtnl_bridge_setlink` to enable the loop to also check
the length of the IFLA_BRIDGE_MODE attribute. However, this adjustment
removed the `break` statement and broke the logic that writes the flags
back at the end of this function.
	if (have_flags)
		memcpy(nla_data(attr), &flags, sizeof(flags));
		// attr should point to IFLA_BRIDGE_FLAGS NLA !!!
Before the mentioned commit, `attr` was guaranteed to be IFLA_BRIDGE_FLAGS.
However, this is no longer necessarily true, as the updated loop will let
attr point to the last NLA, even an invalid NLA, which could cause
overflow writes.
This patch introduces a new variable `br_flag` to save the NLA pointer
that points to IFLA_BRIDGE_FLAGS and uses it to resolve the mentioned
error logic.
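The idea of remembering the matching attribute instead of reusing the loop cursor can be shown with a small model. This is a sketch only: the flat struct attr array and write_back_flags() are hypothetical and skip the real TLV parsing and length checks.

```c
#include <stddef.h>

/* Hypothetical flattened attribute; real netlink attributes are TLVs. */
struct attr {
	int type;
	unsigned short value;
};

enum { ATTR_FLAGS = 1, ATTR_MODE = 2 };

/*
 * Walk every attribute, then write `flags` back into the FLAGS one.
 * Returns 0 on success, -1 if no FLAGS attribute was present.
 */
static int write_back_flags(struct attr *attrs, size_t n,
			    unsigned short flags)
{
	struct attr *flags_attr = NULL;	/* plays the role of br_flag */

	for (size_t i = 0; i < n; i++) {
		if (attrs[i].type == ATTR_FLAGS)
			flags_attr = &attrs[i];	/* remember it, keep looping */
		/* other attribute types would be length-checked here */
	}
	if (!flags_attr)
		return -1;

	flags_attr->value = flags;	/* never just the last element */
	return 0;
}
```

Because the loop no longer stops at the flags attribute, the saved pointer is the only reliable handle on it once the loop ends.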
Zhengchao Shao [Tue, 27 Feb 2024 09:36:04 +0000 (17:36 +0800)]
netlabel: remove impossible return value in netlbl_bitmap_walk
Since commit 446fda4f2682 ("[NetLabel]: CIPSOv4 engine"), the *bitmap_walk
function only returns -1. Nearly 18 years have passed and the -2 case has
never come up, so there's no need to consider it.
Eric Dumazet [Tue, 27 Feb 2024 09:24:11 +0000 (09:24 +0000)]
inet: use xa_array iterator to implement inet_netconf_dump_devconf()
1) inet_netconf_dump_devconf() can run under RCU protection
instead of RTNL.
2) properly return 0 at the end of a dump, avoiding
an extra recvmsg() system call.
3) Do not use inet_base_seq() anymore, for_each_netdev_dump()
has nice properties. Restarting a GETDEVCONF dump if a device has
been added/removed or if net->ipv4.dev_addr_genid has changed is moot.
Catalin Popescu [Mon, 26 Feb 2024 16:23:39 +0000 (17:23 +0100)]
net: phy: dp83826: disable WOL at init
Commit d1d77120bc28 ("net: phy: dp83826: support TX data voltage tuning")
introduced a regression in that WOL is not disabled by default for DP83826.
WOL should normally be enabled through ethtool.
Chengming Zhou [Wed, 28 Feb 2024 03:06:58 +0000 (03:06 +0000)]
net: remove SLAB_MEM_SPREAD flag usage
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was
removed as of v6.8-rc1, so it became a dead flag since commit 16a1d968358a
("mm/slab: remove mm/slab.c and slab_def.h"). The
series[1] went on to mark it obsolete to avoid confusion for users.
Here we can just remove all its users, which causes no functional change.
Jakub Kicinski [Wed, 28 Feb 2024 23:25:47 +0000 (15:25 -0800)]
Merge branch 'tools-ynl-stop-using-libmnl'
Jakub Kicinski says:
====================
tools: ynl: stop using libmnl
There is no strong reason to stop using libmnl in ynl but there
are a few small ones which add up.
First (as I remembered immediately after hitting send on v1),
C++ compilers do not like the libmnl for_each macros.
I haven't tried it myself, but having all the code directly
in YNL makes it easier for folks porting to C++ to modify them
and/or make YNL more C++ friendly.
Second, we do much more advanced netlink level parsing in ynl
than libmnl so it's hard to say that libmnl abstracts much from us.
The fact that this series, removing the libmnl dependency, only
adds <300 LoC shows that code savings aren't huge.
OTOH when new types are added (e.g. auto-int) we need to add
compatibility to deal with older versions of libmnl (in fact,
even though patches were sent months ago, auto-ints are still
not supported in libmnl.git).
Third, the dependency makes ynl less self-contained, and harder
to vendor in. Whether vendoring libraries into projects is a good
idea is a separate discussion, nonetheless, people want to do it.
Fourth, there are small annoyances with the libmnl APIs which
are hard to fix in backward-compatible ways. See the last patch
for example.
All in all, libmnl is a great library, but with all the code
generation and structured parsing, ynl is better served by going
its own way.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:32 +0000 (14:30 -0800)]
tools: ynl: use MSG_DONTWAIT for getting notifications
To stick to libmnl wrappers in the past we had to use poll()
to check if there are any outstanding notifications on the socket.
This is no longer necessary, we can use MSG_DONTWAIT.
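A minimal sketch of a non-blocking check with MSG_DONTWAIT, with no poll() beforehand. The wrapper name recv_if_pending() is an assumption for the example; recv(2) and the flag are the standard socket API.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Try to read one pending datagram without blocking.
 * Returns the byte count, 0 if nothing is queued, -1 on a real error.
 */
static ssize_t recv_if_pending(int fd, void *buf, size_t len)
{
	ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;	/* queue empty: not an error */
	return n;
}
```

The flag applies to this one call only, so the socket itself stays in blocking mode for regular request/response traffic.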
Jakub Kicinski [Tue, 27 Feb 2024 22:30:30 +0000 (14:30 -0800)]
tools: ynl: stop using mnl socket helpers
Most libmnl socket helpers can be replaced by direct calls to
the underlying libc API. We need the portid; the netlink manpage
suggests we bind() an address of zero.
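One way to open the socket with plain libc calls and learn the kernel-assigned port ID is sketched below. The function name open_netlink() is hypothetical, and the choice of NETLINK_GENERIC is an assumption for the example.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>
#include <unistd.h>

/*
 * Open a generic netlink socket and let the kernel pick the port ID.
 * Returns the fd (>= 0) and stores the assigned ID in *portid, or -1.
 */
static int open_netlink(unsigned int *portid)
{
	struct sockaddr_nl addr;
	socklen_t len = sizeof(addr);
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;	/* nl_pid stays 0: kernel assigns */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
		close(fd);
		return -1;
	}
	*portid = addr.nl_pid;
	return fd;
}
```

Reading the address back with getsockname() is what reveals the port ID that was chosen during bind().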
Jakub Kicinski [Tue, 27 Feb 2024 22:30:27 +0000 (14:30 -0800)]
tools: ynl: stop using mnl_cb_run2()
There's only one set of callbacks in YNL, for netlink control
messages, and most of them are trivial. So implement the message
walking directly without depending on mnl_cb_run2().
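Walking a receive buffer by hand needs little more than the NLMSG_* macros from <linux/netlink.h>. The following is a sketch, not the YNL implementation: walk_msgs() and its return convention are assumptions made for the example.

```c
#include <linux/netlink.h>
#include <stddef.h>

/*
 * Walk every netlink message in buf[0..len).
 * Returns 1 when NLMSG_DONE is seen, -1 on an NLMSG_ERROR carrying a
 * nonzero code (or a truncated one), 0 if the buffer simply ended.
 * *data_msgs counts messages that are not control messages.
 */
static int walk_msgs(void *buf, size_t len, int *data_msgs)
{
	struct nlmsghdr *nlh = buf;
	int rem = (int)len;

	for (; NLMSG_OK(nlh, rem); nlh = NLMSG_NEXT(nlh, rem)) {
		switch (nlh->nlmsg_type) {
		case NLMSG_NOOP:
			break;
		case NLMSG_DONE:
			return 1;
		case NLMSG_ERROR: {
			struct nlmsgerr *err = NLMSG_DATA(nlh);

			if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
				return -1;
			if (err->error)
				return -1;
			break;	/* error code 0 is an ACK */
		}
		default:
			(*data_msgs)++;
		}
	}
	return 0;
}
```

With only a handful of control message types to handle, a switch inside the walk covers what a callback table would otherwise provide.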
Jakub Kicinski [Tue, 27 Feb 2024 22:30:26 +0000 (14:30 -0800)]
tools: ynl: use ynl_sock_read_msgs() for ACK handling
ynl_recv_ack() is simple and it's the only user of mnl_cb_run().
Now that ynl_sock_read_msgs() exists it's actually less code
to use ynl_sock_read_msgs() instead of being special.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:25 +0000 (14:30 -0800)]
tools: ynl: wrap recv() + mnl_cb_run2() into a single helper
All callers to mnl_cb_run2() call mnl_socket_recvfrom() right before.
Wrap the two in a helper, take typed arguments (struct ynl_parse_arg),
instead of hoping that all callers remember that parser error handling
requires yarg.
In case of ynl_sock_read_family() we will no longer check for kernel
returning no data, but that would be a kernel bug, not worth complicating
the code to catch this. Calling mnl_cb_run2() on an empty buffer
is legal and results in STOP (1).
Jakub Kicinski [Tue, 27 Feb 2024 22:30:24 +0000 (14:30 -0800)]
tools: ynl-gen: remove unused parse code
Commit f2ba1e5e2208 ("tools: ynl-gen: stop generating common notification handlers")
removed the last caller of the parse_cb_run() helper.
We no longer need to export ynl_cb_array.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:23 +0000 (14:30 -0800)]
tools: ynl: make yarg the first member of struct ynl_dump_state
All YNL parsing code expects a pointer to struct ynl_parse_arg AKA yarg.
For dump we pass in struct ynl_dump_state, which works fine, because
struct ynl_dump_state and struct ynl_parse_arg have identical layout
for the members that matter, but it's a bit hacky.
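Embedding the common struct as the first member makes the relationship explicit. The types below are simplified stand-ins with invented fields, not the real YNL definitions.

```c
/* Simplified stand-in for the common parser argument. */
struct parse_arg {
	int err;
};

struct dump_state {
	struct parse_arg yarg;	/* must stay the first member */
	int entries;
};

/* Generic parser code: only knows about struct parse_arg. */
static void note_error(struct parse_arg *arg, int err)
{
	arg->err = err;
}

static void dump_fail(struct dump_state *ds, int err)
{
	/*
	 * Passing &ds->yarg is always valid. Because yarg is the first
	 * member, a pointer to ds also points at the same address.
	 */
	note_error(&ds->yarg, err);
}
```

C guarantees that a pointer to a struct, suitably converted, points to its first member, so callers that cast keep working while new code can pass the member directly.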
Jakub Kicinski [Tue, 27 Feb 2024 22:30:19 +0000 (14:30 -0800)]
tools: ynl: create local attribute helpers
Don't use mnl attr helpers, we're trying to remove the libmnl
dependency. Create both signed and unsigned helpers; libmnl
only had unsigned helpers, so the code generator no longer needs
the mnl_type() hack.
The new helpers are written from first principles, but are
hopefully not too buggy.
Jakub Kicinski [Tue, 27 Feb 2024 22:30:18 +0000 (14:30 -0800)]
tools: ynl: give up on libmnl for auto-ints
The temporary auto-int helpers are not really correct.
We can't treat signed and unsigned ints the same when
determining whether we need full 8B. I realized this
before sending the patch to add support in libmnl.
Unfortunately, that patch has not been merged,
so time to fix our local helpers. Use the mnl* name
for now, subsequent patches will address that.
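The difference between the two cases comes down to the range check. A minimal sketch, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An auto-int is carried in 4 bytes when the value fits and in 8 bytes
 * otherwise. The fit test differs by signedness.
 */
static bool uint_needs_8b(uint64_t v)
{
	return v > UINT32_MAX;
}

static bool sint_needs_8b(int64_t v)
{
	return v < INT32_MIN || v > INT32_MAX;
}
```

A value such as -1 fits in 4 bytes as a signed integer, but the same bit pattern read as unsigned exceeds UINT32_MAX, so sharing one check between the two types gives the wrong width.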
Jakub Kicinski [Mon, 26 Feb 2024 22:58:06 +0000 (14:58 -0800)]
tools: ynl: protect from old OvS headers
Since commit 7c59c9c8f202 ("tools: ynl: generate code for ovs families")
we need relatively recent OvS headers to get YNL to compile.
Add the direct include workaround to fix compilation on less
up-to-date OSes like CentOS 9.
Florian Westphal [Tue, 27 Feb 2024 15:17:51 +0000 (16:17 +0100)]
netfilter: bridge: confirm multicast packets before passing them up the stack
conntrack nf_confirm logic cannot handle cloned skbs referencing
the same nf_conn entry, which will happen for multicast (broadcast)
frames on bridges.
Example:
    macvlan0
       |
      br0
     /   \
  ethX    ethY
ethX (or Y) receives a L2 multicast or broadcast packet containing
an IP packet, flow is not yet in conntrack table.
1. skb passes through bridge and fake-ip (br_netfilter) Prerouting.
-> skb->_nfct now references an unconfirmed entry
2. skb is broad/mcast packet. bridge now passes clones out on each bridge
interface.
3. skb gets passed up the stack.
4. In macvlan case, macvlan driver retains clone(s) of the mcast skb
and schedules a work queue to send them out on the lower devices.
The clone skb->_nfct is not a copy, it is the same entry as the
original skb. The macvlan rx handler then returns RX_HANDLER_PASS.
5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
The Macvlan broadcast worker and normal confirm path will race.
This race will not happen if step 2 already confirmed a clone. In that
case later steps perform skb_clone() with skb->_nfct already confirmed (in
hash table). This works fine.
But such confirmation won't happen when eb/ip/nftables rules dropped the
packets before they reached the nf_confirm step in postrouting.
Pablo points out that nf_conntrack_bridge doesn't allow use of stateful
nat, so we can safely discard the nf_conn entry and let inet call
conntrack again.
This doesn't work for bridge netfilter: skb could have a nat
transformation. Also bridge nf prevents re-invocation of inet prerouting
via 'sabotage_in' hook.
Work around this problem by explicit confirmation of the entry at LOCAL_IN
time, before upper layer has a chance to clone the unconfirmed entry.
The downside is that this disables NAT and conntrack helpers.
Alternative fix would be to add locking to all code parts that deal with
unconfirmed packets, but even if that could be done in a sane way this
opens up other problems, for example:
For multicast case, only one of such conflicting mappings will be
created, conntrack only handles 1:1 NAT mappings.
Users should create a setup that explicitly marks such traffic
NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass
them, ruleset might have accept rules for untracked traffic already,
so user-visible behaviour would change.
Ignat Korchagin [Thu, 22 Feb 2024 10:33:08 +0000 (10:33 +0000)]
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Commit d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") added
some validation of NFPROTO_* families in the nft_compat module, but it broke
the ability to use legacy iptables modules in dual-stack nftables.
While with legacy iptables one had to independently manage IPv4 and IPv6
tables, with nftables it is possible to have dual-stack tables sharing the
rules. Moreover, it was possible to use rules based on legacy iptables
match/target modules in dual-stack nftables.
As an example, the program from [2] creates an INET dual-stack family table
using an xt_bpf based rule, which looks like the following (the actual output
was generated with a patched nft tool as the current nft tool does not parse
dual stack tables with legacy match rules, so consider it for illustrative
purposes only):
After d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") we get
EOPNOTSUPP for the above program.
Fix this by allowing NFPROTO_INET for nft_(match/target)_validate(), but also
restrict the functions to classic iptables hooks.
Changes in v3:
* clarify that upstream nft will not display such configuration properly and
that the output was generated with a patched nft tool
* remove example program from commit description and link to it instead
* no code changes otherwise
Changes in v2:
* restrict nft_(match/target)_validate() to classic iptables hooks
* rewrite example program to use unmodified libnftnl
Linus Torvalds [Wed, 28 Feb 2024 20:20:00 +0000 (12:20 -0800)]
Merge tag 'acpi-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Revert a recent EC driver change that introduced an unexpected and
undesirable user-visible difference in behavior (Rafael Wysocki)"
* tag 'acpi-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: EC: Use a spin lock without disabing interrupts"
Linus Torvalds [Wed, 28 Feb 2024 20:18:31 +0000 (12:18 -0800)]
Merge tag 'pm-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a latent bug in the intel-pstate cpufreq driver that has been
exposed by the recent schedutil governor changes (Doug Smythies)"
* tag 'pm-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
Linus Torvalds [Wed, 28 Feb 2024 19:16:19 +0000 (11:16 -0800)]
Merge tag 'spi-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's two things here - the big one is a batch of fixes for the
power management in the Cadence QuadSPI driver which had some serious
issues with runtime PM and there's also a revert of one of the last
batch of fixes for ppc4xx which has a dependency on -next but was in
between two mainline fixes so the -next dependency got missed.
The ppc4xx driver is not currently included in any defconfig and has
dependencies that exclude it from allmodconfigs so none of the CI
systems catch issues with it, hence the need for the earlier fixes
series. There's some updates to the PowerPC configs to address this"
* tag 'spi-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Drop mismerged fix
spi: cadence-qspi: add system-wide suspend and resume callbacks
spi: cadence-qspi: put runtime in runtime PM hooks names
spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
spi: cadence-qspi: fix pointer reference in runtime PM hooks
Linus Torvalds [Wed, 28 Feb 2024 19:10:27 +0000 (11:10 -0800)]
Merge tag 'regulator-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Two small fixes, one small update for the max5970 driver bringing the
driver and DT binding documentation into sync plus a missed update to
the patterns in MAINTAINERS after a DT binding YAML conversion"
* tag 'regulator-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max5970: Fix regulator child node name
MAINTAINERS: repair entry for MICROCHIP MCP16502 PMIC DRIVER
Linus Torvalds [Wed, 28 Feb 2024 17:30:26 +0000 (09:30 -0800)]
Merge tag 'v6.8-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a regression in lskcipher and an out-of-bound access
in arm64/neonbs"
* tag 'v6.8-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/neonbs - fix out-of-bounds access on short input
crypto: lskcipher - Copy IV in lskcipher glue code always
hci_coredump_qca() uses __hci_cmd_sync() to send a vendor-specific command
to trigger firmware coredump, but the command does not have any event as
its sync response, so __hci_cmd_sync() is not suitable. Fix this by
using __hci_cmd_send().
Fixes: 06d3fdfcdf5c ("Bluetooth: hci_qca: Add qcom devcoredump support")
Signed-off-by: Zijun Hu
Signed-off-by: Luiz Augusto von Dentz
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
The BT adapter goes into the UNCONFIGURED state during BT turn-on when
the devicetree has no local-bd-address node.
Bluetooth will not work out of the box on such devices. To avoid this
problem, add a check that sets HCI_QUIRK_USE_BDADDR_PROPERTY based on
the local-bd-address node entry.
When this quirk is not set, the public Bluetooth address read by the host
from the controller through the HCI Read BD Address command is
considered valid.
Fixes: e668eb1e1578 ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts")
Signed-off-by: Janaki Ramaiah Thota
Signed-off-by: Luiz Augusto von Dentz
Zijun Hu [Fri, 19 Jan 2024 09:45:30 +0000 (17:45 +0800)]
Bluetooth: qca: Fix wrong event type for patch config command
The vendor-specific patch config command has an HCI_Command_Complete event
as its response, but qca_send_patch_config_cmd() wrongly expects a
vendor-specific event for the command. Fix this by using the right event type.
Kai-Heng Feng [Thu, 25 Jan 2024 06:50:28 +0000 (14:50 +0800)]
Bluetooth: Enforce validation on max value of connection interval
Right now the Linux BT stack cannot pass test case "GAP/CONN/CPUP/BV-05-C
'Connection Parameter Update Procedure Invalid Parameters Central
Responder'" in Bluetooth Test Suite revision GAP.TS.p44. [0]
That was resolved by commit c49a8682fc5d ("Bluetooth: validate BLE
connection interval updates"), but the commit was later reverted because
devices like keyboards and mice may require a low connection interval.
So only validate the max value of the connection interval to pass the Test
Suite, and let devices request a low connection interval if needed.
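The "validate only the max value" idea can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: sketch_conn_interval_ok() and its local_max parameter are hypothetical names, and the only figures taken as given are the Bluetooth Core Specification bounds for the LE connection interval (0x0006 to 0x0C80, in units of 1.25 ms).

```c
#include <stdbool.h>
#include <stdint.h>

/* LE connection interval bounds, in units of 1.25 ms. */
#define SKETCH_INTERVAL_SPEC_MIN 0x0006
#define SKETCH_INTERVAL_SPEC_MAX 0x0c80

/*
 * Decide whether a peer's connection parameter update is acceptable.
 * req_min/req_max are the requested interval bounds; local_max is a
 * hypothetical locally configured upper bound.
 */
static bool sketch_conn_interval_ok(uint16_t req_min, uint16_t req_max,
				    uint16_t local_max)
{
	/* Basic sanity: the requested range must be ordered and in spec. */
	if (req_min > req_max)
		return false;
	if (req_min < SKETCH_INTERVAL_SPEC_MIN ||
	    req_max > SKETCH_INTERVAL_SPEC_MAX)
		return false;

	/*
	 * Only the upper bound is compared against the local setting.
	 * There is deliberately no check against a local minimum, so a
	 * keyboard or mouse can still ask for a low interval.
	 */
	return req_max <= local_max;
}
```

With a local maximum of 56, a request for the range 6 to 12 passes in this sketch, while a request whose maximum is 100 is rejected.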
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
If we receive HCI_EV_IO_CAPA_REQUEST while
HCI_OP_READ_REMOTE_EXT_FEATURES has yet to be responded to, assume the remote
does support SSP, since otherwise this event shouldn't be generated.
Frédéric Danis [Mon, 22 Jan 2024 16:59:55 +0000 (17:59 +0100)]
Bluetooth: mgmt: Fix limited discoverable off timeout
The LIMITED_DISCOVERABLE flag is not reset from the Class of Device and
advertisement on limited discoverable timeout. This prevents passing PTS
test GAP/DISC/LIMM/BV-02-C.
Calling set_discoverable_sync, as is done when limited discovery is set,
correctly updates the Class of Device and advertisement.