]> Git Repo - linux.git/log
linux.git
2 years agolibbpf: Fix potential NULL dereference when parsing ELF
Andrii Nakryiko [Tue, 16 Aug 2022 00:19:26 +0000 (17:19 -0700)]
libbpf: Fix potential NULL dereference when parsing ELF

Fix if condition filtering empty ELF sections to prevent NULL
dereference.

Fixes: 47ea7417b074 ("libbpf: Skip empty sections in bpf_object__init_global_data_maps")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Hao Luo <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agonet: dsa: microchip: ksz9477: fix fdb_dump last invalid entry
Arun Ramadoss [Tue, 16 Aug 2022 10:55:16 +0000 (16:25 +0530)]
net: dsa: microchip: ksz9477: fix fdb_dump last invalid entry

In the ksz9477_fdb_dump function it reads the ALU control register and
exit from the timeout loop if there is valid entry or search is
complete. After exiting the loop, it reads the alu entry and report to
the user space irrespective of entry is valid. It works till the valid
entry. If the loop exited when search is complete, it reads the alu
table. The table returns all ones and it is reported to user space. So
bridge fdb show gives ff:ff:ff:ff:ff:ff as last entry for every port.
To fix it, after exiting the loop the entry is reported only if it is
valid one.

Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477")
Signed-off-by: Arun Ramadoss <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoMerge branch 'net-dsa-bcm_sf2-utilize-phylink-for-all-ports'
Jakub Kicinski [Wed, 17 Aug 2022 17:55:06 +0000 (10:55 -0700)]
Merge branch 'net-dsa-bcm_sf2-utilize-phylink-for-all-ports'

Florian Fainelli says:

====================
net: dsa: bcm_sf2: Utilize PHYLINK for all ports

This patch series has the bcm_sf2 driver utilize PHYLINK to configure
the CPU port link parameters to unify the configuration and pave the way
for DSA to utilize PHYLINK for all ports in the future.

Tested on BCM7445 and BCM7278
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agonet: dsa: bcm_sf2: Have PHYLINK configure CPU/IMP port(s)
Florian Fainelli [Mon, 15 Aug 2022 17:50:09 +0000 (10:50 -0700)]
net: dsa: bcm_sf2: Have PHYLINK configure CPU/IMP port(s)

Remove the artificial limitations imposed upon
bcm_sf2_sw_mac_link_{up,down} and allow us to override the link
parameters for IMP port(s) as well as regular ports by accounting for
the special differences that exist there.

Remove the code that did override the link parameters in
bcm_sf2_imp_setup().

Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agonet: dsa: bcm_sf2: Introduce helper for port override offset
Florian Fainelli [Mon, 15 Aug 2022 17:50:08 +0000 (10:50 -0700)]
net: dsa: bcm_sf2: Introduce helper for port override offset

Depending upon the generation of switches, we have different offsets for
configuring a given port's status override where link parameters are
applied. Introduce a helper function that we re-use throughout the code
in order to let phylink callbacks configure the IMP/CPU port(s) in
subsequent changes.

Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agonet: sfp: use simplified HWMON_CHANNEL_INFO macro
Beniamin Sandu [Sat, 13 Aug 2022 20:46:58 +0000 (23:46 +0300)]
net: sfp: use simplified HWMON_CHANNEL_INFO macro

This makes the code look cleaner and easier to read.

Signed-off-by: Beniamin Sandu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoselftests/bpf: Tests libbpf autoattach APIs
Hao Luo [Tue, 16 Aug 2022 23:40:12 +0000 (16:40 -0700)]
selftests/bpf: Tests libbpf autoattach APIs

Adds test for libbpf APIs that toggle bpf program auto-attaching.

Signed-off-by: Hao Luo <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agolibbpf: Allows disabling auto attach
Hao Luo [Tue, 16 Aug 2022 23:40:11 +0000 (16:40 -0700)]
libbpf: Allows disabling auto attach

Adds libbpf APIs for disabling auto-attach for individual functions.
This is motivated by the use case of cgroup iter [1]. Some iter
types require their parameters to be non-zero, therefore applying
auto-attach on them will fail. With these two new APIs, users who
want to use auto-attach and these types of iters can disable
auto-attach on the program and perform manual attach.

[1] https://lore.kernel.org/bpf/CAEf4BzZ+a2uDo_t6kGBziqdz--m2gh2_EUwkGLDtMd65uwxUjA@mail.gmail.com/

Signed-off-by: Hao Luo <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agoice: Fix VF not able to send tagged traffic with no VLAN filters
Sylwester Dziedziuch [Wed, 3 Aug 2022 08:42:46 +0000 (10:42 +0200)]
ice: Fix VF not able to send tagged traffic with no VLAN filters

VF was not able to send tagged traffic when it didn't
have any VLAN interfaces and VLAN anti-spoofing was enabled.
Fix this by allowing VFs with no VLAN filters to send tagged
traffic. After VF adds a VLAN interface it will be able to
send tagged traffic matching VLAN filters only.

Testing hints:
1. Spawn VF
2. Send tagged packet from a VF
3. The packet should be sent out and not dropped
4. Add a VLAN interface on VF
5. Send tagged packet on that VLAN interface
6. Packet should be sent out and not dropped
7. Send tagged packet with id different than VLAN interface
8. Packet should be dropped

Fixes: daf4dd16438b ("ice: Refactor spoofcheck configuration functions")
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: Ignore error message when setting same promiscuous mode
Benjamin Mikailenko [Fri, 12 Aug 2022 13:25:50 +0000 (15:25 +0200)]
ice: Ignore error message when setting same promiscuous mode

Commit 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
introduced new checks when setting/clearing promiscuous mode. But if the
requested promiscuous mode setting already exists, an -EEXIST error
message would be printed. This is incorrect because promiscuous mode is
either on/off and shouldn't print an error when the requested
configuration is already set.

This can happen when removing a bridge with two bonded interfaces and
promiscuous most isn't fully cleared from VLAN VSI in hardware.

Fix this by ignoring cases where requested promiscuous mode exists.

Fixes: 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
Signed-off-by: Benjamin Mikailenko <[email protected]>
Signed-off-by: Grzegorz Siwik <[email protected]>
Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: Fix clearing of promisc mode with bridge over bond
Grzegorz Siwik [Fri, 12 Aug 2022 13:25:49 +0000 (15:25 +0200)]
ice: Fix clearing of promisc mode with bridge over bond

When at least two interfaces are bonded and a bridge is enabled on the
bond, an error can occur when the bridge is removed and re-added. The
reason for the error is because promiscuous mode was not fully cleared from
the VLAN VSI in the hardware. With this change, promiscuous mode is
properly removed when the bridge disconnects from bonding.

[ 1033.676359] bond1: link status definitely down for interface enp95s0f0, disabling it
[ 1033.676366] bond1: making interface enp175s0f0 the new active one
[ 1033.676369] device enp95s0f0 left promiscuous mode
[ 1033.676522] device enp175s0f0 entered promiscuous mode
[ 1033.676901] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6
[ 1041.795662] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6
[ 1041.944826] bond1: link status definitely down for interface enp175s0f0, disabling it
[ 1041.944874] device enp175s0f0 left promiscuous mode
[ 1041.944918] bond1: now running without any active interface!

Fixes: c31af68a1b94 ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations")
Co-developed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Grzegorz Siwik <[email protected]>
Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/
Tested-by: Jaroslav Pulchart <[email protected]>
Tested-by: Igor Raits <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: Ignore EEXIST when setting promisc mode
Grzegorz Siwik [Fri, 12 Aug 2022 13:25:48 +0000 (15:25 +0200)]
ice: Ignore EEXIST when setting promisc mode

Ignore EEXIST error when setting promiscuous mode.
This fix is needed because the driver could set promiscuous mode
when it still has not cleared properly.
Promiscuous mode could be set only once, so setting it second
time will be rejected.

Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode")
Signed-off-by: Grzegorz Siwik <[email protected]>
Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/
Tested-by: Jaroslav Pulchart <[email protected]>
Tested-by: Igor Raits <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: Fix double VLAN error when entering promisc mode
Grzegorz Siwik [Fri, 12 Aug 2022 13:25:47 +0000 (15:25 +0200)]
ice: Fix double VLAN error when entering promisc mode

Avoid enabling or disabling VLAN 0 when trying to set promiscuous
VLAN mode if double VLAN mode is enabled. This fix is needed
because the driver tries to add the VLAN 0 filter twice (once for
inner and once for outer) when double VLAN mode is enabled. The
filter program is rejected by the firmware when double VLAN is
enabled, because the promiscuous filter only needs to be set once.

This issue was missed in the initial implementation of double VLAN
mode.

Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode")
Signed-off-by: Grzegorz Siwik <[email protected]>
Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/
Tested-by: Jaroslav Pulchart <[email protected]>
Tested-by: Igor Raits <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Wed, 17 Aug 2022 15:58:54 +0000 (08:58 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "Most notably this drops the commits that trip up google cloud (turns
  out, any legacy device).

  Plus a kerneldoc patch"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: kerneldocs fixes and enhancements
  virtio: Revert "virtio: find_vqs() add arg sizes"
  virtio_vdpa: Revert "virtio_vdpa: support the arg sizes of find_vqs()"
  virtio_pci: Revert "virtio_pci: support the arg sizes of find_vqs()"
  virtio-mmio: Revert "virtio_mmio: support the arg sizes of find_vqs()"
  virtio: Revert "virtio: add helper virtio_find_vqs_ctx_size()"
  virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"

2 years agotesting: selftests: nft_flowtable.sh: rework test to detect offload failure
Florian Westphal [Tue, 16 Aug 2022 12:15:22 +0000 (14:15 +0200)]
testing: selftests: nft_flowtable.sh: rework test to detect offload failure

This test fails on current kernel releases because the flotwable path
now calls dst_check from packet path and will then remove the offload.

Test script has two purposes:
1. check that file (random content) can be sent to other netns (and vv)
2. check that the flow is offloaded (rather than handled by classic
   forwarding path).

Since dst_check is in place, 2) fails because the nftables ruleset in
router namespace 1 intentionally blocks traffic under the assumption
that packets are not passed via classic path at all.

Rework this: Instead of blocking traffic, create two named counters, one
for original and one for reverse direction.

The first three test cases are handled by classic forwarding path
(path mtu discovery is disabled and packets exceed MTU).

But all other tests enable PMTUD, so the originator and responder are
expected to lower packet size and flowtable is expected to do the packet
forwarding.

For those tests, check that the packet counters (which are only
incremented for packets that are passed up to classic forward path)
are significantly lower than the file size transferred.

I've tested that the counter-checks fail as expected when the 'flow add'
statement is removed from the ruleset.

Signed-off-by: Florian Westphal <[email protected]>
2 years agoMerge branch 'wwan-t7xx-fw-flashing-and-coredump-support'
David S. Miller [Wed, 17 Aug 2022 10:53:53 +0000 (11:53 +0100)]
Merge branch 'wwan-t7xx-fw-flashing-and-coredump-support'

M Chetan Kumar says:

====================
net: wwan: t7xx: fw flashing & coredump support

This patch series brings-in the support for FM350 wwan device firmware
flashing & coredump collection using devlink interface.

Below is the high level description of individual patches.
Refer to individual patch commit message for details.

PATCH1:  Enables AP CLDMA communication for firmware flashing &
coredump collection.

PATCH2: Enables the infrastructure & queue configuration required
for early ports enumeration.

PATCH3: Implements device reset and rescan logic required to enter
or exit fastboot mode.

PATCH4: Implements devlink interface & uses the fastboot protocol for
fw flashing and coredump collection.

PATCH5: t7xx devlink commands documentation.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agonet: wwan: t7xx: Devlink documentation
M Chetan Kumar [Tue, 16 Aug 2022 04:24:17 +0000 (09:54 +0530)]
net: wwan: t7xx: Devlink documentation

Document the t7xx devlink commands usage for fw flashing &
coredump collection.

Refer to t7xx.rst file for details.

Signed-off-by: M Chetan Kumar <[email protected]>
Signed-off-by: Devegowda Chandrashekar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet: wwan: t7xx: Enable devlink based fw flashing and coredump collection
M Chetan Kumar [Tue, 16 Aug 2022 04:24:05 +0000 (09:54 +0530)]
net: wwan: t7xx: Enable devlink based fw flashing and coredump collection

This patch brings-in support for t7xx wwan device firmware flashing &
coredump collection using devlink.

Driver Registers with Devlink framework.
Implements devlink ops flash_update callback that programs modem firmware.
Creates region & snapshot required for device coredump log collection.
On early detection of wwan device in fastboot mode driver sets up CLDMA0 HW
tx/rx queues for raw data transfer then registers with devlink framework.
Upon receiving firmware image & partition details driver sends fastboot
commands for flashing the firmware.

In this flow the fastboot command & response gets exchanged between driver
and device. Once firmware flashing is success completion status is reported
to user space application.

Below is the devlink command usage for firmware flashing

$devlink dev flash pci/$BDF file ABC.img component ABC

Note: ABC.img is the firmware to be programmed to "ABC" partition.

In case of coredump collection when wwan device encounters an exception
it reboots & stays in fastboot mode for coredump collection by host driver.
On detecting exception state driver collects the core dump, creates the
devlink region & reports an event to user space application for dump
collection. The user space application invokes devlink region read command
for dump collection.

Below are the devlink commands used for coredump collection.

devlink region new pci/$BDF/mr_dump
devlink region read pci/$BDF/mr_dump snapshot $ID address $ADD length $LEN
devlink region del pci/$BDF/mr_dump snapshot $ID

Signed-off-by: M Chetan Kumar <[email protected]>
Signed-off-by: Devegowda Chandrashekar <[email protected]>
Signed-off-by: Mishra Soumya Prakash <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet: wwan: t7xx: PCIe reset rescan
Haijun Liu [Tue, 16 Aug 2022 04:23:53 +0000 (09:53 +0530)]
net: wwan: t7xx: PCIe reset rescan

PCI rescan module implements "rescan work queue". In firmware flashing
or coredump collection procedure WWAN device is programmed to boot in
fastboot mode and a work item is scheduled for removal & detection.
The WWAN device is reset using APCI call as part driver removal flow.
Work queue rescans pci bus at fixed interval for device detection,
later when device is detect work queue exits.

Signed-off-by: Haijun Liu <[email protected]>
Co-developed-by: Madhusmita Sahu <[email protected]>
Signed-off-by: Madhusmita Sahu <[email protected]>
Signed-off-by: Ricardo Martinez <[email protected]>
Signed-off-by: M Chetan Kumar <[email protected]>
Signed-off-by: Devegowda Chandrashekar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet: wwan: t7xx: Infrastructure for early port configuration
Haijun Liu [Tue, 16 Aug 2022 04:23:40 +0000 (09:53 +0530)]
net: wwan: t7xx: Infrastructure for early port configuration

To support cases such as FW update or Core dump, the t7xx device
is capable of signaling the host that a special port needs
to be created before the handshake phase.

This patch adds the infrastructure required to create the
early ports which also requires a different configuration of
CLDMA queues.

Signed-off-by: Haijun Liu <[email protected]>
Co-developed-by: Madhusmita Sahu <[email protected]>
Signed-off-by: Madhusmita Sahu <[email protected]>
Signed-off-by: Ricardo Martinez <[email protected]>
Signed-off-by: Devegowda Chandrashekar <[email protected]>
Signed-off-by: M Chetan Kumar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet: wwan: t7xx: Add AP CLDMA
Haijun Liu [Tue, 16 Aug 2022 04:23:28 +0000 (09:53 +0530)]
net: wwan: t7xx: Add AP CLDMA

The t7xx device contains two Cross Layer DMA (CLDMA) interfaces to
communicate with AP and Modem processors respectively. So far only
MD-CLDMA was being used, this patch enables AP-CLDMA.

Rename small Application Processor (sAP) to AP.

Signed-off-by: Haijun Liu <[email protected]>
Co-developed-by: Madhusmita Sahu <[email protected]>
Signed-off-by: Madhusmita Sahu <[email protected]>
Signed-off-by: Moises Veleta <[email protected]>
Signed-off-by: Devegowda Chandrashekar <[email protected]>
Signed-off-by: M Chetan Kumar <[email protected]>
Reviewed-by: Ilpo Järvinen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet: phy: broadcom: Implement suspend/resume for AC131 and BCM5241
Florian Fainelli [Mon, 15 Aug 2022 19:07:47 +0000 (12:07 -0700)]
net: phy: broadcom: Implement suspend/resume for AC131 and BCM5241

Implement the suspend/resume procedure for the Broadcom AC131 and BCM5241 type
of PHYs (10/100 only) by entering the standard power down followed by the
proprietary standby mode in the auxiliary mode 4 shadow register. On resume,
the PHY software reset is enough to make it come out of standby mode so we can
utilize brcm_fet_config_init() as the resume hook.

Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
David S. Miller [Wed, 17 Aug 2022 09:25:29 +0000 (10:25 +0100)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

====================
Intel Wired LAN Driver Updates 2022-08-16)

This series contains updates to i40e driver only.

Przemyslaw fixes issue with checksum offload on VXLAN tunnels.

Alan disables VSI for Tx timeout when all recovery methods have failed.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agotls: rx: react to strparser initialization errors
Jakub Kicinski [Tue, 16 Aug 2022 00:23:58 +0000 (17:23 -0700)]
tls: rx: react to strparser initialization errors

Even though the normal strparser's init function has a return
value we got away with ignoring errors until now, as it only
validates the parameters and we were passing correct parameters.

tls_strp can fail to init on memory allocation errors, which
syzbot duly induced and reported.

Reported-by: [email protected]
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
David S. Miller [Wed, 17 Aug 2022 09:20:45 +0000 (10:20 +0100)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue

Tony Nguyen says:

====================
ice: detect and report PTP timestamp issues

Jacob Keller says:

This series fixes a few small issues with the cached PTP Hardware Clock
timestamp used for timestamp extension. It also introduces extra checks to
help detect issues with this logic, such as if the cached timestamp is not
updated within the 2 second window.

This introduces a few statistics similar to the ones already available in
other Intel drivers, including tx_hwtstamp_skipped and tx_hwtstamp_timeouts.

It is intended to aid in debugging issues we're seeing with some setups
which might be related to incorrect cached timestamp values.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agotcp: Make SYN ACK RTO tunable by BPF programs with TFO
Jie Meng [Mon, 15 Aug 2022 20:29:00 +0000 (13:29 -0700)]
tcp: Make SYN ACK RTO tunable by BPF programs with TFO

Instead of the hardcoded TCP_TIMEOUT_INIT, this diff calls tcp_timeout_init
to initiate req->timeout like the non TFO SYN ACK case.

Tested using the following packetdrill script, on a host with a BPF
program that sets the initial connect timeout to 10ms.

`../../common/defaults.sh`

// Initialize connection
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,FO TFO_COOKIE>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
   +.01 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
   +.02 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
   +.04 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
   +.01 < . 1:1(0) ack 1 win 32792

   +0 accept(3, ..., ...) = 4

Signed-off-by: Jie Meng <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agotesting: selftests: nft_flowtable.sh: use random netns names
Florian Westphal [Tue, 16 Aug 2022 12:15:21 +0000 (14:15 +0200)]
testing: selftests: nft_flowtable.sh: use random netns names

"ns1" is a too generic name, use a random suffix to avoid
errors when such a netns exists.  Also allows to run multiple
instances of the script in parallel.

Signed-off-by: Florian Westphal <[email protected]>
2 years agonetfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y
Geert Uytterhoeven [Mon, 15 Aug 2022 10:39:20 +0000 (12:39 +0200)]
netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y

NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca68557b09
("netfilter: provide config option to disable ancient procfs parts") in
v3.3.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
2 years agonet: sched: delete unused input parameter in qdisc_create
Zhengchao Shao [Mon, 15 Aug 2022 06:10:23 +0000 (14:10 +0800)]
net: sched: delete unused input parameter in qdisc_create

The input parameter p is unused in qdisc_create. Delete it.

Signed-off-by: Zhengchao Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agonet: vertexcom: mse102x: Update email address
Stefan Wahren [Mon, 15 Aug 2022 08:06:26 +0000 (10:06 +0200)]
net: vertexcom: mse102x: Update email address

in-tech smart charging is now chargebyte. So update the email address
accordingly.

Signed-off-by: Stefan Wahren <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agodt-bindings: vertexcom-mse102x: Update email address
Stefan Wahren [Mon, 15 Aug 2022 08:06:25 +0000 (10:06 +0200)]
dt-bindings: vertexcom-mse102x: Update email address

in-tech smart charging is now chargebyte. So update the email address
accordingly.

Signed-off-by: Stefan Wahren <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agonet: sched: fix misuse of qcpu->backlog in gnet_stats_add_queue_cpu
Zhengchao Shao [Mon, 15 Aug 2022 03:08:48 +0000 (11:08 +0800)]
net: sched: fix misuse of qcpu->backlog in gnet_stats_add_queue_cpu

In the gnet_stats_add_queue_cpu function, the qstats->qlen statistics
are incorrectly set to qcpu->backlog.

Fixes: 448e163f8b9b ("gen_stats: Add gnet_stats_add_queue()")
Signed-off-by: Zhengchao Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agonet: sched: remove the unused return value of unregister_qdisc
Zhengchao Shao [Mon, 15 Aug 2022 03:04:17 +0000 (11:04 +0800)]
net: sched: remove the unused return value of unregister_qdisc

Return value of unregister_qdisc is unused, remove it.

Signed-off-by: Zhengchao Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoselftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm
Artem Savkov [Tue, 16 Aug 2022 05:52:31 +0000 (07:52 +0200)]
selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm

Use SYS_PREFIX macro from bpf_misc.h instead of hard-coded '__x64_'
prefix for sys_setdomainname attach point in lsm test.

Signed-off-by: Artem Savkov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agoMerge tag 'nios2_fixes_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguye...
Linus Torvalds [Tue, 16 Aug 2022 18:42:11 +0000 (11:42 -0700)]
Merge tag 'nios2_fixes_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux

Pull NIOS2 fixes from Dinh Nguyen:

 - Security fixes from Al Viro

* tag 'nios2_fixes_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
  nios2: add force_successful_syscall_return()
  nios2: restarts apply only to the first sigframe we build...
  nios2: fix syscall restart checks
  nios2: traced syscall does need to check the syscall number
  nios2: don't leave NULLs in sys_call_table[]
  nios2: page fault et.al. are *not* restartable syscalls...

2 years agoMerge tag 'spi-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Tue, 16 Aug 2022 18:40:15 +0000 (11:40 -0700)]
Merge tag 'spi-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A few fixes that came in since my pull request, the Meson fix is a
  little large since it's fixing all possible cases of the problem that
  was observed with the driver and clock API trying to share
  configuration by integrating the device clocking fully with the clock
  API rather than spot fixing the one instance that was observed"

* tag 'spi-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: dt-bindings: Drop Pratyush Yadav
  spi: meson-spicc: add local pow2 clock ops to preserve rate between messages
  MAINTAINERS: rectify entry for ARM/HPE GXP ARCHITECTURE
  spi: spi.c: Add missing __percpu annotations in users of spi_statistics

2 years agoMerge tag 'regulator-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 16 Aug 2022 18:36:38 +0000 (11:36 -0700)]
Merge tag 'regulator-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A couple of small fixes that came in since my pull request, nothing
  major here"

* tag 'regulator-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: core: Fix missing error return from regulator_bulk_get()
  regulator: pca9450: Remove restrictions for regulator-name

2 years agox86: simplify load_unaligned_zeropad() implementation
Linus Torvalds [Sun, 14 Aug 2022 21:16:13 +0000 (14:16 -0700)]
x86: simplify load_unaligned_zeropad() implementation

The exception for the "unaligned access at the end of the page, next
page not mapped" never happens, but the fixup code ends up causing
trouble for compilers to optimize well.

clang in particular ends up seeing it being in the middle of a loop, and
tries desperately to optimize the exception fixup code that is never
really reached.

The simple solution is to just move all the fixups into the exception
handler itself, which moves it all out of the hot case code, and means
that the compiler never sees it or needs to worry about it.

Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2 years agolocking/atomic: Make test_and_*_bit() ordered on failure
Hector Martin [Tue, 16 Aug 2022 07:03:11 +0000 (16:03 +0900)]
locking/atomic: Make test_and_*_bit() ordered on failure

These operations are documented as always ordered in
include/asm-generic/bitops/instrumented-atomic.h, and producer-consumer
type use cases where one side needs to ensure a flag is left pending
after some shared data was updated rely on this ordering, even in the
failure case.

This is the case with the workqueue code, which currently suffers from a
reproducible ordering violation on Apple M1 platforms (which are
notoriously out-of-order) that ends up causing the TTY layer to fail to
deliver data to userspace properly under the right conditions.  This
change fixes that bug.

Change the documentation to restrict the "no order on failure" story to
the _lock() variant (for which it makes sense), and remove the
early-exit from the generic implementation, which is what causes the
missing barrier semantics in that case.  Without this, the remaining
atomic op is fully ordered (including on ARM64 LSE, as of recent
versions of the architecture spec).

Suggested-by: Linus Torvalds <[email protected]>
Cc: [email protected]
Fixes: e986a0d6cb36 ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs")
Fixes: 61e02392d3c7 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()")
Signed-off-by: Hector Martin <[email protected]>
Acked-by: Will Deacon <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2 years agoice: introduce ice_ptp_reset_cached_phctime function
Jacob Keller [Wed, 27 Jul 2022 23:16:02 +0000 (16:16 -0700)]
ice: introduce ice_ptp_reset_cached_phctime function

If the PTP hardware clock is adjusted, the ice driver must update the
cached PHC timestamp. This is required in order to perform timestamp
extension on the shorter timestamps captured by the PHY.

Currently, we simply call ice_ptp_update_cached_phctime in the settime and
adjtime callbacks. This has a few issues:

1) if ICE_CFG_BUSY is set because another thread is updating the Rx rings,
   we will exit with an error. This is not checked, and the functions do
   not re-schedule the update. This could leave the cached timestamp
   incorrect until the next scheduled work item execution.

2) even if we did handle an update, any currently outstanding Tx timestamp
   would be extended using the wrong cached PHC time. This would produce
   incorrect results.

To fix these issues, introduce a new ice_ptp_reset_cached_phctime function.
This function calls the ice_ptp_update_cached_phctime, and discards
outstanding Tx timestamps.

If the ice_ptp_update_cached_phctime function fails because ICE_CFG_BUSY is
set, we log a warning and schedule the thread to execute soon. The update
function is modified so that it always updates the cached copy in the PF
regardless. This ensures we have the most up to date values possible and
minimizes the risk of a packet timestamp being extended with the wrong
value.

It would be nice if we could skip reporting Rx timestamps until the cached
values are up to date. However, we can't access the Rx rings while
ICE_CFG_BUSY is set because they are actively being updated by another
thread.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: re-arrange some static functions in ice_ptp.c
Jacob Keller [Wed, 27 Jul 2022 23:16:01 +0000 (16:16 -0700)]
ice: re-arrange some static functions in ice_ptp.c

A following change is going to want to make use of ice_ptp_flush_tx_tracker
earlier in the ice_ptp.c file. To make this work, move the Tx timestamp
tracking functions higher up in the file, and pull the
ice_ptp_update_cached_timestamp function below them. This should have no
functional change.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: track and warn when PHC update is late
Jacob Keller [Wed, 27 Jul 2022 23:16:00 +0000 (16:16 -0700)]
ice: track and warn when PHC update is late

The ice driver requires a cached copy of the PHC time in order to perform
timestamp extension on Tx and Rx hardware timestamp values. This cached PHC
time must always be updated at least once every 2 seconds. Otherwise, the
math used to perform the extension would produce invalid results.

The updates are supposed to occur periodically in the PTP kthread work
item, which is scheduled to run every half second. Thus, we do not expect
an update to be delayed for so long. However, there are error conditions
which can cause the update to be delayed.

Track this situation by using jiffies to determine approximately how long
ago the last update occurred. Add a new statistic and a dev_warn when we
have failed to update the cached PHC time. This makes the error case more
obvious.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: track Tx timestamp stats similar to other Intel drivers
Jacob Keller [Wed, 27 Jul 2022 23:15:59 +0000 (16:15 -0700)]
ice: track Tx timestamp stats similar to other Intel drivers

Several Intel networking drivers which support PTP track when Tx timestamps
are skipped or when they timeout without a timestamp from hardware. The
conditions which could cause these events are rare, but it can be useful to
know when and how often they occur.

Implement similar statistics for the ice driver, tx_hwtstamp_skipped,
tx_hwtstamp_timeouts, and tx_hwtstamp_flushed.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: initialize cached_phctime when creating Rx rings
Jacob Keller [Wed, 27 Jul 2022 23:15:58 +0000 (16:15 -0700)]
ice: initialize cached_phctime when creating Rx rings

When we create new Rx rings, the cached_phctime field is zero initialized.
This could result in incorrect timestamp reporting due to the cached value
not yet being updated. Although a background task will periodically update
the cached value, ensure it matches the existing cached value in the PF
structure at ring initialization.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: set tx_tstamps when creating new Tx rings via ethtool
Jacob Keller [Wed, 27 Jul 2022 23:15:57 +0000 (16:15 -0700)]
ice: set tx_tstamps when creating new Tx rings via ethtool

When the user changes the number of queues via ethtool, the driver
allocates new rings. This allocation did not initialize tx_tstamps. This
results in the tx_tstamps field being zero (due to kcalloc allocation), and
would result in a NULL pointer dereference when attempting a transmit
timestamp on the new ring.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoi40e: Fix to stop tx_timeout recovery if GLOBR fails
Alan Brady [Tue, 2 Aug 2022 08:19:17 +0000 (10:19 +0200)]
i40e: Fix to stop tx_timeout recovery if GLOBR fails

When a tx_timeout fires, the PF attempts to recover by incrementally
resetting.  First we try a PFR, then CORER and finally a GLOBR.  If the
GLOBR fails, then we keep hitting the tx_timeout and incrementing the
recovery level and issuing dmesgs, which is both annoying to the user
and accomplishes nothing.

If the GLOBR fails, then we're pretty much totally hosed, and there's
not much else we can do to recover, so this makes it such that we just
kill the VSI and stop hitting the tx_timeout in such a case.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Alan Brady <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoi40e: Fix tunnel checksum offload with fragmented traffic
Przemyslaw Patynowski [Wed, 27 Jul 2022 09:19:40 +0000 (11:19 +0200)]
i40e: Fix tunnel checksum offload with fragmented traffic

Fix checksum offload on VXLAN tunnels.
In case, when mpls protocol is not used, set l4 header to transport
header of skb. This fixes case, when user tries to offload checksums
of VXLAN tunneled traffic.

Steps for reproduction (requires link partner with tunnels):
ip l s enp130s0f0 up
ip a f enp130s0f0
ip a a 10.10.110.2/24 dev enp130s0f0
ip l s enp130s0f0 mtu 1600
ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev \
enp130s0f0 dstport 4789
ip l s vxlan12_sut up
ip a a 20.10.110.2/24 dev vxlan12_sut
iperf3 -c 20.10.110.1 #should connect

Without this patch, TX descriptor was using wrong data, due to
l4 header pointing wrong address. NIC would then drop those packets
internally, due to incorrect TX descriptor data, which increased
GLV_TEPC register.

Fixes: b4fb2d33514a ("i40e: Add support for MPLS + TSO")
Signed-off-by: Przemyslaw Patynowski <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2 years agovirtio: kerneldocs fixes and enhancements
Ricardo Cañuelo [Wed, 10 Aug 2022 09:40:03 +0000 (11:40 +0200)]
virtio: kerneldocs fixes and enhancements

Fix variable names in some kerneldocs, naming in others.
Add kerneldocs for struct vring_desc and vring_interrupt.

Signed-off-by: Ricardo Cañuelo <[email protected]>
Message-Id: <20220810094004[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
2 years agovirtio: Revert "virtio: find_vqs() add arg sizes"
Michael S. Tsirkin [Tue, 16 Aug 2022 05:36:58 +0000 (01:36 -0400)]
virtio: Revert "virtio: find_vqs() add arg sizes"

This reverts commit a10fba0377145fccefea4dc4dd5915b7ed87e546: the
proposed API isn't supported on all transports but no
effort was made to address this.

It might not be hard to fix if we want to: maybe just
rename size to size_hint and make sure legacy
transports ignore the hint.

But it's not sure what the benefit is in any case, so
let's drop it.

Fixes: a10fba037714 ("virtio: find_vqs() add arg sizes")
Signed-off-by: Michael S. Tsirkin <[email protected]>
Message-Id: <20220816053602[email protected]>

2 years agovirtio_vdpa: Revert "virtio_vdpa: support the arg sizes of find_vqs()"
Michael S. Tsirkin [Tue, 16 Aug 2022 05:36:45 +0000 (01:36 -0400)]
virtio_vdpa: Revert "virtio_vdpa: support the arg sizes of find_vqs()"

This reverts commit 99e8927d8a4da8eb8a8a5904dc13a3156be8e7c0:
proposed API isn't supported on all transports but no
effort was made to address this.

It might not be hard to fix if we want to: maybe just rename size to
size_hint and make sure legacy transports ignore the hint.

But it's not sure what the benefit is in any case, so let's drop it.

Fixes: 99e8927d8a4d ("virtio_vdpa: support the arg sizes of find_vqs()")
Signed-off-by: Michael S. Tsirkin <[email protected]>
Message-Id: <20220816053602[email protected]>

2 years agovirtio_pci: Revert "virtio_pci: support the arg sizes of find_vqs()"
Michael S. Tsirkin [Tue, 16 Aug 2022 05:36:40 +0000 (01:36 -0400)]
virtio_pci: Revert "virtio_pci: support the arg sizes of find_vqs()"

This reverts commit cdb44806fca2d0ad29ca644cbf1505433902ee0c: the legacy
path is wrong and in fact can not support the proposed API since for a
legacy device we never communicate the vq size to the hypervisor.

Reported-by: Andres Freund <[email protected]>
Fixes: cdb44806fca2 ("virtio_pci: support the arg sizes of find_vqs()")
Signed-off-by: Michael S. Tsirkin <[email protected]>
Message-Id: <20220816053602[email protected]>

2 years agovirtio-mmio: Revert "virtio_mmio: support the arg sizes of find_vqs()"
Michael S. Tsirkin [Tue, 16 Aug 2022 05:36:35 +0000 (01:36 -0400)]
virtio-mmio: Revert "virtio_mmio: support the arg sizes of find_vqs()"

This reverts commit fbed86abba6e0472d98079790e58060e4332608a.
The API is now unused, let's not carry dead code around.

Fixes: fbed86abba6e ("virtio_mmio: support the arg sizes of find_vqs()")
Signed-off-by: Michael S. Tsirkin <[email protected]>
Message-Id: <20220816053602[email protected]>

2 years agovirtio: Revert "virtio: add helper virtio_find_vqs_ctx_size()"
Michael S. Tsirkin [Tue, 16 Aug 2022 05:36:31 +0000 (01:36 -0400)]
virtio: Revert "virtio: add helper virtio_find_vqs_ctx_size()"

This reverts commit fe3dc04e31aa51f91dc7f741a5f76cc4817eb5b4: the
API is now unused and in fact can't be implemented on top of a legacy
device.

Fixes: fe3dc04e31aa ("virtio: add helper virtio_find_vqs_ctx_size()")
Cc: "Xuan Zhuo" <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Message-Id: <20220816053602[email protected]>

2 years agovirtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"
Michael S. Tsirkin [Tue, 16 Aug 2022 05:36:27 +0000 (01:36 -0400)]
virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"

This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7.

This has been reported to trip up guests on GCP (Google Cloud).
The reason is that virtio_find_vqs_ctx_size is broken on legacy
devices. We can in theory fix virtio_find_vqs_ctx_size but
in fact the patch itself has several other issues:

- It treats unknown speed as < 10G
- It leaves userspace no way to find out the ring size set by hypervisor
- It tests speed when link is down
- It ignores the virtio spec advice:
        Both \field{speed} and \field{duplex} can change, thus the driver
        is expected to re-read these values after receiving a
        configuration change notification.
- It is not clear the performance impact has been tested properly

Revert the patch for now.

Reported-by: Andres Freund <[email protected]>
Link: https://lore.kernel.org/r/20220814212610.GA3690074%40roeck-us.net
Link: https://lore.kernel.org/r/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de
Link: https://lore.kernel.org/r/3df6bb82-1951-455d-a768-e9e1513eb667%40www.fastmail.com
Link: https://lore.kernel.org/r/FCDC5DDE-3CDD-4B8A-916F-CA7D87B547CE%40anarazel.de
Fixes: 762faee5a267 ("virtio_net: set the default max ring size by find_vqs()")
Cc: Xuan Zhuo <[email protected]>
Cc: Jason Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Tested-by: Andres Freund <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Message-Id: <20220816053602[email protected]>

2 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Tue, 16 Aug 2022 03:14:39 +0000 (20:14 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-08-12 (iavf)

This series contains updates to iavf driver only.

Przemyslaw frees memory for admin queues in initialization error paths,
prevents freeing of vf_res which is causing null pointer dereference,
and adjusts calls in error path of reset to avoid iavf_close() which
could cause deadlock.

Ivan Vecera avoids deadlock that can occur when driver if part of
failover.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf: Fix deadlock in initialization
  iavf: Fix reset error handling
  iavf: Fix NULL pointer dereference in iavf_get_link_ksettings
  iavf: Fix adminq error handling
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agonet: rtnetlink: fix module reference count leak issue in rtnetlink_rcv_msg
Zhengchao Shao [Mon, 15 Aug 2022 02:46:29 +0000 (10:46 +0800)]
net: rtnetlink: fix module reference count leak issue in rtnetlink_rcv_msg

When bulk delete command is received in the rtnetlink_rcv_msg function,
if bulk delete is not supported, module_put is not called to release
the reference counting. As a result, module reference count is leaked.

Fixes: a6cec0bcd342 ("net: rtnetlink: add bulk delete support flag")
Signed-off-by: Zhengchao Shao <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agonet: moxa: pass pdev instead of ndev to DMA functions
Sergei Antonov [Fri, 12 Aug 2022 17:13:39 +0000 (20:13 +0300)]
net: moxa: pass pdev instead of ndev to DMA functions

dma_map_single() calls fail in moxart_mac_setup_desc_ring() and
moxart_mac_start_xmit() which leads to an incessant output of this:

[   16.043925] moxart-ethernet 92000000.mac eth0: DMA mapping error
[   16.050957] moxart-ethernet 92000000.mac eth0: DMA mapping error
[   16.058229] moxart-ethernet 92000000.mac eth0: DMA mapping error

Passing pdev to DMA is a common approach among net drivers.

Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver")
Signed-off-by: Sergei Antonov <[email protected]>
Suggested-by: Andrew Lunn <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agolibbpf: Making bpf_prog_load() ignore name if kernel doesn't support
Hangbin Liu [Sat, 13 Aug 2022 00:09:36 +0000 (08:09 +0800)]
libbpf: Making bpf_prog_load() ignore name if kernel doesn't support

Similar with commit 10b62d6a38f7 ("libbpf: Add names for auxiliary maps"),
let's make bpf_prog_load() also ignore name if kernel doesn't support
program name.

To achieve this, we need to call sys_bpf_prog_load() directly in
probe_kern_prog_name() to avoid circular dependency. sys_bpf_prog_load()
also need to be exported in the libbpf_internal.h file.

Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agoselftests/bpf: Update CI kconfig
Daniel Xu [Thu, 11 Aug 2022 21:55:27 +0000 (15:55 -0600)]
selftests/bpf: Update CI kconfig

The previous selftest changes require two kconfig changes in bpf-ci.

Signed-off-by: Daniel Xu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/bpf/2c27c6ebf7a03954915f83560653752450389564.1660254747.git.dxu@dxuuu.xyz
2 years agoselftests/bpf: Add connmark read test
Daniel Xu [Thu, 11 Aug 2022 21:55:26 +0000 (15:55 -0600)]
selftests/bpf: Add connmark read test

Test that the prog can read from the connection mark. This test is nice
because it ensures progs can interact with netfilter subsystem
correctly.

Signed-off-by: Daniel Xu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/bpf/d3bc620a491e4c626c20d80631063922cbe13e2b.1660254747.git.dxu@dxuuu.xyz
2 years agoselftests/bpf: Add existing connection bpf_*_ct_lookup() test
Daniel Xu [Thu, 11 Aug 2022 21:55:25 +0000 (15:55 -0600)]
selftests/bpf: Add existing connection bpf_*_ct_lookup() test

Add a test where we do a conntrack lookup on an existing connection.
This is nice because it's a more realistic test than artifically
creating a ct entry and looking it up afterwards.

Signed-off-by: Daniel Xu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/bpf/de5a617832f38f8b5631cc87e2a836da7c94d497.1660254747.git.dxu@dxuuu.xyz
2 years agobpftool: Clear errno after libcap's checks
Quentin Monnet [Mon, 15 Aug 2022 16:22:05 +0000 (17:22 +0100)]
bpftool: Clear errno after libcap's checks

When bpftool is linked against libcap, the library runs a "constructor"
function to compute the number of capabilities of the running kernel
[0], at the beginning of the execution of the program. As part of this,
it performs multiple calls to prctl(). Some of these may fail, and set
errno to a non-zero value:

    # strace -e prctl ./bpftool version
    prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1
    prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
    prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1
    prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
    prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
    prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
    ** fprintf added at the top of main(): we have errno == 1
    ./bpftool v7.0.0
    using libbpf v1.0
    features: libbfd, libbpf_strict, skeletons
    +++ exited with 0 +++

This has been addressed in libcap 2.63 [1], but until this version is
available everywhere, we can fix it on bpftool side.

Let's clean errno at the beginning of the main() function, to make sure
that these checks do not interfere with the batch mode, where we error
out if errno is set after a bpftool command.

  [0] https://git.kernel.org/pub/scm/libs/libcap/libcap.git/tree/libcap/cap_alloc.c?h=libcap-2.65#n20
  [1] https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=f25a1b7e69f7b33e6afb58b3e38f3450b7d2d9a0

Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agoselftests/landlock: fix broken include of linux/landlock.h
Guillaume Tucker [Wed, 3 Aug 2022 20:13:54 +0000 (22:13 +0200)]
selftests/landlock: fix broken include of linux/landlock.h

Revert part of the earlier changes to fix the kselftest build when
using a sub-directory from the top of the tree as this broke the
landlock test build as a side-effect when building with "make -C
tools/testing/selftests/landlock".

Reported-by: Mickaël Salaün <[email protected]>
Fixes: a917dd94b832 ("selftests/landlock: drop deprecated headers dependency")
Fixes: f2745dc0ba3d ("selftests: stop using KSFT_KHDR_INSTALL")
Signed-off-by: Guillaume Tucker <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
2 years agonetfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified
Pablo Neira Ayuso [Mon, 15 Aug 2022 15:55:07 +0000 (17:55 +0200)]
netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified

Since f3a2181e16f1 ("netfilter: nf_tables: Support for sets with
multiple ranged fields"), it possible to combine intervals and
concatenations. Later on, ef516e8625dd ("netfilter: nf_tables:
reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag
for userspace to report that the set stores a concatenation.

Make sure NFT_SET_CONCAT is set on if field_count is specified for
consistency. Otherwise, if NFT_SET_CONCAT is specified with no
field_count, bail out with EINVAL.

Fixes: ef516e8625dd ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
2 years agonios2: add force_successful_syscall_return()
Al Viro [Mon, 8 Aug 2022 15:09:45 +0000 (16:09 +0100)]
nios2: add force_successful_syscall_return()

If we use the ancient SysV syscall ABI, we'd better have tell the
kernel how to claim that a negative return value is a success.
Use ->orig_r2 for that - it's inaccessible via ptrace, so it's
a fair game for changes and it's normally[*] non-negative on return
from syscall.  Set to -1; syscall is not going to be restart-worthy
by definition, so we won't interfere with that use either.

[*] the only exception is rt_sigreturn(), where we skip the entire
messing with r1/r2 anyway.

Fixes: 82ed08dd1b0e ("nios2: Exception handling")
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>
2 years agonios2: restarts apply only to the first sigframe we build...
Al Viro [Mon, 8 Aug 2022 15:09:16 +0000 (16:09 +0100)]
nios2: restarts apply only to the first sigframe we build...

Fixes: b53e906d255d ("nios2: Signal handling support")
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>
2 years agonios2: fix syscall restart checks
Al Viro [Mon, 8 Aug 2022 15:08:48 +0000 (16:08 +0100)]
nios2: fix syscall restart checks

sys_foo() returns -512 (aka -ERESTARTSYS) => do_signal() sees
512 in r2 and 1 in r1.

sys_foo() returns 512 => do_signal() sees 512 in r2 and 0 in r1.

The former is restart-worthy; the latter obviously isn't.

Fixes: b53e906d255d ("nios2: Signal handling support")
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>
2 years agonios2: traced syscall does need to check the syscall number
Al Viro [Mon, 8 Aug 2022 15:07:21 +0000 (16:07 +0100)]
nios2: traced syscall does need to check the syscall number

all checks done before letting the tracer modify the register
state are worthless...

Fixes: 82ed08dd1b0e ("nios2: Exception handling")
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>
2 years agonios2: don't leave NULLs in sys_call_table[]
Al Viro [Mon, 8 Aug 2022 15:06:46 +0000 (16:06 +0100)]
nios2: don't leave NULLs in sys_call_table[]

fill the gaps in there with sys_ni_syscall, as everyone does...

Fixes: 82ed08dd1b0e ("nios2: Exception handling")
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>
2 years agonios2: page fault et.al. are *not* restartable syscalls...
Al Viro [Mon, 8 Aug 2022 15:06:04 +0000 (16:06 +0100)]
nios2: page fault et.al. are *not* restartable syscalls...

make sure that ->orig_r2 is negative for everything except
the syscalls.

Fixes: 82ed08dd1b0e ("nios2: Exception handling")
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>
2 years agonetfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END
Pablo Neira Ayuso [Sat, 13 Aug 2022 13:22:05 +0000 (15:22 +0200)]
netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END

These flags are mutually exclusive, report EINVAL in this case.

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
2 years agonetfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags
Pablo Neira Ayuso [Fri, 12 Aug 2022 14:21:28 +0000 (16:21 +0200)]
netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags

If the NFT_SET_CONCAT|NFT_SET_INTERVAL flags are set on, then the
netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise,
NFTA_SET_ELEM_KEY_END should not be present.

For catch-all element, NFTA_SET_ELEM_KEY_END should not be present.
The NFT_SET_ELEM_INTERVAL_END is never used with this set flags
combination.

Fixes: 7b225d0b5c6d ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
2 years agobpf: Clear up confusion in bpf_skb_adjust_room()'s documentation
Quentin Monnet [Fri, 12 Aug 2022 15:37:27 +0000 (16:37 +0100)]
bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation

Adding or removing room space _below_ layers 2 or 3, as the description
mentions, is ambiguous. This was written with a mental image of the
packet with layer 2 at the top, layer 3 under it, and so on. But it has
led users to believe that it was on lower layers (before the beginning
of the L2 and L3 headers respectively).

Let's make it more explicit, and specify between which layers the room
space is adjusted.

Reported-by: Rumen Telbizov <[email protected]>
Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agobpftool: Fix a typo in a comment
Quentin Monnet [Fri, 12 Aug 2022 15:37:25 +0000 (16:37 +0100)]
bpftool: Fix a typo in a comment

This is the wrong library name: libcap, not libpcap.

Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agoMerge branch 'mlxsw-fixes'
David S. Miller [Mon, 15 Aug 2022 10:49:58 +0000 (11:49 +0100)]
Merge branch 'mlxsw-fixes'

Petr Machata says:

====================
mlxsw: Fixes for PTP support

This set fixes several issues in mlxsw PTP code.

- Patch #1 fixes compilation warnings.

- Patch #2 adjusts the order of operation during cleanup, thereby
  closing the window after PTP state was already cleaned in the ASIC
  for the given port, but before the port is removed, when the user
  could still in theory make changes to the configuration.

- Patch #3 protects the PTP configuration with a custom mutex, instead
  of relying on RTNL, which is not held in all access paths.

- Patch #4 forbids enablement of PTP only in RX or only in TX. The
  driver implicitly assumed this would be the case, but neglected to
  sanitize the configuration.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agomlxsw: spectrum_ptp: Forbid PTP enablement only in RX or in TX
Amit Cohen [Fri, 12 Aug 2022 15:32:03 +0000 (17:32 +0200)]
mlxsw: spectrum_ptp: Forbid PTP enablement only in RX or in TX

Currently mlxsw driver configures one global PTP configuration for all
ports. The reason is that the switch behaves like a transparent clock
between CPU port and front-panel ports. When time stamp is enabled in
any port, the hardware is configured to update the correction field. The
fact that the configuration of CPU port affects all the ports, makes the
correction field update to be global for all ports. Otherwise, user will
see odd values in the correction field, as the switch will update the
correction field in the CPU port, but not in all the front-panel ports.

The CPU port is relevant in both RX and TX, so to avoid problematic
configuration, forbid PTP enablement only in one direction, i.e., only in
RX or TX.

Without the change:
$ hwstamp_ctl -i swp1 -r 12 -t 0
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 0
rx_filter 2
$ echo $?
0

With the change:
$ hwstamp_ctl -i swp1 -r 12 -t 0
current settings:
tx_type 1
rx_filter 2
SIOCSHWTSTAMP failed: Invalid argument

Fixes: 08ef8bc825d96 ("mlxsw: spectrum_ptp: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Signed-off-by: Amit Cohen <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agomlxsw: spectrum_ptp: Protect PTP configuration with a mutex
Amit Cohen [Fri, 12 Aug 2022 15:32:02 +0000 (17:32 +0200)]
mlxsw: spectrum_ptp: Protect PTP configuration with a mutex

Currently the functions mlxsw_sp2_ptp_{configure, deconfigure}_port()
assume that they are called when RTNL is locked and they warn otherwise.

The deconfigure function can be called when port is removed, for example
as part of device reload, then there is no locked RTNL and the function
warns [1].

To avoid such case, do not assume that RTNL protects this code, add a
dedicated mutex instead. The mutex protects 'ptp_state->config' which
stores the existing global configuration in hardware. Use this mutex also
to protect the code which configures the hardware. Then, there will be
only one configuration in any time, which will be updated in 'ptp_state'
and a race will be avoided.

[1]:
RTNL: assertion failed at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c (1600)
WARNING: CPU: 1 PID: 1583493 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1600 mlxsw_sp2_ptp_hwtstamp_set+0x2d3/0x300 [mlxsw_spectrum]
[...]
CPU: 1 PID: 1583493 Comm: devlink Not tainted5.19.0-rc8-custom-127022-gb371dffda095 #789
Hardware name: Mellanox Technologies Ltd.MSN3420/VMOD0005, BIOS 5.11 01/06/2019
RIP: 0010:mlxsw_sp2_ptp_hwtstamp_set+0x2d3/0x300[mlxsw_spectrum]
[...]
Call Trace:
 <TASK>
 mlxsw_sp_port_remove+0x7e/0x190 [mlxsw_spectrum]
 mlxsw_sp_fini+0xd1/0x270 [mlxsw_spectrum]
 mlxsw_core_bus_device_unregister+0x55/0x280 [mlxsw_core]
 mlxsw_devlink_core_bus_device_reload_down+0x1c/0x30[mlxsw_core]
 devlink_reload+0x1ee/0x230
 devlink_nl_cmd_reload+0x4de/0x580
 genl_family_rcv_msg_doit+0xdc/0x140
 genl_rcv_msg+0xd7/0x1d0
 netlink_rcv_skb+0x49/0xf0
 genl_rcv+0x1f/0x30
 netlink_unicast+0x22f/0x350
 netlink_sendmsg+0x208/0x440
 __sys_sendto+0xf0/0x140
 __x64_sys_sendto+0x1b/0x20
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 08ef8bc825d96 ("mlxsw: spectrum_ptp: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Reported-by: Ido Schimmel <[email protected]>
Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agomlxsw: spectrum: Clear PTP configuration after unregistering the netdevice
Amit Cohen [Fri, 12 Aug 2022 15:32:01 +0000 (17:32 +0200)]
mlxsw: spectrum: Clear PTP configuration after unregistering the netdevice

Currently as part of removing port, PTP API is called to clear the
existing configuration and set the 'rx_filter' and 'tx_type' to zero.
The clearing is done before unregistering the netdevice, which means that
there is a window of time in which the user can reconfigure PTP in the
port, and this configuration will not be cleared.

Reorder the operations, clear PTP configuration after unregistering the
netdevice.

Fixes: 8748642751ede ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agomlxsw: spectrum_ptp: Fix compilation warnings
Amit Cohen [Fri, 12 Aug 2022 15:32:00 +0000 (17:32 +0200)]
mlxsw: spectrum_ptp: Fix compilation warnings

In case that 'CONFIG_PTP_1588_CLOCK' is not enabled in the config file,
there are implementations for the functions
mlxsw_{sp,sp2}_ptp_txhdr_construct() as part of 'spectrum_ptp.h'. In this
case, they should be defined as 'static' as they are not supposed to be
used out of this file. Make the functions 'static', otherwise the following
warnings are returned:

"warning: no previous prototype for 'mlxsw_sp_ptp_txhdr_construct'"
"warning: no previous prototype for 'mlxsw_sp2_ptp_txhdr_construct'"

In addition, make the functions 'inline' for case that 'spectrum_ptp.h'
will be included anywhere else and the functions would probably not be
used, so compilation warnings about unused static will be returned.

Fixes: 24157bc69f45 ("mlxsw: Send PTP packets as data packets to overcome a limitation")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Amit Cohen <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet_sched: cls_route: disallow handle of 0
Jamal Hadi Salim [Sun, 14 Aug 2022 11:27:58 +0000 (11:27 +0000)]
net_sched: cls_route: disallow handle of 0

Follows up on:
https://lore.kernel.org/all/20220809170518[email protected]/

handle of 0 implies from/to of universe realm which is not very
sensible.

Lets see what this patch will do:
$sudo tc qdisc add dev $DEV root handle 1:0 prio

//lets manufacture a way to insert handle of 0
$sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \
route to 0 from 0 classid 1:10 action ok

//gets rejected...
Error: handle of 0 is not valid.
We have an error talking to the kernel, -1

//lets create a legit entry..
sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \
classid 1:10 action ok

//what did the kernel insert?
$sudo tc filter ls dev $DEV parent 1:0
filter protocol ip pref 100 route chain 0
filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10
action order 1: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 1

//Lets try to replace that legit entry with a handle of 0
$ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \
handle 0x000a8000 route to 0 from 0 classid 1:10 action drop

Error: Replacing with handle of 0 is invalid.
We have an error talking to the kernel, -1

And last, lets run Cascardo's POC:
$ ./poc
0
0
-22
-22
-22

Signed-off-by: Jamal Hadi Salim <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet: fix potential refcount leak in ndisc_router_discovery()
Xin Xiong [Sat, 13 Aug 2022 12:49:08 +0000 (20:49 +0800)]
net: fix potential refcount leak in ndisc_router_discovery()

The issue happens on specific paths in the function. After both the
object `rt` and `neigh` are grabbed successfully, when `lifetime` is
nonzero but the metric needs change, the function just deletes the
route and set `rt` to NULL. Then, it may try grabbing `rt` and `neigh`
again if above conditions hold. The function simply overwrite `neigh`
if succeeds or returns if fails, without decreasing the reference
count of previous `neigh`. This may result in memory leaks.

Fix it by decrementing the reference count of `neigh` in place.

Fixes: 6b2e04bc240f ("net: allow user to set metric on default route learned via Router Advertisement")
Signed-off-by: Xin Xiong <[email protected]>
Signed-off-by: Xin Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net
David S. Miller [Mon, 15 Aug 2022 10:30:10 +0000 (11:30 +0100)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net
-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-08-11 (ice)

This series contains updates to ice driver only.

Benjamin corrects a misplaced parenthesis for a WARN_ON check.

Michal removes WARN_ON from a check as its recoverable and not
warranting of a call trace.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agoneighbour: make proxy_queue.qlen limit per-device
Alexander Mikhalitsyn [Thu, 11 Aug 2022 15:20:12 +0000 (18:20 +0300)]
neighbour: make proxy_queue.qlen limit per-device

Right now we have a neigh_param PROXY_QLEN which specifies maximum length
of neigh_table->proxy_queue. But in fact, this limitation doesn't work well
because check condition looks like:
tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN)

The problem is that p (struct neigh_parms) is a per-device thing,
but tbl (struct neigh_table) is a system-wide global thing.

It seems reasonable to make proxy_queue limit per-device based.

v2:
- nothing changed in this patch
v3:
- rebase to net tree

Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Yajun Deng <[email protected]>
Cc: Roopa Prabhu <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Alexey Kuznetsov <[email protected]>
Cc: Alexander Mikhalitsyn <[email protected]>
Cc: Konstantin Khorenko <[email protected]>
Cc: [email protected]
Cc: [email protected]
Suggested-by: Denis V. Lunev <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Reviewed-by: Denis V. Lunev <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoneigh: fix possible DoS due to net iface start/stop loop
Denis V. Lunev [Thu, 11 Aug 2022 15:20:11 +0000 (18:20 +0300)]
neigh: fix possible DoS due to net iface start/stop loop

Normal processing of ARP request (usually this is Ethernet broadcast
packet) coming to the host is looking like the following:
* the packet comes to arp_process() call and is passed through routing
  procedure
* the request is put into the queue using pneigh_enqueue() if
  corresponding ARP record is not local (common case for container
  records on the host)
* the request is processed by timer (within 80 jiffies by default) and
  ARP reply is sent from the same arp_process() using
  NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside
  pneigh_enqueue())

And here the problem comes. Linux kernel calls pneigh_queue_purge()
which destroys the whole queue of ARP requests on ANY network interface
start/stop event through __neigh_ifdown().

This is actually not a problem within the original world as network
interface start/stop was accessible to the host 'root' only, which
could do more destructive things. But the world is changed and there
are Linux containers available. Here container 'root' has an access
to this API and could be considered as untrusted user in the hosting
(container's) world.

Thus there is an attack vector to other containers on node when
container's root will endlessly start/stop interfaces. We have observed
similar situation on a real production node when docker container was
doing such activity and thus other containers on the node become not
accessible.

The patch proposed doing very simple thing. It drops only packets from
the same namespace in the pneigh_queue_purge() where network interface
state change is detected. This is enough to prevent the problem for the
whole node preserving original semantics of the code.

v2:
- do del_timer_sync() if queue is empty after pneigh_queue_purge()
v3:
- rebase to net tree

Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Yajun Deng <[email protected]>
Cc: Roopa Prabhu <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Alexey Kuznetsov <[email protected]>
Cc: Alexander Mikhalitsyn <[email protected]>
Cc: Konstantin Khorenko <[email protected]>
Cc: [email protected]
Cc: [email protected]
Investigated-by: Alexander Mikhalitsyn <[email protected]>
Signed-off-by: Denis V. Lunev <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet: qrtr: start MHI channel after endpoit creation
Maxim Kochetkov [Thu, 11 Aug 2022 09:48:40 +0000 (12:48 +0300)]
net: qrtr: start MHI channel after endpoit creation

MHI channel may generates event/interrupt right after enabling.
It may leads to 2 race conditions issues.

1)
Such event may be dropped by qcom_mhi_qrtr_dl_callback() at check:

if (!qdev || mhi_res->transaction_status)
return;

Because dev_set_drvdata(&mhi_dev->dev, qdev) may be not performed at
this moment. In this situation qrtr-ns will be unable to enumerate
services in device.
---------------------------------------------------------------

2)
Such event may come at the moment after dev_set_drvdata() and
before qrtr_endpoint_register(). In this case kernel will panic with
accessing wrong pointer at qcom_mhi_qrtr_dl_callback():

rc = qrtr_endpoint_post(&qdev->ep, mhi_res->buf_addr,
mhi_res->bytes_xferd);

Because endpoint is not created yet.
--------------------------------------------------------------
So move mhi_prepare_for_transfer_autoqueue after endpoint creation
to fix it.

Fixes: a2e2cc0dbb11 ("net: qrtr: Start MHI channels during init")
Signed-off-by: Maxim Kochetkov <[email protected]>
Reviewed-by: Hemant Kumar <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Reviewed-by: Loic Poulain <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoLinux 6.0-rc1 v6.0-rc1
Linus Torvalds [Sun, 14 Aug 2022 22:50:18 +0000 (15:50 -0700)]
Linux 6.0-rc1

2 years agoradix-tree: replace gfp.h inclusion with gfp_types.h
Yury Norov [Fri, 12 Aug 2022 05:34:25 +0000 (22:34 -0700)]
radix-tree: replace gfp.h inclusion with gfp_types.h

Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we
have gfp_types.h for this.

Fixes powerpc allmodconfig build:

   In file included from include/linux/nodemask.h:97,
                    from include/linux/mmzone.h:17,
                    from include/linux/gfp.h:7,
                    from include/linux/radix-tree.h:12,
                    from include/linux/idr.h:15,
                    from include/linux/kernfs.h:12,
                    from include/linux/sysfs.h:16,
                    from include/linux/kobject.h:20,
                    from include/linux/pci.h:35,
                    from arch/powerpc/kernel/prom_init.c:24:
   include/linux/random.h: In function 'add_latent_entropy':
>> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'?
      25 |         add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy));
         |                                              ^~~~~~~~~~~~~~
         |                                              add_latent_entropy
   include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in

Reported-by: kernel test robot <[email protected]>
CC: Andy Shevchenko <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Jason A. Donenfeld <[email protected]>
Signed-off-by: Yury Norov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2 years agoMerge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 14 Aug 2022 20:03:53 +0000 (13:03 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs lseek fix from Al Viro:
 "Fix proc_reg_llseek() breakage. Always had been possible if somebody
  left NULL ->proc_lseek, became a practical issue now"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  take care to handle NULL ->proc_lseek()

2 years agotake care to handle NULL ->proc_lseek()
Al Viro [Sun, 14 Aug 2022 19:16:18 +0000 (15:16 -0400)]
take care to handle NULL ->proc_lseek()

Easily done now, just by clearing FMODE_LSEEK in ->f_mode
during proc_reg_open() for such entries.

Fixes: 868941b14441 "fs: remove no_llseek"
Signed-off-by: Al Viro <[email protected]>
2 years agoMerge tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 14 Aug 2022 16:28:54 +0000 (09:28 -0700)]
Merge tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull more xen updates from Juergen Gross:

 - fix the handling of the "persistent grants" feature negotiation
   between Xen blkfront and Xen blkback drivers

 - a cleanup of xen.config and adding xen.config to Xen section in
   MAINTAINERS

 - support HVMOP_set_evtchn_upcall_vector, which is more compliant to
   "normal" interrupt handling than the global callback used up to now

 - further small cleanups

* tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  MAINTAINERS: add xen config fragments to XEN HYPERVISOR sections
  xen: remove XEN_SCRUB_PAGES in xen.config
  xen/pciback: Fix comment typo
  xen/xenbus: fix return type in xenbus_file_read()
  xen-blkfront: Apply 'feature_persistent' parameter when connect
  xen-blkback: Apply 'feature_persistent' parameter when connect
  xen-blkback: fix persistent grants negotiation
  x86/xen: Add support for HVMOP_set_evtchn_upcall_vector

2 years agoMerge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sun, 14 Aug 2022 16:22:11 +0000 (09:22 -0700)]
Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tool updates from Arnaldo Carvalho de Melo:

 - 'perf c2c' now supports ARM64, adjust its output to cope with
   differences with what is in x86_64. Now go find false sharing on
   ARM64 (at least Neoverse) as well!

 - Refactor the JSON processing, making the output more compact and thus
   reducing the size of the resulting perf binary

 - Improvements for 'perf offcpu' profiling, including tracking child
   processes

 - Update Intel JSON metrics and events files for broadwellde,
   broadwellx, cascadelakex, haswellx, icelakex, ivytown, jaketown,
   knightslanding, sapphirerapids, skylakex and snowridgex

 - Add 'perf stat' JSON output and a 'perf test' entry for it

 - Ignore memfd and anonymous mmap events if jitdump present

 - Refactor 'perf test' shell tests allowing subdirs

 - Fix an error handling path in 'parse_perf_probe_command()'

 - Fixes for the guest Intel PT tracing patchkit in the 1st batch of
   this merge window

 - Print debuginfod queries if -v option is used, to explain delays in
   processing when debuginfo servers are enabled to fetch DSOs with
   richer symbol tables

 - Improve error message for 'perf record -p not_existing_pid'

 - Fix openssl and libbpf feature detection

 - Add PMU pai_crypto event description for IBM z16 on 'perf list'

 - Fix typos and duplicated words on comments in various places

* tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (81 commits)
  perf test: Refactor shell tests allowing subdirs
  perf vendor events: Update events for snowridgex
  perf vendor events: Update events and metrics for skylakex
  perf vendor events: Update metrics for sapphirerapids
  perf vendor events: Update events for knightslanding
  perf vendor events: Update metrics for jaketown
  perf vendor events: Update metrics for ivytown
  perf vendor events: Update events and metrics for icelakex
  perf vendor events: Update events and metrics for haswellx
  perf vendor events: Update events and metrics for cascadelakex
  perf vendor events: Update events and metrics for broadwellx
  perf vendor events: Update metrics for broadwellde
  perf jevents: Fold strings optimization
  perf jevents: Compress the pmu_events_table
  perf metrics: Copy entire pmu_event in find metric
  perf pmu-events: Hide the pmu_events
  perf pmu-events: Don't assume pmu_event is an array
  perf pmu-events: Move test events/metrics to JSON
  perf test: Use full metric resolution
  perf pmu-events: Hide pmu_events_map
  ...

2 years agoMerge tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 14 Aug 2022 15:48:13 +0000 (08:48 -0700)]
Merge tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit
   CPUs trap on it rather than ignoring it as they should.

 - Fix ftrace when building with clang, which was broken by some
   refactoring.

 - A couple of other minor fixes.

Thanks to Christophe Leroy, Naveen N.  Rao, Nick Desaulniers, Ondrej
Mosnacek, Pali Rohár, Russell Currey, and Segher Boessenkool.

* tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kexec: Fix build failure from uninitialised variable
  powerpc/ppc-opcode: Fix PPC_RAW_TW()
  powerpc64/ftrace: Fix ftrace for clang builds
  powerpc: Make eh value more explicit when using lwarx
  powerpc: Don't hide eh field of lwarx behind a macro
  powerpc: Fix eh field when calling lwarx on PPC32

2 years agoMerge tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 14 Aug 2022 00:35:58 +0000 (17:35 -0700)]
Merge tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull /proc/mounts fix from Al Viro:
 "Fix for /proc/mounts escaping - escape the '#' character too"

* tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: escape hash as well

2 years agoMerge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 14 Aug 2022 00:31:18 +0000 (17:31 -0700)]
Merge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more cifs updates from Steve French:

 - two fixes for stable, one for a lock length miscalculation, and
   another fixes a lease break timeout bug

 - improvement to handle leases, allows the close timeout to be
   configured more safely

 - five restructuring/cleanup patches

* tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Do not access tcon->cfids->cfid directly from is_path_accessible
  cifs: Add constructor/destructors for tcon->cfid
  SMB3: fix lease break timeout when multiple deferred close handles for the same file.
  smb3: allow deferred close timeout to be configurable
  cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir
  cifs: Move cached-dir functions into a separate file
  cifs: Remove {cifs,nfs}_fscache_release_page()
  cifs: fix lock length calculation

2 years agoafs: Enable multipage folio support
David Howells [Wed, 10 Aug 2022 17:52:47 +0000 (18:52 +0100)]
afs: Enable multipage folio support

Enable multipage folio support for the afs filesystem.

Support has already been implemented in netfslib, fscache and cachefiles
and in most of afs, but I've waited for Matthew Wilcox's latest folio
changes.

Note that it does require a change to afs_write_begin() to return the
correct subpage.  This is a "temporary" change as we're working on
getting rid of the need for ->write_begin() and ->write_end()
completely, at least as far as network filesystems are concerned - but
it doesn't prevent afs from making use of the capability.

Signed-off-by: David Howells <[email protected]>
Acked-by: Matthew Wilcox (Oracle) <[email protected]>
Tested-by: [email protected]
Cc: Marc Dionne <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Linus Torvalds <[email protected]>
2 years agoMerge tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Aug 2022 21:38:22 +0000 (14:38 -0700)]
Merge tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:
 "Misc timer fixes:

   - fix a potential use-after-free bug in posix timers

   - correct a prototype

   - address a build warning"

* tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Cleanup CPU timers before freeing them during exec
  time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64
  posix-timers: Make do_clock_gettime() static

2 years agoMerge tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Aug 2022 21:24:12 +0000 (14:24 -0700)]
Merge tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
 "Fix the 'IBPB mitigated RETBleed' mode of operation on AMD CPUs (not
  turned on by default), which also need STIBP enabled (if available) to
  be '100% safe' on even the shortest speculation windows"

* tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Enable STIBP for IBPB mitigated RETBleed

2 years agoMerge tag 'i2c-for-5.20-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 13 Aug 2022 21:06:08 +0000 (14:06 -0700)]
Merge tag 'i2c-for-5.20-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:

 - two driver fixes for issues introduced this cycle

 - one trivial driver improvement regarding ACPI

 - more DTS conversion and additions

 - documentation updates

 - subsystem-wide move from strlcpy to strscpy

* tag 'i2c-for-5.20-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  docs: i2c: i2c-sysfs: fix hyperlinks
  docs: i2c: i2c-sysfs: improve wording
  docs: i2c: instantiating-devices: add syntax coloring to dts and C blocks
  docs: i2c: smbus-protocol: improve DataLow/DataHigh definition
  docs: i2c: i2c-protocol: remove unused legend items
  docs: i2c: i2c-protocol,smbus-protocol: remove nonsense words
  docs: i2c: i2c-protocol: update introductory paragraph
  i2c: move core from strlcpy to strscpy
  i2c: move drivers from strlcpy to strscpy
  i2c: kempld: Support ACPI I2C device declaration
  i2c: mediatek: add i2c compatible for MT8188
  dt-bindings: i2c: update bindings for mt8188 soc
  i2c: microchip-corei2c: fix erroneous late ack send
  dt-bindings: i2c: qcom,i2c-cci: convert to dtschema
  i2c: qcom-geni: Fix GPI DMA buffer sync-back

2 years agoMerge tag 'ntb-5.20' of https://github.com/jonmason/ntb
Linus Torvalds [Sat, 13 Aug 2022 21:00:45 +0000 (14:00 -0700)]
Merge tag 'ntb-5.20' of https://github.com/jonmason/ntb

Pull NTB updates from Jon Mason:
 "Non-Transparent Bridge updates.

  Fix of heap data and clang warnings, support for a new Intel NTB
  device, and NTB EndPoint Function (EPF) support and the various fixes
  for that"

* tag 'ntb-5.20' of https://github.com/jonmason/ntb:
  MAINTAINERS: add PCI Endpoint NTB drivers to NTB files
  NTB: EPF: Tidy up some bounds checks
  NTB: EPF: Fix error code in epf_ntb_bind()
  PCI: endpoint: pci-epf-vntb: reduce several globals to statics
  PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init()
  PCI: endpoint: Fix Kconfig dependency
  NTB: EPF: set pointer addr to null using NULL rather than 0
  Documentation: PCI: extend subheading underline for "lspci output" section
  Documentation: PCI: Use code-block block for scratchpad registers diagram
  Documentation: PCI: Add specification for the PCI vNTB function device
  PCI: endpoint: Support NTB transfer between RC and EP
  NTB: epf: Allow more flexibility in the memory BAR map method
  PCI: designware-ep: Allow pci_epc_set_bar() update inbound map address
  ntb: intel: add GNR support for Intel PCIe gen5 NTB
  NTB: ntb_tool: uninitialized heap data in tool_fn_write()
  ntb: idt: fix clang -Wformat warnings

2 years agoMerge tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 13 Aug 2022 20:50:11 +0000 (13:50 -0700)]
Merge tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:
 "There's not a lot this time around, just the usual bug fixes and
  corrections for missing error returns.

   - Return error codes from block device flushes to userspace

   - Fix a deadlock between reclaim and mount time quotacheck

   - Fix an unnecessary ENOSPC return when doing COW on a filesystem
     with severe free space fragmentation

   - Fix a miscalculation in the transaction reservation computations
     for file removal operations"

* tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix inode reservation space for removing transaction
  xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork
  xfs: fix intermittent hang during quotacheck
  xfs: check return codes when flushing block devices

This page took 0.134089 seconds and 4 git commands to generate.