Git Repo - linux.git/log

eth: remove copies of the NAPI_POLL_WEIGHT define

Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.

Drop the special defines in a bunch of drivers where the
removal is relatively simple so grouping into one patch
does not impact reviewability.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Single chip mode detection for MV88E6*41

The mv88e6xxx driver expects switches that are configured in single chip
addressing mode to have the MDIO address configured as 0. This is due to
the switch ADDR pins representing the single chip addressing mode as 0.
However depending on the device (e.g. MV88E6*41) the switch does not
respond on address 0 or any other address below 16 (the first port
address) in single chip addressing mode. This allows for other devices
to be on the same shared MDIO bus despite the switch being in single
chip addressing mode.

When using a switch that works this way it is not possible to configure
switch driver as single chip addressing via device tree, along with
another MDIO device on the same bus with address 0, as both devices
would have the same address of 0 resulting in mdiobus_register_device
-EBUSY errors for one of the devices with address 0.

In order to support this configuration the switch node can have its MDIO
address configured as 16 (the first address that the device responds
to). During initialization the driver will treat this address similar to
how address 0 is, however because this address is also a valid
multi-chip address (in certain switch models, but not all) the driver
will configure the SMI in single chip addressing mode and attempt to
detect the switch model. If the device is configured in single chip
addressing mode this will succeed and the initialization process can
continue. If it fails to detect a valid model this is because the switch
model register is not a valid register when in multi-chip mode, it will
then fall back to the existing SMI initialization process using the MDIO
address as the multi-chip mode address.

This detection method is safe if the device is in either mode because
the single chip addressing mode read is a direct SMI/MDIO read operation
and has no side effects compared to the SMI writes required for the
multi-chip addressing mode.

In order to implement this change, the reset gpio configuration is moved
to occur before any SMI initialization. This ensures that the device has
the same/correct reset gpio state for both mv88e6xxx_smi_init calls.

Signed-off-by: Nathan Rossi <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: prestera: add police action support

- Add HW api to configure policer:
- SR TCM policer mode is only supported for now.
- Policer ingress/egress direction support.
- Add police action support into flower

Signed-off-by: Volodymyr Mytnyk <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: Deduplicate interrupt disablement on PHY attach

phy_attach_direct() first calls phy_init_hw() (which restores interrupt
settings through ->config_intr()), then calls phy_disable_interrupts().

So if phydev->interrupts was previously set to 1, interrupts are briefly
enabled, then disabled, which seems nonsensical.

If it was previously set to 0, interrupts are disabled twice, which is
equally nonsensical.

Deduplicate interrupt disablement.

Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/805ccdc606bd8898d59931bd4c7c68537ed6e550.1651040826.git.lukas@wunner.de
Signed-off-by: Jakub Kicinski <[email protected]>

net: SO_RCVMARK socket option for SO_MARK with recvmsg()

Adding a new socket option, SO_RCVMARK, to indicate that SO_MARK
should be included in the ancillary data returned by recvmsg().

Renamed the sock_recv_ts_and_drops() function to sock_recv_cmsgs().

Signed-off-by: Erin MacNeil <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Acked-by: Marc Kleine-Budde <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

include/linux/netdevice.h
net/core/dev.c
  6510ea973d8d ("net: Use this_cpu_inc() to increment net->core_stats")
  794c24e9921f ("net-core: rx_otherhost_dropped to core_stats")
https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/

drivers/net/wan/cosa.c
  d48fea8401cf ("net: cosa: fix error check return value of register_chrdev()")
  89fbca3307d4 ("net: wan: remove support for COSA and SRP synchronous serial boards")
https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and netfilter.

  Current release - new code bugs:

   - bridge: switchdev: check br_vlan_group() return value

   - use this_cpu_inc() to increment net->core_stats, fix preempt-rt

  Previous releases - regressions:

   - eth: stmmac: fix write to sgmii_adapter_base

  Previous releases - always broken:

   - netfilter: nf_conntrack_tcp: re-init for syn packets only,
     resolving issues with TCP fastopen

   - tcp: md5: fix incorrect tcp_header_len for incoming connections

   - tcp: fix F-RTO may not work correctly when receiving DSACK

   - tcp: ensure use of most recently sent skb when filling rate samples

   - tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT

   - virtio_net: fix wrong buf address calculation when using xdp

   - xsk: fix forwarding when combining copy mode with busy poll

   - xsk: fix possible crash when multiple sockets are created

   - bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
     bpf_xmit lwt hook

   - sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event

   - wireguard: device: check for metadata_dst with skb_valid_dst()

   - netfilter: update ip6_route_me_harder to consider L3 domain

   - gre: make o_seqno start from 0 in native mode

   - gre: switch o_seqno to atomic to prevent races in collect_md mode

  Misc:

   - add Eric Dumazet to networking maintainers

   - dt: dsa: realtek: remove realtek,rtl8367s string

   - netfilter: flowtable: Remove the empty file"

* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  tcp: fix F-RTO may not work correctly when receiving DSACK
  Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
  net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
  ixgbe: ensure IPsec VF<->PF compatibility
  MAINTAINERS: Update BNXT entry with firmware files
  netfilter: nft_socket: only do sk lookups when indev is available
  net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
  bnx2x: fix napi API usage sequence
  tls: Skip tls_append_frag on zero copy size
  Add Eric Dumazet to networking maintainers
  netfilter: conntrack: fix udp offload timeout sysctl
  netfilter: nf_conntrack_tcp: re-init for syn packets only
  net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
  net: Use this_cpu_inc() to increment net->core_stats
  Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
  Bluetooth: hci_event: Fix creating hci_conn object on error status
  Bluetooth: hci_event: Fix checking for invalid handle on error status
  ice: fix use-after-free when deinitializing mailbox snapshot
  ice: wait 5 s for EMP reset after firmware flash
  ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
  ...

Merge tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
"These take back recent chages that started to confuse users and fix up
  an attr.show callback prototype in a driver.

  Specifics:

   - Stop warning about deprecation of the userspace thermal governor
     and cooling device status interface, because there are cases in
     which user space has to drive thermal management with the help of
     them (Daniel Lezcano)

   - Fix attr.show callback prototype in the int340x thermal driver
     (Kees Cook)"

* tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal/governor: Remove deprecated information
  Revert "thermal/core: Deprecate changing cooling device state from userspace"
  thermal: int340x: Fix attr.show callback prototype

Merge tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix up recent intel_idle driver changes and fix some ARM cpufreq
  driver issues.

  Specifics:

   - Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov,
     Vladimir Zapolskiy).

   - Fix memory leak with the Sun501 driver (Xiaobing Luo).

   - Make intel_idle enable C1E promotion on all CPUs when C1E is
     preferred to C1 (Artem Bityutskiy).

   - Make C6 optimization on Sapphire Rapids added recently work as
     expected if both C1E and C1 are "preferred" (Artem Bityutskiy)"

* tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_idle: Fix SPR C6 optimization
  intel_idle: Fix the 'preferred_cstates' module parameter
  cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
  cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
  cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
  cpufreq: qcom-hw: provide online/offline operations
  cpufreq: qcom-hw: fix the opp entries refcounting
  cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
  cpufreq: qcom-hw: drop affinity hint before freeing the IRQ

Merge tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael WysockiL
"These fix up the ACPI processor driver after a change made during the
  5.16 cycle that inadvertently broke falling back to shallower C-states
  when C3 cannot be used.

  Specifics:

   - Make the ACPI processor driver avoid falling back to C3 type of
     C-states when C3 cannot be requested (Ville Syrjälä)

   - Revert a quirk that is not necessary any more after fixing the
     underlying issue properly (Ville Syrjälä)"

* tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
  ACPI: processor: idle: Avoid falling back to C3 type C-states

Merge tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
"Highlights:

   - asus-wmi bug-fixes

   - intel-sdsu bug-fixes

   - build (warning) fixes

   - couple of hw-id additions"

* tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/intel: pmc/core: change pmc_lpm_modes to static
  platform/x86/intel/sdsi: Fix bug in multi packet reads
  platform/x86/intel/sdsi: Poll on ready bit for writes
  platform/x86/intel/sdsi: Handle leaky bucket
  platform/x86: intel-uncore-freq: Prevent driver loading in guests
  platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard
  platform/x86: dell-laptop: Add quirk entry for Latitude 7520
  platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails
  platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf()
  tools/power/x86/intel-speed-select: fix build failure when using -Wl,--as-needed

Merge tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"A minor fix for the DT binding documentation of the rt5190a driver"

* tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: Revise the rt5190a buck/ldo description

tcp: fix F-RTO may not work correctly when receiving DSACK

Currently DSACK is regarded as a dupack, which may cause
F-RTO to incorrectly enter "loss was real" when receiving
DSACK.

Packetdrill to demonstrate:

// Enable F-RTO and TLP
    0 `sysctl -q net.ipv4.tcp_frto=2`
    0 `sysctl -q net.ipv4.tcp_early_retrans=3`
    0 `sysctl -q net.ipv4.tcp_congestion_control=cubic`

// Establish a connection
   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

// RTT 10ms, RTO 210ms
  +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <...>
+.01 < . 1:1(0) ack 1 win 257
   +0 accept(3, ..., ...) = 4

// Send 2 data segments
   +0 write(4, ..., 2000) = 2000
   +0 > P. 1:2001(2000) ack 1

// TLP
+.022 > P. 1001:2001(1000) ack 1

// Continue to send 8 data segments
   +0 write(4, ..., 10000) = 10000
   +0 > P. 2001:10001(8000) ack 1

// RTO
+.188 > . 1:1001(1000) ack 1

// The original data is acked and new data is sent(F-RTO step 2.b)
   +0 < . 1:1(0) ack 2001 win 257
   +0 > P. 10001:12001(2000) ack 1

// D-SACK caused by TLP is regarded as a dupack, this results in
// the incorrect judgment of "loss was real"(F-RTO step 3.a)
+.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop>

// Never-retransmitted data(3001:4001) are acked and
// expect to switch to open state(F-RTO step 3.b)
   +0 < . 1:1(0) ack 4001 win 257
+0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }%

Fixes: e33099f96d99 ("tcp: implement RFC5682 F-RTO")
Signed-off-by: Pengcheng Yang <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Tested-by: Neal Cardwell <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix incorrect TCP connection tracking window reset for non-syn
   packets, from Florian Westphal.

2) Incorrect dependency on CONFIG_NFT_FLOW_OFFLOAD, from Volodymyr Mytnyk.

3) Fix nft_socket from the output path, from Florian Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_socket: only do sk lookups when indev is available
  netfilter: conntrack: fix udp offload timeout sysctl
  netfilter: nf_conntrack_tcp: re-init for syn packets only
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

- No short reads or writes upon glock contention

* tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: No short reads or writes upon glock contention

Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"

This reverts commit 723ad916134784b317b72f3f6cf0f7ba774e5dae

When client requests channel or ring size larger than what the server
can support the server will cap the request to the supported max. So,
the client would not be able to successfully request resources that
exceed the server limit.

Fixes: 723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits")
Signed-off-by: Dany Madden <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK

The Time-Specified Departure feature is indeed mutually exclusive with
TX IP checksumming in ENETC, but TX checksumming in itself is broken and
was removed from this driver in commit 82728b91f124 ("enetc: Remove Tx
checksumming offload code").

The blamed commit declared NETIF_F_HW_CSUM in dev->features to comply
with software TSO's expectations, and still did the checksumming in
software by calling skb_checksum_help(). So there isn't any restriction
for the Time-Specified Departure feature.

However, enetc_setup_tc_txtime() doesn't understand that, and blindly
looks for NETIF_F_CSUM_MASK.

Instead of checking for things which can literally never happen in the
current code base, just remove the check and let the driver offload
tc-etf qdiscs.

Fixes: acede3c5dad5 ("net: enetc: declare NETIF_F_HW_CSUM and do it in software")
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ixgbe: ensure IPsec VF<->PF compatibility

The VF driver can forward any IPsec flags and such makes the function
is not extendable and prone to backward/forward incompatibility.

If new software runs on VF, it won't know that PF configured something
completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag.

Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
Reviewed-by: Raed Salem <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Dave Chinner:

- define buffer bit flags as unsigned to fix gcc-5 + c11 warnings

- remove redundant XFS fields from MAINTAINERS

- fix inode buffer locking order regression

* tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: reorder iunlink remove operation in xfs_ifree
  MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM
  xfs: convert buffer flags to unsigned.

MAINTAINERS: Update BNXT entry with firmware files

There appears to be a maintainer gap for BNXT TEE firmware files which
causes some patches to be missed. Update the entry for the BNXT Ethernet
controller with its companion firmware files.

Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Michael Chan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'thermal-int340x'

Merge a fix for the attr.show callback prototype in the int340x thermal
driver (Kees Cook).

* thermal-int340x:
thermal: int340x: Fix attr.show callback prototype

netfilter: nft_socket: only do sk lookups when indev is available

Check if the incoming interface is available and NFT_BREAK
in case neither skb->sk nor input device are set.

Because nf_sk_lookup_slow*() assume packet headers are in the
'in' direction, use in postrouting is not going to yield a meaningful
result. Same is true for the forward chain, so restrict the use
to prerouting, input and output.

Use in output work if a socket is already attached to the skb.

Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching")
Reported-and-tested-by: Topi Miettinen <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge branch 'pm-cpuidle'

Merge cpuidle fixes for 5.18-rc5:

- Make intel_idle enable C1E promotion on all CPUs when C1E is
   preferred to C1 (Artem Bityutskiy).

- Make C6 optimization on Sapphire Rapids added recently work as
   expected if both C1E and C1 are "preferred" (Artem Bityutskiy).

* pm-cpuidle:
  intel_idle: Fix SPR C6 optimization
  intel_idle: Fix the 'preferred_cstates' module parameter

gfs2: No short reads or writes upon glock contention

Commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered
I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to
allow dropping the inode glock while faulting in user buffers. When the
lock was dropped, a short result was returned to indicate that the
operation was interrupted.

As pointed out by Linus (see the link below), this behavior is broken
and the operations should always re-acquire the inode glock and resume
the operation instead.

Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/
Fixes: 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O")
Signed-off-by: Andreas Gruenbacher <[email protected]>

net: make sure net_rx_action() calls skb_defer_free_flush()

I missed a stray return; in net_rx_action(), which very well
is taken whenever trigger_rx_softirq() has been called on
a cpu that is no longer receiving network packets,
or receiving too few of them.

Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'for-net-2022-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- Fix regression causing some HCI events to be discarded when they
   shouldn't.

* tag 'for-net-2022-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
  Bluetooth: hci_event: Fix creating hci_conn object on error status
  Bluetooth: hci_event: Fix checking for invalid handle on error status
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'add-reset-deassertion-for-aspeed-mdio'

Dylan Hung says:

====================
Add reset deassertion for Aspeed MDIO

Add missing reset deassertion for Aspeed MDIO bus controller. The reset
is asserted by the hardware when power-on so the driver only needs to
deassert it. To be able to work with the old DT blobs, the reset is
optional since it may be deasserted by the bootloader or the previous
kernel.

V6:
- fix merge conflict for net-next

V5:
- fix error of dt_binding_check

V4:
- use ASPEED_RESET_MII instead of hardcoding in dt-binding example

V3:
- remove reset property from the required list of the device tree
bindings
- remove "Cc: [email protected]" from the commit messages
- add more description in the commit message of the dt-binding

V2:
- add reset property in the device tree bindings
- add reset assertion in the error path and driver remove
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

ARM: dts: aspeed: add reset properties into MDIO nodes

Add reset control properties into MDIO nodes. The 4 MDIO controllers in
AST2600 SOC share one reset control bit SCU50[3].

Signed-off-by: Dylan Hung <[email protected]>
Reviewed-by: Andrew Jeffery <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: mdio: add reset control for Aspeed MDIO

Add reset assertion/deassertion for Aspeed MDIO. There are 4 MDIO
controllers embedded in Aspeed AST2600 SOC and share one reset control
register SCU50[3]. To work with old DT blobs which don't have the reset
property, devm_reset_control_get_optional_shared is used in this change.

Signed-off-by: Dylan Hung <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Andrew Jeffery <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

dt-bindings: net: add reset property for aspeed, ast2600-mdio binding

The AST2600 MDIO bus controller has a reset control bit and must be
deasserted before manipulating the MDIO controller. By default, the
hardware asserts the reset so the driver only need to deassert it.

Regarding to the old DT blobs which don't have reset property in them,
the reset deassertion is usually done by the bootloader so the reset
property is optional to work with them.

Signed-off-by: Dylan Hung <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: Andrew Jeffery <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: fec: add missing of_node_put() in fec_enet_init_stop_mode()

Put device node in error path in fec_enet_init_stop_mode().

Fixes: 8a448bf832af ("net: ethernet: fec: move GPR register offset and bit into DT")
Signed-off-by: Yang Yingliang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bnx2x: fix napi API usage sequence

While handling PCI errors (AER flow) driver tries to
disable NAPI [napi_disable()] after NAPI is deleted
[__netif_napi_del()] which causes unexpected system
hang/crash.

System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports:
'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports:
'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor: 168e14e4 [ 3222.583557] EEH: PCI cmd/status register: 00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20: 00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor: 168e14e4 [ 3222.584491] EEH: PCI cmd/status register: 00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20: 00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset
--> driver unload

Uninterruptible tasks
=====================
crash> ps | grep UN
     213      2  11  c000000004c89e00  UN   0.0       0      0  [eehd]
     215      2   0  c000000004c80000  UN   0.0       0      0
[kworker/0:2]
    2196      1  28  c000000004504f00  UN   0.1   15936  11136  wickedd
    4287      1   9  c00000020d076800  UN   0.0    4032   3008  agetty
    4289      1  20  c00000020d056680  UN   0.0    7232   3840  agetty
   32423      2  26  c00000020038c580  UN   0.0       0      0
[kworker/26:3]
   32871   4241  27  c0000002609ddd00  UN   0.1   18624  11648  sshd
   32920  10130  16  c00000027284a100  UN   0.1   48512  12608  sendmail
   33092  32987   0  c000000205218b00  UN   0.1   48512  12608  sendmail
   33154   4567  16  c000000260e51780  UN   0.1   48832  12864  pickup
   33209   4241  36  c000000270cb6500  UN   0.1   18624  11712  sshd
   33473  33283   0  c000000205211480  UN   0.1   48512  12672  sendmail
   33531   4241  37  c00000023c902780  UN   0.1   18624  11648  sshd

EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213    TASK: c000000004c89e00  CPU: 11  COMMAND: "eehd"
  #0 [c000000004d477e0] __schedule at c000000000c70808
  #1 [c000000004d478b0] schedule at c000000000c70ee0
  #2 [c000000004d478e0] schedule_timeout at c000000000c76dec
  #3 [c000000004d479c0] msleep at c0000000002120cc
  #4 [c000000004d479f0] napi_disable at c000000000a06448
                                        ^^^^^^^^^^^^^^^^
  #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
  #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
  #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
  #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
  #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64

And the sleeping source code
============================
crash> dis -ls c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702

   6697  {
   6698          might_sleep();
   6699          set_bit(NAPI_STATE_DISABLE, &n->state);
   6700
   6701          while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702                  msleep(1);
   6703          while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
   6704                  msleep(1);
   6705
   6706          hrtimer_cancel(&n->timer);
   6707
   6708          clear_bit(NAPI_STATE_DISABLE, &n->state);
   6709  }

EEH calls into bnx2x twice based on the system log above, first through
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
the following call chains:

bnx2x_io_error_detected()
  +-> bnx2x_eeh_nic_unload()
       +-> bnx2x_del_all_napi()
            +-> __netif_napi_del()

bnx2x_io_slot_reset()
  +-> bnx2x_netif_stop()
       +-> bnx2x_napi_disable()
            +->napi_disable()

Fix this by correcting the sequence of NAPI APIs usage,
that is delete the NAPI after disabling it.

Fixes: 7fa6f34081f1 ("bnx2x: AER revised")
Reported-by: David Christensen <[email protected]>
Tested-by: David Christensen <[email protected]>
Signed-off-by: Manish Chopra <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2022-04-27

We've added 85 non-merge commits during the last 18 day(s) which contain
a total of 163 files changed, 4499 insertions(+), 1521 deletions(-).

The main changes are:

1) Teach libbpf to enhance BPF verifier log with human-readable and relevant
   information about failed CO-RE relocations, from Andrii Nakryiko.

2) Add typed pointer support in BPF maps and enable it for unreferenced pointers
   (via probe read) and referenced ones that can be passed to in-kernel helpers,
   from Kumar Kartikeya Dwivedi.

3) Improve xsk to break NAPI loop when rx queue gets full to allow for forward
   progress to consume descriptors, from Maciej Fijalkowski & Björn Töpel.

4) Fix a small RCU read-side race in BPF_PROG_RUN routines which dereferenced
   the effective prog array before the rcu_read_lock, from Stanislav Fomichev.

5) Implement BPF atomic operations for RV64 JIT, and add libbpf parsing logic
   for USDT arguments under riscv{32,64}, from Pu Lehui.

6) Implement libbpf parsing of USDT arguments under aarch64, from Alan Maguire.

7) Enable bpftool build for musl and remove nftw with FTW_ACTIONRETVAL usage
   so it can be shipped under Alpine which is musl-based, from Dominique Martinet.

8) Clean up {sk,task,inode} local storage trace RCU handling as they do not
   need to use call_rcu_tasks_trace() barrier, from KP Singh.

9) Improve libbpf API documentation and fix error return handling of various
   API functions, from Grant Seltzer.

10) Enlarge offset check for bpf_skb_{load,store}_bytes() helpers given data
    length of frags + frag_list may surpass old offset limit, from Liu Jian.

11) Various improvements to prog_tests in area of logging, test execution
    and by-name subtest selection, from Mykola Lysenko.

12) Simplify map_btf_id generation for all map types by moving this process
    to build time with help of resolve_btfids infra, from Menglong Dong.

13) Fix a libbpf bug in probing when falling back to legacy bpf_probe_read*()
    helpers; the probing caused always to use old helpers, from Runqing Yang.

14) Add support for ARCompact and ARCv2 platforms for libbpf's PT_REGS
    tracing macros, from Vladimir Isaev.

15) Cleanup BPF selftests to remove old & unneeded rlimit code given kernel
    switched to memcg-based memory accouting a while ago, from Yafang Shao.

16) Refactor of BPF sysctl handlers to move them to BPF core, from Yan Zhu.

17) Fix BPF selftests in two occasions to work around regressions caused by latest
    LLVM to unblock CI until their fixes are worked out, from Yonghong Song.

18) Misc cleanups all over the place, from various others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
  selftests/bpf: Add libbpf's log fixup logic selftests
  libbpf: Fix up verifier log for unguarded failed CO-RE relos
  libbpf: Simplify bpf_core_parse_spec() signature
  libbpf: Refactor CO-RE relo human description formatting routine
  libbpf: Record subprog-resolved CO-RE relocations unconditionally
  selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests
  libbpf: Avoid joining .BTF.ext data with BPF programs by section name
  libbpf: Fix logic for finding matching program for CO-RE relocation
  libbpf: Drop unhelpful "program too large" guess
  libbpf: Fix anonymous type check in CO-RE logic
  bpf: Compute map_btf_id during build time
  selftests/bpf: Add test for strict BTF type check
  selftests/bpf: Add verifier tests for kptr
  selftests/bpf: Add C tests for kptr
  libbpf: Add kptr type tag macros to bpf_helpers.h
  bpf: Make BTF type match stricter for release arguments
  bpf: Teach verifier about kptr_get kfunc helpers
  bpf: Wire up freeing of referenced kptr
  bpf: Populate pairs of btf_id and destructor kfunc in btf
  bpf: Adapt copy_map_value for multiple offset case
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: ksz9477: move get_stats64 to ksz_common.c

The mib counters for the ksz9477 is same for the ksz9477 switch and
LAN937x switch. Hence moving it to ksz_common.c file in order to have it
generic function. The DSA hook get_stats64 now can call ksz_get_stats64.

Signed-off-by: Arun Ramadoss <[email protected]>
Reviewed-by: Oleksij Rempel <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tls: Skip tls_append_frag on zero copy size

Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to be on the
page boundary. Normally tls_append_frag coalesces the zero-sized
fragment to the previous one, but not if it's on page boundary.

If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.

This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2022-04-27

We've added 5 non-merge commits during the last 20 day(s) which contain
a total of 6 files changed, 34 insertions(+), 12 deletions(-).

The main changes are:

1) Fix xsk sockets when rx and tx are separately bound to the same umem, also
   fix xsk copy mode combined with busy poll, from Maciej Fijalkowski.

2) Fix BPF tunnel/collect_md helpers with bpf_xmit lwt hook usage which triggered
   a crash due to invalid metadata_dst access, from Eyal Birger.

3) Fix release of page pool in XDP live packet mode, from Toke Høiland-Jørgensen.

4) Fix potential NULL pointer dereference in kretprobes, from Adam Zabrocki.

   (Masami & Steven preferred this small fix to be routed via bpf tree given it's
    follow-up fix to Masami's rethook work that went via bpf earlier, too.)

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  xsk: Fix possible crash when multiple sockets are created
  kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set
  bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook
  bpf: Fix release of page_pool in BPF_PROG_RUN in test runner
  xsk: Fix l2fwd for copy mode + busy poll combo
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
"Two patches.

  Subsystems affected by this patch series: mm/kasan and mm/debug"

* emailed patches from Andrew Morton <[email protected]>:
  docs: vm/page_owner: use literal blocks for param description
  kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time

docs: vm/page_owner: use literal blocks for param description

Sphinx generates hard-to-read lists of parameters at the bottom of the
page. Fix them by putting literal-block markers of "::" in front of
them.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Akira Yokosawa <[email protected]>
Fixes: 57f2b54a9379 ("Documentation/vm/page_owner.rst: update the documentation")
Cc: Shenghong Han <[email protected]>
Cc: Haowen Bai <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Alex Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time

kasan_quarantine_remove_cache() is called in kmem_cache_shrink()/
destroy(). The kasan_quarantine_remove_cache() call is protected by
cpuslock in kmem_cache_destroy() to ensure serialization with
kasan_cpu_offline().

However the kasan_quarantine_remove_cache() call is not protected by
cpuslock in kmem_cache_shrink(). When a CPU is going offline and cache
shrink occurs at same time, the cpu_quarantine may be corrupted by
interrupt (per_cpu_remove_cache operation).

So add a cpu_quarantine offline flags check in per_cpu_remove_cache().

[[email protected]: add comment, per Zqiang]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zqiang <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

intel_idle: Fix SPR C6 optimization

The Sapphire Rapids (SPR) C6 optimization was added to the end of the
'spr_idle_state_table_update()' function. However, the function has a
'return' which may happen before the optimization has a chance to run.
And this may prevent the optimization from happening.

This is an unlikely scenario, but possible if user boots with, say,
the 'intel_idle.preferred_cstates=6' kernel boot option.

This patch fixes the issue by eliminating the problematic 'return'
statement.

Fixes: 3a9cf77b60dc ("intel_idle: add core C6 optimization for SPR")
Suggested-by: Jan Beulich <[email protected]>
Reported-by: Jan Beulich <[email protected]>
Signed-off-by: Artem Bityutskiy <[email protected]>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

intel_idle: Fix the 'preferred_cstates' module parameter

Problem description.

When user boots kernel up with the 'intel_idle.preferred_cstates=4' option,
we enable C1E and disable C1 states on Sapphire Rapids Xeon (SPR). In order
for C1E to work on SPR, we have to enable the C1E promotion bit on all
CPUs. However, we enable it only on one CPU.

Fix description.

The 'intel_idle' driver already has the infrastructure for disabling C1E
promotion on every CPU. This patch uses the same infrastructure for
enabling C1E promotion on every CPU. It changes the boolean
'disable_promotion_to_c1e' variable to a tri-state 'c1e_promotion'
variable.

Tested on a 2-socket SPR system. I verified the following combinations:

* C1E promotion enabled and disabled in BIOS.
* Booted with and without the 'intel_idle.preferred_cstates=4' kernel
argument.

In all 4 cases C1E promotion was correctly set on all CPUs.

Also tested on an old Broadwell system, just to make sure it does not cause
a regression. C1E promotion was correctly disabled on that system, both C1
and C1E were exposed (as expected).

Fixes: da0e58c038e6 ("intel_idle: add 'preferred_cstates' module argument")
Reported-by: Jan Beulich <[email protected]>
Signed-off-by: Artem Bityutskiy <[email protected]>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge tag 'cpufreq-arm-fixes-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull ARM cpufreq fixes for 5.18-rc5 from Viresh Kumar:

"- Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov and
   Vladimir Zapolskiy).
- Fix memory leak with the Sun501 driver (Xiaobing Luo)."

* tag 'cpufreq-arm-fixes-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
  cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
  cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
  cpufreq: qcom-hw: provide online/offline operations
  cpufreq: qcom-hw: fix the opp entries refcounting
  cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
  cpufreq: qcom-hw: drop affinity hint before freeing the IRQ

hex2bin: fix access beyond string end

If we pass too short string to "hex2bin" (and the string size without
the terminating NUL character is even), "hex2bin" reads one byte after
the terminating NUL character. This patch fixes it.

Note that hex_to_bin returns -1 on error and hex2bin return -EINVAL on
error - so we can't just return the variable "hi" or "lo" on error.
This inconsistency may be fixed in the next merge window, but for the
purpose of fixing this bug, we just preserve the existing behavior and
return -1 and -EINVAL.

Signed-off-by: Mikulas Patocka <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Fixes: b78049831ffe ("lib: add error checking to hex2bin")
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

hex2bin: make the function hex_to_bin constant-time

The function hex2bin is used to load cryptographic keys into device
mapper targets dm-crypt and dm-integrity. It should take constant time
independent on the processed data, so that concurrently running
unprivileged code can't infer any information about the keys via
microarchitectural convert channels.

This patch changes the function hex_to_bin so that it contains no
branches and no memory accesses.

Note that this shouldn't cause performance degradation because the size
of the new function is the same as the size of the old function (on
x86-64) - and the new function causes no branch misprediction penalties.

I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64
i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32
sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64
powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are
no branches in the generated code.

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'zonefs-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:
"Two fixes for rc5:

   - Fix inode initialization to make sure that the inode flags are all
     cleared.

   - Use zone reset operation instead of close to make sure that the
     zone of an empty sequential file in never in an active state after
     closing the file"

* tag 'zonefs-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Fix management of open zones
  zonefs: Clear inode information flags on inode creation

Merge tag 'mtd/fixes-for-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD fixes from Miquel Raynal:
"Core fix:

   - Fix a possible data corruption of the 'part' field in mtd_info

  Rawnand fixes:

   - Fix the check on the return value of wait_for_completion_timeout

   - Fix wrong ECC parameters for mt7622

   - Fix a possible memory corruption that might panic in the Qcom
     driver"

* tag 'mtd/fixes-for-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: qcom: fix memory corruption that causes panic
  mtd: fix 'part' field data corruption in mtd_info
  mtd: rawnand: Fix return value check of wait_for_completion_timeout
  mtd: rawnand: fix ecc parameters for mt7622

Add Eric Dumazet to networking maintainers

Welcome Eric!

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

floppy: disable FDRAWCMD by default

Minh Yuan reported a concurrency use-after-free issue in the floppy code
between raw_cmd_ioctl and seek_interrupt.

[ It turns out this has been around, and that others have reported the
  KASAN splats over the years, but Minh Yuan had a reproducer for it and
  so gets primary credit for reporting it for this fix   - Linus ]

The problem is, this driver tends to break very easily and nowadays,
nobody is expected to use FDRAWCMD anyway since it was used to
manipulate non-standard formats.  The risk of breaking the driver is
higher than the risk presented by this race, and accessing the device
requires privileges anyway.

Let's just add a config option to completely disable this ioctl and
leave it disabled by default.  Distros shouldn't use it, and only those
running on antique hardware might need to enable it.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/lkml/CAKcFiNC=MfYVW-Jt9A3=FPJpTwCD2PL_ULNCpsCVE5s8ZeBQgQ@mail.gmail.com
Link: https://lore.kernel.org/all/CAEAjamu1FRhz6StCe_55XY5s389ZP_xmCF69k987En+1z53=eg@mail.gmail.com
Reported-by: Minh Yuan <[email protected]>
Reported-by: [email protected]
Reported-by: cruise k <[email protected]>
Reported-by: Kyungtae Kim <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Tested-by: Denis Efremov <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

platform/x86/intel: pmc/core: change pmc_lpm_modes to static

Sparse reports this issue
core.c: note: in included file:
core.h:239:12: warning: symbol 'pmc_lpm_modes' was not declared. Should it be static?

Global variables should not be defined in headers. This only works
because core.h is only included by core.c. Single file use
variables should be static, so change its storage-class specifier
to static.

Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: David E. Box <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>

platform/x86/intel/sdsi: Fix bug in multi packet reads

Fix bug that added an offset to the mailbox addr during multi-packet
reads. Did not affect current ABI since it doesn't support multi-packet
transactions.

Fixes: 2546c6000430 ("platform/x86: Add Intel Software Defined Silicon driver")
Signed-off-by: David E. Box <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>

platform/x86/intel/sdsi: Poll on ready bit for writes

Due to change in firmware flow, update mailbox writes to poll on ready bit
instead of run_busy bit. This change makes the polling method consistent
for both writes and reads, which also uses the ready bit.

Fixes: 2546c6000430 ("platform/x86: Add Intel Software Defined Silicon driver")
Signed-off-by: David E. Box <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>

platform/x86/intel/sdsi: Handle leaky bucket

To prevent an agent from indefinitely holding the mailbox firmware has
implemented a leaky bucket algorithm. Repeated access to the mailbox may
now incur a delay of up to 2.1 seconds. Add a retry loop that tries for
up to 2.5 seconds to acquire the mailbox.

Fixes: 2546c6000430 ("platform/x86: Add Intel Software Defined Silicon driver")
Signed-off-by: David E. Box <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>

platform/x86: intel-uncore-freq: Prevent driver loading in guests

Loading this driver in guests results in unchecked MSR access error for
MSR 0x620.

There is no use of reading and modifying package/die scope uncore MSRs
in guests. So check for CPU feature X86_FEATURE_HYPERVISOR to prevent
loading of this driver in guests.

Fixes: dbce412a7733 ("platform/x86/intel-uncore-freq: Split common and enumeration part")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215870
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Srinivas Pandruvada <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>

platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard

This works on my system.

Signed-off-by: Darryn Anton Jordan <[email protected]>
Acked-by: Thomas Weißschuh <[email protected]>
Link: https://lore.kernel.org/r/Ylguq87YG+9L3foV@hark
Signed-off-by: Hans de Goede <[email protected]>

platform/x86: dell-laptop: Add quirk entry for Latitude 7520

The Latitude 7520 supports AC timeouts, but it has no KBD_LED_AC_TOKEN
and so changes to stop_timeout appear to have no effect if the laptop
is plugged in.

Signed-off-by: Gabriele Mazzotta <[email protected]>
Acked-by: Pali Rohár <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>

platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails

Before this commit fan_curve_check_present() was trying to not cause
the probe to fail on devices without fan curve control by testing for
known error codes returned by asus_wmi_evaluate_method_buf().

Checking for ENODATA or ENODEV, with the latter being returned by this
function when an ACPI integer with a value of ASUS_WMI_UNSUPPORTED_METHOD
is returned. But for other ACPI integer returns this function just returns
them as is, including the ASUS_WMI_DSTS_UNKNOWN_BIT value of 2.

On the Asus U36SD ASUS_WMI_DSTS_UNKNOWN_BIT gets returned, leading to:

asus-nb-wmi: probe of asus-nb-wmi failed with error 2

Instead of playing whack a mole with error codes here, simply treat all
errors as there not being any fan curves, fixing the driver no longer
loading on the Asus U36SD laptop.

Fixes: e3d13da7f77d ("platform/x86: asus-wmi: Fix regression when probing for fan curve control")
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2079125
Cc: Luke D. Jones <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf()

This code tests for if the obj->buffer.length is larger than the buffer
but then it just does the memcpy() anyway.

Fixes: 0f0ac158d28f ("platform/x86: asus-wmi: Add support for custom fan curves")
Signed-off-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/20220413073744.GB8812@kili
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>

netfilter: conntrack: fix udp offload timeout sysctl

`nf_flowtable_udp_timeout` sysctl option is available only
if CONFIG_NFT_FLOW_OFFLOAD enabled. But infra for this flow
offload UDP timeout was added under CONFIG_NF_FLOW_TABLE
config option. So, if you have CONFIG_NFT_FLOW_OFFLOAD
disabled and CONFIG_NF_FLOW_TABLE enabled, the
`nf_flowtable_udp_timeout` is not present in sysfs.
Please note, that TCP flow offload timeout sysctl option
is present even CONFIG_NFT_FLOW_OFFLOAD is disabled.

I suppose it was a typo in commit that adds UDP flow offload
timeout and CONFIG_NF_FLOW_TABLE should be used instead.

Fixes: 975c57504da1 ("netfilter: conntrack: Introduce udp offload timeout configuration")
Signed-off-by: Volodymyr Mytnyk <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_conntrack_tcp: re-init for syn packets only

Jaco Kroon reported tcp problems that Eric Dumazet and Neal Cardwell
pinpointed to nf_conntrack tcp_in_window() bug.

tcp trace shows following sequence:

I > R Flags [S], seq 3451342529, win 62580, options [.. tfo [|tcp]>
R > I Flags [S.], seq 2699962254, ack 3451342530, win 65535, options [..]
R > I Flags [P.], seq 1:89, ack 1, [..]

Note 3rd ACK is from responder to initiator so following branch is taken:
    } else if (((state->state == TCP_CONNTRACK_SYN_SENT
               && dir == IP_CT_DIR_ORIGINAL)
               || (state->state == TCP_CONNTRACK_SYN_RECV
               && dir == IP_CT_DIR_REPLY))
               && after(end, sender->td_end)) {

... because state == TCP_CONNTRACK_SYN_RECV and dir is REPLY.
This causes the scaling factor to be reset to 0: window scale option
is only present in syn(ack) packets.  This in turn makes nf_conntrack
mark valid packets as out-of-window.

This was always broken, it exists even in original commit where
window tracking was added to ip_conntrack (nf_conntrack predecessor)
in 2.6.9-rc1 kernel.

Restrict to 'tcph->syn', just like the 3rd condtional added in
commit 82b72cb94666 ("netfilter: conntrack: re-init state for retransmitted syn-ack").

Upon closer look, those conditionals/branches can be merged:

Because earlier checks prevent syn-ack from showing up in
original direction, the 'dir' checks in the conditional quoted above are
redundant, remove them. Return early for pure syn retransmitted in reply
direction (simultaneous open).

Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Reported-by: Jaco Kroon <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Acked-by: Jozsef Kadlecsik <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge branch 'remove-virt_to_bus-drivers'

Jakub Kicinski says:

====================
net: remove non-Ethernet drivers using virt_to_bus()

Networking is currently the main offender in using virt_to_bus().
Frankly all the drivers which use it are super old and unlikely
to be used today. They are just an ongoing maintenance burden.

In other words this series is using virt_to_bus() as an excuse
to shed some old stuff. Having done the tree-wide dev_addr_set()
conversion recently I have limited sympathy for carrying dead
code.

Obviously please scream if any of these drivers _is_ in fact
still being used. Otherwise let's take the chance, we can always
apologize and revert if users show up later.

Also I should say thanks to everyone who contributed to this code!
The work continues to be appreciated although realistically in more
of a "history book" fashion...
====================

Signed-off-by: David S. Miller <[email protected]>

net: hamradio: remove support for DMA SCC devices

Another incarnation of Z8530, looks like? Again, no real changes
in the git history, and it needs VIRT_TO_BUS. Unlikely to have
users, let's spend less time refactoring dead code...

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: wan: remove support for Z85230-based devices

Looks like all the changes to this driver had been automated
churn since git era begun. The driver is using virt_to_bus(),
it's just a maintenance burden unlikely to have any users.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: wan: remove support for COSA and SRP synchronous serial boards

Looks like all the changes to this driver had been automated
churn since git era begun. The driver is using virt_to_bus()
so it should be updated to a proper DMA API or removed. Given
the latest "news" entry on the website is from 1999 I'm opting
for the latter.

I'm marking the allocated char device major number as [REMOVED],
I reckon we can't reuse it in case some SW out there assumes its
COSA?

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: atm: remove support for ZeitNet ZN122x ATM devices

This driver received nothing but automated fixes in the last 15 years.
Since it's using virt_to_bus it's unlikely to be used on any modern
platform.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: atm: remove support for Madge Horizon ATM devices

This driver received nothing but automated fixes since git era begun.
Since it's using virt_to_bus it's unlikely to be used on any modern
platform.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: atm: remove support for Fujitsu FireStream ATM devices

This driver received nothing but automated fixes (mostly spelling
and compiler warnings) since git era begun. Since it's using
virt_to_bus it's unlikely to be used on any modern platform.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'lan966x-ptp-programmable-pins'

Horatiu Vultur says:

====================
net: lan966x: Add support for PTP programmable pins

Lan966x has 8 PTP programmable pins. The last pin is hardcoded to be used
by PHC0 and all the rest are shareable between the PHCs. The PTP pins can
implement both extts and perout functions.

v1->v2:
- use ptp_find_pin_unlocked instead of ptp_find_pin inside the irq handler.
====================

Signed-off-by: David S. Miller <[email protected]>

net: lan966x: Add support for PTP_PF_EXTTS

Extend the PTP programmable pins to implement also PTP_PF_EXTTS
function. The PTP pin can be configured to capture only on the rising
edge of the PPS signal. And once an event is seen then an interrupt is
generated and the local time counter is saved.
The interrupt is shared between all the pins.

Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lan966x: Add support for PTP_PF_PEROUT

Lan966x has 8 PTP programmable pins, where the last pins is hardcoded to
be used by PHC0, which does the frame timestamping. All the rest of the
PTP pins can be shared between the PHCs and can have different functions
like perout or extts. For now add support for PTP_FS_PEROUT.
The HW is not able to support absolute start time but can use the nsec
for phase adjustment when generating PPS.

Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lan966x: Add registers used to configure the PTP pin

Add registers that are used to configure the PTP pins. These registers
are used to enable the interrupts per PTP pin and to set the waveform
generated by the pin.

Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lan966x: Change the PTP pin used to read/write the PHC.

To read/write a value to a PHC, it is required to use a PTP pin.
Currently it is used pin 5, but change to pin 7 as is the last pin.
All the other pins will have different functions.

Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: lan966x: Extend with the ptp external interrupt.

Extend dt-bindings for lan966x with ptp external interrupt. This is
generated when an external 1pps signal is received on the ptp pin.

Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net
-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-04-26

This series contains updates to ice driver only.

Ivan Vecera removes races related to VF message processing by changing
mutex_trylock() call to mutex_lock() and moving additional operations
to occur under mutex.

Petr Oros increases wait time after firmware flash as current time is
not sufficient.

Jake resolves a use-after-free issue for mailbox snapshot.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mptcp-MP_FAIL-timeout'

Mat Martineau says:

====================
mptcp: Timeout for MP_FAIL response

When one peer sends an infinite mapping to coordinate fallback from
MPTCP to regular TCP, the other peer is expected to send a packet with
the MPTCP MP_FAIL option to acknowledge the infinite mapping. Rather
than leave the connection in some half-fallback state, this series adds
a timeout after which the infinite mapping sender will reset the
connection.

Patch 1 adds a fallback self test.

Patches 2-5 make use of the MPTCP socket's retransmit timer to reset the
MPTCP connection if no MP_FAIL was received.

Patches 6 and 7 extends the self test to check MP_FAIL-related MIBs.
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: mptcp: print extra msg in chk_csum_nr

When the multiple checksum errors occur in chk_csum_nr(), print the
numbers of the errors as an extra message.

Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mptcp: check MP_FAIL response mibs

This patch extends chk_fail_nr to check the MP_FAIL response mibs.

Add a new argument invert for chk_fail_nr to allow it can check the
MP_FAIL TX and RX mibs from the opposite direction.

When the infinite map is received before the MP_FAIL response, the
response will be lost. A '-' can be added into fail_tx or fail_rx to
represent that MP_FAIL response TX or RX can be lost when doing the
checks.

Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: reset subflow when MP_FAIL doesn't respond

This patch adds a new msk->flags bit MPTCP_FAIL_NO_RESPONSE, then reuses
sk_timer to trigger a check if we have not received a response from the
peer after sending MP_FAIL. If the peer doesn't respond properly, reset
the subflow.

Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: add MP_FAIL response support

This patch adds a new struct member mp_fail_response_expect in struct
mptcp_subflow_context to support MP_FAIL response. In the single subflow
with checksum error and contiguous data special case, a MP_FAIL is sent
in response to another MP_FAIL.

Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: add data lock for sk timers

mptcp_data_lock() needs to be held when manipulating the msk
retransmit_timer or the sk sk_timer. This patch adds the data
lock for the both timers.

Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: use mptcp_stop_timer

Use the helper mptcp_stop_timer() instead of using sk_stop_timer() to
stop icsk_retransmit_timer directly.

Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mptcp: add infinite map testcase

Add the single subflow test case for MP_FAIL, to test the infinite
mapping case. Use the test_linkfail value to make 128KB test files.

Add a new function reset_with_fail(), in it use 'iptables' and 'tc
action pedit' rules to produce the bit flips to trigger the checksum
failures. Set validate_checksum to enable checksums for the MP_FAIL
tests without passing the '-C' argument. Set check_invert flag to
enable the invert bytes check for the output data in check_transfer().
Instead of the file mismatch error, this test prints out the inverted
bytes.

Add a new function pedit_action_pkts() to get the numbers of the packets
edited by the tc pedit actions. Print this numbers to the output.

Also add the needed kernel configures in the selftests config file.

Suggested-by: Davide Caratti <[email protected]>
Co-developed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK

Commit 4b5923249b8fa4 ("net: dsa: lantiq_gswip: Configure all remaining
GSWIP_MII_CFG bits") added all known bits in the GSWIP_MII_CFGp
register. It helped bring this register into a well-defined state so the
driver has to rely less on the bootloader to do things right.
Unfortunately it also sets the GSWIP_MII_CFG_RMII_CLK bit without any
possibility to configure it. Upon further testing it turns out that all
boards which are supported by the GSWIP driver in OpenWrt which use an
RMII PHY have a dedicated oscillator on the board which provides the
50MHz RMII reference clock.

Don't set the GSWIP_MII_CFG_RMII_CLK bit (but keep the code which always
clears it) to fix support for the Fritz!Box 7362 SL in OpenWrt. This is
a board with two Atheros AR8030 RMII PHYs. With the "RMII clock" bit set
the MAC also generates the RMII reference clock whose signal then
conflicts with the signal from the oscillator on the board. This results
in a constant cycle of the PHY detecting link up/down (and as a result
of that: the two ports using the AR8030 PHYs are not working).

At the time of writing this patch there's no known board where the MAC
(GSWIP) has to generate the RMII reference clock. If needed this can be
implemented in future by providing a device-tree flag so the
GSWIP_MII_CFG_RMII_CLK bit can be toggled per port.

Fixes: 4b5923249b8fa4 ("net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits")
Tested-by: Jan Hoffmann <[email protected]>
Signed-off-by: Martin Blumenstingl <[email protected]>
Acked-by: Hauke Mehrtens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: Use this_cpu_inc() to increment net->core_stats

The macro dev_core_stats_##FIELD##_inc() disables preemption and invokes
netdev_core_stats_alloc() to return a per-CPU pointer.
netdev_core_stats_alloc() will allocate memory on its first invocation
which breaks on PREEMPT_RT because it requires non-atomic context for
memory allocation.

This can be avoided by enabling preemption in netdev_core_stats_alloc()
assuming the caller always disables preemption.

It might be better to replace local_inc() with this_cpu_inc() now that
dev_core_stats_##FIELD##_inc() gained a preempt-disable section and does
not rely on already disabled preemption. This results in less
instructions on x86-64:
local_inc:
|          incl %gs:__preempt_count(%rip)  # __preempt_count
|          movq    488(%rdi), %rax # _1->core_stats, _22
|          testq   %rax, %rax      # _22
|          je      .L585   #,
|          add %gs:this_cpu_off(%rip), %rax        # this_cpu_off, tcp_ptr__
|  .L586:
|          testq   %rax, %rax      # _27
|          je      .L587   #,
|          incq (%rax)            # _6->a.counter
|  .L587:
|          decl %gs:__preempt_count(%rip)  # __preempt_count

this_cpu_inc(), this patch:
|         movq    488(%rdi), %rax # _1->core_stats, _5
|         testq   %rax, %rax      # _5
|         je      .L591   #,
| .L585:
|         incq %gs:(%rax) # _18->rx_dropped

Use unsigned long as type for the counter. Use this_cpu_inc() to
increment the counter. Use a plain read of the counter.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: stmmac: dwmac-imx: comment spelling fix

Fix spelling in comment.

Fixes: 94abdad6974a ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip")
Signed-off-by: Marcel Ziswiler <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: remove comments that mention obsolete __SLOW_DOWN_IO

The only remaining definitions of __SLOW_DOWN_IO (for alpha and ia64) do
nothing, and the only mentions in networking are in comments. Remove these
mentions.

Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: wan: atp: remove unused eeprom_delay()

atp.h is included only by atp.c, which does not use eeprom_delay(). Remove
the unused definition.

Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: tls: fix async vs NIC crypto offload

When NIC takes care of crypto (or the record has already
been decrypted) we forget to update darg->async. ->async
is supposed to mean whether record is async capable on
input and whether record has been queued for async crypto
on output.

Reported-by: Gal Pressman <[email protected]>
Fixes: 3547a1f9d988 ("tls: rx: use async as an in-out argument")
Tested-by: Gal Pressman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: mt753x: fix pcs conversion regression

Daniel Golle reports that the conversion of mt753x to phylink PCS caused
an oops as below.

The problem is with the placement of the PCS initialisation, which
occurs after mt7531_setup() has been called. However, burited in this
function is a call to setup the CPU port, which requires the PCS
structure to be already setup.

Fix this by changing the initialisation order.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
Mem abort info:
  ESR = 0x96000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005
  CM = 0, WnR = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=0000000046057000
[0000000000000020] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] SMP
Modules linked in:
CPU: 0 PID: 32 Comm: kworker/u4:1 Tainted: G S 5.18.0-rc3-next-20220422+ #0
Hardware name: Bananapi BPI-R64 (DT)
Workqueue: events_unbound deferred_probe_work_func
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mt7531_cpu_port_config+0xcc/0x1b0
lr : mt7531_cpu_port_config+0xc0/0x1b0
sp : ffffffc008d5b980
x29: ffffffc008d5b990 x28: ffffff80060562c8 x27: 00000000f805633b
x26: ffffff80001a8880 x25: 00000000000009c4 x24: 0000000000000016
x23: ffffff8005eb6470 x22: 0000000000003600 x21: ffffff8006948080
x20: 0000000000000000 x19: 0000000000000006 x18: 0000000000000000
x17: 0000000000000001 x16: 0000000000000001 x15: 02963607fcee069e
x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
x11: ffffffc037302000 x10: 0000000000000870 x9 : ffffffc008d5b800
x8 : ffffff800028f950 x7 : 0000000000000001 x6 : 00000000662b3000
x5 : 00000000000002f0 x4 : 0000000000000000 x3 : ffffff800028f080
x2 : 0000000000000000 x1 : ffffff800028f080 x0 : 0000000000000000
Call trace:
mt7531_cpu_port_config+0xcc/0x1b0
mt753x_cpu_port_enable+0x24/0x1f0
mt7531_setup+0x49c/0x5c0
mt753x_setup+0x20/0x31c
dsa_register_switch+0x8bc/0x1020
mt7530_probe+0x118/0x200
mdio_probe+0x30/0x64
really_probe.part.0+0x98/0x280
__driver_probe_device+0x94/0x140
driver_probe_device+0x40/0x114
__device_attach_driver+0xb0/0x10c
bus_for_each_drv+0x64/0xa0
__device_attach+0xa8/0x16c
device_initial_probe+0x10/0x20
bus_probe_device+0x94/0x9c
deferred_probe_work_func+0x80/0xb4
process_one_work+0x200/0x3a0
worker_thread+0x260/0x4c0
kthread+0xd4/0xe0
ret_from_fork+0x10/0x20
Code: 9409e911 937b7e60 8b0002a0 f9405800 (f9401005)
---[ end trace 0000000000000000 ]---

Reported-by: Daniel Golle <[email protected]>
Tested-by: Daniel Golle <[email protected]>
Fixes: cbd1f243bc41 ("net: dsa: mt7530: partially convert to phylink_pcs")
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: generalize skb freeing deferral to per-cpu lists

Logic added in commit f35f821935d8 ("tcp: defer skb freeing after socket
lock is released") helped bulk TCP flows to move the cost of skbs
frees outside of critical section where socket lock was held.

But for RPC traffic, or hosts with RFS enabled, the solution is far from
being ideal.

For RPC traffic, recvmsg() has to return to user space right after
skb payload has been consumed, meaning that BH handler has no chance
to pick the skb before recvmsg() thread. This issue is more visible
with BIG TCP, as more RPC fit one skb.

For RFS, even if BH handler picks the skbs, they are still picked
from the cpu on which user thread is running.

Ideally, it is better to free the skbs (and associated page frags)
on the cpu that originally allocated them.

This patch removes the per socket anchor (sk->defer_list) and
instead uses a per-cpu list, which will hold more skbs per round.

This new per-cpu list is drained at the end of net_action_rx(),
after incoming packets have been processed, to lower latencies.

In normal conditions, skbs are added to the per-cpu list with
no further action. In the (unlikely) cases where the cpu does not
run net_action_rx() handler fast enough, we use an IPI to raise
NET_RX_SOFTIRQ on the remote cpu.

Also, we do not bother draining the per-cpu list from dev_cpu_dead()
This is because skbs in this list have no requirement on how fast
they should be freed.

Note that we can add in the future a small per-cpu cache
if we see any contention on sd->defer_lock.

Tested on a pair of hosts with 100Gbit NIC, RFS enabled,
and /proc/sys/net/ipv4/tcp_rmem[2] tuned to 16MB to work around
page recycling strategy used by NIC driver (its page pool capacity
being too small compared to number of skbs/pages held in sockets
receive queues)

Note that this tuning was only done to demonstrate worse
conditions for skb freeing for this particular test.
These conditions can happen in more general production workload.

10 runs of one TCP_STREAM flow

Before:
Average throughput: 49685 Mbit.

Kernel profiles on cpu running user thread recvmsg() show high cost for
skb freeing related functions (*)

    57.81%  [kernel]       [k] copy_user_enhanced_fast_string
(*) 12.87%  [kernel]       [k] skb_release_data
(*)  4.25%  [kernel]       [k] __free_one_page
(*)  3.57%  [kernel]       [k] __list_del_entry_valid
     1.85%  [kernel]       [k] __netif_receive_skb_core
     1.60%  [kernel]       [k] __skb_datagram_iter
(*)  1.59%  [kernel]       [k] free_unref_page_commit
(*)  1.16%  [kernel]       [k] __slab_free
     1.16%  [kernel]       [k] _copy_to_iter
(*)  1.01%  [kernel]       [k] kfree
(*)  0.88%  [kernel]       [k] free_unref_page
     0.57%  [kernel]       [k] ip6_rcv_core
     0.55%  [kernel]       [k] ip6t_do_table
     0.54%  [kernel]       [k] flush_smp_call_function_queue
(*)  0.54%  [kernel]       [k] free_pcppages_bulk
     0.51%  [kernel]       [k] llist_reverse_order
     0.38%  [kernel]       [k] process_backlog
(*)  0.38%  [kernel]       [k] free_pcp_prepare
     0.37%  [kernel]       [k] tcp_recvmsg_locked
(*)  0.37%  [kernel]       [k] __list_add_valid
     0.34%  [kernel]       [k] sock_rfree
     0.34%  [kernel]       [k] _raw_spin_lock_irq
(*)  0.33%  [kernel]       [k] __page_cache_release
     0.33%  [kernel]       [k] tcp_v6_rcv
(*)  0.33%  [kernel]       [k] __put_page
(*)  0.29%  [kernel]       [k] __mod_zone_page_state
     0.27%  [kernel]       [k] _raw_spin_lock

After patch:
Average throughput: 73076 Mbit.

Kernel profiles on cpu running user thread recvmsg() looks better:

    81.35%  [kernel]       [k] copy_user_enhanced_fast_string
     1.95%  [kernel]       [k] _copy_to_iter
     1.95%  [kernel]       [k] __skb_datagram_iter
     1.27%  [kernel]       [k] __netif_receive_skb_core
     1.03%  [kernel]       [k] ip6t_do_table
     0.60%  [kernel]       [k] sock_rfree
     0.50%  [kernel]       [k] tcp_v6_rcv
     0.47%  [kernel]       [k] ip6_rcv_core
     0.45%  [kernel]       [k] read_tsc
     0.44%  [kernel]       [k] _raw_spin_lock_irqsave
     0.37%  [kernel]       [k] _raw_spin_lock
     0.37%  [kernel]       [k] native_irq_return_iret
     0.33%  [kernel]       [k] __inet6_lookup_established
     0.31%  [kernel]       [k] ip6_protocol_deliver_rcu
     0.29%  [kernel]       [k] tcp_rcv_established
     0.29%  [kernel]       [k] llist_reverse_order

v2: kdoc issue (kernel bots)
    do not defer if (alloc_cpu == smp_processor_id()) (Paolo)
    replace the sk_buff_head with a single-linked list (Jakub)
    add a READ_ONCE()/WRITE_ONCE() for the lockless read of sd->defer_list

Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'pinctrl-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

- Fix some register offsets on Intel Alderlake

- Fix the order the UFS and SDC pins on Qualcomm SM6350

- Fix a build error in Mediatek Moore.

- Fix a pin function table in the Sunplus SP7021.

- Fix some Kconfig and static keywords on the Samsung Tesla FSD SoC.

- Fix up the EOI function for edge triggered IRQs and keep the block
   clock enabled for level IRQs in the STM32 driver.

- Fix some bits and order in the Rockchip RK3308 driver.

- Handle the errorpath in the Pistachio driver probe() properly.

* tag 'pinctrl-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: pistachio: fix use of irq_of_parse_and_map()
  pinctrl: stm32: Keep pinctrl block clock enabled when LEVEL IRQ requested
  pinctrl: rockchip: sort the rk3308_mux_recalced_data entries
  pinctrl: rockchip: fix RK3308 pinmux bits
  pinctrl: stm32: Do not call stm32_gpio_get() for edge triggered IRQs in EOI
  pinctrl: Fix an error in pin-function table of SP7021
  pinctrl: samsung: fix missing GPIOLIB on ARM64 Exynos config
  pinctrl: mediatek: moore: Fix build error
  pinctrl: qcom: sm6350: fix order of UFS & SDC pins
  pinctrl: alderlake: Fix register offsets for ADL-N variant
  pinctrl: samsung: staticize fsd_pin_ctrl

Merge branch 'Teach libbpf to "fix up" BPF verifier log'

Andrii Nakryiko says:

====================

This patch set teaches libbpf to enhance BPF verifier log with human-readable
and relevant information about failed CO-RE relocation. Patch #9 is the main
one with the new logic. See relevant commit messages for some more details.

All the other patches are either fixing various bugs detected
while working on this feature, most prominently a bug with libbpf not handling
CO-RE relocations for SEC("?...") programs, or are refactoring libbpf
internals to allow for easier reuse of CO-RE relo lookup and formatting logic.
====================

Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: Add libbpf's log fixup logic selftests

Add tests validating that libbpf is indeed patching up BPF verifier log
with CO-RE relocation details. Also test partial and full truncation
scenarios.

This test might be a bit fragile due to changing BPF verifier log
format. If that proves to be frequently breaking, we can simplify tests
or remove the truncation subtests. But for now it seems useful to test
it in those conditions that are otherwise rarely occuring in practice.

Also test CO-RE relo failure in a subprog as that excercises subprogram CO-RE
relocation mapping logic which doesn't work out of the box without extra
relo storage previously done only for gen_loader case.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Fix up verifier log for unguarded failed CO-RE relos

Teach libbpf to post-process BPF verifier log on BPF program load
failure and detect known error patterns to provide user with more
context.

Currently there is one such common situation: an "unguarded" failed BPF
CO-RE relocation. While failing CO-RE relocation is expected, it is
expected to be property guarded in BPF code such that BPF verifier
always eliminates BPF instructions corresponding to such failed CO-RE
relos as dead code. In cases when user failed to take such precautions,
BPF verifier provides the best log it can:

  123: (85) call unknown#195896080
  invalid func unknown#195896080

Such incomprehensible log error is due to libbpf "poisoning" BPF
instruction that corresponds to failed CO-RE relocation by replacing it
with invalid `call 0xbad2310` instruction (195896080 == 0xbad2310 reads
"bad relo" if you squint hard enough).

Luckily, libbpf has all the necessary information to look up CO-RE
relocation that failed and provide more human-readable description of
what's going on:

  5: <invalid CO-RE relocation>
  failed to resolve CO-RE relocation <byte_off> [6] struct task_struct___bad.fake_field_subprog (0:2 @ offset 8)

This hopefully makes it much easier to understand what's wrong with
user's BPF program without googling magic constants.

This BPF verifier log fixup is setup to be extensible and is going to be
used for at least one other upcoming feature of libbpf in follow up patches.
Libbpf is parsing lines of BPF verifier log starting from the very end.
Currently it processes up to 10 lines of code looking for familiar
patterns. This avoids wasting lots of CPU processing huge verifier logs
(especially for log_level=2 verbosity level). Actual verification error
should normally be found in last few lines, so this should work
reliably.

If libbpf needs to expand log beyond available log_buf_size, it
truncates the end of the verifier log. Given verifier log normally ends
with something like:

  processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

... truncating this on program load error isn't too bad (end user can
always increase log size, if it needs to get complete log).

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Simplify bpf_core_parse_spec() signature

Simplify bpf_core_parse_spec() signature to take struct bpf_core_relo as
an input instead of requiring callers to decompose them into type_id,
relo, spec_str, etc. This makes using and reusing this helper easier.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Refactor CO-RE relo human description formatting routine

Refactor how CO-RE relocation is formatted. Now it dumps human-readable
representation, currently used by libbpf in either debug or error
message output during CO-RE relocation resolution process, into provided
buffer. This approach allows for better reuse of this functionality
outside of CO-RE relocation resolution, which we'll use in next patch
for providing better error message for BPF verifier rejecting BPF
program due to unguarded failed CO-RE relocation.

It also gets rid of annoying "stitching" of libbpf_print() calls, which
was the only place where we did this.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Record subprog-resolved CO-RE relocations unconditionally

Previously, libbpf recorded CO-RE relocations with insns_idx resolved
according to finalized subprog locations (which are appended at the end
of entry BPF program) to simplify the job of light skeleton generator.

This is necessary because once subprogs' instructions are appended to
main entry BPF program all the subprog instruction indices are shifted
and that shift is different for each entry (main) BPF program, so it's
generally impossible to map final absolute insn_idx of the finalized BPF
program to their original locations inside subprograms.

This information is now going to be used not only during light skeleton
generation, but also to map absolute instruction index to subprog's
instruction and its corresponding CO-RE relocation. So start recording
these relocations always, not just when obj->gen_loader is set.

This information is going to be freed at the end of bpf_object__load()
step, as before (but this can change in the future if there will be
a need for this information post load step).

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests

Enhance linked_funcs selftest with two tricky features that might not
obviously work correctly together. We add CO-RE relocations to entry BPF
programs and mark those programs as non-autoloadable with SEC("?...")
annotation. This makes sure that libbpf itself handles .BTF.ext CO-RE
relocation data matching correctly for SEC("?...") programs, as well as
ensures that BPF static linker handles this correctly (this was the case
before, no changes are necessary, but it wasn't explicitly tested).

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Avoid joining .BTF.ext data with BPF programs by section name

Instead of using ELF section names as a joining key between .BTF.ext and
corresponding BPF programs, pre-build .BTF.ext section number to ELF
section index mapping during bpf_object__open() and use it later for
matching .BTF.ext information (func/line info or CO-RE relocations) to
their respective BPF programs and subprograms.

This simplifies corresponding joining logic and let's libbpf do
manipulations with BPF program's ELF sections like dropping leading '?'
character for non-autoloaded programs. Original joining logic in
bpf_object__relocate_core() (see relevant comment that's now removed)
was never elegant, so it's a good improvement regardless. But it also
avoids unnecessary internal assumptions about preserving original ELF
section name as BPF program's section name (which was broken when
SEC("?abc") support was added).

Fixes: a3820c481112 ("libbpf: Support opting out from autoloading BPF programs declaratively")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Fix logic for finding matching program for CO-RE relocation

Fix the bug in bpf_object__relocate_core() which can lead to finding
invalid matching BPF program when processing CO-RE relocation. IF
matching program is not found, last encountered program will be assumed
to be correct program and thus error detection won't detect the problem.

Fixes: 9c82a63cf370 ("libbpf: Fix CO-RE relocs against .text section")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Drop unhelpful "program too large" guess

libbpf pretends it knows actual limit of BPF program instructions based
on UAPI headers it compiled with. There is neither any guarantee that
UAPI headers match host kernel, nor BPF verifier actually uses
BPF_MAXINSNS constant anymore. Just drop unhelpful "guess", BPF verifier
will emit actual reason for failure in its logs anyways.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]