Git Repo - linux.git/log

hinic: Replace memcpy() with direct assignment

Under CONFIG_FORTIFY_SOURCE=y and CONFIG_UBSAN_BOUNDS=y, Clang is bugged
here for calculating the size of the destination buffer (0x10 instead of
0x14). This copy is a fixed size (sizeof(struct fw_section_info_st)), with
the source and dest being struct fw_section_info_st, so the memcpy should
be safe, assuming the index is within bounds, which is UBSAN_BOUNDS's
responsibility to figure out.

Avoid the whole thing and just do a direct assignment. This results in
no change to the executable code.

Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: Vladimir Oltean <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://github.com/ClangBuiltLinux/linux/issues/1592
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Tested-by: Nathan Chancellor <[email protected]> # build
Signed-off-by: David S. Miller <[email protected]>

net: ag71xx: fix discards 'const' qualifier warning

The current kernel compiles this driver with warnings. This patch fixes
them.

drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_fast_reset':
drivers/net/ethernet/atheros/ag71xx.c:996:31: warning: passing argument 2 of 'ag71xx_hw_set_macaddr' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
  996 |  ag71xx_hw_set_macaddr(ag, dev->dev_addr);
      |                            ~~~^~~~~~~~~~
drivers/net/ethernet/atheros/ag71xx.c:951:69: note: expected 'unsigned char *' but argument is of type 'const unsigned char *'
  951 | static void ag71xx_hw_set_macaddr(struct ag71xx *ag, unsigned char *mac)
      |                                                      ~~~~~~~~~~~~~~~^~~
drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_open':
drivers/net/ethernet/atheros/ag71xx.c:1441:32: warning: passing argument 2 of 'ag71xx_hw_set_macaddr' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
 1441 |  ag71xx_hw_set_macaddr(ag, ndev->dev_addr);
      |                            ~~~~^~~~~~~~~~
drivers/net/ethernet/atheros/ag71xx.c:951:69: note: expected 'unsigned char *' but argument is of type 'const unsigned char *'
  951 | static void ag71xx_hw_set_macaddr(struct ag71xx *ag, unsigned char *mac)
      |                                                      ~~~~~~~~~~~~~~~^~~

Fixes: adeef3e32146 ("net: constify netdev->dev_addr")
Signed-off-by: Oleksij Rempel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: fix build...

Remove accidental dup of tcp_wmem_schedule.

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'pcs-xpcs-stmmac-add-1000BASE-X-AN-for-network-switch'

Ong Boon Leong says:

====================
pcs-xpcs, stmmac: add 1000BASE-X AN for network switch

Thanks for the v4 review feedback in [1] and [2]. I have changed the v5
implementation as follows.

v5 changes:
1/5 - No change from v4.
2/5 - No change from v4.
3/5 - [Fix] make xpcs_modify_changed() static and use
      mdiodev_modify_changed() for cleaner code as suggested by
      Russell King.
4/5 - [Fix] Use fwnode_get_phy_mode() as recommended by Andrew Lunn.
5/5 - [Fix] Make fwnode = of_fwnode_handle(priv->plat->phylink_node)
      order after priv = netdev_priv(dev).

v4 changes:
1/5 - Squash v3:1/7 & 2/7 patches into v4:1/6 so that it passes build.
2/5 - [No change] same as v3:3/7
3/5 - [Fix] Fix issues identified by Russell in [1]
4/5 - [Fix] Drop v3:5/7 patch per input by Russell in [2] and make
            dwmac-intel clear the ovr_an_inband flag if fixed-link
            is used in ACPI _DSD.
5/5 - [No change] same as v3:7/7

The steps to set up ACPI _DSD and to check it are the same as in [3].

Reference:
[1] https://patchwork.kernel.org/comment/24894239/
[2] https://patchwork.kernel.org/comment/24895330/
[3] https://patchwork.kernel.org/project/netdevbpf/cover/20220610033610 [email protected]/
====================

Signed-off-by: David S. Miller <[email protected]>

net: stmmac: make mdio register skips PHY scanning for fixed-link

stmmac_mdio_register() lacks fixed-link consideration and only skips PHY
scanning if it has done DT-style PHY discovery. So, for a DT or ACPI _DSD
setting of fixed-link, the PHY scanning should not happen.

v2: fix incorrect order related to fwnode that was not caught on non-DT
platforms.

Tested-by: Emilio Riva <[email protected]>
Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

stmmac: intel: add phy-mode and fixed-link ACPI _DSD setting support

Currently, phy_interface for a TSN controller instance is set based on its
PCI Device ID. For the SGMII PHY interface, phy_interface defaults to
PHY_INTERFACE_MODE_SGMII. As C37 AN supports both SGMII and 1000BASE-X
modes, we add support for the 'phy-mode' ACPI _DSD for port-specific
and customer-platform-specific customization.

v3: use fwnode_get_phy_mode() as suggested by Andrew Lunn in
https://patchwork.kernel.org/comment/24895330/

v2:
For platforms that set 'fixed-link' using ACPI _DSD, we will unset
xpcs_an_inband within stmmac. Thanks to Russell King for his comment in
https://patchwork.kernel.org/comment/24890222/

v1:
Thanks to Andrew Lunn's guidance in
https://patchwork.kernel.org/comment/24827101/

Tested-by: Emilio Riva <[email protected]>
Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pcs: xpcs: add CL37 1000BASE-X AN support

For CL37 1000BASE-X AN, DW xPCS does not support C22 method but offers
C45 vendor-specific MII MMD for programming.

We also add the ability to disable Autoneg (through ethtool) for certain
network switches that support 1000BASE-X (1000Mbps and Full-Duplex) but
lack Autoneg capability.

v4: Fixes to comment from Russell King. Thanks!
    https://patchwork.kernel.org/comment/24894239/
    Make xpcs_modify_changed() as private, change to use
    mdiodev_modify_changed() for cleaner code.

v3: Fixes to issues spotted by Russell King. Thanks!
    https://patchwork.kernel.org/comment/24890210/
    Use phylink_mii_c22_pcs_decode_state(), remove unnecessary
    interrupt clearing and skip speed & duplex setting if AN
    is enabled.

v2: Fixes to issues spotted by Russell King in v1. Thanks!
    https://patchwork.kernel.org/comment/24826650/
    Use phylink_mii_c22_pcs_encode_advertisement() and implement
    C45 MII ADV handling since the IP only supports C45 access.

Tested-by: Emilio Riva <[email protected]>
Signed-off-by: Ong Boon Leong <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

stmmac: intel: prepare to support 1000BASE-X phy interface setting

Currently, intel_speed_mode_2500() redundantly fixes up phy_interface to
PHY_INTERFACE_MODE_SGMII if the underlying controller is in 1000Mbps
SGMII mode. The value of phy_interface has been initialized earlier.

This patch removes that redundancy to prepare for setting 1000BASE-X
mode for certain hardware platform configurations.

Also update the intel_mgbe_common_data() to include 1000BASE-X setup.

Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: make xpcs_do_config to accept advertising for pcs-xpcs and sja1105

xpcs_config() has an 'advertising' input that is required for C37 1000BASE-X
AN in a later patch of this series. So, we prepare xpcs_do_config() for it.

For sja1105, xpcs_do_config() is used for xpcs configuration without
depending on the advertising input, so it is set to NULL.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlxsw-L3-HW-stats-improvements'

Ido Schimmel says:

====================
mlxsw: L3 HW stats improvements

While testing L3 HW stats [1] on top of mlxsw, two issues were found:

1. Stats cannot be enabled for more than 205 netdevs. This was fixed in
commit 4b7a632ac4e7 ("mlxsw: spectrum_cnt: Reorder counter pools").

2. ARP packets are counted as errors. Patch #1 takes care of that. See
the commit message for details.

The goal of the majority of the rest of the patches is to add selftests
that would have discovered that only about 205 netdevs can have L3 HW
stats supported, despite the HW supporting much more. The obvious place
to plug this in is the scale test framework.

The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures
are noted and handled gracefully.

However the ability to allocate the resource does not mean that the
resource actually works when passing traffic. For that, make it possible
for a given scale to also test traffic.

To that end, this patchset adds traffic tests. The goal of these is to
run traffic and observe whether a sample of the allocated resource
instances actually perform their task. Traffic tests are only run on the
positive leg of the scale test (no point trying to pass traffic when the
expected outcome is that the resource will not be allocated). They are
opt-in: if a given test does not expose one, it is not run.

The patchset proceeds as follows:

- Patches #2 and #3 add to "devlink resource" support for number of
  allocated RIFs, and the capacity. This is necessary, because when
  evaluating how many L3 HW stats instances it should be possible to
  allocate, the limiting resource on Spectrum-2 and above currently is
  not the counters themselves, but actually the RIFs.

- Patch #6 adds support for invocation of a traffic test, if a given scale
  test exposes one.

- Patch #7 adds support for skipping a given scale test. Because on
  Spectrum-2 and above, the limiting factor to L3 HW stats instances is
  actually the number of RIFs, there is no point in running the failing leg
  of a scale test, because it would test exhaustion of RIFs, not of RIF
  counters.

- With patch #8, the scale test drivers pass the target number to the
  cleanup function of a scale test.

- In patch #9, add a traffic test to the tc_flower selftests. This makes
  sure that the flow counters installed with the ACLs actually do count as
  they are supposed to.

- In patch #10, add a new scale selftest for RIF counter scale, including a
  traffic test.

- In patch #11, the scale target for the tc_flower selftest is
  dynamically set instead of being hard coded.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca0a53dcec9495d1dc5bbc369c810c520d728373
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: spectrum-2: tc_flower_scale: Dynamically set scale target

Instead of hard coding the scale target in the test, dynamically set it
based on the maximum number of flow counters and their current
occupancy.

Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mlxsw: Add a RIF counter scale test

This test creates as many RIFs as possible, ideally more than there can be
RIF counters (though that is currently only possible on Spectrum-1). It
then tries to enable L3 HW stats on each of the RIFs. It also contains the
traffic test, which tries to run traffic through a log2 subset of those
counters and checks that the traffic is shown in the counter values.

As with the tc_flower traffic test, take a log2 subset of rules. The logic
behind picking log2 rules is that then every bit of the instantiated item's
number is exercised. This should catch issues whether they happen at the
high end, low end, or somewhere in between.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mlxsw: tc_flower_scale: Add a traffic test

Add a test that checks that the created filters do actually trigger on
matching traffic.

Exercising all the rules would be a very lengthy process. Instead, take a
log2 subset of rules. The logic behind picking log2 rules is that then
every bit of the instantiated item's number is exercised. This should catch
issues whether they happen at the high end, low end, or somewhere in
between.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mlxsw: resource_scale: Pass target count to cleanup

The scale tests verify the behavior of mlxsw when the number of instances of
some resource reaches the ASIC capacity. The number of instances is
referred to as the "target" number.

No scale test so far has needed to know this target number to clean up. E.g.
the tc_flower test simply removes the clsact qdisc that all the tested
filters are hooked onto, and that takes care of collecting all the filters.

However, for the RIF counter test, which is being added in a future patch,
VLAN netdevices are created. These are created as part of the test, but of
course the cleanup needs to undo them again. For that it needs to know how
many there were. To support this usage, pass the target number to the
cleanup callback.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mlxsw: resource_scale: Allow skipping a test

The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures are
noted and handled gracefully.

Sometimes the scale test depends on more than one resource. In particular,
a following patch will add a RIF counter scale test, which depends on the
number of RIF counters that can be bound, and also on the number of RIFs
that can be created.

When the test is limited by the auxiliary resource and not by the primary
one, there's no point trying to run the overflow test, because it would be
testing exhaustion of the wrong resource.

To support this use case, when the $test_get_target yields 0, skip the test
instead.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mlxsw: resource_scale: Introduce traffic tests

The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures are
noted and handled gracefully.

However the ability to allocate the resource does not mean that the
resource actually works when passing traffic. For that, make it possible
for a given scale to also test traffic.

The traffic test is only run on the positive leg of the scale test (no point
trying to pass traffic when the expected outcome is that the resource will
not be allocated). Traffic tests are opt-in: if a given test does not
expose one, it is not run.

To this end, delay the test cleanup until after the traffic test is run.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mlxsw: resource_scale: Update scale target after test setup

The scale of each resource is tested in the following manner:

1. The scale target is queried.
2. The test setup is prepared.
3. The test is invoked.

In some cases, the occupancy of a resource changes as part of the second
step, requiring the test to return a scale target that takes this change
into account.

Make this more robust by re-querying the scale target after the second
step.

Another possible solution is to swap the first and second steps, but
when a test needs to be skipped (i.e., scale target is zero), the setup
would have been in vain.

Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mirror_gre_bridge_1q_lag: Enslave port to bridge before other configurations

With the mlxsw driver, configurations are offloaded only if a physical
port is enslaved to the virtual device (e.g., to a bridge). In the
'mirror_gre_bridge_1q_lag' test, the bridge gets an address and route
before there are ports in the bridge. This means that these
configurations are not offloaded.

Until now the test passed with the mlxsw driver even though the RIF of the
bridge was not in the hardware, because the ARP packets were trapped in
layer 2 and also mirrored, so there was no real need for the RIF in
hardware. The previous patch changed the traps 'ARP_REQUEST' and
'ARP_RESPONSE' to be done at layer 3 instead of layer 2. With this change
the ARP packets are not trapped during the test, as the RIF is not in the
hardware because of the order of configurations.

Reorder the configurations so that they are offloaded; the test then
passes with the change of the traps.

Signed-off-by: Amit Cohen <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: Add a resource describing number of RIFs

The Spectrum ASIC has a limit on how many L3 devices (called RIFs) can be
created. The limit depends on the ASIC and FW revision, and mlxsw reads it
from the FW. In order to communicate both the number of RIFs that there can
be, and how many are taken now (i.e. occupancy), introduce a corresponding
devlink resource.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: Keep track of number of allocated RIFs

In order to expose the number of RIFs as a resource, it is going to be handy
to have the number of currently-allocated RIFs as a single number.
Introduce such a counter.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: Trap ARP packets at layer 3 instead of layer 2

Currently, the traps 'ARP_REQUEST' and 'ARP_RESPONSE' occur at layer 2.
To allow the packets to be flooded, they are configured with the action
'MIRROR_TO_CPU' which means that the CPU receives a replica of the packet.

Today, Spectrum ASICs also support trapping ARP packets at layer 3. This
behavior is better, because the packets can just be trapped and there is no
need to mirror them. An additional motivation is that with the traps at
layer 2, the ARP packets are dropped in the router as they do not have an
IP header, and are then counted as error packets, which might confuse
users.

Add the relevant traps for layer 3 and use them instead of the existing
traps. There is no visible change to user space.

Signed-off-by: Amit Cohen <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'tcp-mem-pressure-fixes'

Eric Dumazet says:

====================
tcp: final (?) round of mem pressure fixes

While working on prior patch series (e10b02ee5b6c "Merge branch
'net-reduce-tcp_memory_allocated-inflation'"), I found that we
could still have frozen TCP flows under memory pressure.

I thought we had solved this in 2015, but the fix was not complete.

v2: deal with zerocopy tx paths.
====================

Signed-off-by: David S. Miller <[email protected]>

tcp: fix possible freeze in tx path under memory pressure

Blamed commit only dealt with applications issuing small writes.

The issue here is that we allow forcing memory scheduling for the sk_buff
allocation, but we have no guarantee that sendmsg() is able to
copy some payload into it.

In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.

For example, if we consider tcp_wmem[0] = 4096 (default on x86),
and initial skb->truesize being 1280, tcp_sendmsg() is able to
copy up to 2816 bytes under memory pressure.

Before this patch a sendmsg() sending more than 2816 bytes
would either block forever (if persistent memory pressure),
or return -EAGAIN.

For bigger MTU networks, it is advised to increase tcp_wmem[0]
to avoid sending too small packets.

v2: deal with zero copy paths.

Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Wei Wang <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: fix over estimation in sk_forced_mem_schedule()

sk_forced_mem_schedule() has a bug similar to ones fixed
in commit 7c80b038d23e ("net: fix sk_wmem_schedule() and
sk_rmem_schedule() errors")

While this bug has little chance to trigger in old kernels,
we need to fix it before the following patch.

Fixes: d83769a580f1 ("tcp: fix possible deadlock in tcp_send_fin()")
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Reviewed-by: Wei Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-lan743x-pci11010-pci11414-devices-enhancements'

Raju Lakkaraju says:

====================
net: lan743x: PCI11010 / PCI11414 devices Enhancements

This patch series continues with the addition of supported features
for the Ethernet function of the PCI11010 / PCI11414 devices to
the LAN743x driver.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: add support to get Master-Slave configuration

Add support for getting the Master-Slave configuration and state.

Signed-off-by: Raju Lakkaraju <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: lan743x: Add support to SGMII 1G and 2.5G

Add SGMII access read and write functions.
Add support for SGMII 1G and 2.5G on PCI11010/PCI11414 chips.

Signed-off-by: Raju Lakkaraju <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: lan743x: Add support to Secure-ON WOL

Add support for Magic Packet Detection with Secure-ON on PCI11010/PCI11414 chips.

Signed-off-by: Raju Lakkaraju <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: lan743x: Add support to LAN743x register dump

Add support for the LAN743x common register dump.

Signed-off-by: Raju Lakkaraju <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-dsa-realtek-rtl8365mb-improve-handling-of-phy-modes'

Alvin Šipraga says:

====================
net: dsa: realtek: rtl8365mb: improve handling of PHY modes

This series introduces some minor cleanup of the driver and improves the
handling of PHY interface modes to break the assumption that CPU ports
are always over an external interface, and the assumption that user
ports are always using an internal PHY.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: realtek: rtl8365mb: handle PHY interface modes correctly

Realtek switches in the rtl8365mb family always have at least one port
with a so-called external interface, supporting PHY interface modes such
as RGMII or SGMII. The purpose of this patch is to improve the driver's
handling of these ports.

A new struct rtl8365mb_chip_info is introduced together with a static
array of such structs. An instance of this struct is added for each
supported switch, distinguished by its chip ID and version. Embedded in
each chip_info struct is an array of struct rtl8365mb_extint, describing
the external interfaces available. This is more specific than the old
rtl8365mb_extint_port_map, which was only valid for switches with up to
6 ports.

The struct rtl8365mb_extint also contains a bitmask of supported PHY
interface modes, which allows the driver to distinguish which ports
support RGMII. This corrects a previous mistake in the driver whereby it
was assumed that any port with an external interface supports RGMII.
This is not actually the case: for example, the RTL8367S has two
external interfaces, only the second of which supports RGMII. The first
supports only SGMII and HSGMII. This new design will make it easier to
add support for other interface modes.

Finally, rtl8365mb_phylink_get_caps() is fixed up to return supported
capabilities based on the external interface properties described above.
This addresses Vladimir's point in the linked thread that the
capabilities are not actually a function of the DSA port type: Although
most typical applications will treat the ports with internal PHY as user
ports, there is no actual hardware limitation preventing one from using
them as a CPU port. Equally, ports with external interface(s) may well
be treated as user ports, even though it is typical to use those ports
as CPU ports.

Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Alvin Šipraga <[email protected]>
Acked-by: Russell King (Oracle) <[email protected]>
Acked-by: Luiz Angelo Daros de Luca <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: realtek: rtl8365mb: remove learn_limit_max private data member

The variable is just assigned the value of a macro, so it can be
removed.

Signed-off-by: Alvin Šipraga <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: realtek: rtl8365mb: correct the max number of ports

The maximum number of ports is actually 11, according to two
observations:

1. The highest port ID used in the vendor driver is 10. Since port IDs
   are indexed from 0, and since DSA follows the same numbering system,
   this means up to 11 ports are to be presumed.

2. The registers with port mask fields always amount to a maximum port
   mask of 0x7FF, corresponding to a maximum 11 ports.

In view of this, I also deleted the comment.

Signed-off-by: Alvin Šipraga <[email protected]>
Reviewed-by: Luiz Angelo Daros de Luca <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: realtek: rtl8365mb: remove port_mask private data member

There is no real need for this variable: the line change interrupt mask
is sufficiently masked out when getting linkup_ind and linkdown_ind in
the interrupt handler.

Signed-off-by: Alvin Šipraga <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: realtek: rtl8365mb: rename macro RTL8367RB -> RTL8367RB_VB

The official name of this switch is RTL8367RB-VB, not RTL8367RB. There
is also an RTL8367RB-VC which is rather different. Change the name of
the CHIP_ID/_VER macros for reasons of consistency.

Signed-off-by: Alvin Šipraga <[email protected]>
Reviewed-by: Luiz Angelo Daros de Luca <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-ipa-more-multi-channel-event-ring-work'

Alex Elder says:

====================
net: ipa: more multi-channel event ring work

This series makes a little more progress toward supporting multiple
channels with a single event ring. The first removes the assumption
that consecutive events are associated with the same RX channel.

The second derives the channel associated with an event from the
event itself, and the next does a small cleanup enabled by that.

The fourth causes updates to occur for every event processed (rather
than once). And the final patch does a little more rework to make TX
completion have more in common with RX completion.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: move more code out of gsi_channel_update()

Move the processing done for TX channels in gsi_channel_update()
into gsi_evt_ring_rx_update(). That function is now called for
both RX and TX channels, so rename it to gsi_evt_ring_update().
As a result, this code no longer assumes events in an event ring are
associated with just one channel.

Because all events in a ring are handled in that function, we can
move the call to gsi_trans_move_complete() there, and can ring the
event ring doorbell there as well after all new events in the ring
have been processed.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: call gsi_evt_ring_rx_update() unconditionally

When an RX transaction completes, we update the trans->len field to
contain the actual number of bytes received. This is done in a loop
in gsi_evt_ring_rx_update().

Change that function so it checks the data transfer direction
recorded in the transaction, and only updates trans->len for RX
transfers.

Then call it unconditionally. This means events for TX endpoints
will run through the loop without otherwise doing anything, but
this will change shortly.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: pass GSI pointer to gsi_evt_ring_rx_update()

The only reason the event ring's channel pointer is needed in
gsi_evt_ring_rx_update() is so we can get at its GSI pointer.

We can pass the GSI pointer as an argument, along with the event
ring ID, and thereby avoid using the event ring channel pointer.
This is another step toward no longer assuming an event ring
services a single channel.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: don't pass channel when mapping transaction

Change gsi_channel_trans_map() so it derives the channel used from
the transaction. Pass the index of the *first* TRE used by the
transaction, and have the called function account for the fact that
the last one used is what's important.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: don't assume one channel per event ring

In gsi_evt_ring_rx_update(), use gsi_event_trans() repeatedly
to find the transaction associated with an event, rather than
assuming consecutive events are associated with the same channel.
This removes the only caller of gsi_trans_pool_next(), so get rid
of it.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'dt-bindings-dp83867-add-binding-for-io_impedance_ctrl-nvmem-cell'

Rasmus Villemoes says:

====================
dt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell

We have a board where measurements indicate that the current three
options - leaving IO_IMPEDANCE_CTRL at the reset value (which is
factory calibrated to a value corresponding to approximately 50 ohms)
or using one of the two boolean properties to set it to the min/max
value - are too coarse.

This series adds a device tree binding for an nvmem cell which can be
populated during production with a suitable value calibrated for each
board, and corresponding support in the driver. The second patch adds
a trivial phy wrapper for dev_err_probe(), used in the third.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: dp83867: implement support for io_impedance_ctrl nvmem cell

We have a board where measurements indicate that the current three
options - leaving IO_IMPEDANCE_CTRL at the (factory calibrated) reset
value or using one of the two boolean properties to set it to the
min/max value - are too coarse.

Implement support for the newly added binding allowing device tree to
specify an nvmem cell containing an appropriate value for this
specific board.

Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

linux/phy.h: add phydev_err_probe() wrapper for dev_err_probe()

The dev_err_probe() function is quite useful to avoid boilerplate
related to -EPROBE_DEFER handling. Add a phydev_err_probe() helper to
simplify making use of that from phy drivers which otherwise use the
phydev_* helpers.
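
The boilerplate being avoided can be pictured with a userspace mock.
This is only a sketch of the idea behind dev_err_probe(), assuming the
behaviour "log unless the probe is merely deferred, then return the
error"; err_probe() and errors_logged are hypothetical names, and the
real helper does more (for example recording the deferral reason).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define EPROBE_DEFER 517	/* kernel-internal errno, redefined for userspace */

static int errors_logged;	/* lets a caller see whether anything was printed */

/* Print an error unless the probe is only being deferred, and always
 * hand err back so callers can write "return err_probe(...)".
 */
static int err_probe(const char *dev_name, int err, const char *fmt, ...)
{
	va_list ap;

	assert(err < 0);	/* callers pass a negative errno */

	if (err != -EPROBE_DEFER) {
		fprintf(stderr, "%s: error %d: ", dev_name, err);
		va_start(ap, fmt);
		vfprintf(stderr, fmt, ap);
		va_end(ap);
		errors_logged++;
	}

	return err;
}
```

A caller can then collapse "check for deferral, else print, then
return" into a single return statement.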

Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell

We have a board where measurements indicate that the current three
options - leaving IO_IMPEDANCE_CTRL at the reset value (which is
factory calibrated to a value corresponding to approximately 50 ohms)
or using one of the two boolean properties to set it to the min/max
value - are too coarse.

There is no fixed mapping from register values to values in the range
35-70 ohms; it varies from chip to chip, and even that target range is
approximate. So add a DT binding for an nvmem cell which can be
populated during production with a value suitable for each specific
board.

Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Mostly driver fixes.

  Current release - regressions:

   - Revert "net: Add a second bind table hashed by port and address",
     needs more work

   - amd-xgbe: use platform_irq_count(), static setup of IRQ resources
     had been removed from DT core

   - dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation

  Current release - new code bugs:

   - hns3: modify the ring param print info

  Previous releases - always broken:

   - axienet: make the 64b addressable DMA depends on 64b architectures

   - iavf: fix issue with MAC address of VF shown as zero

   - ice: fix PTP TX timestamp offset calculation

   - usb: ax88179_178a needs FLAG_SEND_ZLP

  Misc:

   - document some net.sctp.* sysctls"

* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  net: axienet: add missing error return code in axienet_probe()
  Revert "net: Add a second bind table hashed by port and address"
  net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
  net: usb: ax88179_178a needs FLAG_SEND_ZLP
  MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
  ARM: dts: at91: ksz9477_evb: fix port/phy validation
  net: bgmac: Fix an erroneous kfree() in bgmac_remove()
  ice: Fix memory corruption in VF driver
  ice: Fix queue config fail handling
  ice: Sync VLAN filtering features for DVM
  ice: Fix PTP TX timestamp offset calculation
  mlxsw: spectrum_cnt: Reorder counter pools
  docs: networking: phy: Fix a typo
  amd-xgbe: Use platform_irq_count()
  octeontx2-vf: Add support for adaptive interrupt coalescing
  xilinx:  Fix build on x86.
  net: axienet: Use iowrite64 to write all 64b descriptor pointers
  net: axienet: make the 64b addresable DMA depends on 64b archectures
  net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
  net: hns3: fix PF rss size initialization bug
  ...

net: axienet: add missing error return code in axienet_probe()

axienet_probe() should return an error code in its error path.

Fixes: 00be43a74ca2 ("net: axienet: make the 64b addresable DMA depends on 64b archectures")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Revert "net: Add a second bind table hashed by port and address"

This reverts:

commit d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
commit 538aaf9b2383 ("selftests: Add test for timing a bind request to a port with a populated bhash entry")

Link: https://lore.kernel.org/netdev/[email protected]/

There are a few things that need to be fixed here:
* Updating bhash2 in cases where the socket's rcv saddr changes
* Adding bhash2 hashbucket locks

Links to syzbot reports:
https://lore.kernel.org/netdev/00000000000022208805e0df247a@google.com/
https://lore.kernel.org/netdev/0000000000003f33bc05dfaf44fe@google.com/

Fixes: d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Joanne Koong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-mana-add-pf-and-xdp_redirect-support'

Haiyang Zhang says:

====================
net: mana: Add PF and XDP_REDIRECT support

The patch set adds PF and XDP_REDIRECT support.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: mana: Add support of XDP_REDIRECT action

Add a handler for the XDP_REDIRECT return code from an XDP program. The
packets will be flushed at the end of each RX/CQ NAPI poll cycle.
ndo_xdp_xmit() is implemented by sharing the code in mana_xdp_tx().
Ethtool per queue counters are added for XDP redirect and xmit operations.

Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: mana: Add the Linux MANA PF driver

This minimal PF driver runs on bare metal.
Currently Ethernet TX/RX works. SR-IOV management is not supported yet.

Signed-off-by: Dexuan Cui <[email protected]>
Co-developed-by: Haiyang Zhang <[email protected]>
Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: ethernet: stmmac: reset force speed bit for ipq806x

Some bootloaders may set the force speed regs even if the actual
interface should use autonegotiation between PCS and PHY.
This causes the complete malfunction of the interface.

To fix this, reset the force speed regs if a fixed-link is not
defined in the DTS. With a fixed-link node, configure the forced
speed regs correctly to handle any misconfiguration by the bootloader.

Reported-by: Mark Mentovai <[email protected]>
Co-developed-by: Mark Mentovai <[email protected]>
Signed-off-by: Mark Mentovai <[email protected]>
Signed-off-by: Christian 'Ansuel' Marangi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: ethernet: stmmac: add missing sgmii configure for ipq806x

Each gmac requires a different configuration, depending on the SoC
and on the gmac id. Add the missing configurations, taken from the
original driver.

Signed-off-by: Christian 'Ansuel' Marangi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

mlxbf_gige: remove own module name define and use KBUILD_MODNAME instead

This patch adds use of KBUILD_MODNAME as defined by the build system,
replacing the definition and use of a custom-defined name.

Signed-off-by: David Thompson <[email protected]>
Signed-off-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

- Correctly handle vm_map areas in hardened usercopy (Matthew Wilcox)

- Adjust CFI RCU usage to avoid boot splats with cpuidle (Sami Tolvanen)

* tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  usercopy: Make usercopy resilient against ridiculously large copies
  usercopy: Cast pointer to an integer once
  usercopy: Handle vm_map_ram() areas
  cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle

Merge tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
"Two fixes for this merge window"

* tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build
  certs/blacklist_hashes.c: fix const confusion in certs blacklist

certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build

Commit addf466389d9 ("certs: Check that builtin blacklist hashes are
valid") was applied 8 months after the submission.

In the meantime, the base code had been removed by commit b8c96a6b466c
("certs: simplify $(srctree)/ handling and remove config_filename
macro").

Fix the Makefile.

Create a local copy of $(CONFIG_SYSTEM_BLACKLIST_HASH_LIST). It is
included from certs/blacklist_hashes.c and also works as a timestamp.

Send error messages from check-blacklist-hashes.awk to stderr instead
of stdout.

Fixes: addf466389d9 ("certs: Check that builtin blacklist hashes are valid")
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Reviewed-by: Mickaël Salaün <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

certs/blacklist_hashes.c: fix const confusion in certs blacklist

This file fails to compile as follows:

  CC      certs/blacklist_hashes.o
certs/blacklist_hashes.c:4:1: error: ignoring attribute ‘section (".init.data")’ because it conflicts with previous ‘section (".init.rodata")’ [-Werror=attributes]
    4 | const char __initdata *const blacklist_hashes[] = {
      | ^~~~~
In file included from certs/blacklist_hashes.c:2:
certs/blacklist.h:5:38: note: previous declaration here
    5 | extern const char __initconst *const blacklist_hashes[];
      |                                      ^~~~~~~~~~~~~~~~

Apply the same fix as commit 2be04df5668d ("certs/blacklist_nohashes.c:
fix const confusion in certs blacklist").

Fixes: 734114f8782f ("KEYS: Add a system blacklist keyring")
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Reviewed-by: Mickaël Salaün <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

Merge tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull vfs idmapping fix from Christian Brauner:
"This fixes an issue where we fail to change the group of a file when
  the caller owns the file and is a member of the group to change to.

  This is only relevant on idmapped mounts.

  There's a detailed description in the commit message and regression
  tests have been added to xfstests"

* tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: account for group membership

net: sparx5: Allow mdb entries to both CPU and ports

Allow mdb entries to be forwarded to the CPU and be switched at the same
time. Only remove an entry when neither any port nor the CPU is part of
the group anymore.
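
The removal rule can be sketched as follows. This is a minimal
illustration, not the driver's code: struct mdb_entry, its fields and
the helper names are all made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical group entry: front-port members plus a CPU flag */
struct mdb_entry {
	uint32_t port_mask;
	bool cpu_member;
};

/* An entry may be removed only when nobody is left in the group */
static bool mdb_entry_unused(const struct mdb_entry *e)
{
	return e->port_mask == 0 && !e->cpu_member;
}

/* Drop one front port; returns true if the entry can now be removed */
static bool mdb_del_port(struct mdb_entry *e, unsigned int port)
{
	assert(port < 32);
	e->port_mask &= ~(UINT32_C(1) << port);

	return mdb_entry_unused(e);
}

/* Drop the CPU; returns true if the entry can now be removed */
static bool mdb_del_cpu(struct mdb_entry *e)
{
	e->cpu_member = false;

	return mdb_entry_unused(e);
}
```

Removing the last front port keeps the entry alive while the CPU is
still a member, and vice versa.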

Signed-off-by: Casper Andersson <[email protected]>
Acked-by: Steen Hegelund <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg

The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
and block until it receives a packet from the remote. If the client
doesn't connect to the server and calls read() directly, it will never
receive any packets. As a result, a deadlock will happen.

The failure log caused by the deadlock is shown below:

[  369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
[  369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.613058] Call Trace:
[  369.613315]  <TASK>
[  369.614072]  __schedule+0x2f9/0xb20
[  369.615029]  schedule+0x49/0xb0
[  369.615734]  __lock_sock+0x92/0x100
[  369.616763]  ? destroy_sched_domains_rcu+0x20/0x20
[  369.617941]  lock_sock_nested+0x6e/0x70
[  369.618809]  ax25_bind+0xaa/0x210
[  369.619736]  __sys_bind+0xca/0xf0
[  369.620039]  ? do_futex+0xae/0x1b0
[  369.620387]  ? __x64_sys_futex+0x7c/0x1c0
[  369.620601]  ? fpregs_assert_state_consistent+0x19/0x40
[  369.620613]  __x64_sys_bind+0x11/0x20
[  369.621791]  do_syscall_64+0x3b/0x90
[  369.622423]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  369.623319] RIP: 0033:0x7f43c8aa8af7
[  369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[  369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7
[  369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005
[  369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700
[  369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe
[  369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700

This patch replaces skb_recv_datagram() with an open-coded variant that
releases the socket lock before the __skb_wait_for_more_packets() call
and re-acquires it afterwards, so that other functions that need the
socket lock can run.

What's more, the socket lock is released only when recvmsg() is about to
block, which should produce nicer overall behavior.
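
The locking pattern can be pictured with a single-threaded userspace
simulation. It is only a sketch: struct fake_sock and every fake_*()
helper are invented for the example, and "waiting" is modelled by a
function that records the lock state and then delivers one packet.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_sock {
	bool locked;			/* stands in for the socket lock */
	int queued;			/* packets in the receive queue */
	bool held_while_waiting;	/* set if we ever wait with the lock held */
};

static void fake_lock(struct fake_sock *sk)
{
	assert(!sk->locked);
	sk->locked = true;
}

static void fake_release(struct fake_sock *sk)
{
	assert(sk->locked);
	sk->locked = false;
}

/* Stand-in for blocking: note the lock state, then "deliver" a packet */
static void fake_wait_for_packet(struct fake_sock *sk)
{
	if (sk->locked)
		sk->held_while_waiting = true;
	sk->queued++;
}

/* Receive one packet; returns how many times we had to wait */
static int fake_recvmsg(struct fake_sock *sk)
{
	int waits = 0;

	fake_lock(sk);
	while (sk->queued == 0) {
		fake_release(sk);	/* let bind() and friends take the lock */
		fake_wait_for_packet(sk);
		fake_lock(sk);
		waits++;
	}
	sk->queued--;
	fake_release(sk);

	return waits;
}
```

The lock is dropped only on the path that would block; when a packet is
already queued, the receive completes without ever releasing it early.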

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Thomas Osterried <[email protected]>
Signed-off-by: Duoming Zhou <[email protected]>
Reported-by: Thomas Habets <thomas@habets.se>
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bcm63xx_enet: switch to napi_build_skb() to reuse skbuff_heads

napi_build_skb() reuses NAPI skbuff_head cache in order to save some
cycles on freeing/allocating skbuff_heads on every new Rx or completed
Tx.
Use napi_consume_skb() to feed the cache with skbuff_heads of completed
Tx so it's never empty.

Signed-off-by: Sieng Piaw Liew <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: don't check skb_count twice

NAPI cache skb_count is checked twice unconditionally. Make the second
check run only when the first one triggers.
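
The shape of the change can be sketched in userspace as below. This
assumes the first check guards a bulk refill; struct obj_cache,
cache_refill() and the other names are hypothetical stand-ins rather
than the networking core's own code.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 4u

struct obj_cache {
	unsigned int count;
	int *slots[CACHE_SIZE];
};

static int backing[CACHE_SIZE];
/* How many objects a refill can supply; set to 0 to simulate failure */
static unsigned int refill_available = CACHE_SIZE;

/* Hypothetical bulk refill: returns how many slots it filled */
static unsigned int cache_refill(struct obj_cache *c)
{
	unsigned int i;

	assert(refill_available <= CACHE_SIZE);
	for (i = 0; i < refill_available; i++)
		c->slots[i] = &backing[i];

	return refill_available;
}

static int *cache_get(struct obj_cache *c)
{
	if (c->count == 0) {
		c->count = cache_refill(c);
		/* The cache can only still be empty right after a refill,
		 * so the second check lives inside the first.
		 */
		if (c->count == 0)
			return NULL;
	}

	return c->slots[--c->count];
}
```

On the common path, where the cache is non-empty, only one comparison
is made.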

Signed-off-by: Sieng Piaw Liew <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: bridge: allow add/remove permanent mdb entries on disabled ports

Adding mdb entries on disabled ports allows you to do setup before
accepting any traffic, avoiding any time where the port is not in the
multicast group.

Signed-off-by: Casper Andersson <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: usb: ax88179_178a needs FLAG_SEND_ZLP

The extra byte inserted by usbnet.c when
(length % dev->maxpacket == 0) causes problems for the device.

This patch sets FLAG_SEND_ZLP to avoid this.

Tested with: 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet

Problems observed:
======================================================================
1) Using ssh/sshfs. The remote sshd daemon can abort with the message:
   "message authentication code incorrect"
   This happens because the tcp message sent is corrupted during the
   USB "Bulk out". The device calculates the tcp checksum and sends a
   valid tcp message to the remote sshd. Then the encryption detects
   the error and aborts.
2) NETDEV WATCHDOG: ... (ax88179_178a): transmit queue 0 timed out
3) Normal operation stops without any log message.
   The "Bulk in" continues receiving packets normally.
   The host sends "Bulk out" and the device responds with -ECONNRESET.
   (The usbnet.c code in tx_complete ignores -ECONNRESET)
Under normal conditions these errors take days to happen; under
intense usage they take hours.

A test with ping gives packet loss, showing that something is wrong:
ping -4 -s 462 {destination} # 462 = 512 - 42 - 8
Not all packets fail.
My guess is that the device tries to find another packet starting
at the extra byte and will fail or not depending on the next
bytes (old buffer content).
======================================================================
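
The arithmetic behind the ping size can be sketched as below. The
breakdown is an assumption made for illustration: 42 is read as
Ethernet (14) + IPv4 (20) + ICMP (8) headers, and the remaining 8 as a
per-packet TX header added by the driver. The constant and function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
	ETH_HDR = 14,	/* Ethernet header */
	IPV4_HDR = 20,	/* IPv4 header without options */
	ICMP_HDR = 8,	/* ICMP echo header */
	TX_HDR = 8,	/* assumed driver TX header, see the note above */
};

/* True when a transfer ends exactly on a USB packet boundary, which is
 * the case where either a pad byte or a zero-length packet is needed.
 */
static bool ends_on_packet_boundary(size_t len, size_t maxpacket)
{
	assert(maxpacket > 0);

	return len % maxpacket == 0;
}

/* Bulk-out length for a ping with the given payload size */
static size_t bulk_out_len(size_t ping_payload)
{
	return TX_HDR + ETH_HDR + IPV4_HDR + ICMP_HDR + ping_payload;
}
```

Under these assumptions a 462-byte payload yields a 512-byte transfer,
an exact multiple of a 512-byte maxpacket, which is the condition
quoted at the top of this message.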

Signed-off-by: Jose Alonso <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

i40e: add xdp frags support to ndo_xdp_xmit

Add the capability to map non-linear xdp frames in XDP_TX and ndo_xdp_xmit
callback.

Tested-by: Sarkar Tirthendu <[email protected]>
Signed-off-by: Lorenzo Bianconi <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: marvell-88x2222: set proper phydev->port

phydev->port was not set and always reported as PORT_TP.
Set phydev->port according to inserted SFP module.

Signed-off-by: Ivan Bornyakov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: xilinx: document xilinx emaclite driver binding

Add basic description for the xilinx emaclite driver DT bindings.

Signed-off-by: Radhey Shyam Pandey <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-06-14

This series contains updates to ice driver only.

Michal fixes incorrect Tx timestamp offset calculation for E822 devices.

Roman enforces required VLAN filtering settings for double VLAN mode.

Przemyslaw fixes memory corruption issues with VFs by ensuring
queues are disabled in the error path of VF queue configuration and
for VFs that are disabled during reset.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ipa-simplify-completion-stats'

Alex Elder says:

====================
net: ipa: simplify completion statistics

The first patch in this series makes the name used for variables
representing a TRE ring be consistent everywhere.  The second
renames two structure fields to better represent their purpose.

The last four rework a little code that manages some transaction and
byte transfer statistics maintained mainly for TX endpoints.  For
the most part this series is refactoring.  The last one also
includes the first step toward no longer assuming an event ring is
dedicated to a single channel.
====================

Signed-off-by: David S. Miller <[email protected]>

net: ipa: rework gsi_channel_tx_update()

Rename gsi_channel_tx_update() to be gsi_trans_tx_completed(), and
pass it just the transaction pointer, deriving the channel from the
transaction. Update the comments above the function to provide a
more concise description of how statistics for TX endpoints are
maintained and used.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipa: stop counting total RX bytes and transactions

In gsi_evt_ring_rx_update(), we update each transaction so its len
field reflects the actual number of bytes received. In the process,
the total number of transactions and bytes processed on the channel
are summed, and added to a running total for the channel.

But we don't actually use those running totals for RX endpoints.
They're maintained for TX channels to support CoDel when they are
associated with a "real" network device.

So stop maintaining these totals for RX endpoints, and update the
comment where the fields are defined to make it clear they're only
valid for TX channels.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipa: simplify TX completion statistics

When a TX request is issued, its channel's accumulated byte and
transaction counts are recorded.  This currently does *not* take
into account the transaction being committed.

Later, when the transaction completes, the number of bytes and
transactions that have completed since the transaction was committed
are reported to the network stack.  The transaction and its byte
count are accounted for at that time.

Instead, record the transaction and its bytes in the counts recorded
at commit time.  This avoids the need to do so when the transaction
completes, and provides a (small) simplification of that code.
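
A minimal sketch of the resulting bookkeeping follows. The structure
and function names (struct chan, struct tx_trans, tx_committed(),
tx_completed()) are simplified stand-ins chosen for the example, not
the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

struct chan {
	uint64_t byte_count;		/* bytes committed so far */
	uint64_t trans_count;		/* transactions committed so far */
	uint64_t compl_byte_count;	/* bytes already reported complete */
	uint64_t compl_trans_count;	/* transactions already reported */
};

struct tx_trans {
	uint32_t len;
	uint64_t byte_count;	/* channel totals snapshotted at commit */
	uint64_t trans_count;
};

/* At commit time, include this transaction in the recorded totals */
static void tx_committed(struct chan *ch, struct tx_trans *t)
{
	ch->byte_count += t->len;
	ch->trans_count++;
	t->byte_count = ch->byte_count;
	t->trans_count = ch->trans_count;
}

/* At completion, report everything completed since the last report.
 * Returns the byte delta and stores the transaction delta in *trans.
 */
static uint64_t tx_completed(struct chan *ch, const struct tx_trans *t,
			     uint64_t *trans)
{
	uint64_t bytes;

	assert(t->byte_count >= ch->compl_byte_count);
	assert(t->trans_count >= ch->compl_trans_count);

	bytes = t->byte_count - ch->compl_byte_count;
	*trans = t->trans_count - ch->compl_trans_count;
	ch->compl_byte_count += bytes;
	ch->compl_trans_count += *trans;

	return bytes;
}
```

Because the snapshot already includes the transaction itself, the
completion path is a plain subtraction with no extra adjustment.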

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipa: introduce gsi_trans_tx_committed()

Create a new function that encapsulates recording information needed
for TX channel statistics when a transaction is committed.

Record the accumulated length in the transaction before the call
(for both RX and TX), so it can be used when updating TX statistics.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipa: rename two transaction fields

There are two fields in a GSI transaction that keep track of TRE
counts.  The first represents the number of TREs reserved for the
transaction in the TRE ring; that's currently named "tre_count".
The second is the number of TREs that are actually *used* by the
transaction at the time it is committed.

Rename the "tre_count" field to be "rsvd_count", to make its meaning
a little more specific.  The "_count" is present in the name mainly
to avoid interpreting it as a reserved (not-to-be-used) field.  This
name also distinguishes it from the "tre_count" field associated
with a channel.

Rename the "used" field to be "used_count", to match the convention
used for reserved TREs.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipa: use "tre_ring" for all TRE ring local variables

All local variables that represent event rings are named "ring".

All but two functions that represent a channel's TRE ring with a
local variable use the name "tre_ring". For consistency, use that
name in the two functions that don't fit the pattern.

Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'support-mt7531-on-bpi-r2-pro'

Frank Wunderlich says:

====================
Support mt7531 on BPI-R2 Pro

This series adds support for the mt7531 switch on the Bananapi R2 Pro board.

This board uses port5 of the switch to connect to the gmac0 of the
rk3568 SoC.

Currently CPU-Port is hardcoded in the mt7530 driver to port 6.

Compared to v1, the reset patch was dropped as it was not needed, and the
CPU-port changes were completely rewritten based on suggestions/code from
Vladimir Oltean (many thanks for this).
In the DTS patch I only dropped the status property that was not
needed/ignored by the driver.

Due to the changes I also ran a regression test on mt7623 bpi-r2
(mt7623 soc + mt7530) and bpi-r64 (mt7622 soc + mt7531) with cpu-port 6.
Tests were done directly (ipv4 config on dsa user port) and with a
vlan-aware bridge including a vlan that was tagged outgoing on the dsa
user port.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

arm64: dts: rockchip: Add mt7531 dsa node to BPI-R2-Pro board

Add Device Tree node for mt7531 switch connected to gmac0.

Signed-off-by: Frank Wunderlich <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: dsa: make reset optional and add rgmii-mode to mt7531

A board may have no independent reset line, in which case reset cannot
be used inside the switch driver.

E.g. on the Bananapi-R2 Pro, the switch and the gmac are connected to the
same reset line.

A reset should be acquired by only one device/driver. This prevents the
reset from being bound to the switch driver if it is already used for the
gmac. If the reset is only used by the switch driver, it resets the switch
*and* the gmac after the mdio bus comes up, which takes the mdio bus down
again. It takes some time until everything is up again; meanwhile the
switch driver tries to read from mdio, fails, and defers the probe. On the
next try the reset does the same again.

Make reset optional for such boards.

Allow port 5 as cpu-port and phy-mode rgmii for mt7531.

- MT7530 supports RGMII on port 5 and RGMII/TRGMII on port 6.
- MT7531 supports RGMII and SGMII (dual-sgmii) on port 5 and
SGMII on port 6.

Signed-off-by: Frank Wunderlich <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: mt7530: get cpu-port via dp->cpu_dp instead of constant

Replace the last occurrences of the hardcoded cpu-port with the cpu_dp
member of the dsa_port struct.

Now the constant can be dropped.

Suggested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Frank Wunderlich <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: mt7530: rework mt753[01]_setup

Enumerate available cpu-ports instead of using hardcoded constant.

Suggested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Frank Wunderlich <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: mt7530: rework mt7530_hw_vlan_{add,del}

Rework vlan_add/vlan_del functions in preparation for dynamic cpu port.

Currently BIT(MT7530_CPU_PORT) is added to new_members, even though
mt7530_port_vlan_add() will be called on the CPU port too.

Let DSA core decide when to call port_vlan_add for the CPU port, rather
than doing it implicitly.

We can do autonomous forwarding in a certain VLAN, but not add br0 to that
VLAN and avoid flooding the CPU with those packets, if software knows it
doesn't need to process them.
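
In terms of the member bitmask, the difference can be sketched like
this. It is an illustration only; vlan_add_member() is a hypothetical
helper, and the port numbers used with it are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)	(UINT32_C(1) << (n))

/* Add exactly the requested port to a VLAN's member mask.  The CPU
 * port is not OR-ed in implicitly; it becomes a member only when the
 * caller asks for it by passing its port number.
 */
static uint32_t vlan_add_member(uint32_t old_members, unsigned int port)
{
	assert(port < 32);

	return old_members | BIT(port);
}
```

Adding user port 2 leaves a CPU port (say, port 6) out of the VLAN
until a separate call adds it.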

Suggested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Frank Wunderlich <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: dsa: convert binding for mediatek switches

Convert txt binding to yaml binding for Mediatek switches.

Signed-off-by: Frank Wunderlich <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS

Maintainers of the directory Documentation/devicetree/bindings/net
are also the maintainers of the corresponding directory
include/dt-bindings/net.

Add the file entry for include/dt-bindings/net to the appropriate
section in MAINTAINERS.

Signed-off-by: Lukas Bulwahn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ARM: dts: at91: ksz9477_evb: fix port/phy validation

The latest driver version requires phy-mode to be set. Otherwise we will
use "NA" mode and the switch driver will invalidate this port mode.

Fixes: 65ac79e18120 ("net: dsa: microchip: add the phylink get_caps")
Signed-off-by: Oleksij Rempel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'mlxsw-remove-xm-support'

Ido Schimmel says:

====================
mlxsw: Remove XM support

The XM was supposed to be an external device connected to the
Spectrum-{2,3} ASICs using dedicated Ethernet ports. Its purpose was to
increase the number of routes that can be offloaded to hardware. This was
achieved by having the ASIC act as a cache that refers cache misses to the
XM where the FIB is stored and LPM lookup is performed.

Testing was done over an emulator and dedicated setups in the lab, but
the product was discontinued before shipping to customers.

Therefore, in order to remove dead code and reduce complexity of the
code base, revert the three patchsets that added XM support.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: Revert "Prepare for XM implementation - LPM trees"

This reverts commit 923ba95ea22d ("Merge branch
'mlxsw-spectrum-prepare-for-xm-implementation-lpm-trees'").

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: Revert "Prepare for XM implementation - prefix insertion and removal"

This reverts commit e7086213f7b4 ("Merge branch
'mlxsw-spectrum-prepare-for-xm-implementation-prefix-insertion-and-removal'").

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: Revert "Introduce initial XM router support"

This reverts commit 75c2a8fe8e39 ("Merge branch
'mlxsw-introduce-initial-xm-router-support'").

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: bgmac: Fix an erroneous kfree() in bgmac_remove()

'bgmac' is part of a managed resource allocated with bgmac_alloc(). It
should not be freed explicitly.

Remove the erroneous kfree() from the .remove() function.

Fixes: 34a5102c3235 ("net: bgmac: allocate struct bgmac just once & don't copy it")
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/a026153108dd21239036a032b95c25b5cece253b.1655153616.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-next: updates 2022-06-14

1) Updated HW bits and definitions for upcoming features
1.1) vport debug counters
1.2) flow meter
1.3) Execute ASO action for flow entry
1.4) enhanced CQE compression

2) Add ICM header-modify-pattern RDMA API

Leon Says
=========

SW steering manipulates packet's header using "modifying header" actions.
Many of these actions do the same operation, but use different data each time.
Currently we create and keep every one of these actions, which use expensive
and limited resources.

Now we introduce a new mechanism - pattern and argument, which splits
a modifying action into two parts:
1. action pattern: contains the operations to be applied on packet's header,
mainly set/add/copy of fields in the packet
2. action data/argument: contains the data to be used by each operation
in the pattern.

This way we reuse the same patterns with different arguments to create new
modifying actions, and since many actions share the same operations, we end
up creating a small number of patterns that we keep in a dedicated cache.

These modify header patterns are implemented as a new type of ICM memory,
so the following kernel patch series adds support for this new ICM type.
==========

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add bits and fields to support enhanced CQE compression
  net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK
  net/mlx5: group fdb cleanup to single function
  net/mlx5: Add support EXECUTE_ASO action for flow entry
  net/mlx5: Add HW definitions of vport debug counters
  net/mlx5: Add IFC bits and enums for flow meter
  RDMA/mlx5: Support handling of modify-header pattern ICM area
  net/mlx5: Manage ICM of type modify-header pattern
  net/mlx5: Introduce header-modify-pattern ICM properties
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

netfs: fix up netfs_inode_init() docbook comment

Commit e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode
wrapper introduced") changed the argument types and names, and actually
updated the comment too (although that was thanks to David Howells, not
me: my original patch only changed the code).

But the comment fixup didn't go quite far enough, and didn't change the
argument name in the comment, resulting in

include/linux/netfs.h:314: warning: Function parameter or member 'ctx' not described in 'netfs_inode_init'
include/linux/netfs.h:314: warning: Excess function parameter 'inode' description in 'netfs_inode_init'

during htmldoc generation.

Fixes: e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode wrapper introduced")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ice: Fix memory corruption in VF driver

Disable a VF's RX/TX queues when the VF is disabled. A VF can still have
queues enabled when it requests a reset. If the PF driver assumes that the VF
is disabled while the VF still has queues configured, the VF may unmap DMA
resources. In such a scenario the device can still map packets to memory,
which ends up silently corrupting it.
Previously, the VF driver could experience memory corruption, which led to
a crash:
[ 5119.170157] BUG: unable to handle kernel paging request at 00001b9780003237
[ 5119.170166] PGD 0 P4D 0
[ 5119.170173] Oops: 0002 [#1] PREEMPT_RT SMP PTI
[ 5119.170181] CPU: 30 PID: 427592 Comm: kworker/u96:2 Kdump: loaded Tainted: G        W I      --------- -  - 4.18.0-372.9.1.rt7.166.el8.x86_64 #1
[ 5119.170189] Hardware name: Dell Inc. PowerEdge R740/014X06, BIOS 2.3.10 08/15/2019
[ 5119.170193] Workqueue: iavf iavf_adminq_task [iavf]
[ 5119.170219] RIP: 0010:__page_frag_cache_drain+0x5/0x30
[ 5119.170238] Code: 0f 0f b6 77 51 85 f6 74 07 31 d2 e9 05 df ff ff e9 90 fe ff ff 48 8b 05 49 db 33 01 eb b4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 f6 c4 80 74 0f 0f b6 77 51 85 f6 74
[ 5119.170244] RSP: 0018:ffffa43b0bdcfd78 EFLAGS: 00010282
[ 5119.170250] RAX: ffffffff896b3e40 RBX: ffff8fb282524000 RCX: 0000000000000002
[ 5119.170254] RDX: 0000000049000000 RSI: 0000000000000000 RDI: 00001b9780003203
[ 5119.170259] RBP: ffff8fb248217b00 R08: 0000000000000022 R09: 0000000000000009
[ 5119.170262] R10: 2b849d6300000000 R11: 0000000000000020 R12: 0000000000000000
[ 5119.170265] R13: 0000000000001000 R14: 0000000000000009 R15: 0000000000000000
[ 5119.170269] FS:  0000000000000000(0000) GS:ffff8fb1201c0000(0000) knlGS:0000000000000000
[ 5119.170274] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5119.170279] CR2: 00001b9780003237 CR3: 00000008f3e1a003 CR4: 00000000007726e0
[ 5119.170283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5119.170286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5119.170290] PKRU: 55555554
[ 5119.170292] Call Trace:
[ 5119.170298]  iavf_clean_rx_ring+0xad/0x110 [iavf]
[ 5119.170324]  iavf_free_rx_resources+0xe/0x50 [iavf]
[ 5119.170342]  iavf_free_all_rx_resources.part.51+0x30/0x40 [iavf]
[ 5119.170358]  iavf_virtchnl_completion+0xd8a/0x15b0 [iavf]
[ 5119.170377]  ? iavf_clean_arq_element+0x210/0x280 [iavf]
[ 5119.170397]  iavf_adminq_task+0x126/0x2e0 [iavf]
[ 5119.170416]  process_one_work+0x18f/0x420
[ 5119.170429]  worker_thread+0x30/0x370
[ 5119.170437]  ? process_one_work+0x420/0x420
[ 5119.170445]  kthread+0x151/0x170
[ 5119.170452]  ? set_kthread_struct+0x40/0x40
[ 5119.170460]  ret_from_fork+0x35/0x40
[ 5119.170477] Modules linked in: iavf sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nfp tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common isst_if_common skx_edac irdma nfit libnvdimm x86_pkg_temp_thermal i40e intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ib_uverbs rapl ipmi_ssif intel_cstate intel_uncore mei_me pcspkr acpi_ipmi ib_core mei lpc_ich i2c_i801 ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ice ahci drm libahci crc32c_intel libata tg3 megaraid_sas
[ 5119.170613]  i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: iavf]
[ 5119.170627] CR2: 00001b9780003237

Fixes: ec4f5a436bdf ("ice: Check if VF is disabled for Opcode and other operations")
Signed-off-by: Przemyslaw Patynowski <[email protected]>
Co-developed-by: Slawomir Laba <[email protected]>
Signed-off-by: Slawomir Laba <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Fix queue config fail handling

Disable the VF's RX/TX queues when VIRTCHNL_OP_CONFIG_VSI_QUEUES fails.
Not disabling them might lead to a scenario where the PF driver leaves VF
queues enabled even though queue configuration of the VF's VSI failed.
In this scenario the VF should not have RX/TX queues enabled. If the PF failed
to set up the VF's queues, the VF will reset due to TX timeouts in the VF
driver. Initialize iterator 'i' to -1, so that if an error happens prior to
configuring queues, the error path code will not disable queue 0. The loop
that configures queues uses the same iterator, so the error path code will
only disable queues that were configured.
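
The iterator trick can be illustrated with a minimal, stand-alone C sketch.
It is not the ice driver code: struct vf_queues, config_queues() and the
fail_at knob are hypothetical names used only to show why starting 'i' at -1
keeps the error path from touching queue 0 when nothing was configured yet.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_QUEUES 4	/* illustrative queue count */

/* Hypothetical stand-in for per-VF queue state. */
struct vf_queues {
	bool enabled[NUM_QUEUES];
};

/*
 * Configure queues 0..num-1. fail_at is a hypothetical knob that forces
 * a failure:
 *   < 0       fail before the loop starts
 *   0..num-1  fail while configuring that queue
 *   >= num    succeed
 */
static int config_queues(struct vf_queues *vf, int num, int fail_at)
{
	int i = -1;	/* no queue has been configured yet */

	assert(num >= 0 && num <= NUM_QUEUES);

	if (fail_at < 0)
		goto error;

	for (i = 0; i < num; i++) {
		if (i == fail_at)
			goto error;
		vf->enabled[i] = true;
	}
	return 0;

error:
	/*
	 * Disable queue i and every queue before it. With i == -1 the loop
	 * body never runs, so queue 0 is left alone.
	 */
	for (; i >= 0; i--)
		vf->enabled[i] = false;
	return -1;
}
```

Had 'i' started at 0, an early failure would have disabled queue 0 even
though this call never configured it.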

Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Suggested-by: Slawomir Laba <[email protected]>
Signed-off-by: Przemyslaw Patynowski <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Sync VLAN filtering features for DVM

VLAN filtering features, that is C-Tag and S-Tag, must be both enabled or
both disabled in DVM mode.
If only one of the features is turned off/on, the other feature must
be turned off/on automatically, and an appropriate message must be issued to
the kernel log.
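
The rule above can be sketched as a small C function. This is only an
illustration under assumptions: the feature bits F_CTAG_FILTER and
F_STAG_FILTER and the function sync_vlan_filters() are hypothetical, and
fprintf() stands in for the kernel log message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical feature bits for C-Tag and S-Tag VLAN filtering. */
#define F_CTAG_FILTER  (1u << 0)
#define F_STAG_FILTER  (1u << 1)
#define F_VLAN_FILTERS (F_CTAG_FILTER | F_STAG_FILTER)

/*
 * Given the currently active features and the requested ones, return the
 * requested set adjusted so that both filtering bits are on or both off.
 */
static uint32_t sync_vlan_filters(uint32_t current, uint32_t requested)
{
	uint32_t cur = current & F_VLAN_FILTERS;
	uint32_t req = requested & F_VLAN_FILTERS;
	uint32_t result = requested;

	if (req != 0 && req != F_VLAN_FILTERS) {
		if (cur == F_VLAN_FILTERS) {
			/* One filter was switched off: switch off the other. */
			fprintf(stderr, "disabling both VLAN filters\n");
			result &= ~F_VLAN_FILTERS;
		} else {
			/* One filter was switched on: switch on the other. */
			fprintf(stderr, "enabling both VLAN filters\n");
			result |= F_VLAN_FILTERS;
		}
	}

	/* Postcondition: the two bits always agree. */
	assert((result & F_VLAN_FILTERS) == 0 ||
	       (result & F_VLAN_FILTERS) == F_VLAN_FILTERS);
	return result;
}
```

Bits outside the two filtering flags pass through unchanged, so the helper
can be applied to a full feature mask.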

Fixes: 1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
Signed-off-by: Roman Storozhenko <[email protected]>
Co-developed-by: Anatolii Gerasymenko <[email protected]>
Signed-off-by: Anatolii Gerasymenko <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

ice: Fix PTP TX timestamp offset calculation

The offset was calculated incorrectly for E822, which led to collisions
in choosing the TX timestamp register location when more than one port
was trying to use the timestamping mechanism.

In E822 one quad is logically split between ports, so quad 0 has trackers
for ports 0-3, quad 1 for ports 4-7, etc. Each port should have a separate
memory location for tracking timestamps. Due to the error, ports 1 and 2,
for example, had been assigned to quad 0 with the same offset (0),
while port 1 should have offset 0 and 1 offset 16.

Fix it by correctly calculating quad offset.
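
A minimal C sketch of a per-port offset calculation follows. It rests on two
assumptions taken from the description above rather than from the driver
source: four ports share one quad ("ports 0-3"), and each port owns 16
timestamp slots ("offset 16"). The macro and function names are
hypothetical.

```c
#include <assert.h>

/* Assumed from the text: quad 0 serves ports 0-3, quad 1 ports 4-7. */
#define PORTS_PER_QUAD 4u
/* Assumed from the text: consecutive ports are 16 slots apart. */
#define INDEX_PER_PORT 16u

/* Quad that holds the trackers for a given port. */
static unsigned int port_to_quad(unsigned int port)
{
	return port / PORTS_PER_QUAD;
}

/* Offset of the port's first timestamp slot inside its quad. */
static unsigned int port_to_quad_offset(unsigned int port)
{
	unsigned int offset = (port % PORTS_PER_QUAD) * INDEX_PER_PORT;

	/* The slot range must stay inside the quad's index space. */
	assert(offset < PORTS_PER_QUAD * INDEX_PER_PORT);
	return offset;
}
```

Under these assumptions every port in a quad gets its own range, so two
ports sharing a quad no longer collide on the same offset.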

Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support")
Signed-off-by: Michal Michalik <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"While last week's pull request contained miscellaneous fixes for x86,
  this one covers other architectures, selftests changes, and a bigger
  series for APIC virtualization bugs that were discovered during 5.20
  development. The idea is to base 5.20 development for KVM on top of
  this tag.

  ARM64:

   - Properly reset the SVE/SME flags on vcpu load

   - Fix a vgic-v2 regression regarding accessing the pending state of a
     HW interrupt from userspace (and make the code common with vgic-v3)

   - Fix access to the idreg range for protected guests

   - Ignore 'kvm-arm.mode=protected' when using VHE

   - Return an error from kvm_arch_init_vm() on allocation failure

   - A bunch of small cleanups (comments, annotations, indentation)

  RISC-V:

   - Typo fix in arch/riscv/kvm/vmid.c

   - Remove broken reference pattern from MAINTAINERS entry

  x86-64:

   - Fix error in page tables with MKTME enabled

   - Dirty page tracking performance test extended to running a nested
     guest

   - Disable APICv/AVIC in cases that it cannot implement correctly"

[ This merge also fixes a misplaced end parenthesis bug introduced in
  commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
  ID or APIC base") pointed out by Sean Christopherson ]

Link: https://lore.kernel.org/all/[email protected]/
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
  KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
  KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  KVM: selftests: Clean up LIBKVM files in Makefile
  KVM: selftests: Link selftests directly with lib object files
  KVM: selftests: Drop unnecessary rule for STATIC_LIBS
  KVM: selftests: Add a helper to check EPT/VPID capabilities
  KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  KVM: selftests: Refactor nested_map() to specify target level
  KVM: selftests: Drop stale function parameter comment for nested_map()
  KVM: selftests: Add option to create 2M and 1G EPT mappings
  KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
  KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
  KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
  KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
  KVM: x86: disable preemption while updating apicv inhibition
  KVM: x86: SVM: fix avic_kick_target_vcpus_fast
  KVM: x86: SVM: remove avic's broken code that updated APIC ID
  KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
  KVM: x86: document AVIC/APICv inhibit reasons
  KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
  ...

Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 MMIO stale data fixes from Thomas Gleixner:
"Yet another hw vulnerability with a software mitigation: Processor
  MMIO Stale Data.

  They are a class of MMIO-related weaknesses which can expose stale
  data by propagating it into core fill buffers. Data which can then be
  leaked using the usual speculative execution methods.

  Mitigations include this set along with microcode updates and are
  similar to MDS and TAA vulnerabilities: VERW now clears those buffers
  too"

* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/mmio: Print SMT warning
  KVM: x86/speculation: Disable Fill buffer clear within guests
  x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
  x86/speculation/srbds: Update SRBDS mitigation selection
  x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
  x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
  x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
  x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
  x86/speculation: Add a common function for MD_CLEAR mitigation update
  x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
  Documentation: Add documentation for Processor MMIO Stale Data