Git Repo - linux.git/log

Merge branch 'mlxsw-introduce-initial-xm-router-support'

Ido Schimmel says:

====================
mlxsw: Introduce initial XM router support

This patch set implements initial eXtended Mezzanine (XM) router
support.

The XM is an external device connected to the Spectrum-{2,3} ASICs using
dedicated Ethernet ports. Its purpose is to increase the number of
routes that can be offloaded to hardware. This is achieved by having the
ASIC act as a cache that refers cache misses to the XM where the FIB is
stored and LPM lookup is performed.

Future patch sets will add more sophisticated cache flushing and
selftests that utilize cache counters on the ASIC, which we plan to
expose via devlink-metric [1].

Patch set overview:

Patches #1-#2 add registers to insert/remove routes to/from the XM and
to enable/disable it. Patch #3 utilizes these registers in order to
implement XM-specific router low-level operations.

Patches #4-#5 query from firmware the availability of the XM and the
local ports that are used to connect the ASIC to the XM, so that netdevs
will not be created for them.

Patches #6-#8 initialize the XM by configuring its cache parameters.

Patch #9-#10 implement cache management, so that LPM lookup will be
correctly cached in the ASIC.

Patches #11-#13 implement cache flushing, so that routes
insertions/removals to/from the XM will flush the affected entries in
the cache.

Patch #14 configures the ASIC to allocate half of its memory for the
cache, so that room will be left for other entries (e.g., FDBs,
neighbours).

Patch #15 starts using the XM for IPv4 route offload, when available.

[1] https://lore.kernel.org/netdev/20200817125059 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router

In case the eXtended mezzanine is present on the system, use it for IPv4
router offload.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3

Set a profile option to instruct FW to use 1/2 of KVH for XLT cache, not
the whole one.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_router_xm: Introduce basic XM cache flushing

Upon route insertion and removal, it is needed to flush possibly cached
entries from the XM cache. Extend XM op context to carry information
needed for the flush. Implement the flush in delayed work since for HW
design reasons there is a need to wait 50usec before the flush can be
done. If during this time comes the same flush request, consolidate it
to the first one. Implement this queued flushes by a hashtable.

v2:
* Fix GENMASK() high bit

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: reg: Add Router LPM Cache Enable Register

The RLPMCE allows disabling the LPM cache. Can be changed on the fly.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: reg: Add Router LPM Cache ML Delete Register

The RLCMLD register is used to bulk delete the XLT-LPM cache ML entries.
This can be used by SW when L is increased or decreased, thus need to
remove entries with old ML values.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_router_xm: Implement L-value tracking for M-index

There is a table that assigns L-value per M-index. The L is always the
biggest from the currently inserted prefixes. Setup a hashtable to track
the M-index information and the prefixes that are related to it. Ensure
the L-value is always correctly set.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: reg: Add XM Router M Table Register

The XRMT configures the M-Table for the XLT-LPM.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_router: Introduce per-ASIC XM initialization

During the router init flow, call into XM code and initialize couple of
items needed for XM functionality:

1) Query the capabilities and sizes. Check the XM device id.
2) Initialize the M-value. Note that currently the M-value is set fixed
to 16 for IPv4. In future this may change to better cover the actual
inserted routes.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: reg: Add XM Lookup Table Query Register

The XLTQ is used to query HW for XM-related info.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: reg: Add Router XLT M select Register

The RXLTM configures and selects the M for the XM lookups.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: Ignore ports that are connected to eXtended mezanine

Use the info stored in the bus_info struct about the eXtended mezanine
connected ports and don't expose them.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: pci: Obtain info about ports used by eXtended mezanine

The output of boardinfo command was extended to contain information
about XM. Indicates if is present and in case it is, tells which
localports are used for the connection. So parse this info and store it
in bus_info passed up to the driver.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_router: Introduce XM implementation of router low-level ops

In order to offload entries to XM, implement a set of low-level
functions to work with LPM trees in XM and also to pack and write
FIB entries into XM.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: reg: Add Router XLT Enable Register

The RXLTE enables XLT (eXtended Lookup Table) LPM lookups if a capable
XM is present on the system.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: reg: Add XM Direct Register

The XMDR allows direct access to the XM device via the switch.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'bnxt_en-improve-firmware-flashing'

Michael Chan says:

====================
bnxt_en: Improve firmware flashing.

This patchset improves firmware flashing in 2 ways:

- If firmware returns NO_SPACE error during flashing, the driver will
create the UPDATE directory with more staging area and retry.
- Instead of allocating a big DMA buffer for the entire contents of
the firmware package size, fallback to a smaller buffer to DMA the
contents in multiple DMA operations.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Enable batch mode when using HWRM_NVM_MODIFY to flash packages.

The current scheme allocates a DMA buffer as big as the requested
firmware package file and DMAs the contents to firmware in one
operation.  The buffer size can be several hundred kilo bytes and
the driver may not be able to allocate the memory.  This will cause
firmware upgrade to fail.

Improve the scheme by using smaller DMA blocks and calling firmware to
DMA each block in a batch mode.  Older firmware can cause excessive
NVRAM erases if the block size is too small so we try to allocate a
256K buffer to begin with and size it down successively if we cannot
allocate the memory.

Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Retry installing FW package under NO_SPACE error condition.

In bnxt_flash_package_from_fw_obj(), if firmware returns the NO_SPACE
error, call __bnxt_flash_nvram() to create the UPDATE directory and
then loop back and retry one more time.

Since the first try may fail, we use the silent version to send the
firmware commands.

Reviewed-by: Vasundhara Volam <[email protected]>
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Restructure bnxt_flash_package_from_fw_obj() to execute in a loop.

On NICs with a smaller NVRAM, FW installation may fail after multiple
updates due to fragmentation.  The driver can retry when FW returns
a special error code.  To faciliate the retry, we restructure the
logic that performs the flashing in a loop.  The actual retry logic
will be added in the next patch.

Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Rearrange the logic in bnxt_flash_package_from_fw_obj().

This function will be modified in the next patch to retry flashing
the firmware in a loop. To facilate that, we rearrange the code so
that the steps that only need to be done once before the loop will be
moved to the top of the function.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Refactor bnxt_flash_nvram.

Refactor bnxt_flash_nvram() into __bnxt_flash_nvram() that takes an
additional dir_item_len parameter. The new function will be used
in subsequent patches with the dir_item_len parameter set to create
the UPDATE directory during flashing.

Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: add mvpp2 driver entry

Since its creation Marvell NIC driver for Armada 375/7k8k and
CN913x SoC families mvpp2 has been lacking an entry in MAINTAINERS,
which sometimes lead to unhandled bugs that persisted
across several kernel releases.

Signed-off-by: Marcin Wojtas <[email protected]>
Acked-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: drop bogus skb with CHECKSUM_PARTIAL and offset beyond end of trimmed packet

syzbot reproduces BUG_ON in skb_checksum_help():
tun creates (bogus) skb with huge partial-checksummed area and
small ip packet inside. Then ip_rcv trims the skb based on size
of internal ip packet, after that csum offset points beyond of
trimmed skb. Then checksum_tg() called via netfilter hook
triggers BUG_ON:

offset = skb_checksum_start_offset(skb);
BUG_ON(offset >= skb_headlen(skb));

To work around the problem this patch forces pskb_trim_rcsum_slow()
to return -EINVAL in described scenario. It allows its callers to
drop such kind of packets.

Link: https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0
Reported-by: [email protected]
Signed-off-by: Vasily Averin <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

inet_ecn: Use csum16_add() helper for IP_ECN_set_* helpers

Jakub pointed out that the IP_ECN_set* helpers basically open-code
csum16_add(), so let's switch them over to using the helper instead.

v2:
- Use __be16 for check_add stack variable in IP_ECN_set_ce() (kbot)
v3:
- Turns out we need __force casts to do arithmetic on __be16 types

Reported-by: Jakub Kicinski <[email protected]>
Tested-by: Jonathan Morton <[email protected]>
Tested-by: Pete Heist <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: bridge: Fix a warning when del bridge sysfs

I got a warining report:

br_sysfs_addbr: can't create group bridge4/bridge
------------[ cut here ]------------
sysfs group 'bridge' not found for kobject 'bridge4'
WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group fs/sysfs/group.c:279 [inline]
WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group+0x153/0x1b0 fs/sysfs/group.c:270
Modules linked in: iptable_nat
...
Call Trace:
  br_dev_delete+0x112/0x190 net/bridge/br_if.c:384
  br_dev_newlink net/bridge/br_netlink.c:1381 [inline]
  br_dev_newlink+0xdb/0x100 net/bridge/br_netlink.c:1362
  __rtnl_newlink+0xe11/0x13f0 net/core/rtnetlink.c:3441
  rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500
  rtnetlink_rcv_msg+0x385/0x980 net/core/rtnetlink.c:5562
  netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2494
  netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
  netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1330
  netlink_sendmsg+0x793/0xc80 net/netlink/af_netlink.c:1919
  sock_sendmsg_nosec net/socket.c:651 [inline]
  sock_sendmsg+0x139/0x170 net/socket.c:671
  ____sys_sendmsg+0x658/0x7d0 net/socket.c:2353
  ___sys_sendmsg+0xf8/0x170 net/socket.c:2407
  __sys_sendmsg+0xd3/0x190 net/socket.c:2440
  do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

In br_device_event(), if the bridge sysfs fails to be added,
br_device_event() should return error. This can prevent warining
when removing bridge sysfs that do not exist.

Fixes: bb900b27a2f4 ("bridge: allow creating bridge devices with netlink")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wang Hai <[email protected]>
Tested-by: Nikolay Aleksandrov <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ice, xsk: Move Rx allocation out of while-loop

Instead doing the check for allocation in each loop, move it outside
the while loop and do it every NAPI loop.

This change boosts the xdpsock rxdrop scenario with 15% more
packets-per-second.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: mtk_eth: simplify the mediatek code return expression

Simplify the return expression at mtk_eth_path.c file, simplify this all.

Signed-off-by: Zheng Yongjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'add-devlink-and-devlink-health-reporters-to'

George Cherian says:

====================
Add devlink and devlink health reporters to octeontx2

Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA block.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

For now, I have dropped the NIX block health reporters.
This series attempts to add health reporters only for the NPA block.
As per Jakub's suggestion separate reporters per event is used and also
got rid of the counters.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

docs: octeontx2: Add Documentation for NPA health reporters

Add Documentation for devlink health reporters for NPA block.

Signed-off-by: George Cherian <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

octeontx2-af: Add devlink health reporters for NPA

Add health reporters for RVU NPA block.
NPA Health reporters handle following HW event groups
- GENERAL events
- ERROR events
- RAS events
- RVU event

Output:
#devlink health
pci/0002:01:00.0:
   reporter hw_npa_intr
     state healthy error 0 recover 0 grace_period 0 auto_recover true
auto_dump true
   reporter hw_npa_gen
     state healthy error 0 recover 0 grace_period 0 auto_recover true
auto_dump true
   reporter hw_npa_err
     state healthy error 0 recover 0 grace_period 0 auto_recover true
auto_dump true
   reporter hw_npa_ras
     state healthy error 0 recover 0 grace_period 0 auto_recover true
auto_dump true

#devlink health dump show  pci/0002:01:00.0 reporter hw_npa_err
NPA_AF_ERR:
        NPA Error Interrupt Reg : 4096
        AQ Doorbell Error
#devlink health dump show  pci/0002:01:00.0 reporter hw_npa_ras
NPA_AF_RVU_RAS:
        NPA RAS Interrupt Reg : 0

Each reporter dump shows the Register value and the description of the
cause.

Signed-off-by: Sunil Kovvuri Goutham <[email protected]>
Signed-off-by: Jerin Jacob <[email protected]>
Signed-off-by: George Cherian <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

octeontx2-af: Add devlink suppoort to af driver

Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink ouptput looks like this
# devlink dev
pci/0002:01:00.0
# devlink dev info
pci/0002:01:00.0:
driver octeontx2-af
#

Signed-off-by: Sunil Kovvuri Goutham <[email protected]>
Signed-off-by: Jerin Jacob <[email protected]>
Signed-off-by: George Cherian <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: test_vxlan_under_vrf: mute unnecessary error message

The cleanup function in this script that tries to delete hv-1 / hv-2
vm-1 / vm-2 netns will generate some uncessary error messages:

Cannot remove namespace file "/run/netns/hv-2": No such file or directory
Cannot remove namespace file "/run/netns/vm-1": No such file or directory
Cannot remove namespace file "/run/netns/vm-2": No such file or directory

Redirect it to /dev/null like other commands in the cleanup function
to reduce confusion.

Signed-off-by: Po-Hsu Lin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: Limit logical shift left of TCP probe0 timeout

For each TCP zero window probe, the icsk_backoff is increased by one and
its max value is tcp_retries2. If tcp_retries2 is greater than 63, the
probe0 timeout shift may exceed its max bits. On x86_64/ARMv8/MIPS, the
shift count would be masked to range 0 to 63. And on ARMv7 the result is
zero. If the shift count is masked, only several probes will be sent
with timeout shorter than TCP_RTO_MAX. But if the timeout is zero, it
needs tcp_retries2 times probes to end this false timeout. Besides,
bitwise shift greater than or equal to the width is an undefined
behavior.

This patch adds a limit to the backoff. The max value of max_when is
TCP_RTO_MAX and the min value of timeout base is TCP_RTO_MIN. The limit
is the backoff from TCP_RTO_MIN to TCP_RTO_MAX.

Signed-off-by: Cambda Zhu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'mptcp-another-set-of-miscellaneous-mptcp-fixes'

Mat Martineau says:

====================
mptcp: Another set of miscellaneous MPTCP fixes

This is another collection of MPTCP fixes and enhancements that we have
tested in the MPTCP tree:

Patch 1 cleans up cgroup attachment for in-kernel subflow sockets.

Patches 2 and 3 make sure that deletion of advertised addresses by an
MPTCP path manager when flushing all addresses behaves similarly to the
remove-single-address operation, and adds related tests.

Patches 4 and 8 do some minor cleanup.

Patches 5-7 add MPTCP_FASTCLOSE functionality. Note that patch 6 adds MPTCP
option parsing to tcp_reset().

Patch 9 optimizes skb size for outgoing MPTCP packets.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: let MPTCP create max size skbs

Currently the xmit path of the MPTCP protocol creates smaller-
than-max-size skbs, which is suboptimal for the performances.

There are a few things to improve:
- when coalescing to an existing skb, must clear the PUSH flag
- tcp_build_frag() expect the available space as an argument.
  When coalescing is enable MPTCP already subtracted the
  to-be-coalesced skb len. We must increment said argument
  accordingly.

Before:
./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM
[...]
131072  16384  16384    30.00    24414.86

After:
./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM
[...]
131072  16384  16384    30.05    28357.69

Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: pm: simplify select_local_address()

There is no need to unconditionally acquire the join list
lock, we can simply splice the join list into the subflow
list and traverse only the latter.

Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: parse and act on incoming FASTCLOSE option

parse the MPTCP FASTCLOSE subtype.

If provided key matches the local one, schedule the work queue to close
(with tcp reset) all subflows.

The MPTCP socket moves to closed state immediately.

Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: parse mptcp options contained in reset packets

Because TCP-level resets only affect the subflow, there is a MPTCP
option to indicate that the MPTCP-level connection should be closed
immediately without a mptcp-level fin exchange.

This is the 'MPTCP fast close option'. It can be carried on ack
segments or TCP resets. In the latter case, its needed to parse mptcp
options also for reset packets so that MPTCP can act accordingly.

Next patch will add receive side fastclose support in MPTCP.

Acked-by: Matthieu Baerts <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: hold mptcp socket before calling tcp_done

When processing options from tcp reset path its possible that
tcp_done(ssk) drops the last reference on the mptcp socket which
results in use-after-free.

Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: use MPTCPOPT_HMAC_LEN macro

Use the macro MPTCPOPT_HMAC_LEN instead of a constant in struct
mptcp_options_received.

Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: mptcp: add the flush addrs testcase

This patch added the flush addrs testcase. In do_transfer, if the number
of removing addresses is less than 8, use the del addr command to remove
the addresses one by one. If the number is more than 8, use the flush addrs
command to remove the addresses.

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: remove address when netlink flushes addrs

When the PM netlink flushes the addresses, invoke the remove address
function mptcp_nl_remove_subflow_and_signal_addr to remove the addresses
and the subflows. Since this function should not be invoked under lock,
move __flush_addrs out of the pernet->lock.

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: attach subflow socket to parent cgroup

It has been observed that the kernel sockets created for the subflows
(except the first one) are not in the same cgroup as their parents.
That's because the additional subflows are created by kernel workers.

This is a problem with eBPF programs attached to the parent's
cgroup won't be executed for the children. But also with any other features
of CGroup linked to a sk.

This patch fixes this behaviour.

As the subflow sockets are created by the kernel, we can't use
'mem_cgroup_sk_alloc' because of the current context being the one of the
kworker. This is why we have to do low level memcg manipulation, if
required.

Suggested-by: Matthieu Baerts <[email protected]>
Suggested-by: Paolo Abeni <[email protected]>
Acked-by: Matthieu Baerts <[email protected]>
Signed-off-by: Nicolas Rybowski <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mhi: Fix unexpected queue wake

This patch checks that MHI queue is not full before waking up the net
queue. This fix sporadic MHI queueing issues in xmit. Indeed xmit and
its symmetric complete callback (ul_callback) can run concurently, it
is then not safe to unconditionnaly waking the queue in the callback
without checking queue fullness.

Fixes: 3ffec6a14f24 ("net: Add mhi-net driver")
Signed-off-by: Loic Poulain <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: mv88e6xxx: don't set non-existing learn2all bit for 6220/6250

The 6220 and 6250 switches do not have a learn2all bit in global1, ATU
control register; bit 3 is reserverd.

On the switches that do have that bit, it is used to control whether
learning frames are sent out the ports that have the message_port bit
set. So rather than adding yet another chip method, use the existence
of the ->port_setup_message_port method as a proxy for determining
whether the learn2all bit exists (and should be set).

Signed-off-by: Rasmus Villemoes <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: openvswitch: fix TTL decrement exception action execution

Currently, the exception actions are not processed correctly as the wrong
dataset is passed. This change fixes this, including the misleading
comment.

In addition, a check was added to make sure we work on an IPv4 packet,
and not just assume if it's not IPv6 it's IPv4.

This was all tested using OVS with patch,
https://patchwork.ozlabs.org/project/openvswitch/list/?series=21639,
applied and sending packets with a TTL of 1 (and 0), both with IPv4
and IPv6.

Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink message format")
Signed-off-by: Eelco Chaudron <[email protected]>
Link: https://lore.kernel.org/r/160733569860.3007.12938188180387116741.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Missing dependencies in NFT_BRIDGE_REJECT, from Randy Dunlap.

2) Use atomic_inc_return() instead of atomic_add_return() in IPVS,
   from Yejune Deng.

3) Simplify check for overquota in xt_nfacct, from Kaixu Xia.

4) Move nfnl_acct_list away from struct net, from Miao Wang.

5) Pass actual sk in reject actions, from Jan Engelhardt.

6) Add timeout and protoinfo to ctnetlink destroy events,
   from Florian Westphal.

7) Four patches to generalize set infrastructure to support
   for multiple expressions per set element.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
  netfilter: nftables: netlink support for several set element expressions
  netfilter: nftables: generalize set extension to support for several expressions
  netfilter: nftables: move nft_expr before nft_set
  netfilter: nftables: generalize set expressions support
  netfilter: ctnetlink: add timeout and protoinfo to destroy events
  netfilter: use actual socket sk for REJECT action
  netfilter: nfnl_acct: remove data from struct net
  netfilter: Remove unnecessary conversion to bool
  ipvs: replace atomic_add_return()
  netfilter: nft_reject_bridge: fix build errors due to code movement
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-12-14

1) Expose bpf_sk_storage_*() helpers to iterator programs, from Florent Revest.

2) Add AF_XDP selftests based on veth devs to BPF selftests, from Weqaar Janjua.

3) Support for finding BTF based kernel attach targets through libbpf's
   bpf_program__set_attach_target() API, from Andrii Nakryiko.

4) Permit pointers on stack for helper calls in the verifier, from Yonghong Song.

5) Fix overflows in hash map elem size after rlimit removal, from Eric Dumazet.

6) Get rid of direct invocation of llc in BPF selftests, from Andrew Delgadillo.

7) Fix xsk_recvmsg() to reorder socket state check before access, from Björn Töpel.

8) Add new libbpf API helper to retrieve ring buffer epoll fd, from Brendan Jackman.

9) Batch of minor BPF selftest improvements all over the place, from Florian Lehner,
   KP Singh, Jiri Olsa and various others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits)
  selftests/bpf: Add a test for ptr_to_map_value on stack for helper access
  bpf: Permits pointers on stack for helper calls
  libbpf: Expose libbpf ring_buffer epoll_fd
  selftests/bpf: Add set_attach_target() API selftest for module target
  libbpf: Support modules in bpf_program__set_attach_target() API
  selftests/bpf: Silence ima_setup.sh when not running in verbose mode.
  selftests/bpf: Drop the need for LLVM's llc
  selftests/bpf: fix bpf_testmod.ko recompilation logic
  samples/bpf: Fix possible hang in xdpsock with multiple threads
  selftests/bpf: Make selftest compilation work on clang 11
  selftests/bpf: Xsk selftests - adding xdpxceiver to .gitignore
  selftests/bpf: Drop tcp-{client,server}.py from Makefile
  selftests/bpf: Xsk selftests - Bi-directional Sockets - SKB, DRV
  selftests/bpf: Xsk selftests - Socket Teardown - SKB, DRV
  selftests/bpf: Xsk selftests - DRV POLL, NOPOLL
  selftests/bpf: Xsk selftests - SKB POLL, NOPOLL
  selftests/bpf: Xsk selftests framework
  bpf: Only provide bpf_sock_from_file with CONFIG_NET
  bpf: Return -ENOTSUPP when attaching to non-kernel BTF
  xsk: Validate socket state in xsk_recvmsg, prior touching socket members
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/bpf: Add a test for ptr_to_map_value on stack for helper access

Change bpf_iter_task.c such that pointer to map_value may appear
on the stack for bpf_seq_printf() to access. Without previous
verifier patch, the bpf_iter test will fail.

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Permits pointers on stack for helper calls

Currently, when checking stack memory accessed by helper calls,
for spills, only PTR_TO_BTF_ID and SCALAR_VALUE are
allowed.

Song discovered an issue where the below bpf program
  int dump_task(struct bpf_iter__task *ctx)
  {
    struct seq_file *seq = ctx->meta->seq;
    static char[] info = "abc";
    BPF_SEQ_PRINTF(seq, "%s\n", info);
    return 0;
  }
may cause a verifier failure.

The verifier output looks like:
  ; struct seq_file *seq = ctx->meta->seq;
  1: (79) r1 = *(u64 *)(r1 +0)
  ; BPF_SEQ_PRINTF(seq, "%s\n", info);
  2: (18) r2 = 0xffff9054400f6000
  4: (7b) *(u64 *)(r10 -8) = r2
  5: (bf) r4 = r10
  ;
  6: (07) r4 += -8
  ; BPF_SEQ_PRINTF(seq, "%s\n", info);
  7: (18) r2 = 0xffff9054400fe000
  9: (b4) w3 = 4
  10: (b4) w5 = 8
  11: (85) call bpf_seq_printf#126
   R1_w=ptr_seq_file(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=4,imm=0)
  R3_w=inv4 R4_w=fp-8 R5_w=inv8 R10=fp0 fp-8_w=map_value
  last_idx 11 first_idx 0
  regs=8 stack=0 before 10: (b4) w5 = 8
  regs=8 stack=0 before 9: (b4) w3 = 4
  invalid indirect read from stack off -8+0 size 8

Basically, the verifier complains the map_value pointer at "fp-8" location.
To fix the issue, if env->allow_ptr_leaks is true, let us also permit
pointers on the stack to be accessible by the helper.

Reported-by: Song Liu <[email protected]>
Suggested-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Expose libbpf ring_buffer epoll_fd

This provides a convenient perf ringbuf -> libbpf ringbuf migration
path for users of external polling systems. It is analogous to
perf_buffer__epoll_fd.

Signed-off-by: Brendan Jackman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf: Add set_attach_target() API selftest for module target

Add test for bpf_program__set_attach_target() API, validating it can find
kernel module fentry target.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Support modules in bpf_program__set_attach_target() API

Support finding kernel targets in kernel modules when using
bpf_program__set_attach_target() API. This brings it up to par with what
libbpf supports when doing declarative SEC()-based target determination.

Some minor internal refactoring was needed to make sure vmlinux BTF can be
loaded before bpf_object's load phase.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net: x25: Remove unimplemented X.25-over-LLC code stubs

According to the X.25 documentation, there was a plan to implement
X.25-over-802.2-LLC. It never finished but left various code stubs in the
X.25 code. At this time it is unlikely that it would ever finish so it
may be better to remove those code stubs.

Also change the documentation to make it clear that this is not a ongoing
plan anymore. Change words like "will" to "could", "would", etc.

Cc: Martin Schiller <[email protected]>
Signed-off-by: Xie He <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

inet: frags: batch fqdir destroy works

On a few of our systems, I found frequent 'unshare(CLONE_NEWNET)' calls
make the number of active slab objects including 'sock_inode_cache' type
rapidly and continuously increase.  As a result, memory pressure occurs.

In more detail, I made an artificial reproducer that resembles the
workload that we found the problem and reproduce the problem faster.  It
merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a loop.  It takes
about 2 minutes.  On 40 CPU cores / 70GB DRAM machine, the available
memory continuously reduced in a fast speed (about 120MB per second,
15GB in total within the 2 minutes).  Note that the issue don't
reproduce on every machine.  On my 6 CPU cores machine, the problem
didn't reproduce.

'cleanup_net()' and 'fqdir_work_fn()' are functions that deallocate the
relevant memory objects.  They are asynchronously invoked by the work
queues and internally use 'rcu_barrier()' to ensure safe destructions.
'cleanup_net()' works in a batched maneer in a single thread worker,
while 'fqdir_work_fn()' works for each 'fqdir_exit()' call in the
'system_wq'.  Therefore, 'fqdir_work_fn()' called frequently under the
workload and made the contention for 'rcu_barrier()' high.  In more
detail, the global mutex, 'rcu_state.barrier_mutex' became the
bottleneck.

This commit avoids such contention by doing the 'rcu_barrier()' and
subsequent lightweight works in a batched manner, as similar to that of
'cleanup_net()'.  The fqdir hashtable destruction, which is done before
the 'rcu_barrier()', is still allowed to run in parallel for fast
processing, but this commit makes it to use a dedicated work queue
instead of the 'system_wq', to make sure that the number of threads is
bounded.

Signed-off-by: SeongJae Park <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

nfc: s3fwrn5: let core configure the interrupt trigger

If interrupt trigger is not set when requesting the interrupt, the core
will take care of reading trigger type from Devicetree. There is no
point to do it in the driver.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: mt7530: enable MTU normalization

MT7530 has a global RX length register, so we are actually changing its
MRU.
Enable MTU normalization for this reason.

Signed-off-by: DENG Qingfang <[email protected]>
Acked-by: Landen Chao <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-12-12

Just one patch this time:

1) Redact the SA keys with kernel lockdown confidentiality.
   If enabled, no secret keys are sent to uuserspace.
   From Antony Antony.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: redact SA secret with lockdown confidentiality
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'wireless-drivers-next-2020-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for v5.11

Second set of patches for v5.11. iwlwifi gaining support for the new
6 GHz band and rtw88 got a new channel. Lots of new features for mt76
and ath11k now has working suspend for PCI devices. And as always,
smaller fixes and cleanups all over.

Major changes:

rtw88
* add support for channel 144

mt76
* support for more sta interfaces on mt7615/mt7915
* mt7915 encapsulation offload
* performance improvements
* channel noise report on mt7915
* mt7915 testmode support
* mt7915 DBDC support

iwlwifi
* support 6 GHz band

ath11k
* suspend support for QCA6390 PCI devices
* support TXOP duration based RTS threshold
* mesh: add support for 256 bitmap in blockack frames in 11ax

* tag 'wireless-drivers-next-2020-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next: (197 commits)
  ath11k: implement suspend for QCA6390 PCI devices
  ath11k: hif: add ce irq enable and disable functions
  ath11k: implement WoW enable and wakeup commands
  ath11k: set credit_update flag for flow controlled ep only
  ath11k: dp: stop rx pktlog before suspend
  ath11k: htc: implement suspend handling
  ath11k: htc: remove unused struct ath11k_htc_ops
  ath11k: pci: read select_window register to ensure write is finished
  ath11k: hif: implement suspend and resume functions
  ath11k: mhi: hook suspend and resume
  ath11k: Fix incorrect tlvs in scan start command
  ath11k: pci: disable VDD4BLOW
  ath11k: pci: fix L1ss clock unstable problem
  ath11k: pci: fix hot reset stability issues
  ath11k: put hw to DBS using WMI_PDEV_SET_HW_MODE_CMDID
  ath11k: mhi: print a warning if firmware crashed
  ath11k: use MHI provided APIs to allocate and free MHI controller
  ath10k: add atomic protection for device recovery
  ath10k: add option for chip-id based BDF selection
  mt76: remove unused variable q
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

netfilter: nftables: netlink support for several set element expressions

This patch adds three new netlink attributes to encapsulate a list of
expressions per set elements:

- NFTA_SET_EXPRESSIONS: this attribute provides the set definition in
  terms of expressions. New set elements get attached the list of
  expressions that is specified by this new netlink attribute.
- NFTA_SET_ELEM_EXPRESSIONS: this attribute allows users to restore (or
  initialize) the stateful information of set elements when adding an
  element to the set.
- NFTA_DYNSET_EXPRESSIONS: this attribute specifies the list of
  expressions that the set element gets when it is inserted from the
  packet path.

Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nftables: generalize set extension to support for several expressions

This patch replaces NFT_SET_EXPR by NFT_SET_EXT_EXPRESSIONS. This new
extension allows to attach several expressions to one set element (not
only one single expression as NFT_SET_EXPR provides). This patch
prepares for support for several expressions per set element in the
netlink userspace API.

Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge tag 'mac80211-next-for-net-next-2020-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
A new set of wireless changes:
* validate key indices for key deletion
* more preamble support in mac80211
* various 6 GHz scan fixes/improvements
* a common SAR power limitations API
* various small fixes & code improvements

* tag 'mac80211-next-for-net-next-2020-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (35 commits)
  mac80211: add ieee80211_set_sar_specs
  nl80211: add common API to configure SAR power limitations
  mac80211: fix a mistake check for rx_stats update
  mac80211: mlme: save ssid info to ieee80211_bss_conf while assoc
  mac80211: Update rate control on channel change
  mac80211: don't filter out beacons once we start CSA
  mac80211: Fix calculation of minimal channel width
  mac80211: ignore country element TX power on 6 GHz
  mac80211: use bitfield helpers for BA session action frames
  mac80211: support Rx timestamp calculation for all preamble types
  mac80211: don't set set TDLS STA bandwidth wider than possible
  mac80211: support driver-based disconnect with reconnect hint
  cfg80211: support immediate reconnect request hint
  mac80211: use struct assignment for he_obss_pd
  cfg80211: remove struct ieee80211_he_bss_color
  nl80211: validate key indexes for cfg80211_registered_device
  cfg80211: include block-tx flag in channel switch started event
  mac80211: disallow band-switch during CSA
  ieee80211: update reduced neighbor report TBTT info length
  cfg80211: Save the regulatory domain when setting custom regulatory
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

netfilter: nftables: move nft_expr before nft_set

Move the nft_expr structure definition before nft_set. Expressions are
used by rules and sets, remove unnecessary forward declarations. This
comes as preparation to support for multiple expressions per set element.

Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nftables: generalize set expressions support

Currently, the set infrastucture allows for one single expressions per
element. This patch extends the existing infrastructure to allow for up
to two expressions. This is not updating the netlink API yet, this is
coming as an initial preparation patch.

Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: ctnetlink: add timeout and protoinfo to destroy events

DESTROY events do not include the remaining timeout.

Add the timeout if the entry was removed explicitly. This can happen
when a conntrack gets deleted prematurely, e.g. due to a tcp reset,
module removal, netdev notifier (nat/masquerade device went down),
ctnetlink and so on.

Add the protocol state too for the destroy message to check for abnormal
state on connection termination.

Joint work with Pablo.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
net/netfilter/nf_tables_api.c

Signed-off-by: Jakub Kicinski <[email protected]>

Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git

ath.git patches for v5.11. Major changes:

ath11k

* suspend support for QCA6390 PCI devices

* support TXOP duration based RTS threshold

* mesh: add support for 256 bitmap in blockack frames in 11ax

ath11k: implement suspend for QCA6390 PCI devices

Now that all the needed pieces are in place implement suspend support QCA6390
PCI devices. All other devices will return -EOPNOTSUPP during suspend. The
suspend is implemented by switching the firmware to WoW mode during suspend, so
the firmware will be running on low power mode while host is in suspend.

At the moment we are not able to shutdown and fully power off the device due to
bugs in MHI subsystem, so WoW mode is a workaround for the time being.

During suspend we enable WoW mode, disable CE irq and DP irq, then put MHI to
suspend state. During resume, driver resumes MHI firstly, then enables CE irq
and dp IRQ, and sends WoW wakeup command to firmware.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: hif: add ce irq enable and disable functions

Add ce irq enable and disable hif layer functions, so core module can enable
enable them without cleaning pipe and refilling pipe. Needed for suspend.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: implement WoW enable and wakeup commands

Implement wow enable ane wow wakeup commands which are needed for suspend.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: set credit_update flag for flow controlled ep only

Firmware will check all the pipes before entering WoW mode during suspend. If
ATH11K_HTC_FLAG_NEED_CREDIT_UPDATE is set, firmware treats this pipe needed to
return credit even though it's actually not required. If any pipe needs to
return credit, the suspend_complete message doesn't send to host but is
dropped. So host gets time out and WoW suspend failed.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: dp: stop rx pktlog before suspend

Stop dp rx pktlog when entering suspend and reap the mon_status buffer to keep
it empty. During resume restart the reap timer.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: htc: implement suspend handling

When ath11k sends suspend command to firmware, firmware will
return suspend_complete events and add handlers for those.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: htc: remove unused struct ath11k_htc_ops

No need for it so remove. Compile tested only.

Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: pci: read select_window register to ensure write is finished

Just when resume from WoW, the write to select_window doesn't take
effect immediately, so read the register again to ensure the write
operation is finished.

Another change is to reset select_window to ZERO because this
register isn't restored after WoW, so the content of this register
becomes ZERO too.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: hif: implement suspend and resume functions

For suspend support add suspend and resume to HIF layer. These ops are optional
and, for example, AHB bus driver does not need to implement these.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: mhi: hook suspend and resume

MHI suspend and resume isn't hooked in ath11k yet, so hook these
functions needed for suspend support.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: Fix incorrect tlvs in scan start command

Currently 6G specific tlvs have duplicate entries which is causing
scan failures. Fix this by removing the duplicate entries of the same
tlv. This also fixes out-of-bound memory writes caused due to
adding tlvs when num_hint_bssid and num_hint_s_ssid are ZEROs.

Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.4.0.1-01386-QCAHKSWPL_SILICONZ-1

Fixes: 74601ecfef6e ("ath11k: Add support for 6g scan hint")
Reported-by: Carl Huang <[email protected]>
Signed-off-by: Pradeep Kumar Chitrapu <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: pci: disable VDD4BLOW

It's recommended to disable VDD4BLOW during initialisation.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: pci: fix L1ss clock unstable problem

For QCA6390, one PCI related clock drifts sometimes, and
it makes PCI link difficult to quit L1ss. Fix it by writing
some registers which are known to fix the problem.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: pci: fix hot reset stability issues

For QCA6390, host needs to reset some registers before MHI power up to fix PCI
link unstable issue if hot reset happened. Also clear all pending interrupts
during power up.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: put hw to DBS using WMI_PDEV_SET_HW_MODE_CMDID

It's recommended to use wmi command WMI_PDEV_SET_HW_MODE_CMDID to put hardware
to dbs mode instead of wmi_init command. This fixes a few strange stability
issues.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: mhi: print a warning if firmware crashed

There was no way to detect if the firmware crashed so add a warning. At the
moment the firmware is not restarted or anything like that, so when this
happens ath11k modules need to be reloaded.

Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1

Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath11k: use MHI provided APIs to allocate and free MHI controller

Use MHI provided APIs to allocate and free MHI controller to
improve MHI host driver handling. This also fixes a memory leak
as the MHI controller was allocated but never freed.

Signed-off-by: Bhaumik Bhatt <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ath10k: add atomic protection for device recovery

When it has more than one restart_work queued meanwhile, the 2nd
restart_work is very easy to break the 1st restart work and lead
recovery fail.

Add a flag to allow only one restart work running untill
device successfully recovered.

It already has flag ATH10K_FLAG_CRASH_FLUSH, but it can not use this
flag again, because it is clear in ath10k_core_start. The function
ieee80211_reconfig(called by ieee80211_restart_work) of mac80211 do
many things and drv_start(call to ath10k_core_start) is 1st thing,
when drv_start complete, it does not mean restart complete. So it
add new flag and clear it in ath10k_reconfig_complete, because it
is the last thing called from drv_reconfig_complete of function
ieee80211_reconfig, after it, the restart process finished.

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049

Signed-off-by: Wen Gong <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/010101746bead6a0-d5e97c66-dedd-4b92-810e-c2e4840fafc9-000000@us-west-2.amazonses.com

ath10k: add option for chip-id based BDF selection

In some devices difference in chip-id should be enough to pick
the right BDF. Add another support for chip-id based BDF selection.
With this new option, ath10k supports 2 fallback options.

The board name with chip-id as option looks as follows
board name 'bus=snoc,qmi-board-id=ff,qmi-chip-id=320'

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2-00696-QCAHLSWMTPL-1
Tested-on: QCA6174 HW3.2 WLAN.RM.4.4.1-00157-QCARMSWPZ-1
Signed-off-by: Abhishek Kumar <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Reviewed-by: Rakesh Pillai <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/20201207231824.v3.1.Ia6b95087ca566f77423f3802a78b946f7b593ff5@changeid

Merge tag 'mtd/fixes-for-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull mtd fixes from Miquel Raynal:
"Second series of fixes for raw NAND drivers initiated because of a
  rework of the ECC engine subsystem.

  The location of the DT parsing logic got moved, breaking several
  drivers which in fact were not doing the ECC engine initialization at
  the right place.

  These drivers have been fixed by enforcing a particular ECC engine
  type and algorithm, software Hamming, while the algorithm may be
  overwritten by a DT property. This merge request fixes this in the
  xway, socrates, plat_nand, pasemi, orion, mpc5121, gpio, au1550 and
  ams-delta controller drivers"

* tag 'mtd/fixes-for-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: xway: Do not force a particular software ECC engine
  mtd: rawnand: socrates: Do not force a particular software ECC engine
  mtd: rawnand: plat_nand: Do not force a particular software ECC engine
  mtd: rawnand: pasemi: Do not force a particular software ECC engine
  mtd: rawnand: orion: Do not force a particular software ECC engine
  mtd: rawnand: mpc5121: Do not force a particular software ECC engine
  mtd: rawnand: gpio: Do not force a particular software ECC engine
  mtd: rawnand: au1550: Do not force a particular software ECC engine
  mtd: rawnand: ams-delta: Do not force a particular software ECC engine

Merge tag 'mmc-v5.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes:

  MMC core:
   - Fixup condition for CMD13 polling for RPMB requests

  MMC host:
   - mtk-sd: Fix system suspend/resume support for CQHCI
   - mtd-sd: Extend SDIO IRQ fix to more variants
   - sdhci-of-arasan: Fix clock registration error for Keem Bay SOC
   - tmio: Bring HW to a sane state after a power off"

* tag 'mmc-v5.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: mediatek: mark PM functions as __maybe_unused
  mmc: block: Fixup condition for CMD13 polling for RPMB requests
  mmc: tmio: improve bringing HW to a sane state with MMC_POWER_OFF
  mmc: sdhci-of-arasan: Fix clock registration error for Keem Bay SOC
  mmc: mediatek: Extend recheck_sdio_irq fix to more variants
  mmc: mediatek: Fix system suspend/resume support for CQHCI

Merge tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fix from Damien Le Moal:
"A single patch in this pull request to fix a BIO and page reference
leak when writing sequential zone files"

* tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: fix page reference and BIO leak

bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers

Remove bpf_ prefix, which causes these helpers to be reported in verifier
dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets
fix it as long as it is still possible before UAPI freezes on these helpers.

Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"8 patches.

  Subsystems affected by this patch series: proc, selftests, kbuild, and
  mm (pagecache, kasan, hugetlb)"

* emailed patches from Andrew Morton <[email protected]>:
  mm/hugetlb: clear compound_nr before freeing gigantic pages
  kasan: fix object remaining in offline per-cpu quarantine
  elfcore: fix building with clang
  initramfs: fix clang build failure
  kbuild: avoid static_assert for genksyms
  selftest/fpu: avoid clang warning
  proc: use untagged_addr() for pagemap_read addresses
  revert "mm/filemap: add static for function __add_to_page_cache_locked"

mm/hugetlb: clear compound_nr before freeing gigantic pages

Commit 1378a5ee451a ("mm: store compound_nr as well as compound_order")
added compound_nr counter to first tail struct page, overlaying with
page->mapping.  The overlay itself is fine, but while freeing gigantic
hugepages via free_contig_range(), a "bad page" check will trigger for
non-NULL page->mapping on the first tail page:

  BUG: Bad page state in process bash  pfn:380001
  page:00000000c35f0856 refcount:0 mapcount:0 mapping:00000000126b68aa index:0x0 pfn:0x380001
  aops:0x0
  flags: 0x3ffff00000000000()
  raw: 3ffff00000000000 0000000000000100 0000000000000122 0000000100000000
  raw: 0000000000000000 0000000000000000 ffffffff00000000 0000000000000000
  page dumped because: non-NULL mapping
  Modules linked in:
  CPU: 6 PID: 616 Comm: bash Not tainted 5.10.0-rc7-next-20201208 #1
  Hardware name: IBM 3906 M03 703 (LPAR)
  Call Trace:
    show_stack+0x6e/0xe8
    dump_stack+0x90/0xc8
    bad_page+0xd6/0x130
    free_pcppages_bulk+0x26a/0x800
    free_unref_page+0x6e/0x90
    free_contig_range+0x94/0xe8
    update_and_free_page+0x1c4/0x2c8
    free_pool_huge_page+0x11e/0x138
    set_max_huge_pages+0x228/0x300
    nr_hugepages_store_common+0xb8/0x130
    kernfs_fop_write+0xd2/0x218
    vfs_write+0xb0/0x2b8
    ksys_write+0xac/0xe0
    system_call+0xe6/0x288
  Disabling lock debugging due to kernel taint

This is because only the compound_order is cleared in
destroy_compound_gigantic_page(), and compound_nr is set to
1U << order == 1 for order 0 in set_compound_order(page, 0).

Fix this by explicitly clearing compound_nr for first tail page after
calling set_compound_order(page, 0).

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1378a5ee451a ("mm: store compound_nr as well as compound_order")
Signed-off-by: Gerald Schaefer <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: <[email protected]> [5.9+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kasan: fix object remaining in offline per-cpu quarantine

We hit this issue in our internal test.  When enabling generic kasan, a
kfree()'d object is put into per-cpu quarantine first.  If the cpu goes
offline, object still remains in the per-cpu quarantine.  If we call
kmem_cache_destroy() now, slub will report "Objects remaining" error.

  =============================================================================
  BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  Disabling lock debugging due to kernel taint
  INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200
  CPU: 3 PID: 176 Comm: cat Tainted: G    B             5.10.0-rc1-00007-g4525c8781ec0-dirty #10
  Hardware name: linux,dummy-virt (DT)
  Call trace:
     dump_backtrace+0x0/0x2b0
     show_stack+0x18/0x68
     dump_stack+0xfc/0x168
     slab_err+0xac/0xd4
     __kmem_cache_shutdown+0x1e4/0x3c8
     kmem_cache_destroy+0x68/0x130
     test_version_show+0x84/0xf0
     module_attr_show+0x40/0x60
     sysfs_kf_seq_show+0x128/0x1c0
     kernfs_seq_show+0xa0/0xb8
     seq_read+0x1f0/0x7e8
     kernfs_fop_read+0x70/0x338
     vfs_read+0xe4/0x250
     ksys_read+0xc8/0x180
     __arm64_sys_read+0x44/0x58
     el0_svc_common.constprop.0+0xac/0x228
     do_el0_svc+0x38/0xa0
     el0_sync_handler+0x170/0x178
     el0_sync+0x174/0x180
  INFO: Object 0x(____ptrval____) @offset=15848
  INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172
     stack_trace_save+0x9c/0xd0
     set_track+0x64/0xf0
     alloc_debug_processing+0x104/0x1a0
     ___slab_alloc+0x628/0x648
     __slab_alloc.isra.0+0x2c/0x58
     kmem_cache_alloc+0x560/0x588
     test_version_show+0x98/0xf0
     module_attr_show+0x40/0x60
     sysfs_kf_seq_show+0x128/0x1c0
     kernfs_seq_show+0xa0/0xb8
     seq_read+0x1f0/0x7e8
     kernfs_fop_read+0x70/0x338
     vfs_read+0xe4/0x250
     ksys_read+0xc8/0x180
     __arm64_sys_read+0x44/0x58
     el0_svc_common.constprop.0+0xac/0x228
  kmem_cache_destroy test_module_slab: Slab cache still has objects

Register a cpu hotplug function to remove all objects in the offline
per-cpu quarantine when cpu is going offline.  Set a per-cpu variable to
indicate this cpu is offline.

[[email protected]: fix slab double free when cpu-hotplug]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Ying Lee <[email protected]>
Signed-off-by: Zqiang <[email protected]>
Suggested-by: Dmitry Vyukov <[email protected]>
Reported-by: Guangye Yang <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: Nicholas Tang <[email protected]>
Cc: Miles Chen <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

elfcore: fix building with clang

kernel/elfcore.c only contains weak symbols, which triggers a bug with
clang in combination with recordmcount:

  Cannot find symbol for section 2: .text.
  kernel/elfcore.o: failed

Move the empty stubs into linux/elfcore.h as inline functions.  As only
two architectures use these, just use the architecture specific Kconfig
symbols to key off the declaration.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Barret Rhoden <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

initramfs: fix clang build failure

There is only one function in init/initramfs.c that is in the .text
section, and it is marked __weak.  When building with clang-12 and the
integrated assembler, this leads to a bug with recordmcount:

  ./scripts/recordmcount  "init/initramfs.o"
  Cannot find symbol for section 2: .text.
  init/initramfs.o: failed

I'm not quite sure what exactly goes wrong, but I notice that this
function is only ever called from an __init function, and normally
inlined.  Marking it __init as well is clearly correct and it leads to
recordmcount no longer complaining.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Barret Rhoden <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kbuild: avoid static_assert for genksyms

genksyms does not know or care about the _Static_assert() built-in, and
sometimes falls back to ignoring the later symbols, which causes
undefined behavior such as

  WARNING: modpost: EXPORT symbol "ethtool_set_ethtool_phy_ops" [vmlinux] version generation failed, symbol will not be versioned.
  ld: net/ethtool/common.o: relocation R_AARCH64_ABS32 against `__crc_ethtool_set_ethtool_phy_ops' can not be used when making a shared object
  net/ethtool/common.o:(_ftrace_annotated_branch+0x0): dangerous relocation: unsupported relocation

Redefine static_assert for genksyms to avoid that.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Suggested-by: Ard Biesheuvel <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Rikard Falkeborn <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftest/fpu: avoid clang warning

With extra warnings enabled, clang complains about the redundant
-mhard-float argument:

clang: error: argument unused during compilation: '-mhard-float' [-Werror,-Wunused-command-line-argument]

Move this into the gcc-only part of the Makefile.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4185b3b92792 ("selftests/fpu: Add an FPU selftest")
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Petteri Aimonen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: use untagged_addr() for pagemap_read addresses

When we try to visit the pagemap of a tagged userspace pointer, we find
that the start_vaddr is not correct because of the tag.
To fix it, we should untag the userspace pointers in pagemap_read().

I tested with 5.10-rc4 and the issue remains.

Explanation from Catalin in [1]:

"Arguably, that's a user-space bug since tagged file offsets were never
  supported. In this case it's not even a tag at bit 56 as per the arm64
  tagged address ABI but rather down to bit 47. You could say that the
  problem is caused by the C library (malloc()) or whoever created the
  tagged vaddr and passed it to this function. It's not a kernel
  regression as we've never supported it.

  Now, pagemap is a special case where the offset is usually not
  generated as a classic file offset but rather derived by shifting a
  user virtual address. I guess we can make a concession for pagemap
  (only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)"

My test code is based on [2]:

A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8

userspace program:

  uint64 OsLayer::VirtualToPhysical(void *vaddr) {
uint64 frame, paddr, pfnmask, pagemask;
int pagesize = sysconf(_SC_PAGESIZE);
off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0
int fd = open(kPagemapPath, O_RDONLY);
...

if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) {
int err = errno;
string errtxt = ErrorString(err);
if (fd >= 0)
close(fd);
return 0;
}
  ...
  }

kernel fs/proc/task_mmu.c:

  static ssize_t pagemap_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
  {
...
src = *ppos;
svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54
start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000
end_vaddr = mm->task_size;

/* watch out for wraparound */
// svpfn == 0xb400007662f54
// (mm->task_size >> PAGE) == 0x8000000
if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4
start_vaddr = end_vaddr;

ret = 0;
while (count && (start_vaddr < end_vaddr)) { // we cannot visit correct entry because start_vaddr is set to end_vaddr
int len;
unsigned long end;
...
}
...
  }

[1] https://lore.kernel.org/patchwork/patch/1343258/
[2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miles Chen <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Song Bao Hua (Barry Song) <[email protected]>
Cc: <[email protected]> [5.4-]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

revert "mm/filemap: add static for function __add_to_page_cache_locked"

Revert commit 3351b16af494 ("mm/filemap: add static for function
__add_to_page_cache_locked") due to incompatibility with
ALLOW_ERROR_INJECTION which result in build errors.

Link: https://lkml.kernel.org/r/CAADnVQJ6tmzBXvtroBuEH6QA0H+q7yaSKxrVvVxhqr3KBZdEXg@mail.gmail.com
Tested-by: Justin Forbes <[email protected]>
Tested-by: Greg Thelen <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Cc: Michal Kubecek <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>