Git Repo - linux.git/log

brcmfmac: split brcmf_attach() and brcmf_detach() functions

Move code allocating/freeing wiphy out of above functions. This will
allow reinitializing the driver (e.g. on some error) without allocating
a new wiphy.

Signed-off-by: Rafał Miłecki <[email protected]>
Acked-by: Arend van Spriel <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

brcmfmac: move "cfg80211_ops" pointer to another struct

This moves "ops" pointer from "struct brcmf_cfg80211_info" to the
"struct brcmf_pub". This movement makes it possible to allocate wiphy
without attaching cfg80211 (brcmf_cfg80211_attach()). It's required for
later separation of wiphy allocation and driver initialization.

While at it fix also an unlikely memory leak in the brcmf_attach().

Signed-off-by: Rafał Miłecki <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Fix error path of nf_tables_updobj(), from Dan Carpenter.

2) Move large structure away from stack in the nf_tables offload
   infrastructure, from Arnd Bergmann.

3) Move indirect flow_block logic to nf_tables_offload.

4) Support for synproxy objects, from Fernando Fernandez Mancera.

5) Support for fwd and dup offload.

6) Add __nft_offload_get_chain() helper, this implicitly fixes missing
   mutex and check for offload flags in the indirect block support,
   patch from wenxu.

7) Remove rules on device unregistration, from wenxu. This includes
   two preparation patches to reuse nft_flow_offload_chain() and
   nft_flow_offload_rule().

Large batch from Jeremy Sowden to make a second pass to the
CONFIG_HEADER_TEST support and a bit of housekeeping:

8) Missing include guard in conntrack label header, from Jeremy Sowden.

9) A few coding style errors: trailing whitespace, incorrect indent in
   Kconfig, and semicolons at the end of function definitions.

10) Remove unused ipt_init() and ip6t_init() declarations.

11) Inline xt_hashlimit, ebt_802_3 and xt_physdev headers. They are
    only used once.

12) Update include directive in several netfilter files.

13) Remove unused include/net/netfilter/ipv6/nf_conntrack_icmpv6.h.

14) Move nf_ip6_ext_hdr() to include/linux/netfilter_ipv6.h

15) Move several synproxy structure definitions to nf_synproxy.h

16) Move nf_bridge_frag_data structure to include/linux/netfilter_bridge.h

17) Clean up static inline definitions in nf_conntrack_ecache.h.

18) Replace defined(CONFIG...) || defined(CONFIG...MODULE) with IS_ENABLED(CONFIG...).

19) Missing inline function conditional definitions based on Kconfig
    preferences in synproxy and nf_conntrack_timeout.

20) Update br_nf_pre_routing_ipv6() definition.

21) Move conntrack code in linux/skbuff.h to nf_conntrack headers.

22) Several patches to remove superfluous CONFIG_NETFILTER and
    CONFIG_NF_CONNTRACK checks in headers, coming from the initial batch
    support for CONFIG_HEADER_TEST for netfilter.
====================

Signed-off-by: David S. Miller <[email protected]>

mmc: tmio: Fixup runtime PM management during remove

Accessing the device when it may be runtime suspended is a bug, which is
the case in tmio_mmc_host_remove(). Let's fix the behaviour.

Cc: [email protected]
Signed-off-by: Ulf Hansson <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>

mmc: tmio: Fixup runtime PM management during probe

The tmio_mmc_host_probe() calls pm_runtime_set_active() to update the
runtime PM status of the device, as to make it reflect the current status
of the HW. This works fine for most cases, but unfortunate not for all.
Especially, there is a generic problem when the device has a genpd attached
and that genpd have the ->start|stop() callbacks assigned.

More precisely, if the driver calls pm_runtime_set_active() during
->probe(), genpd does not get to invoke the ->start() callback for it,
which means the HW isn't really fully powered on. Furthermore, in the next
phase, when the device becomes runtime suspended, genpd will invoke the
->stop() callback for it, potentially leading to usage count imbalance
problems, depending on what's implemented behind the callbacks of course.

To fix this problem, convert to call pm_runtime_get_sync() from
tmio_mmc_host_probe() rather than pm_runtime_set_active(). Additionally, to
avoid bumping usage counters and unnecessary re-initializing the HW the
first time the tmio driver's ->runtime_resume() callback is called,
introduce a state flag to keeping track of this.

Cc: [email protected]
Signed-off-by: Ulf Hansson <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>

Revert "mmc: tmio: move runtime PM enablement to the driver implementations"

This reverts commit 7ff213193310ef8d0ee5f04f79d791210787ac2c.

It turns out that the above commit introduces other problems. For example,
calling pm_runtime_set_active() must not be done prior calling
pm_runtime_enable() as that makes it fail. This leads to additional
problems, such as clock enables being wrongly balanced.

Rather than fixing the problem on top, let's start over by doing a revert.

Fixes: 7ff213193310 ("mmc: tmio: move runtime PM enablement to the driver implementations")
Signed-off-by: Ulf Hansson <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>

netfilter: conntrack: remove two unused functions from nf_conntrack_timestamp.h.

Two inline functions defined in nf_conntrack_timestamp.h,
`nf_ct_tstamp_enabled` and `nf_ct_set_tstamp`, are not called anywhere.
Remove them.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: remove CONFIG_NF_CONNTRACK checks from nf_conntrack_zones.h.

nf_conntrack_zones.h was wrapped in a CONFIG_NF_CONNTRACK check in order
to fix compilation failures:

37ee3d5b3e97 ("netfilter: nf_defrag_ipv4: fix compilation error with NF_CONNTRACK=n")

Subsequent changes mean that these failures will no longer occur and the
check is unnecessary. Remove it.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: remove CONFIG_NETFILTER checks from headers.

`struct nf_hook_ops`, `struct nf_hook_state` and the `nf_hookfn`
function typedef appear in function and struct declarations and
definitions in a number of netfilter headers. The structs and typedef
themselves are defined by linux/netfilter.h but only when
CONFIG_NETFILTER is enabled. Define them unconditionally and add
forward declarations in order to remove CONFIG_NETFILTER conditionals
from the other headers.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: remove CONFIG_NF_CONNTRACK check from nf_conntrack_acct.h.

There is a superfluous `#if IS_ENABLED(CONFIG_NF_CONNTRACK)` check
wrapping some function declarations. Remove it.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: move code to linux/nf_conntrack_common.h.

Move some `struct nf_conntrack` code from linux/skbuff.h to
linux/nf_conntrack_common.h. Together with a couple of helpers for
getting and setting skb->_nfct, it allows us to remove
CONFIG_NF_CONNTRACK checks from net/netfilter/nf_conntrack.h.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: br_netfilter: update stub br_nf_pre_routing_ipv6 parameter to `void *priv`.

The real br_nf_pre_routing_ipv6 function, defined when CONFIG_IPV6 is
enabled, expects `void *priv`, not `const struct nf_hook_ops *ops`.
Update the stub br_nf_pre_routing_ipv6, defined when CONFIG_IPV6 is
disabled, to match.

Fixes: 06198b34a3e0 ("netfilter: Pass priv instead of nf_hook_ops to netfilter hooks")
Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: wrap two inline functions in config checks.

nf_conntrack_synproxy.h contains three inline functions.  The contents
of two of them are wrapped in CONFIG_NETFILTER_SYNPROXY checks and just
return NULL if it is not enabled.  The third does nothing if they return
NULL, so wrap its contents as well.

nf_ct_timeout_data is only called if CONFIG_NETFILTER_TIMEOUT is
enabled.  Wrap its contents in a CONFIG_NETFILTER_TIMEOUT check like the
other inline functions in nf_conntrack_timeout.h.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: replace defined(CONFIG...) || defined(CONFIG...MODULE) with IS_ENABLED(CONFIG...).

A few headers contain instances of:

#if defined(CONFIG_XXX) or defined(CONFIG_XXX_MODULE)

Replace them with:

#if IS_ENABLED(CONFIG_XXX)

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: use consistent style when defining inline functions

The header contains some inline functions defined as:

  static inline f (...)
  {
  #ifdef CONFIG_NF_CONNTRACK_EVENTS
    ...
  #else
    ...
  #endif
  }

and a few others as:

  #ifdef CONFIG_NF_CONNTRACK_EVENTS
  static inline f (...)
  {
    ...
  }
  #else
  static inline f (...)
  {
    ...
  }
  #endif

Prefer the former style, which is more numerous.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: move nf_bridge_frag_data struct definition to a more appropriate header.

There is a struct definition function in nf_conntrack_bridge.h which is
not specific to conntrack and is used elswhere in netfilter. Move it
into netfilter_bridge.h.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: synproxy: move code between headers.

There is some non-conntrack code in the nf_conntrack_synproxy.h header.
Move it to the nf_synproxy.h header.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: move inline nf_ip6_ext_hdr() function to a more appropriate header.

There is an inline function in ip6_tables.h which is not specific to
ip6tables and is used elswhere in netfilter. Move it into
netfilter_ipv6.h and update the callers.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: remove nf_conntrack_icmpv6.h header.

nf_conntrack_icmpv6.h contains two object macros which duplicate macros
in linux/icmpv6.h. The latter definitions are also visible wherever it
is included, so remove it.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: update include directives.

Include some headers in files which require them, and remove others
which are not required.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: inline xt_hashlimit, ebt_802_3 and xt_physdev headers

Three netfilter headers are only included once. Inline their contents
at those sites and remove them.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: ip_tables: remove unused function declarations.

Two headers include declarations of functions which are never defined.
Remove them.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: fix coding-style errors.

Several header-files, Kconfig files and Makefiles have trailing
white-space. Remove it.

In netfilter/Kconfig, indent the type of CONFIG_NETFILTER_NETLINK_ACCT
correctly.

There are semicolons at the end of two function definitions in
include/net/netfilter/nf_conntrack_acct.h and
include/net/netfilter/nf_conntrack_ecache.h. Remove them.

Fix indentation in nf_conntrack_l4proto.h.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: fix include guards.

nf_conntrack_labels.h has no include guard. Add it.

The comment following the #endif in the nf_flow_table.h include guard
referred to the wrong macro. Fix it.

Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables_offload: remove rules when the device unregisters

If the net_device unregisters, clean up the offload rules before the
chain is destroy.

Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge branch 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:
"Roman found and fixed a bug in the cgroup2 freezer which allows new
  child cgroup to escape frozen state"

* 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: freezer: fix frozen state inheritance
  kselftests: cgroup: add freezer mkdir test

Merge tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"Here are two fixes, one of them urgent fixing a bug introduced in 5.2
  and reported by many users. It took time to identify the root cause,
  catching the 5.3 release is higly desired also to push the fix to 5.2
  stable tree.

  The bug is a mess up of return values after adding proper error
  handling and honestly the kind of bug that can cause sleeping
  disorders until it's caught. My appologies to everybody who was
  affected.

  Summary of what could happen:

  1) either a hang when committing a transaction, if this happens
     there's no risk of corruption, still the hang is very inconvenient
     and can't be resolved without a reboot

  2) writeback for some btree nodes may never be started and we end up
     committing a transaction without noticing that, this is really
     serious and that will lead to the "parent transid verify failed"
     messages"

* tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
  Btrfs: fix assertion failure during fsync and use of stale transaction

netfilter: nf_tables_offload: refactor the nft_flow_offload_rule function

Pass rule, chain and flow_rule object parameters to nft_flow_offload_rule
to reuse it.

Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables_offload: refactor the nft_flow_offload_chain function

Pass chain and policy parameters to nft_flow_offload_chain to reuse it.

Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables_offload: add __nft_offload_get_chain function

Add __nft_offload_get_chain function to get basechain from device. This
function requires that caller holds the per-netns nftables mutex. This
patch implicitly fixes missing offload flags check and proper mutex from
nft_indr_block_cb().

Fixes: 9a32669fecfb ("netfilter: nf_tables_offload: support indr block call")
Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

cgroup: freezer: fix frozen state inheritance

If a new child cgroup is created in the frozen cgroup hierarchy
(one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup
flag should be set. Otherwise if a process will be attached to the
child cgroup, it won't become frozen.

The problem can be reproduced with the test_cgfreezer_mkdir test.

This is the output before this patch:
  ~/test_freezer
  ok 1 test_cgfreezer_simple
  ok 2 test_cgfreezer_tree
  ok 3 test_cgfreezer_forkbomb
  Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
  not ok 4 test_cgfreezer_mkdir
  ok 5 test_cgfreezer_rmdir
  ok 6 test_cgfreezer_migrate
  ok 7 test_cgfreezer_ptrace
  ok 8 test_cgfreezer_stopped
  ok 9 test_cgfreezer_ptraced
  ok 10 test_cgfreezer_vfork

And with this patch:
  ~/test_freezer
  ok 1 test_cgfreezer_simple
  ok 2 test_cgfreezer_tree
  ok 3 test_cgfreezer_forkbomb
  ok 4 test_cgfreezer_mkdir
  ok 5 test_cgfreezer_rmdir
  ok 6 test_cgfreezer_migrate
  ok 7 test_cgfreezer_ptrace
  ok 8 test_cgfreezer_stopped
  ok 9 test_cgfreezer_ptraced
  ok 10 test_cgfreezer_vfork

Reported-by: Mark Crossen <[email protected]>
Signed-off-by: Roman Gushchin <[email protected]>
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Cc: Tejun Heo <[email protected]>
Cc: [email protected] # v5.2+
Signed-off-by: Tejun Heo <[email protected]>

kselftests: cgroup: add freezer mkdir test

Add a new cgroup freezer selftest, which checks that if a cgroup is
frozen, their new child cgroups will properly inherit the frozen
state.

It creates a parent cgroup, freezes it, creates a child cgroup
and populates it with a dummy process. Then it checks that both
parent and child cgroup are frozen.

Signed-off-by: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

ice: Bump version

Bump version to 0.8.1-k

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ice: Enable DDP package download

Attempt to request an optional device-specific DDP package file
(one with the PCIe Device Serial Number in its name so that different DDP
package files can be used on different devices). If the optional package
file exists, download it to the device. If not, download the default
package file.

Log an appropriate message based on whether or not a DDP package
file exists and the return code from the attempt to download it to the
device. If the download fails and there is not already a package file on
the device, go into "Safe Mode" where some features are not supported.

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ice: Initialize DDP package structures

Add functions to initialize, parse, and clean structures representing
the DDP package.

Upon completion of package download, read and store the DDP package
contents to these structures. This configuration is used to
identify the default behavior and later used to update the HW table
entries.

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ice: Implement Dynamic Device Personalization (DDP) download

Add the required defines, structures, and functions to enable downloading
a DDP package. Before download, checks are performed to ensure the package
is valid and compatible.

Note that package download is not yet requested by the driver as further
initialization is required to utilize the package.

Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ice: Fix FW version formatting in dmesg

The FW build id is currently being displayed as an int which doesn't make
sense. Instead display FW build id as a hex value. Also add other useful
information to the output such as NVM version, API patch info, and FW
build hash.

Signed-off-by: Lukasz Czapnik <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ice: send driver version to firmware

The driver is required to send a version to the firmware
to indicate that the driver is up. If the driver doesn't
do this the firmware doesn't behave properly.

Signed-off-by: Paul M Stillwell Jr <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

wil6210: ignore reset errors for FW during probe

There are special kinds of FW such as WMI only which
are used for testing, diagnostics and other specific
scenario.
Such FW is loaded during driver probe and the driver
disallows enabling any network interface, to avoid
operational issues.
In many cases it is used to debug early versions
of FW with new features, which sometimes fail
on startup.
Currently when such FW fails to load (for example,
because of init failure), the driver probe would fail
and shutdown the device making it difficult to debug
the early failure.
To fix this, ignore load failures in WMI only FW and
allow driver probe to succeed, making it possible to
continue and debug the FW load failure.

Signed-off-by: Lior David <[email protected]>
Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

wil6210: fix RX short frame check

The short frame check in wil_sring_reap_rx_edma uses
skb->len which store the maximum frame length. Fix
this to use dmalen which is the actual length of
the received frame.

Signed-off-by: Lior David <[email protected]>
Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

wil6210: use writel_relaxed in wil_debugfs_iomem_x32_set

writel_relaxed can be used in wil_debugfs_iomem_x32_set
since there is a wmb call immediately after.

Signed-off-by: Lior David <[email protected]>
Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

wil6210: report boottime_ns in scan results

Call cfg80211_inform_bss_frame_data to report cfg80211 on the
boottime_ns in order to prevent the scan results filtering due to
aging.

Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

wil6210: properly initialize discovery_expired_work

Upon driver rmmod, cancel_work_sync() can be invoked on
p2p.discovery_expired_work before this work struct was initialized.
This causes a WARN_ON with newer kernel version.

Add initialization of discovery_expired_work inside wil_vif_init().

Signed-off-by: Dedy Lansky <[email protected]>
Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

wil6210: verify cid value is valid

cid value is not being verified in wmi_evt_delba(),
verification is added.

Signed-off-by: Alexei Avshalom Lazar <[email protected]>
Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

wil6210: make sure DR bit is read before rest of the status message

Due to compiler optimization, it's possible that dr_bit (descriptor
ready) is read last from the status message.
Due to race condition between HW writing the status message and
driver reading it, other fields that were read earlier (before dr_bit)
could have invalid values.

Fix this by explicitly reading the dr_bit first and then using rmb
before reading the rest of the status message.

Signed-off-by: Dedy Lansky <[email protected]>
Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

wil6210: fix PTK re-key race

Fix a race between cfg80211 add_key call and transmitting of 4/4 EAP
packet. In case the transmit is delayed until after the add key takes
place, message 4/4 will be encrypted with the new key, and the
receiver side (AP) will drop it due to MIC error.

Wil6210 will monitor and look for the transmitted packet 4/4 eap key.
In case add_key takes place before the transmission completed, then
wil6210 will let the FW store the key and wil6210 will notify the FW
to use the PTK key only after 4/4 eap packet transmission was
completed.

Signed-off-by: Ahmad Masri <[email protected]>
Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

wil6210: add debugfs to show PMC ring content

PMC is a hardware debug mechanism which allows capturing real time
debug data and stream it to host memory. The driver allocates memory
buffers and set them inside PMC ring of descriptors.
Add pmcring debugfs that application can use to read the binary
content of descriptors inside the PMC ring (cat pmcring).

Signed-off-by: Dedy Lansky <[email protected]>
Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

wil6210: add wil_netif_rx() helper function

Move common part of wil_netif_rx_any into new helper function and add
support for non-gro receive using netif_rx_ni.

Signed-off-by: Dedy Lansky <[email protected]>
Signed-off-by: Maya Erez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

ath10k: fix channel info parsing for non tlv target

The tlv targets such as WCN3990 send more data in the chan info event, which is
not sent by the non tlv targets. There is a minimum size check in the wmi event
for non-tlv targets and hence we cannot update the common channel info
structure as it was done in commit 13104929d2ec ("ath10k: fill the channel
survey results for WCN3990 correctly"). This broke channel survey results on
10.x firmware versions.

If the common channel info structure is updated, the size check for chan info
event for non-tlv targets will fail and return -EPROTO and we see the below
error messages

   ath10k_pci 0000:01:00.0: failed to parse chan info event: -71

Add tlv specific channel info structure and restore the original size of the
common channel info structure to mitigate this issue.

Tested HW: WCN3990
   QCA9887
Tested FW: WLAN.HL.3.1-00784-QCAHLSWMTPLZ-1
   10.2.4-1.0-00037

Fixes: 13104929d2ec ("ath10k: fill the channel survey results for WCN3990 correctly")
Cc: [email protected] # 5.0
Signed-off-by: Rakesh Pillai <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

ath10k: adjust skb length in ath10k_sdio_mbox_rx_packet

When the FW bundles multiple packets, pkt->act_len may be incorrect
as it refers to the first packet only (however, the FW will only
bundle packets that fit into the same pkt->alloc_len).

Before this patch, the skb length would be set (incorrectly) to
pkt->act_len in ath10k_sdio_mbox_rx_packet, and then later manually
adjusted in ath10k_sdio_mbox_rx_process_packet.

The first problem is that ath10k_sdio_mbox_rx_process_packet does not
use proper skb_put commands to adjust the length (it directly changes
skb->len), so we end up with a mismatch between skb->head + skb->tail
and skb->data + skb->len. This is quite serious, and causes corruptions
in the TCP stack, as the stack tries to coalesce packets, and relies
on skb->tail being correct (that is, skb_tail_pointer must point to
the first byte_after_ the data).

Instead of re-adjusting the size in ath10k_sdio_mbox_rx_process_packet,
this moves the code to ath10k_sdio_mbox_rx_packet, and also add a
bounds check, as skb_put would crash the kernel if not enough space is
available.

Tested with QCA6174 SDIO with firmware
WLAN.RMH.4.4.1-00007-QCARMSWP-1.

Fixes: 8530b4e7b22bc3b ("ath10k: sdio: set skb len for all rx packets")
Signed-off-by: Nicolas Boichat <[email protected]>
Signed-off-by: Wen Gong <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

ath10k: free beacon buf later in vdev teardown

My wave-1 firmware often crashes when I am bringing down
AP vdevs, and sometimes at least some machines lockup hard
after spewing IOMMU errors.

I don't see the same issue in STA mode, so I suspect beacons
are the issue.

Moving the beacon buf deletion to later in the vdev teardown
logic appears to help this problem. Firmware still crashes
often, but several iterations did not show IOMMU errors and
machine didn't hang.

Tested hardware: QCA9880
Tested firmware: ath10k-ct from beginning of 2019, exact version unknown

Signed-off-by: Ben Greear <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()"

The userptr put_pages can be called from inside try_to_unmap, and so
enters with the page lock held on one of the object's backing pages. We
cannot take the page lock ourselves for fear of recursion.

Reported-by: Lionel Landwerlin <[email protected]>
Reported-by: Martin Wilck <[email protected]>
Reported-by: Leo Kraav <[email protected]>
Fixes: aa56a292ce62 ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
References: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'for-linus-20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux

Pull clone3 fix from Christian Brauner:
"This is a last-minute bugfix for clone3() that should go in before we
  release 5.3 with clone3().

  clone3() did not verify that the exit_signal argument was set to a
  valid signal. This can be used to cause a crash by specifying a signal
  greater than NSIG. e.g. -1.

  The commit from Eugene adds a check to copy_clone_args_from_user() to
  verify that the exit signal is limited by CSIGNAL as with legacy
  clone() and that the signal is valid. With this we don't get the
  legacy clone behavior were an invalid signal could be handed down and
  would only be detected and then ignored in do_notify_parent(). Users
  of clone3() will now get a proper error right when they pass an
  invalid exit signal. Note, that this is not a change in user-visible
  behavior since no kernel with clone3() has been released yet"

* tag 'for-linus-20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
  fork: block invalid exit signals with clone3()

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"A KVM guest fix, and a kdump kernel relocation errors fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/timer: Force PIT initialization when !X86_FEATURE_ARAT
x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors

Merge tag 'drm-misc-fixes-2019-09-12' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v5.3 final:
- Constify modes whitelist harder.
- Fix lima driver gem_wait ioctl.

Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-intel-fixes-2019-09-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

Final drm/i915 fixes for v5.3:
- Fox DP MST high color depth regression
- Fix GPU hangs on Vulkan compute workloads

Signed-off-by: Dave Airlie <[email protected]>
From: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

fork: block invalid exit signals with clone3()

Previously, higher 32 bits of exit_signal fields were lost when copied
to the kernel args structure (that uses int as a type for the respective
field). Moreover, as Oleg has noted, exit_signal is used unchecked, so
it has to be checked for sanity before use; for the legacy syscalls,
applying CSIGNAL mask guarantees that it is at least non-negative;
however, there's no such thing is done in clone3() code path, and that
can break at least thread_group_leader.

This commit adds a check to copy_clone_args_from_user() to verify that
the exit signal is limited by CSIGNAL as with legacy clone() and that
the signal is valid. With this we don't get the legacy clone behavior
were an invalid signal could be handed down and would only be detected
and ignored in do_notify_parent(). Users of clone3() will now get a
proper error when they pass an invalid exit signal. Note, that this is
not user-visible behavior since no kernel with clone3() has been
released yet.

The following program will cause a splat on a non-fixed clone3() version
and will fail correctly on a fixed version:

#define _GNU_SOURCE
#include <linux/sched.h>
#include <linux/types.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
        pid_t pid = -1;
        struct clone_args args = {0};
        args.exit_signal = -1;

        pid = syscall(__NR_clone3, &args, sizeof(struct clone_args));
        if (pid < 0)
                exit(EXIT_FAILURE);

        if (pid == 0)
                exit(EXIT_SUCCESS);

        wait(NULL);

        exit(EXIT_SUCCESS);
}

Fixes: 7f192e3cd316 ("fork: add clone3")
Reported-by: Oleg Nesterov <[email protected]>
Suggested-by: Oleg Nesterov <[email protected]>
Suggested-by: Dmitry V. Levin <[email protected]>
Signed-off-by: Eugene Syromiatnikov <[email protected]>
Link: https://lore.kernel.org/r/4b38fa4ce420b119a4c6345f42fe3cec2de9b0b5.1568223594.git.esyr@redhat.com
[[email protected]: simplify check and rework commit message]
Signed-off-by: Christian Brauner <[email protected]>

KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl

When the userspace program runs the KVM_S390_INTERRUPT ioctl to inject
an interrupt, we convert them from the legacy struct kvm_s390_interrupt
to the new struct kvm_s390_irq via the s390int_to_s390irq() function.
However, this function does not take care of all types of interrupts
that we can inject into the guest later (see do_inject_vcpu()). Since we
do not clear out the s390irq values before calling s390int_to_s390irq(),
there is a chance that we copy random data from the kernel stack which
could be leaked to the userspace later.

Specifically, the problem exists with the KVM_S390_INT_PFAULT_INIT
interrupt: s390int_to_s390irq() does not handle it, and the function
__inject_pfault_init() later copies irq->u.ext which contains the
random kernel stack data. This data can then be leaked either to
the guest memory in __deliver_pfault_init(), or the userspace might
retrieve it directly with the KVM_S390_GET_IRQ_STATE ioctl.

Fix it by handling that interrupt type in s390int_to_s390irq(), too,
and by making sure that the s390irq struct is properly pre-initialized.
And while we're at it, make sure that s390int_to_s390irq() now
directly returns -EINVAL for unknown interrupt types, so that we
immediately get a proper error code in case we add more interrupt
types to do_inject_vcpu() without updating s390int_to_s390irq()
sometime in the future.

Cc: [email protected]
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
Link: https://lore.kernel.org/kvm/[email protected]
Signed-off-by: Christian Borntraeger <[email protected]>

sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'

The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.

Fixes: 8e2d61e0aed2 ("sctp: fix race on protocol/netns initialization")
Signed-off-by: Christophe JAILLET <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4: Fix spelling typos

Fix several spelling typos in comments in t4_hw.c.

Signed-off-by: Arkadiusz Drabczyk <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ixgbe: Fix secpath usage for IPsec TX offload.

The ixgbe driver currently does IPsec TX offloading
based on an existing secpath. However, the secpath
can also come from the RX side, in this case it is
misinterpreted for TX offload and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for TX offload.

Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
Reported-by: Michael Marley <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Btrfs: fix unwritten extent buffers and hangs on future writeback attempts

The lock_extent_buffer_io() returns 1 to the caller to tell it everything
went fine and the callers needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller everything went fine but it does
not need to start writeback for the extent buffer, and a negative value if
some error happened.

When it's about to return 1 it tries to lock all pages, and if a try lock
on a page fails, and we didn't flush any existing bio in our "epd", it
calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by some code other
than btrfs, like page migration for example, so it does not mean the
writeback of the extent buffer was already started by some other task,
so returning a 0 tells the caller (btree_write_cache_pages()) to not
start writeback for the extent buffer. Note that epd might currently have
either no bio, so flush_write_bio() returns 0 (success) or it might have
a bio for another extent buffer with a lower index (logical address).

Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to writeback the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such hang is reported with a trace like the following:

  [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
  [49887.347059]       Not tainted 5.2.13-gentoo #2
  [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [49887.347062] btrfs-transacti D    0  1752      2 0x80004000
  [49887.347064] Call Trace:
  [49887.347069]  ? __schedule+0x265/0x830
  [49887.347071]  ? bit_wait+0x50/0x50
  [49887.347072]  ? bit_wait+0x50/0x50
  [49887.347074]  schedule+0x24/0x90
  [49887.347075]  io_schedule+0x3c/0x60
  [49887.347077]  bit_wait_io+0x8/0x50
  [49887.347079]  __wait_on_bit+0x6c/0x80
  [49887.347081]  ? __lock_release.isra.29+0x155/0x2d0
  [49887.347083]  out_of_line_wait_on_bit+0x7b/0x80
  [49887.347084]  ? var_wake_function+0x20/0x20
  [49887.347087]  lock_extent_buffer_for_io+0x28c/0x390
  [49887.347089]  btree_write_cache_pages+0x18e/0x340
  [49887.347091]  do_writepages+0x29/0xb0
  [49887.347093]  ? kmem_cache_free+0x132/0x160
  [49887.347095]  ? convert_extent_bit+0x544/0x680
  [49887.347097]  filemap_fdatawrite_range+0x70/0x90
  [49887.347099]  btrfs_write_marked_extents+0x53/0x120
  [49887.347100]  btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
  [49887.347102]  btrfs_commit_transaction+0x6bb/0x990
  [49887.347103]  ? start_transaction+0x33e/0x500
  [49887.347105]  transaction_kthread+0x139/0x15c

So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, otherwise it will hang
any future attempts to writeback the extent buffer, and undo all work
done before (set back EXTENT_BUFFER_DIRTY, etc).

This is a regression introduced in the 5.2 kernel.

Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%[email protected]/T/#u
Reported-by: Stefan Priebe - Profihost AG <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/[email protected]/T/#t
Reported-by: Drazen Kacar <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>

Btrfs: fix assertion failure during fsync and use of stale transaction

Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.

That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.

In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:

1) evict_refill_and_join() will often change the transaction handle's
   block reserve (->block_rsv) and set its ->bytes_reserved field to a
   value greater than 0. If evict_refill_and_join() never commits the
   transaction, the eviction handler ends up decreasing the reference
   count (->use_count) of the transaction handle through the call to
   btrfs_end_transaction(), and after that point we have a transaction
   handle with a NULL ->block_rsv (which is the value prior to the
   transaction join from evict_refill_and_join()) and a ->bytes_reserved
   value greater than 0. If after the eviction/iput completes the inode
   logging path hits an error or it decides that it must fallback to a
   transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
   non-zero value from btrfs_log_dentry_safe(), and because of that
   non-zero value it tries to commit the transaction using a handle with
   a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
   the transaction commit hit an assertion failure at
   btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
   the ->block_rsv is NULL. The produced stack trace for that is like the
   following:

   [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
   [192922.917553] ------------[ cut here ]------------
   [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
   [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
   [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
   [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
   [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
   (...)
   [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
   [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
   [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
   [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
   [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
   [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
   [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
   [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
   [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [192922.925105] Call Trace:
   [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
   [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
   [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
   [192922.926731]  do_fsync+0x38/0x60
   [192922.927138]  __x64_sys_fdatasync+0x13/0x20
   [192922.927543]  do_syscall_64+0x60/0x1c0
   [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
   (...)
   [192922.934077] ---[ end trace f00808b12068168f ]---

2) If evict_refill_and_join() decides to commit the transaction, it will
   be able to do it, since the nested transaction join only increments the
   transaction handle's ->use_count reference counter and it does not
   prevent the transaction from getting committed. This means that after
   eviction completes, the fsync logging path will be using a transaction
   handle that refers to an already committed transaction. What happens
   when using such a stale transaction can be unpredictable, we are at
   least having a use-after-free on the transaction handle itself, since
   the transaction commit will call kmem_cache_free() against the handle
   regardless of its ->use_count value, or we can end up silently losing
   all the updates to the log tree after that iput() in the logging path,
   or using a transaction handle that in the meanwhile was allocated to
   another task for a new transaction, etc, pretty much unpredictable
   what can happen.

In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).

The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.

CC: [email protected] # 4.4+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>

KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()

If userspace doesn't set KVM_MEM_LOG_DIRTY_PAGES on memslot before calling
kvm_s390_vm_start_migration(), kernel will oops with:

  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0000000000000000 TEID: 0000000000000483
  Fault in home space mode while using kernel ASCE.
  AS:0000000002a2000b R2:00000001bff8c00b R3:00000001bff88007 S:00000001bff91000 P:000000000000003d
  Oops: 0004 ilc:2 [#1] SMP
  ...
  Call Trace:
  ([<001fffff804ec552>] kvm_s390_vm_set_attr+0x347a/0x3828 [kvm])
   [<001fffff804ecfc0>] kvm_arch_vm_ioctl+0x6c0/0x1998 [kvm]
   [<001fffff804b67e4>] kvm_vm_ioctl+0x51c/0x11a8 [kvm]
   [<00000000008ba572>] do_vfs_ioctl+0x1d2/0xe58
   [<00000000008bb284>] ksys_ioctl+0x8c/0xb8
   [<00000000008bb2e2>] sys_ioctl+0x32/0x40
   [<000000000175552c>] system_call+0x2b8/0x2d8
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   [<0000000000dbaf60>] __memset+0xc/0xa0

due to ms->dirty_bitmap being NULL, which might crash the host.

Make sure that ms->dirty_bitmap is set before using it or
return -EINVAL otherwise.

Cc: <[email protected]>
Fixes: afdad61615cc ("KVM: s390: Fix storage attributes migration with memory slots")
Signed-off-by: Igor Mammedov <[email protected]>
Link: https://lore.kernel.org/kvm/[email protected]/
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Signed-off-by: Janosch Frank <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

net: qrtr: fix memort leak in qrtr_tun_write_iter

In qrtr_tun_write_iter the allocated kbuf should be release in case of
error or success return.

v2 Update: Thanks to David Miller for pointing out the release on success
path as well.

Signed-off-by: Navid Emamdoost <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix null de-reference of device refcount

In event of failure during register_netdevice, free_netdev is
invoked immediately. free_netdev assumes that all the netdevice
refcounts have been dropped prior to it being called and as a
result frees and clears out the refcount pointer.

However, this is not necessarily true as some of the operations
in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for
invocation after a grace period. The IPv4 callback in_dev_rcu_put
tries to access the refcount after free_netdev is called which
leads to a null de-reference-

44837.761523:   <6> Unable to handle kernel paging request at
                    virtual address 0000004a88287000
44837.761651:   <2> pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654:   <2> lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393:   <2> Call trace:
44837.762398:   <2>  in_dev_finish_destroy+0x4c/0xc8
44837.762404:   <2>  in_dev_rcu_put+0x24/0x30
44837.762412:   <2>  rcu_nocb_kthread+0x43c/0x468
44837.762418:   <2>  kthread+0x118/0x128
44837.762424:   <2>  ret_from_fork+0x10/0x1c

Fix this by waiting for the completion of the call_rcu() in
case of register_netdevice errors.

Fixes: 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
Cc: Sean Tranchetti <[email protected]>
Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'add-ksz9567-with-I2C-support-to-ksz9477-driver'

George McCollister says:

====================
add ksz9567 with I2C support to ksz9477 driver

Resurrect KSZ9477 I2C driver support patch originally sent to the list
by Tristram Ha and resolve outstanding issues. It now works as similarly to
the ksz9477 SPI driver as possible, using the same regmap macros.

Add support for ksz9567 to the ksz9477 driver (tested on a board with
ksz9567 connected via I2C).

Remove NET_DSA_TAG_KSZ_COMMON since it's not needed.

Changes since v1:
Put ksz9477_i2c.c includes in alphabetical order.
Added Reviewed-Bys.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: microchip: remove NET_DSA_TAG_KSZ_COMMON

Remove the superfluous NET_DSA_TAG_KSZ_COMMON and just use the existing
NET_DSA_TAG_KSZ. Update the description to mention the three switch
families it supports. No functional change.

Signed-off-by: George McCollister <[email protected]>
Reviewed-by: Marek Vasut <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: microchip: add ksz9567 to ksz9477 driver

Add support for the KSZ9567 7-Port Gigabit Ethernet Switch to the
ksz9477 driver. The KSZ9567 supports both SPI and I2C. Oddly the
ksz9567 is already in the device tree binding documentation.

Signed-off-by: George McCollister <[email protected]>
Reviewed-by: Marek Vasut <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: microchip: add KSZ9477 I2C driver

Add KSZ9477 I2C driver support. The code ksz9477.c and ksz_common.c are
used together to generate the I2C driver.

Signed-off-by: Tristram Ha <[email protected]>
[[email protected]: bring up to date, use ksz_common regmap macros]
Signed-off-by: George McCollister <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'

The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.

Fixes: d862e5461423 ("net: ipv6: Implement /proc/net/icmp6.")
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tun: fix use-after-free when register netdev failed

I got a UAF repport in tun driver when doing fuzzy test:

[  466.269490] ==================================================================
[  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
[  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
[  466.271810]
[  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
[  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[  466.271838] Call Trace:
[  466.271858]  dump_stack+0xca/0x13e
[  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271890]  print_address_description+0x79/0x440
[  466.271906]  ? vprintk_func+0x5e/0xf0
[  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271935]  __kasan_report+0x15c/0x1df
[  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271976]  kasan_report+0xe/0x20
[  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
[  466.272013]  do_iter_readv_writev+0x4b7/0x740
[  466.272032]  ? default_llseek+0x2d0/0x2d0
[  466.272072]  do_iter_read+0x1c5/0x5e0
[  466.272110]  vfs_readv+0x108/0x180
[  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
[  466.299020]  ? fsnotify+0x888/0xd50
[  466.299040]  ? __fsnotify_parent+0xd0/0x350
[  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
[  466.304548]  ? vfs_write+0x264/0x510
[  466.304569]  ? ksys_write+0x101/0x210
[  466.304591]  ? do_preadv+0x116/0x1a0
[  466.304609]  do_preadv+0x116/0x1a0
[  466.309829]  do_syscall_64+0xc8/0x600
[  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.309861] RIP: 0033:0x4560f9
[  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
[  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
[  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
[  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
[  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
[  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
[  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
[  466.323057]
[  466.323064] Allocated by task 2605:
[  466.335165]  save_stack+0x19/0x80
[  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
[  466.337755]  kmem_cache_alloc+0xe8/0x320
[  466.339050]  getname_flags+0xca/0x560
[  466.340229]  user_path_at_empty+0x2c/0x50
[  466.341508]  vfs_statx+0xe6/0x190
[  466.342619]  __do_sys_newstat+0x81/0x100
[  466.343908]  do_syscall_64+0xc8/0x600
[  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.347034]
[  466.347517] Freed by task 2605:
[  466.348471]  save_stack+0x19/0x80
[  466.349476]  __kasan_slab_free+0x12e/0x180
[  466.350726]  kmem_cache_free+0xc8/0x430
[  466.351874]  putname+0xe2/0x120
[  466.352921]  filename_lookup+0x257/0x3e0
[  466.354319]  vfs_statx+0xe6/0x190
[  466.355498]  __do_sys_newstat+0x81/0x100
[  466.356889]  do_syscall_64+0xc8/0x600
[  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.359567]
[  466.360050] The buggy address belongs to the object at ffff888372139100
[  466.360050]  which belongs to the cache names_cache of size 4096
[  466.363735] The buggy address is located 336 bytes inside of
[  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
[  466.367179] The buggy address belongs to the page:
[  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
[  466.371582] flags: 0x2fffff80010200(slab|head)
[  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
[  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[  466.377778] page dumped because: kasan: bad access detected
[  466.379730]
[  466.380288] Memory state around the buggy address:
[  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.388257]                                                  ^
[  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.394667] ==================================================================

tun_chr_read_iter() accessed the memory which freed by free_netdev()
called by tun_set_iff():

        CPUA                                           CPUB
  tun_set_iff()
    alloc_netdev_mqs()
    tun_attach()
                                                  tun_chr_read_iter()
                                                    tun_get()
                                                    tun_do_read()
                                                      tun_ring_recv()
    register_netdevice() <-- inject error
    goto err_detach
    tun_detach_all() <-- set RCV_SHUTDOWN
    free_netdev() <-- called from
                     err_free_dev path
      netdev_freemem() <-- free the memory
                        without check refcount
      (In this path, the refcount cannot prevent
       freeing the memory of dev, and the memory
       will be used by dev_put() called by
       tun_chr_read_iter() on CPUB.)
                                                     (Break from tun_ring_recv(),
                                                     because RCV_SHUTDOWN is set)
                                                   tun_put()
                                                     dev_put() <-- use the memory
                                                                   freed by netdev_freemem()

Put the publishing of tfile->tun after register_netdevice(),
so tun_get() won't get the tun pointer that freed by
err_detach path if register_netdevice() failed.

Fixes: eb0fb363f920 ("tuntap: attach queue 0 before registering netdevice")
Reported-by: Hulk Robot <[email protected]>
Suggested-by: Jason Wang <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"Last minute bugfixes.

  A couple of security things.

  And an error handling bugfix that is never encountered by most people,
  but that also makes it kind of safe to push at the last minute, and it
  helps push the fix to stable a bit sooner"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: make sure log_num < in_num
  vhost: block speculation of translated descriptors
  virtio_ring: fix unmap of indirect descriptors

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
"Fix an initialization bug in the hw-breakpoints, which triggered on
the ARM platform"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
"Fix a race in the IRQ resend mechanism, which can result in a NULL
dereference crash"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent NULL pointer dereference in resend_irqs()

Merge tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fix from Linus Walleij:
"Hopefully last pin control fix: a single patch for some Aspeed
problems. The BMCs are much happier now"

* tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: aspeed: Fix spurious mux failures on the AST2500

Merge tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"I don't really like to send so many fixes at the very last minute, but
  the bug-sport activity is unpredictable.

  Four fixes, three are -stable material that will go everywhere, one is
  for the current cycle:

   - An ACPI DSDT error fixup of the type we always see and Hans
     invariably gets to fix.

   - A OF quirk fix for the current release (v5.3)

   - Some consistency checks on the userspace ABI.

   - A memory leak"

* tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist
  gpiolib: of: fix fallback quirks handling
  gpio: fix line flag validation in lineevent_create
  gpio: fix line flag validation in linehandle_create
  gpio: mockup: add missing single_release()

pinctrl: aspeed: Fix spurious mux failures on the AST2500

Commit 674fa8daa8c9 ("pinctrl: aspeed-g5: Delay acquisition of regmaps")
was determined to be a partial fix to the problem of acquiring the LPC
Host Controller and GFX regmaps: The AST2500 pin controller may need to
fetch syscon regmaps during expression evaluation as well as when
setting mux state. For example, this case is hit by attempting to export
pins exposing the LPC Host Controller as GPIOs.

An optional eval() hook is added to the Aspeed pinmux operation struct
and called from aspeed_sig_expr_eval() if the pointer is set by the
SoC-specific driver. This enables the AST2500 to perform the custom
action of acquiring its regmap dependencies as required.

John Wang tested the fix on an Inspur FP5280G2 machine (AST2500-based)
where the issue was found, and I've booted the fix on Witherspoon
(AST2500) and Palmetto (AST2400) machines, and poked at relevant pins
under QEMU by forcing mux configurations via devmem before exporting
GPIOs to exercise the driver.

Fixes: 7d29ed88acbb ("pinctrl: aspeed: Read and write bits in LPC and GFX controllers")
Fixes: 674fa8daa8c9 ("pinctrl: aspeed-g5: Delay acquisition of regmaps")
Reported-by: John Wang <[email protected]>
Tested-by: John Wang <[email protected]>
Signed-off-by: Andrew Jeffery <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2019-09-11

This series contains fixes to ixgbe.

Alex fixes up the adaptive ITR scheme for ixgbe which could result in a
value that was either 0 or something less than 10 which was causing
issues with hardware features, like RSC, that do not function well with
ITR values that low.

Ilya Maximets fixes the ixgbe driver to limit the number of transmit
descriptors to clean by the number of transmit descriptors used in the
transmit ring, so that the driver does not try to "double" clean the
same descriptors.
====================

Signed-off-by: David S. Miller <[email protected]>

nfp: read chip model from the PluDevice register

The PluDevice register provides the authoritative chip model/revision.

Since the model number is purely used for reporting purposes, follow
the hardware team convention of subtracting 0x10 from the PluDevice
register to obtain the chip model/revision number.

Suggested-by: Francois H. Theron <[email protected]>
Signed-off-by: Dirk van der Merwe <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: force a PSH flag on TSO packets

When tcp sends a TSO packet, adding a PSH flag on it
reduces the sojourn time of GRO packet in GRO receivers.

This is particularly the case under pressure, since RX queues
receive packets for many concurrent flows.

A sender can give a hint to GRO engines when it is
appropriate to flush a super-packet, especially when pacing
is in the picture, since next packet is probably delayed by
one ms.

Having less packets in GRO engine reduces chance
of LRU eviction or inflated RTT, and reduces GRO cost.

We found recently that we must not set the PSH flag on
individual full-size MSS segments [1] :

Under pressure (CWR state), we better let the packet sit
for a small delay (depending on NAPI logic) so that the
ACK packet is delayed, and thus next packet we send is
also delayed a bit. Eventually the bottleneck queue can
be drained. DCTCP flows with CWND=1 have demonstrated
the issue.

This patch allows to slowdown the aggregate traffic without
involving high resolution timers on senders and/or
receivers.

It has been used at Google for about four years,
and has been discussed at various networking conferences.

[1] segments smaller than MSS already have PSH flag set
by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE
has been requested by the user.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Soheil Hassas Yeganeh <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Tariq Toukan <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR

Fix tcp_ecn_withdraw_cwr() to clear the correct bit:
TCP_ECN_QUEUE_CWR.

Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about
the behavior of data receivers, and deciding whether to reflect
incoming IP ECN CE marks as outgoing TCP th->ece marks. The
TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders,
and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function
is only called from tcp_undo_cwnd_reduction() by data senders during
an undo, so it should zero the sender-side state,
TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of
incoming CE bits on incoming data packets just because outgoing
packets were spuriously retransmitted.

The bug has been reproduced with packetdrill to manifest in a scenario
with RFC3168 ECN, with an incoming data packet with CE bit set and
carrying a TCP timestamp value that causes cwnd undo. Before this fix,
the IP CE bit was ignored and not reflected in the TCP ECE header bit,
and sender sent a TCP CWR ('W') bit on the next outgoing data packet,
even though the cwnd reduction had been undone. After this fix, the
sender properly reflects the CE bit and does not set the W bit.

Note: the bug actually predates 2005 git history; this Fixes footer is
chosen to be the oldest SHA1 I have tested (from Sep 2007) for which
the patch applies cleanly (since before this commit the code was in a
.h file).

Fixes: bdf1ee5d3bd3 ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it")
Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Don't use dst gateway directly in ip6_confirm_neigh()

This is the equivalent of commit 2c6b55f45d53 ("ipv6: fix neighbour
resolution with raw socket") for ip6_confirm_neigh(): we can send a
packet with MSG_CONFIRM on a raw socket for a connected route, so the
gateway would be :: here, and we should pick the next hop using
rt6_nexthop() instead.

This was found by code review and, to the best of my knowledge, doesn't
actually fix a practical issue: the destination address from the packet
is not considered while confirming a neighbour, as ip6_confirm_neigh()
calls choose_neigh_daddr() without passing the packet, so there are no
similar issues as the one fixed by said commit.

A possible source of issues with the existing implementation might come
from the fact that, if we have a cached dst, we won't consider it,
while rt6_nexthop() takes care of that. I might just not be creative
enough to find a practical problem here: the only way to affect this
with cached routes is to have one coming from an ICMPv6 redirect, but
if the next hop is a directly connected host, there should be no
topology for which a redirect applies here, and tests with redirected
routes show no differences for MSG_CONFIRM (and MSG_PROBE) packets on
raw sockets destined to a directly connected host.

However, directly using the dst gateway here is not consistent anymore
with neighbour resolution, and, in general, as we want the next hop,
using rt6_nexthop() looks like the only sane way to fetch it.

Reported-by: Guillaume Nault <[email protected]>
Signed-off-by: Stefano Brivio <[email protected]>
Acked-by: Guillaume Nault <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: pci: Add HAPS support using GMAC5

Add the support for Synopsys HAPS board that uses GMAC5.

Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: dp83867: Add SGMII mode type switching

This patch adds ability to switch beetween two PHY SGMII modes.
Some hardware, for example, FPGA IP designs may use 6-wire mode
which enables differential SGMII clock to MAC.

Signed-off-by: Vitaly Gaiduk <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: dp83867: Add documentation for SGMII mode type

Add documentation of ti,sgmii-ref-clock-output-enable
which can be used to select SGMII mode type (4 or 6-wire).

Signed-off-by: Vitaly Gaiduk <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vhost: make sure log_num < in_num

The code assumes log_num < in_num everywhere, and that is true as long as
in_num is incremented by descriptor iov count, and log_num by 1. However
this breaks if there's a zero sized descriptor.

As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.

There's no need to log when desc.len = 0, so just don't increment log_num
in this case.

Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server")
Cc: [email protected]
Reviewed-by: Lidong Chen <[email protected]>
Signed-off-by: ruippan <[email protected]>
Signed-off-by: yongduan <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: block speculation of translated descriptors

iovec addresses coming from vhost are assumed to be
pre-validated, but in fact can be speculated to a value
out of range.

Userspace address are later validated with array_index_nospec so we can
be sure kernel info does not leak through these addresses, but vhost
must also not leak userspace info outside the allowed memory table to
guests.

Following the defence in depth principle, make sure
the address is not validated out of node range.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Cc: [email protected]
Acked-by: Jason Wang <[email protected]>
Tested-by: Jason Wang <[email protected]>

ixgbe: fix double clean of Tx descriptors with xdp

Tx code doesn't clear the descriptors' status after cleaning.
So, if the budget is larger than number of used elems in a ring, some
descriptors will be accounted twice and xsk_umem_complete_tx will move
prod_tail far beyond the prod_head breaking the completion queue ring.

Fix that by limiting the number of descriptors to clean by the number
of used descriptors in the Tx ring.

'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use
'next_to_clean' and 'next_to_use' indexes.

CC: [email protected]
Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
Signed-off-by: Ilya Maximets <[email protected]>
Tested-by: William Tu <[email protected]>
Tested-by: Eelco Chaudron <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ixgbe: Prevent u8 wrapping of ITR value to something less than 10us

There were a couple cases where the ITR value generated via the adaptive
ITR scheme could exceed 126. This resulted in the value becoming either 0
or something less than 10. Switching back and forth between a value less
than 10 and a value greater than 10 can cause issues as certain hardware
features such as RSC to not function well when the ITR value has dropped
that low.

CC: [email protected]
Fixes: b4ded8327fea ("ixgbe: Update adaptive ITR algorithm")
Reported-by: Gregg Leventhal <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: fix potential RX buffer starvation for AF_XDP

When the RX rings are created they are also populated with buffers
so that packets can be received. Usually these are kernel buffers,
but for AF_XDP in zero-copy mode, these are user-space buffers and
in this case the application might not have sent down any buffers
to the driver at this point. And if no buffers are allocated at ring
creation time, no packets can be received and no interrupts will be
generated so the NAPI poll function that allocates buffers to the
rings will never get executed.

To rectify this, we kick the NAPI context of any queue with an
attached AF_XDP zero-copy socket in two places in the code. Once
after an XDP program has loaded and once after the umem is registered.
This take care of both cases: XDP program gets loaded first then AF_XDP
socket is created, and the reverse, AF_XDP socket is created first,
then XDP program is loaded.

Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
Signed-off-by: Magnus Karlsson <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

net/ixgbevf: make array api static const, makes object smaller

Don't populate the array API on the stack but instead make it
static const. Makes the object code smaller by 58 bytes.

Before:
   text    data     bss     dec     hex filename
  82969    9763     256   92988   16b3c ixgbevf/ixgbevf_main.o

After:
   text    data     bss     dec     hex filename
  82815    9859     256   92930   16b02 ixgbevf/ixgbevf_main.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

iavf: fix MAC address setting for VFs when filter is rejected

Currently iavf unconditionally applies MAC address change requests. This
brings the VF in a state where it is no longer able to pass traffic if
the PF rejects a MAC filter change for the VF.
A typical scenario for a rejected MAC filter is for an untrusted VF to
request to change the MAC address when an administratively set MAC is
present.

To keep iavf working in this scenario the MAC filter handling in iavf
needs to act on the PF reply regarding the MAC filter change. In the
case of an ack the new MAC address gets set, whereas in the case of a
nack the previous MAC address needs to stay in place.

Signed-off-by: Stefan Assmann <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: clear __I40E_VIRTCHNL_OP_PENDING on invalid min Tx rate

In the case of an invalid min Tx rate being requested
i40e_ndo_set_vf_bw() immediately returns -EINVAL instead of releasing
__I40E_VIRTCHNL_OP_PENDING first.

Signed-off-by: Stefan Assmann <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: use BIT macro to specify the cloud filter field flags

The macros used to specify the cloud filter fields are intended to be
individual bits. Declare them using the BIT() macro to make their
intention a little more clear.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Fix message for other card without FEC.

When variable "req_fec, fec, an" are empty,
dmesg shows log with "Requested FEC: , Negotiated FEC: , Autoneg:".
Add link dmesg log for cards without FEC.

Signed-off-by: Czeslaw Zagorski <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: fix missed "Negotiated" string in i40e_print_link_message()

The "Negotiated" string in i40e_print_link_message() function was missed.
This string has been added to the dmesg and small refactoring done removing
common substrings and unifying link status message format.
Without this patch it was not clear that FEC is related to negotiated FEC.

Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: mark additional missing bits as reserved

Mark bits 0xD through 0xF for the command flags of a cloud filter as
reserved. These bits are not yet defined and are considered as reserved
in the data sheet.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: remove I40E_AQC_ADD_CLOUD_FILTER_OIP

The bit 0x0001 used in the cloud filters adminq command is reserved, and
is not actually a valid type.

The Linux driver has never used this type, and it's not clear if any
driver ever has.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: use ktime_get_real_ts64 instead of ktime_to_timespec64

Remove a call to ktime_to_timespec64 by calling ktime_get_real_ts64
directly.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>