linux.git
18 months agoi40e: Refactor I40E_MDIO_CLAUSE* macros
Ivan Vecera [Wed, 27 Sep 2023 08:31:29 +0000 (10:31 +0200)]
i40e: Refactor I40E_MDIO_CLAUSE* macros

The macros I40E_MDIO_CLAUSE22* and I40E_MDIO_CLAUSE45* are using I40E_MASK
together with the same values I40E_GLGEN_MSCA_STCODE_SHIFT and
I40E_GLGEN_MSCA_OPCODE_SHIFT to define masks.
Introduce I40E_GLGEN_MSCA_OPCODE_MASK and I40E_GLGEN_MSCA_STCODE_MASK
for both shifts in i40e_register.h and use them to refactor the macros
mentioned above.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoi40e: Move I40E_MASK macro to i40e_register.h
Ivan Vecera [Wed, 27 Sep 2023 08:31:28 +0000 (10:31 +0200)]
i40e: Move I40E_MASK macro to i40e_register.h

The macro is practically used only in i40e_register.h header file except
few I40E_MDIO_CLAUSE* macros that are defined in i40e_type.h
Move I40E_MASK macro to i40e_register.h header, I40E_MDIO_CLAUSE* macros
are refactored in subsequent patch.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoi40e: Remove back pointer from i40e_hw structure
Ivan Vecera [Wed, 27 Sep 2023 08:31:27 +0000 (10:31 +0200)]
i40e: Remove back pointer from i40e_hw structure

The .back field placed in i40e_hw is used to get pointer to i40e_pf
instance but it is not necessary as the i40e_hw is a part of i40e_pf
and containerof macro can be used to obtain the pointer to i40e_pf.
Remove .back field from i40e_hw structure, introduce i40e_hw_to_pf()
and i40e_hw_to_dev() helpers and use them.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agonet: lan743x: also select PHYLIB
Randy Dunlap [Mon, 2 Oct 2023 19:35:44 +0000 (12:35 -0700)]
net: lan743x: also select PHYLIB

Since FIXED_PHY depends on PHYLIB, PHYLIB needs to be set to avoid
a kconfig warning:

WARNING: unmet direct dependencies detected for FIXED_PHY
  Depends on [n]: NETDEVICES [=y] && PHYLIB [=n]
  Selected by [y]:
  - LAN743X [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROCHIP [=y] && PCI [=y] && PTP_1588_CLOCK_OPTIONAL [=y]

Fixes: 73c4d1b307ae ("net: lan743x: select FIXED_PHY")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: lore.kernel.org/r/202309261802.JPbRHwti-lkp@intel.com
Cc: Bryan Whitehead <bryan.whitehead@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Link: https://lore.kernel.org/r/20231002193544.14529-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: ethernet: mediatek: disable irq before schedule napi
Christian Marangi [Mon, 2 Oct 2023 14:08:05 +0000 (16:08 +0200)]
net: ethernet: mediatek: disable irq before schedule napi

While searching for possible refactor of napi_schedule_prep and
__napi_schedule it was notice that the mtk eth driver disable the
interrupt for rx and tx AFTER napi is scheduled.

While this is a very hard to repro case it might happen to have
situation where the interrupt is disabled and never enabled again as the
napi completes and the interrupt is enabled before.

This is caused by the fact that a napi driven by interrupt expect a
logic with:
1. interrupt received. napi prepared -> interrupt disabled -> napi
   scheduled
2. napi triggered. ring cleared -> interrupt enabled -> wait for new
   interrupt

To prevent this case, disable the interrupt BEFORE the napi is
scheduled.

Fixes: 656e705243fd ("net-next: mediatek: add support for MT7623 ethernet")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20231002140805.568-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet_sched: sch_fq: add TCA_FQ_WEIGHTS attribute
Eric Dumazet [Mon, 2 Oct 2023 13:17:38 +0000 (13:17 +0000)]
net_sched: sch_fq: add TCA_FQ_WEIGHTS attribute

This attribute can be used to tune the per band weight
and report them in "tc qdisc show" output:

qdisc fq 802f: parent 1:9 limit 100000p flow_limit 500p buckets 1024 orphan_mask 1023
 quantum 8364b initial_quantum 41820b low_rate_threshold 550Kbit
 refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 weights 589824 196608 65536
 Sent 236460814 bytes 792991 pkt (dropped 0, overlimits 0 requeues 0)
 rate 25816bit 10pps backlog 0b 0p requeues 0
  flows 4 (inactive 4 throttled 0)
  gc 0 throttled 19 latency 17.6us fastpath 773882

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet_sched: sch_fq: add 3 bands and WRR scheduling
Eric Dumazet [Mon, 2 Oct 2023 13:17:37 +0000 (13:17 +0000)]
net_sched: sch_fq: add 3 bands and WRR scheduling

Before Google adopted FQ for its production servers,
we had to ensure AF4 packets would get a higher share
than BE1 ones.

As discussed this week in Netconf 2023 in Paris, it is time
to upstream this for public use.

After this patch FQ can replace pfifo_fast, with the following
differences :

- FQ uses WRR instead of strict prio, to avoid starvation of
  low priority packets.

- We make sure each band/prio tracks its own usage against sch->limit.
  This was done to make sure flood of low priority packets would not
  prevent AF4 packets to be queued. Contributed by Willem.

- priomap can be changed, if needed (default value are the ones
  coming from pfifo_fast).

In this patch, we set default band weights so that :

- high prio (band=0) packets get 90% of the bandwidth
  if they compete with low prio (band=2) packets.

- high prio packets get 75% of the bandwidth
  if they compete with medium prio (band=1) packets.

Following patch in this series adds the possibility to tune
the per-band weights.

As we added many fields in 'struct fq_sched_data', we had
to make sure to have the first cache line read-mostly, and
avoid wasting precious cache lines.

More optimizations are possible but will be sent separately.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet_sched: export pfifo_fast prio2band[]
Eric Dumazet [Mon, 2 Oct 2023 13:17:36 +0000 (13:17 +0000)]
net_sched: export pfifo_fast prio2band[]

pfifo_fast prio2band[] is renamed to sch_default_prio2band[]
and exported because we want to share it in FQ.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet_sched: sch_fq: remove q->ktime_cache
Eric Dumazet [Mon, 2 Oct 2023 13:17:35 +0000 (13:17 +0000)]
net_sched: sch_fq: remove q->ktime_cache

Now that both enqueue() and dequeue() need to use ktime_get_ns(),
there is no point wasting 8 bytes in struct fq_sched_data.

This makes room for future fields. ;)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoMerge branch 'net-mana-fix-some-tx-processing-bugs'
Paolo Abeni [Thu, 5 Oct 2023 09:45:08 +0000 (11:45 +0200)]
Merge branch 'net-mana-fix-some-tx-processing-bugs'

Haiyang Zhang says:

====================
net: mana: Fix some TX processing bugs

Fix TX processing bugs on error handling, tso_bytes calculation,
and sge0 size.
====================

Link: https://lore.kernel.org/r/1696020147-14989-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: mana: Fix oversized sge0 for GSO packets
Haiyang Zhang [Fri, 29 Sep 2023 20:42:27 +0000 (13:42 -0700)]
net: mana: Fix oversized sge0 for GSO packets

Handle the case when GSO SKB linear length is too large.

MANA NIC requires GSO packets to put only the header part to SGE0,
otherwise the TX queue may stop at the HW level.

So, use 2 SGEs for the skb linear part which contains more than the
packet header.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: mana: Fix the tso_bytes calculation
Haiyang Zhang [Fri, 29 Sep 2023 20:42:26 +0000 (13:42 -0700)]
net: mana: Fix the tso_bytes calculation

sizeof(struct hop_jumbo_hdr) is not part of tso_bytes, so remove
the subtraction from header size.

Cc: stable@vger.kernel.org
Fixes: bd7fc6e1957c ("net: mana: Add new MANA VF performance counters for easier troubleshooting")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: mana: Fix TX CQE error handling
Haiyang Zhang [Fri, 29 Sep 2023 20:42:25 +0000 (13:42 -0700)]
net: mana: Fix TX CQE error handling

For an unknown TX CQE error type (probably from a newer hardware),
still free the SKB, update the queue tail, etc., otherwise the
accounting will be wrong.

Also, TX errors can be triggered by injecting corrupted packets, so
replace the WARN_ONCE to ratelimited error logging.

Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agocan: peak_pci: replace deprecated strncpy with strscpy
Justin Stitt [Thu, 5 Oct 2023 00:05:35 +0000 (00:05 +0000)]
can: peak_pci: replace deprecated strncpy with strscpy

`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

NUL-padding is not required since card is already zero-initialized:
|       card = kzalloc(sizeof(*card), GFP_KERNEL);

A suitable replacement is `strscpy` [2] due to the fact that it
guarantees NUL-termination on the destination buffer without
unnecessarily NUL-padding.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20231005-strncpy-drivers-net-can-sja1000-peak_pci-c-v1-1-c36e1702cd56@google.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
18 months agowifi: rtlwifi: remove unreachable code in rtl92d_dm_check_edca_turbo()
Dmitry Antipov [Tue, 3 Oct 2023 04:33:16 +0000 (07:33 +0300)]
wifi: rtlwifi: remove unreachable code in rtl92d_dm_check_edca_turbo()

Since '!(0x5ea42b & 0xffff0000)' is always false, remove unreachable
block in 'rtl92d_dm_check_edca_turbo()' and convert EDCA limits to
constant variables. Compile tested only.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231003043318.11370-1-dmantipov@yandex.ru
18 months agowifi: rtw89: debug: txpwr table supports Wi-Fi 7 chips
Zong-Zhe Yang [Tue, 3 Oct 2023 01:54:46 +0000 (09:54 +0800)]
wifi: rtw89: debug: txpwr table supports Wi-Fi 7 chips

We add TX power table format for Wi-Fi 7 chips. Since Wi-Fi 7 tables are
larger, in order to reuse some chunks, we extend code to process nested
entries. Now, dbgfs txpwr_table can work with Wi-Fi 7 chips.

An output example of dbgfs txpwr_table on Wi-Fi 7 chips is shown below.
...
[TX power byrate]
<< BW20 >>
CCK        -  1M      2M      5.5M   11M   |  20,  20,  20,  20, dBm
LEGACY     -  6M      9M      12M    18M   |  18,  18,  18,  18, dBm
LEGACY     -  24M     36M     48M    54M   |  18,  18,  17,  16, dBm
EHT        -  MCS14   MCS15  |   0,   0, dBm
DLRU_EHT   -  MCS14   MCS15  |   0,  18, dBm
MCS_1SS        -  MCS0    MCS1    MCS2   MCS3  |  18,  18,  18,  18, dBm
MCS_1SS        -  MCS4    MCS5    MCS6   MCS7  |  18,  17,  16,  15, dBm
MCS_1SS        -  MCS8    MCS9    MCS10  MCS11 |  14,  13,  12,  11, dBm
MCS_1SS        -  MCS12   MCS13  |  10,   9, dBm
HEDCM_1SS      -  MCS0    MCS1    MCS3   MCS4  |  18,  18,  18,  18, dBm
DLRU_MCS_1SS   -  MCS0    MCS1    MCS2   MCS3  |  18,  18,  18,  18, dBm
DLRU_MCS_1SS   -  MCS4    MCS5    MCS6   MCS7  |  18,  17,  16,  15, dBm
DLRU_MCS_1SS   -  MCS8    MCS9    MCS10  MCS11 |  14,  13,  12,  11, dBm
DLRU_MCS_1SS   -  MCS12   MCS13  |  10,   9, dBm
DLRU_HEDCM_1SS -  MCS0    MCS1    MCS3   MCS4  |  18,  18,  18,  18, dBm
MCS_2SS        -  MCS0    MCS1    MCS2   MCS3  |  18,  18,  18,  18, dBm
...
[TX power limit]
<< 1TX >>
CCK_20M     -  NON_BF  BF |   0,   0, dBm
CCK_40M     -  NON_BF  BF |   0,   0, dBm
OFDM        -  NON_BF  BF |  18,   0, dBm
MCS_20M_0   -  NON_BF  BF |  18,   0, dBm
MCS_20M_1   -  NON_BF  BF |   0,   0, dBm
...

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231003015446.14658-8-pkshih@realtek.com
18 months agowifi: rtw89: debug: show txpwr table according to chip gen
Zong-Zhe Yang [Tue, 3 Oct 2023 01:54:45 +0000 (09:54 +0800)]
wifi: rtw89: debug: show txpwr table according to chip gen

Since current TX power stuffs are for ax chips, add a suffix `_ax` to
them. Then, when requested to show txpwr table, select table according
to chip generation first.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231003015446.14658-7-pkshih@realtek.com
18 months agowifi: rtw89: phy: set TX power RU limit according to chip gen
Zong-Zhe Yang [Tue, 3 Oct 2023 01:54:44 +0000 (09:54 +0800)]
wifi: rtw89: phy: set TX power RU limit according to chip gen

Wi-Fi 6 chips and Wi-Fi 7 chips have different register design for TX
power RU limit. We rename original setting stuffs with a suffix `_ax`,
concentrate related enum declaration in phy.h, and implement setting
flow for Wi-Fi 7 chips. Then, we set TX power RU limit according to
chip generation.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231003015446.14658-6-pkshih@realtek.com
18 months agowifi: rtw89: phy: set TX power limit according to chip gen
Zong-Zhe Yang [Tue, 3 Oct 2023 01:54:43 +0000 (09:54 +0800)]
wifi: rtw89: phy: set TX power limit according to chip gen

Wi-Fi 6 chips and Wi-Fi 7 chips have different register design for
TX power limit. We rename original setting stuffs with a suffix `_ax`,
concentrate related enum declaration in phy.h, and implement setting
flow for Wi-Fi 7 chips. Then, we set TX power limit according to chip
generation.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231003015446.14658-5-pkshih@realtek.com
18 months agowifi: rtw89: phy: set TX power offset according to chip gen
Zong-Zhe Yang [Tue, 3 Oct 2023 01:54:42 +0000 (09:54 +0800)]
wifi: rtw89: phy: set TX power offset according to chip gen

We have a register to control TX power of each rate section to increase
or decrease an offset. But, Wi-Fi 6 chips and Wi-Fi 7 chips have different
address and format for this control register. We rename original setting
stuffs with a suffix `_ax` and implement setting flow for Wi-Fi 7 chips.
Then, we set TX power offset according to chip generation.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231003015446.14658-4-pkshih@realtek.com
18 months agowifi: rtw89: phy: set TX power by rate according to chip gen
Zong-Zhe Yang [Tue, 3 Oct 2023 01:54:41 +0000 (09:54 +0800)]
wifi: rtw89: phy: set TX power by rate according to chip gen

Wi-Fi 6 chips and Wi-Fi 7 chips have different register design for
TX power by rate. We rename original setting stuffs with a suffix
`_ax` and implement setting flow for Wi-Fi 7 chips. Then, we set TX
power by rate according to chip generation.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231003015446.14658-3-pkshih@realtek.com
18 months agowifi: rtw89: mac: get TX power control register according to chip gen
Zong-Zhe Yang [Tue, 3 Oct 2023 01:54:40 +0000 (09:54 +0800)]
wifi: rtw89: mac: get TX power control register according to chip gen

There are two difference between Wi-Fi 6 and Wi-Fi 7 chips.
1. Address range of TX power control register
2. Checking code to get a TX power control register

So, separate the implementation of them, access according to
chip generation, and rename original things with a suffix `_ax`.

Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231003015446.14658-2-pkshih@realtek.com
18 months agoMerge tag 'rtla-v6.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bristot...
Linus Torvalds [Thu, 5 Oct 2023 01:19:55 +0000 (18:19 -0700)]
Merge tag 'rtla-v6.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux

Pull rtla fixes from Daniel Bristot de Oliveira:
 "rtla (Real-Time Linux Analysis) tool fixes.

  Timerlat auto-analysis:

   - Timerlat is reporting thread interference time without thread noise
     events occurrence. It was caused because the thread interference
     variable was not reset after the analysis of a timerlat activation
     that did not hit the threshold.

   - The IRQ handler delay is estimated from the delta of the IRQ
     latency reported by timerlat, and the timestamp from IRQ handler
     start event. If the delta is near-zero, the drift from the external
     clock and the trace event and/or the overhead can cause the value
     to be negative. If the value is negative, print a zero-delay.

   - IRQ handlers happening after the timerlat thread event but before
     the stop tracing were being reported as IRQ that happened before
     the *current* IRQ occurrence. Ignore Previous IRQ noise in this
     condition because they are valid only for the *next* timerlat
     activation.

  Timerlat user-space:

   - Timerlat is stopping all user-space thread if a CPU becomes
     offline. Do not stop the entire tool if a CPU is/become offline,
     but only the thread of the unavailable CPU. Stop the tool only, if
     all threads leave because the CPUs become/are offline.

  man-pages:

   - Fix command line example in timerlat hist man page"

* tag 'rtla-v6.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux:
  rtla: fix a example in rtla-timerlat-hist.rst
  rtla/timerlat: Do not stop user-space if a cpu is offline
  rtla/timerlat_aa: Fix previous IRQ delay for IRQs that happens after thread sample
  rtla/timerlat_aa: Fix negative IRQ delay
  rtla/timerlat_aa: Zero thread sum after every sample analysis

18 months agoMerge branch 'ynl-makefile-cleanup'
Jakub Kicinski [Thu, 5 Oct 2023 00:33:56 +0000 (17:33 -0700)]
Merge branch 'ynl-makefile-cleanup'

Jakub Kicinski says:

====================
ynl Makefile cleanup

While catching up on recent changes I noticed unexpected
changes to Makefiles in YNL. Indeed they were not working
as intended but the fixes put in place were not what I had
in mind :)
====================

Link: https://lore.kernel.org/r/20231003153416.2479808-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotools: ynl: use uAPI include magic for samples
Jakub Kicinski [Tue, 3 Oct 2023 15:34:16 +0000 (08:34 -0700)]
tools: ynl: use uAPI include magic for samples

Makefile.deps provides direct includes in CFLAGS_$(obj).
We just need to rewrite the rules to make use of the extra
flags, no need to hard-include all of tools/include/uapi.

Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231003153416.2479808-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotools: ynl: don't regen on every make
Jakub Kicinski [Tue, 3 Oct 2023 15:34:15 +0000 (08:34 -0700)]
tools: ynl: don't regen on every make

As far as I can tell the normal Makefile dependency tracking
works, generated files get re-generated if the YAML was updated.
Let make do its job, don't force the re-generation.
make hardclean can be used to force regeneration.

Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231003153416.2479808-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoynl: netdev: drop unnecessary enum-as-flags
Jakub Kicinski [Tue, 3 Oct 2023 15:34:14 +0000 (08:34 -0700)]
ynl: netdev: drop unnecessary enum-as-flags

enum-as-flags can be used when enum declares bit positions but
we want to carry bitmask in an attribute. If the definition
is already provided as flags there's no need to indicate
the flag-iness of the attribute.

Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231003153416.2479808-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonetlink: annotate data-races around sk->sk_err
Eric Dumazet [Tue, 3 Oct 2023 18:34:55 +0000 (18:34 +0000)]
netlink: annotate data-races around sk->sk_err

syzbot caught another data-race in netlink when
setting sk->sk_err.

Annotate all of them for good measure.

BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg

write to 0xffff8881613bb220 of 4 bytes by task 28147 on cpu 0:
netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
__sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
__do_sys_recvfrom net/socket.c:2247 [inline]
__se_sys_recvfrom net/socket.c:2243 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff8881613bb220 of 4 bytes by task 28146 on cpu 1:
netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
__sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
__do_sys_recvfrom net/socket.c:2247 [inline]
__se_sys_recvfrom net/socket.c:2243 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00000000 -> 0x00000016

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 28146 Comm: syz-executor.0 Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231003183455.3410550-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: skb_queue_purge_reason() optimizations
Eric Dumazet [Tue, 3 Oct 2023 18:19:20 +0000 (18:19 +0000)]
net: skb_queue_purge_reason() optimizations

1) Exit early if the list is empty.

2) splice the list into a local list,
   so that we block hard irqs only once.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231003181920.3280453-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agosctp: update hb timer immediately after users change hb_interval
Xin Long [Sun, 1 Oct 2023 15:04:20 +0000 (11:04 -0400)]
sctp: update hb timer immediately after users change hb_interval

Currently, when hb_interval is changed by users, it won't take effect
until the next expiry of hb timer. As the default value is 30s, users
have to wait up to 30s to wait its hb_interval update to work.

This becomes pretty bad in containers where a much smaller value is
usually set on hb_interval. This patch improves it by resetting the
hb timer immediately once the value of hb_interval is updated by users.

Note that we don't address the already existing 'problem' when sending
a heartbeat 'on demand' if one hb has just been sent(from the timer)
mentioned in:

  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg590224.html

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/75465785f8ee5df2fb3acdca9b8fafdc18984098.1696172660.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agosctp: update transport state when processing a dupcook packet
Xin Long [Sun, 1 Oct 2023 14:58:45 +0000 (10:58 -0400)]
sctp: update transport state when processing a dupcook packet

During the 4-way handshake, the transport's state is set to ACTIVE in
sctp_process_init() when processing INIT_ACK chunk on client or
COOKIE_ECHO chunk on server.

In the collision scenario below:

  192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
    192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
    192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408]
    192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO]
    192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK]
  192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021]

when processing COOKIE_ECHO on 192.168.1.2, as it's in COOKIE_WAIT state,
sctp_sf_do_dupcook_b() is called by sctp_sf_do_5_2_4_dupcook() where it
creates a new association and sets its transport to ACTIVE then updates
to the old association in sctp_assoc_update().

However, in sctp_assoc_update(), it will skip the transport update if it
finds a transport with the same ipaddr already existing in the old asoc,
and this causes the old asoc's transport state not to move to ACTIVE
after the handshake.

This means if DATA retransmission happens at this moment, it won't be able
to enter PF state because of the check 'transport->state == SCTP_ACTIVE'
in sctp_do_8_2_transport_strike().

This patch fixes it by updating the transport in sctp_assoc_update() with
sctp_assoc_add_peer() where it updates the transport state if there is
already a transport with the same ipaddr exists in the old asoc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/fd17356abe49713ded425250cc1ae51e9f5846c6.1696172325.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Thu, 5 Oct 2023 00:27:24 +0000 (17:27 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-10-03 (i40e, iavf)

This series contains updates to i40e and iavf drivers.

Yajun Deng aligns reporting of buffer exhaustion statistics to follow
documentation for i40e.

Jake removes undesired 'inline' from functions in iavf.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  iavf: remove "inline" functions from iavf_txrx.c
  i40e: Add rx_missed_errors for buffer exhaustion
====================

Link: https://lore.kernel.org/r/20231003223610.2004976-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'fix-a-couple-recent-instances-of-wincompatible-function-pointer-types...
Jakub Kicinski [Thu, 5 Oct 2023 00:15:06 +0000 (17:15 -0700)]
Merge branch 'fix-a-couple-recent-instances-of-wincompatible-function-pointer-types-strict-from-mode_get-implementations'

Nathan Chancellor says:

====================
Fix a couple recent instances of -Wincompatible-function-pointer-types-strict from ->mode_get() implementations

This series fixes a couple of instances of
-Wincompatible-function-pointer-types-strict that were introduced by a
recent series that added a new type of ops, struct dpll_device_ops,
along with implementations of the callback ->mode_get() that had a
mismatched mode type.

This warning is not currently enabled for any build but I am planning on
submitting a patch to add it to W=1 to prevent new instances of the
warning from popping up while we try and fix the existing instances in
other drivers.

This series is based on current net-next but if they need to go into
individual maintainer trees, please feel free to take the patches
individually.
====================

Link: https://lore.kernel.org/r/20231002-net-wifpts-dpll_mode_get-v1-0-a356a16413cf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agomlx5: Fix type of mode parameter in mlx5_dpll_device_mode_get()
Nathan Chancellor [Mon, 2 Oct 2023 20:55:21 +0000 (13:55 -0700)]
mlx5: Fix type of mode parameter in mlx5_dpll_device_mode_get()

When building with -Wincompatible-function-pointer-types-strict, a
warning designed to catch potential kCFI failures at build time rather
than run time due to incorrect function pointer types, there is a
warning due to a mismatch between the type of the mode parameter in
mlx5_dpll_device_mode_get() vs. what the function pointer prototype for
->mode_get() in 'struct dpll_device_ops' expects.

  drivers/net/ethernet/mellanox/mlx5/core/dpll.c:141:14: error: incompatible function pointer types initializing 'int (*)(const struct dpll_device *, void *, enum dpll_mode *, struct netlink_ext_ack *)' with an expression of type 'int (const struct dpll_device *, void *, u32 *, struct netlink_ext_ack *)' (aka 'int (const struct dpll_device *, void *, unsigned int *, struct netlink_ext_ack *)') [-Werror,-Wincompatible-function-pointer-types-strict]
    141 |         .mode_get = mlx5_dpll_device_mode_get,
        |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.

Change the type of the mode parameter in mlx5_dpll_device_mode_get() to
clear up the warning and avoid kCFI failures at run time.

Fixes: 496fd0a26bbf ("mlx5: Implement SyncE support using DPLL infrastructure")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231002-net-wifpts-dpll_mode_get-v1-2-a356a16413cf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoptp: Fix type of mode parameter in ptp_ocp_dpll_mode_get()
Nathan Chancellor [Mon, 2 Oct 2023 20:55:20 +0000 (13:55 -0700)]
ptp: Fix type of mode parameter in ptp_ocp_dpll_mode_get()

When building with -Wincompatible-function-pointer-types-strict, a
warning designed to catch potential kCFI failures at build time rather
than run time due to incorrect function pointer types, there is a
warning due to a mismatch between the type of the mode parameter in
ptp_ocp_dpll_mode_get() vs. what the function pointer prototype for
->mode_get() in 'struct dpll_device_ops' expects.

  drivers/ptp/ptp_ocp.c:4353:14: error: incompatible function pointer types initializing 'int (*)(const struct dpll_device *, void *, enum dpll_mode *, struct netlink_ext_ack *)' with an expression of type 'int (const struct dpll_device *, void *, u32 *, struct netlink_ext_ack *)' (aka 'int (const struct dpll_device *, void *, unsigned int *, struct netlink_ext_ack *)') [-Werror,-Wincompatible-function-pointer-types-strict]
   4353 |         .mode_get = ptp_ocp_dpll_mode_get,
        |                     ^~~~~~~~~~~~~~~~~~~~~
  1 error generated.

Change the type of the mode parameter in ptp_ocp_dpll_mode_get() to
clear up the warning and avoid kCFI failures at run time.

Fixes: 09eeb3aecc6c ("ptp_ocp: implement DPLL ops")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231002-net-wifpts-dpll_mode_get-v1-1-a356a16413cf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'r8152-modify-rx_bottom'
Jakub Kicinski [Wed, 4 Oct 2023 23:51:33 +0000 (16:51 -0700)]
Merge branch 'r8152-modify-rx_bottom'

Hayes Wang says:

====================
r8152: modify rx_bottom

v3:
For patch #1, this patch is replaced. The new patch only break the loop,
and keep that the driver would queue the rx packets.

For patch #2, modify the code depends on patch #1. For work_down < budget,
napi_get_frags() and napi_gro_frags() would be used. For the others,
nothing is changed.

v2:
For patch #1, add comment, update commit message, and add Fixes tag.

v1:
These patches are used to improve rx_bottom().
====================

Link: https://lore.kernel.org/r/20230926111714.9448-432-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agor8152: use napi_gro_frags
Hayes Wang [Tue, 26 Sep 2023 11:17:14 +0000 (19:17 +0800)]
r8152: use napi_gro_frags

Use napi_gro_frags() for the skb of fragments when the work_done is less
than budget.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20230926111714.9448-434-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agor8152: break the loop when the budget is exhausted
Hayes Wang [Tue, 26 Sep 2023 11:17:13 +0000 (19:17 +0800)]
r8152: break the loop when the budget is exhausted

A bulk transfer of the USB may contain many packets. And, the total
number of the packets in the bulk transfer may be more than budget.

Originally, only budget packets would be handled by napi_gro_receive(),
and the other packets would be queued in the driver for next schedule.

This patch would break the loop about getting next bulk transfer, when
the budget is exhausted. That is, only the current bulk transfer would
be handled, and the other bulk transfers would be queued for next
schedule. Besides, the packets which are more than the budget in the
current bulk trasnfer would be still queued in the driver, as the
original method.

In addition, a bulk transfer wouldn't contain more than 400 packets, so
the check of queue length is unnecessary. Therefore, I replace it with
WARN_ON_ONCE().

Fixes: cf74eb5a5bc8 ("eth: r8152: try to use a normal budget")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20230926111714.9448-433-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'chelsio-annotate-structs-with-__counted_by'
Jakub Kicinski [Wed, 4 Oct 2023 22:37:15 +0000 (15:37 -0700)]
Merge branch 'chelsio-annotate-structs-with-__counted_by'

Kees Cook says:

====================
chelsio: Annotate structs with __counted_by

This annotates several chelsio structures with the coming __counted_by
attribute for bounds checking of flexible arrays at run-time. For more details,
see commit dd06e72e68bc ("Compiler Attributes: Add __counted_by macro").
====================

Link: https://lore.kernel.org/r/20230929181042.work.990-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agocxgb4: Annotate struct smt_data with __counted_by
Kees Cook [Fri, 29 Sep 2023 18:11:49 +0000 (11:11 -0700)]
cxgb4: Annotate struct smt_data with __counted_by

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct smt_data.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230929181149.3006432-5-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agocxgb4: Annotate struct sched_table with __counted_by
Kees Cook [Fri, 29 Sep 2023 18:11:48 +0000 (11:11 -0700)]
cxgb4: Annotate struct sched_table with __counted_by

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct sched_table.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230929181149.3006432-4-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agocxgb4: Annotate struct cxgb4_tc_u32_table with __counted_by
Kees Cook [Fri, 29 Sep 2023 18:11:47 +0000 (11:11 -0700)]
cxgb4: Annotate struct cxgb4_tc_u32_table with __counted_by

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct cxgb4_tc_u32_table.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230929181149.3006432-3-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agocxgb4: Annotate struct clip_tbl with __counted_by
Kees Cook [Fri, 29 Sep 2023 18:11:46 +0000 (11:11 -0700)]
cxgb4: Annotate struct clip_tbl with __counted_by

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct clip_tbl.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230929181149.3006432-2-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agochelsio/l2t: Annotate struct l2t_data with __counted_by
Kees Cook [Fri, 29 Sep 2023 18:11:45 +0000 (11:11 -0700)]
chelsio/l2t: Annotate struct l2t_data with __counted_by

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct l2t_data.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230929181149.3006432-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotcp: fix delayed ACKs for MSS boundary condition
Neal Cardwell [Sun, 1 Oct 2023 15:12:39 +0000 (11:12 -0400)]
tcp: fix delayed ACKs for MSS boundary condition

This commit fixes poor delayed ACK behavior that can cause poor TCP
latency in a particular boundary condition: when an application makes
a TCP socket write that is an exact multiple of the MSS size.

The problem is that there is painful boundary discontinuity in the
current delayed ACK behavior. With the current delayed ACK behavior,
we have:

(1) If an app reads data when > 1*MSS is unacknowledged, then
    tcp_cleanup_rbuf() ACKs immediately because of:

     tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||

(2) If an app reads all received data, and the packets were < 1*MSS,
    and either (a) the app is not ping-pong or (b) we received two
    packets < 1*MSS, then tcp_cleanup_rbuf() ACKs immediately beecause
    of:

     ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) ||
      ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
       !inet_csk_in_pingpong_mode(sk))) &&

(3) *However*: if an app reads exactly 1*MSS of data,
    tcp_cleanup_rbuf() does not send an immediate ACK. This is true
    even if the app is not ping-pong and the 1*MSS of data had the PSH
    bit set, suggesting the sending application completed an
    application write.

Thus if the app is not ping-pong, we have this painful case where
>1*MSS gets an immediate ACK, and <1*MSS gets an immediate ACK, but a
write whose last skb is an exact multiple of 1*MSS can get a 40ms
delayed ACK. This means that any app that transfers data in one
direction and takes care to align write size or packet size with MSS
can suffer this problem. With receive zero copy making 4KB MSS values
more common, it is becoming more common to have application writes
naturally align with MSS, and more applications are likely to
encounter this delayed ACK problem.

The fix in this commit is to refine the delayed ACK heuristics with a
simple check: immediately ACK a received 1*MSS skb with PSH bit set if
the app reads all data. Why? If an skb has a len of exactly 1*MSS and
has the PSH bit set then it is likely the end of an application
write. So more data may not be arriving soon, and yet the data sender
may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we
set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send
an ACK immediately if the app reads all of the data and is not
ping-pong. Note that this logic is also executed for the case where
len > MSS, but in that case this logic does not matter (and does not
hurt) because tcp_cleanup_rbuf() will always ACK immediately if the
app reads data and there is more than an MSS of unACKed data.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Guo <guoxin0309@gmail.com>
Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotcp: fix quick-ack counting to count actual ACKs of new data
Neal Cardwell [Sun, 1 Oct 2023 15:12:38 +0000 (11:12 -0400)]
tcp: fix quick-ack counting to count actual ACKs of new data

This commit fixes quick-ack counting so that it only considers that a
quick-ack has been provided if we are sending an ACK that newly
acknowledges data.

The code was erroneously using the number of data segments in outgoing
skbs when deciding how many quick-ack credits to remove. This logic
does not make sense, and could cause poor performance in
request-response workloads, like RPC traffic, where requests or
responses can be multi-segment skbs.

When a TCP connection decides to send N quick-acks, that is to
accelerate the cwnd growth of the congestion control module
controlling the remote endpoint of the TCP connection. That quick-ack
decision is purely about the incoming data and outgoing ACKs. It has
nothing to do with the outgoing data or the size of outgoing data.

And in particular, an ACK only serves the intended purpose of allowing
the remote congestion control to grow the congestion window quickly if
the ACK is ACKing or SACKing new data.

The fix is simple: only count packets as serving the goal of the
quickack mechanism if they are ACKing/SACKing new data. We can tell
whether this is the case by checking inet_csk_ack_scheduled(), since
we schedule an ACK exactly when we are ACKing/SACKing new data.

Fixes: fc6415bcb0f5 ("[TCP]: Fix quick-ack decrementing with TSO.")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231001151239.1866845-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Wed, 4 Oct 2023 21:53:17 +0000 (14:53 -0700)]
Merge tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter patches for net

First patch resolves a regression with vlan header matching, this was
broken since 6.5 release.  From myself.

Second patch fixes an ancient problem with sctp connection tracking in
case INIT_ACK packets are delayed.  This comes with a selftest, both
patches from Xin Long.

Patch 4 extends the existing nftables audit selftest, from
Phil Sutter.

Patch 5, also from Phil, avoids a situation where nftables
would emit an audit record twice. This was broken since 5.13 days.

Patch 6, from myself, avoids spurious insertion failure if we encounter an
overlapping but expired range during element insertion with the
'nft_set_rbtree' backend. This problem exists since 6.2.

* tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
  netfilter: nf_tables: Deduplicate nft_register_obj audit logs
  selftests: netfilter: Extend nft_audit.sh
  selftests: netfilter: test for sctp collision processing in nf_conntrack
  netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp
  netfilter: nft_payload: rebuild vlan header on h_proto access
====================

Link: https://lore.kernel.org/r/20231004141405.28749-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'nf-next-23-09-28' of https://git.kernel.org/pub/scm/linux/kernel/git/netfi...
Jakub Kicinski [Wed, 4 Oct 2023 21:25:37 +0000 (14:25 -0700)]
Merge tag 'nf-next-23-09-28' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter updates for net-next

First patch, from myself, is a bug fix. The issue (connect timeout) is
ancient, so I think its safe to give this more soak time given the esoteric
conditions needed to trigger this.
Also updates the existing selftest to cover this.

Add netlink extacks when an update references a non-existent
table/chain/set.  This allows userspace to provide much better
errors to the user, from Pablo Neira Ayuso.

Last patch adds more policy checks to nf_tables as a better
alternative to the existing runtime checks, from Phil Sutter.

* tag 'nf-next-23-09-28' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: Utilize NLA_POLICY_NESTED_ARRAY
  netfilter: nf_tables: missing extended netlink error in lookup functions
  selftests: netfilter: test nat source port clash resolution interaction with tcp early demux
  netfilter: nf_nat: undo erroneous tcp edemux lookup after port clash
====================

Link: https://lore.kernel.org/r/20230928144916.18339-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agopage_pool: fix documentation typos
Randy Dunlap [Sun, 1 Oct 2023 00:38:45 +0000 (17:38 -0700)]
page_pool: fix documentation typos

Correct grammar for better readability.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20231001003846.29541-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agosctp: Spelling s/preceeding/preceding/g
Geert Uytterhoeven [Thu, 28 Sep 2023 12:17:48 +0000 (14:17 +0200)]
sctp: Spelling s/preceeding/preceding/g

Fix a misspelling of "preceding".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/663b14d07d6d716ddc34482834d6b65a2f714cfb.1695903447.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agotipc: fix a potential deadlock on &tx->lock
Chengfeng Ye [Wed, 27 Sep 2023 18:14:14 +0000 (18:14 +0000)]
tipc: fix a potential deadlock on &tx->lock

It seems that tipc_crypto_key_revoke() could be be invoked by
wokequeue tipc_crypto_work_rx() under process context and
timer/rx callback under softirq context, thus the lock acquisition
on &tx->lock seems better use spin_lock_bh() to prevent possible
deadlock.

This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.

tipc_crypto_work_rx() <workqueue>
--> tipc_crypto_key_distr()
--> tipc_bcast_xmit()
--> tipc_bcbase_xmit()
--> tipc_bearer_bc_xmit()
--> tipc_crypto_xmit()
--> tipc_ehdr_build()
--> tipc_crypto_key_revoke()
--> spin_lock(&tx->lock)
<timer interrupt>
   --> tipc_disc_timeout()
   --> tipc_bearer_xmit_skb()
   --> tipc_crypto_xmit()
   --> tipc_ehdr_build()
   --> tipc_crypto_key_revoke()
   --> spin_lock(&tx->lock) <deadlock here>

Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication")
Link: https://lore.kernel.org/r/20230927181414.59928-1-dg573847474@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: stmmac: dwmac-stm32: fix resume on STM32 MCU
Ben Wolsieffer [Wed, 27 Sep 2023 17:57:49 +0000 (13:57 -0400)]
net: stmmac: dwmac-stm32: fix resume on STM32 MCU

The STM32MP1 keeps clk_rx enabled during suspend, and therefore the
driver does not enable the clock in stm32_dwmac_init() if the device was
suspended. The problem is that this same code runs on STM32 MCUs, which
do disable clk_rx during suspend, causing the clock to never be
re-enabled on resume.

This patch adds a variant flag to indicate that clk_rx remains enabled
during suspend, and uses this to decide whether to enable the clock in
stm32_dwmac_init() if the device was suspended.

This approach fixes this specific bug with limited opportunity for
unintended side-effects, but I have a follow up patch that will refactor
the clock configuration and hopefully make it less error prone.

Fixes: 6528e02cc9ff ("net: ethernet: stmmac: add adaptation for stm32mp157c.")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230927175749.1419774-1-ben.wolsieffer@hefring.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoptp: ocp: fix error code in probe()
Dan Carpenter [Wed, 27 Sep 2023 12:55:10 +0000 (15:55 +0300)]
ptp: ocp: fix error code in probe()

There is a copy and paste error so this uses a valid pointer instead of
an error pointer.

Fixes: 09eeb3aecc6c ("ptp_ocp: implement DPLL ops")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/5c581336-0641-48bd-88f7-51984c3b1f79@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: dsa: mt753x: remove mt753x_phylink_pcs_link_up()
Russell King (Oracle) [Wed, 27 Sep 2023 12:13:56 +0000 (13:13 +0100)]
net: dsa: mt753x: remove mt753x_phylink_pcs_link_up()

Remove the mt753x_phylink_pcs_link_up() function for two reasons:

1) priv->pcs[i].pcs.neg_mode is set true, meaning it doesn't take a
   MLO_AN_FIXED anymore, but one of PHYLINK_PCS_NEG_*. However, this
   is inconsequential due to...
2) priv->pcs[port].pcs.ops is always initialised to point at
   mt7530_pcs_ops, which does not have a pcs_link_up() member.

So, let's remove mt753x_phylink_pcs_link_up() entirely.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1qlTQS-008BWe-Va@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: appletalk: remove cops support
Greg Kroah-Hartman [Wed, 27 Sep 2023 09:00:30 +0000 (11:00 +0200)]
net: appletalk: remove cops support

The COPS Appletalk support is very old, never said to actually work
properly, and the firmware code for the devices are under a very suspect
license.  Remove it all to clear up the license issue, if it is still
needed and actually used by anyone, we can add it back later once the
license is cleared up.

Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: jschlst@samba.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230927090029.44704-2-gregkh@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoipv4: Set offload_failed flag in fibmatch results
Benjamin Poirier [Tue, 26 Sep 2023 18:27:30 +0000 (14:27 -0400)]
ipv4: Set offload_failed flag in fibmatch results

Due to a small omission, the offload_failed flag is missing from ipv4
fibmatch results. Make sure it is set correctly.

The issue can be witnessed using the following commands:
echo "1 1" > /sys/bus/netdevsim/new_device
ip link add dummy1 up type dummy
ip route add 192.0.2.0/24 dev dummy1
echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/fib/fail_route_offload
ip route add 198.51.100.0/24 dev dummy1
ip route
# 192.168.15.0/24 has rt_trap
# 198.51.100.0/24 has rt_offload_failed
ip route get 192.168.15.1 fibmatch
# Result has rt_trap
ip route get 198.51.100.1 fibmatch
# Result differs from the route shown by `ip route`, it is missing
# rt_offload_failed
ip link del dev dummy1
echo 1 > /sys/bus/netdevsim/del_device

Fixes: 36c5100e859d ("IPv4: Add "offload failed" indication to routes")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230926182730.231208-1-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'linux-kselftest-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Wed, 4 Oct 2023 18:35:23 +0000 (11:35 -0700)]
Merge tag 'linux-kselftest-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
 "One single fix to Makefile to fix the incorrect TARGET name for uevent
  test"

* tag 'linux-kselftest-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: Fix wrong TARGET in kselftest top level Makefile

18 months agoMerge tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Wed, 4 Oct 2023 18:30:21 +0000 (11:30 -0700)]
Merge tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================

Quite a collection of fixes this time, really too many
to list individually. Many stack fixes, even rfkill
(found by simulation and the new eevdf scheduler)!

Also a bigger maintainers file cleanup, to remove old
and redundant information.

* tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (32 commits)
  wifi: iwlwifi: mvm: Fix incorrect usage of scan API
  wifi: mac80211: Create resources for disabled links
  wifi: cfg80211: avoid leaking stack data into trace
  wifi: mac80211: allow transmitting EAPOL frames with tainted key
  wifi: mac80211: work around Cisco AP 9115 VHT MPDU length
  wifi: cfg80211: Fix 6GHz scan configuration
  wifi: mac80211: fix potential key leak
  wifi: mac80211: fix potential key use-after-free
  wifi: mt76: mt76x02: fix MT76x0 external LNA gain handling
  wifi: brcmfmac: Replace 1-element arrays with flexible arrays
  wifi: mwifiex: Fix oob check condition in mwifiex_process_rx_packet
  wifi: rtw88: rtw8723d: Fix MAC address offset in EEPROM
  rfkill: sync before userspace visibility/changes
  wifi: mac80211: fix mesh id corruption on 32 bit systems
  wifi: cfg80211: add missing kernel-doc for cqm_rssi_work
  wifi: cfg80211: fix cqm_config access race
  wifi: iwlwifi: mvm: Fix a memory corruption issue
  wifi: iwlwifi: Ensure ack flag is properly cleared.
  wifi: iwlwifi: dbg_ini: fix structure packing
  iwlwifi: mvm: handle PS changes in vif_cfg_changed
  ...
====================

Link: https://lore.kernel.org/r/20230927095835.25803-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoRevert "bnxt_en: Support QOS and TPID settings for the SRIOV VLAN"
Jakub Kicinski [Wed, 4 Oct 2023 18:20:29 +0000 (11:20 -0700)]
Revert "bnxt_en: Support QOS and TPID settings for the SRIOV VLAN"

This reverts commit e76d44fe722761f5480b908e38c5ce1a2c2cb6d6.

We no longer accept drivers extending their use of the legacy
SR-IOV configuration APIs. Users should move to bridge offload.

Link: https://lore.kernel.org/r/20231004112243.41cb6351@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodt-bindings: net: fec: Add imx8dxl description
Fabio Estevam [Tue, 26 Sep 2023 11:10:17 +0000 (08:10 -0300)]
dt-bindings: net: fec: Add imx8dxl description

The imx8dl FEC has the same programming model as the one on the imx8qxp.

Add the imx8dl compatible string.

Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230926111017.320409-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoice: fix linking when CONFIG_PTP_1588_CLOCK=n
Jacob Keller [Mon, 2 Oct 2023 18:51:32 +0000 (11:51 -0700)]
ice: fix linking when CONFIG_PTP_1588_CLOCK=n

The recent support for DPLL introduced by commit 8a3a565ff210 ("ice: add
admin commands to access cgu configuration") and commit d7999f5ea64b ("ice:
implement dpll interface to control cgu") broke linking the ice driver if
CONFIG_PTP_1588_CLOCK=n:

ld: vmlinux.o: in function `ice_init_feature_support':
(.text+0x8702b8): undefined reference to `ice_is_phy_rclk_present'
ld: (.text+0x8702cd): undefined reference to `ice_is_cgu_present'
ld: (.text+0x8702d9): undefined reference to `ice_is_clock_mux_present_e810t'
ld: vmlinux.o: in function `ice_dpll_init_info_direct_pins':
ice_dpll.c:(.text+0x894167): undefined reference to `ice_cgu_get_pin_freq_supp'
ld: ice_dpll.c:(.text+0x894197): undefined reference to `ice_cgu_get_pin_name'
ld: ice_dpll.c:(.text+0x8941a8): undefined reference to `ice_cgu_get_pin_type'
ld: vmlinux.o: in function `ice_dpll_update_state':
ice_dpll.c:(.text+0x894494): undefined reference to `ice_get_cgu_state'
ld: vmlinux.o: in function `ice_dpll_init':
(.text+0x8953d5): undefined reference to `ice_get_cgu_rclk_pin_info'

The first commit broke things by calling functions in
ice_init_feature_support that are compiled as part of ice_ptp_hw.o,
including:

* ice_is_phy_rclk_present
* ice_is_clock_mux_present_e810t
* ice_is_cgU_present

The second commit continued the break by calling several CGU functions
defined in ice_ptp_hw.c in the DPLL code.
Because the ice_dpll.c file is compiled unconditionally, it will not
link when CONFIG_PTP_1588_CLOCK=n.

It might be possible to break this dependency and expose those functions
without CONFIG_PTP_1588_CLOCK, but that is not clear to me.

For the DPLL case, simply compile ice_dpll.o only when we have
CONFIG_PTP_1588_CLOCK. Add stub no-op implementation of ice_dpll_init() and
ice_dpll_uninit() when CONFIG_PTP_1588_CLOCK=n into ice_dpll.h

The other functions are part of checking the netlist to see if hardware
features are enabled. These checks don't really belong in ice_ptp_hw.c, and
make more sense as part of the ice_common.c file. We already have
ice_is_gps_in_netlist() in ice_common.c which is doing a similar check.

Move the functions into ice_common.c and rename them to have the similar
postfix of "in_netlist()" to be more expressive of what they are actually
checking.

This also makes the ice_find_netlist_node only called from within
ice_common.c, so its safe to mark it static and stop declaring it in the
ice_common.h header as well.

Fixes: 8a3a565ff210 ("ice: add admin commands to access cgu configuration")
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309191214.TaYEct4H-lkp@intel.com
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Link: https://lore.kernel.org/r/20231002185132.1575271-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Wed, 4 Oct 2023 15:28:07 +0000 (08:28 -0700)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-10-02

We've added 11 non-merge commits during the last 12 day(s) which contain
a total of 12 files changed, 176 insertions(+), 41 deletions(-).

The main changes are:

1) Fix BPF verifier to reset backtrack_state masks on global function
   exit as otherwise subsequent precision tracking would reuse them,
   from Andrii Nakryiko.

2) Several sockmap fixes for available bytes accounting,
   from John Fastabend.

3) Reject sk_msg egress redirects to non-TCP sockets given this
   is only supported for TCP sockets today, from Jakub Sitnicki.

4) Fix a syzkaller splat in bpf_mprog when hitting maximum program
   limits with BPF_F_BEFORE directive, from Daniel Borkmann
   and Nikolay Aleksandrov.

5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust
   size_index for selecting a bpf_mem_cache, from Hou Tao.

6) Fix arch_prepare_bpf_trampoline return code for s390 JIT,
   from Song Liu.

7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off,
   from Leon Hwang.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Use kmalloc_size_roundup() to adjust size_index
  selftest/bpf: Add various selftests for program limits
  bpf, mprog: Fix maximum program check on mprog attachment
  bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets
  bpf, sockmap: Add tests for MSG_F_PEEK
  bpf, sockmap: Do not inc copied_seq when PEEK flag set
  bpf: tcp_read_skb needs to pop skb regardless of seq
  bpf: unconditionally reset backtrack_state masks on global func exit
  bpf: Fix tr dereferencing
  selftests/bpf: Check bpf_cubic_acked() is called via struct_ops
  s390/bpf: Let arch_prepare_bpf_trampoline return program size
====================

Link: https://lore.kernel.org/r/20231002113417.2309-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonetfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
Florian Westphal [Thu, 28 Sep 2023 13:12:44 +0000 (15:12 +0200)]
netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure

nft_rbtree_gc_elem() walks back and removes the end interval element that
comes before the expired element.

There is a small chance that we've cached this element as 'rbe_ge'.
If this happens, we hold and test a pointer that has been queued for
freeing.

It also causes spurious insertion failures:

$ cat test-testcases-sets-0044interval_overlap_0.1/testout.log
Error: Could not process rule: File exists
add element t s {  0 -  2 }
                   ^^^^^^
Failed to insert  0 -  2 given:
table ip t {
        set s {
                type inet_service
                flags interval,timeout
                timeout 2s
                gc-interval 2s
        }
}

The set (rbtree) is empty. The 'failure' doesn't happen on next attempt.

Reason is that when we try to insert, the tree may hold an expired
element that collides with the range we're adding.
While we do evict/erase this element, we can trip over this check:

if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new))
      return -ENOTEMPTY;

rbe_ge was erased by the synchronous gc, we should not have done this
check.  Next attempt won't find it, so retry results in successful
insertion.

Restart in-kernel to avoid such spurious errors.

Such restart are rare, unless userspace intentionally adds very large
numbers of elements with very short timeouts while setting a huge
gc interval.

Even in this case, this cannot loop forever, on each retry an existing
element has been removed.

As the caller is holding the transaction mutex, its impossible
for a second entity to add more expiring elements to the tree.

After this it also becomes feasible to remove the async gc worker
and perform all garbage collection from the commit path.

Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agonetfilter: nf_tables: Deduplicate nft_register_obj audit logs
Phil Sutter [Sat, 23 Sep 2023 01:53:50 +0000 (03:53 +0200)]
netfilter: nf_tables: Deduplicate nft_register_obj audit logs

When adding/updating an object, the transaction handler emits suitable
audit log entries already, the one in nft_obj_notify() is redundant. To
fix that (and retain the audit logging from objects' 'update' callback),
Introduce an "audit log free" variant for internal use.

Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agoselftests: netfilter: Extend nft_audit.sh
Phil Sutter [Sat, 23 Sep 2023 01:53:49 +0000 (03:53 +0200)]
selftests: netfilter: Extend nft_audit.sh

Add tests for sets and elements and deletion of all kinds. Also
reorder rule reset tests: By moving the bulk rule add command up, the
two 'reset rules' tests become identical.

While at it, fix for a failing bulk rule add test's error status getting
lost due to its use in a pipe. Avoid this by using a temporary file.

Headings in diff output for failing tests contain no useful data, strip
them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agoselftests: netfilter: test for sctp collision processing in nf_conntrack
Xin Long [Tue, 3 Oct 2023 17:17:54 +0000 (13:17 -0400)]
selftests: netfilter: test for sctp collision processing in nf_conntrack

This patch adds a test case to reproduce the SCTP DATA chunk retransmission
timeout issue caused by the improper SCTP collision processing in netfilter
nf_conntrack_proto_sctp.

In this test, client sends a INIT chunk, but the INIT_ACK replied from
server is delayed until the server sends a INIT chunk to start a new
connection from its side. After the connection is complete from server
side, the delayed INIT_ACK arrives in nf_conntrack_proto_sctp.

The delayed INIT_ACK should be dropped in nf_conntrack_proto_sctp instead
of updating the vtag with the out-of-date init_tag, otherwise, the vtag
in DATA chunks later sent by client don't match the vtag in the conntrack
entry and the DATA chunks get dropped.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agonetfilter: handle the connecting collision properly in nf_conntrack_proto_sctp
Xin Long [Tue, 3 Oct 2023 17:17:53 +0000 (13:17 -0400)]
netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp

In Scenario A and B below, as the delayed INIT_ACK always changes the peer
vtag, SCTP ct with the incorrect vtag may cause packet loss.

Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK

  192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772]
    192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151]
    192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772]
  192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] *
  192.168.1.2 > 192.168.1.1: [COOKIE ECHO]
    192.168.1.1 > 192.168.1.2: [COOKIE ECHO]
    192.168.1.2 > 192.168.1.1: [COOKIE ACK]

Scenario B: INIT_ACK is delayed until the peer completes its own handshake

  192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
    192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
    192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408]
    192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO]
    192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK]
  192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] *

This patch fixes it as below:

In SCTP_CID_INIT processing:
- clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] &&
  ct->proto.sctp.init[!dir]. (Scenario E)
- set ct->proto.sctp.init[dir].

In SCTP_CID_INIT_ACK processing:
- drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] &&
  ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C)
- drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] &&
  ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A)

In SCTP_CID_COOKIE_ACK processing:
- clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir].
  (Scenario D)

Also, it's important to allow the ct state to move forward with cookie_echo
and cookie_ack from the opposite dir for the collision scenarios.

There are also other Scenarios where it should allow the packet through,
addressed by the processing above:

Scenario C: new CT is created by INIT_ACK.

Scenario D: start INIT on the existing ESTABLISHED ct.

Scenario E: start INIT after the old collision on the existing ESTABLISHED
ct.

  192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
  192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
  (both side are stopped, then start new connection again in hours)
  192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742]

Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agonetfilter: nft_payload: rebuild vlan header on h_proto access
Florian Westphal [Fri, 29 Sep 2023 08:42:10 +0000 (10:42 +0200)]
netfilter: nft_payload: rebuild vlan header on h_proto access

nft can perform merging of adjacent payload requests.
This means that:

ether saddr 00:11 ... ether type 8021ad ...

is a single payload expression, for 8 bytes, starting at the
ethernet source offset.

Check that offset+length is fully within the source/destination mac
addersses.

This bug prevents 'ether type' from matching the correct h_proto in case
vlan tag got stripped.

Fixes: de6843be3082 ("netfilter: nft_payload: rebuild vlan header when needed")
Reported-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: Florian Westphal <fw@strlen.de>
18 months agocan: raw: Remove NULL check before dev_{put, hold}
Jiapeng Chong [Fri, 25 Aug 2023 06:46:56 +0000 (14:46 +0800)]
can: raw: Remove NULL check before dev_{put, hold}

The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there
is no need to check before using dev_{put, hold}, remove it to silence
the warning:

./net/can/raw.c:497:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6231
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reported-by: Simon Horman <horms@kernel.org>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230825064656.87751-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
18 months agoMerge branch 'bnxt_en-hwmon-SRIOV'
David S. Miller [Wed, 4 Oct 2023 10:23:02 +0000 (11:23 +0100)]
Merge branch 'bnxt_en-hwmon-SRIOV'

Michael Chan says:

====================
bnxt_en: hwmon and SRIOV updates

The first 7 patches are v2 of the hwmon patches posted about 6 weeks ago
on Aug 14.  The last 2 patches are SRIOV related updates.

Link to v1 hwmon patches:
https://lore.kernel.org/netdev/20230815045658.80494-11-michael.chan@broadcom.com/
====================

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agobnxt_en: Update VNIC resource calculation for VFs
Vikas Gupta [Wed, 27 Sep 2023 03:57:34 +0000 (20:57 -0700)]
bnxt_en: Update VNIC resource calculation for VFs

Newer versions of firmware will pre-reserve 1 VNIC for every possible
PF and VF function.  Update the driver logic to take this into account
when assigning VNICs to the VFs.  These pre-reserved VNICs for the
inactive VFs should be subtracted from the global pool before
assigning them to the active VFs.

Not doing so may cause discrepancies that ultimately may cause some VFs to
have insufficient VNICs to support features such as aRFS.

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agobnxt_en: Support QOS and TPID settings for the SRIOV VLAN
Sreekanth Reddy [Wed, 27 Sep 2023 03:57:33 +0000 (20:57 -0700)]
bnxt_en: Support QOS and TPID settings for the SRIOV VLAN

Add these missing settings in the .ndo_set_vf_vlan() method.
Older firmware does not support the TPID setting so check for
proper support.

Remove the unused BNXT_VF_QOS flag.

Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agobnxt_en: Event handler for Thermal event
Kalesh AP [Wed, 27 Sep 2023 03:57:32 +0000 (20:57 -0700)]
bnxt_en: Event handler for Thermal event

Newer FW will send a new async event when it detects that
the chip's temperature has crossed the configured threshold value.
The driver will now notify hwmon and will log a warning message.

Link: https://lore.kernel.org/netdev/20230815045658.80494-13-michael.chan@broadcom.com/
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agobnxt_en: Use non-standard attribute to expose shutdown temperature
Kalesh AP [Wed, 27 Sep 2023 03:57:31 +0000 (20:57 -0700)]
bnxt_en: Use non-standard attribute to expose shutdown temperature

Implement the sysfs attributes directly in the driver for
shutdown threshold temperature and pass an extra attribute group
to the hwmon core when registering the hwmon device.

Link: https://lore.kernel.org/netdev/20230815045658.80494-12-michael.chan@broadcom.com/
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agobnxt_en: Expose threshold temperatures through hwmon
Kalesh AP [Wed, 27 Sep 2023 03:57:30 +0000 (20:57 -0700)]
bnxt_en: Expose threshold temperatures through hwmon

HWRM_TEMP_MONITOR_QUERY response now indicates various
threshold temperatures. Expose these threshold temperatures
through the hwmon sysfs using this mapping:

hwmon_temp_max : bp->warn_thresh_temp
hwmon_temp_crit : bp->crit_thresh_temp
hwmon_temp_emergency : bp->fatal_thresh_temp

hwmon_temp_max_alarm : temp >= bp->warn_thresh_temp
hwmon_temp_crit_alarm : temp >= bp->crit_thresh_temp
hwmon_temp_emergency_alarm : temp >= bp->fatal_thresh_temp

Link: https://lore.kernel.org/netdev/20230815045658.80494-12-michael.chan@broadcom.com/
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agobnxt_en: Modify the driver to use hwmon_device_register_with_info
Kalesh AP [Wed, 27 Sep 2023 03:57:29 +0000 (20:57 -0700)]
bnxt_en: Modify the driver to use hwmon_device_register_with_info

The use of hwmon_device_register_with_groups() is deprecated.
Modified the driver to use hwmon_device_register_with_info().

Driver currently exports only temp1_input through hwmon sysfs
interface. But FW has been modified to report more threshold
temperatures and driver want to report them through the
hwmon interface.

Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agobnxt_en: Move hwmon functions into a dedicated file
Kalesh AP [Wed, 27 Sep 2023 03:57:28 +0000 (20:57 -0700)]
bnxt_en: Move hwmon functions into a dedicated file

This is in preparation for upcoming patches in the series.
Driver has to expose more threshold temperatures through the
hwmon sysfs interface. More code will be added and do not
want to overload bnxt.c.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agobnxt_en: Enhance hwmon temperature reporting
Kalesh AP [Wed, 27 Sep 2023 03:57:27 +0000 (20:57 -0700)]
bnxt_en: Enhance hwmon temperature reporting

Driver currently does hwmon device register and unregister
in open and close() respectively. As a result, user will not
be able to query hwmon temperature when interface is in
ifdown state.

Enhance it by moving the hwmon register/unregister to the
probe/remove functions.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agobnxt_en: Update firmware interface to 1.10.2.171
Michael Chan [Wed, 27 Sep 2023 03:57:26 +0000 (20:57 -0700)]
bnxt_en: Update firmware interface to 1.10.2.171

The main changes are the additional thermal thresholds in
hwrm_temp_monitor_query_output and the new async event to
report thermal errors.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoibmveth: Remove condition to recompute TCP header checksum.
David Wilder [Tue, 26 Sep 2023 21:42:51 +0000 (16:42 -0500)]
ibmveth: Remove condition to recompute TCP header checksum.

In some OVS environments the TCP pseudo header checksum may need to be
recomputed. Currently this is only done when the interface instance is
configured for "Trunk Mode". We found the issue also occurs in some
Kubernetes environments, these environments do not use "Trunk Mode",
therefor the condition is removed.

Performance tests with this change show only a fractional decrease in
throughput (< 0.2%).

Fixes: 7525de2516fb ("ibmveth: Set CHECKSUM_PARTIAL if NULL TCP CSUM.")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMerge patch series "can: etas_es58x: clean-up of new GCC W=1 and old checkpatch warnings"
Marc Kleine-Budde [Wed, 4 Oct 2023 09:46:27 +0000 (11:46 +0200)]
Merge patch series "can: etas_es58x: clean-up of new GCC W=1 and old checkpatch warnings"

Vincent Mailhol <mailhol.vincent@wanadoo.fr> says:

The kernel recently added new warnings, one of which triggers a known
false positive on the etas_es58x module. In an effort to keep
es58x_etas free of any W=12 (excluding those produced by foreign
headers), add a workaround to silence it.

While at it, this series also fix a checkpatch warning which I knew
existed for a long time but was too lazy to tackle.

v2 -> v3:
* if the parsing of one of the version/revision numbers fail,
  es58x_parse_product_info() immediately returns. If this occurs early,
  the other version/revision numbers would still be set to zero (which
  is now considered a valid version number). Set the version and
  revision to an invalid number before starting the parsing so that
  everything is set even if an early return occurs.

v1 -> v2:
* v1 had two different check logics for the version numbers:
  - check that none of the sub-version number are zero to make sure
    the parsing succeeded
  - check that all of the sub-version number fit the expected digit
    range to please GCC.

  v2 simplifies things by merging those two logics together.

Link: https://lore.kernel.org/all/20230924110914.183898-1-mailhol.vincent@wanadoo.fr
[mkl: fixed typos]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
18 months agocan: etas_es58x: add missing a blank line after declaration
Vincent Mailhol [Sun, 24 Sep 2023 11:06:48 +0000 (20:06 +0900)]
can: etas_es58x: add missing a blank line after declaration

Fix below checkpatch warning:

  WARNING: Missing a blank line after declarations
  #2233: FILE: drivers/net/can/usb/etas_es58x/es58x_core.c:2233:
  + int ret = es58x_init_netdev(es58x_dev, ch_idx);
  + if (ret) {

Fixes: d8f26fd689dd ("can: etas_es58x: remove es58x_get_product_info()")
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20230924110914.183898-3-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
18 months agocan: etas_es58x: rework the version check logic to silence -Wformat-truncation
Vincent Mailhol [Sun, 24 Sep 2023 11:06:47 +0000 (20:06 +0900)]
can: etas_es58x: rework the version check logic to silence -Wformat-truncation

Following [1], es58x_devlink.c now triggers the following
format-truncation GCC warnings:

  drivers/net/can/usb/etas_es58x/es58x_devlink.c: In function ‘es58x_devlink_info_get’:
  drivers/net/can/usb/etas_es58x/es58x_devlink.c:201:41: warning: ‘%02u’ directive output may be truncated writing between 2 and 3 bytes into a region of size between 1 and 3 [-Wformat-truncation=]
    201 |   snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
        |                                         ^~~~
  drivers/net/can/usb/etas_es58x/es58x_devlink.c:201:30: note: directive argument in the range [0, 255]
    201 |   snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
        |                              ^~~~~~~~~~~~~~~~
  drivers/net/can/usb/etas_es58x/es58x_devlink.c:201:3: note: ‘snprintf’ output between 9 and 12 bytes into a destination of size 9
    201 |   snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    202 |     fw_ver->major, fw_ver->minor, fw_ver->revision);
        |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  drivers/net/can/usb/etas_es58x/es58x_devlink.c:211:41: warning: ‘%02u’ directive output may be truncated writing between 2 and 3 bytes into a region of size between 1 and 3 [-Wformat-truncation=]
    211 |   snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
        |                                         ^~~~
  drivers/net/can/usb/etas_es58x/es58x_devlink.c:211:30: note: directive argument in the range [0, 255]
    211 |   snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
        |                              ^~~~~~~~~~~~~~~~
  drivers/net/can/usb/etas_es58x/es58x_devlink.c:211:3: note: ‘snprintf’ output between 9 and 12 bytes into a destination of size 9
    211 |   snprintf(buf, sizeof(buf), "%02u.%02u.%02u",
        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    212 |     bl_ver->major, bl_ver->minor, bl_ver->revision);
        |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  drivers/net/can/usb/etas_es58x/es58x_devlink.c:221:38: warning: ‘%03u’ directive output may be truncated writing between 3 and 5 bytes into a region of size between 2 and 4 [-Wformat-truncation=]
    221 |   snprintf(buf, sizeof(buf), "%c%03u/%03u",
        |                                      ^~~~
  drivers/net/can/usb/etas_es58x/es58x_devlink.c:221:30: note: directive argument in the range [0, 65535]
    221 |   snprintf(buf, sizeof(buf), "%c%03u/%03u",
        |                              ^~~~~~~~~~~~~
  drivers/net/can/usb/etas_es58x/es58x_devlink.c:221:3: note: ‘snprintf’ output between 9 and 13 bytes into a destination of size 9
    221 |   snprintf(buf, sizeof(buf), "%c%03u/%03u",
        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    222 |     hw_rev->letter, hw_rev->major, hw_rev->minor);
        |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is not an actual bug because the sscanf() parsing makes sure that
the u8 are only two digits long and the u16 only three digits long.
Thus below declaration:

char buf[max(sizeof("xx.xx.xx"), sizeof("axxx/xxx"))];

allocates just what is needed to represent either of the versions.

This warning was known but ignored because, at the time of writing,
-Wformat-truncation was not present in the kernel, not even at W=3 [2].

One way to silence this warning is to check the range of all sub
version numbers are valid: [0, 99] for u8 and range [0, 999] for u16.

The module already has a logic which considers that when all the sub
version numbers are zero, the version number is not set. Note that not
having access to the device specification, this was an arbitrary
decision. This logic can thus be removed in favor of global check that
would cover both cases:

  - the version number is not set (parsing failed)
  - the version number is not valid (paranoiac check to please gcc)

Before starting to parse the product info string, set the version
sub-numbers to the maximum unsigned integer thus violating the
definitions of struct es58x_sw_version or struct es58x_hw_revision.

Then, rework the es58x_sw_version_is_set() and
es58x_hw_revision_is_set() functions: remove the check that the
sub-numbers are non zero and replace it by a check that they fit in
the expected number of digits. This done, rename the functions to
reflect the change and rewrite the documentation. While doing so, also
add a description of the return value.

Finally, the previous version only checked that
&es58x_hw_revision.letter was not the null character. Replace this
check by an alphanumeric character check to make sure that we never
return a special character or a non-printable one and update the
documentation of struct es58x_hw_revision accordingly.

All those extra checks are paranoid but have the merit to silence the
newly introduced W=1 format-truncation warning [1].

[1] commit 6d4ab2e97dcf ("extrawarn: enable format and stringop overflow warnings in W=1")
Link: https://git.kernel.org/torvalds/c/6d4ab2e97dcf
[2] https://lore.kernel.org/all/CAMZ6Rq+K+6gbaZ35SOJcR9qQaTJ7KR0jW=XoDKFkobjhj8CHhw@mail.gmail.com/

Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Closes: https://lore.kernel.org/linux-can/20230914-carrousel-wrecker-720a08e173e9-mkl@pengutronix.de/
Fixes: 9f06631c3f1f ("can: etas_es58x: export product information through devlink_ops::info_get()")
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20230924110914.183898-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
18 months agocan: sja1000: Fix comment
Miquel Raynal [Fri, 22 Sep 2023 15:51:30 +0000 (17:51 +0200)]
can: sja1000: Fix comment

There is likely a copy-paste error here, as the exact same comment
appears below in this function, one time calling set_reset_mode(), the
other set_normal_mode().

Fixes: 429da1cc841b ("can: Driver for the SJA1000 CAN controller")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/all/20230922155130.592187-1-miquel.raynal@bootlin.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
18 months agodmaengine: ti: k3-udma-glue: clean up k3_udma_glue_tx_get_irq() return
Dan Carpenter [Tue, 26 Sep 2023 14:06:58 +0000 (17:06 +0300)]
dmaengine: ti: k3-udma-glue: clean up k3_udma_glue_tx_get_irq() return

The k3_udma_glue_tx_get_irq() function currently returns negative error
codes on error, zero on error and positive values for success.  This
complicates life for the callers who need to propagate the error code.
Also GCC will not warn about unsigned comparisons when you check:

if (unsigned_irq <= 0)

All the callers have been fixed now but let's just make this easy going
forward.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: ti: icssg-prueth: Fix signedness bug in prueth_init_tx_chns()
Dan Carpenter [Tue, 26 Sep 2023 14:05:59 +0000 (17:05 +0300)]
net: ti: icssg-prueth: Fix signedness bug in prueth_init_tx_chns()

The "tx_chn->irq" variable is unsigned so the error checking does not
work correctly.

Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: ethernet: ti: am65-cpsw: Fix error code in am65_cpsw_nuss_init_tx_chns()
Dan Carpenter [Tue, 26 Sep 2023 14:04:43 +0000 (17:04 +0300)]
net: ethernet: ti: am65-cpsw: Fix error code in am65_cpsw_nuss_init_tx_chns()

This accidentally returns success, but it should return a negative error
code.

Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agovringh: don't use vringh_kiov_advance() in vringh_iov_xfer()
Stefano Garzarella [Mon, 25 Sep 2023 10:30:57 +0000 (12:30 +0200)]
vringh: don't use vringh_kiov_advance() in vringh_iov_xfer()

In the while loop of vringh_iov_xfer(), `partlen` could be 0 if one of
the `iov` has 0 lenght.
In this case, we should skip the iov and go to the next one.
But calling vringh_kiov_advance() with 0 lenght does not cause the
advancement, since it returns immediately if asked to advance by 0 bytes.

Let's restore the code that was there before commit b8c06ad4d67d
("vringh: implement vringh_kiov_advance()"), avoiding using
vringh_kiov_advance().

Fixes: b8c06ad4d67d ("vringh: implement vringh_kiov_advance()")
Cc: stable@vger.kernel.org
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMAINTAINERS: adjust header file entry in DPLL SUBSYSTEM
Lukas Bulwahn [Mon, 25 Sep 2023 05:43:05 +0000 (07:43 +0200)]
MAINTAINERS: adjust header file entry in DPLL SUBSYSTEM

Commit 9431063ad323 ("dpll: core: Add DPLL framework base functions") adds
the section DPLL SUBSYSTEM in MAINTAINERS and includes a file entry to the
non-existing file 'include/net/dpll.h'.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference. Looking at the file stat of the commit above, this entry
clearly intended to refer to 'include/linux/dpll.h'.

Adjust this header file entry in DPLL SUBSYSTEM.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agorswitch: Fix PHY station management clock setting
Yoshihiro Shimoda [Tue, 26 Sep 2023 12:30:54 +0000 (21:30 +0900)]
rswitch: Fix PHY station management clock setting

Fix the MPIC.PSMCS value following the programming example in the
section 6.4.2 Management Data Clock (MDC) Setting, Ethernet MAC IP,
S4 Hardware User Manual Rev.1.00.

The value is calculated by
    MPIC.PSMCS = clk[MHz] / (MDC frequency[MHz] * 2) - 1
with the input clock frequency from clk_get_rate() and MDC frequency
of 2.5MHz. Otherwise, this driver cannot communicate PHYs on the R-Car
S4 Starter Kit board.

Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Reported-by: Tam Nguyen <tam.nguyen.xa@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230926123054.3976752-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: add sysctl to disable rfc4862 5.5.3e lifetime handling
Patrick Rohr [Mon, 25 Sep 2023 21:47:11 +0000 (14:47 -0700)]
net: add sysctl to disable rfc4862 5.5.3e lifetime handling

This change adds a sysctl to opt-out of RFC4862 section 5.5.3e's valid
lifetime derivation mechanism.

RFC4862 section 5.5.3e prescribes that the valid lifetime in a Router
Advertisement PIO shall be ignored if it less than 2 hours and to reset
the lifetime of the corresponding address to 2 hours. An in-progress
6man draft (see draft-ietf-6man-slaac-renum-07 section 4.2) is currently
looking to remove this mechanism. While this draft has not been moving
particularly quickly for other reasons, there is widespread consensus on
section 4.2 which updates RFC4862 section 5.5.3e.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Jen Linkova <furry@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230925214711.959704-1-prohr@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoiavf: remove "inline" functions from iavf_txrx.c
Jacob Keller [Thu, 24 Aug 2023 20:44:00 +0000 (14:44 -0600)]
iavf: remove "inline" functions from iavf_txrx.c

The iAVF txrx hotpath code has several functions that are marked as
"static inline" in the iavf_txrx.c file. This use of inline is frowned
upon in the netdev community and explicitly marked as something to avoid
in the Linux coding-style document (section 15).

Even though these functions are only used once, it is expected that GCC
is smart enough to decide when to perform function inlining where
appropriate without the "hint".

./scripts/bloat-o-meter is showing zero difference with this changes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoi40e: Add rx_missed_errors for buffer exhaustion
Yajun Deng [Wed, 6 Sep 2023 07:27:57 +0000 (15:27 +0800)]
i40e: Add rx_missed_errors for buffer exhaustion

As the comment in struct rtnl_link_stats64, rx_dropped should not
include packets dropped by the device due to buffer exhaustion.
They are counted in rx_missed_errors, procfs folds those two counters
together.

Add rx_missed_errors for buffer exhaustion, rx_missed_errors corresponds
to rx_discards, rx_dropped corresponds to rx_discards_other.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoMerge branch 'documentation-fixes-for-dpll-subsystem'
Jakub Kicinski [Tue, 3 Oct 2023 22:02:19 +0000 (15:02 -0700)]
Merge branch 'documentation-fixes-for-dpll-subsystem'

Bagas Sanjaya says:

====================
Documentation fixes for dpll subsystem

Here is a mini docs fixes for dpll subsystem. The fixes are all code
block-related.

This series is triggered because I was emailed by kernel test robot,
alerting htmldocs warnings (see patch [1/2]).

[1]: https://lore.kernel.org/all/20230918093240.29824-1-bagasdotme@gmail.com/
====================

Link: https://lore.kernel.org/r/20230928052708.44820-1-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoDocumentation: dpll: wrap DPLL_CMD_PIN_GET output in a code block
Bagas Sanjaya [Thu, 28 Sep 2023 05:27:08 +0000 (12:27 +0700)]
Documentation: dpll: wrap DPLL_CMD_PIN_GET output in a code block

DPLL_CMD_PIN_GET netlink command output for mux-type pins looks ugly
with normal paragraph formatting. Format it as a code block instead.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20230928052708.44820-3-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoDocumentation: dpll: Fix code blocks
Bagas Sanjaya [Thu, 28 Sep 2023 05:27:07 +0000 (12:27 +0700)]
Documentation: dpll: Fix code blocks

kernel test robot and Stephen Rothwell report htmldocs warnings:

Documentation/driver-api/dpll.rst:427: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 18 supplied.

.. code-block:: c
<snipped>...
Documentation/driver-api/dpll.rst:444: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 21 supplied.

.. code-block:: c
<snipped>...
Documentation/driver-api/dpll.rst:474: WARNING: Error in "code-block" directive:
maximum 1 argument(s) allowed, 12 supplied.

.. code-block:: c
<snipped>...

Fix these above by adding missing blank line separator after code-block
directive.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309180456.lOhxy9gS-lkp@intel.com/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20230918131521.155e9e63@canb.auug.org.au/
Fixes: dbb291f19393b6 ("dpll: documentation on DPLL subsystem interface")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20230928052708.44820-2-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'introduce-define_flex-macro'
Jakub Kicinski [Tue, 3 Oct 2023 19:17:13 +0000 (12:17 -0700)]
Merge branch 'introduce-define_flex-macro'

Przemek Kitszel says:

====================
introduce DEFINE_FLEX() macro

Add DEFINE_FLEX() macro, that helps on-stack allocation of structures
with trailing flex array member.
Expose __struct_size() macro which reads size of data allocated
by DEFINE_FLEX().

Accompany new macros introduction with actual usage,
in the ice driver - hence targeting for netdev tree.

Obvious benefits include simpler resulting code, less heap usage,
less error checking. Less obvious is the fact that compiler has
more room to optimize, and as a whole, even with more stuff on the stack,
we end up with overall better (smaller) report from bloat-o-meter:
add/remove: 8/6 grow/shrink: 7/18 up/down: 2211/-2270 (-59)
(individual results in each patch).
====================

Link: https://lore.kernel.org/r/20230912115937.1645707-1-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoice: make use of DEFINE_FLEX() in ice_switch.c
Przemek Kitszel [Tue, 12 Sep 2023 11:59:37 +0000 (07:59 -0400)]
ice: make use of DEFINE_FLEX() in ice_switch.c

Use DEFINE_FLEX() macro for 1-elem flex array members of ice_switch.c

Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-8-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoice: make use of DEFINE_FLEX() for struct ice_aqc_dis_txq_item
Przemek Kitszel [Tue, 12 Sep 2023 11:59:36 +0000 (07:59 -0400)]
ice: make use of DEFINE_FLEX() for struct ice_aqc_dis_txq_item

Use DEFINE_FLEX() macro for 1-elem flex array use case
of struct ice_aqc_dis_txq_item.

Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-7-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoice: make use of DEFINE_FLEX() for struct ice_aqc_add_tx_qgrp
Przemek Kitszel [Tue, 12 Sep 2023 11:59:35 +0000 (07:59 -0400)]
ice: make use of DEFINE_FLEX() for struct ice_aqc_add_tx_qgrp

Use DEFINE_FLEX() macro for 1-elem flex array use case
of struct ice_aqc_add_tx_qgrp.

Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230912115937.1645707-6-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This page took 0.143258 seconds and 4 git commands to generate.