Git Repo - linux.git/log

Revert "md/raid10: pull codes that wait for blocked dev into one function"

This reverts commit f046f5d0d79cdb968f219ce249e497fd1accf484.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <[email protected]>
Cc: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>

Revert "md/raid10: improve raid10 discard request"

This reverts commit bcc90d280465ebd51ab8688be86e1f00c62dccf9.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <[email protected]>
Cc: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>

Revert "md/raid10: improve discard request for far layout"

This reverts commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <[email protected]>
Cc: Xiao Ni <[email protected]>
Signed-off-by: Song Liu <[email protected]>

Revert "dm raid: remove unnecessary discard limits for raid10"

This reverts commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <[email protected]>
Cc: Xiao Ni <[email protected]>
Cc: Mike Snitzer <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
Signed-off-by: Song Liu <[email protected]>

MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver

Add maintainers info for new Marvell Prestera Ethernet switch driver.

Signed-off-by: Mickey Rachamim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower

TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL is a u32 attribute (MPLS label is
20 bits long).

Fixes the following bug:

$ tc filter add dev ethX ingress protocol mpls_uc \
     flower mpls lse depth 2 label 256             \
     action drop

$ tc filter show dev ethX ingress
   filter protocol mpls_uc pref 49152 flower chain 0
   filter protocol mpls_uc pref 49152 flower chain 0 handle 0x1
     eth_type 8847
     mpls
       lse depth 2 label 0  <-- invalid label 0, should be 256
   ...

Fixes: 61aec25a6db5 ("cls_flower: Support filtering on multiple MPLS Label Stack Entries")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/amd/pm: typo fix (CUSTOM -> COMPUTE)

The "COMPUTE" was wrongly spelled as "CUSTOM".

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 5.9.x

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Switch to RCU in x_tables to fix possible NULL pointer dereference,
   from Subash Abhinov Kasiviswanathan.

2) Fix netlink dump of dynset timeouts later than 23 days.

3) Add comment for the indirect serialization of the nft commit mutex
   with rtnl_mutex.

4) Remove bogus check for confirmed conntrack when matching on the
   conntrack ID, from Brett Mastbergen.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2020-12-09

This series contains updates to igb, ixgbe, i40e, and ice drivers.

Sven Auhagen fixes issues with igb XDP: return correct error value in XDP
xmit back, increase header padding to include space for double VLAN, add
an extack error when Rx buffer is too small for frame size, set metasize if
it is set in xdp, change xdp_do_flush_map to xdp_do_flush, and update
trans_start to avoid possible Tx timeout.

Björn fixes an issue where an Rx buffer can be reused prematurely with
XDP redirect for ixgbe, i40e, and ice drivers.

The following are changes since commit 323a391a220c4a234cb1e678689d7f4c3b73f863:
can: isotp: isotp_setsockopt(): block setsockopt on bound sockets
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE
====================

Signed-off-by: David S. Miller <[email protected]>

Input: cros_ec_keyb - send 'scancodes' in addition to key events

To let userspace know what 'scancodes' should be used in EVIOCGKEYCODE
and EVIOCSKEYCODE ioctls, we should send EV_MSC/MSC_SCAN events in
addition to EV_KEY/KEY_* events. The driver already declared MSC_SCAN
capability, so it is only matter of actually sending the events.

Link: https://lore.kernel.org/r/[email protected]
Acked-by: Rajat Jain <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>

Merge branch 'mlx4_en-fixes'

Tariq Toukan says:

====================
mlx4_en fixes

This patchset by Moshe contains fixes to the mlx4 Eth driver,
addressing issues in restart flow.

Patch 1 protects the restart task from being rescheduled while active.
  Please queue for -stable >= v2.6.
Patch 2 reconstructs SQs stuck in error state, and adds prints for improved
  debuggability.
  Please queue for -stable >= v3.12.
====================

Signed-off-by: David S. Miller <[email protected]>

net/mlx4_en: Handle TX error CQE

In case error CQE was found while polling TX CQ, the QP is in error
state and all posted WQEs will generate error CQEs without any data
transmitted. Fix it by reopening the channels, via same method used for
TX timeout handling.

In addition add some more info on error CQE and WQE for debug.

Fixes: bd2f631d7c60 ("net/mlx4_en: Notify user when TX ring in error state")
Signed-off-by: Moshe Shemesh <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_en: Avoid scheduling restart task if it is already running

Add restarting state flag to avoid scheduling another restart task while
such task is already running. Change task name from watchdog_task to
restart_task to better fit the task role.

Fixes: 1e338db56e5a ("mlx4_en: Fix a race at restart task")
Signed-off-by: Moshe Shemesh <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: fix cwnd-limited bug for TSO deferral where we send nothing

When cwnd is not a multiple of the TSO skb size of N*MSS, we can get
into persistent scenarios where we have the following sequence:

(1) ACK for full-sized skb of N*MSS arrives
  -> tcp_write_xmit() transmit full-sized skb with N*MSS
  -> move pacing release time forward
  -> exit tcp_write_xmit() because pacing time is in the future

(2) TSQ callback or TCP internal pacing timer fires
  -> try to transmit next skb, but TSO deferral finds remainder of
     available cwnd is not big enough to trigger an immediate send
     now, so we defer sending until the next ACK.

(3) repeat...

So we can get into a case where we never mark ourselves as
cwnd-limited for many seconds at a time, even with
bulk/infinite-backlog senders, because:

o In case (1) above, every time in tcp_write_xmit() we have enough
cwnd to send a full-sized skb, we are not fully using the cwnd
(because cwnd is not a multiple of the TSO skb size). So every time we
send data, we are not cwnd limited, and so in the cwnd-limited
tracking code in tcp_cwnd_validate() we mark ourselves as not
cwnd-limited.

o In case (2) above, every time in tcp_write_xmit() that we try to
transmit the "remainder" of the cwnd but defer, we set the local
variable is_cwnd_limited to true, but we do not send any packets, so
sent_pkts is zero, so we don't call the cwnd-limited logic to update
tp->is_cwnd_limited.

Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler")
Reported-by: Ingemar Johansson <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: flow_offload: Fix memory leak for indirect flow block

The offending commit introduces a cleanup callback that is invoked
when the driver module is removed to clean up the tunnel device
flow block. But it returns on the first iteration of the for loop.
The remaining indirect flow blocks will never be freed.

Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure")
CC: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>

tcp: Retain ECT bits for tos reflection

For DCTCP, we have to retain the ECT bits set by the congestion control
algorithm on the socket when reflecting syn TOS in syn-ack, in order to
make ECN work properly.

Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Reported-by: Alexander Duyck <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ethtool: fix stack overflow in ethnl_parse_bitset()

Syzbot reported a stack overflow in bitmap_from_arr32() called from
ethnl_parse_bitset() when bitset from netlink message is longer than
target bitmap length. While ethnl_compact_sanity_checks() makes sure that
trailing part is all zeros (i.e. the request does not try to touch bits
kernel does not recognize), we also need to cap change_bits to nbits so
that we don't try to write past the prepared bitmaps.

Fixes: 88db6d1e4f62 ("ethtool: add ethnl_parse_bitset() helper")
Reported-by: [email protected]
Signed-off-by: Michal Kubecek <[email protected]>
Link: https://lore.kernel.org/r/3487ee3a98e14cd526f55b6caaa959d2dcbcad9f.1607465316.git.mkubecek@suse.cz
Signed-off-by: Jakub Kicinski <[email protected]>

e1000e: fix S0ix flow to allow S0i3.2 subset entry

Changed a configuration in the flows to align with
architecture requirements to achieve S0i3.2 substate.

This helps both i219V and i219LM configurations.

Also fixed a typo in the previous commit 632fbd5eb5b0
("e1000e: fix S0ix flows for cable connected case").

Fixes: 632fbd5eb5b0 ("e1000e: fix S0ix flows for cable connected case").
Signed-off-by: Vitaly Lifshits <[email protected]>
Tested-by: Aaron Brown <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ice: avoid premature Rx buffer reuse

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Fixes: efc2214b6047 ("ice: Add support for XDP")
Reported-and-analyzed-by: Li RongQing <[email protected]>
Signed-off-by: Björn Töpel <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ixgbe: avoid premature Rx buffer reuse

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
Reported-and-analyzed-by: Li RongQing <[email protected]>
Signed-off-by: Björn Töpel <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: avoid premature Rx buffer reuse

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Longer explanation:

Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.

t0: Page is allocated, and put on the Rx ring
              +---------------
used by NIC ->| upper buffer
(rx_buffer)   +---------------
              | lower buffer
              +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX

t1: Buffer is received, and passed to the stack (e.g.)
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 1

t2: Buffer is received, and redirected
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------

Now, prior calling xdp_do_redirect():
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

This means that buffer *cannot* be flipped/reused, because the skb is
still using it.

The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
  page count  == USHRT_MAX - 1
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!

To work around this, the page count is stored prior calling
xdp_do_redirect().

Note that this is not optimal, since the NIC could actually reuse the
"lower buffer" again. However, then we need to track whether
XDP_REDIRECT consumed the buffer or not.

Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT")
Reported-and-analyzed-by: Li RongQing <[email protected]>
Signed-off-by: Björn Töpel <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

igb: avoid transmit queue timeout in xdp path

Since we share the transmit queue with the network stack,
it is possible that we run into a transmit queue timeout.
This will reset the queue.
This happens under high load when XDP is using the
transmit queue pretty much exclusively.

netdev_start_xmit() sets the trans_start variable of the
transmit queue to jiffies which is later utilized by dev_watchdog(),
so to avoid timeout, let stack know that XDP xmit happened by
bumping the trans_start within XDP Tx routines to jiffies.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Acked-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Sven Auhagen <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

igb: use xdp_do_flush

Since it is a new XDP implementation change xdp_do_flush_map
to xdp_do_flush.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Suggested-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Sven Auhagen <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

igb: skb add metasize for xdp

add metasize if it is set in xdp

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Suggested-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Sven Auhagen <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

igb: XDP extack message on error

Add an extack error message when the RX buffer size is too small
for the frame size.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Suggested-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Sven Auhagen <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

igb: take VLAN double header into account

Increase the packet header padding to include double VLAN tagging.
This patch uses a macro for this.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Suggested-by: Maciej Fijalkowski <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Sven Auhagen <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

igb: XDP xmit back fix error code

The igb XDP xmit back function should only return
defined error codes.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Reported-by: Dan Carpenter <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Sven Auhagen <[email protected]>
Tested-by: Sandeep Penigalapati <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

Merge tag 'arm-soc-fixes-v5.10-4b' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"There are a few more PHY mode changes for allwinner SoC based boards
  with a Realtek PHY after the driver changed its behavior, I assume
  there will be more of these in the future. Also on for Allwinner, the
  Banana Pi M2 board had a regression that led to some devices not
  working because of a slightly incorrect voltage being applied.

  By popular demand, I picked up a change from Krzysztof Kozlowski to
  actually list the SoC tree in the MAINTAINERS file. We don't want to
  get Cc'd on normal patches that are picked up by platform maintainers,
  but the lack of an entry has led to confusion in the past.

  All the other changes are fairly benign, fixing boot-time or
  compile-time warning messages in various places:

   - A dtc warning on the OLPC XO-1.75

   - A boot-time warning on i.MX6 wandboard

   - A harmless compile-time warning

   - A regression causing one of the i.MX6 SoCs to be identified as
     another

   - Missing SoC identification of Allwinner V3 and S3"

* tag 'arm-soc-fixes-v5.10-4b' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  firmware: xilinx: Mark pm_api_features_map with static keyword
  ARM: dts: mmp2-olpc-xo-1-75: clear the warnings when make dtbs
  MAINTAINERS: add a limited ARM and ARM64 SoC entry
  MAINTAINERS: correct SoC Git address (formerly: arm-soc)
  ARM: keystone: remove SECTION_SIZE_BITS/MAX_PHYSMEM_BITS
  arm64: dts: allwinner: H5: NanoPi Neo Plus2: phy-mode rgmii-id
  arm64: dts: allwinner: A64 Sopine: phy-mode rgmii-id
  ARM: dts: imx6qdl-kontron-samx6i: fix I2C_PM scl pin
  ARM: dts: imx6qdl-wandboard-revd1: Remove PAD_GPIO_6 from enetgrp
  ARM: imx: Use correct SRC base address
  ARM: dts: sun7i: pcduino3-nano: enable RGMII RX/TX delay on PHY
  ARM: dts: sun8i: v3s: fix GIC node memory range
  ARM: dts: sun8i: v40: bananapi-m2-berry: Fix ethernet node
  ARM: dts: sun8i: r40: bananapi-m2-berry: Fix dcdc1 regulator
  ARM: dts: sun7i: bananapi: Enable RGMII RX/TX delay on Ethernet PHY
  ARM: dts: s3: pinecube: align compatible property to other S3 boards
  ARM: sunxi: Add machine match for the Allwinner V3 SoC
  arm64: dts: allwinner: h6: orangepi-one-plus: Fix ethernet

Revert "geneve: pull IP header before ECN decapsulation"

This reverts commit 4179b00c04d1 ("geneve: pull IP header before ECN decapsulation").

Eric says: "network header should have been pulled already before
hitting geneve_rx()". Let's revert the syzbot fix since it's causing
more harm than good, and revisit.

Suggested-by: Eric Dumazet <[email protected]>
Reported-by: Jianlin Shi <[email protected]>
Fixes: 4179b00c04d1 ("geneve: pull IP header before ECN decapsulation")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=210569
Link: https://lore.kernel.org/netdev/CANn89iJVWfb=2i7oU1=D55rOyQnBbbikf+Mc6XHMkY7YX-yGEw@mail.gmail.com/
Signed-off-by: Jakub Kicinski <[email protected]>

firmware: xilinx: Mark pm_api_features_map with static keyword

Fix the following sparse warning:

drivers/firmware/xilinx/zynqmp.c:32:1: warning: symbol 'pm_api_features_map' was not declared. Should it be static?

Signed-off-by: Zou Wei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michal Simek <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

ARM: dts: mmp2-olpc-xo-1-75: clear the warnings when make dtbs

The check_spi_bus_bridge() in scripts/dtc/checks.c requires that the node
have "spi-slave" property must with "#address-cells = <0>" and
"#size-cells = <0>". But currently both "#address-cells" and "#size-cells"
properties are deleted, the corresponding default values are 2 and 1. As a
result, the check fails and below warnings is displayed.

arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): \
/soc/apb@d4000000/spi@d4037000: incorrect #address-cells for SPI bus
also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): \
/soc/apb@d4000000/spi@d4037000: incorrect #size-cells for SPI bus
also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
arch/arm/boot/dts/mmp2-olpc-xo-1-75.dtb: Warning (spi_bus_reg): \
Failed prerequisite 'spi_bus_bridge'

Because the value of "#size-cells" is already defined as zero in the node
"ssp3: spi@d4037000" in arch/arm/boot/dts/mmp2.dtsi. So we only need to
explicitly add "#address-cells = <0>" and keep "#size-cells" no change.

Signed-off-by: Zhen Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewait

If cm_create_timewait_info() fails, the timewait_info pointer will contain
an error value and will be used in cm_remove_remote() later.

  general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] SMP KASAN PTI
  KASAN: null-ptr-deref in range [0×0000000000000120-0×0000000000000127]
  CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e #27
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:cm_remove_remote.isra.0+0x24/0×170 drivers/infiniband/core/cm.c:978
  Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00
  RSP: 0018:ffff888013127918 EFLAGS: 00010006
  RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000
  RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4
  RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d
  R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70
  R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c
  FS:  00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   cm_destroy_id+0x189/0×15b0 drivers/infiniband/core/cm.c:1155
   cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline]
   rdma_connect_locked+0x1100/0×17c0 drivers/infiniband/core/cma.c:4107
   rdma_connect+0x2a/0×40 drivers/infiniband/core/cma.c:4140
   ucma_connect+0x277/0×340 drivers/infiniband/core/ucma.c:1069
   ucma_write+0x236/0×2f0 drivers/infiniband/core/ucma.c:1724
   vfs_write+0x220/0×830 fs/read_write.c:603
   ksys_write+0x1df/0×240 fs/read_write.c:658
   do_syscall_64+0x33/0×40 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Maor Gottlieb <[email protected]>
Reported-by: Amit Matityahu <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

Merge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull iommu fix from Will Deacon:
"Fix interrupt table length definition for AMD IOMMU.

  It's actually a fix for a fix, where the size of the interrupt
  remapping table was increased but a related constant for the
  size of the interrupt table was forgotten"

* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  iommu/amd: Set DTE[IntTabLen] to represent 512 IRTEs

can: isotp: isotp_setsockopt(): block setsockopt on bound sockets

The isotp socket can be widely configured in its behaviour regarding addressing
types, fill-ups, receive pattern tests and link layer length. Usually all
these settings need to be fixed before bind() and can not be changed
afterwards.

This patch adds a check to enforce the common usage pattern.

Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Signed-off-by: Oliver Hartkopp <[email protected]>
Tested-by: Thomas Wagner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'bpf-xdp-offload-fixes'

Toke Høiland-Jørgensen says:

====================
This series restores the test_offload.py selftest to working order. It seems a
number of subtle behavioural changes have crept into various subsystems which
broke test_offload.py in a number of ways. Most of these are fairly benign
changes where small adjustments to the test script seems to be the best fix,
but one is an actual kernel bug that I've observed in the wild caused by a bad
interaction between xdp_attachment_flags_ok() and the rework of XDP program
handling in the core netdev code.

Patch 1 fixes the bug by removing xdp_attachment_flags_ok(), and the reminder of
the patches are adjustments to test_offload.py, including a new feature for
netdevsim to force a BPF verification fail. Please see the individual patches
for details.

Changelog:

v4:
  - Accidentally truncated the Fixes: hashes in patches 3/4 to 11 chars
v3:
  - Add Fixes: tags
v2:
  - Replace xdp_attachment_flags_ok() with a check in dev_xdp_attach()
  - Better packing of struct nsim_dev
====================

Signed-off-by: Daniel Borkmann <[email protected]>

selftests/bpf/test_offload.py: Filter bpftool internal map when counting maps

A few of the tests in test_offload.py expects to see a certain number of
maps created, and checks this by counting the number of maps returned by
bpftool. There is already a filter that will remove any maps already there
at the beginning of the test, but bpftool now creates a map for the PID
iterator rodata on each invocation, which makes the map count wrong. Fix
this by also filtering the pid_iter.rodata map by name when counting.

Fixes: d53dee3fe013 ("tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf/test_offload.py: Reset ethtool features after failed setting

When setting the ethtool feature flag fails (as expected for the test), the
kernel now tracks that the feature was requested to be 'off' and refuses to
subsequently disable it again. So reset it back to 'on' so a subsequent
disable (that's not supposed to fail) can succeed.

Fixes: 417ec26477a5 ("selftests/bpf: add offload test based on netdevsim")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf/test_offload.py: Fix expected case of extack messages

Commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs
in net_device") changed the case of some of the extack messages being
returned when attaching of XDP programs failed. This broke test_offload.py,
so let's fix the test to reflect this.

Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf/test_offload.py: Only check verifier log on verification fails

Since commit 6f8a57ccf851 ("bpf: Make verifier log more relevant by
default"), the verifier discards log messages for successfully-verified
programs. This broke test_offload.py which is looking for a verification
message from the driver callback. Change test_offload.py to use the toggle
in netdevsim to make the verification fail before looking for the
verification message.

Fixes: 6f8a57ccf851 ("bpf: Make verifier log more relevant by default")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

netdevsim: Add debugfs toggle to reject BPF programs in verifier

This adds a new debugfs toggle ('bpf_bind_verifier_accept') that can be
used to make netdevsim reject BPF programs from being accepted by the
verifier. If this toggle (which defaults to true) is set to false,
nsim_bpf_verify_insn() will return EOPNOTSUPP on the last
instruction (after outputting the 'Hello from netdevsim' verifier message).

This makes it possible to check the verification callback in the driver
from test_offload.py in selftests, since the verifier now clears the
verifier log on a successful load, hiding the message from the driver.

Fixes: 6f8a57ccf851 ("bpf: Make verifier log more relevant by default")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf/test_offload.py: Remove check for program load flags match

Since we just removed the xdp_attachment_flags_ok() callback, also remove
the check for it in test_offload.py, and replace it with a test for the new
ambiguity-avoid check when multiple programs are loaded.

Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

xdp: Remove the xdp_attachment_flags_ok() callback

Since commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF
programs in net_device"), the XDP program attachment info is now maintained
in the core code. This interacts badly with the xdp_attachment_flags_ok()
check that prevents unloading an XDP program with different load flags than
it was loaded with. In practice, two kinds of failures are seen:

- An XDP program loaded without specifying a mode (and which then ends up
  in driver mode) cannot be unloaded if the program mode is specified on
  unload.

- The dev_xdp_uninstall() hook always calls the driver callback with the
  mode set to the type of the program but an empty flags argument, which
  means the flags_ok() check prevents the program from being removed,
  leading to bpf prog reference leaks.

The original reason this check was added was to avoid ambiguity when
multiple programs were loaded. With the way the checks are done in the core
now, this is quite simple to enforce in the core code, so let's add a check
there and get rid of the xdp_attachment_flags_ok() callback entirely.

Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs

For BOs imported from outside of amdgpu, setting of amdgpu_gem_object_funcs
was missing in amdgpu_dma_buf_create_obj. Fix by refactoring BO creation
and amdgpu_gem_object_funcs setting into single function called
from both code paths.

Fixes: d693def4fd1c ("drm: Remove obsolete GEM and PRIME callbacks from struct drm_driver")
v2: Use use amdgpu_gem_object_create() directly
v3: fix warning

Reviewed-by: Christian König <[email protected]>
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix size calculation with stolen vga memory

If we need to keep the stolen vga memory, make sure it is
at least as big as the legacy vga size.

Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/pm: update smu10.h WORKLOAD_PPLIB setting for raven

When using old WORKLOAD_PPLIB setting in smu10.h, there is problem that
it can't be able to switch to mak gpu clk during compute workload.
It needs to update WORKLOAD_PPLIB setting to fix this issue.

Signed-off-by: Changfeng <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amdkfd: Fix leak in dmabuf import

Release dmabuf reference before returning from kfd_ioctl_import_dmabuf.
amdgpu_amdkfd_gpuvm_import_dmabuf takes a reference to the underlying
GEM BO and doesn't keep the reference to the dmabuf wrapper.

Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Kent Russell <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix sdma instance fw version and feature version init

each sdma instance fw_version and feature_version
should be set right value when asic type isn't
between SIENNA_CICHILD and CHIP_DIMGREY_CAVEFISH

Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/display: Add wm table for Renoir

[Why]
Without additional HostVM Latency, Renoir takes 2us longer to exit
self-refresh. This causes underflow in certain cases.

[How]
Add table for Renoir with updated sr exit latencies for WM set A.

Signed-off-by: Sung Lee <[email protected]>
Reviewed-by: Yongqiang Sun <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Acked-by: Eryk Brol <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Prevent bandwidth overflow

[Why]
At very high pixel clock, bandwidth calculation exceeds 32 bit size
and overflow value. This causes the resulting selection of link rate
to be inaccurate.

[How]
Change order of operation and use fixed point to deal with integer
accuracy. Also address bug found when forcing link rate.

Signed-off-by: Chris Park <[email protected]>
Reviewed-by: Wenjing Liu <[email protected]>
Acked-by: Eryk Brol <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix debugfs creation/removal, again

There is still a warning when CONFIG_DEBUG_FS is disabled:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1145:13: error: 'amdgpu_ras_debugfs_create_ctrl_node' defined but not used [-Werror=unused-function]
1145 | static void amdgpu_ras_debugfs_create_ctrl_node(struct amdgpu_device *adev)

Change the code again to make the compiler actually drop
this code but not warn about it.

Fixes: ae2bf61ff39e ("drm/amdgpu: guard ras debugfs creation/removal based on CONFIG_DEBUG_FS")
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/disply: set num_crtc earlier

To avoid a recently added warning:
Bogus possible_crtcs: [ENCODER:65:TMDS-65] possible_crtcs=0xf (full crtc mask=0x7)
WARNING: CPU: 3 PID: 439 at drivers/gpu/drm/drm_mode_config.c:617 drm_mode_config_validate+0x178/0x200 [drm]
In this case the warning is harmless, but confusing to users.

Fixes: 0df108237433 ("drm: Validate encoder->possible_crtcs")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=209123
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

netfilter: nft_ct: Remove confirmation check for NFT_CT_ID

Since commit 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id
hash calculation") the ct id will not change from initialization to
confirmation. Removing the confirmation check allows for things like
adding an element to a 'typeof ct id' set in prerouting upon reception
of the first packet of a new connection, and then being able to
reference that set consistently both before and after the connection
is confirmed.

Fixes: 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id hash calculation")
Signed-off-by: Brett Mastbergen <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

xen: don't use page->lru for ZONE_DEVICE memory

Commit 9e2369c06c8a18 ("xen: add helpers to allocate unpopulated
memory") introduced usage of ZONE_DEVICE memory for foreign memory
mappings.

Unfortunately this collides with using page->lru for Xen backend
private page caches.

Fix that by using page->zone_device_data instead.

Cc: <[email protected]> # 5.9
Fixes: 9e2369c06c8a18 ("xen: add helpers to allocate unpopulated memory")
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Reviewed-by: Jason Andryuk <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

xen: add helpers for caching grant mapping pages

Instead of having similar helpers in multiple backend drivers use
common helpers for caching pages allocated via gnttab_alloc_pages().

Make use of those helpers in blkback and scsiback.

Cc: <[email protected]> # 5.9
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

gpio: eic-sprd: break loop when getting NULL device resource

EIC controller have unfixed numbers of banks on different Spreadtrum SoCs,
and each bank has its own base address, the loop of getting there base
address in driver should break if the resource gotten via
platform_get_resource() is NULL already. The later ones would be all NULL
even if the loop continues.

Fixes: 25518e024e3a ("gpio: Add Spreadtrum EIC driver support")
Signed-off-by: Chunyan Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

membarrier: Execute SYNC_CORE on the calling thread

membarrier()'s MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is documented as
syncing the core on all sibling threads but not necessarily the calling
thread.  This behavior is fundamentally buggy and cannot be used safely.

Suppose a user program has two threads.  Thread A is on CPU 0 and thread B
is on CPU 1.  Thread A modifies some text and calls
membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE).

Then thread B executes the modified code.  If, at any point after
membarrier() decides which CPUs to target, thread A could be preempted and
replaced by thread B on CPU 0.  This could even happen on exit from the
membarrier() syscall.  If this happens, thread B will end up running on CPU
0 without having synced.

In principle, this could be fixed by arranging for the scheduler to issue
sync_core_before_usermode() whenever switching between two threads in the
same mm if there is any possibility of a concurrent membarrier() call, but
this would have considerable overhead.  Instead, make membarrier() sync the
calling CPU as well.

As an optimization, this avoids an extra smp_mb() in the default
barrier-only mode and an extra rseq preempt on the caller.

Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Link: https://lore.kernel.org/r/250ded637696d490c69bef1877148db86066881c.1607058304.git.luto@kernel.org

membarrier: Explicitly sync remote cores when SYNC_CORE is requested

membarrier() does not explicitly sync_core() remote CPUs; instead, it
relies on the assumption that an IPI will result in a core sync.  On x86,
this may be true in practice, but it's not architecturally reliable.  In
particular, the SDM and APM do not appear to guarantee that interrupt
delivery is serializing.  While IRET does serialize, IPI return can
schedule, thereby switching to another task in the same mm that was
sleeping in a syscall.  The new task could then SYSRET back to usermode
without ever executing IRET.

Make this more robust by explicitly calling sync_core_before_usermode()
on remote cores.  (This also helps people who search the kernel tree for
instances of sync_core() and sync_core_before_usermode() -- one might be
surprised that the core membarrier code doesn't currently show up in a
such a search.)

Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/776b448d5f7bd6b12690707f5ed67bcda7f1d427.1607058304.git.luto@kernel.org

membarrier: Add an actual barrier before rseq_preempt()

It seems that most RSEQ membarrier users will expect any stores done before
the membarrier() syscall to be visible to the target task(s). While this
is extremely likely to be true in practice, nothing actually guarantees it
by a strict reading of the x86 manuals. Rather than providing this
guarantee by accident and potentially causing a problem down the road, just
add an explicit barrier.

Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/d3e7197e034fa4852afcf370ca49c30496e58e40.1607058304.git.luto@kernel.org

x86/membarrier: Get rid of a dubious optimization

sync_core_before_usermode() had an incorrect optimization. If the kernel
returns from an interrupt, it can get to usermode without IRET. It just has
to schedule to a different task in the same mm and do SYSRET. Fortunately,
there were no callers of sync_core_before_usermode() that could have had
in_irq() or in_nmi() equal to true, because it's only ever called from the
scheduler.

While at it, clarify a related comment.

Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/5afc7632be1422f91eaf7611aaaa1b5b8580a086.1607058304.git.luto@kernel.org

pinctrl: intel: Actually disable Tx and Rx buffers on GPIO request

Mistakenly the buffers (input and output) become enabled together for a short
period of time during GPIO request. This is problematic, because instead of
initial motive to disable them in the commit af7e3eeb84e2
("pinctrl: intel: Disable input and output buffer when switching to GPIO"),
the driven value on the pin, which might be used as an IRQ line, brings
firmwares of some touch pads to an awkward state that needs a full power off
to recover. Fix this, as stated in the culprit commit, by disabling the buffers.

Fixes: af7e3eeb84e2 ("pinctrl: intel: Disable input and output buffer when switching to GPIO")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=210497
Reported-by: Pierre-Louis Bossart <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Tested-by: Pierre-Louis Bossart <[email protected]>
Tested-by: Kai-Heng Feng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

mm/madvise: remove racy mm ownership check

Jann spotted the security hole due to race of mm ownership check.

If the task is sharing the mm_struct but goes through execve() before
mm_access(), it could skip process_madvise_behavior_valid check.  That
makes *any advice hint* to reach into the remote process.

This patch removes the mm ownership check.  With it, it will lose the
ability that local process could give *any* advice hint with vector
interface for some reason (e.g., performance).  Since there is no
concrete example in upstream yet, it would be better to remove the
abiliity at this moment and need to review when such new advice comes
up.

Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Reported-by: Jann Horn <[email protected]>
Suggested-by: Jann Horn <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drm/amdgpu/powerplay: parse fan table for CI asics

Set up all the parameters required for SMU fan control if supported.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=201539
Acked-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

bpf, doc: Update KP's email in MAINTAINERS

Helps me use a single account to sign off and send patches use
appropriate email redirection without needing to update MAINTAINERS.

Signed-off-by: KP Singh <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

tcp: select sane initial rcvq_space.space for big MSS

Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.

This is no longer the case, and Hazem Mohamed Abuelfotoh reported
that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.

Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
of rcvq_space.space computation, while it can select later a smaller
value for tp->rcv_ssthresh and tp->window_clamp.

ss -temoi on receiver would show :

skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596

This means that TCP can not increase its window in tcp_grow_window(),
and that DRS can never kick.

Fix this by making sure that rcvq_space.space is not bigger than number of bytes
that can be held in TCP receive queue.

People unable/unwilling to change their kernel can work around this issue by
selecting a bigger tcp_rmem[1] value as in :

echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem

Based on an initial report and patch from Hazem Mohamed Abuelfotoh
https://lore.kernel.org/netdev/20201204180622 [email protected]/

Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
Reported-by: Hazem Mohamed Abuelfotoh <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ll_temac: Fix potential NULL dereference in temac_probe()

platform_get_resource() may fail and in this case a NULL dereference
will occur.

Fix it to use devm_platform_ioremap_resource() instead of calling
platform_get_resource() and devm_ioremap().

This is detected by Coccinelle semantic patch.

@@
expression pdev, res, n, t, e, e1, e2;
@@

res = $platform_get_resource\|platform_get_resource_byname$(pdev, t, n);
+ if (!res)
+ return -EINVAL;
... when != res == NULL
e = devm_ioremap(e1, res->start, e2);

Fixes: 8425c41d1ef7 ("net: ll_temac: Extend support to non-device-tree platforms")
Signed-off-by: Zhang Changzhong <[email protected]>
Acked-by: Esben Haabendal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

afs: Fix memory leak when mounting with multiple source parameters

There's a memory leak in afs_parse_source() whereby multiple source=
parameters overwrite fc->source in the fs_context struct without freeing
the previously recorded source.

Fix this by only permitting a single source parameter and rejecting with
an error all subsequent ones.

This was caught by syzbot with the kernel memory leak detector, showing
something like the following trace:

  unreferenced object 0xffff888114375440 (size 32):
    comm "repro", pid 5168, jiffies 4294923723 (age 569.948s)
    backtrace:
      slab_post_alloc_hook+0x42/0x79
      __kmalloc_track_caller+0x125/0x16a
      kmemdup_nul+0x24/0x3c
      vfs_parse_fs_string+0x5a/0xa1
      generic_parse_monolithic+0x9d/0xc5
      do_new_mount+0x10d/0x15a
      do_mount+0x5f/0x8e
      __do_sys_mount+0xff/0x127
      do_syscall_64+0x2d/0x3a
      entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 13fcc6837049 ("afs: Add fs_context support")
Reported-by: [email protected]
Signed-off-by: David Howells <[email protected]>
cc: Randy Dunlap <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

net: tipc: prevent possible null deref of link

`tipc_node_apply_property` does a null check on a `tipc_link_entry`
pointer but also accesses the same pointer out of the null check block.

This triggers a warning on Coverity Static Analyzer because we're
implying that `e->link` can BE null.

Move "Update MTU for node link entry" line into if block to make sure
that we're not in a state that `e->link` is null.

Signed-off-by: Cengiz Can <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'sunxi-fixes-for-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes

A few more RGMII-ID fixes

* tag 'sunxi-fixes-for-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: H5: NanoPi Neo Plus2: phy-mode rgmii-id
arm64: dts: allwinner: A64 Sopine: phy-mode rgmii-id

Link: https://lore.kernel.org/r/2a351c9c-470f-4c5e-ba37-80065ae0586d.lettre@localhost
Signed-off-by: Arnd Bergmann <[email protected]>

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull sparc64 csum fix from Al Viro:
"Fix for a brown paperbag regression in sparc64"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
[regression fix] really dumb fuckup in sparc64 __csum_partial_copy() changes

Revert "scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug"

This reverts commit 103fbf8e4020845e4fcf63819288cedb092a3c91.

It turns out that it causes long boot-time latencies (to the point of
timeouts and failed boots).

The cause is the increase in request queues, and a fix for that is
queued up for 5.11, but we're reverting this commit that triggered the
problem for now.

Reported-and-tested-by: John Garry <[email protected]>
Reported-and-tested-by: Julia Lawall <[email protected]>
Reported-by: Qian Cai <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Acked-by: Martin K. Petersen <[email protected]>
Link: https://lore.kernel.org/linux-scsi/[email protected]/
Link: https://lore.kernel.org/lkml/alpine.DEB.2.22.394.2012081813310.2680@hadrien/
Link: https://lore.kernel.org/linux-block/[email protected]/
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'stmmac-fixes'

Joakim Zhang says:

====================
patches for stmmac

A patch set for stmmac, fix some driver issues.

ChangeLogs:
V1->V2:
* add Fixes tag.
* add patch 5/5 into this patch set.

V2->V3:
* rebase to latest net tree where fixes go.
====================

Signed-off-by: David S. Miller <[email protected]>

net: stmmac: overwrite the dma_cap.addr64 according to HW design

The current IP register MAC_HW_Feature1[ADDR64] only defines
32/40/64 bit width, but some SOCs support others like i.MX8MP
support 34 bits but it maps to 40 bits width in MAC_HW_Feature1[ADDR64].
So overwrite dma_cap.addr64 according to HW real design.

Fixes: 94abdad6974a ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip")
Signed-off-by: Fugang Duan <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: delete the eee_ctrl_timer after napi disabled

There have chance to re-enable the eee_ctrl_timer and fire the timer
in napi callback after delete the timer in .stmmac_release(), which
introduces to access eee registers in the timer function after clocks
are disabled then causes system hang. Found this issue when do
suspend/resume and reboot stress test.

It is safe to delete the timer after napi disabled and disable lpi mode.

Fixes: d765955d2ae0b ("stmmac: add the Energy Efficient Ethernet support")
Signed-off-by: Fugang Duan <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: free tx skb buffer in stmmac_resume()

When do suspend/resume test, there have WARN_ON() log dump from
stmmac_xmit() funciton, the code logic:
entry = tx_q->cur_tx;
first_entry = entry;
WARN_ON(tx_q->tx_skbuff[first_entry]);

In normal case, tx_q->tx_skbuff[txq->cur_tx] should be NULL because
the skb should be handled and freed in stmmac_tx_clean().

But stmmac_resume() reset queue parameters like below, skb buffers
may not be freed.
tx_q->cur_tx = 0;
tx_q->dirty_tx = 0;

So free tx skb buffer in stmmac_resume() to avoid warning and
memory leak.

log:
[   46.139824] ------------[ cut here ]------------
[   46.144453] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3235 stmmac_xmit+0x7a0/0x9d0
[   46.154969] Modules linked in: crct10dif_ce vvcam(O) flexcan can_dev
[   46.161328] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      5.4.24-2.1.0+g2ad925d15481 #1
[   46.170369] Hardware name: NXP i.MX8MPlus EVK board (DT)
[   46.175677] pstate: 80000005 (Nzcv daif -PAN -UAO)
[   46.180465] pc : stmmac_xmit+0x7a0/0x9d0
[   46.184387] lr : dev_hard_start_xmit+0x94/0x158
[   46.188913] sp : ffff800010003cc0
[   46.192224] x29: ffff800010003cc0 x28: ffff000177e2a100
[   46.197533] x27: ffff000176ef0840 x26: ffff000176ef0090
[   46.202842] x25: 0000000000000000 x24: 0000000000000000
[   46.208151] x23: 0000000000000003 x22: ffff8000119ddd30
[   46.213460] x21: ffff00017636f000 x20: ffff000176ef0cc0
[   46.218769] x19: 0000000000000003 x18: 0000000000000000
[   46.224078] x17: 0000000000000000 x16: 0000000000000000
[   46.229386] x15: 0000000000000079 x14: 0000000000000000
[   46.234695] x13: 0000000000000003 x12: 0000000000000003
[   46.240003] x11: 0000000000000010 x10: 0000000000000010
[   46.245312] x9 : ffff00017002b140 x8 : 0000000000000000
[   46.250621] x7 : ffff00017636f000 x6 : 0000000000000010
[   46.255930] x5 : 0000000000000001 x4 : ffff000176ef0000
[   46.261238] x3 : 0000000000000003 x2 : 00000000ffffffff
[   46.266547] x1 : ffff000177e2a000 x0 : 0000000000000000
[   46.271856] Call trace:
[   46.274302]  stmmac_xmit+0x7a0/0x9d0
[   46.277874]  dev_hard_start_xmit+0x94/0x158
[   46.282056]  sch_direct_xmit+0x11c/0x338
[   46.285976]  __qdisc_run+0x118/0x5f0
[   46.289549]  net_tx_action+0x110/0x198
[   46.293297]  __do_softirq+0x120/0x23c
[   46.296958]  irq_exit+0xb8/0xd8
[   46.300098]  __handle_domain_irq+0x64/0xb8
[   46.304191]  gic_handle_irq+0x5c/0x148
[   46.307936]  el1_irq+0xb8/0x180
[   46.311076]  cpuidle_enter_state+0x84/0x360
[   46.315256]  cpuidle_enter+0x34/0x48
[   46.318829]  call_cpuidle+0x18/0x38
[   46.322314]  do_idle+0x1e0/0x280
[   46.325539]  cpu_startup_entry+0x24/0x40
[   46.329460]  rest_init+0xd4/0xe0
[   46.332687]  arch_call_rest_init+0xc/0x14
[   46.336695]  start_kernel+0x420/0x44c
[   46.340353] ---[ end trace bc1ee695123cbacd ]---

Fixes: 47dd7a540b8a0 ("net: add support for STMicroelectronics Ethernet controllers.")
Signed-off-by: Fugang Duan <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: start phylink instance before stmmac_hw_setup()

Start phylink instance and resume back the PHY to supply
RX clock to MAC before MAC layer initialization by calling
.stmmac_hw_setup(), since DMA reset depends on the RX clock,
otherwise DMA reset cost maximum timeout value then finally
timeout.

Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic")
Signed-off-by: Fugang Duan <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: increase the timeout for dma reset

Current timeout value is not enough for gmac5 dma reset
on imx8mp platform, increase the timeout range.

Signed-off-by: Fugang Duan <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

[regression fix] really dumb fuckup in sparc64 __csum_partial_copy() changes

~0U is -1, not 1

Reported-by: Anatoly Pugachev <[email protected]>
Tested-by: Anatoly Pugachev <[email protected]>
Fixes: fdf8bee96f9a "sparc64: propagate the calling convention changes down to __csum_partial_copy_...()"
X-brown-paperbag: yes
Signed-off-by: Al Viro <[email protected]>

netfilter: nftables: comment indirect serialization of commit_mutex with rtnl_mutex

Add an explicit comment in the code to describe the indirect
serialization of the holders of the commit_mutex with the rtnl_mutex.
Commit 90d2723c6d4c ("netfilter: nf_tables: do not hold reference on
netdevice from preparation phase") already describes this, but a comment
in this case is better for reference.

Reported-by: Vladimir Oltean <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull seq_file fix from Al Viro:
"This fixes a regression introduced in this cycle wrt iov_iter based
variant for reading a seq_file"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix return values of seq_read_iter()

netfilter: nft_dynset: fix timeouts later than 23 days

Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by
8e1102d5a159 ("netfilter: nf_tables: support timeouts larger than 23
days"), otherwise ruleset listing breaks.

Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

bonding: fix feature flag setting at init time

Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. This code
runs both on post-module-load mode changes, as well as at module init
time, and when run at module init time, it is before register_netdevice()
has been called and filled in wanted_features. The empty wanted_features
led to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera <[email protected]>
Suggested-by: Ivan Vecera <[email protected]>
Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Signed-off-by: Jarod Wilson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

io_uring: fix file leak on error path of io ctx creation

Put file as part of error handling when setting up io ctx to fix
memory leaks like the following one.

   BUG: memory leak
   unreferenced object 0xffff888101ea2200 (size 256):
     comm "syz-executor355", pid 8470, jiffies 4294953658 (age 32.400s)
     hex dump (first 32 bytes):
       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
       20 59 03 01 81 88 ff ff 80 87 a8 10 81 88 ff ff   Y..............
     backtrace:
       [<000000002e0a7c5f>] kmem_cache_zalloc include/linux/slab.h:654 [inline]
       [<000000002e0a7c5f>] __alloc_file+0x1f/0x130 fs/file_table.c:101
       [<000000001a55b73a>] alloc_empty_file+0x69/0x120 fs/file_table.c:151
       [<00000000fb22349e>] alloc_file+0x33/0x1b0 fs/file_table.c:193
       [<000000006e1465bb>] alloc_file_pseudo+0xb2/0x140 fs/file_table.c:233
       [<000000007118092a>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
       [<000000007118092a>] anon_inode_getfile+0xaa/0x120 fs/anon_inodes.c:74
       [<000000002ae99012>] io_uring_get_fd fs/io_uring.c:9198 [inline]
       [<000000002ae99012>] io_uring_create fs/io_uring.c:9377 [inline]
       [<000000002ae99012>] io_uring_setup+0x1125/0x1630 fs/io_uring.c:9411
       [<000000008280baad>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       [<00000000685d8cf0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: [email protected]
Fixes: 0f2122045b94 ("io_uring: don't rely on weak ->files references")
Cc: Pavel Begunkov <[email protected]>
Signed-off-by: Hillf Danton <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

tools/bpftool: Fix PID fetching with a lot of results

In case of having so many PID results that they don't fit into a singe page
(4096) bytes, bpftool will erroneously conclude that it got corrupted data due
to 4096 not being a multiple of struct pid_iter_entry, so the last entry will
be partially truncated. Fix this by sizing the buffer to fit exactly N entries
with no truncation in the middle of record.

Fixes: d53dee3fe013 ("tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

drm/i915/gt: Declare gen9 has 64 mocs entries!

We checked the table size against a hardcoded number of entries, and
that number was excluding the special mocs registers at the end.

Fixes: 777a7717d60c ("drm/i915/gt: Program mocs:63 for cache eviction on gen9")
Signed-off-by: Chris Wilson <[email protected]>
Cc: <[email protected]> # v4.3+
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 444fbf5d7058099447c5366ba8bb60d610aeb44b)
Signed-off-by: Rodrigo Vivi <[email protected]>
[backported and updated the Fixes sha]

drm/i915/display/dp: Compute the correct slice count for VDSC on DP

This patch fixes the slice count computation algorithm
for calculating the slice count based on Peak pixel rate
and the max slice width allowed on the DSC engines.
We need to ensure slice count > min slice count req
as per DP spec based on peak pixel rate and that it is
greater than min slice count based on the max slice width
advertised by DPCD. So use max of these two.
In the prev patch we were using min of these 2 causing it
to violate the max slice width limitation causing a blank
screen on 8K@60.

Fixes: d9218c8f6cf4 ("drm/i915/dp: Add helpers for Compressed BPP and Slice Count for DSC")
Cc: Ankit Nautiyal <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: <[email protected]> # v5.0+
Signed-off-by: Manasi Navare <[email protected]>
Reviewed-by: Ankit Nautiyal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit d371d6ea92ad2a47f42bbcaa786ee5f6069c9c14)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915: fix size_t greater or equal to zero comparison

Currently the check that the unsigned size_t variable i is >= 0
is always true because the unsigned variable will never be negative,
causing the loop to run forever. Fix this by changing the
pre-decrement check to a zero check on i followed by a decrement of i.

Addresses-Coverity: ("Unsigned compared against 0")
Fixes: bfed6708d6c9 ("drm/i915: use vmap in shmem_pin_map")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit e70956a2498dc81d8f2522cba074f55ae910e13c)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gt: Cancel the preemption timeout on responding to it

We currently presume that the engine reset is successful, cancelling the
expired preemption timer in the process. However, engine resets can
fail, leaving the timeout still pending and we will then respond to the
timeout again next time the tasklet fires. What we want is for the
failed engine reset to be promoted to a full device reset, which is
kicked by the heartbeat once the engine stops processing events.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <[email protected]>
Cc: <[email protected]> # v5.5+
Reviewed-by: Mika Kuoppala <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit d997e240ceecb4f732611985d3a939ad1bfc1893)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gt: Ignore repeated attempts to suspend request flow across reset

Before reseting the engine, we suspend the execution of the guilty
request, so that we can continue execution with a new context while we
slowly compress the captured error state for the guilty context. However,
if the reset fails, we will promptly attempt to reset the same request
again, and discover the ongoing capture. Ignore the second attempt to
suspend and capture the same request.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Signed-off-by: Chris Wilson <[email protected]>
Cc: <[email protected]> # v5.7+
Reviewed-by: Mika Kuoppala <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit b969540500bce60cf1cdfff5464388af32b9a553)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gem: Propagate error from cancelled submit due to context closure

In the course of discovering and closing many races with context closure
and execbuf submission, since commit 61231f6bd056 ("drm/i915/gem: Check
that the context wasn't closed during setup") we started checking that
the context was not closed by another userspace thread during the execbuf
ioctl. In doing so we cancelled the inflight request (by telling it to be
skipped), but kept reporting success since we do submit a request, albeit
one that doesn't execute. As the error is known before we return from the
ioctl, we can report the error we detect immediately, rather than leave
it on the fence status. With the immediate propagation of the error, it
is easier for userspace to handle.

Fixes: 61231f6bd056 ("drm/i915/gem: Check that the context wasn't closed during setup")
Testcase: igt/gem_ctx_exec/basic-close-race
Signed-off-by: Chris Wilson <[email protected]>
Cc: <[email protected]> # v5.7+
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit ba38b79eaeaeed29d2383f122d5c711ebf5ed3d1)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gem: Check the correct variable in selftest

There is a copy and paste bug in this code. It's supposed to check
"obj2" instead of checking "obj" a second time.

Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/8ilneOcJAjwqU4t@mwand
(cherry picked from commit 14f2d7604f7ce4cb3d303aea17292d119dfafa75)
Signed-off-by: Rodrigo Vivi <[email protected]>

netfilter: x_tables: Switch synchronization to RCU

When running concurrent iptables rules replacement with data, the per CPU
sequence count is checked after the assignment of the new information.
The sequence count is used to synchronize with the packet path without the
use of any explicit locking. If there are any packets in the packet path using
the table information, the sequence count is incremented to an odd value and
is incremented to an even after the packet process completion.

The new table value assignment is followed by a write memory barrier so every
CPU should see the latest value. If the packet path has started with the old
table information, the sequence counter will be odd and the iptables
replacement will wait till the sequence count is even prior to freeing the
old table info.

However, this assumes that the new table information assignment and the memory
barrier is actually executed prior to the counter check in the replacement
thread. If CPU decides to execute the assignment later as there is no user of
the table information prior to the sequence check, the packet path in another
CPU may use the old table information. The replacement thread would then free
the table information under it leading to a use after free in the packet
processing context-

Unable to handle kernel NULL pointer dereference at virtual
address 000000000000008e
pc : ip6t_do_table+0x5d0/0x89c
lr : ip6t_do_table+0x5b8/0x89c
ip6t_do_table+0x5d0/0x89c
ip6table_filter_hook+0x24/0x30
nf_hook_slow+0x84/0x120
ip6_input+0x74/0xe0
ip6_rcv_finish+0x7c/0x128
ipv6_rcv+0xac/0xe4
__netif_receive_skb+0x84/0x17c
process_backlog+0x15c/0x1b8
napi_poll+0x88/0x284
net_rx_action+0xbc/0x23c
__do_softirq+0x20c/0x48c

This could be fixed by forcing instruction order after the new table
information assignment or by switching to RCU for the synchronization.

Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
Reported-by: Sean Tranchetti <[email protected]>
Reported-by: kernel test robot <[email protected]>
Suggested-by: Florian Westphal <[email protected]>
Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

pinctrl: aspeed: Fix GPIO requests on pass-through banks

Commit 6726fbff19bf ("pinctrl: aspeed: Fix GPI only function problem.")
fixes access to GPIO banks T and U on the AST2600. Both banks contain
input-only pins and the GPIO pin function is named GPITx and GPIUx
respectively. Unfortunately the fix had a negative impact on GPIO banks
D and E for the AST2400 and AST2500 where the GPIO pass-through
functions take similar "GPI"-style names. The net effect on the older
SoCs was that when the GPIO subsystem requested a pin in banks D or E be
muxed for GPIO, they were instead muxed for pass-through mode.
Mistakenly muxing pass-through mode e.g. breaks booting the host on
IBM's Witherspoon (AC922) platform where GPIOE0 is used for FSI.

Further exploit the names in the provided expression structure to
differentiate pass-through from pin-specific GPIO modes.

This follow-up fix gives the expected behaviour for the following tests:

Witherspoon BMC (AST2500):

1. Power-on the Witherspoon host
2. Request GPIOD1 be muxed via /sys/class/gpio/export
3. Request GPIOE1 be muxed via /sys/class/gpio/export
4. Request the balls for GPIOs E2 and E3 be muxed as GPIO pass-through
("GPIE2" mode) via a pinctrl hog in the devicetree

Rainier BMC (AST2600):

5. Request GPIT0 be muxed via /sys/class/gpio/export
6. Request GPIU0 be muxed via /sys/class/gpio/export

Together the tests demonstrate that all three pieces of functionality
(general GPIOs via 1, 2 and 3, input-only GPIOs via 5 and 6, pass-through
mode via 4) operate as desired across old and new SoCs.

Fixes: 9b92f5c51e9a ("pinctrl: aspeed: Fix GPI only function problem.")
Signed-off-by: Andrew Jeffery <[email protected]>
Tested-by: Joel Stanley <[email protected]>
Reviewed-by: Joel Stanley <[email protected]>
Cc: Billy Tsai <[email protected]>
Cc: Joel Stanley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

media: vidtv: fix some warnings

As reported by sparse:

drivers/media/test-drivers/vidtv/vidtv_ts.h:47:47: warning: array of flexible structures
drivers/media/test-drivers/vidtv/vidtv_channel.c:458:54: warning: incorrect type in argument 3 (different base types)
drivers/media/test-drivers/vidtv/vidtv_channel.c:458:54: expected unsigned short [usertype] service_id
drivers/media/test-drivers/vidtv/vidtv_channel.c:458:54: got restricted __be16 [usertype] service_id
drivers/media/test-drivers/vidtv/vidtv_s302m.c:471 vidtv_s302m_encoder_init() warn: possible memory leak of 'e'

Address such warnings.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2020-12-07

1) Sysbot reported fixes for the new 64/32 bit compat layer.
   From Dmitry Safonov.

2) Fix a memory leak in xfrm_user_policy that was introduced
   by adding the 64/32 bit compat layer. From Yu Kuai.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  net: xfrm: fix memory leak in xfrm_user_policy()
  xfrm/compat: Don't allocate memory with __GFP_ZERO
  xfrm/compat: memset(0) 64-bit padding at right place
  xfrm/compat: Translate by copying XFRMA_UNSPEC attribute
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

scsi: hisi_sas: Select a suitable queue for internal I/Os

For when managed interrupts are used (and shost->nr_hw_queues is set), a
fixed queue - set per-device - is still used for internal I/Os.

If all the CPUs mapped to that queue are offlined, then the completions for
that queue are not serviced and any internal I/Os will time out.

Fix by selecting a queue for internal I/Os from the queue mapped from the
current CPU in this scenario.

This is still not ideal as it does not deal with CPU hotplug for inflight
internal I/Os, and needs proper support from [0].

[0] https://lore.kernel.org/linux-scsi/20200703130122 [email protected]/T/#m7d77d049b18f33a24ef206af69ebb66d07440556

Link: https://lore.kernel.org/r/[email protected]
Fixes: 8d98416a55eb ("scsi: hisi_sas: Switch v3 hw to MQ")
Signed-off-by: Xiang Chen <[email protected]>
Signed-off-by: John Garry <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: core: Fix race between handling STS_RESOURCE and completion

When queuing I/O request to LLD, STS_RESOURCE may be returned because:

- Host is in recovery or blocked

- Target queue throttling or target is blocked

- LLD rejection

In these scenarios BLK_STS_DEV_RESOURCE is returned to the block layer to
avoid an unnecessary re-run of the queue. However, all of the requests
queued to this SCSI device may complete immediately after reading
'sdev->device_busy' and BLK_STS_DEV_RESOURCE is returned to block layer. In
that case the current I/O won't get a chance to get queued since it is
invisible at that time for both scsi_run_queue_async() and blk-mq's
RESTART.

Fix the issue by not returning BLK_STS_DEV_RESOURCE in this situation.

Link: https://lore.kernel.org/r/[email protected]
Fixes: 86ff7c2a80cd ("blk-mq: introduce BLK_STS_DEV_RESOURCE")
Cc: Hannes Reinecke <[email protected]>
Cc: Sumit Saxena <[email protected]>
Cc: Kashyap Desai <[email protected]>
Cc: Bart Van Assche <[email protected]>
Cc: Ewan Milne <[email protected]>
Cc: Long Li <[email protected]>
Reported-by: John Garry <[email protected]>
Tested-by: "chenxiang (M)" <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

net: stmmac: dwmac-meson8b: fix mask definition of the m250_sel mux

The m250_sel mux clock uses bit 4 in the PRG_ETH0 register. Fix this by
shifting the PRG_ETH0_CLK_M250_SEL_MASK accordingly as the "mask" in
struct clk_mux expects the mask relative to the "shift" field in the
same struct.

While here, get rid of the PRG_ETH0_CLK_M250_SEL_SHIFT macro and use
__ffs() to determine it from the existing PRG_ETH0_CLK_M250_SEL_MASK
macro.

Fixes: 566e8251625304 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Signed-off-by: Martin Blumenstingl <[email protected]>
Reviewed-by: Jerome Brunet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dpaa2-mac: Add a missing of_node_put after of_device_is_available

Add an 'of_node_put()' call when a tested device node is not available.

Fixes: 94ae899b2096 ("dpaa2-mac: add PCS support through the Lynx module")
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Ioana Ciornei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mptcp: print new line in mptcp_seq_show() if mptcp isn't in use

When do cat /proc/net/netstat, the output isn't append with a new line, it looks like this:
[root@localhost ~]# cat /proc/net/netstat
...
MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0[root@localhost ~]#

This is because in mptcp_seq_show(), if mptcp isn't in use, net->mib.mptcp_statistics is NULL,
so it just puts all 0 after "MPTcpExt:", and return, forgot the '\n'.

After this patch:

[root@localhost ~]# cat /proc/net/netstat
...
MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0
[root@localhost ~]#

Fixes: fc518953bc9c8d7d ("mptcp: add and use MIB counter infrastructure")
Signed-off-by: Jianguo Wu <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bridge: Fix a deadlock when enabling multicast snooping

When enabling multicast snooping, bridge module deadlocks on multicast_lock
if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
network.

The deadlock was caused by the following sequence: While holding the lock,
br_multicast_open calls br_multicast_join_snoopers, which eventually causes
IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
Since the destination Ethernet address is a multicast address, br_dev_xmit
feeds the packet back to the bridge via br_multicast_rcv, which in turn
calls br_multicast_add_group, which then deadlocks on multicast_lock.

The fix is to move the call br_multicast_join_snoopers outside of the
critical section. This works since br_multicast_join_snoopers only deals
with IP and does not modify any multicast data structures of the bridge,
so there's no need to hold the lock.

Steps to reproduce:
1. sysctl net.ipv6.conf.all.force_mld_version=1
2. have another querier
3. ip link set dev bridge type bridge mcast_snooping 0 && \
   ip link set dev bridge type bridge mcast_snooping 1 < deadlock >

A typical call trace looks like the following:

[  936.251495]  _raw_spin_lock+0x5c/0x68
[  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
[  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
[  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
[  936.269689]  dev_hard_start_xmit+0x94/0x158
[  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
[  936.277890]  dev_queue_xmit+0x10/0x18
[  936.281563]  neigh_resolve_output+0xec/0x198
[  936.285845]  ip6_finish_output2+0x240/0x710
[  936.290039]  __ip6_finish_output+0x130/0x170
[  936.294318]  ip6_output+0x6c/0x1c8
[  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
[  936.301834]  igmp6_send+0x358/0x558
[  936.305326]  igmp6_join_group.part.0+0x30/0xf0
[  936.309774]  igmp6_group_added+0xfc/0x110
[  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
[  936.317885]  ipv6_dev_mc_inc+0x10/0x18
[  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
[  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]

Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
Signed-off-by: Joseph Huang <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>