Git Repo - linux.git/log

Merge tag 'trace-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"Limit the shooting in the foot of tp_printk

  The "tp_printk" option redirects the trace event output to printk at
  boot up. This is useful when a machine crashes before boot where the
  trace events can not be retrieved by the in kernel ring buffer. But it
  can be "dangerous" because trace events can be located in high
  frequency locations such as interrupts and the scheduler, where a
  printk can slow it down that it live locks the machine (because by the
  time the printk finishes, the next event is triggered). Thus tp_printk
  must be used with care.

  It was discovered that the filter logic to trace events does not apply
  to the tp_printk events. This can cause a surprise and live lock when
  the user expects it to be filtered to limit the amount of events
  printed to the console when in fact it still prints everything"

* tag 'trace-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Apply trace filters on all output channels

Merge branch 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull ARM cpufreq fixes for v5.14 from Viresh Kumar:

"This contains:

- Addition of SoCs to blocklist for cpufreq-dt driver (Bjorn Andersson
   and Thara Gopinath).

- Fix error path for scmi driver (Lukasz Luba).

- Temporarily disable highest frequency for armada, its unsafe and
   breaks stuff."

* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
  cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
  cpufreq: arm_scmi: Fix error path when allocation failed
  cpufreq: blacklist Qualcomm sc8180x in cpufreq-dt-platdev

drm: Copy drm_wait_vblank to user before returning

[Why]
Userspace should get back a copy of drm_wait_vblank that's been modified
even when drm_wait_vblank_ioctl returns a failure.

Rationale:
drm_wait_vblank_ioctl modifies the request and expects the user to read
it back. When the type is RELATIVE, it modifies it to ABSOLUTE and updates
the sequence to become current_vblank_count + sequence (which was
RELATIVE), but now it became ABSOLUTE.
drmWaitVBlank (in libdrm) expects this to be the case as it modifies
the request to be Absolute so it expects the sequence to would have been
updated.

The change is in compat_drm_wait_vblank, which is called by
drm_compat_ioctl. This change of copying the data back regardless of the
return number makes it en par with drm_ioctl, which always copies the
data before returning.

[How]
Return from the function after everything has been copied to user.

Fixes IGT:kms_flip::modeset-vs-vblank-race-interruptible
Tested on ChromeOS Trogdor(msm)

Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Mark Yacoub <[email protected]>
Signed-off-by: Sean Paul <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32

qlcnic_83xx_unlock_flash() is called on all paths after we call
qlcnic_83xx_lock_flash(), except for one error path on failure
of QLCRD32(), which may cause a deadlock. This bug is suggested
by a static analysis tool, please advise.

Fixes: 81d0aeb0a4fff ("qlcnic: flash template based firmware reset recovery")
Signed-off-by: Dinghao Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

blk-mq: fix kernel panic during iterating over flush request

For fixing use-after-free during iterating over requests, we grabbed
request's refcount before calling ->fn in commit 2e315dc07df0 ("blk-mq:
grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter").
Turns out this way may cause kernel panic when iterating over one flush
request:

1) old flush request's tag is just released, and this tag is reused by
one new request, but ->rqs[] isn't updated yet

2) the flush request can be re-used for submitting one new flush command,
so blk_rq_init() is called at the same time

3) meantime blk_mq_queue_tag_busy_iter() is called, and old flush request
is retrieved from ->rqs[tag]; when blk_mq_put_rq_ref() is called,
flush_rq->end_io may not be updated yet, so NULL pointer dereference
is triggered in blk_mq_put_rq_ref().

Fix the issue by calling refcount_set(&flush_rq->ref, 1) after
flush_rq->end_io is set. So far the only other caller of blk_rq_init() is
scsi_ioctl_reset() in which the request doesn't enter block IO stack and
the request reference count isn't used, so the change is safe.

Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
Reported-by: "Blank-Burian, Markus, Dr." <[email protected]>
Tested-by: "Blank-Burian, Markus, Dr." <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: John Garry <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

blk-mq: don't grab rq's refcount in blk_mq_check_expired()

Inside blk_mq_queue_tag_busy_iter() we already grabbed request's
refcount before calling ->fn(), so needn't to grab it one more time
in blk_mq_check_expired().

Meantime remove extra request expire check in blk_mq_check_expired().

Cc: Keith Busch <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: John Garry <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

mac80211: fix locking in ieee80211_restart_work()

Ilan's change to move locking around accidentally lost the
wiphy_lock() during some porting, add it back.

Fixes: 45daaa131841 ("mac80211: Properly WARN on HW scan before restart")
Signed-off-by: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/r/20210817121210.47bdb177064f.Ib1ef79440cd27f318c028ddfc0c642406917f512@changeid
Signed-off-by: Jakub Kicinski <[email protected]>

virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO

Commit a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
maps LRO to virtio guest offloading features and allows the
administrator to enable and disable those features via ethtool.

This leads to several issues:

- For a device that doesn't support control guest offloads, the "LRO"
  can't be disabled triggering WARN in dev_disable_lro() when turning
  off LRO or when enabling forwarding bridging etc.

- For a device that supports control guest offloads, the guest
  offloads are disabled in cases of bridging, forwarding etc slowing
  down the traffic.

Fix this by using NETIF_F_GRO_HW instead. Though the spec does not
guarantee packets to be re-segmented as the original ones,
we can add that to the spec, possibly with a flag for devices to
differentiate between GRO and LRO.

Further, we never advertised LRO historically before a02e8964eaf92
("virtio-net: ethtool configurable LRO") and so bridged/forwarded
configs effectively always relied on virtio receive offloads behaving
like GRO - thus even if this breaks any configs it is at least not
a regression.

Fixes: a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
Acked-by: Michael S. Tsirkin <[email protected]>
Reported-by: Ivan <[email protected]>
Tested-by: Ivan <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ALSA: hda/via: Apply runtime PM workaround for ASUS B23E

ASUS B23E requires the same workaround like other machines with
VT1802, otherwise it looses the codec power on a few nodes and the
sound kept silence.

Fixes: a0645daf1610 ("ALSA: HDA: Early Forbid of runtime PM")
Link: https://lore.kernel.org/r/[email protected]
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda: Fix hang during shutdown due to link reset

During system shutdown codecs may be still active, and resetting the
controller->codec HW link in this state - based on the bug reporter's
tests - leads to the shutdown sequence to get stuck. This happens at
least on the reporter's KBL system with an ALC662 codec.

For now fix the issue by skipping the link reset step.

Fixes: 472e18f63c42 ("ALSA: hda: Release controller display power during shutdown/reboot")
References: https://bugzilla.kernel.org/show_bug.cgi?id=214045
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3618#note_1024665
Reported-and-tested-by: [email protected]
Cc: [email protected]
Signed-off-by: Imre Deak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"This contains a fix for a potential boot failure due to a missing
Kconfig dependency for people upgrading with the DRBG enabled"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: drbg - select SHA512

vrf: Reset skb conntrack connection on VRF rcv

To fix the "reverse-NAT" for replies.

When a packet is sent over a VRF, the POST_ROUTING hooks are called
twice: Once from the VRF interface, and once from the "actual"
interface the packet will be sent from:
1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct()
     This causes the POST_ROUTING hooks to run.
2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again.

Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and
second vrf_l3_rcv() calls them again.

As an example, consider the following SNAT rule:
> iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1

In this case sending over a VRF will create 2 conntrack entries.
The first is from the VRF interface, which performs the IP SNAT.
The second will run the SNAT, but since the "expected reply" will remain
the same, conntrack randomizes the source port of the packet:
e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack
rules are:
udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1
udp      17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1

i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is
SNAT-ed from 10000 --> 61033.

But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later
conntrack entry is matched:
udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1
udp      17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1

And a "port 61033 unreachable" ICMP packet is sent back.

The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(),
the skb already has a conntrack flow attached to it, which means
nf_conntrack_in() will not resolve the flow again.

This means only the dest port is "reverse-NATed" (61033 -> 10000) but
the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's
not received.
This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'.

The fix is then to reset the flow when skb is received on a VRF, to let
conntrack resolve the flow again (which now will hit the earlier flow).

To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by
  running 'bash ./repro'):
  $ cat run_in_A1.py
  import logging
  logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
  from scapy.all import *
  import argparse

  def get_packet_to_send(udp_dst_port, msg_name):
      return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \
          IP(src='3.3.3.3', dst='2.2.2.2')/ \
          UDP(sport=53, dport=udp_dst_port)/ \
          Raw(f'{msg_name}\x0012345678901234567890')

  parser = argparse.ArgumentParser()
  parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True,
                      help="From run_in_A3.py")
  parser.add_argument('-socket_port', dest="socket_port", type=str,
                      required=True, help="From run_in_A3.py")
  parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True,
                      help="From script")

  args, _ = parser.parse_known_args()
  iface_mac = args.iface_mac
  socket_port = int(args.socket_port)
  v1_mac = args.v1_mac

  print(f'Source port before NAT: {socket_port}')

  while True:
      pkts = sniff(iface='_v0', store=True, count=1, timeout=10)
      if 0 == len(pkts):
          print('Something failed, rerun the script :(', flush=True)
          break
      pkt = pkts[0]
      if not pkt.haslayer('UDP'):
          continue

      pkt_sport = pkt.getlayer('UDP').sport
      print(f'Source port after NAT: {pkt_sport}', flush=True)

      pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port')
      sendp(pkt_to_send, '_v0', verbose=False) # Will not be received

      pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port')
      sendp(pkt_to_send, '_v0', verbose=False)
      break

  $ cat run_in_A2.py
  import socket
  import netifaces

  print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}",
        flush=True)
  s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
               str('vrf_1' + '\0').encode('utf-8'))
  s.connect(('3.3.3.3', 53))
  print(f'{s. getsockname()[1]}', flush=True)
  s.settimeout(5)

  while True:
      try:
          # Periodically send in order to keep the conntrack entry alive.
          s.send(b'a'*40)
          resp = s.recvfrom(1024)
          msg_name = resp[0].decode('utf-8').split('\0')[0]
          print(f"Got {msg_name}", flush=True)
      except Exception as e:
          pass

  $ cat repro.sh
  ip netns del A1 2> /dev/null
  ip netns del A2 2> /dev/null
  ip netns add A1
  ip netns add A2

  ip -n A1 link add _v0 type veth peer name _v1 netns A2
  ip -n A1 link set _v0 up

  ip -n A2 link add e00000 type bond
  ip -n A2 link add lo0 type dummy
  ip -n A2 link add vrf_1 type vrf table 10001
  ip -n A2 link set vrf_1 up
  ip -n A2 link set e00000 master vrf_1

  ip -n A2 addr add 1.1.1.1/24 dev e00000
  ip -n A2 link set e00000 up
  ip -n A2 link set _v1 master e00000
  ip -n A2 link set _v1 up
  ip -n A2 link set lo0 up
  ip -n A2 addr add 2.2.2.2/32 dev lo0

  ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000
  ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001

  ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \
SNAT --to-source 2.2.2.2 -o vrf_1

  sleep 5
  ip netns exec A2 python3 run_in_A2.py > x &
  XPID=$!
  sleep 5

  IFACE_MAC=`sed -n 1p x`
  SOCKET_PORT=`sed -n 2p x`
  V1_MAC=`ip -n A2 link show _v1 | sed -n 2p | awk '{print $2'}`
  ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \
          ${SOCKET_PORT} -v1_mac ${SOCKET_PORT}
  sleep 5

  kill -9 $XPID
  wait $XPID 2> /dev/null
  ip netns del A1
  ip netns del A2
  tail x -n 2
  rm x
  set +x

Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: Lahav Schlesinger <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'qcom-arm64-fixes-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

Qualcomm ARM64 fixes for v5.14

This fixes three regressions across Angler and Bullhead, introduced by
advancements in the platform definition. It then corrects the powerdown
GPIOs for the speaker amps on C630 and lastly fixes a typo that assigned
CPU7 in SC7280 to the wrong CPUfreq domain.

* tag 'qcom-arm64-fixes-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  arm64: dts: qcom: sdm845-oneplus: fix reserved-mem
  arm64: dts: qcom: msm8994-angler: Disable cont_splash_mem
  arm64: dts: qcom: sc7280: Fixup cpufreq domain info for cpu7
  arm64: dts: qcom: msm8992-bullhead: Fix cont_splash_mem mapping
  arm64: dts: qcom: msm8992-bullhead: Remove PSCI
  arm64: dts: qcom: c630: fix correct powerdown pin for WSA881x

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'soc-fsl-fix-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux into arm/fixes

NXP/FSL SoC driver fixes for v5.14

QE interrupt controller driver
- Convert it to platform_device driver to make it work with fw_devlink
- Fix static analysis issue

* tag 'soc-fsl-fix-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux:
soc: fsl: qe: fix static checker warning
soc: fsl: qe: convert QE interrupt controller to platform_device

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

drm/amd/display: Ensure DCN save after VM setup

[Why]
DM initializes VM context after DMCUB initialization.
This results in loss of DCN_VM_CONTEXT registers after z10.

[How]
Notify DMCUB when VM setup is complete, and have DMCUB
save init registers.

v2: squash in CONFIG_DRM_AMD_DC_DCN3_1 fix

Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Signed-off-by: Jake Wang <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure

KFDSVMRangeTest.SetGetAttributesTest randomly fails in stress test.

Note: Google Test filter = KFDSVMRangeTest.*
[==========] Running 18 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 18 tests from KFDSVMRangeTest
[ RUN      ] KFDSVMRangeTest.BasicSystemMemTest
[       OK ] KFDSVMRangeTest.BasicSystemMemTest (30 ms)
[ RUN      ] KFDSVMRangeTest.SetGetAttributesTest
[          ] Get default atrributes
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:152: Failure
Value of: expectedDefaultResults[i]
  Actual: 4
Expected: outputAttributes[i].type
Which is: 2
[          ] Setting/Getting atrributes
[  FAILED  ]

the root cause is that svm work queue has not finished when svm_range_get_attr is called, thus
some garbage svm interval tree data make svm_range_get_attr get wrong result. Flush work queue before
iterate svm interval tree.

Signed-off-by: Yifan Zhang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: change the workload type for some cards

change the workload type for some cards as it is needed.

Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Revert "drm/amd/pm: fix workload mismatch on vega10"

This reverts commit 0979d43259e13846d86ba17e451e17fec185d240.
Revert this because it does not apply to all the cards.

Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Merge tag 'mtd/fixes-for-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD fixes from Miquel Raynal:
"MTD core fixes:
   - Fix lock hierarchy in deregister_mtd_blktrans
   - Handle flashes without OTP gracefully
   - Break circular locks in register_mtd_blktrans

  MTD device fixes:
   - mchp48l640:
      - Fix memory leak on cmd
      - Silence some uninitialized variable warnings
   - blkdevs:
      - Initialize rq.limits.discard_granularity

  CFI fixes:
   - Fix crash when erasing/writing AMD cards

  Raw NAND fixes:
   - Fix of_get_nand_secure_regions():
      - Add a missing check
      - Avoid an unwanted probe failure when a DT property is missing"

* tag 'mtd/fixes-for-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: Fix probe failure due to of_get_nand_secure_regions()
  mtd: fix lock hierarchy in deregister_mtd_blktrans
  mtd: devices: mchp48l640: Fix memory leak on cmd
  mtd: cfi_cmdset_0002: fix crash when erasing/writing AMD cards
  mtd: core: handle flashes without OTP gracefully
  mtd: mchp48l640: silence some uninitialized variable warnings
  mtd: break circular locks in register_mtd_blktrans
  mtd: rawnand: Add a check in of_get_nand_secure_regions()
  mtd: mtd_blkdevs: Initialize rq.limits.discard_granularity

Merge tag 'trace-v5.14-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Fixes and clean ups to tracing:

   - Fix header alignment when PREEMPT_RT is enabled for osnoise tracer

   - Inject "stop" event to see where osnoise stopped the trace

   - Define DYNAMIC_FTRACE_WITH_ARGS as some code had an #ifdef for it

   - Fix erroneous message for bootconfig cmdline parameter

   - Fix crash caused by not found variable in histograms"

* tag 'trace-v5.14-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name
  init: Suppress wrong warning for bootconfig cmdline parameter
  tracing: define needed config DYNAMIC_FTRACE_WITH_ARGS
  trace/osnoise: Print a stop tracing message
  trace/timerlat: Add a header with PREEMPT_RT additional fields
  trace/osnoise: Add a header with PREEMPT_RT additional fields

ACPI: PM: s2idle: Invert Microsoft UUID entry and exit

It was reported by a user with a Dell m15 R5 (5800H) that
the keyboard backlight was turning on when entering suspend
and turning off when exiting (the opposite of how it should be).

The user bisected it back to commit 5dbf50997578 ("ACPI: PM:
s2idle: Add support for new Microsoft UUID"). Previous to that
commit the LEDs didn't turn off at all. Confirming in the spec,
these were reversed when introduced.

Fix them to match the spec.

BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1230#note_1021836
Fixes: 5dbf50997578 ("ACPI: PM: s2idle: Add support for new Microsoft UUID")
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"Two nested virtualization fixes for AMD processors"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656)
KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"Fixes in virtio, vhost, and vdpa drivers"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa/mlx5: Fix queue type selection logic
  vdpa/mlx5: Avoid destroying MR on empty iotlb
  tools/virtio: fix build
  virtio_ring: pull in spinlock header
  vringh: pull in spinlock header
  virtio-blk: Add validation for block size in config space
  vringh: Use wiov->used to check for read/write desc order
  virtio_vdpa: reject invalid vq indices
  vdpa: Add documentation for vdpa_alloc_device() macro
  vDPA/ifcvf: Fix return value check for vdpa_alloc_device()
  vp_vdpa: Fix return value check for vdpa_alloc_device()
  vdpa_sim: Fix return value check for vdpa_alloc_device()
  vhost: Fix the calculation in vhost_overflow()
  vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()
  virtio_pci: Support surprise removal of virtio pci device
  virtio: Protect vqs list access
  virtio: Keep vring_del_virtqueue() mirror of VQ create
  virtio: Improve vq->broken access to avoid any compiler optimization

ACPI: PRM: Deal with table not present or no module found

On the system PRMT table is not present, dmesg output:

$ dmesg | grep PRM
[ 1.532237] ACPI: PRMT not present
[ 1.532237] PRM: found 4294967277 modules

The result of acpi_table_parse_entries need to be checked and return
immediately if PRMT table is not present or no PRM module found.

Signed-off-by: Aubrey Li <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

tracing: Apply trace filters on all output channels

The event filters are not applied on all of the output, which results in
the flood of printk when using tp_printk. Unfolding
event_trigger_unlock_commit_regs() into trace_event_buffer_commit(), so
the filters can be applied on every output.

Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 0daa2302968c1 ("tracing: Add tp_printk cmdline to have tracepoints go to printk()")
Signed-off-by: Pingfan Liu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

Merge branch 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull operating performance points (OPP) framework fixes for v5.14
from iresh Kumar:

"This removes few WARN() statements from the OPP core."

* 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: Drop empty-table checks from _put functions
opp: remove WARN when no valid OPPs remain

KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656)

If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable
Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor),
then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only
possible by making L0 intercept these instructions.

Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted,
and thus read/write portions of the host physical memory.

Fixes: 89c8a4984fc9 ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature")
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)

* Invert the mask of bits that we pick from L2 in
nested_vmcb02_prepare_control

* Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr

This fixes a security issue that allowed a malicious L1 to run L2 with
AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled
AVIC to read/write the host physical memory at some offsets.

Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Signed-off-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

net: iosm: Prevent underflow in ipc_chnl_cfg_get()

The bounds check on "index" doesn't catch negative values. Using
ARRAY_SIZE() directly is more readable and more robust because it prevents
negative values for "index". Fortunately we only pass valid values to
ipc_chnl_cfg_get() so this patch does not affect runtime.

Reported-by: Solomon Ucko <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: M Chetan Kumar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm: ttm: Don't bail from ttm_global_init if debugfs_create_dir fails

In 69de4421bb4c ("drm/ttm: Initialize debugfs from
ttm_global_init()"), ttm_global_init was changed so that if creation
of the debugfs global root directory fails, ttm_global_init will bail
out early and return an error, leading to initialization failure of
DRM drivers. However, not every system will be using debugfs. On such
a system, debugfs directory creation can be expected to fail, but DRM
drivers must still be usable. This changes it so that if creation of
TTM's debugfs root directory fails, then no biggie: keep calm and
carry on.

Fixes: 69de4421bb4c ("drm/ttm: Initialize debugfs from ttm_global_init()")
Signed-off-by: Dan Moulding <[email protected]>
Tested-by: Huacai Chen <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Christian König <[email protected]>

btrfs: prevent rename2 from exchanging a subvol with a directory from different parents

Cross-rename lacks a check when that would prevent exchanging a
directory and subvolume from different parent subvolume. This causes
data inconsistencies and is caught before commit by tree-checker,
turning the filesystem to read-only.

Calling the renameat2 with RENAME_EXCHANGE flags like

  renameat2(AT_FDCWD, namesrc, AT_FDCWD, namedest, (1 << 1))

on two paths:

  namesrc = dir1/subvol1/dir2
namedest = subvol2/subvol3

will cause key order problem with following write time tree-checker
report:

  [1194842.307890] BTRFS critical (device loop1): corrupt leaf: root=5 block=27574272 slot=10 ino=258, invalid previous key objectid, have 257 expect 258
  [1194842.322221] BTRFS info (device loop1): leaf 27574272 gen 8 total ptrs 11 free space 15444 owner 5
  [1194842.331562] BTRFS info (device loop1): refs 2 lock_owner 0 current 26561
  [1194842.338772]        item 0 key (256 1 0) itemoff 16123 itemsize 160
  [1194842.338793]                inode generation 3 size 16 mode 40755
  [1194842.338801]        item 1 key (256 12 256) itemoff 16111 itemsize 12
  [1194842.338809]        item 2 key (256 84 2248503653) itemoff 16077 itemsize 34
  [1194842.338817]                dir oid 258 type 2
  [1194842.338823]        item 3 key (256 84 2363071922) itemoff 16043 itemsize 34
  [1194842.338830]                dir oid 257 type 2
  [1194842.338836]        item 4 key (256 96 2) itemoff 16009 itemsize 34
  [1194842.338843]        item 5 key (256 96 3) itemoff 15975 itemsize 34
  [1194842.338852]        item 6 key (257 1 0) itemoff 15815 itemsize 160
  [1194842.338863]                inode generation 6 size 8 mode 40755
  [1194842.338869]        item 7 key (257 12 256) itemoff 15801 itemsize 14
  [1194842.338876]        item 8 key (257 84 2505409169) itemoff 15767 itemsize 34
  [1194842.338883]                dir oid 256 type 2
  [1194842.338888]        item 9 key (257 96 2) itemoff 15733 itemsize 34
  [1194842.338895]        item 10 key (258 12 256) itemoff 15719 itemsize 14
  [1194842.339163] BTRFS error (device loop1): block=27574272 write time tree block corruption detected
  [1194842.339245] ------------[ cut here ]------------
  [1194842.443422] WARNING: CPU: 6 PID: 26561 at fs/btrfs/disk-io.c:449 csum_one_extent_buffer+0xed/0x100 [btrfs]
  [1194842.511863] CPU: 6 PID: 26561 Comm: kworker/u17:2 Not tainted 5.14.0-rc3-git+ #793
  [1194842.511870] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
  [1194842.511876] Workqueue: btrfs-worker-high btrfs_work_helper [btrfs]
  [1194842.511976] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs]
  [1194842.512068] RSP: 0018:ffffa2c284d77da0 EFLAGS: 00010282
  [1194842.512074] RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffff928867bd9978
  [1194842.512078] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff928867bd9970
  [1194842.512081] RBP: ffff92876b958000 R08: 0000000000000001 R09: 00000000000c0003
  [1194842.512085] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
  [1194842.512088] R13: ffff92875f989f98 R14: 0000000000000000 R15: 0000000000000000
  [1194842.512092] FS:  0000000000000000(0000) GS:ffff928867a00000(0000) knlGS:0000000000000000
  [1194842.512095] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [1194842.512099] CR2: 000055f5384da1f0 CR3: 0000000102fe4000 CR4: 00000000000006e0
  [1194842.512103] Call Trace:
  [1194842.512128]  ? run_one_async_free+0x10/0x10 [btrfs]
  [1194842.631729]  btree_csum_one_bio+0x1ac/0x1d0 [btrfs]
  [1194842.631837]  run_one_async_start+0x18/0x30 [btrfs]
  [1194842.631938]  btrfs_work_helper+0xd5/0x1d0 [btrfs]
  [1194842.647482]  process_one_work+0x262/0x5e0
  [1194842.647520]  worker_thread+0x4c/0x320
  [1194842.655935]  ? process_one_work+0x5e0/0x5e0
  [1194842.655946]  kthread+0x135/0x160
  [1194842.655953]  ? set_kthread_struct+0x40/0x40
  [1194842.655965]  ret_from_fork+0x1f/0x30
  [1194842.672465] irq event stamp: 1729
  [1194842.672469] hardirqs last  enabled at (1735): [<ffffffffbd1104f5>] console_trylock_spinning+0x185/0x1a0
  [1194842.672477] hardirqs last disabled at (1740): [<ffffffffbd1104cc>] console_trylock_spinning+0x15c/0x1a0
  [1194842.672482] softirqs last  enabled at (1666): [<ffffffffbdc002e1>] __do_softirq+0x2e1/0x50a
  [1194842.672491] softirqs last disabled at (1651): [<ffffffffbd08aab7>] __irq_exit_rcu+0xa7/0xd0

The corrupted data will not be written, and filesystem can be unmounted
and mounted again (all changes since the last commit will be lost).

Add the missing check for new_ino so that all non-subvolumes must reside
under the same parent subvolume. There's an exception allowing to
exchange two subvolumes from any parents as the directory representing a
subvolume is only a logical link and does not have any other structures
related to the parent subvolume, unlike files, directories etc, that
are always in the inode namespace of the parent subvolume.

Fixes: cdd1fedf8261 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: [email protected] # 4.7+
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

Merge branch 'bnxt_en-fixes'

Michael Chan says:

====================
bnxt_en: 2 bug fixes

The first one disables aRFS/NTUPLE on an older broken firmware version.
The second one adds missing memory barriers related to completion ring
handling.
====================

Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add missing DMA memory barriers

Each completion ring entry has a valid bit to indicate that the entry
contains a valid completion event.  The driver's main poll loop
__bnxt_poll_work() has the proper dma_rmb() to make sure the valid
bit of the next entry has been checked before proceeding further.
But when we call bnxt_rx_pkt() to process the RX event, the RX
completion event consists of two completion entries and only the
first entry has been checked to be valid.  We need the same barrier
after checking the next completion entry.  Add missing dma_rmb()
barriers in bnxt_rx_pkt() and other similar locations.

Fixes: 67a95e2022c7 ("bnxt_en: Need memory barrier when processing the completion ring.")
Reported-by: Lance Richardson <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Reviewed-by: Lance Richardson <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Disable aRFS if running on 212 firmware

212 firmware broke aRFS, so disable it. Traffic may stop after ntuple
filters are inserted and deleted by the 212 firmware.

Fixes: ae10ae740ad2 ("bnxt_en: Add new hardware RFS mode.")
Reviewed-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Fix null-pointer dereference in qed_rdma_create_qp()

Fix a possible null-pointer dereference in qed_rdma_create_qp().

Changes from V2:
- Revert checkpatch fixes.

Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Shai Malin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: qed ll2 race condition fixes

Avoiding qed ll2 race condition and NULL pointer dereference as part
of the remove and recovery flows.

Changes form V1:
- Change (!p_rx->set_prod_addr).
- qed_ll2.c checkpatch fixes.

Change from V2:
- Revert "qed_ll2.c checkpatch fixes".

Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Shai Malin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: call tipc_wait_for_connect only when dlen is not 0

__tipc_sendmsg() is called to send SYN packet by either tipc_sendmsg()
or tipc_connect(). The difference is in tipc_connect(), it will call
tipc_wait_for_connect() after __tipc_sendmsg() to wait until connecting
is done. So there's no need to wait in __tipc_sendmsg() for this case.

This patch is to fix it by calling tipc_wait_for_connect() only when dlen
is not 0 in __tipc_sendmsg(), which means it's called by tipc_connect().

Note this also fixes the failure in tipcutils/test/ptts/:

  # ./tipcTS &
  # ./tipcTC 9
  (hang)

Fixes: 36239dab6da7 ("tipc: fix implicit-connect for SYN+")
Reported-by: Shuang Li <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Acked-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711

The controller doesn't seem to pick-up on clock changes, so set the
SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN flag to query the clock frequency
directly from the clock.

Fixes: f84e411c85be ("mmc: sdhci-iproc: Add support for emmc2 of the BCM2711")
Signed-off-by: Nicolas Saenz Julienne <[email protected]>
Signed-off-by: Stefan Wahren <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

ptp_pch: Restore dependency on PCI

During the swap dependency on PCH_GBE to selection PTP_1588_CLOCK_PCH
incidentally dropped the implicit dependency on the PCI. Restore it.

Fixes: 18d359ceb044 ("pch_gbe, ptp_pch: Fix the dependency direction between these drivers")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mmc: sdhci-iproc: Cap min clock frequency on BCM2711

There is a known bug on BCM2711's SDHCI core integration where the
controller will hang when the difference between the core clock and the
bus clock is too great. Specifically this can be reproduced under the
following conditions:

- No SD card plugged in, polling thread is running, probing cards at
100 kHz.
- BCM2711's core clock configured at 500MHz or more.

So set 200 kHz as the minimum clock frequency available for that board.

For more information on the issue see this:
https://lore.kernel.org/linux-mmc/20210322185816 [email protected]/T/#m11f2783a09b581da6b8a15f302625b43a6ecdeca

Fixes: f84e411c85be ("mmc: sdhci-iproc: Add support for emmc2 of the BCM2711")
Signed-off-by: Nicolas Saenz Julienne <[email protected]>
Signed-off-by: Stefan Wahren <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

net: 6pack: fix slab-out-of-bounds in decode_data

Syzbot reported slab-out-of bounds write in decode_data().
The problem was in missing validation checks.

Syzbot's reproducer generated malicious input, which caused
decode_data() to be called a lot in sixpack_decode(). Since
rx_count_cooked is only 400 bytes and noone reported before,
that 400 bytes is not enough, let's just check if input is malicious
and complain about buffer overrun.

Fail log:
==================================================================
BUG: KASAN: slab-out-of-bounds in drivers/net/hamradio/6pack.c:843
Write of size 1 at addr ffff888087c5544e by task kworker/u4:0/7

CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.6.0-rc3-syzkaller #0
...
Workqueue: events_unbound flush_to_ldisc
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
__kasan_report.cold+0x1b/0x32 mm/kasan/report.c:506
kasan_report+0x12/0x20 mm/kasan/common.c:641
__asan_report_store1_noabort+0x17/0x20 mm/kasan/generic_report.c:137
decode_data.part.0+0x23b/0x270 drivers/net/hamradio/6pack.c:843
decode_data drivers/net/hamradio/6pack.c:965 [inline]
sixpack_decode drivers/net/hamradio/6pack.c:968 [inline]

Reported-and-tested-by: [email protected]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Pavel Skripkin <[email protected]>
Reviewed-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

opp: Drop empty-table checks from _put functions

The current_opp is released only when whole OPP table is released,
otherwise it's only marked as removed by dev_pm_opp_remove_table().
Functions like dev_pm_opp_put_clkname() and dev_pm_opp_put_supported_hw()
are checking whether OPP table is empty and it's not if current_opp is
set since it holds the refcount of OPP, this produces a noisy warning
from these functions about busy OPP table. Remove the checks to fix it.

Cc: [email protected]
Fixes: 81c4d8a3c414 ("opp: Keep track of currently programmed OPP")
Signed-off-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>

Linux 5.14-rc6

Merge tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- Fix crashes coming out of nap on 32-bit Book3s (eg. powerbooks).

- Fix critical and debug interrupts on BookE, seen as crashes when
   using ptrace.

- Fix an oops when running an SMP kernel on a UP system.

- Update pseries LPAR security flavor after partition migration.

- Fix an oops when using kprobes on BookE.

- Fix oops on 32-bit pmac by not calling do_IRQ() from
   timer_interrupt().

- Fix softlockups on CPU hotplug into a CPU-less node with xive (P9).

Thanks to Cédric Le Goater, Christophe Leroy, Finn Thain, Geetika
Moolchandani, Laurent Dufour, Laurent Vivier, Nicholas Piggin, Pu Lehui,
Radu Rendec, Srikar Dronamraju, and Stan Johnson.

* tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/xive: Do not skip CPU-less nodes when creating the IPIs
  powerpc/interrupt: Do not call single_step_exception() from other exceptions
  powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()
  powerpc/kprobes: Fix kprobe Oops happens in booke
  powerpc/pseries: Fix update of LPAR security flavor after LPM
  powerpc/smp: Fix OOPS in topology_init()
  powerpc/32: Fix critical and debug interrupts on BOOKE
  powerpc/32s: Fix napping restore in data storage interrupt (DSI)

Merge tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"A set of fixes for PCI/MSI and x86 interrupt startup:

   - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked
     entries stay around e.g. when a crashkernel is booted.

   - Enforce masking of a MSI-X table entry when updating it, which
     mandatory according to speification

   - Ensure that writes to MSI[-X} tables are flushed.

   - Prevent invalid bits being set in the MSI mask register

   - Properly serialize modifications to the mask cache and the mask
     register for multi-MSI.

   - Cure the violation of the affinity setting rules on X86 during
     interrupt startup which can cause lost and stale interrupts. Move
     the initial affinity setting ahead of actualy enabling the
     interrupt.

   - Ensure that MSI interrupts are completely torn down before freeing
     them in the error handling case.

   - Prevent an array out of bounds access in the irq timings code"

* tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  driver core: Add missing kernel doc for device::msi_lock
  genirq/msi: Ensure deactivation on teardown
  genirq/timings: Prevent potential array overflow in __irq_timings_store()
  x86/msi: Force affinity setup before startup
  x86/ioapic: Force affinity setup before startup
  genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
  PCI/MSI: Protect msi_desc::masked for multi-MSI
  PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
  PCI/MSI: Correct misleading comments
  PCI/MSI: Do not set invalid bits in MSI mask
  PCI/MSI: Enforce MSI[X] entry updates to be visible
  PCI/MSI: Enforce that MSI-X table entry is masked for update
  PCI/MSI: Mask all unused MSI-X entries
  PCI/MSI: Enable and mask MSI-X early

Merge tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:

- Fix a CONFIG symbol's spelling

* tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Use the correct rtmutex debugging config option

Merge tag 'efi_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Borislav Petkov:
"A batch of fixes for the arm64 stub image loader:

   - fix a logic bug that can make the random page allocator fail
     spuriously

   - force reallocation of the Image when it overlaps with firmware
     reserved memory regions

   - fix an oversight that defeated on optimization introduced earlier
     where images loaded at a suitable offset are never moved if booting
     without randomization

   - complain about images that were not loaded at the right offset by
     the firmware image loader"

* tag 'efi_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub: arm64: Double check image alignment at entry
  efi/libstub: arm64: Warn when efi_random_alloc() fails
  efi/libstub: arm64: Relax 2M alignment again for relocatable kernels
  efi/libstub: arm64: Force Image reallocation if BSS was not reserved
  arm64: efi: kaslr: Fix occasional random alloc (and boot) failure

Merge tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
"Two fixes:

   - An objdump checker fix to ignore parenthesized strings in the
     objdump version

   - Fix resctrl default monitoring groups reporting when new subgroups
     get created"

* tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix default monitoring groups reporting
  x86/tools: Fix objdump version check again

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"ARM:

   - Plug race between enabling MTE and creating vcpus

   - Fix off-by-one bug when checking whether an address range is RAM

  x86:

   - Fixes for the new MMU, especially a memory leak on hosts with <39
     physical address bits

   - Remove bogus EFER.NX checks on 32-bit non-PAE hosts

   - WAITPKG fix"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
  KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs
  KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs
  KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF
  kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault
  KVM: x86: remove dead initialization
  KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels
  KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation
  KVM: arm64: Fix race when enabling KVM_ARM_CAP_MTE
  KVM: arm64: Fix off-by-one in range_is_memory

Merge tag 'icc-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-linus

Georgi writes:

interconnect fix for v5.14

This contains a revert for a patch that has been causing issues:
- Revert: qcom: rpmh: Add BCMs to commit list in pre_aggregate

Signed-off-by: Georgi Djakov <[email protected]>
* tag 'icc-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
Revert "interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate"

ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9510 laptop

The 2021-model XPS 15 appears to use the same 4-speakers-on-ALC289 audio
setup as the Precision models, so requires the same quirk to enable woofer
output. Tested on my own 9510.

Signed-off-by: Kristin Paget <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Three minor fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mpt3sas: Fix incorrectly assigned error return and check
  scsi: storvsc: Log TEST_UNIT_READY errors as warnings
  scsi: lpfc: Move initialization of phba->poll_list earlier to avoid crash

Merge tag 'libnvdimm-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"A couple of fixes for long standing bugs, a warning fixup, and some
  miscellaneous dax cleanups.

  The bugs were recently found due to new platforms looking to use the
  ACPI NFIT "virtual" device definition, and new error injection
  capabilities to trigger error responses to label area requests. Ira's
  cleanups have been long pending, I neglected to send them earlier, and
  see no harm in including them now. This has all appeared in -next with
  no reported issues.

  Summary:

   - Fix support for NFIT "virtual" ranges (BIOS-defined memory disks)

   - Fix recovery from failed label storage areas on NVDIMM devices

   - Miscellaneous cleanups from Ira's investigation of
     dax_direct_access paths preparing for stray-write protection"

* tag 'libnvdimm-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  tools/testing/nvdimm: Fix missing 'fallthrough' warning
  libnvdimm/region: Fix label activation vs errors
  ACPI: NFIT: Fix support for virtual SPA ranges
  dax: Ensure errno is returned from dax_direct_access
  fs/dax: Clarify nr_pages to dax_direct_access()
  fs/fuse: Remove unneeded kaddr parameter

Merge tag 'usb-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fix from Greg KH:
"A single revert of a commit that caused problems in 5.14-rc5 for
  5.14-rc6. It has been in linux-next almost all week, and has resolved
  the issues that were reported on lots of different systems that were
  not the platform that the change was originally tested on (gotta love
  SoC cores used in multiple devices from multiple vendors...)"

* tag 'usb-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  Revert "usb: dwc3: gadget: Use list_replace_init() before traversing lists"

Merge tag 'staging-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull IIO driver fixes from Greg KH:
"Here are some small IIO driver fixes for reported problems for
  5.14-rc6 (no staging driver fixes at the moment).

  All of them resolve reported issues and have been in linux-next all
  week with no reported problems. Full details are in the shortlog"

* tag 'staging-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: adc: Fix incorrect exit of for-loop
  iio: humidity: hdc100x: Add margin to the conversion time
  dt-bindings: iio: st: Remove wrong items length check
  iio: accel: fxls8962af: fix i2c dependency
  iio: adis: set GPIO reset pin direction
  iio: adc: ti-ads7950: Ensure CS is deasserted after reading channels
  iio: accel: fxls8962af: fix potential use of uninitialized symbol

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"One driver bugfix, a documentation bugfix, and an "uninitialized data"
  leak fix for the core"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Documentation: i2c: add i2c-sysfs into index
  i2c: dev: zero out array used for i2c reads from userspace
  i2c: iproc: fix race between client unreg and tasklet

io_uring: only assign io_uring_enter() SQPOLL error in actual error case

If an SQPOLL based ring is newly created and an application issues an
io_uring_enter(2) system call on it, then we can return a spurious
-EOWNERDEAD error. This happens because there's nothing to submit, and
if the caller doesn't specify any other action, the initial error
assignment of -EOWNERDEAD never gets overwritten. This causes us to
return it directly, even if it isn't valid.

Move the error assignment into the actual failure case instead.

Cc: [email protected]
Fixes: d9d05217cb69 ("io_uring: stop SQPOLL submit on creator's death")
Reported-by: Sherlock Holo [email protected]
Link: https://github.com/axboe/liburing/issues/413
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'for-linus-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"A small cleanup patch and a fix of a rare race in the Xen evtchn
  driver"

* tag 'for-linus-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: Fix race in set_evtchn_to_irq
  xen/events: remove redundant initialization of variable irq

Merge tag 'riscv-for-linus-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- avoid passing -mno-relax to compilers that don't support it

- a comment fix

* tag 'riscv-for-linus-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix comment regarding kernel mapping overlapping with IS_ERR_VALUE
riscv: kexec: do not add '-mno-relax' flag if compiler doesn't support it

Merge tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs

Pull configfs fix from Christoph Hellwig:

- fix to revert to the historic write behavior (Bart Van Assche)

* tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs:
configfs: restore the kernel v5.13 text attribute write behavior

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"7 patches.

  Subsystems affected by this patch series: mm (kasan, mm/slub,
  mm/madvise, and memcg), and lib"

* emailed patches from Andrew Morton <[email protected]>:
  lib: use PFN_PHYS() in devmem_is_allowed()
  mm/memcg: fix incorrect flushing of lruvec data in obj_stock
  mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)
  mm: slub: fix slub_debug disabling for list of slabs
  slub: fix kmalloc_pagealloc_invalid_free unit test
  kasan, slub: reset tag when printing address
  kasan, kmemleak: reset tags when scanning block

Merge tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Four CIFS/SMB3 Fixes, all for stable, two relating to deferred close,
  and one for the 'modefromsid' mount option (when 'idsfromsid' not
  specified)"

* tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Call close synchronously during unlink/rename/lease break.
  cifs: Handle race conditions during rename
  cifs: use the correct max-length for dentry_path_raw()
  cifs: create sd context must be a multiple of 8

Merge tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fix from Shuah Khan:
"A single patch to sgx test to fix Q1 and Q2 calculation"

* tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c

ice: Fix perout start time rounding

Internal tests found out that the latest code doesn't bring up 1PPS out
as expected. As a result of incorrect define used to round the time up
the time was round down to the past second boundary.

Fix define used for rounding to properly round up to the next Top of
second in ice_ptp_cfg_clkout to fix it.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Signed-off-by: Maciej Machnikowski <[email protected]>
Tested-by: Sunitha Mekala <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

lib: use PFN_PHYS() in devmem_is_allowed()

The physical address may exceed 32 bits on 32-bit systems with more than
32 bits of physcial address. Use PFN_PHYS() in devmem_is_allowed(), or
the physical address may overflow and be truncated.

We found this bug when mapping a high addresses through devmem tool,
when CONFIG_STRICT_DEVMEM is enabled on the ARM with ARM_LPAE and devmem
is used to map a high address that is not in the iomem address range, an
unexpected error indicating no permission is returned.

This bug was initially introduced from v2.6.37, and the function was
moved to lib in v5.11.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem")
Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
Signed-off-by: Liang Wang <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Russell King <[email protected]>
Cc: Liang Wang <[email protected]>
Cc: Xiaoming Ni <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: <[email protected]> [2.6.37+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memcg: fix incorrect flushing of lruvec data in obj_stock

When mod_objcg_state() is called with a pgdat that is different from
that in the obj_stock, the old lruvec data cached in obj_stock are
flushed out. Unfortunately, they were flushed to the new pgdat and so
the data go to the wrong node. This will screw up the slab data
reported in /sys/devices/system/node/node*/meminfo.

Fix that by flushing the data to the cached pgdat instead.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 68ac5b3c8db2 ("mm/memcg: cache vmstat data in percpu memcg_stock_pcp")
Signed-off-by: Waiman Long <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Chris Down <[email protected]>
Cc: Yafang Shao <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Waiman Long <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)

Doing some extended tests and polishing the man page update for
MADV_POPULATE_(READ|WRITE), I realized that we end up converting also
SIGBUS (via -EFAULT) to -EINVAL, making it look like yet another
madvise() user error.

We want to report only problematic mappings and permission problems that
the user could have know as -EINVAL.

Let's not convert -EFAULT arising due to SIGBUS (or SIGSEGV) to -EINVAL,
but instead indicate -EFAULT to user space. While we could also convert
it to -ENOMEM, using -EFAULT looks more helpful when user space might
want to troubleshoot what's going wrong: MADV_POPULATE_(READ|WRITE) is
not part of an final Linux release and we can still adjust the behavior.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rolf Eike Beer <[email protected]>
Cc: Ram Pai <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: slub: fix slub_debug disabling for list of slabs

Vijayanand Jitta reports:

  Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we would
  want to disable slub_debug for few slabs. Using boot parameter with
  slub_debug=-,slab_name syntax doesn't work as expected i.e; only
  disabling debugging for the specified list of slabs. Instead it
  disables debugging for all slabs, which is wrong.

This patch fixes it by delaying the moment when the global slub_debug
flags variable is updated.  In case a "slub_debug=-,slab_name" has been
passed, the global flags remain as initialized (depending on
CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Vijayanand Jitta <[email protected]>
Reviewed-by: Vijayanand Jitta <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vinayak Menon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

slub: fix kmalloc_pagealloc_invalid_free unit test

The unit test kmalloc_pagealloc_invalid_free makes sure that for the
higher order slub allocation which goes to page allocator, the free is
called with the correct address i.e.  the virtual address of the head
page.

Commit f227f0faf63b ("slub: fix unreclaimable slab stat for bulk free")
unified the free code paths for page allocator based slub allocations
but instead of using the address passed by the caller, it extracted the
address from the page.  Thus making the unit test
kmalloc_pagealloc_invalid_free moot.  So, fix this by using the address
passed by the caller.

Should we fix this? I think yes because dev expect kasan to catch these
type of programming bugs.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f227f0faf63b ("slub: fix unreclaimable slab stat for bulk free")
Signed-off-by: Shakeel Butt <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kasan, slub: reset tag when printing address

The address still includes the tags when it is printed. With hardware
tag-based kasan enabled, we will get a false positive KASAN issue when
we access metadata.

Reset the tag before we access the metadata.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: aa1ef4d7b3f6 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Kuan-Ying Lee <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chinwen Chang <[email protected]>
Cc: Nicholas Tang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kasan, kmemleak: reset tags when scanning block

Patch series "kasan, slub: reset tag when printing address", v3.

With hardware tag-based kasan enabled, we reset the tag when we access
metadata to avoid from false alarm.

This patch (of 2):

Kmemleak needs to scan kernel memory to check memory leak.  With hardware
tag-based kasan enabled, when it scans on the invalid slab and
dereference, the issue will occur as below.

Hardware tag-based KASAN doesn't use compiler instrumentation, we can not
use kasan_disable_current() to ignore tag check.

Based on the below report, there are 11 0xf7 granules, which amounts to
176 bytes, and the object is allocated from the kmalloc-256 cache.  So
when kmemleak accesses the last 256-176 bytes, it causes faults, as those
are marked with KASAN_KMALLOC_REDZONE == KASAN_TAG_INVALID == 0xfe.

Thus, we reset tags before accessing metadata to avoid from false positives.

  BUG: KASAN: out-of-bounds in scan_block+0x58/0x170
  Read at addr f7ff0000c0074eb0 by task kmemleak/138
  Pointer tag: [f7], memory tag: [fe]

  CPU: 7 PID: 138 Comm: kmemleak Not tainted 5.14.0-rc2-00001-g8cae8cd89f05-dirty #134
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x0/0x1b0
   show_stack+0x1c/0x30
   dump_stack_lvl+0x68/0x84
   print_address_description+0x7c/0x2b4
   kasan_report+0x138/0x38c
   __do_kernel_fault+0x190/0x1c4
   do_tag_check_fault+0x78/0x90
   do_mem_abort+0x44/0xb4
   el1_abort+0x40/0x60
   el1h_64_sync_handler+0xb4/0xd0
   el1h_64_sync+0x78/0x7c
   scan_block+0x58/0x170
   scan_gray_list+0xdc/0x1a0
   kmemleak_scan+0x2ac/0x560
   kmemleak_scan_thread+0xb0/0xe0
   kthread+0x154/0x160
   ret_from_fork+0x10/0x18

  Allocated by task 0:
   kasan_save_stack+0x2c/0x60
   __kasan_kmalloc+0xec/0x104
   __kmalloc+0x224/0x3c4
   __register_sysctl_paths+0x200/0x290
   register_sysctl_table+0x2c/0x40
   sysctl_init+0x20/0x34
   proc_sys_init+0x3c/0x48
   proc_root_init+0x80/0x9c
   start_kernel+0x648/0x6a4
   __primary_switched+0xc0/0xc8

  Freed by task 0:
   kasan_save_stack+0x2c/0x60
   kasan_set_track+0x2c/0x40
   kasan_set_free_info+0x44/0x54
   ____kasan_slab_free.constprop.0+0x150/0x1b0
   __kasan_slab_free+0x14/0x20
   slab_free_freelist_hook+0xa4/0x1fc
   kfree+0x1e8/0x30c
   put_fs_context+0x124/0x220
   vfs_kern_mount.part.0+0x60/0xd4
   kern_mount+0x24/0x4c
   bdev_cache_init+0x70/0x9c
   vfs_caches_init+0xdc/0xf4
   start_kernel+0x638/0x6a4
   __primary_switched+0xc0/0xc8

  The buggy address belongs to the object at ffff0000c0074e00
   which belongs to the cache kmalloc-256 of size 256
  The buggy address is located 176 bytes inside of
   256-byte region [ffff0000c0074e00, ffff0000c0074f00)
  The buggy address belongs to the page:
  page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100074
  head:(____ptrval____) order:2 compound_mapcount:0 compound_pincount:0
  flags: 0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff|kasantag=0x0)
  raw: 0bfffc0000010200 0000000000000000 dead000000000122 f5ff0000c0002300
  raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff0000c0074c00: f0 f0 f0 f0 f0 f0 f0 f0 f0 fe fe fe fe fe fe fe
   ffff0000c0074d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  >ffff0000c0074e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe
                                                      ^
   ffff0000c0074f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
   ffff0000c0075000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ==================================================================
  Disabling lock debugging due to kernel taint
  kmemleak: 181 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Ying Lee <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Nicholas Tang <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Chinwen Chang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A few fixes for block that should go into 5.14:

   - Revert the mq-deadline cgroup addition. More work is needed on this
     front, let's revert it for now and get it right before having it in
     a released kernel (Tejun)

   - blk-iocost lockdep fix (Ming)

   - nbd double completion fix (Xie)

   - Fix for non-idling when clearing the shared tag flag (Yu)"

* tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
  nbd: Aovid double completion of a request
  blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED
  Revert "block/mq-deadline: Add cgroup support"
  blk-iocost: fix lockdep warning on blkcg->lock

Merge tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"A bit bigger than the previous weeks, but mostly just a few stable
  bound fixes. In detail:

   - Followup fixes to patches from last week for io-wq, turns out they
     weren't complete (Hao)

   - Two lockdep reported fixes out of the RT camp (me)

   - Sync the io_uring-cp example with liburing, as a few bug fixes
     never made it to the kernel carried version (me)

   - SQPOLL related TIF_NOTIFY_SIGNAL fix (Nadav)

   - Use WRITE_ONCE() when writing sq flags (Nadav)

   - io_rsrc_put_work() deadlock fix (Pavel)"

* tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
  tools/io_uring/io_uring-cp: sync with liburing example
  io_uring: fix ctx-exit io_rsrc_put_work() deadlock
  io_uring: drop ctx->uring_lock before flushing work item
  io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker()
  io-wq: fix bug of creating io-wokers unconditionally
  io_uring: rsrc ref lock needs to be IRQ safe
  io_uring: Use WRITE_ONCE() when writing to sq_flags
  io_uring: clear TIF_NOTIFY_SIGNAL when running task work

Merge tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"An assortment of pin control fixes of varying importance, the most
  important ones affecting Intel and AMD laptops turned up the recent
  few days so it's time to push this to your tree.

   - Fix the Kconfig dependency for Qualcomm SM8350 pin controller

   - Fix pin biasing fallback behaviour on the Mediatek pin controller

   - Fix the GPIO numbering scheme for Intel Tiger Lake-H to correspond
     to the products that are now actually out on the market

   - Fix a pin control function itemization in the Sunxi driver
     out-of-bounds access bug

   - Fix disable clocking for the RISC-V K210 pin controller on the
     errorpath

   - Fix a system shutdown bug affecting AMD Ryzen-based laptops, the
     system would not suspend but just bounce back up"

* tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: amd: Fix an issue with shutdown when system set to s0ix
  pinctrl: k210: Fix k210_fpioa_probe()
  pinctrl: sunxi: Don't underestimate number of functions
  pinctrl: tigerlake: Fix GPIO mapping for newer version of software
  pinctrl: mediatek: Fix fallback behavior for bias_set_combo
  pinctrl: qcom: fix GPIOLIB dependencies

soc: fsl: qe: fix static checker warning

The patch be7ecbd240b2: "soc: fsl: qe: convert QE interrupt
controller to platform_device" from Aug 3, 2021, leads to the
following static checker warning:

drivers/soc/fsl/qe/qe_ic.c:438 qe_ic_init()
warn: unsigned 'qe_ic->virq_low' is never less than zero.

In old variant irq_of_parse_and_map() returns zero if failed so
unsigned int for virq_high/virq_low was ok.
In new variant platform_get_irq() returns negative error codes
if failed so we need to use int for virq_high/virq_low.

Also simplify high_handler checking and remove the curly braces
to make checkpatch happy.

Fixes: be7ecbd240b2 ("soc: fsl: qe: convert QE interrupt controller to platform_device")
Signed-off-by: Maxim Kochetkov <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Li Yang <[email protected]>

Merge branch 'bnxt-tx-napi-disabling-resiliency-improvements'

Jakub Kicinski says:

====================
bnxt: Tx NAPI disabling resiliency improvements

A lockdep warning was triggered by netpoll because napi poll
was taking the xmit lock. Fix that and a couple more issues
noticed while reading the code.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt: count Tx drops

Drivers should count packets they are dropping.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Reviewed-by: Michael Chan <[email protected]>
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt: make sure xmit_more + errors does not miss doorbells

skbs are freed on error and not put on the ring. We may, however,
be in a situation where we're freeing the last skb of a batch,
and there is a doorbell ring pending because of xmit_more() being
true earlier. Make sure we ring the door bell in such situations.

Since errors are rare don't pay attention to xmit_more() and just
always flush the pending frames.

The busy case should be safe to be left alone because it can
only happen if start_xmit races with completions and they
both enable the queue. In that case the kick can't be pending.

Noticed while reading the code.

Fixes: 4d172f21cefe ("bnxt_en: Implement xmit_more.")
Reviewed-by: Michael Chan <[email protected]>
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt: disable napi before canceling DIM

napi schedules DIM, napi has to be disabled first,
then DIM canceled.

Noticed while reading the code.

Fixes: 0bc0b97fca73 ("bnxt_en: cleanup DIM work on device shutdown")
Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation")
Reviewed-by: Michael Chan <[email protected]>
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt: don't lock the tx queue from napi poll

We can't take the tx lock from the napi poll routine, because
netpoll can poll napi at any moment, including with the tx lock
already held.

The tx lock is protecting against two paths - the disable
path, and (as Michael points out) the NETDEV_TX_BUSY case
which may occur if NAPI completions race with start_xmit
and both decide to re-enable the queue.

For the disable/ifdown path use synchronize_net() to make sure
closing the device does not race we restarting the queues.
Annotate accesses to dev_state against data races.

For the NAPI cleanup vs start_xmit path - appropriate barriers
are already in place in the main spot where Tx queue is stopped
but we need to do the same careful dance in the TX_BUSY case.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Reviewed-by: Michael Chan <[email protected]>
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

nbd: Aovid double completion of a request

There is a race between iterating over requests in
nbd_clear_que() and completing requests in recv_work(),
which can lead to double completion of a request.

To fix it, flush the recv worker before iterating over
the requests and don't abort the completed request
while iterating.

Fixes: 96d97e17828f ("nbd: clear_sock on netlink disconnect")
Reported-by: Jiang Yadong <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

selftests, bpf: Test that dead ldx_w insns are accepted

Prevent regressions related to zero-extension metadata handling during
dead code sanitization.

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Clear zext_dst of dead insns

"access skb fields ok" verifier test fails on s390 with the "verifier
bug. zext_dst is set, but no reg is defined" message. The first insns
of the test prog are ...

   0: 61 01 00 00 00 00 00 00 ldxw %r0,[%r1+0]
   8: 35 00 00 01 00 00 00 00 jge %r0,0,1
  10: 61 01 00 08 00 00 00 00 ldxw %r0,[%r1+8]

... and the 3rd one is dead (this does not look intentional to me, but
this is a separate topic).

sanitize_dead_code() converts dead insns into "ja -1", but keeps
zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such
an insn, it sees this discrepancy and bails. This problem can be seen
only with JITs whose bpf_jit_needs_zext() returns true.

Fix by clearning dead insns' zext_dst.

The commits that contributed to this problem are:

1. 5aa5bd14c5f8 ("bpf: add initial suite for selftests"), which
   introduced the test with the dead code.
2. 5327ed3d44b7 ("bpf: verifier: mark verified-insn with
   sub-register zext flag"), which introduced the zext_dst flag.
3. 83a2881903f3 ("bpf: Account for BPF_FETCH in
   insn_has_def32()"), which introduced the sanity check.
4. 9183671af6db ("bpf: Fix leakage under speculation on
   mispredicted branches"), which bisect points to.

It's best to fix this on stable branches that contain the second one,
since that's the point where the inconsistency was introduced.

Fixes: 5327ed3d44b7 ("bpf: verifier: mark verified-insn with sub-register zext flag")
Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

tools/io_uring/io_uring-cp: sync with liburing example

This example is missing a few fixes that are in the liburing version,
synchronize with the upstream version.

Reported-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED

We run a test that delete and recover devcies frequently(two devices on
the same host), and we found that 'active_queues' is super big after a
period of time.

If device a and device b share a tag set, and a is deleted, then
blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there
is only one queue that are using the tag set. However, if b is still
active, the active_queues of b might never be cleared even if b is
deleted.

Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared.

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

driver core: Add missing kernel doc for device::msi_lock

Fixes: 77e89afc25f3 ("PCI/MSI: Protect msi_desc::masked for multi-MSI")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

ipack: tpci200: fix memory leak in the tpci200_register

The error handling code in tpci200_register does not free interface_regs
allocated by ioremap and the current version of error handling code is
problematic.

Fix this by refactoring the error handling code and free interface_regs
when necessary.

Fixes: 43986798fd50 ("ipack: add error handling for ioremap_nocache")
Cc: [email protected]
Reported-by: Dongliang Mu <[email protected]>
Signed-off-by: Dongliang Mu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

ipack: tpci200: fix many double free issues in tpci200_pci_probe

The function tpci200_register called by tpci200_install and
tpci200_unregister called by tpci200_uninstall are in pair. However,
tpci200_unregister has some cleanup operations not in the
tpci200_register. So the error handling code of tpci200_pci_probe has
many different double free issues.

Fix this problem by moving those cleanup operations out of
tpci200_unregister, into tpci200_pci_remove and reverting
the previous commit 9272e5d0028d ("ipack/carriers/tpci200:
Fix a double free in tpci200_pci_probe").

Fixes: 9272e5d0028d ("ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe")
Cc: [email protected]
Reported-by: Dongliang Mu <[email protected]>
Signed-off-by: Dongliang Mu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

slimbus: ngd: reset dma setup during runtime pm

During suspend/resume NGD remote instance is power cycled along
with remotely controlled bam dma engine.
So Reset the dma configuration during this suspend resume path
so that we are not dealing with any stale dma setup.

Without this transactions timeout after first suspend resume path.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

slimbus: ngd: set correct device for pm

For some reason we ended up using wrong device in some places for pm_runtime calls.
Fix this so that NGG driver can do runtime pm correctly.

Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

slimbus: messaging: check for valid transaction id

In some usecases transaction ids are dynamically allocated inside
the controller driver after sending the messages which have generic
acknowledge responses. So check for this before refcounting pm_runtime.

Without this we would end up imbalancing runtime pm count by
doing pm_runtime_put() in both slim_do_transfer() and slim_msg_response()
for a single pm_runtime_get() in slim_do_transfer()

Fixes: d3062a210930 ("slimbus: messaging: add slim_alloc/free_txn_tid()")
Cc: <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

slimbus: messaging: start transaction ids from 1 instead of zero

As tid is unsigned its hard to figure out if the tid is valid or
invalid. So Start the transaction ids from 1 instead of zero
so that we could differentiate between a valid tid and invalid tids

This is useful in cases where controller would add a tid for controller
specific transfers.

Fixes: d3062a210930 ("slimbus: messaging: add slim_alloc/free_txn_tid()")
Cc: <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge branch 'kvm-tdpmmu-fixes' into kvm-master

Merge topic branch with fixes for both 5.14-rc6 and 5.15.

KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock

Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync.  When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault.  The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.

Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations.  But underflow is the best case scenario.  The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.

Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok.  Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.

For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write.  If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.

Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault.  I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled.  Holding mmu_lock for read also
prevents the indirect shadow page from being freed.  But as above, keep
it simple and always take the lock.

Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP.  Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).

Alternative #2, the unsync logic could be made thread safe.  In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice.  However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).

Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: [email protected]
Cc: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210812181815.3378104 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs

Set the min_level for the TDP iterator at the root level when zapping all
SPTEs to optimize the iterator's try_step_down().  Zapping a non-leaf
SPTE will recursively zap all its children, thus there is no need for the
iterator to attempt to step down.  This avoids rereading the top-level
SPTEs after they are zapped by causing try_step_down() to short-circuit.

In most cases, optimizing try_step_down() will be in the noise as the cost
of zapping SPTEs completely dominates the overall time.  The optimization
is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM
is zapping in response to multiple memslot updates when userspace is adding
and removing read-only memslots for option ROMs.  In that case, the task
doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock
for read and thus can be a noisy neighbor of sorts.

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210812181414.3376143 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs

Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and
really zap all SPTEs in this case.  As is, zap_gfn_range() skips non-leaf
SPTEs whose range exceeds the range to be zapped.  If shadow_phys_bits is
not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level
paging, the "zap all" flows will skip top-level SPTEs whose range extends
beyond shadow_phys_bits and leak their SPs when the VM is destroyed.

Use the current upper bound (based on host.MAXPHYADDR) to detect that the
caller wants to zap all SPTEs, e.g. instead of using the max theoretical
gfn, 1 << (52 - 12).  The more precise upper bound allows the TDP iterator
to terminate its walk earlier when running on hosts with MAXPHYADDR < 52.

Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to
help future debuggers should KVM decide to leak SPTEs again.

The bug is most easily reproduced by running (and unloading!) KVM in a
VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped.

  =============================================================================
  BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------
  Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1)
  CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack_lvl+0x45/0x59
   slab_err+0x95/0xc9
   __kmem_cache_shutdown.cold+0x3c/0x158
   kmem_cache_destroy+0x3d/0xf0
   kvm_mmu_module_exit+0xa/0x30 [kvm]
   kvm_arch_exit+0x5d/0x90 [kvm]
   kvm_exit+0x78/0x90 [kvm]
   vmx_exit+0x1a/0x50 [kvm_intel]
   __x64_sys_delete_module+0x13f/0x220
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Cc: [email protected]
Cc: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210812181414.3376143 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'kvmarm-fixes-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.14, take #2

- Plug race between enabling MTE and creating vcpus
- Fix off-by-one bug when checking whether an address range is RAM

KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF

Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF
in L2 or if the VM-Exit should be forwarded to L1. The current logic fails
to account for the case where #PF is intercepted to handle
guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into
L1. At best, L1 will complain and inject the #PF back into L2. At
worst, L1 will eat the unexpected fault and cause L2 to hang on infinite
page faults.

Note, while the bug was technically introduced by the commit that added
support for the MAXPHYADDR madness, the shame is all on commit
a0c134347baf ("KVM: VMX: introduce vmx_need_pf_intercept").

Fixes: 1dbf5d68af6f ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
Cc: [email protected]
Cc: Peter Shier <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Jim Mattson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210812045615.3167686 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault

When a nested EPT violation/misconfig is injected into the guest,
the shadow EPT PTEs associated with that address need to be synced.
This is done by kvm_inject_emulated_page_fault() before it calls
nested_ept_inject_page_fault(). However, that will only sync the
shadow EPT PTE associated with the current L1 EPTP. Since the ASID
is based on EP4TA rather than the full EPTP, so syncing the current
EPTP is not enough. The SPTEs associated with any other L1 EPTPs
in the prev_roots cache with the same EP4TA also need to be synced.

Signed-off-by: Junaid Shahid <[email protected]>
Message-Id: <20210806222229.1645356 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge branch 'kvm-vmx-secctl' into kvm-master

Merge common topic branch for 5.14-rc6 and 5.15 merge window.