Git Repo - linux.git/log

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-03-08

This series contains updates to iavf, i40e, and ice drivers.

Michal ensures netdev features are properly updated to reflect VLAN
changes received from PF and adds an additional flag for MSI-X
reinitialization as further differentiation of reinitialization
operations is needed for iavf.

Jake stops disabling of VFs due to failed virtchannel responses for
i40e and ice driver.

Dave moves MTU event notification to the service task to prevent issues
with RTNL lock for ice.

Christophe Jaillet corrects an allocation to GFP_ATOMIC instead of
GFP_KERNEL for ice.

Jedrzej fixes the value for link speed comparison which was preventing
the requested value from being set for ice.
---
Note: This will conflict when merging with net-next. Resolution:

diff --cc drivers/net/ethernet/intel/ice/ice.h
index dc42ff92dbad,3121f9b04f59..000000000000
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@@ -484,10 -481,9 +484,11 @@@ enum ice_pf_flags
        ICE_FLAG_LEGACY_RX,
        ICE_FLAG_VF_TRUE_PROMISC_ENA,
        ICE_FLAG_MDD_AUTO_RESET_VF,
+      ICE_FLAG_VF_VLAN_PRUNING,
        ICE_FLAG_LINK_LENIENT_MODE_ENA,
        ICE_FLAG_PLUG_AUX_DEV,
+       ICE_FLAG_MTU_CHANGED,
+      ICE_FLAG_GNSS,                  /* GNSS successfully initialized */
        ICE_PF_FLAGS_NBITS              /* must be last */
  };
====================

Signed-off-by: David S. Miller <[email protected]>

tipc: fix incorrect order of state message data sanity check

When receiving a state message, function tipc_link_validate_msg()
is called to validate its header portion. Then, its data portion
is validated before it can be accessed correctly. However, current
data sanity check is done after the message header is accessed to
update some link variables.

This commit fixes this issue by moving the data sanity check to
the beginning of state message handling and right after the header
sanity check.

Fixes: 9aa422ad3266 ("tipc: improve size validations for received domain records")
Acked-by: Jon Maloy <[email protected]>
Signed-off-by: Tung Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ethernet: Fix error handling in xemaclite_of_probe

This node pointer is returned by of_parse_phandle() with refcount
incremented in this function. Calling of_node_put() to avoid the
refcount leak. As the remove function do.

Fixes: 5cdaaa12866e ("net: emaclite: adding MDIO and phy lib support")
Signed-off-by: Miaoqian Lin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection

Commit 9d497e2941c3 ("block: don't protect submit_bio_checks by
q_usage_counter") moved blk_mq_attempt_bio_merge and rq_qos_throttle
calls out of q_usage_counter protection. However, these functions require
q_usage_counter protection. The blk_mq_attempt_bio_merge call without
the protection resulted in blktests block/005 failure with KASAN null-
ptr-deref or use-after-free at bio merge. The rq_qos_throttle call
without the protection caused kernel hang at qos throttle.

To fix the failures, move the blk_mq_attempt_bio_merge and
rq_qos_throttle calls back to q_usage_counter protection.

Fixes: 9d497e2941c3 ("block: don't protect submit_bio_checks by q_usage_counter")
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

ice: Fix curr_link_speed advertised speed

Change curr_link_speed advertised speed, due to
link_info.link_speed is not equal phy.curr_user_speed_req.
Without this patch it is impossible to set advertised
speed to same as link_speed.

Testing Hints: Try to set advertised speed
to 25G only with 25G default link (use ethtool -s 0x80000000)

Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations")
Signed-off-by: Grzegorz Siwik <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

ice: Don't use GFP_KERNEL in atomic context

ice_misc_intr() is an irq handler. It should not sleep.

Use GFP_ATOMIC instead of GFP_KERNEL when allocating some memory.

Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Christophe JAILLET <[email protected]>
Tested-by: Leszek Kaliszczuk <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Fix error with handling of bonding MTU

When a bonded interface is destroyed, .ndo_change_mtu can be called
during the tear-down process while the RTNL lock is held. This is a
problem since the auxiliary driver linked to the LAN driver needs to be
notified of the MTU change, and this requires grabbing a device_lock on
the auxiliary_device's dev. Currently this is being attempted in the
same execution context as the call to .ndo_change_mtu which is causing a
dead-lock.

Move the notification of the changed MTU to a separate execution context
(watchdog service task) and eliminate the "before" notification.

Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <[email protected]>
Tested-by: Jonathan Toppins <[email protected]>
Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

ice: stop disabling VFs due to PF error responses

The ice_vc_send_msg_to_vf function has logic to detect "failure"
responses being sent to a VF. If a VF is sent more than
ICE_DFLT_NUM_INVAL_MSGS_ALLOWED then the VF is marked as disabled.
Almost identical logic also existed in the i40e driver.

This logic was added to the ice driver in commit 1071a8358a28 ("ice:
Implement virtchnl commands for AVF support") which itself copied from
the i40e implementation in commit 5c3c48ac6bf5 ("i40e: implement virtual
device interface").

Neither commit provides a proper explanation or justification of the
check. In fact, later commits to i40e changed the logic to allow
bypassing the check in some specific instances.

The "logic" for this seems to be that error responses somehow indicate a
malicious VF. This is not really true. The PF might be sending an error
for any number of reasons such as lack of resources, etc.

Additionally, this causes the PF to log an info message for every failed
VF response which may confuse users, and can spam the kernel log.

This behavior is not documented as part of any requirement for our
products and other operating system drivers such as the FreeBSD
implementation of our drivers do not include this type of check.

In fact, the change from dev_err to dev_info in i40e commit 18b7af57d9c1
("i40e: Lower some message levels") explains that these messages
typically don't actually indicate a real issue. It is quite likely that
a user who hits this in practice will be very confused as the VF will be
disabled without an obvious way to recover.

We already have robust malicious driver detection logic using actual
hardware detection mechanisms that detect and prevent invalid device
usage. Remove the logic since its not a documented requirement and the
behavior is not intuitive.

Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support")
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: stop disabling VFs due to PF error responses

The i40e_vc_send_msg_to_vf_ex (and its wrapper i40e_vc_send_msg_to_vf)
function has logic to detect "failure" responses sent to the VF. If a VF
is sent more than I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED, then the VF is
marked as disabled. In either case, a dev_info message is printed
stating that a VF opcode failed.

This logic originates from the early implementation of VF support in
commit 5c3c48ac6bf5 ("i40e: implement virtual device interface").

That commit did not go far enough. The "logic" for this behavior seems
to be that error responses somehow indicate a malicious VF. This is not
really true. The PF might be sending an error for any number of reasons
such as lacking resources, an unsupported operation, etc. This does not
indicate a malicious VF. We already have a separate robust malicious VF
detection which relies on hardware logic to detect and prevent a variety
of behaviors.

There is no justification for this behavior in the original
implementation. In fact, a later commit 18b7af57d9c1 ("i40e: Lower some
message levels") reduced the opcode failure message from a dev_err to a
dev_info. In addition, recent commit 01cbf50877e6 ("i40e: Fix to not
show opcode msg on unsuccessful VF MAC change") changed the logic to
allow quieting it for expected failures.

That commit prevented this logic from kicking in for specific
circumstances. This change did not go far enough. The behavior is not
documented nor is it part of any requirement for our products. Other
operating systems such as the FreeBSD implementation of our driver do
not include this logic.

It is clear this check does not make sense, and causes problems which
led to ugly workarounds.

Fix this by just removing the entire logic and the need for the
i40e_vc_send_msg_to_vf_ex function.

Fixes: 01cbf50877e6 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change")
Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface")
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

iavf: Fix adopting new combined setting

In some cases overloaded flag IAVF_FLAG_REINIT_ITR_NEEDED
which should indicate that interrupts need to be completely
reinitialized during reset leads to RTNL deadlocks using ethtool -C
while a reset is in progress.
To fix, it was added a new flag IAVF_FLAG_REINIT_MSIX_NEEDED
used to trigger MSI-X reinit.
New combined setting is fixed adopt after VF reset.
This has been implemented by call reinit interrupt scheme
during VF reset.
Without this fix new combined setting has never been adopted.

Fixes: 209f2f9c7181 ("iavf: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 negotiation")
Signed-off-by: Grzegorz Szczurek <[email protected]>
Signed-off-by: Mitch Williams <[email protected]>
Signed-off-by: Michal Maloszewski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

iavf: Fix handling of vlan strip virtual channel messages

Modify netdev->features for vlan stripping based on virtual
channel messages received from the PF. Change is needed
to synchronize vlan strip status between PF sysfs and iavf ethtool.

Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR")
Signed-off-by: Norbert Ciosek <[email protected]>
Signed-off-by: Michal Maloszewski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ARM: fix build error when BPF_SYSCALL is disabled

It was missing a semicolon.

Signed-off-by: Emmanuel Gil Peyrot <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Fixes: 25875aa71dfe ("ARM: include unprivileged BPF status in Spectre V2 reporting").
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'devicetree-fixes-for-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Fix pinctrl node name warnings in examples

- Add missing 'mux-states' property in ti,tcan104x-can binding

* tag 'devicetree-fixes-for-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: phy: ti,tcan104x-can: Document mux-states property
dt-bindings: mfd: Fix pinctrl node name warnings

Merge tag 'fuse-fixes-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:

- Fix an issue with splice on the fuse device

- Fix a regression in the fileattr API conversion

- Add a small userspace API improvement

* tag 'fuse-fixes-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix pipe buffer lifetime for direct_io
  fuse: move FUSE_SUPER_MAGIC definition to magic.h
  fuse: fix fileattr op failure

Merge tag 'arm64-spectre-bhb-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 spectre fixes from James Morse:
"ARM64 Spectre-BHB mitigations:

   - Make EL1 vectors per-cpu

   - Add mitigation sequences to the EL1 and EL2 vectors on vulnerble
     CPUs

   - Implement ARCH_WORKAROUND_3 for KVM guests

   - Report Vulnerable when unprivileged eBPF is enabled"

* tag 'arm64-spectre-bhb-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
  arm64: Use the clearbhb instruction in mitigations
  KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
  arm64: Mitigate spectre style branch history side channels
  arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
  arm64: Add percpu vectors for EL1
  arm64: entry: Add macro for reading symbol addresses from the trampoline
  arm64: entry: Add vectors that have the bhb mitigation sequences
  arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
  arm64: entry: Allow the trampoline text to occupy multiple pages
  arm64: entry: Make the kpti trampoline's kpti sequence optional
  arm64: entry: Move trampoline macros out of ifdef'd section
  arm64: entry: Don't assume tramp_vectors is the start of the vectors
  arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
  arm64: entry: Move the trampoline data page before the text page
  arm64: entry: Free up another register on kpti's tramp_exit path
  arm64: entry: Make the trampoline cleanup optional
  KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
  arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
  arm64: entry.S: Add ventry overflow sanity checks

Merge tag 'for-linus-bhb' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM spectre fixes from Russell King:
"ARM Spectre BHB mitigations.

  These patches add Spectre BHB migitations for the following Arm CPUs
  to the 32-bit ARM kernels:
   - Cortex A15
   - Cortex A57
   - Cortex A72
   - Cortex A73
   - Cortex A75
   - Brahma B15
  for CVE-2022-23960"

* tag 'for-linus-bhb' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: include unprivileged BPF status in Spectre V2 reporting
  ARM: Spectre-BHB workaround
  ARM: use LOADADDR() to get load address of sections
  ARM: early traps initialisation
  ARM: report Spectre v2 status through sysfs

dt-bindings: phy: ti,tcan104x-can: Document mux-states property

On some boards, for routing CAN signals from controller to transceivers,
muxes might need to be set. This can be implemented using mux-states
property. Therefore, document the same in the respective bindings.

Signed-off-by: Aswath Govindraju <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: mfd: Fix pinctrl node name warnings

The recent addition pinctrl.yaml in commit c09acbc499e8 ("dt-bindings:
pinctrl: use pinctrl.yaml") resulted in some node name warnings:

Documentation/devicetree/bindings/mfd/cirrus,lochnagar.example.dt.yaml: \
lochnagar-pinctrl: $nodename:0: 'lochnagar-pinctrl' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Documentation/devicetree/bindings/mfd/cirrus,madera.example.dt.yaml: \
codec@1a: $nodename:0: 'codec@1a' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Documentation/devicetree/bindings/mfd/brcm,cru.example.dt.yaml: \
pin-controller@1c0: $nodename:0: 'pin-controller@1c0' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'

Fix the node names to the preferred 'pinctrl'. For cirrus,madera,
nothing from pinctrl.yaml schema is used, so just drop the reference.

Fixes: c09acbc499e8 ("dt-bindings: pinctrl: use pinctrl.yaml")
Cc: Rafał Miłecki <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Acked-by: Charles Keepax <[email protected]>
Acked-by: Lee Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

MAINTAINERS: Update Jisheng's email address

I'm leaving synaptics. Update my email address to my korg mail
address and add entries to .mailmap as well to map my work
addresses to korg mail address.

Signed-off-by: Jisheng Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'arm-soc/for-5.18/maintainers' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request updates the MAINTAINERS file for Broadcom SoCs, please
pull the following for 5.18:

- Kuldeep updates the Broadcom iProc entry to use the same up to date
Linux tree as the other Broadcom SoCs.

* tag 'arm-soc/for-5.18/maintainers' of https://github.com/Broadcom/stblinux:
MAINTAINERS: Update git tree for Broadcom iProc SoCs

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

ARM: include unprivileged BPF status in Spectre V2 reporting

The mitigations for Spectre-BHB are only applied when an exception
is taken, but when unprivileged BPF is enabled, userspace can
load BPF programs that can be used to exploit the problem.

When unprivileged BPF is enabled, report the vulnerable status via
the spectre_v2 sysfs file.

Signed-off-by: Russell King (Oracle) <[email protected]>

Revert "arm64: dts: mt8183: jacuzzi: Fix bus properties in anx's DSI endpoint"

This reverts commit 32568ae37596b529628ac09b875f4874e614f63f.

Signed-off-by: Robert Foss <[email protected]>
Reviewed-by: Chen-Yu Tsai <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Acked-by: Matthias Brugger <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

dt-bindings: drm/bridge: anx7625: Revert DPI support

Revert DPI support from binding.

DPI support relies on the bus-type enum which does not yet support
Mipi DPI, since no v4l2_fwnode_bus_type has been defined for this
bus type.

When DPI for anx7625 was initially added, it assumed that
V4L2_FWNODE_BUS_TYPE_PARALLEL was the correct bus type for
representing DPI, which it is not.

In order to prevent adding this mis-usage to the ABI, let's revert
the support.

Signed-off-by: Robert Foss <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

x86/module: Fix the paravirt vs alternative order

Ever since commit

4e6292114c74 ("x86/paravirt: Add new features for paravirt patching")

there is an ordering dependency between patching paravirt ops and
patching alternatives, the module loader still violates this.

Fixes: 4e6292114c74 ("x86/paravirt: Add new features for paravirt patching")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ARM: dts: aspeed: Fix AST2600 quad spi group

Requesting quad mode for the FMC resulted in an error:

  &fmc {
         status = "okay";
+       pinctrl-names = "default";
+       pinctrl-0 = <&pinctrl_fwqspi_default>'

[    0.742963] aspeed-g6-pinctrl 1e6e2000.syscon:pinctrl: invalid function FWQSPID in map table

This is because the quad mode pins are a group of pins, not a function.

After applying this patch we can request the pins and the QSPI data
lines are muxed:

# cat /sys/kernel/debug/pinctrl/1e6e2000.syscon\:pinctrl-aspeed-g6-pinctrl/pinmux-pins |grep 1e620000.spi
pin 196 (AE12): device 1e620000.spi function FWSPID group FWQSPID
pin 197 (AF12): device 1e620000.spi function FWSPID group FWQSPID
pin 240 (Y1): device 1e620000.spi function FWSPID group FWQSPID
pin 241 (Y2): device 1e620000.spi function FWSPID group FWQSPID
pin 242 (Y3): device 1e620000.spi function FWSPID group FWQSPID
pin 243 (Y4): device 1e620000.spi function FWSPID group FWQSPID

Fixes: f510f04c8c83 ("ARM: dts: aspeed: Add AST2600 pinmux nodes")
Signed-off-by: Joel Stanley <[email protected]>
Reviewed-by: Andrew Jeffery <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'tegra-for-5.17-arm-dt-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes

ARM: tegra: Device tree fixes for v5.17

One more patch to fix up eDP panels on Nyan FHD models.

* tag 'tegra-for-5.17-arm-dt-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
ARM: tegra: Move Nyan FHD panels to AUX bus
ARM: tegra: Move panels to AUX bus

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

spi: Fix invalid sgs value

max_seg_size is unsigned int and it can have a value up to 2^32
(for eg:-RZ_DMAC driver sets dma_set_max_seg_size as U32_MAX)
When this value is used in min_t() as an integer type, it becomes
-1 and the value of sgs becomes 0.

Fix this issue by replacing the 'int' data type with 'unsigned int'
in min_t().

Signed-off-by: Biju Das <[email protected]>
Reviewed-by: Lad Prabhakar <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

net: dsa: mt7530: fix incorrect test in mt753x_phylink_validate()

Discussing one of the tests in mt753x_phylink_validate() with Landen
Chao confirms that the "||" should be "&&". Fix this.

Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch")
Signed-off-by: Russell King (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

drm/sun4i: mixer: Fix P010 and P210 format numbers

It turns out that DE3 manual has inverted YUV and YVU format numbers for
P010 and P210. Invert them.

This was tested by playing video decoded to P010 and additionally
confirmed by looking at BSP driver source.

Fixes: 169ca4b38932 ("drm/sun4i: Add separate DE3 VI layer formats")
Signed-off-by: Jernej Skrabec <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

arm64: Ensure execute-only permissions are not allowed without EPAN

Commit 18107f8a2df6 ("arm64: Support execute-only permissions with
Enhanced PAN") re-introduced execute-only permissions when EPAN is
available. When EPAN is not available, arch_filter_pgprot() is supposed
to change a PAGE_EXECONLY permission into PAGE_READONLY_EXEC. However,
if BTI or MTE are present, such check does not detect the execute-only
pgprot in the presence of PTE_GP (BTI) or MT_NORMAL_TAGGED (MTE),
allowing the user to request PROT_EXEC with PROT_BTI or PROT_MTE.

Remove the arch_filter_pgprot() function, change the default VM_EXEC
permissions to PAGE_READONLY_EXEC and update the protection_map[] array
at core_initcall() if EPAN is detected.

Signed-off-by: Catalin Marinas <[email protected]>
Fixes: 18107f8a2df6 ("arm64: Support execute-only permissions with Enhanced PAN")
Cc: <[email protected]> # 5.13.x
Acked-by: Will Deacon <[email protected]>
Reviewed-by: Vladimir Murzin <[email protected]>
Tested-by: Vladimir Murzin <[email protected]>

gpio: sim: Declare gpio_sim_hog_config_item_ops static

Compiler is not happy:

warning: symbol 'gpio_sim_hog_config_item_ops' was not declared. Should it be static?

Fixes: cb8c474e79be ("gpio: sim: new testing module")
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

Merge tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 spectre fixes from Borislav Petkov:

- Mitigate Spectre v2-type Branch History Buffer attacks on machines
   which support eIBRS, i.e., the hardware-assisted speculation
   restriction after it has been shown that such machines are vulnerable
   even with the hardware mitigation.

- Do not use the default LFENCE-based Spectre v2 mitigation on AMD as
   it is insufficient to mitigate such attacks. Instead, switch to
   retpolines on all AMD by default.

- Update the docs and add some warnings for the obviously vulnerable
   cmdline configurations.

* tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
  x86/speculation: Warn about Spectre v2 LFENCE mitigation
  x86/speculation: Update link to AMD speculation whitepaper
  x86/speculation: Use generic retpoline by default on AMD
  x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
  Documentation/hw-vuln: Update spectre doc
  x86/speculation: Add eIBRS + Retpoline options
  x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE

MAINTAINERS: update Krzysztof Kozlowski's email

Use Krzysztof Kozlowski's @kernel.org account in maintainer entries.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'tegra-for-5.17-arm64-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes

arm64: tegra: Device tree fixes for v5.17

This contains a single, last-minute fix to disable the display SMMU by
default because under some circumstances leaving it enabled by default
can cause SMMU faults on boot.

* tag 'tegra-for-5.17-arm64-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: Disable ISO SMMU for Tegra194

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

MAINTAINERS: Update git tree for Broadcom iProc SoCs

Current git tree for Broadcom iProc SoCs is pretty outdated as it has
not updated for a long time. Fix the reference.

Signed-off-by: Kuldeep Singh <[email protected]>

Merge tag 'mtd/fixes-for-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD fix from Miquel Raynal:
"As part of a previous changeset introducing support for the K3
  architecture, the OMAP_GPMC (a non visible symbol) got selected by the
  selection of MTD_NAND_OMAP2 instead of doing so from the architecture
  directly (like for the other users of these two drivers). Indeed, from
  a hardware perspective, the OMAP NAND controller needs the GPMC to
  work.

  This led to a robot error which got addressed in fix merge into -rc4.
  Unfortunately, the approach at this time still used "select" and lead
  to further build error reports (sparc64:allmodconfig).

  This time we switch to 'depends on' in order to prevent random
  misconfigurations. The different dependencies will however need a
  future cleanup"

* tag 'mtd/fixes-for-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: omap2: Actually prevent invalid configuration and build error

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"Some last minute fixes that took a while to get ready. Not
  regressions, but they look safe and seem to be worth to have"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  tools/virtio: handle fallout from folio work
  tools/virtio: fix virtio_test execution
  vhost: remove avail_event arg from vhost_update_avail_event()
  virtio: drop default for virtio-mem
  vdpa: fix use-after-free on vp_vdpa_remove
  virtio-blk: Remove BUG_ON() in virtio_queue_rq()
  virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
  vhost: fix hung thread due to erroneous iotlb entries
  vduse: Fix returning wrong type in vduse_domain_alloc_iova()
  vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
  vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
  vdpa: factor out vdpa_set_features_unlocked for vdpa internal use
  virtio_console: break out of buf poll on remove
  virtio: document virtio_reset_device
  virtio: acknowledge all features before access
  virtio: unexport virtio_finalize_features

swiotlb: rework "fix info leak with DMA_FROM_DEVICE"

Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer), he asked me to create an incremental fix
(after I have pointed this out the mix up, and asked him for guidance).
So here we go.

The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC

Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
data from the swiotlb buffer.

Let me note, that if the size used with dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.

To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.

Signed-off-by: Halil Pasic <[email protected]>
Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: [email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ARM: tegra: Move Nyan FHD panels to AUX bus

Similarly to what was earlier done for other Nyan variants, move the eDP
panel on the FHD models to the AUX bus as well.

Suggested-by: Dmitry Osipenko <[email protected]>
Fixes: ef6fb9875ce0 ("ARM: tegra: Add device-tree for 1080p version of Nyan Big")
Signed-off-by: Thierry Reding <[email protected]>

perf tools: Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by array_size.cocci

Fix the following coccicheck warning:

tools/perf/util/trace-event-parse.c:209:35-36: WARNING: Use ARRAY_SIZE

ARRAY_SIZE(arr) is a macro provided in tools/include/linux/kernel.h,
which not only measures the size of the array, but also makes sure
that `arr` is really an array.

It has been tested with gcc (Debian 8.3.0-6) 8.3.0.

Signed-off-by: Guo Zhengkui <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf script: Output branch sample type

The type info is saved when using '-j save_type'. Output this in 'perf
script' so it can be accessed by other tools or for debugging.

It's appended to the end of the list of fields so any existing tools
that split on / and access fields via an index are not affected. Also
output '-' instead of 'N/A' when the branch type isn't saved because /
is used as a field separator.

Entries before this change look like this:

  0xaaaadb350838/0xaaaadb3507a4/P/-/-/0

And afterwards like this:

  0xaaaadb350838/0xaaaadb3507a4/P/-/-/0/CALL

or this if no type info is saved:

  0x7fb57586df6b/0x7fb5758731f0/P/-/-/143/-

Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf script: Refactor branch stack printing

Remove duplicate code so that future changes to flags are always made to
all 3 printing variations.

Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf session: Print branch stack entry type in --dump-raw-trace

This can help with debugging issues. It only prints when -j save_type
is used otherwise an empty string is printed.

Before the change:

  101603801707130 0xa70 [0x630]: PERF_RECORD_SAMPLE(IP, 0x2): 1108/1108: 0xffff9c1df24c period: 10694 addr: 0
  ... branch stack: nr:64
  .....  0: 0000ffff9c26029c -> 0000ffff9c26f340 0 cycles  P   0
  .....  1: 0000ffff9c2601bc -> 0000ffff9c26f340 0 cycles  P   0

After the change:

  101603801707130 0xa70 [0x630]: PERF_RECORD_SAMPLE(IP, 0x2): 1108/1108: 0xffff9c1df24c period: 10694 addr: 0
  ... branch stack: nr:64
  .....  0: 0000ffff9c26029c -> 0000ffff9c26f340 0 cycles  P   0 CALL
  .....  1: 0000ffff9c2601bc -> 0000ffff9c26f340 0 cycles  P   0 IND_CALL

Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf evsel: Add error message for unsupported branch stack cases

EOPNOTSUPP is a possible return value when branch stacks are requested
but they aren't enabled in the kernel or hardware. It's also returned if
they aren't supported on the specific event type. The currently printed
error message about sampling/overflow-interrupts is not correct in this
case.

Add a check for branch stacks before sample_period is checked because
sample_period is also set (to the default value) when using branch
stacks.

Before this change (when branch stacks aren't supported):

  perf record -j any
  Error:
  cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

After this change:

  perf record -j any
  Error:
  cycles: PMU Hardware or event type doesn't support branch stack sampling.

Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting

The mitigations for Spectre-BHB are only applied when an exception is
taken from user-space. The mitigation status is reported via the spectre_v2
sysfs vulnerabilities file.

When unprivileged eBPF is enabled the mitigation in the exception vectors
can be avoided by an eBPF program.

When unprivileged eBPF is enabled, print a warning and report vulnerable
via the sysfs vulnerabilities file.

Acked-by: Catalin Marinas <[email protected]>
Signed-off-by: James Morse <[email protected]>

mtd: rawnand: omap2: Actually prevent invalid configuration and build error

The root of the problem is that we are selecting symbols that have
dependencies. This can cause random configurations that can fail.
The cleanest solution is to avoid using select.

This driver uses interfaces from the OMAP_GPMC driver so we have to
depend on it instead.

Fixes: 4cd335dae3cf ("mtd: rawnand: omap2: Prevent invalid configuration and build error")
Signed-off-by: Roger Quadros <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]

fuse: fix pipe buffer lifetime for direct_io

In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.

On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.

This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.

Fix by copying pages coming from the user address space to new pipe
buffers.

Reported-by: Jann Horn <[email protected]>
Fixes: c3021629a0d8 ("fuse: support splice() reading from fuse device")
Cc: <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>

drm/i915/psr: Set "SF Partial Frame Enable" also on full update

Currently we are observing occasional screen flickering when
PSR2 selective fetch is enabled. More specifically glitch seems
to happen on full frame update when cursor moves to coords
x = -1 or y = -1.

According to Bspec SF Single full frame should not be set if
SF Partial Frame Enable is not set. This happened to be true for
ADLP as PSR2_MAN_TRK_CTL_ENABLE is always set and for ADL_P it's
actually "SF Partial Frame Enable" (Bit 31).

Setting "SF Partial Frame Enable" bit also on full update seems to
fix screen flickering.

Also make code more clear by setting PSR2_MAN_TRK_CTL_ENABLE
only if not on ADL_P. Bit 31 has different meaning in ADL_P.

Bspec: 49274

v2: Fix Mihai Harpau email address
v3: Modify commit message and remove unnecessary comment

Tested-by: Lyude Paul <[email protected]>
Fixes: 7f6002e58025 ("drm/i915/display: Enable PSR2 selective fetch by default")
Reported-by: Lyude Paul <[email protected]>
Cc: Mihai Harpau <[email protected]>
Cc: José Roberto de Souza <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Bugzilla: https://gitlab.freedesktop.org/drm/intel/-/issues/5077
Signed-off-by: Jouni Högander <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Signed-off-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 8d5516d18b323cf7274d1cf5fe76f4a691f879c6)
Signed-off-by: Tvrtko Ursulin <[email protected]>

gpiolib: acpi: Convert ACPI value of debounce to microseconds

It appears that GPIO ACPI library uses ACPI debounce values directly.
However, the GPIO library APIs expect the debounce timeout to be in
microseconds.

Convert ACPI value of debounce to microseconds.

While at it, document this detail where it is appropriate.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215664
Reported-by: Kai-Heng Feng <[email protected]>
Fixes: 8dcb7a15a585 ("gpiolib: acpi: Take into account debounce settings")
Signed-off-by: Andy Shevchenko <[email protected]>
Tested-by: Kai-Heng Feng <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

gpio: Revert regression in sysfs-gpio (gpiolib.c)

Some GPIO lines have stopped working after the patch
commit 2ab73c6d8323f ("gpio: Support GPIO controllers without pin-ranges")

And this has supposedly been fixed in the following patches
commit 89ad556b7f96a ("gpio: Avoid using pin ranges with !PINCTRL")
commit 6dbbf84603961 ("gpiolib: Don't free if pin ranges are not defined")

But an erratic behavior where some GPIO lines work while others do not work
has been introduced.

This patch reverts those changes so that the sysfs-gpio interface works
properly again.

Signed-off-by: Marcelo Roberto Jimenez <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

gpio: tegra186: Add IRQ per bank for Tegra241

Add the number of interrupts per bank for Tegra241 (Grace) to
fix the probe failure.

Fixes: d1056b771ddb ("gpio: tegra186: Add support for Tegra241")
Signed-off-by: Akhil R <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

smsc95xx: Ignore -ENODEV errors when device is unplugged

According to Documentation/driver-api/usb/URB.rst when a device
is unplugged usb_submit_urb() returns -ENODEV.

This error code propagates all the way up to usbnet_read_cmd() and
usbnet_write_cmd() calls inside the smsc95xx.c driver during
Ethernet cable unplug, unbind or reboot.

This causes the following errors to be shown on reboot, for example:

ci_hdrc ci_hdrc.1: remove, state 1
usb usb2: USB disconnect, device number 1
usb 2-1: USB disconnect, device number 2
usb 2-1.1: USB disconnect, device number 3
smsc95xx 2-1.1:1.0 eth1: unregister 'smsc95xx' usb-ci_hdrc.1-1.1, smsc95xx USB 2.0 Ethernet
smsc95xx 2-1.1:1.0 eth1: Failed to read reg index 0x00000114: -19
smsc95xx 2-1.1:1.0 eth1: Error reading MII_ACCESS
smsc95xx 2-1.1:1.0 eth1: __smsc95xx_mdio_read: MII is busy
smsc95xx 2-1.1:1.0 eth1: Failed to read reg index 0x00000114: -19
smsc95xx 2-1.1:1.0 eth1: Error reading MII_ACCESS
smsc95xx 2-1.1:1.0 eth1: __smsc95xx_mdio_read: MII is busy
smsc95xx 2-1.1:1.0 eth1: hardware isn't capable of remote wakeup
usb 2-1.4: USB disconnect, device number 4
ci_hdrc ci_hdrc.1: USB bus 2 deregistered
ci_hdrc ci_hdrc.0: remove, state 4
usb usb1: USB disconnect, device number 1
ci_hdrc ci_hdrc.0: USB bus 1 deregistered
imx2-wdt 30280000.watchdog: Device shutdown: Expect reboot!
reboot: Restarting system

Ignore the -ENODEV errors inside __smsc95xx_mdio_read() and
__smsc95xx_phy_wait_not_busy() and do not print error messages
when -ENODEV is returned.

Fixes: a049a30fc27c ("net: usb: Correct PHY handling of smsc95xx")
Signed-off-by: Fabio Estevam <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: return status of qed_iov_get_link

Clang static analysis reports this issue
qed_sriov.c:4727:19: warning: Assigned value is
  garbage or undefined
  ivi->max_tx_rate = tx_rate ? tx_rate : link.speed;
                   ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

link is only sometimes set by the call to qed_iov_get_link()
qed_iov_get_link fails without setting link or returning
status.  So change the decl to return status.

Fixes: 73390ac9d82b ("qed*: support ndo_get_vf_config")
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix esp GSO on inter address family tunnels.

The esp tunnel GSO handlers use skb_mac_gso_segment to
push the inner packet to the segmentation handlers.
However, skb_mac_gso_segment takes the Ethernet Protocol
ID from 'skb->protocol' which is wrong for inter address
family tunnels. We fix this by introducing a new
skb_eth_gso_segment function.

This function can be used if it is necessary to pass the
Ethernet Protocol ID directly to the segmentation handler.
First users of this function will be the esp4 and esp6
tunnel segmentation handlers.

Fixes: c35fe4106b92 ("xfrm: Add mode handlers for IPsec on layer 2")
Signed-off-by: Steffen Klassert <[email protected]>

esp: Fix BEET mode inter address family tunneling on GSO

The xfrm{4,6}_beet_gso_segment() functions did not correctly set the
SKB_GSO_IPXIP4 and SKB_GSO_IPXIP6 gso types for the address family
tunneling case. Fix this by setting these gso types.

Fixes: 384a46ea7bdc7 ("esp4: add gso_segment for esp4 beet mode")
Fixes: 7f9e40eb18a99 ("esp6: add gso_segment for esp6 beet mode")
Signed-off-by: Steffen Klassert <[email protected]>

esp: Fix possible buffer overflow in ESP transformation

The maximum message size that can be send is bigger than
the maximum site that skb_page_frag_refill can allocate.
So it is possible to write beyond the allocated buffer.

Fix this by doing a fallback to COW in that case.

v2:

Avoid get get_order() costs as suggested by Linus Torvalds.

Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
Reported-by: valis <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>

ethernet: sun: Free the coherent when failing in probing

When the driver fails to register net device, it should free the DMA
region first, and then do other cleanup.

Signed-off-by: Zheyu Ma <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lantiq_xrx200: fix use after free bug

The skb->len field is read after the packet is sent to the network
stack. In the meantime, skb can be freed. This patch fixes this bug.

Fixes: c3e6b2c35b34 ("net: lantiq_xrx200: add ingress SG DMA support")
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Aleksander Jan Bajkowski <[email protected]>
Acked-by: Hauke Mehrtens <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: qlogic: check the return value of dma_alloc_coherent() in qed_vf_hw_prepare()

The function dma_alloc_coherent() in qed_vf_hw_prepare() can fail, so
its return value should be checked.

Fixes: 1408cc1fa48c ("qed: Introduce VFs")
Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Jia-Ju Bai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

isdn: hfcpci: check the return value of dma_set_mask() in setup_hw()

The function dma_set_mask() in setup_hw() can fail, so its return value
should be checked.

Fixes: 1700fe1a10dc ("Add mISDN HFC PCI driver")
Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Jia-Ju Bai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mmc: core: Restore (almost) the busy polling for MMC_SEND_OP_COND

Commit 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1"),
significantly decreased the polling period from ~10-12ms into just a couple
of us. The purpose was to decrease the total time spent in the busy polling
loop, but unfortunate it has lead to problems, that causes eMMC cards to
never gets out busy and thus fails to be initialized.

To fix the problem, but also to try to keep some of the new improved
behaviour, let's start by using a polling period of 1-2ms, which then
increases for each loop, according to common polling loop in
__mmc_poll_for_busy().

Reported-by: Jean Rene Dawin <[email protected]>
Reported-by: H. Nikolaus Schaller <[email protected]>
Cc: Huijin Park <[email protected]>
Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
Signed-off-by: Ulf Hansson <[email protected]>
Tested-by: Jean Rene Dawin <[email protected]>
Tested-by: H. Nikolaus Schaller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

xen/netfront: react properly to failing gnttab_end_foreign_access_ref()

When calling gnttab_end_foreign_access_ref() the returned value must
be tested and the reaction to that value should be appropriate.

In case of failure in xennet_get_responses() the reaction should not be
to crash the system, but to disable the network device.

The calls in setup_netfront() can be replaced by calls of
gnttab_end_foreign_access(). While at it avoid double free of ring
pages and grant references via xennet_disconnect_backend() in this case.

This is CVE-2022-23042 / part of XSA-396.

Reported-by: Demi Marie Obenour <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V2:
- avoid double free
V3:
- remove pointless initializer (Jan Beulich)

xen/gnttab: fix gnttab_end_foreign_access() without page specified

gnttab_end_foreign_access() is used to free a grant reference and
optionally to free the associated page. In case the grant is still in
use by the other side processing is being deferred. This leads to a
problem in case no page to be freed is specified by the caller: the
caller doesn't know that the page is still mapped by the other side
and thus should not be used for other purposes.

The correct way to handle this situation is to take an additional
reference to the granted page in case handling is being deferred and
to drop that reference when the grant reference could be freed
finally.

This requires that there are no users of gnttab_end_foreign_access()
left directly repurposing the granted page after the call, as this
might result in clobbered data or information leaks via the not yet
freed grant reference.

This is part of CVE-2022-23041 / XSA-396.

Reported-by: Simon Gaiser <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V4:
- expand comment in header
V5:
- get page ref in case of kmalloc() failure, too

xen/pvcalls: use alloc/free_pages_exact()

Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.

This is part of CVE-2022-23041 / XSA-396.

Reported-by: Simon Gaiser <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V4:
- new patch

xen/9p: use alloc/free_pages_exact()

Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.

By using the local variable "order" instead of ring->intf->ring_order
in the error path of xen_9pfs_front_alloc_dataring() another bug is
fixed, as the error path can be entered before ring->intf->ring_order
is being set.

By using alloc_pages_exact() the size in bytes is specified for the
allocation, which fixes another bug for the case of
order < (PAGE_SHIFT - XEN_PAGE_SHIFT).

This is part of CVE-2022-23041 / XSA-396.

Reported-by: Simon Gaiser <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V4:
- new patch

xen/usb: don't use gnttab_end_foreign_access() in xenhcd_gnttab_done()

The usage of gnttab_end_foreign_access() in xenhcd_gnttab_done() is
not safe against a malicious backend, as the backend could keep the
I/O page mapped and modify it even after the granted memory page is
being used for completely other purposes in the local system.

So replace that use case with gnttab_try_end_foreign_access() and
disable the PV host adapter in case the backend didn't stop using the
granted page.

In xenhcd_urb_request_done() immediately return in case of setting
the device state to "error" instead of looking into further backend
responses.

Reported-by: Demi Marie Obenour <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V2:
- use gnttab_try_end_foreign_access()

xen: remove gnttab_query_foreign_access()

Remove gnttab_query_foreign_access(), as it is unused and unsafe to
use.

All previous use cases assumed a grant would not be in use after
gnttab_query_foreign_access() returned 0. This information is useless
in best case, as it only refers to a situation in the past, which could
have changed already.

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>

xen/gntalloc: don't use gnttab_query_foreign_access()

Using gnttab_query_foreign_access() is unsafe, as it is racy by design.

The use case in the gntalloc driver is not needed at all. While at it
replace the call of gnttab_end_foreign_access_ref() with a call of
gnttab_end_foreign_access(), which is what is really wanted there. In
case the grant wasn't used due to an allocation failure, just free the
grant via gnttab_free_grant_reference().

This is CVE-2022-23039 / part of XSA-396.

Reported-by: Demi Marie Obenour <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V3:
- fix __del_gref() (Jan Beulich)

xen/scsifront: don't use gnttab_query_foreign_access() for mapped status

It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.

In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_try_end_foreign_access() and check the
success of that operation instead.

This is CVE-2022-23038 / part of XSA-396.

Reported-by: Demi Marie Obenour <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V2:
- use gnttab_try_end_foreign_access()

xen/netfront: don't use gnttab_query_foreign_access() for mapped status

It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.

In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.

This is CVE-2022-23037 / part of XSA-396.

Reported-by: Demi Marie Obenour <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V2:
- use gnttab_try_end_foreign_access()
V3:
- don't use gnttab_try_end_foreign_access()

xen/blkfront: don't use gnttab_query_foreign_access() for mapped status

It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.

In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.

For the ring allocation use alloc_pages_exact() in order to avoid
high order pages in case of a multi-page ring.

If a grant wasn't unmapped by the backend without persistent grants
being used, set the device state to "error".

This is CVE-2022-23036 / part of XSA-396.

Reported-by: Demi Marie Obenour <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Roger Pau Monné <[email protected]>
---
V2:
- use gnttab_try_end_foreign_access()
V4:
- use alloc_pages_exact() and free_pages_exact()
- set state to error if backend didn't unmap (Roger Pau Monné)

xen/grant-table: add gnttab_try_end_foreign_access()

Add a new grant table function gnttab_try_end_foreign_access(), which
will remove and free a grant if it is not in use.

Its main use case is to either free a grant if it is no longer in use,
or to take some other action if it is still in use. This other action
can be an error exit, or (e.g. in the case of blkfront persistent grant
feature) some special handling.

This is CVE-2022-23036, CVE-2022-23038 / part of XSA-396.

Reported-by: Demi Marie Obenour <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V2:
- new patch
V4:
- add comments to header (Jan Beulich)

xen/xenbus: don't let xenbus_grant_ring() remove grants in error case

Letting xenbus_grant_ring() tear down grants in the error case is
problematic, as the other side could already have used these grants.
Calling gnttab_end_foreign_access_ref() without checking success is
resulting in an unclear situation for any caller of xenbus_grant_ring()
as in the error case the memory pages of the ring page might be
partially mapped. Freeing them would risk unwanted foreign access to
them, while not freeing them would leak memory.

In order to remove the need to undo any gnttab_grant_foreign_access()
calls, use gnttab_alloc_grant_references() to make sure no further
error can occur in the loop granting access to the ring pages.

It should be noted that this way of handling removes leaking of
grant entries in the error case, too.

This is CVE-2022-23040 / part of XSA-396.

Reported-by: Demi Marie Obenour <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>

powerpc: Fix STACKTRACE=n build

Our skiroot_defconfig doesn't enable FTRACE, and so doesn't get
STACKTRACE enabled either. That leads to a build failure since commit
1614b2b11fab ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
made stacktrace.c build even when STACKTRACE=n.

  arch/powerpc/kernel/stacktrace.c: In function ‘handle_backtrace_ipi’:
  arch/powerpc/kernel/stacktrace.c:171:2: error: implicit declaration of function ‘nmi_cpu_backtrace’
    171 |  nmi_cpu_backtrace(regs);
        |  ^~~~~~~~~~~~~~~~~
  arch/powerpc/kernel/stacktrace.c: In function ‘arch_trigger_cpumask_backtrace’:
  arch/powerpc/kernel/stacktrace.c:226:2: error: implicit declaration of function ‘nmi_trigger_cpumask_backtrace’
    226 |  nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi);
        |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This happens because our headers haven't defined
arch_trigger_cpumask_backtrace, which causes lib/nmi_backtrace.c not to
build nmi_cpu_backtrace().

The code in question doesn't actually depend on STACKTRACE=y, that was
just added because arch_trigger_cpumask_backtrace() lived in
stacktrace.c for convenience. So drop the dependency on
CONFIG_STACKTRACE, that causes lib/nmi_backtrace.c to build
nmi_cpu_backtrace() etc. and fixes the build.

Fixes: 1614b2b11fab ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
[mpe: Cherry pick of 5a72345e6a78 from next into fixes]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Linux 5.17-rc7

Merge tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"A few more fixes for various problems that have user visible effects
  or seem to be urgent:

   - fix corruption when combining DIO and non-blocking io_uring over
     multiple extents (seen on MariaDB)

   - fix relocation crash due to premature return from commit

   - fix quota deadlock between rescan and qgroup removal

   - fix item data bounds checks in tree-checker (found on a fuzzed
     image)

   - fix fsync of prealloc extents after EOF

   - add missing run of delayed items after unlink during log replay

   - don't start relocation until snapshot drop is finished

   - fix reversed condition for subpage writers locking

   - fix warning on page error"

* tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fallback to blocking mode when doing async dio over multiple extents
  btrfs: add missing run of delayed items after unlink during log replay
  btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
  btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
  btrfs: do not start relocation until in progress drops are done
  btrfs: tree-checker: use u64 for item data end to avoid overflow
  btrfs: do not WARN_ON() if we have PageError set
  btrfs: fix lost prealloc extents beyond eof after full fsync
  btrfs: subpage: fix a wrong check on subpage->writers

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"x86 guest:

   - Tweaks to the paravirtualization code, to avoid using them when
     they're pointless or harmful

  x86 host:

   - Fix for SRCU lockdep splat

   - Brown paper bag fix for the propagation of errno"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run
  KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
  KVM: x86: Yield to IPI target vCPU only if it is busy
  x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64
  x86/kvm: Don't waste memory if kvmclock is disabled
  x86/kvm: Don't use PV TLB/yield when mwait is advertised

Merge tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
"Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set.

Thanks to Murilo Opsfelder Araujo, and Erhard F"

* tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set

Merge tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

- Fix sorting on old "cpu" value in histograms

- Fix return value of __setup() boot parameter handlers

* tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix return value of __setup handlers
tracing/histogram: Fix sorting on old "cpu" value

tools/virtio: handle fallout from folio work

just add a stub

Signed-off-by: Michael S. Tsirkin <[email protected]>

tools/virtio: fix virtio_test execution

virtio_test hangs on __vring_new_virtqueue() because `vqs_list_lock`
is not initialized.

Let's initialize it in vdev_info_init().

Signed-off-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>

vhost: remove avail_event arg from vhost_update_avail_event()

In vhost_update_avail_event() we never used the `avail_event` argument,
since its introduction in commit 2723feaa8ec6 ("vhost: set log when
updating used flags or avail event").

Let's remove it to clean up the code.

Signed-off-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio: drop default for virtio-mem

There's no special reason why virtio-mem needs a default that's
different from what kconfig provides, any more than e.g. virtio blk.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>

vdpa: fix use-after-free on vp_vdpa_remove

When vp_vdpa driver is unbind, vp_vdpa is freed in vdpa_unregister_device
and then vp_vdpa->mdev.pci_dev is dereferenced in vp_modern_remove,
triggering use-after-free.

Call Trace of unbinding driver free vp_vdpa :
do_syscall_64
  vfs_write
    kernfs_fop_write_iter
      device_release_driver_internal
        pci_device_remove
          vp_vdpa_remove
            vdpa_unregister_device
              kobject_release
                device_release
                  kfree

Call Trace of dereference vp_vdpa->mdev.pci_dev:
vp_modern_remove
  pci_release_selected_regions
    pci_release_region
      pci_resource_len
        pci_resource_end
          (dev)->resource[(bar)].end

Signed-off-by: Zhang Min <[email protected]>
Signed-off-by: Yi Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Fixes: 64b9f64f80a6 ("vdpa: introduce virtio pci driver")
Reviewed-by: Stefano Garzarella <[email protected]>

virtio-blk: Remove BUG_ON() in virtio_queue_rq()

Currently we have a BUG_ON() to make sure the number of sg
list does not exceed queue_max_segments() in virtio_queue_rq().
However, the block layer uses queue_max_discard_segments()
instead of queue_max_segments() to limit the sg list for
discard requests. So the BUG_ON() might be triggered if
virtio-blk device reports a larger value for max discard
segment than queue_max_segments(). To fix it, let's simply
remove the BUG_ON() which has become unnecessary after commit
02746e26c39e("virtio-blk: avoid preallocating big SGL for data").
And the unused vblk->sg_elems can also be removed together.

Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero

Currently the value of max_discard_segment will be set to
MAX_DISCARD_SEGMENTS (256) with no basis in hardware if device
set 0 to max_discard_seg in configuration space. It's incorrect
since the device might not be able to handle such large descriptors.
To fix it, let's follow max_segments restrictions in this case.

Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Xie Yongji <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: fix hung thread due to erroneous iotlb entries

In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when
start is 0 and last is ULONG_MAX. One instance where it can happen
is when userspace sends an IOTLB message with iova=size=uaddr=0
(vhost_process_iotlb_msg). So, an entry with size = 0, start = 0,
last = ULONG_MAX ends up in the iotlb. Next time a packet is sent,
iotlb_access_ok() loops indefinitely due to that erroneous entry.

Call Trace:
<TASK>
iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>

Reported by syzbot at:
https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87

To fix this, do two things:

1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map
a range with size 0.
2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX]
by splitting it into two entries.

Fixes: 0bbe30668d89e ("vhost: factor out IOTLB")
Reported-by: [email protected]
Tested-by: [email protected]
Signed-off-by: Anirudh Rayabharam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>

net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails

After the blamed commit, dsa_tree_setup_master() may exit without
calling rtnl_unlock(), fix that.

Fixes: c146f9bc195a ("net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}")
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"

This reverts commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 because ID
0 was meant to be used for configuring the policy/state without
matching for a specific interface (e.g., Cilium is affected, see
https://github.com/cilium/cilium/pull/18789 and
https://github.com/cilium/cilium/pull/19019).

Signed-off-by: Kai Lueke <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

- a fixup for Goodix touchscreen driver allowing it to work on certain
   Cherry Trail devices

- a fix for imbalanced enable/disable regulator in Elam touchpad driver
   that became apparent when used with Asus TF103C 2-in-1 dock

- a couple new input keycodes used on newer keyboards

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  HID: add mapping for KEY_ALL_APPLICATIONS
  HID: add mapping for KEY_DICTATE
  Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
  Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
  Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource
  Input: goodix - use the new soc_intel_is_byt() helper
  Input: samsung-keypad - properly state IOMEM dependency

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"8 patches.

  Subsystems affected by this patch series: mm (hugetlb, pagemap, and
  userfaultfd), memfd, selftests, and kconfig"

* emailed patches from Andrew Morton <[email protected]>:
  configs/debug: set CONFIG_DEBUG_INFO=y properly
  proc: fix documentation and description of pagemap
  kselftest/vm: fix tests build with old libc
  memfd: fix F_SEAL_WRITE after shmem huge page allocated
  mm: fix use-after-free when anon vma name is used after vma is freed
  mm: prevent vm_area_struct::anon_name refcount saturation
  mm: refactor vm_area_struct::anon_vma_name usage code
  selftests/vm: cleanup hugetlb file after mremap test

Merge tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

- Fix HAVE_DYNAMIC_FTRACE_WITH_ARGS implementation by providing correct
   switching between ftrace_caller/ftrace_regs_caller and supplying
   pt_regs only when ftrace_regs_caller is activated.

- Fix exception table sorting.

- Fix breakage of kdump tooling by preserving metadata it cannot
   function without.

* tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/extable: fix exception table sorting
  s390/ftrace: fix arch_ftrace_get_regs implementation
  s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
  s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE

perf tools: Remove bpf_map__set_priv()/bpf_map__priv() usage

Both bpf_map__set_priv()/bpf_map__priv() are deprecated and will be
eventually removed.

Use hashmap to replace that functionality.

Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: [email protected]
Cc: lkml <[email protected]>
Cc: [email protected]

perf tools: Remove bpf_program__set_priv/bpf_program__priv usage

Both bpf_program__set_priv/bpf_program__priv are deprecated
and will be eventually removed.

Using hashmap to replace that functionality.

Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

configs/debug: set CONFIG_DEBUG_INFO=y properly

CONFIG_DEBUG_INFO can't be set by user directly, so set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y instead.

Otherwise, we end up with no debuginfo in vmlinux which is a big no-no
for kernel debugging.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: fix documentation and description of pagemap

Since bit 57 was exported for uffd-wp write-protected (commit
fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"),
fixing it can reduce some unnecessary confusion.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information")
Signed-off-by: Yun Zhou <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Tiberiu A Georgescu <[email protected]>
Cc: Florian Schmidt <[email protected]>
Cc: Ivan Teterevkov <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Alistair Popple <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kselftest/vm: fix tests build with old libc

The error message when I build vm tests on debian10 (GLIBC 2.28):

    userfaultfd.c: In function `userfaultfd_pagemap_test':
    userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use
    in this function); did you mean `MADV_RANDOM'?
      if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
                                         ^~~~~~~~~~~~
                                         MADV_RANDOM

This patch includes these newer definitions from UAPI linux/mman.h, is
useful to fix tests build on systems without these definitions in glibc
sys/mman.h.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chengming Zhou <[email protected]>
Reviewed-by: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memfd: fix F_SEAL_WRITE after shmem huge page allocated

Wangyong reports: after enabling tmpfs filesystem to support transparent
hugepage with the following command:

  echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled

the docker program tries to add F_SEAL_WRITE through the following
command, but it fails unexpectedly with errno EBUSY:

  fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.

That is because memfd_tag_pins() and memfd_wait_for_pins() were never
updated for shmem huge pages: checking page_mapcount() against
page_count() is hopeless on THP subpages - they need to check
total_mapcount() against page_count() on THP heads only.

Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
(compared != 1): either can be justified, but given the non-atomic
total_mapcount() calculation, it is better now to be strict.  Bear in
mind that total_mapcount() itself scans all of the THP subpages, when
choosing to take an XA_CHECK_SCHED latency break.

Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
page has been swapped out since memfd_tag_pins(), then its refcount must
have fallen, and so it can safely be untagged.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reported-by: Zeal Robot <[email protected]>
Reported-by: wangyong <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: CGEL ZTE <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yang Yang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix use-after-free when anon vma name is used after vma is freed

When adjacent vmas are being merged it can result in the vma that was
originally passed to madvise_update_vma being destroyed.  In the current
implementation, the name parameter passed to madvise_update_vma points
directly to vma->anon_name and it is used after the call to vma_merge.
In the cases when vma_merge merges the original vma and destroys it,
this might result in UAF.  For that the original vma would have to hold
the anon_vma_name with the last reference.  The following vma would need
to contain a different anon_vma_name object with the same string.  Such
scenario is shown below:

madvise_vma_behavior(vma)
  madvise_update_vma(vma, ..., anon_name == vma->anon_name)
    vma_merge(vma)
      __vma_adjust(vma) <-- merges vma with adjacent one
        vm_area_free(vma) <-- frees the original vma
    replace_vma_anon_name(anon_name) <-- UAF of vma->anon_name

Fix this by raising the name refcount and stabilizing it.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9a10064f5625 ("mm: add a field to store names for private anonymous memory")
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reported-by: [email protected]
Acked-by: Michal Hocko <[email protected]>
Cc: Alexey Gladkov <[email protected]>
Cc: Chris Hyser <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Peter Collingbourne <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Xiaofeng Cao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: prevent vm_area_struct::anon_name refcount saturation

A deep process chain with many vmas could grow really high.  With
default sysctl_max_map_count (64k) and default pid_max (32k) the max
number of vmas in the system is 2147450880 and the refcounter has
headroom of 1073774592 before it reaches REFCOUNT_SATURATED
(3221225472).

Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults.  Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647).  In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (that still require heavy sharing of that anon_vma_name between
processes).

kref refcounting interface used in anon_vma_name structure will detect a
counter overflow when it reaches REFCOUNT_SATURATED value but will only
generate a warning and freeze the ref counter.  This would lead to the
refcounted object never being freed.  A determined attacker could leak
memory like that but it would be rather expensive and inefficient way to
do so.

To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED.  This should provide enough headroom for raising the
refcounts temporarily.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alexey Gladkov <[email protected]>
Cc: Chris Hyser <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Collingbourne <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Xiaofeng Cao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>