David S. Miller [Wed, 9 Mar 2022 10:42:14 +0000 (10:42 +0000)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-03-08
This series contains updates to iavf, i40e, and ice drivers.
Michal ensures netdev features are properly updated to reflect VLAN
changes received from PF and adds an additional flag for MSI-X
reinitialization as further differentiation of reinitialization
operations is needed for iavf.
Jake stops disabling of VFs due to failed virtchannel responses for
i40e and ice driver.
Dave moves MTU event notification to the service task to prevent issues
with RTNL lock for ice.
Christophe Jaillet corrects an allocation to GFP_ATOMIC instead of
GFP_KERNEL for ice.
Jedrzej fixes the value for link speed comparison which was preventing
the requested value from being set for ice.
---
Note: This will conflict when merging with net-next. Resolution:
diff --cc drivers/net/ethernet/intel/ice/ice.h
index dc42ff92dbad,3121f9b04f59..000000000000
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@@ -484,10 -481,9 +484,11 @@@ enum ice_pf_flags
ICE_FLAG_LEGACY_RX,
ICE_FLAG_VF_TRUE_PROMISC_ENA,
ICE_FLAG_MDD_AUTO_RESET_VF,
+ ICE_FLAG_VF_VLAN_PRUNING,
ICE_FLAG_LINK_LENIENT_MODE_ENA,
ICE_FLAG_PLUG_AUX_DEV,
+ ICE_FLAG_MTU_CHANGED,
+ ICE_FLAG_GNSS, /* GNSS successfully initialized */
ICE_PF_FLAGS_NBITS /* must be last */
};
====================
Tung Nguyen [Tue, 8 Mar 2022 02:11:59 +0000 (02:11 +0000)]
tipc: fix incorrect order of state message data sanity check
When receiving a state message, function tipc_link_validate_msg()
is called to validate its header portion. Then, its data portion
is validated before it can be accessed correctly. However, current
data sanity check is done after the message header is accessed to
update some link variables.
This commit fixes this issue by moving the data sanity check to
the beginning of state message handling and right after the header
sanity check.
Miaoqian Lin [Tue, 8 Mar 2022 02:47:49 +0000 (02:47 +0000)]
ethernet: Fix error handling in xemaclite_of_probe
This node pointer is returned by of_parse_phandle() with refcount
incremented in this function. Calling of_node_put() to avoid the
refcount leak. As the remove function do.
block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection
Commit 9d497e2941c3 ("block: don't protect submit_bio_checks by
q_usage_counter") moved blk_mq_attempt_bio_merge and rq_qos_throttle
calls out of q_usage_counter protection. However, these functions require
q_usage_counter protection. The blk_mq_attempt_bio_merge call without
the protection resulted in blktests block/005 failure with KASAN null-
ptr-deref or use-after-free at bio merge. The rq_qos_throttle call
without the protection caused kernel hang at qos throttle.
To fix the failures, move the blk_mq_attempt_bio_merge and
rq_qos_throttle calls back to q_usage_counter protection.
Change curr_link_speed advertised speed, due to
link_info.link_speed is not equal phy.curr_user_speed_req.
Without this patch it is impossible to set advertised
speed to same as link_speed.
Testing Hints: Try to set advertised speed
to 25G only with 25G default link (use ethtool -s 0x80000000)
Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Grzegorz Siwik <[email protected]> Signed-off-by: Jedrzej Jagielski <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
Dave Ertman [Fri, 18 Feb 2022 20:39:25 +0000 (12:39 -0800)]
ice: Fix error with handling of bonding MTU
When a bonded interface is destroyed, .ndo_change_mtu can be called
during the tear-down process while the RTNL lock is held. This is a
problem since the auxiliary driver linked to the LAN driver needs to be
notified of the MTU change, and this requires grabbing a device_lock on
the auxiliary_device's dev. Currently this is being attempted in the
same execution context as the call to .ndo_change_mtu which is causing a
dead-lock.
Move the notification of the changed MTU to a separate execution context
(watchdog service task) and eliminate the "before" notification.
Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <[email protected]> Tested-by: Jonathan Toppins <[email protected]> Tested-by: Gurucharan G <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
Jacob Keller [Thu, 17 Feb 2022 00:51:36 +0000 (16:51 -0800)]
ice: stop disabling VFs due to PF error responses
The ice_vc_send_msg_to_vf function has logic to detect "failure"
responses being sent to a VF. If a VF is sent more than
ICE_DFLT_NUM_INVAL_MSGS_ALLOWED then the VF is marked as disabled.
Almost identical logic also existed in the i40e driver.
This logic was added to the ice driver in commit 1071a8358a28 ("ice:
Implement virtchnl commands for AVF support") which itself copied from
the i40e implementation in commit 5c3c48ac6bf5 ("i40e: implement virtual
device interface").
Neither commit provides a proper explanation or justification of the
check. In fact, later commits to i40e changed the logic to allow
bypassing the check in some specific instances.
The "logic" for this seems to be that error responses somehow indicate a
malicious VF. This is not really true. The PF might be sending an error
for any number of reasons such as lack of resources, etc.
Additionally, this causes the PF to log an info message for every failed
VF response which may confuse users, and can spam the kernel log.
This behavior is not documented as part of any requirement for our
products and other operating system drivers such as the FreeBSD
implementation of our drivers do not include this type of check.
In fact, the change from dev_err to dev_info in i40e commit 18b7af57d9c1
("i40e: Lower some message levels") explains that these messages
typically don't actually indicate a real issue. It is quite likely that
a user who hits this in practice will be very confused as the VF will be
disabled without an obvious way to recover.
We already have robust malicious driver detection logic using actual
hardware detection mechanisms that detect and prevent invalid device
usage. Remove the logic since its not a documented requirement and the
behavior is not intuitive.
Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Jacob Keller <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
Jacob Keller [Thu, 17 Feb 2022 00:51:35 +0000 (16:51 -0800)]
i40e: stop disabling VFs due to PF error responses
The i40e_vc_send_msg_to_vf_ex (and its wrapper i40e_vc_send_msg_to_vf)
function has logic to detect "failure" responses sent to the VF. If a VF
is sent more than I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED, then the VF is
marked as disabled. In either case, a dev_info message is printed
stating that a VF opcode failed.
This logic originates from the early implementation of VF support in
commit 5c3c48ac6bf5 ("i40e: implement virtual device interface").
That commit did not go far enough. The "logic" for this behavior seems
to be that error responses somehow indicate a malicious VF. This is not
really true. The PF might be sending an error for any number of reasons
such as lacking resources, an unsupported operation, etc. This does not
indicate a malicious VF. We already have a separate robust malicious VF
detection which relies on hardware logic to detect and prevent a variety
of behaviors.
There is no justification for this behavior in the original
implementation. In fact, a later commit 18b7af57d9c1 ("i40e: Lower some
message levels") reduced the opcode failure message from a dev_err to a
dev_info. In addition, recent commit 01cbf50877e6 ("i40e: Fix to not
show opcode msg on unsuccessful VF MAC change") changed the logic to
allow quieting it for expected failures.
That commit prevented this logic from kicking in for specific
circumstances. This change did not go far enough. The behavior is not
documented nor is it part of any requirement for our products. Other
operating systems such as the FreeBSD implementation of our driver do
not include this logic.
It is clear this check does not make sense, and causes problems which
led to ugly workarounds.
Fix this by just removing the entire logic and the need for the
i40e_vc_send_msg_to_vf_ex function.
Fixes: 01cbf50877e6 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change") Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Jacob Keller <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
In some cases overloaded flag IAVF_FLAG_REINIT_ITR_NEEDED
which should indicate that interrupts need to be completely
reinitialized during reset leads to RTNL deadlocks using ethtool -C
while a reset is in progress.
To fix, it was added a new flag IAVF_FLAG_REINIT_MSIX_NEEDED
used to trigger MSI-X reinit.
New combined setting is fixed adopt after VF reset.
This has been implemented by call reinit interrupt scheme
during VF reset.
Without this fix new combined setting has never been adopted.
iavf: Fix handling of vlan strip virtual channel messages
Modify netdev->features for vlan stripping based on virtual
channel messages received from the PF. Change is needed
to synchronize vlan strip status between PF sysfs and iavf ethtool.
Linus Torvalds [Tue, 8 Mar 2022 17:41:18 +0000 (09:41 -0800)]
Merge tag 'fuse-fixes-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix an issue with splice on the fuse device
- Fix a regression in the fileattr API conversion
- Add a small userspace API improvement
* tag 'fuse-fixes-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix pipe buffer lifetime for direct_io
fuse: move FUSE_SUPER_MAGIC definition to magic.h
fuse: fix fileattr op failure
Linus Torvalds [Tue, 8 Mar 2022 17:27:25 +0000 (09:27 -0800)]
Merge tag 'arm64-spectre-bhb-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 spectre fixes from James Morse:
"ARM64 Spectre-BHB mitigations:
- Make EL1 vectors per-cpu
- Add mitigation sequences to the EL1 and EL2 vectors on vulnerble
CPUs
- Implement ARCH_WORKAROUND_3 for KVM guests
- Report Vulnerable when unprivileged eBPF is enabled"
* tag 'arm64-spectre-bhb-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
arm64: Use the clearbhb instruction in mitigations
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
arm64: Mitigate spectre style branch history side channels
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
arm64: Add percpu vectors for EL1
arm64: entry: Add macro for reading symbol addresses from the trampoline
arm64: entry: Add vectors that have the bhb mitigation sequences
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
arm64: entry: Allow the trampoline text to occupy multiple pages
arm64: entry: Make the kpti trampoline's kpti sequence optional
arm64: entry: Move trampoline macros out of ifdef'd section
arm64: entry: Don't assume tramp_vectors is the start of the vectors
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
arm64: entry: Move the trampoline data page before the text page
arm64: entry: Free up another register on kpti's tramp_exit path
arm64: entry: Make the trampoline cleanup optional
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
arm64: entry.S: Add ventry overflow sanity checks
Linus Torvalds [Tue, 8 Mar 2022 17:08:06 +0000 (09:08 -0800)]
Merge tag 'for-linus-bhb' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM spectre fixes from Russell King:
"ARM Spectre BHB mitigations.
These patches add Spectre BHB migitations for the following Arm CPUs
to the 32-bit ARM kernels:
- Cortex A15
- Cortex A57
- Cortex A72
- Cortex A73
- Cortex A75
- Brahma B15
for CVE-2022-23960"
* tag 'for-linus-bhb' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: include unprivileged BPF status in Spectre V2 reporting
ARM: Spectre-BHB workaround
ARM: use LOADADDR() to get load address of sections
ARM: early traps initialisation
ARM: report Spectre v2 status through sysfs
On some boards, for routing CAN signals from controller to transceivers,
muxes might need to be set. This can be implemented using mux-states
property. Therefore, document the same in the respective bindings.
Rob Herring [Thu, 3 Mar 2022 23:23:49 +0000 (17:23 -0600)]
dt-bindings: mfd: Fix pinctrl node name warnings
The recent addition pinctrl.yaml in commit c09acbc499e8 ("dt-bindings:
pinctrl: use pinctrl.yaml") resulted in some node name warnings:
Documentation/devicetree/bindings/mfd/cirrus,lochnagar.example.dt.yaml: \
lochnagar-pinctrl: $nodename:0: 'lochnagar-pinctrl' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Documentation/devicetree/bindings/mfd/cirrus,madera.example.dt.yaml: \
codec@1a: $nodename:0: 'codec@1a' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Documentation/devicetree/bindings/mfd/brcm,cru.example.dt.yaml: \
pin-controller@1c0: $nodename:0: 'pin-controller@1c0' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Fix the node names to the preferred 'pinctrl'. For cirrus,madera,
nothing from pinctrl.yaml schema is used, so just drop the reference.
Jisheng Zhang [Fri, 4 Mar 2022 07:55:59 +0000 (15:55 +0800)]
MAINTAINERS: Update Jisheng's email address
I'm leaving synaptics. Update my email address to my korg mail
address and add entries to .mailmap as well to map my work
addresses to korg mail address.
ARM: include unprivileged BPF status in Spectre V2 reporting
The mitigations for Spectre-BHB are only applied when an exception
is taken, but when unprivileged BPF is enabled, userspace can
load BPF programs that can be used to exploit the problem.
When unprivileged BPF is enabled, report the vulnerable status via
the spectre_v2 sysfs file.
Robert Foss [Tue, 8 Mar 2022 09:49:10 +0000 (10:49 +0100)]
dt-bindings: drm/bridge: anx7625: Revert DPI support
Revert DPI support from binding.
DPI support relies on the bus-type enum which does not yet support
Mipi DPI, since no v4l2_fwnode_bus_type has been defined for this
bus type.
When DPI for anx7625 was initially added, it assumed that
V4L2_FWNODE_BUS_TYPE_PARALLEL was the correct bus type for
representing DPI, which it is not.
In order to prevent adding this mis-usage to the ABI, let's revert
the support.
[ 0.742963] aspeed-g6-pinctrl 1e6e2000.syscon:pinctrl: invalid function FWQSPID in map table

This is because the quad mode pins are a group of pins, not a function.
After applying this patch we can request the pins and the QSPI data
lines are muxed:
# cat /sys/kernel/debug/pinctrl/1e6e2000.syscon\:pinctrl-aspeed-g6-pinctrl/pinmux-pins |grep 1e620000.spi
pin 196 (AE12): device 1e620000.spi function FWSPID group FWQSPID
pin 197 (AF12): device 1e620000.spi function FWSPID group FWQSPID
pin 240 (Y1): device 1e620000.spi function FWSPID group FWQSPID
pin 241 (Y2): device 1e620000.spi function FWSPID group FWQSPID
pin 242 (Y3): device 1e620000.spi function FWSPID group FWQSPID
pin 243 (Y4): device 1e620000.spi function FWSPID group FWQSPID
Arnd Bergmann [Tue, 8 Mar 2022 12:43:41 +0000 (13:43 +0100)]
Merge tag 'tegra-for-5.17-arm-dt-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes
ARM: tegra: Device tree fixes for v5.17
One more patch to fix up eDP panels on Nyan FHD models.
* tag 'tegra-for-5.17-arm-dt-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
ARM: tegra: Move Nyan FHD panels to AUX bus
ARM: tegra: Move panels to AUX bus
Biju Das [Mon, 7 Mar 2022 18:48:43 +0000 (18:48 +0000)]
spi: Fix invalid sgs value
max_seg_size is unsigned int and it can have a value up to 2^32
(for eg:-RZ_DMAC driver sets dma_set_max_seg_size as U32_MAX)
When this value is used in min_t() as an integer type, it becomes
-1 and the value of sgs becomes 0.
Fix this issue by replacing the 'int' data type with 'unsigned int'
in min_t().
Catalin Marinas [Thu, 3 Mar 2022 18:00:44 +0000 (18:00 +0000)]
arm64: Ensure execute-only permissions are not allowed without EPAN
Commit 18107f8a2df6 ("arm64: Support execute-only permissions with
Enhanced PAN") re-introduced execute-only permissions when EPAN is
available. When EPAN is not available, arch_filter_pgprot() is supposed
to change a PAGE_EXECONLY permission into PAGE_READONLY_EXEC. However,
if BTI or MTE are present, such check does not detect the execute-only
pgprot in the presence of PTE_GP (BTI) or MT_NORMAL_TAGGED (MTE),
allowing the user to request PROT_EXEC with PROT_BTI or PROT_MTE.
Remove the arch_filter_pgprot() function, change the default VM_EXEC
permissions to PAGE_READONLY_EXEC and update the protection_map[] array
at core_initcall() if EPAN is detected.
Linus Torvalds [Tue, 8 Mar 2022 01:29:47 +0000 (17:29 -0800)]
Merge tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 spectre fixes from Borislav Petkov:
- Mitigate Spectre v2-type Branch History Buffer attacks on machines
which support eIBRS, i.e., the hardware-assisted speculation
restriction after it has been shown that such machines are vulnerable
even with the hardware mitigation.
- Do not use the default LFENCE-based Spectre v2 mitigation on AMD as
it is insufficient to mitigate such attacks. Instead, switch to
retpolines on all AMD by default.
- Update the docs and add some warnings for the obviously vulnerable
cmdline configurations.
* tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
x86/speculation: Warn about Spectre v2 LFENCE mitigation
x86/speculation: Update link to AMD speculation whitepaper
x86/speculation: Use generic retpoline by default on AMD
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Documentation/hw-vuln: Update spectre doc
x86/speculation: Add eIBRS + Retpoline options
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Arnd Bergmann [Mon, 7 Mar 2022 22:23:56 +0000 (23:23 +0100)]
Merge tag 'tegra-for-5.17-arm64-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes
arm64: tegra: Device tree fixes for v5.17
This contains a single, last-minute fix to disable the display SMMU by
default because under some circumstances leaving it enabled by default
can cause SMMU faults on boot.
* tag 'tegra-for-5.17-arm64-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: Disable ISO SMMU for Tegra194
Linus Torvalds [Mon, 7 Mar 2022 19:43:22 +0000 (11:43 -0800)]
Merge tag 'mtd/fixes-for-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD fix from Miquel Raynal:
"As part of a previous changeset introducing support for the K3
architecture, the OMAP_GPMC (a non visible symbol) got selected by the
selection of MTD_NAND_OMAP2 instead of doing so from the architecture
directly (like for the other users of these two drivers). Indeed, from
a hardware perspective, the OMAP NAND controller needs the GPMC to
work.
This led to a robot error which got addressed in fix merge into -rc4.
Unfortunately, the approach at this time still used "select" and lead
to further build error reports (sparc64:allmodconfig).
This time we switch to 'depends on' in order to prevent random
misconfigurations. The different dependencies will however need a
future cleanup"
* tag 'mtd/fixes-for-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: omap2: Actually prevent invalid configuration and build error
Linus Torvalds [Mon, 7 Mar 2022 19:32:17 +0000 (11:32 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Some last minute fixes that took a while to get ready. Not
regressions, but they look safe and seem to be worth to have"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: handle fallout from folio work
tools/virtio: fix virtio_test execution
vhost: remove avail_event arg from vhost_update_avail_event()
virtio: drop default for virtio-mem
vdpa: fix use-after-free on vp_vdpa_remove
virtio-blk: Remove BUG_ON() in virtio_queue_rq()
virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
vhost: fix hung thread due to erroneous iotlb entries
vduse: Fix returning wrong type in vduse_domain_alloc_iova()
vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
vdpa: factor out vdpa_set_features_unlocked for vdpa internal use
virtio_console: break out of buf poll on remove
virtio: document virtio_reset_device
virtio: acknowledge all features before access
virtio: unexport virtio_finalize_features
Halil Pasic [Sat, 5 Mar 2022 17:07:14 +0000 (18:07 +0100)]
swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer), he asked me to create an incremental fix
(after I have pointed this out the mix up, and asked him for guidance).
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
data from the swiotlb buffer.
Let me note, that if the size used with dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.
To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Guo Zhengkui [Mon, 7 Mar 2022 03:39:59 +0000 (11:39 +0800)]
perf tools: Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by array_size.cocci
Fix the following coccicheck warning:
tools/perf/util/trace-event-parse.c:209:35-36: WARNING: Use ARRAY_SIZE
ARRAY_SIZE(arr) is a macro provided in tools/include/linux/kernel.h,
which not only measures the size of the array, but also makes sure
that `arr` is really an array.
It has been tested with gcc (Debian 8.3.0-6) 8.3.0.
James Clark [Mon, 7 Mar 2022 17:19:17 +0000 (17:19 +0000)]
perf script: Output branch sample type
The type info is saved when using '-j save_type'. Output this in 'perf
script' so it can be accessed by other tools or for debugging.
It's appended to the end of the list of fields so any existing tools
that split on / and access fields via an index are not affected. Also
output '-' instead of 'N/A' when the branch type isn't saved because /
is used as a field separator.
James Clark [Mon, 7 Mar 2022 17:19:14 +0000 (17:19 +0000)]
perf evsel: Add error message for unsupported branch stack cases
EOPNOTSUPP is a possible return value when branch stacks are requested
but they aren't enabled in the kernel or hardware. It's also returned if
they aren't supported on the specific event type. The currently printed
error message about sampling/overflow-interrupts is not correct in this
case.
Add a check for branch stacks before sample_period is checked because
sample_period is also set (to the default value) when using branch
stacks.
Before this change (when branch stacks aren't supported):
perf record -j any
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
After this change:
perf record -j any
Error:
cycles: PMU Hardware or event type doesn't support branch stack sampling.
James Morse [Thu, 3 Mar 2022 16:53:56 +0000 (16:53 +0000)]
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
The mitigations for Spectre-BHB are only applied when an exception is
taken from user-space. The mitigation status is reported via the spectre_v2
sysfs vulnerabilities file.
When unprivileged eBPF is enabled the mitigation in the exception vectors
can be avoided by an eBPF program.
When unprivileged eBPF is enabled, print a warning and report vulnerable
via the sysfs vulnerabilities file.
Roger Quadros [Sat, 19 Feb 2022 19:36:00 +0000 (21:36 +0200)]
mtd: rawnand: omap2: Actually prevent invalid configuration and build error
The root of the problem is that we are selecting symbols that have
dependencies. This can cause random configurations that can fail.
The cleanest solution is to avoid using select.
This driver uses interfaces from the OMAP_GPMC driver so we have to
depend on it instead.
Miklos Szeredi [Mon, 7 Mar 2022 15:30:44 +0000 (16:30 +0100)]
fuse: fix pipe buffer lifetime for direct_io
In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.
On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.
Fix by copying pages coming from the user address space to new pipe
buffers.
Jouni Högander [Fri, 25 Feb 2022 07:02:28 +0000 (09:02 +0200)]
drm/i915/psr: Set "SF Partial Frame Enable" also on full update
Currently we are observing occasional screen flickering when
PSR2 selective fetch is enabled. More specifically glitch seems
to happen on full frame update when cursor moves to coords
x = -1 or y = -1.
According to Bspec SF Single full frame should not be set if
SF Partial Frame Enable is not set. This happened to be true for
ADLP as PSR2_MAN_TRK_CTL_ENABLE is always set and for ADL_P it's
actually "SF Partial Frame Enable" (Bit 31).
Setting "SF Partial Frame Enable" bit also on full update seems to
fix screen flickering.
Also make code more clear by setting PSR2_MAN_TRK_CTL_ENABLE
only if not on ADL_P. Bit 31 has different meaning in ADL_P.
Bspec: 49274
v2: Fix Mihai Harpau email address
v3: Modify commit message and remove unnecessary comment
Andy Shevchenko [Mon, 7 Mar 2022 11:56:23 +0000 (13:56 +0200)]
gpiolib: acpi: Convert ACPI value of debounce to microseconds
It appears that GPIO ACPI library uses ACPI debounce values directly.
However, the GPIO library APIs expect the debounce timeout to be in
microseconds.
Convert ACPI value of debounce to microseconds.
While at it, document this detail where it is appropriate.
Some GPIO lines have stopped working after the patch
commit 2ab73c6d8323f ("gpio: Support GPIO controllers without pin-ranges")
And this has supposedly been fixed in the following patches
commit 89ad556b7f96a ("gpio: Avoid using pin ranges with !PINCTRL")
commit 6dbbf84603961 ("gpiolib: Don't free if pin ranges are not defined")
But an erratic behavior where some GPIO lines work while others do not work
has been introduced.
This patch reverts those changes so that the sysfs-gpio interface works
properly again.
Fabio Estevam [Sat, 5 Mar 2022 20:47:20 +0000 (17:47 -0300)]
smsc95xx: Ignore -ENODEV errors when device is unplugged
According to Documentation/driver-api/usb/URB.rst when a device
is unplugged usb_submit_urb() returns -ENODEV.
This error code propagates all the way up to usbnet_read_cmd() and
usbnet_write_cmd() calls inside the smsc95xx.c driver during
Ethernet cable unplug, unbind or reboot.
This causes the following errors to be shown on reboot, for example:
ci_hdrc ci_hdrc.1: remove, state 1
usb usb2: USB disconnect, device number 1
usb 2-1: USB disconnect, device number 2
usb 2-1.1: USB disconnect, device number 3
smsc95xx 2-1.1:1.0 eth1: unregister 'smsc95xx' usb-ci_hdrc.1-1.1, smsc95xx USB 2.0 Ethernet
smsc95xx 2-1.1:1.0 eth1: Failed to read reg index 0x00000114: -19
smsc95xx 2-1.1:1.0 eth1: Error reading MII_ACCESS
smsc95xx 2-1.1:1.0 eth1: __smsc95xx_mdio_read: MII is busy
smsc95xx 2-1.1:1.0 eth1: Failed to read reg index 0x00000114: -19
smsc95xx 2-1.1:1.0 eth1: Error reading MII_ACCESS
smsc95xx 2-1.1:1.0 eth1: __smsc95xx_mdio_read: MII is busy
smsc95xx 2-1.1:1.0 eth1: hardware isn't capable of remote wakeup
usb 2-1.4: USB disconnect, device number 4
ci_hdrc ci_hdrc.1: USB bus 2 deregistered
ci_hdrc ci_hdrc.0: remove, state 4
usb usb1: USB disconnect, device number 1
ci_hdrc ci_hdrc.0: USB bus 1 deregistered
imx2-wdt 30280000.watchdog: Device shutdown: Expect reboot!
reboot: Restarting system
Ignore the -ENODEV errors inside __smsc95xx_mdio_read() and
__smsc95xx_phy_wait_not_busy() and do not print error messages
when -ENODEV is returned.
Fixes: a049a30fc27c ("net: usb: Correct PHY handling of smsc95xx") Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Tom Rix [Sat, 5 Mar 2022 15:06:42 +0000 (07:06 -0800)]
qed: return status of qed_iov_get_link
Clang static analysis reports this issue
qed_sriov.c:4727:19: warning: Assigned value is
garbage or undefined
ivi->max_tx_rate = tx_rate ? tx_rate : link.speed;
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
link is only sometimes set by the call to qed_iov_get_link()
qed_iov_get_link fails without setting link or returning
status. So change the decl to return status.
Fixes: 73390ac9d82b ("qed*: support ndo_get_vf_config") Signed-off-by: Tom Rix <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The esp tunnel GSO handlers use skb_mac_gso_segment to
push the inner packet to the segmentation handlers.
However, skb_mac_gso_segment takes the Ethernet Protocol
ID from 'skb->protocol' which is wrong for inter address
family tunnels. We fix this by introducing a new
skb_eth_gso_segment function.
This function can be used if it is necessary to pass the
Ethernet Protocol ID directly to the segmentation handler.
First users of this function will be the esp4 and esp6
tunnel segmentation handlers.
Fixes: c35fe4106b92 ("xfrm: Add mode handlers for IPsec on layer 2") Signed-off-by: Steffen Klassert <[email protected]>
esp: Fix BEET mode inter address family tunneling on GSO
The xfrm{4,6}_beet_gso_segment() functions did not correctly set the
SKB_GSO_IPXIP4 and SKB_GSO_IPXIP6 gso types for the address family
tunneling case. Fix this by setting these gso types.
esp: Fix possible buffer overflow in ESP transformation
The maximum message size that can be send is bigger than
the maximum site that skb_page_frag_refill can allocate.
So it is possible to write beyond the allocated buffer.
Fix this by doing a fallback to COW in that case.
v2:
Avoid get get_order() costs as suggested by Linus Torvalds.
Ulf Hansson [Fri, 4 Mar 2022 10:56:56 +0000 (11:56 +0100)]
mmc: core: Restore (almost) the busy polling for MMC_SEND_OP_COND
Commit 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1"),
significantly decreased the polling period from ~10-12ms into just a couple
of us. The purpose was to decrease the total time spent in the busy polling
loop, but unfortunate it has lead to problems, that causes eMMC cards to
never gets out busy and thus fails to be initialized.
To fix the problem, but also to try to keep some of the new improved
behaviour, let's start by using a polling period of 1-2ms, which then
increases for each loop, according to common polling loop in
__mmc_poll_for_busy().
Juergen Gross [Mon, 7 Mar 2022 08:48:55 +0000 (09:48 +0100)]
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
When calling gnttab_end_foreign_access_ref() the returned value must
be tested and the reaction to that value should be appropriate.
In case of failure in xennet_get_responses() the reaction should not be
to crash the system, but to disable the network device.
The calls in setup_netfront() can be replaced by calls of
gnttab_end_foreign_access(). While at it avoid double free of ring
pages and grant references via xennet_disconnect_backend() in this case.
Juergen Gross [Mon, 7 Mar 2022 08:48:55 +0000 (09:48 +0100)]
xen/gnttab: fix gnttab_end_foreign_access() without page specified
gnttab_end_foreign_access() is used to free a grant reference and
optionally to free the associated page. In case the grant is still in
use by the other side processing is being deferred. This leads to a
problem in case no page to be freed is specified by the caller: the
caller doesn't know that the page is still mapped by the other side
and thus should not be used for other purposes.
The correct way to handle this situation is to take an additional
reference to the granted page in case handling is being deferred and
to drop that reference when the grant reference could be freed
finally.
This requires that there are no users of gnttab_end_foreign_access()
left directly repurposing the granted page after the call, as this
might result in clobbered data or information leaks via the not yet
freed grant reference.
This is part of CVE-2022-23041 / XSA-396.
Reported-by: Simon Gaiser <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]>
---
V4:
- expand comment in header
V5:
- get page ref in case of kmalloc() failure, too
Juergen Gross [Mon, 7 Mar 2022 08:48:55 +0000 (09:48 +0100)]
xen/pvcalls: use alloc/free_pages_exact()
Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.
Juergen Gross [Mon, 7 Mar 2022 08:48:55 +0000 (09:48 +0100)]
xen/9p: use alloc/free_pages_exact()
Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.
By using the local variable "order" instead of ring->intf->ring_order
in the error path of xen_9pfs_front_alloc_dataring() another bug is
fixed, as the error path can be entered before ring->intf->ring_order
is being set.
By using alloc_pages_exact() the size in bytes is specified for the
allocation, which fixes another bug for the case of
order < (PAGE_SHIFT - XEN_PAGE_SHIFT).
Juergen Gross [Mon, 7 Mar 2022 08:48:55 +0000 (09:48 +0100)]
xen/usb: don't use gnttab_end_foreign_access() in xenhcd_gnttab_done()
The usage of gnttab_end_foreign_access() in xenhcd_gnttab_done() is
not safe against a malicious backend, as the backend could keep the
I/O page mapped and modify it even after the granted memory page is
being used for completely other purposes in the local system.
So replace that use case with gnttab_try_end_foreign_access() and
disable the PV host adapter in case the backend didn't stop using the
granted page.
In xenhcd_urb_request_done() immediately return in case of setting
the device state to "error" instead of looking into further backend
responses.
Juergen Gross [Mon, 7 Mar 2022 08:48:54 +0000 (09:48 +0100)]
xen: remove gnttab_query_foreign_access()
Remove gnttab_query_foreign_access(), as it is unused and unsafe to
use.
All previous use cases assumed a grant would not be in use after
gnttab_query_foreign_access() returned 0. This information is useless
in best case, as it only refers to a situation in the past, which could
have changed already.
Juergen Gross [Mon, 7 Mar 2022 08:48:54 +0000 (09:48 +0100)]
xen/gntalloc: don't use gnttab_query_foreign_access()
Using gnttab_query_foreign_access() is unsafe, as it is racy by design.
The use case in the gntalloc driver is not needed at all. While at it
replace the call of gnttab_end_foreign_access_ref() with a call of
gnttab_end_foreign_access(), which is what is really wanted there. In
case the grant wasn't used due to an allocation failure, just free the
grant via gnttab_free_grant_reference().
Juergen Gross [Mon, 7 Mar 2022 08:48:54 +0000 (09:48 +0100)]
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.
In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_try_end_foreign_access() and check the
success of that operation instead.
Juergen Gross [Mon, 7 Mar 2022 08:48:54 +0000 (09:48 +0100)]
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.
In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.
This is CVE-2022-23037 / part of XSA-396.
Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]>
---
V2:
- use gnttab_try_end_foreign_access()
V3:
- don't use gnttab_try_end_foreign_access()
Juergen Gross [Mon, 7 Mar 2022 08:48:54 +0000 (09:48 +0100)]
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.
In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.
For the ring allocation use alloc_pages_exact() in order to avoid
high order pages in case of a multi-page ring.
If a grant wasn't unmapped by the backend without persistent grants
being used, set the device state to "error".
This is CVE-2022-23036 / part of XSA-396.
Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Roger Pau Monné <[email protected]>
---
V2:
- use gnttab_try_end_foreign_access()
V4:
- use alloc_pages_exact() and free_pages_exact()
- set state to error if backend didn't unmap (Roger Pau Monné)
Add a new grant table function gnttab_try_end_foreign_access(), which
will remove and free a grant if it is not in use.
Its main use case is to either free a grant if it is no longer in use,
or to take some other action if it is still in use. This other action
can be an error exit, or (e.g. in the case of blkfront persistent grant
feature) some special handling.
This is CVE-2022-23036, CVE-2022-23038 / part of XSA-396.
Reported-by: Demi Marie Obenour <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]>
---
V2:
- new patch
V4:
- add comments to header (Jan Beulich)
Juergen Gross [Mon, 7 Mar 2022 08:48:54 +0000 (09:48 +0100)]
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Letting xenbus_grant_ring() tear down grants in the error case is
problematic, as the other side could already have used these grants.
Calling gnttab_end_foreign_access_ref() without checking success is
resulting in an unclear situation for any caller of xenbus_grant_ring()
as in the error case the memory pages of the ring page might be
partially mapped. Freeing them would risk unwanted foreign access to
them, while not freeing them would leak memory.
In order to remove the need to undo any gnttab_grant_foreign_access()
calls, use gnttab_alloc_grant_references() to make sure no further
error can occur in the loop granting access to the ring pages.
It should be noted that this way of handling removes leaking of
grant entries in the error case, too.
Michael Ellerman [Fri, 11 Feb 2022 06:32:37 +0000 (17:32 +1100)]
powerpc: Fix STACKTRACE=n build
Our skiroot_defconfig doesn't enable FTRACE, and so doesn't get
STACKTRACE enabled either. That leads to a build failure since commit 1614b2b11fab ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
made stacktrace.c build even when STACKTRACE=n.
arch/powerpc/kernel/stacktrace.c: In function ‘handle_backtrace_ipi’:
arch/powerpc/kernel/stacktrace.c:171:2: error: implicit declaration of function ‘nmi_cpu_backtrace’
171 | nmi_cpu_backtrace(regs);
| ^~~~~~~~~~~~~~~~~
arch/powerpc/kernel/stacktrace.c: In function ‘arch_trigger_cpumask_backtrace’:
arch/powerpc/kernel/stacktrace.c:226:2: error: implicit declaration of function ‘nmi_trigger_cpumask_backtrace’
226 | nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This happens because our headers haven't defined
arch_trigger_cpumask_backtrace, which causes lib/nmi_backtrace.c not to
build nmi_cpu_backtrace().
The code in question doesn't actually depend on STACKTRACE=y, that was
just added because arch_trigger_cpumask_backtrace() lived in
stacktrace.c for convenience. So drop the dependency on
CONFIG_STACKTRACE, that causes lib/nmi_backtrace.c to build
nmi_cpu_backtrace() etc. and fixes the build.
Linus Torvalds [Sun, 6 Mar 2022 20:19:36 +0000 (12:19 -0800)]
Merge tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes for various problems that have user visible effects
or seem to be urgent:
- fix corruption when combining DIO and non-blocking io_uring over
multiple extents (seen on MariaDB)
- fix relocation crash due to premature return from commit
- fix quota deadlock between rescan and qgroup removal
- fix item data bounds checks in tree-checker (found on a fuzzed
image)
- fix fsync of prealloc extents after EOF
- add missing run of delayed items after unlink during log replay
- don't start relocation until snapshot drop is finished
- fix reversed condition for subpage writers locking
- fix warning on page error"
* tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fallback to blocking mode when doing async dio over multiple extents
btrfs: add missing run of delayed items after unlink during log replay
btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
btrfs: do not start relocation until in progress drops are done
btrfs: tree-checker: use u64 for item data end to avoid overflow
btrfs: do not WARN_ON() if we have PageError set
btrfs: fix lost prealloc extents beyond eof after full fsync
btrfs: subpage: fix a wrong check on subpage->writers
Linus Torvalds [Sun, 6 Mar 2022 20:08:42 +0000 (12:08 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86 guest:
- Tweaks to the paravirtualization code, to avoid using them when
they're pointless or harmful
x86 host:
- Fix for SRCU lockdep splat
- Brown paper bag fix for the propagation of errno"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run
KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
KVM: x86: Yield to IPI target vCPU only if it is busy
x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64
x86/kvm: Don't waste memory if kvmclock is disabled
x86/kvm: Don't use PV TLB/yield when mwait is advertised
Linus Torvalds [Sun, 6 Mar 2022 19:57:42 +0000 (11:57 -0800)]
Merge tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set.
Thanks to Murilo Opsfelder Araujo, and Erhard F"
* tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set
Linus Torvalds [Sun, 6 Mar 2022 19:47:59 +0000 (11:47 -0800)]
Merge tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix sorting on old "cpu" value in histograms
- Fix return value of __setup() boot parameter handlers
* tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix return value of __setup handlers
tracing/histogram: Fix sorting on old "cpu" value
vhost: remove avail_event arg from vhost_update_avail_event()
In vhost_update_avail_event() we never used the `avail_event` argument,
since its introduction in commit 2723feaa8ec6 ("vhost: set log when
updating used flags or avail event").
Zhang Min [Tue, 1 Mar 2022 09:10:59 +0000 (17:10 +0800)]
vdpa: fix use-after-free on vp_vdpa_remove
When vp_vdpa driver is unbind, vp_vdpa is freed in vdpa_unregister_device
and then vp_vdpa->mdev.pci_dev is dereferenced in vp_modern_remove,
triggering use-after-free.
Xie Yongji [Fri, 4 Mar 2022 10:00:58 +0000 (18:00 +0800)]
virtio-blk: Remove BUG_ON() in virtio_queue_rq()
Currently we have a BUG_ON() to make sure the number of sg
list does not exceed queue_max_segments() in virtio_queue_rq().
However, the block layer uses queue_max_discard_segments()
instead of queue_max_segments() to limit the sg list for
discard requests. So the BUG_ON() might be triggered if
virtio-blk device reports a larger value for max discard
segment than queue_max_segments(). To fix it, let's simply
remove the BUG_ON() which has become unnecessary after commit 02746e26c39e("virtio-blk: avoid preallocating big SGL for data").
And the unused vblk->sg_elems can also be removed together.
Xie Yongji [Fri, 4 Mar 2022 10:00:57 +0000 (18:00 +0800)]
virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
Currently the value of max_discard_segment will be set to
MAX_DISCARD_SEGMENTS (256) with no basis in hardware if device
set 0 to max_discard_seg in configuration space. It's incorrect
since the device might not be able to handle such large descriptors.
To fix it, let's follow max_segments restrictions in this case.
vhost: fix hung thread due to erroneous iotlb entries
In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when
start is 0 and last is ULONG_MAX. One instance where it can happen
is when userspace sends an IOTLB message with iova=size=uaddr=0
(vhost_process_iotlb_msg). So, an entry with size = 0, start = 0,
last = ULONG_MAX ends up in the iotlb. Next time a packet is sent,
iotlb_access_ok() loops indefinitely due to that erroneous entry.
Reported by syzbot at:
https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
To fix this, do two things:
1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map
a range with size 0.
2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX]
by splitting it into two entries.
Vladimir Oltean [Thu, 3 Mar 2022 14:08:40 +0000 (16:08 +0200)]
net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails
After the blamed commit, dsa_tree_setup_master() may exit without
calling rtnl_unlock(), fix that.
Fixes: c146f9bc195a ("net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Kai Lueke [Thu, 3 Mar 2022 14:55:10 +0000 (15:55 +0100)]
Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"
This reverts commit 68ac0f3810e76a853b5f7b90601a05c3048b8b54 because ID
0 was meant to be used for configuring the policy/state without
matching for a specific interface (e.g., Cilium is affected, see
https://github.com/cilium/cilium/pull/18789 and
https://github.com/cilium/cilium/pull/19019).
Linus Torvalds [Sat, 5 Mar 2022 23:49:45 +0000 (15:49 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a fixup for Goodix touchscreen driver allowing it to work on certain
Cherry Trail devices
- a fix for imbalanced enable/disable regulator in Elam touchpad driver
that became apparent when used with Asus TF103C 2-in-1 dock
- a couple new input keycodes used on newer keyboards
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
HID: add mapping for KEY_ALL_APPLICATIONS
HID: add mapping for KEY_DICTATE
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource
Input: goodix - use the new soc_intel_is_byt() helper
Input: samsung-keypad - properly state IOMEM dependency
Linus Torvalds [Sat, 5 Mar 2022 20:03:14 +0000 (12:03 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"8 patches.
Subsystems affected by this patch series: mm (hugetlb, pagemap, and
userfaultfd), memfd, selftests, and kconfig"
* emailed patches from Andrew Morton <[email protected]>:
configs/debug: set CONFIG_DEBUG_INFO=y properly
proc: fix documentation and description of pagemap
kselftest/vm: fix tests build with old libc
memfd: fix F_SEAL_WRITE after shmem huge page allocated
mm: fix use-after-free when anon vma name is used after vma is freed
mm: prevent vm_area_struct::anon_name refcount saturation
mm: refactor vm_area_struct::anon_vma_name usage code
selftests/vm: cleanup hugetlb file after mremap test
Linus Torvalds [Sat, 5 Mar 2022 19:25:26 +0000 (11:25 -0800)]
Merge tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix HAVE_DYNAMIC_FTRACE_WITH_ARGS implementation by providing correct
switching between ftrace_caller/ftrace_regs_caller and supplying
pt_regs only when ftrace_regs_caller is activated.
- Fix exception table sorting.
- Fix breakage of kdump tooling by preserving metadata it cannot
function without.
* tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/extable: fix exception table sorting
s390/ftrace: fix arch_ftrace_get_regs implementation
s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
Yun Zhou [Sat, 5 Mar 2022 04:29:07 +0000 (20:29 -0800)]
proc: fix documentation and description of pagemap
Since bit 57 was exported for uffd-wp write-protected (commit fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"),
fixing it can reduce some unnecessary confusion.
Chengming Zhou [Sat, 5 Mar 2022 04:29:04 +0000 (20:29 -0800)]
kselftest/vm: fix tests build with old libc
The error message when I build vm tests on debian10 (GLIBC 2.28):
userfaultfd.c: In function `userfaultfd_pagemap_test':
userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use
in this function); did you mean `MADV_RANDOM'?
if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
^~~~~~~~~~~~
MADV_RANDOM
This patch includes these newer definitions from UAPI linux/mman.h, is
useful to fix tests build on systems without these definitions in glibc
sys/mman.h.
the docker program tries to add F_SEAL_WRITE through the following
command, but it fails unexpectedly with errno EBUSY:
fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.
That is because memfd_tag_pins() and memfd_wait_for_pins() were never
updated for shmem huge pages: checking page_mapcount() against
page_count() is hopeless on THP subpages - they need to check
total_mapcount() against page_count() on THP heads only.
Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
(compared != 1): either can be justified, but given the non-atomic
total_mapcount() calculation, it is better now to be strict. Bear in
mind that total_mapcount() itself scans all of the THP subpages, when
choosing to take an XA_CHECK_SCHED latency break.
Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
page has been swapped out since memfd_tag_pins(), then its refcount must
have fallen, and so it can safely be untagged.
mm: fix use-after-free when anon vma name is used after vma is freed
When adjacent vmas are being merged it can result in the vma that was
originally passed to madvise_update_vma being destroyed. In the current
implementation, the name parameter passed to madvise_update_vma points
directly to vma->anon_name and it is used after the call to vma_merge.
In the cases when vma_merge merges the original vma and destroys it,
this might result in UAF. For that the original vma would have to hold
the anon_vma_name with the last reference. The following vma would need
to contain a different anon_vma_name object with the same string. Such
scenario is shown below:
madvise_vma_behavior(vma)
madvise_update_vma(vma, ..., anon_name == vma->anon_name)
vma_merge(vma)
__vma_adjust(vma) <-- merges vma with adjacent one
vm_area_free(vma) <-- frees the original vma
replace_vma_anon_name(anon_name) <-- UAF of vma->anon_name
Fix this by raising the name refcount and stabilizing it.
A deep process chain with many vmas could grow really high. With
default sysctl_max_map_count (64k) and default pid_max (32k) the max
number of vmas in the system is 2147450880 and the refcounter has
headroom of 1073774592 before it reaches REFCOUNT_SATURATED
(3221225472).
Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults. Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647). In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (that still require heavy sharing of that anon_vma_name between
processes).
kref refcounting interface used in anon_vma_name structure will detect a
counter overflow when it reaches REFCOUNT_SATURATED value but will only
generate a warning and freeze the ref counter. This would lead to the
refcounted object never being freed. A determined attacker could leak
memory like that but it would be rather expensive and inefficient way to
do so.
To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED. This should provide enough headroom for raising the
refcounts temporarily.