Merge tag 'f2fs-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"This includes major bug fixes introduced in 5.18-rc1 and 5.17+:
- Remove obsolete whint_mode (5.18-rc1)
- Fix IO split issue caused by op_flags change in f2fs (5.18-rc1)
- Fix a wrong condition check to detect IO failure loop (5.18-rc1)
- Fix wrong data truncation during roll-forward (5.17+)"
* tag 'f2fs-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: should not truncate blocks during roll-forward recovery
f2fs: fix wrong condition check when failing metapage read
f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders
f2fs: remove obsolete whint_mode
no-MMU: expose vmalloc_huge() for alloc_large_system_hash()
It turns out that for the CONFIG_MMU=n builds, vmalloc_huge() was never
defined, since it's defined in mm/vmalloc.c, which doesn't get built for
the no-MMU configurations.
Just implement the trivial wrapper for the no-MMU case too. In fact,
just make it an alias to the existing __vmalloc() function that has the
same signature.
Fixes: 5b0b9e4c2c89 ("tcp: md5: incorrect tcp_header_len for incoming connections") Signed-off-by: Eric Dumazet <[email protected]> Cc: Francesco Ruggeri <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet [Mon, 25 Apr 2022 00:34:07 +0000 (17:34 -0700)]
tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
I had this bug sitting for too long in my pile, it is time to fix it.
Thanks to Doug Porter for reminding me of it!
We had various attempts in the past, including commit 0cbe6a8f089e ("tcp: remove SOCK_QUEUE_SHRUNK"),
but the issue is that TCP stack currently only generates
EPOLLOUT from input path, when tp->snd_una has advanced
and skb(s) cleaned from rtx queue.
If a flow has a big RTT, and/or receives SACKs, it is possible
that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0
and no more data can be sent until tp->snd_una finally advances.
What is needed is to also check if POLLOUT needs to be generated
whenever tp->snd_nxt is advanced, from output path.
This bug triggers more often after an idle period, as
we do not receive ACK for at least one RTT. tcp_notsent_lowat
could be a fraction of what CWND and pacing rate would allow to
send during this RTT.
In a followup patch, I will remove the bogus call
to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED)
from tcp_check_space(). Fact that we have decided to generate
an EPOLLOUT does not mean the application has immediately
refilled the transmit queue. This optimistic call
might have been the reason the bug seemed not too serious.
Tested:
200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2]
$ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat
$ cat bench_rr.sh
SUM=0
for i in {1..10}
do
V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"`
echo $V
SUM=$(($SUM + $V))
done
echo SUM=$SUM
Vladimir Oltean [Thu, 21 Apr 2022 23:01:05 +0000 (02:01 +0300)]
net: mscc: ocelot: don't add VID 0 to ocelot->vlans when leaving VLAN-aware bridge
DSA, through dsa_port_bridge_leave(), first notifies the port of the
fact that it left a bridge, then, if that bridge was VLAN-aware, it
notifies the port of the change in VLAN awareness state, towards
VLAN-unaware mode.
So ocelot_port_vlan_filtering() can be called when ocelot_port->bridge
is NULL, and this makes ocelot_add_vlan_unaware_pvid() create a struct
ocelot_bridge_vlan with a vid of 0 and an "untagged" setting of true on
that port.
In a way this structure correctly reflects the reality, but by design,
VID 0 (OCELOT_STANDALONE_PVID) was not meant to be kept in the bridge
VLAN list of the driver, but managed separately.
Having OCELOT_STANDALONE_PVID in ocelot->vlans makes us trip up on
several sanity checks that did not expect to have this VID there.
For example, after we leave a VLAN-aware bridge and we re-join it, we
can no longer program egress-tagged VLANs to hardware:
# ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
# ip link set swp0 master br0
# ip link set swp0 nomaster
# ip link set swp0 master br0
# bridge vlan add dev swp0 vid 100
Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs.
But this configuration is in fact supported by the hardware, since we
could use OCELOT_PORT_TAG_NATIVE. According to its comment:
/* all VLANs except the native VLAN and VID 0 are egress-tagged */
yet when assessing the eligibility for this mode, we do not check for
VID 0 in ocelot_port_uses_native_vlan(), instead we just ensure that
ocelot_port_num_untagged_vlans() == 1. This is simply because VID 0
doesn't have a bridge VLAN structure.
The way I identify the problem is that ocelot_port_vlan_filtering(false)
only means to call ocelot_add_vlan_unaware_pvid() when we dynamically
turn off VLAN awareness for a bridge we are under, and the PVID changes
from the bridge PVID to a reserved PVID based on the bridge number.
Since OCELOT_STANDALONE_PVID is statically added to the VLAN table
during ocelot_vlan_init() and never removed afterwards, calling
ocelot_add_vlan_unaware_pvid() for it is not intended and does not serve
any purpose.
Fix the issue by avoiding the call to ocelot_add_vlan_unaware_pvid(vid=0)
when we're resetting VLAN awareness after leaving the bridge, to become
a standalone port.
Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Thu, 21 Apr 2022 23:01:04 +0000 (02:01 +0300)]
net: mscc: ocelot: ignore VID 0 added by 8021q module
Both the felix DSA driver and ocelot switchdev driver declare
dev->features & NETIF_F_HW_VLAN_CTAG_FILTER under certain circumstances*,
so the 8021q module will add VID 0 to our RX filter when the port goes
up, to ensure 802.1p traffic is not dropped.
We treat VID 0 as a special value (OCELOT_STANDALONE_PVID) which
deliberately does not have a struct ocelot_bridge_vlan associated with
it. Instead, this gets programmed to the VLAN table in ocelot_vlan_init().
If we allow external calls to modify VID 0, we reach the following
situation:
# ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
# ip link set swp0 master br0
# ip link set swp0 up # this adds VID 0 to ocelot->vlans with untagged=false
bridge vlan
port vlan-id
swp0 1 PVID Egress Untagged # the bridge also adds VID 1
br0 1 PVID Egress Untagged
# bridge vlan add dev swp0 vid 100 untagged
Error: mscc_ocelot_switch_lib: Port with egress-tagged VLANs cannot have more than one egress-untagged (native) VLAN.
This configuration should have been accepted, because
ocelot_port_manage_port_tag() should select OCELOT_PORT_TAG_NATIVE.
Yet it isn't, because we have an entry in ocelot->vlans which says
VID 0 should be egress-tagged, something the hardware can't do.
Fix this by suppressing additions/deletions on VID 0 and managing this
VLAN exclusively using OCELOT_STANDALONE_PVID.
*DSA toggles it when the port becomes VLAN-aware by joining a VLAN-aware
bridge. Ocelot declares it unconditionally for some reason.
Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Thu, 21 Apr 2022 22:42:22 +0000 (01:42 +0300)]
net: dsa: flood multicast to CPU when slave has IFF_PROMISC
Certain DSA switches can eliminate flooding to the CPU when none of the
ports have the IFF_ALLMULTI or IFF_PROMISC flags set. This is done by
synthesizing a call to dsa_port_bridge_flags() for the CPU port, a call
which normally comes from the bridge driver via switchdev.
The bridge port flags and IFF_PROMISC|IFF_ALLMULTI have slightly
different semantics, and due to inattention/lack of proper testing, the
IFF_PROMISC flag allows unknown unicast to be flooded to the CPU, but
not unknown multicast.
This must be fixed by setting both BR_FLOOD (unicast) and BR_MCAST_FLOOD
in the synthesized dsa_port_bridge_flags() call, since IFF_PROMISC means
that packets should not be filtered regardless of their MAC DA.
Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Peilin Ye [Thu, 21 Apr 2022 22:09:02 +0000 (15:09 -0700)]
ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode
As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in
collect_md mode is racy for [IP6]GRE[TAP] devices. Consider the
following sequence of events:
1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link
add ... external". "ip" ignores "[o]seq" if "external" is specified,
so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e.
it uses lockless TX);
2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g.
bpf_skb_set_tunnel_key() in an eBPF program attached to this device;
3. gre_fb_xmit() or __gre6_xmit() processes these skb's:
Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may
try to do this tunnel->o_seqno++ in parallel, which is racy. Fix it by
making o_seqno atomic_t.
As mentioned by Eric Dumazet in commit b790e01aee74 ("ip_gre: lockless
xmit"), making o_seqno atomic_t increases "chance for packets being out
of order at receiver" when NETIF_F_LLTX is on.
Maybe a better fix would be:
1. Do not ignore "oseq" in external mode. Users MUST specify "oseq" if
they want the kernel to allow sequencing of outgoing packets;
2. Reject all outgoing TUNNEL_SEQ packets if the device was not created
with "oseq".
Unfortunately, that would break userspace.
We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us
do it in separate patches to keep this fix minimal.
Peilin Ye [Thu, 21 Apr 2022 22:08:38 +0000 (15:08 -0700)]
ip6_gre: Make o_seqno start from 0 in native mode
For IP6GRE and IP6GRETAP devices, currently o_seqno starts from 1 in
native mode. According to RFC 2890 2.2., "The first datagram is sent
with a sequence number of 0." Fix it.
It is worth mentioning that o_seqno already starts from 0 in collect_md
mode, see the "if (tunnel->parms.collect_md)" clause in __gre6_xmit(),
where tunnel->o_seqno is passed to gre_build_header() before getting
incremented.
Peilin Ye [Thu, 21 Apr 2022 22:07:57 +0000 (15:07 -0700)]
ip_gre: Make o_seqno start from 0 in native mode
For GRE and GRETAP devices, currently o_seqno starts from 1 in native
mode. According to RFC 2890 2.2., "The first datagram is sent with a
sequence number of 0." Fix it.
It is worth mentioning that o_seqno already starts from 0 in collect_md
mode, see gre_fb_xmit(), where tunnel->o_seqno is passed to
gre_build_header() before getting incremented.
Dan Carpenter [Thu, 21 Apr 2022 15:46:13 +0000 (18:46 +0300)]
net: lan966x: fix a couple off by one bugs
The lan966x->ports[] array has lan966x->num_phys_ports elements. These
are assigned in lan966x_probe(). That means the > comparison should be
changed to >=.
The first off by one check is harmless but the second one could lead to
an out of bounds access and a crash.
Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net/smc: sync err code when tcp connection was refused
In the current implementation, when TCP initiates a connection
to an unavailable [ip,port], ECONNREFUSED will be stored in the
TCP socket, but SMC will not. However, some apps (like curl) use
getsockopt(,,SO_ERROR,,) to get the error information, which makes
them miss the error message and behave strangely.
net: hns3: add return value for mailbox handling in PF
Currently, there are some querying mailboxes sent from VF to PF,
and VF will wait the PF's handling result. For mailbox
HCLGE_MBX_GET_QID_IN_PF and HCLGE_MBX_GET_RSS_KEY, it may fail
when the input parameter is invalid, but the prototype of their
handler function is void. In this case, PF always return success
to VF, which may cause the VF get incorrect result.
Fixes it by adding return value for these function.
Fixes: 63b1279d9905 ("net: hns3: check queue id range before using") Fixes: 532cfc0df1e4 ("net: hns3: add a check for index in hclge_get_rss_key()") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jie Wang [Sun, 24 Apr 2022 12:57:23 +0000 (20:57 +0800)]
net: hns3: modify the return code of hclge_get_ring_chain_from_mbx
Currently, function hclge_get_ring_chain_from_mbx will return -ENOMEM if
ring_num is bigger than HCLGE_MBX_MAX_RING_CHAIN_PARAM_NUM. It is better to
return -EINVAL for the invalid parameter case.
So this patch fixes it by return -EINVAL in this abnormal branch.
Fixes: 5d02a58dae60 ("net: hns3: fix for buffer overflow smatch warning") Signed-off-by: Jie Wang <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Peng Li [Sun, 24 Apr 2022 12:57:22 +0000 (20:57 +0800)]
net: hns3: fix error log of tx/rx tqps stats
The comments in function hclge_comm_tqps_update_stats is not right,
so fix it.
Fixes: 287db5c40d15 ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse") Signed-off-by: Peng Li <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
For debugfs node rx/tx_queue_info and rx/tx_bd_info, their output info is
aligned to the right, it's not aligned with output of other debugfs node,
so uniform their output info.
Fixes: 907676b13071 ("net: hns3: use tx bounce buffer for small packets") Fixes: e44c495d95e0 ("net: hns3: refactor queue info of debugfs") Fixes: 77e9184869c9 ("net: hns3: refactor dump bd info of debugfs") Signed-off-by: Hao Chen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: hns3: clear inited state and stop client after failed to register netdev
If failed to register netdev, it needs to clear INITED state and stop
client in case of cause problem when concurrency with uninitialized
process of driver.
Fixes: a289a7e5c1d4 ("net: hns3: put off calling register_netdev() until client initialize complete") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Martin Willi [Tue, 19 Apr 2022 13:47:00 +0000 (15:47 +0200)]
netfilter: Update ip6_route_me_harder to consider L3 domain
The commit referenced below fixed packet re-routing if Netfilter mangles
a routing key property of a packet and the packet is routed in a VRF L3
domain. The fix, however, addressed IPv4 re-routing, only.
This commit applies the same behavior for IPv6. While at it, untangle
the nested ternary operator to make the code more readable.
if sun50i_cpufreq_get_efuse failed, then opp_tables leak.
Fixes: f328584f7bff ("cpufreq: Add sun50i nvmem based CPU scaling driver") Signed-off-by: Xiaobing Luo <[email protected]> Reviewed-by: Samuel Holland <[email protected]> Signed-off-by: Viresh Kumar <[email protected]>
drm/i915: Check EDID for HDR static metadata when choosing blc
We have now seen panel (XMG Core 15 e21 laptop) advertizing support
for Intel proprietary eDP backlight control via DPCD registers, but
actually working only with legacy pwm control.
This patch adds panel EDID check for possible HDR static metadata and
Intel proprietary eDP backlight control is used only if that exists.
Missing HDR static metadata is ignored if user specifically asks for
Intel proprietary eDP backlight control via enable_dpcd_backlight
parameter.
v2 :
- Ignore missing HDR static metadata if Intel proprietary eDP
backlight control is forced via i915.enable_dpcd_backlight
- Printout info message if panel is missing HDR static metadata and
support for Intel proprietary eDP backlight control is detected
Hans de Goede [Mon, 18 Apr 2022 15:09:36 +0000 (17:09 +0200)]
drm/i915: Fix DISP_POS_Y and DISP_HEIGHT defines
Commit 428cb15d5b00 ("drm/i915: Clean up pre-skl primary plane registers")
introduced DISP_POS_Y and DISP_HEIGHT defines but accidentally set these
their masks to REG_GENMASK(31, 0) instead of REG_GENMASK(31, 16).
This breaks the primary display pane on at least pineview machines, fix
the mask to fix the primary display pane only showing black.
Tested on an Acer One AO532h with an Intel N450 SoC.
Merge tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Add Sapphire Rapids CPU support
- Fix a perf vmalloc-ed buffer mapping error (PERF_USE_VMALLOC in use)
* tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support
perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
Merge tag 'edac_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
- Read the reported error count from the proper register on
synopsys_edac
* tag 'edac_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/synopsys: Read the error count from the correct register
kvmalloc: use vmalloc_huge for vmalloc allocations
Since commit 559089e0a93d ("vmalloc: replace VM_NO_HUGE_VMAP with
VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc is an
opt-in strategy, because it caused a number of problems that weren't
noticed until x86 enabled it too.
One of the issues was fixed by Nick Piggin in commit 3b8000ae185c
("mm/vmalloc: huge vmalloc backing pages should be split rather than
compound"), but I'm still worried about page protection issues, and
VM_FLUSH_RESET_PERMS in particular.
However, like the hash table allocation case (commit f2edd118d02d:
"page_alloc: use vmalloc_huge for large system hash"), the use of
kvmalloc() should be safe from any such games, since the returned
pointer might be a SLUB allocation, and as such no user should
reasonably be using it in any odd ways.
We also know that the allocations are fairly large, since it falls back
to the vmalloc case only when a kmalloc() fails. So using a hugepage
mapping seems both safe and relevant.
This patch does show a weakness in the opt-in strategy: since the opt-in
flag is in the 'vm_flags', not the usual gfp_t allocation flags, very
few of the usual interfaces actually expose it.
That's not much of an issue in this case that already used one of the
fairly specialized low-level vmalloc interfaces for the allocation, but
for a lot of other vmalloc() users that might want to opt in, it's going
to be very inconvenient.
We'll either have to fix any compatibility problems, or expose it in the
gfp flags (__GFP_COMP would have made a lot of sense) to allow normal
vmalloc() users to use hugepage mappings. That said, the cases that
really matter were probably already taken care of by the hash tabel
allocation.
Merge tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
- cap maximum sector size reported to avoid mount problems
- reference count fix
- fix filename rename race
* tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION
ksmbd: increment reference count of parent fp
ksmbd: remove filename in ksmbd_file
Merge tag 'arc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Assorted fixes
* tag 'arc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: remove redundant READ_ONCE() in cmpxchg loop
ARC: atomic: cleanup atomic-llsc definitions
arc: drop definitions of pgd_index() and pgd_offset{, _k}() entirely
ARC: dts: align SPI NOR node name with dtschema
ARC: Remove a redundant memset()
ARC: fix typos in comments
ARC: entry: fix syscall_trace_exit argument
Xin Long [Wed, 20 Apr 2022 20:52:41 +0000 (16:52 -0400)]
sctp: check asoc strreset_chunk in sctp_generate_reconf_event
A null pointer reference issue can be triggered when the response of a
stream reconf request arrives after the timer is triggered, such as:
send Incoming SSN Reset Request --->
CPU0:
reconf timer is triggered,
go to the handler code before hold sk lock
<--- reply with Outgoing SSN Reset Request
CPU1:
process Outgoing SSN Reset Request,
and set asoc->strreset_chunk to NULL
CPU0:
continue the handler code, hold sk lock,
and try to hold asoc->strreset_chunk, crash!
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One fix for an information leak caused by copying a buffer to
userspace without checking for error first in the sr driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sr: Do not leak information in ioctl
Merge tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"A simple cleanup patch and a refcount fix for Xen on Arm"
* tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
arm/xen: Fix some refcount leaks
xen: Convert kmap() to kmap_local_page()
Merge tag 'drm-fixes-2022-04-23' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"Maarten was away, so Maxine stepped up and sent me the drm-fixes
merge, so no point leaving it for another week.
The big change is an OF revert around bridge/panels, it may have some
driver fallout, but hopefully this revert gets them shook out in the
next week easier.
Otherwise it's a bunch of locking/refcounts across drivers, a radeon
dma_resv logic fix and some raspberry pi panel fixes.
panel:
- revert of patch that broke panel/bridge issues
dma-buf:
- remove unused header file.
amdgpu:
- partial revert of locking change
radeon:
- fix dma_resv logic inversion
panel:
- pi touchscreen panel init fixes
vc4:
- build fix
- runtime pm refcount fix
vmwgfx:
- refcounting fix"
* tag 'drm-fixes-2022-04-23' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: partial revert "remove ctx->lock" v2
Revert "drm: of: Lookup if child node has panel or bridge"
Revert "drm: of: Properly try all possible cases for bridge/panel detection"
drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
drm/vmwgfx: Fix gem refcounting and memory evictions
drm/vc4: Fix build error when CONFIG_DRM_VC4=y && CONFIG_RASPBERRYPI_FIRMWARE=m
drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare
drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised
dma-buf-map: remove renamed header file
drm/radeon: fix logic inversion in radeon_sync_resv
Merge tag 'block-5.18-2022-04-22' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Just two small regression fixes for bcache"
* tag 'block-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
bcache: fix wrong bdev parameter when calling bio_alloc_clone() in do_bio_hook()
bcache: put bch_bio_map() back to correct location in journal_write_unlocked()
Merge tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Just two small fixes - one fixing a potential leak for the iovec for
larger requests added in this cycle, and one fixing a theoretical leak
with CQE_SKIP and IOPOLL"
* tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
io_uring: fix leaks on IOPOLL and CQE_SKIP
io_uring: free iovec if file assignment fails
Merge tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix header include for LLVM >= 14 when building with libclang.
- Allow access to 'data_src' for auxtrace in 'perf script' with ARM SPE
perf.data files, fixing processing data with such attributes.
- Fix error message for test case 71 ("Convert perf time to TSC") on
s390, where it is not supported.
* tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf test: Fix error message for test case 71 on s390, where it is not supported
perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
perf script: Always allow field 'data_src' for auxtrace
perf clang: Fix header include for LLVM >= 14
Randy Dunlap [Sat, 23 Apr 2022 03:25:17 +0000 (20:25 -0700)]
sparc: cacheflush_32.h needs struct page
Add a struct page forward declaration to cacheflush_32.h.
Fixes this build warning:
CC drivers/crypto/xilinx/zynqmp-sha.o
In file included from arch/sparc/include/asm/cacheflush.h:11,
from include/linux/cacheflush.h:5,
from drivers/crypto/xilinx/zynqmp-sha.c:6:
arch/sparc/include/asm/cacheflush_32.h:38:37: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
38 | void sparc_flush_page_to_ram(struct page *page);
Exposed by commit 0e03b8fd2936 ("crypto: xilinx - Turn SHA into a
tristate and allow COMPILE_TEST") but not Fixes: that commit because the
underlying problem is older.
Dave Airlie [Sat, 23 Apr 2022 05:00:33 +0000 (15:00 +1000)]
Merge tag 'drm-misc-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Two fixes for the raspberrypi panel initialisation, one fix for a logic
inversion in radeon, a build and pm refcounting fix for vc4, two reverts
for drm_of_get_bridge that caused a number of regression and a locking
regression for amdgpu.
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix some syzbot-detected bugs, as well as other bugs found by I/O
injection testing.
Change ext4's fallocate to consistently drop set[ug]id bits when an
fallocate operation might possibly change the user-visible contents of
a file.
Also, improve handling of potentially invalid values in the the
s_overhead_cluster superblock field to avoid ext4 returning a negative
number of free blocks"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix a potential race while discarding reserved buffers after an abort
ext4: update the cached overhead value in the superblock
ext4: force overhead calculation if the s_overhead_cluster makes no sense
ext4: fix overhead calculation to account for the reserved gdt blocks
ext4, doc: fix incorrect h_reserved size
ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
ext4: fix use-after-free in ext4_search_dir
ext4: fix bug_on in start_this_handle during umount filesystem
ext4: fix symlink file size not match to file content
ext4: fix fallocate to use file_modified to update permissions consistently
Merge tag 'ata-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single fix to avoid a NULL pointer dereference in the pata_marvell
driver with adapters not supporting DMA, from Zheyu"
* tag 'ata-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_marvell: Check the 'bmdma_addr' beforing reading
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The main and larger change here is a workaround for AMD's lack of
cache coherency for encrypted-memory guests.
I have another patch pending, but it's waiting for review from the
architecture maintainers.
RISC-V:
- Remove 's' & 'u' as valid ISA extension
- Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
x86:
- Fix NMI watchdog in guests on AMD
- Fix for SEV cache incoherency issues
- Don't re-acquire SRCU lock in complete_emulated_io()
- Avoid NULL pointer deref if VM creation fails
- Fix race conditions between APICv disabling and vCPU creation
- Bugfixes for disabling of APICv
- Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
selftests:
- Do not use bitfields larger than 32-bits, they differ between GCC
and clang"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: selftests: introduce and use more page size-related constants
kvm: selftests: do not use bitfields larger than 32-bits for PTEs
KVM: SEV: add cache flush to solve SEV cache incoherency issues
KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
KVM: selftests: Silence compiler warning in the kvm_page_table_test
KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog
x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
KVM: SPDX style and spelling fixes
KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled
KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled
KVM: Initialize debugfs_dentry when a VM is created to avoid NULL deref
KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
KVM: RISC-V: Use kvm_vcpu.srcu_idx, drop RISC-V's unnecessary copy
KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()
RISC-V: KVM: Restrict the extensions that can be disabled
RISC-V: KVM: Remove 's' & 'u' as valid ISA extension
net: ethernet: stmmac: fix write to sgmii_adapter_base
I made a mistake with the commit a6aaa0032424 ("net: ethernet: stmmac:
fix altr_tse_pcs function when using a fixed-link"). I should have
tested against both scenario of having a SGMII interface and one
without.
Without the SGMII PCS TSE adpater, the sgmii_adapter_base address is
NULL, thus a write to this address will fail.
Jakub Kicinski [Fri, 22 Apr 2022 22:59:08 +0000 (15:59 -0700)]
Merge branch 'wireguard-patches-for-5-18-rc4'
Jason A. Donenfeld says:
====================
wireguard patches for 5.18-rc4
Here are two small wireguard fixes for 5.18-rc4:
1) We enable ACPI in the QEMU test harness, so that multiple CPUs are
actually used on x86 for testing for races.
2) Sending skbs with metadata dsts attached resulted in a null pointer
dereference, triggerable from executing eBPF programs. The fix is a
oneliner, changing a skb_dst() null check into a skb_valid_dst()
boolean check.
====================
wireguard: device: check for metadata_dst with skb_valid_dst()
When we try to transmit an skb with md_dst attached through wireguard
we hit a null pointer dereference in wg_xmit() due to the use of
dst_mtu() which calls into dst_blackhole_mtu() which in turn tries to
dereference dst->dev.
Since wireguard doesn't use md_dsts we should use skb_valid_dst(), which
checks for DST_METADATA flag, and if it's set, then falls back to
wireguard's device mtu. That gives us the best chance of transmitting
the packet; otherwise if the blackhole netdev is used we'd get
ETH_MIN_MTU.
It turns out that by having CONFIG_ACPI=n, we've been failing to boot
additional CPUs, and so these systems were functionally UP. The code
bloat is unfortunate for build times, but I don't see an alternative. So
this commit sets CONFIG_ACPI=y for x86_64 and i686 configs.
Pengcheng Yang [Wed, 20 Apr 2022 02:34:41 +0000 (10:34 +0800)]
tcp: ensure to use the most recently sent skb when filling the rate sample
If an ACK (s)acks multiple skbs, we favor the information
from the most recently sent skb by choosing the skb with
the highest prior_delivered count. But in the interval
between receiving ACKs, we send multiple skbs with the same
prior_delivered, because the tp->delivered only changes
when we receive an ACK.
We used RACK's solution, copying tcp_rack_sent_after() as
tcp_skb_sent_after() helper to determine "which packet was
sent last?". Later, we will use tcp_skb_sent_after() instead
in RACK.
There is no need to add new compatible strings for each new supported
chip version. The compatible string is used only to select the subdriver
(rtl8365mb.c or rtl8366rb.c). Once in the subdriver, it will detect the
chip model by itself, ignoring which compatible string was used.
Compatible strings are used to help the driver find the chip ID/version
register for each chip family. After that, the driver can setup the
switch accordingly. Keep only the first supported model for each family
as a compatible string and reference other chip models in the
description.
The removed compatible strings have never been used in a released kernel.
The current EOI handler for LEVEL triggered interrupts calls clk_enable(),
register IO, clk_disable(). The clock manipulation requires locking which
happens with IRQs disabled in clk_enable_lock(). Instead of turning the
clock on and off all the time, enable the clock in case LEVEL interrupt is
requested and keep the clock enabled until all LEVEL interrupts are freed.
The LEVEL interrupts are an exception on this platform and seldom used, so
this does not affect the common case.
This simplifies the LEVEL interrupt handling considerably and also fixes
the following splat found when using preempt-rt:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/locking/rtmutex.c:2040 __rt_mutex_trylock+0x37/0x62
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.109-rt65-stable-standard-00068-g6a5afc4b1217 #85
Hardware name: STM32 (Device Tree Support)
[<c010a45d>] (unwind_backtrace) from [<c010766f>] (show_stack+0xb/0xc)
[<c010766f>] (show_stack) from [<c06353ab>] (dump_stack+0x6f/0x84)
[<c06353ab>] (dump_stack) from [<c01145e3>] (__warn+0x7f/0xa4)
[<c01145e3>] (__warn) from [<c063386f>] (warn_slowpath_fmt+0x3b/0x74)
[<c063386f>] (warn_slowpath_fmt) from [<c063b43d>] (__rt_mutex_trylock+0x37/0x62)
[<c063b43d>] (__rt_mutex_trylock) from [<c063c053>] (rt_spin_trylock+0x7/0x16)
[<c063c053>] (rt_spin_trylock) from [<c036a2f3>] (clk_enable_lock+0xb/0x80)
[<c036a2f3>] (clk_enable_lock) from [<c036ba69>] (clk_core_enable_lock+0x9/0x18)
[<c036ba69>] (clk_core_enable_lock) from [<c034e9f3>] (stm32_gpio_get+0x11/0x24)
[<c034e9f3>] (stm32_gpio_get) from [<c034ef43>] (stm32_gpio_irq_trigger+0x1f/0x48)
[<c034ef43>] (stm32_gpio_irq_trigger) from [<c014aa53>] (handle_fasteoi_irq+0x71/0xa8)
[<c014aa53>] (handle_fasteoi_irq) from [<c0147111>] (generic_handle_irq+0x19/0x22)
[<c0147111>] (generic_handle_irq) from [<c014752d>] (__handle_domain_irq+0x55/0x64)
[<c014752d>] (__handle_domain_irq) from [<c0346f13>] (gic_handle_irq+0x53/0x64)
[<c0346f13>] (gic_handle_irq) from [<c0100ba5>] (__irq_svc+0x65/0xc0)
Exception stack(0xc0e01f18 to 0xc0e01f60)
1f00: 0000300c00000000
1f20: 0000300cc010ff010000000000000000c0e00000c0e0771400000001c0e01f78
1f40: c0e0775800000000ef7cd0ffc0e01f68c010554bc010554240000033ffffffff
[<c0100ba5>] (__irq_svc) from [<c0105542>] (arch_cpu_idle+0xc/0x1e)
[<c0105542>] (arch_cpu_idle) from [<c063be95>] (default_idle_call+0x21/0x3c)
[<c063be95>] (default_idle_call) from [<c01324f7>] (do_idle+0xe3/0x1e4)
[<c01324f7>] (do_idle) from [<c01327b3>] (cpu_startup_entry+0x13/0x14)
[<c01327b3>] (cpu_startup_entry) from [<c0a00c13>] (start_kernel+0x397/0x3d4)
[<c0a00c13>] (start_kernel) from [<00000000>] (0x0)
---[ end trace 0000000000000002 ]---
Power consumption measured on STM32MP157C DHCOM SoM is not increased or
is below noise threshold.
tcp: md5: incorrect tcp_header_len for incoming connections
In tcp_create_openreq_child we adjust tcp_header_len for md5 using the
remote address in newsk. But that address is still 0 in newsk at this
point, and it is only set later by the callers (tcp_v[46]_syn_recv_sock).
Use the address from the request socket instead.
Thomas Richter [Wed, 20 Apr 2022 06:29:21 +0000 (08:29 +0200)]
perf test: Fix error message for test case 71 on s390, where it is not supported
Test case 71 'Convert perf time to TSC' is not supported on s390.
Subtest 71.1 is skipped with the correct message, but subtest 71.2 is
not skipped and fails.
The root cause is function evlist__open() called from
test__perf_time_to_tsc(). evlist__open() returns -ENOENT because the
event cycles:u is not supported by the selected PMU, for example
platform s390 on z/VM or an x86_64 virtual machine.
The PMU driver returns -ENOENT in this case. This error is leads to the
failure.
Fix this by returning TEST_SKIP on -ENOENT.
Output before:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: FAILED!
Output after:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: Skip (perf_read_tsc_conversion is not supported)
This also happens on an x86_64 virtual machine:
# uname -m
x86_64
$ ./perf test -F 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : FAILED!
$
Committer testing:
Continues to work on x86_64:
$ perf test 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : Ok
$
Leo Yan [Thu, 14 Apr 2022 12:32:01 +0000 (20:32 +0800)]
perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
Since commit bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't report result if the PERF_SAMPLE_DATA_SRC bit is missed in sample
type.
The commit ffab487052054162 ("perf: arm-spe: Fix perf report
--mem-mode") partially fixes the issue. It adds PERF_SAMPLE_DATA_SRC
bit for Arm SPE event, this allows the perf data file generated by
kernel v5.18-rc1 or later version can be reported properly.
On the other hand, perf tool still fails to be backward compatibility
for a data file recorded by an older version's perf which contains Arm
SPE trace data. This patch is a workaround in reporting phase, when
detects ARM SPE PMU event and without PERF_SAMPLE_DATA_SRC bit, it will
force to set the bit in the sample type and give a warning info.
Leo Yan [Sun, 17 Apr 2022 11:48:37 +0000 (19:48 +0800)]
perf script: Always allow field 'data_src' for auxtrace
If use command 'perf script -F,+data_src' to dump memory samples with
Arm SPE trace data, it reports error:
# perf script -F,+data_src
Samples for 'dummy:u' event do not have DATA_SRC attribute set. Cannot print 'data_src' field.
This is because the 'dummy:u' event is absent DATA_SRC bit in its sample
type, so if a file contains AUX area tracing data then always allow
field 'data_src' to be selected as an option for perf script.
Commit 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members
before initialization") attempted to fix a race condition that lead to a
NULL pointer, but in the process caused a regression for _AEI/_EVT
declared GPIOs.
This manifests in messages showing deferred probing while trying to
allocate IRQs like so:
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0000 to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x002C to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003D to IRQ, err -517
[ .. more of the same .. ]
The code for walking _AEI doesn't handle deferred probing and so this
leads to non-functional GPIO interrupts.
Fix this issue by moving the call to `acpi_gpiochip_request_interrupts`
to occur after gc->irc.initialized is set.
Merge tag 'riscv-for-linus-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes Palmer Dabbelt:
- A pair of build fixes for the recent cpuidle driver
- A fix for systems without sv57 that manifests as a crash
early in boot
* tag 'riscv-for-linus-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: cpuidle: fix Kconfig select for RISCV_SBI_CPUIDLE
RISC-V: mm: Fix set_satp_mode() for platform not having Sv57
cpuidle: riscv: support non-SMP config
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There's no real pattern to the fixes, but the main one fixes our
pmd_leaf() definition to resolve a NULL dereference on the migration
path.
- Fix PMU event validation in the absence of any event counters
- Fix allmodconfig build using clang in conjunction with binutils
- Fix definitions of pXd_leaf() to handle PROT_NONE entries
- More typo fixes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: fix p?d_leaf()
arm64: fix typos in comments
arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
arm_pmu: Validate single/group leader events
Miaoqian Lin [Wed, 20 Apr 2022 01:49:13 +0000 (01:49 +0000)]
arm/xen: Fix some refcount leaks
The of_find_compatible_node() function returns a node pointer with
refcount incremented, We should use of_node_put() on it when done
Add the missing of_node_put() to release the refcount.
Fixes: 9b08aaa3199a ("ARM: XEN: Move xen_early_init() before efi_init()") Fixes: b2371587fe0c ("arm/xen: Read extended regions from DT and init Xen resource") Signed-off-by: Miaoqian Lin <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Signed-off-by: Stefano Stabellini <[email protected]>
Merge tag 'xarray-5.18a' of git://git.infradead.org/users/willy/xarray
Pull xarray fixes from Matthew Wilcox:
"Syzbot found a nasty race between large page splitting and page
lookup. Details in the commit log, but fortunately it has a reliable
reproducer. I thought it better to send this one to you straight away.
Also fix the test suite build for kmem_cache_alloc_lru()"
* tag 'xarray-5.18a' of git://git.infradead.org/users/willy/xarray:
XArray: Disallow sibling entries of nodes
tools: Add kmem_cache_alloc_lru()
Merge tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four fixes, two of them for stable:
- fcollapse fix
- reconnect lock fix
- DFS oops fix
- minor cleanup patch"
* tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: destage any unwritten data to the server before calling copychunk_write
cifs: use correct lock type in cifs_reconnect()
cifs: fix NULL ptr dereference in refresh_mounts()
cifs: Use kzalloc instead of kmalloc/memset
Merge tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull mount_setattr fix from Christian Brauner:
"The recent cleanup in e257039f0fc7 ("mount_setattr(): clean the
control flow and calling conventions") switched the mount attribute
codepaths from do-while to for loops as they are more idiomatic when
walking mounts.
However, we did originally choose do-while constructs because if we
request a mount or mount tree to be made read-only we need to hold
writers in the following way: The mount attribute code will grab
lock_mount_hash() and then call mnt_hold_writers() which will
_unconditionally_ set MNT_WRITE_HOLD on the mount.
Any callers that need write access have to call mnt_want_write(). They
will immediately see that MNT_WRITE_HOLD is set on the mount and the
caller will then either spin (on non-preempt-rt) or wait on
lock_mount_hash() (on preempt-rt).
The fact that MNT_WRITE_HOLD is set unconditionally means that once
mnt_hold_writers() returns we need to _always_ pair it with
mnt_unhold_writers() in both the failure and success paths.
The do-while constructs did take care of this. But Al's change to a
for loop in the failure path stops on the first mount we failed to
change mount attributes _without_ going into the loop to call
mnt_unhold_writers().
This in turn means that once we failed to make a mount read-only via
mount_setattr() - i.e. there are already writers on that mount - we
will block any writers indefinitely. Fix this by ensuring that the for
loop always unsets MNT_WRITE_HOLD including the first mount we failed
to change to read-only. Also sprinkle a few comments into the cleanup
code to remind people about what is happening including myself. After
all, I didn't catch it during review.
This is only relevant on mainline and was reported by syzbot. Details
about the syzbot reports are all in the commit message"
* tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: unset MNT_WRITE_HOLD on failure
Merge tag 'sound-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"At this time, the majority of changes are for pending ASoC fixes while
a few usual HD-audio and USB-audio quirks are found.
Almost all patches are small device-specific fixes, and nothing
worrisome stands out, so far"
* tag 'sound-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (37 commits)
ALSA: hda/realtek: Add quirk for Clevo NP70PNP
ALSA: hda: intel-dsp-config: Add RaptorLake PCI IDs
ALSA: hda/realtek: Enable mute/micmute LEDs and limit mic boost on EliteBook 845/865 G9
ALSA: usb-audio: Clear MIDI port active flag after draining
ALSA: usb-audio: add mapping for MSI MAG X570S Torpedo MAX.
ALSA: hda/i915: Fix one too many pci_dev_put()
ALSA: hda/hdmi: add HDMI codec VID for Raptorlake-P
ALSA: hda/hdmi: fix warning about PCM count when used with SOF
sound/oss/dmasound: fix 'dmasound_setup' defined but not used
firmware: cs_dsp: Fix overrun of unterminated control name string
ASoC: codecs: Fix an error handling path in (rx|tx|va)_macro_probe()
ASoC: Intel: sof_es8336: Add a quirk for Huawei Matebook D15
ASoC: Intel: sof_es8336: add a quirk for headset at mic1 port
ASoC: Intel: sof_es8336: support a separate gpio to control headphone
ASoC: Intel: sof_es8336: simplify speaker gpio naming
ASoC: wm8731: Disable the regulator when probing fails
ASoC: Intel: soc-acpi: correct device endpoints for max98373
ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use
ASoC: SOF: topology: Fix memory leak in sof_control_load()
ASoC: SOF: topology: cleanup dailinks on widget unload
...
There is a race between xas_split() and xas_load() which can result in
the wrong page being returned, and thus data corruption. Fortunately,
it's hard to hit (syzbot took three months to find it) and often guarded
with VM_BUG_ON().
The anatomy of this race is:
thread A thread B
order-9 page is stored at index 0x200
lookup of page at index 0x274
page split starts
load of sibling entry at offset 9
stores nodes at offsets 8-15
load of entry at offset 8
The entry at offset 8 turns out to be a node, and so we descend into it,
and load the page at index 0x234 instead of 0x274. This is hard to fix
on the split side; we could replace the entire node that contains the
order-9 page instead of replacing the eight entries. Fixing it on
the lookup side is easier; just disallow sibling entries that point
to nodes. This cannot ever be a useful thing as the descent would not
know the correct offset to use within the new node.
The test suite continues to pass, but I have not added a new test for
this bug.
Turn kmem_cache_alloc() into a wrapper around kmem_cache_alloc_lru().
Fixes: 9bbdc0f32409 ("xarray: use kmem_cache_alloc_lru to allocate xa_node") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reported-by: Liam R. Howlett <[email protected]> Reported-by: Li Wang <[email protected]>
Subsystems affected by this patch series: mm (memory-failure, memcg,
userfaultfd, hugetlbfs, mremap, oom-kill, kasan, hmm), and kcov"
* emailed patches from Andrew Morton <[email protected]>:
mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
kcov: don't generate a warning on vm_insert_page()'s failure
MAINTAINERS: add Vincenzo Frascino to KASAN reviewers
oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
selftest/vm: add skip support to mremap_test
selftest/vm: support xfail in mremap_test
selftest/vm: verify remap destination address in mremap_test
selftest/vm: verify mmap addr in mremap_test
mm, hugetlb: allow for "high" userspace addresses
userfaultfd: mark uffd_wp regardless of VM_WRITE flag
memcg: sync flush only if periodic flush is delayed
mm/memory-failure.c: skip huge_zero_page in memory_failure()
mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
Nicholas Piggin [Fri, 22 Apr 2022 06:01:05 +0000 (16:01 +1000)]
mm/vmalloc: huge vmalloc backing pages should be split rather than compound
Huge vmalloc higher-order backing pages were allocated with __GFP_COMP
in order to allow the sub-pages to be refcounted by callers such as
"remap_vmalloc_page [sic]" (remap_vmalloc_range).
However a similar problem exists for other struct page fields callers
use, for example fb_deferred_io_fault() takes a vmalloc'ed page and
not only refcounts it but uses ->lru, ->mapping, ->index.
This is not compatible with compound sub-pages, and can cause bad page
state issues like
The correct approach is to use split high-order pages for the huge
vmalloc backing. These allow callers to treat them in exactly the same
way as individually-allocated order-0 pages.
bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook
xmit_check_hhlen() observes the dst for getting the device hard header
length to make sure a modified packet can fit. When a helper which changes
the dst - such as bpf_skb_set_tunnel_key() - is called as part of the
xmit program the accessed dst is no longer valid.
Daniel Lezcano [Fri, 22 Apr 2022 14:10:29 +0000 (16:10 +0200)]
thermal/governor: Remove deprecated information
The userspace governor is still in use on production systems and the
deprecating warning is scary.
Even if we want to get rid of the userspace governor, it is too soon
yet as the alternatives are not yet adopted.
Change the deprecated warning by an information message suggesting to
switch to the netlink thermal events.
Fixes: 0275c9fb0eff ("thermal/core: Make the userspace governor deprecated") Signed-off-by: Daniel Lezcano <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
It has been reported the warning is annoying as the cooling device
state is still needed on some production system.
Meanwhile we provide a way to consolidate the thermal framework to
prevent multiple actors acting on the cooling devices with conflicting
decisions, let's revert this warning.
netfilter: nft_set_rbtree: overlap detection with element re-addition after deletion
This patch fixes spurious EEXIST errors.
Extend d2df92e98a34 ("netfilter: nft_set_rbtree: handle element
re-addition after deletion") to deal with elements with same end flags
in the same transation.
Reset the overlap flag as described by 7c84d41416d8 ("netfilter:
nft_set_rbtree: Detect partial overlaps on insertion").
Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Fixes: d2df92e98a34 ("netfilter: nft_set_rbtree: handle element re-addition after deletion") Signed-off-by: Pablo Neira Ayuso <[email protected]> Reviewed-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Miaoqian Lin [Wed, 20 Apr 2022 11:04:08 +0000 (19:04 +0800)]
net: dsa: Add missing of_node_put() in dsa_port_link_register_of
The device_node pointer is returned by of_parse_phandle() with refcount
incremented. We should use of_node_put() on it when done.
of_node_put() will check for NULL value.
Fixes: a20f997010c4 ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed") Signed-off-by: Miaoqian Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Muchun Song [Fri, 22 Apr 2022 06:00:33 +0000 (14:00 +0800)]
arm64: mm: fix p?d_leaf()
The pmd_leaf() is used to test a leaf mapped PMD, however, it misses
the PROT_NONE mapped PMD on arm64. Fix it. A real world issue [1]
caused by this was reported by Qian Cai. Also fix pud_leaf().
net: cosa: fix error check return value of register_chrdev()
If major equal 0, register_chrdev() returns error code when it fails.
This function dynamically allocate a major and return its number on
success, so we should use "< 0" to check it instead of "!".
Merge tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Extra quiet after Easter, only have minor i915 and msm pulls. However
I haven't seen a PR from our misc tree in a little while, I've cc'ed
all the suspects. Once that unblocks I expect a bit larger bunch of
patches to arrive.
Otherwise as I said, one msm revert and two i915 fixes.
msm:
- revert iommu change that broke some platforms.
i915:
- Unset enable_psr2_sel_fetch if PSR2 detection fails
- Fix to detect when VRR is turned off from panel settings"
* tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm:
drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails
drm/msm: Revert "drm/msm: Stop using iommu_present()"
drm/i915/display/vrr: Reset VRR capable property on a long hpd
mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
In some cases it is possible for mmu_interval_notifier_remove() to race
with mn_tree_inv_end() allowing it to return while the notifier data
structure is still in use. Consider the following sequence:
As the wait_event() condition is true it will return immediately. This
can lead to use-after-free type errors if the caller frees the data
structure containing the interval notifier subscription while it is
still on a deferred list. Fix this by taking the appropriate lock when
reading invalidate_seq to ensure proper synchronisation.
I observed this whilst running stress testing during some development.
You do have to be pretty unlucky, but it leads to the usual problems of
use-after-free (memory corruption, kernel crash, difficult to diagnose
WARN_ON, etc).
Aleksandr Nogikh [Thu, 21 Apr 2022 23:36:07 +0000 (16:36 -0700)]
kcov: don't generate a warning on vm_insert_page()'s failure
vm_insert_page()'s failure is not an unexpected condition, so don't do
WARN_ONCE() in such a case.
Instead, print a kernel message and just return an error code.
This flaw has been reported under an OOM condition by sysbot [1].
The message is mainly for the benefit of the test log, in this case the
fuzzer's log so that humans inspecting the log can figure out what was
going on. KCOV is a testing tool, so I think being a little more chatty
when KCOV unexpectedly is about to fail will save someone debugging
time.
We don't want the WARN, because it's not a kernel bug that syzbot should
report, and failure can happen if the fuzzer tries hard enough (as
above).
oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which
can be targeted by the oom reaper. This mapping is used to store the
futex robust list head; the kernel does not keep a copy of the robust
list and instead references a userspace address to maintain the
robustness during a process death.
A race can occur between exit_mm and the oom reaper that allows the oom
reaper to free the memory of the futex robust list before the exit path
has handled the futex death:
selftest/vm: verify remap destination address in mremap_test
Because mremap does not have a MAP_FIXED_NOREPLACE flag, it can destroy
existing mappings. This causes a segfault when regions such as text are
remapped and the permissions are changed.
Verify the requested mremap destination address does not overlap any
existing mappings by using mmap's MAP_FIXED_NOREPLACE flag. Keep
incrementing the destination address until a valid mapping is found or
fail the current test once the max address is reached.
Avoid calling mmap with requested addresses that are less than the
system's mmap_min_addr. When run as root, mmap returns EACCES when
trying to map addresses < mmap_min_addr. This is not one of the error
codes for the condition to retry the mmap in the test.
Rather than arbitrarily retrying on EACCES, don't attempt an mmap until
addr > vm.mmap_min_addr.
Add a munmap call after an alignment check as the mappings are retained
after the retry and can reach the vm.max_map_count sysctl.
This is a fix for commit f6795053dac8 ("mm: mmap: Allow for "high"
userspace addresses") for hugetlb.
This patch adds support for "high" userspace addresses that are
optionally supported on the system and have to be requested via a hint
mechanism ("high" addr parameter to mmap).
Architectures such as powerpc and x86 achieve this by making changes to
their architectural versions of hugetlb_get_unmapped_area() function.
However, arm64 uses the generic version of that function.
So take into account arch_get_mmap_base() and arch_get_mmap_end() in
hugetlb_get_unmapped_area(). To allow that, move those two macros out
of mm/mmap.c into include/linux/sched/mm.h
If these macros are not defined in architectural code then they default
to (TASK_SIZE) and (base) so should not introduce any behavioural
changes to architectures that do not define them.
For the time being, only ARM64 is affected by this change.
Catalin (ARM64) said
"We should have fixed hugetlb_get_unmapped_area() as well when we added
support for 52-bit VA. The reason for commit f6795053dac8 was to
prevent normal mmap() from returning addresses above 48-bit by default
as some user-space had hard assumptions about this.
It's a slight ABI change if you do this for hugetlb_get_unmapped_area()
but I doubt anyone would notice. It's more likely that the current
behaviour would cause issues, so I'd rather have them consistent.
Basically when arm64 gained support for 52-bit addresses we did not
want user-space calling mmap() to suddenly get such high addresses,
otherwise we could have inadvertently broken some programs (similar
behaviour to x86 here). Hence we added commit f6795053dac8. But we
missed hugetlbfs which could still get such high mmap() addresses. So
in theory that's a potential regression that should have bee addressed
at the same time as commit f6795053dac8 (and before arm64 enabled
52-bit addresses)"
Nadav Amit [Thu, 21 Apr 2022 23:35:43 +0000 (16:35 -0700)]
userfaultfd: mark uffd_wp regardless of VM_WRITE flag
When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
currently only marked as write-protected if the VMA has VM_WRITE flag
set. This seems incorrect or at least would be unexpected by the users.
Consider the following sequence of operations that are being performed
on a certain page:
At this point the user would expect to still get UFFD notification when
the page is accessed for write, but the user would not get one, since
the PTE was not marked as UFFD_WP during UFFDIO_COPY.
Fix it by always marking PTEs as UFFD_WP regardless on the
write-permission in the VMA flags.
memcg: sync flush only if periodic flush is delayed
Daniel Dao has reported [1] a regression on workloads that may trigger a
lot of refaults (anon and file). The underlying issue is that flushing
rstat is expensive. Although rstat flush are batched with (nr_cpus *
MEMCG_BATCH) stat updates, it seems like there are workloads which
genuinely do stat updates larger than batch value within short amount of
time. Since the rstat flush can happen in the performance critical
codepaths like page faults, such workload can suffer greatly.
This patch fixes this regression by making the rstat flushing
conditional in the performance critical codepaths. More specifically,
the kernel relies on the async periodic rstat flusher to flush the stats
and only if the periodic flusher is delayed by more than twice the
amount of its normal time window then the kernel allows rstat flushing
from the performance critical codepaths.
Now the question: what are the side-effects of this change? The worst
that can happen is the refault codepath will see 4sec old lruvec stats
and may cause false (or missed) activations of the refaulted page which
may under-or-overestimate the workingset size. Though that is not very
concerning as the kernel can already miss or do false activations.
There are two more codepaths whose flushing behavior is not changed by
this patch and we may need to come to them in future. One is the
writeback stats used by dirty throttling and second is the deactivation
heuristic in the reclaim. For now keeping an eye on them and if there
is report of regression due to these codepaths, we will reevaluate then.
mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
There is a race condition between memory_failure_hugetlb() and hugetlb
free/demotion, which causes setting PageHWPoison flag on the wrong page.
The one simple result is that wrong processes can be killed, but another
(more serious) one is that the actual error is left unhandled, so no one
prevents later access to it, and that might lead to more serious results
like consuming corrupted data.
Think about the below race window:
CPU 1 CPU 2
memory_failure_hugetlb
struct page *head = compound_head(p);
hugetlb page might be freed to
buddy, or even changed to another
compound page.
get_hwpoison_page -- page is not what we want now...
The current code first does prechecks roughly and then reconfirms after
taking refcount, but it's found that it makes code overly complicated,
so move the prechecks in a single hugetlb_lock range.
A newly introduced function, try_memory_failure_hugetlb(), always takes
hugetlb_lock (even for non-hugetlb pages). That can be improved, but
memory_failure() is rare in principle, so should not be a big problem.