Git Repo - linux.git/log

team: protect features update by RCU to avoid deadlock

Function __team_compute_features() is protected by team->lock
mutex when it is called from team_compute_features() used when
features of an underlying device is changed. This causes
a deadlock when NETDEV_FEAT_CHANGE notifier for underlying device
is fired due to change propagated from team driver (e.g. MTU
change). It's because callbacks like team_change_mtu() or
team_vlan_rx_{add,del}_vid() protect their port list traversal
by team->lock mutex.

Example (r8169 case where this driver disables TSO for certain MTU
values):
...
[ 6391.348202]  __mutex_lock.isra.6+0x2d0/0x4a0
[ 6391.358602]  team_device_event+0x9d/0x160 [team]
[ 6391.363756]  notifier_call_chain+0x47/0x70
[ 6391.368329]  netdev_update_features+0x56/0x60
[ 6391.373207]  rtl8169_change_mtu+0x14/0x50 [r8169]
[ 6391.378457]  dev_set_mtu_ext+0xe1/0x1d0
[ 6391.387022]  dev_set_mtu+0x52/0x90
[ 6391.390820]  team_change_mtu+0x64/0xf0 [team]
[ 6391.395683]  dev_set_mtu_ext+0xe1/0x1d0
[ 6391.399963]  do_setlink+0x231/0xf50
...

In fact team_compute_features() called from team_device_event()
does not need to be protected by team->lock mutex and rcu_read_lock()
is sufficient there for port list traversal.

Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Cc: Saeed Mahameed <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Cong Wang <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: add David Ahern to IPv4/IPv6 maintainers

David has been the de-facto maintainer for much of the IP code
for the last couple of years, let's make it official.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable

If a non nat tuple entry is inserted just to the regular tuples
rhashtable (ct_tuples_ht) and not to natted tuples rhashtable
(ct_nat_tuples_ht). Commit bc562be9674b ("net/mlx5e: CT: Save ct entries
tuples in hashtables") mixed up the return labels and names sot that on
cleanup or failure we still try to remove for the natted tuples rhashtable.

Fix that by correctly checking if a natted tuples insertion
before removing it. While here make it more readable.

Fixes: bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables")
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset

Sometimes, channel params are changed without recreating the channels.
It happens in two basic cases: when the channels are closed, and when
the parameter being changed doesn't affect how channels are configured.
Such changes invoke a hardware command that might fail. The whole
operation should be reverted in such cases, but the code that restores
the parameters' values in the driver was missing. This commit adds this
handling.

Fixes: 2e20a151205b ("net/mlx5e: Fail safe mtu and lro setting")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Revert parameters on errors when changing trust state without reset

Trust state may be changed without recreating the channels. It happens
when the channels are closed, and when channel parameters (min inline
mode) stay the same after changing the trust state. Changing the trust
state is a hardware command that may fail. The current code didn't
restore the channel parameters to their old values if an error happened
and the channels were closed. This commit adds handling for this case.

Fixes: 6e0504c69811 ("net/mlx5e: Change inline mode correctly when changing trust state")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Correctly handle changing the number of queues when the interface is down

This commit addresses two issues related to changing the number of
queues when the channels are closed:

1. Missing call to mlx5e_num_channels_changed to update
real_num_tx_queues when the number of TCs is changed.

2. When mlx5e_num_channels_changed returns an error, the channel
parameters must be reverted.

Two Fixes: tags correspond to the first commits where these two issues
were introduced.

Fixes: 3909a12e7913 ("net/mlx5e: Fix configuration of XPS cpumasks and netdev queues in corner cases")
Fixes: fa3748775b92 ("net/mlx5e: Handle errors from netif_set_real_num_{tx,rx}_queues")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Fix CT rule + encap slow path offload and deletion

Currently, if a neighbour isn't valid when offloading tunnel encap rules,
we offload the original match and replace the original action with
"goto slow path" action. For this we use a temporary flow attribute based
on the original flow attribute and then change the action. Flow flags,
which among those is the CT flag, are still shared for the slow path rule
offload, so we end up parsing this flow as a CT + goto slow path rule.

Besides being unnecessary, CT action offload saves extra information in
the passed flow attribute, such as created ct_flow and mod_hdr, which
is lost onces the temporary flow attribute is freed.

When a neigh is updated and is valid, we offload the original CT rule
with original CT action, which again creates a ct_flow and mod_hdr
and saves it in the flow's original attribute. Then we delete the slow
path rule with a temporary flow attribute based on original updated
flow attribute, and we free the relevant ct_flow and mod_hdr.

Then when tc deletes this flow, we try to free the ct_flow and mod_hdr
on the flow's attribute again.

To fix the issue, skip all furture proccesing (CT/Sample/Split rules)
in offload/unoffload of slow path rules.

Call trace:
[  758.850525] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000218
[  758.952987] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  758.964170] Modules linked in: act_csum(E) act_pedit(E) act_tunnel_key(E) act_ct(E) nf_flow_table(E) xt_nat(E) ip6table_filter(E) ip6table_nat(E) xt_comment(E) ip6_tables(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) bpfilter(E) br_netfilter(E) bridge(E) stp(E) llc(E) xfrm_user(E) overlay(E) act_mirred(E) act_skbedit(E) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) esp6_offload(E) esp6(E) esp4_offload(E) esp4(E) xfrm_algo(E) mlx5_ib(OE) ib_uverbs(OE) geneve(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink_cttimeout(E) nfnetlink(E) mlx5_core(OE) act_gact(E) cls_flower(E) sch_ingress(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat(E) mlxfw(OE) psample(E) nf_conntrack(E) nf_defrag_ipv4(E) vfio_mdev(E) mdev(E) ib_core(OE) mlx_compat(OE) crct10dif_ce(E) uio_pdrv_genirq(E) uio(E) i2c_mlx(E) mlxbf_pmc(E) sbsa_gwdt(E) mlxbf_gige(E) gpio_mlxbf2(E) mlxbf_pka(E) mlx_trio(E) mlx_bootctl(E) bluefield_edac(E) knem(O)
[  758.964225]  ip_tables(E) mlxbf_tmfifo(E) ipv6(E) crc_ccitt(E) nf_defrag_ipv6(E)
[  759.154186] CPU: 5 PID: 122 Comm: kworker/u16:1 Tainted: G           OE     5.4.60-mlnx.52.gde81e85 #1
[  759.172870] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.5.0-2-gc1b5d64 Jan  4 2021
[  759.195466] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
[  759.207344] pstate: a0000005 (NzCv daif -PAN -UAO)
[  759.217003] pc : mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[  759.228229] lr : mlx5_del_flow_rules+0x34/0x160 [mlx5_core]
[  759.405858] Call trace:
[  759.410804]  mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[  759.421337]  __mlx5_eswitch_del_rule.isra.43+0x5c/0x1c8 [mlx5_core]
[  759.433963]  mlx5_eswitch_del_offloaded_rule_ct+0x34/0x40 [mlx5_core]
[  759.446942]  mlx5_tc_rule_delete_ct+0x68/0x74 [mlx5_core]
[  759.457821]  mlx5_tc_ct_delete_flow+0x160/0x21c [mlx5_core]
[  759.469051]  mlx5e_tc_unoffload_fdb_rules+0x158/0x168 [mlx5_core]
[  759.481325]  mlx5e_tc_encap_flows_del+0x140/0x26c [mlx5_core]
[  759.492901]  mlx5e_rep_update_flows+0x11c/0x1ec [mlx5_core]
[  759.504127]  mlx5e_rep_neigh_update+0x160/0x200 [mlx5_core]
[  759.515314]  process_one_work+0x178/0x400
[  759.523350]  worker_thread+0x58/0x3e8
[  759.530685]  kthread+0x100/0x12c
[  759.537152]  ret_from_fork+0x10/0x18
[  759.544320] Code: 97ffef55 51000673 3100067f 54ffff41 (b9421ab3)
[  759.556548] ---[ end trace fab818bb1085832d ]---

Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Paul Blakey <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled

The cited commit introduce new CONFIG_MLX5_CLS_ACT kconfig variable
to control compilation of TC hardware offloads implementation.
When this configuration is disabled the driver is still wrongly
reports in ethtool that hw-tc-offload is supported.

Fixed by reporting hw-tc-offload is supported only when
CONFIG_MLX5_CLS_ACT is enabled.

Fixes: d956873f908c ("net/mlx5e: Introduce kconfig var for TC support")
Signed-off-by: Maor Dickman <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Maintain separate page trees for ECPF and PF functions

Pages for the host PF and ECPF were stored in the same tree, so the ECPF
pages were being freed along with the host PF's when the host driver
unloaded.

Combine the function ID and ECPF flag to use as an index into the
x-array containing the trees to get a different tree for the host PF and
ECPF.

Fixes: c6168161f693 ("net/mlx5: Add support for release all pages event")
Signed-off-by: Daniel Jurgens <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Fix IPSEC stats

When IPSEC offload isn't active, the number of stats is not zero, but
the strings are not filled, leading to exposing stats with empty names.
Fix this by using the same condition for NUM_STATS and FILL_STRS.

Fixes: 0aab3e1b04ae ("net/mlx5e: IPSec, Expose IPsec HW stat only for supporting HW")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Raed Salem <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Reduce tc unsupported key print level

"Unsupported key used:" appears in kernel log when flows with
unsupported key are used, arp fields for example.

OpenVSwitch was changed to match on arp fields by default that
caused this warning to appear in kernel log for every arp rule, which
can be a lot.

Fix by lowering print level from warning to debug.

Fixes: e3a2b7ed018e ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Maor Dickman <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: free page before return

Instead of directly return, goto the error handling label to free
allocated page.

Fixes: 5f29458b77d5 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Pan Bian <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: E-switch, Fix rate calculation for overflow

rate_bytes_ps is a 64-bit field. It passed as 32-bit field to
apply_police_params(). Due to this when police rate is higher
than 4Gbps, 32-bit calculation ignores the carry. This results
in incorrect rate configurationn the device.

Fix it by performing 64-bit calculation.

Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Eli Cohen <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Fix memory leak on flow table creation error flow

When we create the ft object we also init rhltable in ft->fgs_hash.
So in error flow before kfree of ft we need to destroy that rhltable.

Fixes: 693c6883bbc4 ("net/mlx5: Add hash table for flow groups in flow table")
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

Merge tag 'mac80211-for-net-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A couple of fixes:
* fix 160 MHz channel switch in mac80211
* fix a staging driver to not deadlock due to some
   recent cfg80211 changes
* fix NULL-ptr deref if cfg80211 returns -EINPROGRESS
   to wext (syzbot)
* pause TX in mac80211 in type change to prevent crashes
   (syzbot)

* tag 'mac80211-for-net-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
  staging: rtl8723bs: fix wireless regulatory API misuse
  mac80211: pause TX while changing interface type
  wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
  mac80211: 160MHz with extended NSS BW in CSA
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'wireless-drivers-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.11

Second set of fixes for v5.11. Like in last time we again have more
fixes than usual Actually a bit too much for my liking in this state
of the cycle, but due to unrelated challenges I was only able to
submit them now.

We have few important crash fixes, iwlwifi modifying read-only data
being the most reported issue, and also smaller fixes to iwlwifi.

mt76
* fix a clang warning about enum usage
* fix rx buffer refcounting crash

mt7601u
* fix rx buffer refcounting crash
* fix crash when unbplugging the device

iwlwifi
* fix a crash where we were modifying read-only firmware data
* lots of smaller fixes all over the driver

* tag 'wireless-drivers-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers: (24 commits)
  mt7601u: fix kernel crash unplugging the device
  iwlwifi: queue: bail out on invalid freeing
  iwlwifi: mvm: guard against device removal in reprobe
  iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
  iwlwifi: mvm: clear IN_D3 after wowlan status cmd
  iwlwifi: pcie: add rules to match Qu with Hr2
  iwlwifi: mvm: invalidate IDs of internal stations at mvm start
  iwlwifi: mvm: fix the return type for DSM functions 1 and 2
  iwlwifi: pcie: reschedule in long-running memory reads
  iwlwifi: pcie: use jiffies for memory read spin time limit
  iwlwifi: pcie: fix context info memory leak
  iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap
  iwlwifi: pcie: set LTR on more devices
  iwlwifi: queue: don't crash if txq->entries is NULL
  iwlwifi: fix the NMI flow for old devices
  iwlwifi: pnvm: don't try to load after failures
  iwlwifi: pnvm: don't skip everything when not reloading
  iwlwifi: pcie: avoid potential PNVM leaks
  iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time()
  iwlwifi: mvm: skip power command when unbinding vif during CSA
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

iwlwifi: provide gso_type to GSO packets

net/core/tso.c got recent support for USO, and this broke iwlfifi
because the driver implemented a limited form of GSO.

Providing ->gso_type allows for skb_is_gso_tcp() to provide
a correct result.

Fixes: 3d5b459ba0e3 ("net: tso: add UDP segmentation support")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Ben Greear <[email protected]>
Tested-by: Ben Greear <[email protected]>
Cc: Luca Coelho <[email protected]>
Cc: Johannes Berg <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209913
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bpf, preload: Fix build when $(O) points to a relative path

Building the kernel with CONFIG_BPF_PRELOAD, and by providing a relative
path for the output directory, may fail with the following error:

  $ make O=build bindeb-pkg
  ...
  /.../linux/tools/scripts/Makefile.include:5: *** O=build does not exist.  Stop.
  make[7]: *** [/.../linux/kernel/bpf/preload/Makefile:9: kernel/bpf/preload/libbpf.a] Error 2
  make[6]: *** [/.../linux/scripts/Makefile.build:500: kernel/bpf/preload] Error 2
  make[5]: *** [/.../linux/scripts/Makefile.build:500: kernel/bpf] Error 2
  make[4]: *** [/.../linux/Makefile:1799: kernel] Error 2
  make[4]: *** Waiting for unfinished jobs....

In the case above, for the "bindeb-pkg" target, the error is produced by
the "dummy" check in Makefile.include, called from libbpf's Makefile.
This check changes directory to $(PWD) before checking for the existence
of $(O). But at this step we have $(PWD) pointing to "/.../linux/build",
and $(O) pointing to "build". So the Makefile.include tries in fact to
assert the existence of a directory named "/.../linux/build/build",
which does not exist.

Note that the error does not occur for all make targets and
architectures combinations. This was observed on x86 for "bindeb-pkg",
or for a regular build for UML [0].

Here are some details. The root Makefile recursively calls itself once,
after changing directory to $(O). The content for the variable $(PWD) is
preserved across recursive calls to make, so it is unchanged at this
step. For "bindeb-pkg", $(PWD) is eventually updated because the target
writes a new Makefile (as debian/rules) and calls it indirectly through
dpkg-buildpackage. This script does not preserve $(PWD), which is reset
to the current working directory when the target in debian/rules is
called.

Although not investigated, it seems likely that something similar causes
UML to change its value for $(PWD).

Non-trivial fixes could be to remove the use of $(PWD) from the "dummy"
check, or to make sure that $(PWD) and $(O) are preserved or updated to
always play well and form a valid $(PWD)/$(O) path across the different
targets and architectures. Instead, we take a simpler approach and just
update $(O) when calling libbpf's Makefile, so it points to an absolute
path which should always resolve for the "dummy" check run (through
includes) by that Makefile.

David Gow previously posted a slightly different version of this patch
as a RFC [0], two months ago or so.

  [0] https://lore.kernel.org/bpf/20201119085022.3606135 [email protected]/t/#u

Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.")
Reported-by: David Gow <[email protected]>
Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

igc: fix link speed advertising

Link speed advertising in igc has two problems:

- When setting the advertisement via ethtool, the link speed is converted
  to the legacy 32 bit representation for the intel PHY code.
  This inadvertently drops ETHTOOL_LINK_MODE_2500baseT_Full_BIT (being
  beyond bit 31).  As a result, any call to `ethtool -s ...' drops the
  2500Mbit/s link speed from the PHY settings.  Only reloading the driver
  alleviates that problem.

  Fix this by converting the ETHTOOL_LINK_MODE_2500baseT_Full_BIT to the
  Intel PHY ADVERTISE_2500_FULL bit explicitly.

- Rather than checking the actual PHY setting, the .get_link_ksettings
  function always fills link_modes.advertising with all link speeds
  the device is capable of.

  Fix this by checking the PHY autoneg_advertised settings and report
  only the actually advertised speeds up to ethtool.

Fixes: 8c5ad0dae93c ("igc: Add ethtool support")
Signed-off-by: Corinna Vinschen <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

um: time: fix initialization in time-travel mode

In time-travel mode, since my previous patch, the start time was
initialized too late, so that the system would read it before we
set it, thus always starting system time at 0 (1970-01-01). This
happens because timekeeping_init() reads the time and is called
before time_init().

Unfortunately, I didn't see this before because I was testing it
only with the RTC patch applied (and enabled), and then the time
is read again by the RTC a little - after time_init() this time.

Fix this by just doing the initialization whenever necessary.

Fixes: 2701c1bd91dd ("um: time: Fix read_persistent_clock64() in time-travel")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

um: fix os_idle_sleep() to not hang

Changing os_idle_sleep() to use pause() (I accidentally described
it as an empty select() in the commit log because I had changed it
from that to pause() in a later revision) exposed a race condition
in the idle code. The following can happen:

timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=624017}}, NULL) = 0
...
<SIGALRM is delivered but we're already on the way to idle>
pause()

and we now hang forever. This was previously possible as well, but
it could never cause UML to hang for more than a second since we
could only sleep for that much, so at most you'd notice a "hiccup"
in the UML. Obviously, any sort of external interrupt also "saves"
it and interrupts pause().

Fix this by properly handling the race, rather than papering over
it again:

- first, block SIGALRM, and obtain the old signal set
- check the timer
- suspend, waiting for any signal out of the old set, if, and only
if, the timer will fire in the future
- restore the old signal mask

This ensures race-free operation: as it's blocked, the signal won't
be delivered while we're looking at the timer even if it were to be
triggered right _after_ we've returned from timer_gettime() with a
non-zero value (telling us the timer will trigger). Thus, despite
getting to sigsuspend() because timer_gettime() told us we're still
waiting, we'll not hang because sigsuspend() will return immediately
due to the pending signal.

Fixes: 49da38a3ef33 ("um: Simplify os_idle_sleep() and sleep longer")
Signed-off-by: Johannes Berg <[email protected]>
Acked-By: Anton Ivanov <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

Revert "um: support some of ARCH_HAS_SET_MEMORY"

This reverts commit 963285b0b47a ("um: support some of
ARCH_HAS_SET_MEMORY"), as it turns out that it's not only not
working (due to um never using the protection bits in the
page tables) but also corrupts the page tables if used on a
non-vmalloc page, since um never allocates proper page tables
for the 'physmem' in the first place.

Fixing all this will take more effort, so for now revert it.

Reported-by: Benjamin Berg <[email protected]>
Fixes: 963285b0b47a ("um: support some of ARCH_HAS_SET_MEMORY")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

Revert "um: allocate a guard page to helper threads"

This reverts commit ef4459a6da09 ("um: allocate a guard page to
helper threads"), it's broken in multiple ways:

1) the free no longer matches the alloc; and

2) more importantly, the set_memory_ro() causes allocation of
    page tables for the normal memory that doesn't have any,
    and that later causes corruption and crashes (usually but
    not always in vfree()).

We could fix the first bug and use vmalloc() to work around the
second, but set_memory_ro() actually doesn't do anything either
so I'll just revert that as well.

Reported-by: Benjamin Berg <[email protected]>
Fixes: ef4459a6da09 ("um: allocate a guard page to helper threads")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

um: virtio: free vu_dev only with the contained struct device

Since struct device is refcounted, we shouldn't free the vu_dev
immediately when it's removed from the platform device, but only
when the references actually all go away. Move the freeing to
the release to accomplish that.

Fixes: 5d38f324993f ("um: drivers: Add virtio vhost-user driver")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

um: kmsg_dumper: always dump when not tty console

With the addition of the ttynull console driver, the chance that a
console driver was already registerd did increase. Refine the logic when
to dump the kernel message buffer: always dump the buffer, when the UML
stdio console driver is not active and the preferred console.

Signed-off-by: Thomas Meyer <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

um: stdio_console: Make preferred console

The addition of the "ttynull" console driver did break the ordering of the
UML stdio console driver.
The UML stdio console driver is added in late_initcall (7), whereby the
ttynull driver is added in device_initcall (6), which always does make the
ttynull driver the default console.

Fix it by explicitly adding the UML stdio console as the preferred console,
in case no 'console=' command line option was specified.

Signed-off-by: Thomas Meyer <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

um: return error from ioremap()

Back a few years ago, ioremap() was added to UML so that we'd
not break the build for everything all the time. However, for
some reason, v1 of the patch got applied, rather than the v2
that returned NULL, which was discussed here:

https://lore.kernel.org/lkml/1495726955 [email protected]/

Fix that now.

Signed-off-by: Johannes Berg <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

um: ubd: fix command line handling of ubd

This commit fixes a regression to handle command line parameters of ubd.
With a simple line "./linux ubd0="./disk-ext4.img", it fails at
ubd_setup_common(). The commit adds additional checks to the variables
in order to properly parse the paremeters which previously worked.

Fixes: ef3ba87cb7c9 ("um: ubd: Set device serial attribute from cmdline")
Cc: Christopher Obbard <[email protected]>
Signed-off-by: Hajime Tazaki <[email protected]>
Acked-by: Christopher Obbard <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

Merge tag 'qcom-arm64-defconfig-fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

Qualcomm ARM64 defconfig fixes for v5.11

Devicetree patches for SDM845 introduced in v5.11 requires the
platform's interconnect driver to be buildin, or the kernel will fail to
provide a valid console when we hit userspace.

* tag 'qcom-arm64-defconfig-fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: defconfig: Make INTERCONNECT_QCOM_SDM845 builtin

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'qcom-arm64-fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

Qualcomm ARM64 fixes for 5.11

This fixes a regression in Lenovo Yoga C630, where the touchpad in some
units stopped working, by re-enabling the "tsc2" device.

It also marks the LPASS related clocks as protected to allow DB845c and
the Lenovo Yoga C630 to boot even if CONFIG_SDM_LPASSCC_845 is enabled.

* tag 'qcom-arm64-fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sdm845: Reserve LPASS clocks in gcc
arm64: dts: qcom: c630: keep both touchpad devices enabled

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into arm/fixes

arm64: dts: amlogic: fixes for v5.11-rc
- meson-g12: Set FL-adj property value

* tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic:
arm64: dts: amlogic: meson-g12: Set FL-adj property value

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'stm32-dt-for-v5.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into arm/fixes

STM32 DT fixes for v5.11, round 1

Highlights:
-----------

-Fixes are for DHCOM/DHCOR boards:
  - Fix DRC02 uSD card detect polarity
  - use uSD card detect on DHCOM
  - Disable uSD WP on DHCOM
  - Disable TSC2004 on DRC02
  - Fix GPIO hogs on DHCOM boards

* tag 'stm32-dt-for-v5.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32:
  ARM: dts: stm32: Fix GPIO hog flags on DHCOM DRC02
  ARM: dts: stm32: Fix GPIO hog flags on DHCOM PicoITX
  ARM: dts: stm32: Fix GPIO hog names on DHCOM
  ARM: dts: stm32: Disable optional TSC2004 on DRC02 board
  ARM: dts: stm32: Disable WP on DHCOM uSD slot
  ARM: dts: stm32: Connect card-detect signal on DHCOM
  ARM: dts: stm32: Fix polarity of the DH DRC02 uSD card detect

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

mailmap: remove the "repo-abbrev" comment

Remove the magical "repo-abbrev" comment added when this file was
introduced in e0ab1ec9fcd3 ([PATCH] add .mailmap for proper
git-shortlog output, 2007-02-14).

It's been an undocumented feature of git-shortlog(1), originally added
to git for Linus's use. Since then he's no longer using it[1], and
I've removed the feature in git.git's 4e168333a87 (shortlog: remove
unused(?) "repo-abbrev" feature, 2021-01-12). It's on the "master"
branch, but not yet in a release version.

Let's also remove it from linux.git, both as a heads-up to any
potential users of it in linux.git whose use would be broken sooner
than later by git itself, and because it'll eventually be entirely
redundant.

1. https://lore.kernel.org/git/CAHk-=wixHyBKZVUcxq+NCWMbkrX0xnppb7UCopRWw1+oExYpYw@mail.gmail.com/

Acked-by: Linus Torvalds <[email protected]>
Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

parisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES

When building a kernel without module support, the CONFIG_MLONGCALL option
needs to be enabled in order to reach symbols which are outside of a 22-bit
branch.

This patch changes the autodetection in the Kconfig script to always enable
CONFIG_MLONGCALL when modules are disabled and uses a far call to
preempt_schedule_irq() in intr_do_preempt() to reach the symbol in all cases.

Signed-off-by: Helge Deller <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: [email protected] # v5.6+

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

-  x86 bugfixes

- Documentation fixes

- Avoid performance regression due to SEV-ES patches

- ARM:
     - Don't allow tagged pointers to point to memslots
     - Filter out ARMv8.1+ PMU events on v8.0 hardware
     - Hide PMU registers from userspace when no PMU is configured
     - More PMU cleanups
     - Don't try to handle broken PSCI firmware
     - More sys_reg() to reg_to_encoding() conversions

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX
  KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"
  KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest
  KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration
  kvm: tracing: Fix unmatched kvm_entry and kvm_exit events
  KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG
  KVM: x86: get smi pending status correctly
  KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]
  KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()
  KVM: x86: Add more protection against undefined behavior in rsvd_bits()
  KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM
  KVM: Forbid the use of tagged userspace addresses for memslots
  KVM: arm64: Filter out v8.1+ events on v8.0 HW
  KVM: arm64: Compute TPIDR_EL2 ignoring MTE tag
  KVM: arm64: Use the reg_to_encoding() macro instead of sys_reg()
  KVM: arm64: Allow PSCI SYSTEM_OFF/RESET to return
  KVM: arm64: Simplify handling of absent PMU system registers
  KVM: arm64: Hide PMU registers from userspace when not available

Merge tag 'spi-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"One new device ID here, plus an error handling fix - nothing
  remarkable in either"

* tag 'spi-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spidev: Add cisco device compatible
  spi: altera: Fix memory leak on error path

Merge tag 'regulator-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"The main thing here is a change to make sure that we don't try to
  double resolve the supply of a regulator if we have two probes going
  on simultaneously, plus an incremental fix on top of that to resolve a
  lockdep issue it introduced.

  There's also a patch from Dmitry Osipenko adding stubs for some
  functions to avoid build issues in consumers in some configurations"

* tag 'regulator-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Fix lockdep warning resolving supplies
  regulator: consumer: Add missing stubs to regulator/consumer.h
  regulator: core: avoid regulator_resolve_supply() race condition

parisc: Remove leftover reference to the power_tasklet

This was removed long ago, back in:

6e16d9409e1 ([PARISC] Convert soft power switch driver to kthread)

Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

i40e: acquire VSI pointer only after VF is initialized

This change simplifies the VF initialization check and also minimizes
the delay between acquiring the VSI pointer and using it. As known by
the commit being fixed, there is a risk of the VSI pointer getting
changed. Therefore minimize the delay between getting and using the
pointer.

Fixes: 9889707b06ac ("i40e: Fix crash caused by stress setting of VF MAC addresses")
Signed-off-by: Stefan Assmann <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Fix MSI-X vector fallback logic

The current MSI-X enablement logic tries to enable best-case MSI-X
vectors and if that fails we only support a bare-minimum set. This
includes a single MSI-X for 1 Tx and 1 Rx queue and a single MSI-X
for the OICR interrupt. Unfortunately, the driver fails to load when we
don't get as many MSI-X as requested for a couple reasons.

First, the code to allocate MSI-X in the driver tries to allocate
num_online_cpus() MSI-X for LAN traffic without caring about the number
of MSI-X actually enabled/requested from the kernel for LAN traffic.
So, when calling ice_get_res() for the PF VSI, it returns failure
because the number of available vectors is less than requested. Fix
this by not allowing the PF VSI to allocation more than
pf->num_lan_msix MSI-X vectors and pf->num_lan_msix Rx/Tx queues.
Limiting the number of queues is done because we don't want more than
1 Tx/Rx queue per interrupt due to performance conerns.

Second, the driver assigns pf->num_lan_msix = 2, to account for LAN
traffic and the OICR. However, pf->num_lan_msix is only meant for LAN
MSI-X. This is causing a failure when the PF VSI tries to
allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has
already been reserved, so there may not be enough MSI-X vectors left.
Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the
ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for
the failure case.

Update the related defines used in ice_ena_msix_range() to align with
the above behavior and remove the unused RDMA defines because RDMA is
currently not supported. Also, remove the now incorrect comment.

Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Don't allow more channels than LAN MSI-X available

Currently users could create more channels than LAN MSI-X available.
This is happening because there is no check against pf->num_lan_msix
when checking the max allowed channels and will cause performance issues
if multiple Tx and Rx queues are tied to a single MSI-X. Fix this by not
allowing more channels than LAN MSI-X available in pf->num_lan_msix.

Fixes: 87324e747fde ("ice: Implement ethtool ops for channels")
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: update dev_addr in ice_set_mac_address even if HW filter exists

Fix the driver to copy the MAC address configured in ndo_set_mac_address
into dev_addr, even if the MAC filter already exists in HW. In some
situations (e.g. bonding) the netdev's dev_addr could have been modified
outside of the driver, with no change to the HW filter, so the driver
cannot assume that they match.

Fixes: 757976ab16be ("ice: Fix check for removing/adding mac filters")
Signed-off-by: Nick Nunley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Implement flow for IPv6 next header (extension header)

This patch is based on a similar change to i40e by Slawomir Laba:
"i40e: Implement flow for IPv6 next header (extension header)".

When a packet contains an IPv6 header with next header which is
an extension header and not a protocol one, the kernel function
skb_transport_header called with such sk_buff will return a
pointer to the extension header and not to the TCP one.

The above explained call caused a problem with packet processing
for skb with encapsulation for tunnel with ICE_TX_CTX_EIPT_IPV6.
The extension header was not skipped at all.

The ipv6_skip_exthdr function does check if next header of the IPV6
header is an extension header and doesn't modify the l4_proto pointer
if it points to a protocol header value so its safe to omit the
comparison of exthdr and l4.hdr pointers. The ipv6_skip_exthdr can
return value -1. This means that the skipping process failed
and there is something wrong with the packet so it will be dropped.

Fixes: a4e82a81f573 ("ice: Add support for tunnel offloads")
Signed-off-by: Nick Nunley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: fix FDir IPv6 flexbyte

The packet classifier would occasionally misrecognize an IPv6 training
packet when the next protocol field was 0. The correct value for
unspecified protocol is IPPROTO_NONE.

Fixes: 165d80d6adab ("ice: Support IPv6 Flow Director filters")
Signed-off-by: Henry Tieman <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

Revert "mm: fix initialization of struct page for holes in memory layout"

This reverts commit d3921cb8be29ce5668c64e23ffdaeec5f8c69399.

Chris Wilson reports that it causes boot problems:

"We have half a dozen or so different machines in CI that are silently
failing to boot, that we believe is bisected to this patch"

and the CI team confirmed that a revert fixed the issues.

The cause is unknown for now, so let's revert it.

Link: https://lore.kernel.org/lkml/[email protected]/
Reported-and-tested-by: Chris Wilson <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kconfig: mconf: fix HOSTCC call

Commit c0f975af1745 ("kconfig: Support building mconf with vendor
sysroot ncurses") introduces a bug when HOSTCC contains parameters:
the whole command line is treated as the program name (with spaces
in it). Therefore, we have to remove the quotes.

Fixes: c0f975af1745 ("kconfig: Support building mconf with vendor sysroot ncurses")
Signed-off-by: Enrico Weigelt, metux IT consult <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

media: hantro: Fix reset_raw_fmt initialization

raw_fmt->height in never initialized. But width in initialized twice.

Fixes: 88d06362d1d05 ("media: hantro: Refactor for V4L2 API spec compliancy")
Signed-off-by: Ricardo Ribalda <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Cc: <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: cec: add stm32 driver

Missing stm32 directory to Makefile.

Signed-off-by: Yannick Fertre <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Fixes: 4be5e8648b0c ("media: move CEC platform drivers to a separate directory")
Cc: <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: cedrus: Fix H264 decoding

During H264 API overhaul subtle bug was introduced Cedrus driver.
Progressive references have both, top and bottom reference flags set.
Cedrus reference list expects only bottom reference flag and only when
interlaced frames are decoded. However, due to a bug in Cedrus check,
exclusivity is not tested and that flag is set also for progressive
references. That causes "jumpy" background with many videos.

Fix that by checking that only bottom reference flag is set in control
and nothing else.

Tested-by: Andre Heider <[email protected]>
Fixes: cfc8c3ed533e ("media: cedrus: h264: Properly configure reference field")
Signed-off-by: Jernej Skrabec <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Cc: <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: v4l2-subdev.h: BIT() is not available in userspace

The BIT macro is not available in userspace, so replace BIT(0) by
0x00000001.

Signed-off-by: Hans Verkuil <[email protected]>
Fixes: 6446ec6cbf46 ("media: v4l2-subdev: add VIDIOC_SUBDEV_QUERYCAP ioctl")
Cc: <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

arm64: Fix kernel address detection of __is_lm_address()

Currently, the __is_lm_address() check just masks out the top 12 bits
of the address, but if they are 0, it still yields a true result.
This has as a side effect that virt_addr_valid() returns true even for
invalid virtual addresses (e.g. 0x0).

Fix the detection checking that it's actually a kernel address starting
at PAGE_OFFSET.

Fixes: 68dd8ef32162 ("arm64: memory: Fix virt_addr_valid() using __is_lm_address()")
Cc: <[email protected]> # 5.4.x
Cc: Will Deacon <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

ALSA: hda/via: Apply the workaround generically for Clevo machines

We've got another report indicating a similar problem wrt the
power-saving behavior with VIA codec on Clevo machines. Let's apply
the existing workaround generically to all Clevo devices with VIA
codecs to cover all in once.

BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1181330
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

bpf: Drop disabled LSM hooks from the sleepable set

Some networking and keys LSM hooks are conditionally enabled
and when building the new sleepable BPF LSM hooks with those
LSM hooks disabled, the following build error occurs:

BTFIDS vmlinux
FAILED unresolved symbol bpf_lsm_socket_socketpair

To fix the error, conditionally add the relevant networking/keys
LSM hooks to the sleepable set.

Fixes: 423f16108c9d8 ("bpf: Augment the set of sleepable LSM hooks")
Signed-off-by: Mikko Ylinen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: KP Singh <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Revert "arm64: dts: amlogic: add missing ethernet reset ID"

It has been reported on IRC and in KernelCI boot tests, this change breaks
internal PHY support on the Amlogic G12A/SM1 Based boards.

We suspect the added signal to reset more than the Ethernet MAC but also
the MDIO/(RG)MII mux used to redirect the MAC signals to the internal PHY.

This reverts commit f3362f0c18174a1f334a419ab7d567a36bd1b3f3 while we find
and acceptable solution to cleanly reset the Ethernet MAC.

Reported-by: Corentin Labbe <[email protected]>
Acked-by: Jérôme Brunet <[email protected]>
Signed-off-by: Neil Armstrong <[email protected]>
Signed-off-by: Kevin Hilman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE

do not call blocking ops when !TASK_RUNNING; state=2 set at
[<00000000ced9dbfc>] prepare_to_wait+0x1f4/0x3b0
kernel/sched/wait.c:262
WARNING: CPU: 1 PID: 19888 at kernel/sched/core.c:7853
__might_sleep+0xed/0x100 kernel/sched/core.c:7848
RIP: 0010:__might_sleep+0xed/0x100 kernel/sched/core.c:7848
Call Trace:
__mutex_lock_common+0xc4/0x2ef0 kernel/locking/mutex.c:935
__mutex_lock kernel/locking/mutex.c:1103 [inline]
mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:1118
io_wq_submit_work+0x39a/0x720 fs/io_uring.c:6411
io_run_cancel fs/io-wq.c:856 [inline]
io_wqe_cancel_pending_work fs/io-wq.c:990 [inline]
io_wq_cancel_cb+0x614/0xcb0 fs/io-wq.c:1027
io_uring_cancel_files fs/io_uring.c:8874 [inline]
io_uring_cancel_task_requests fs/io_uring.c:8952 [inline]
__io_uring_files_cancel+0x115d/0x19e0 fs/io_uring.c:9038
io_uring_files_cancel include/linux/io_uring.h:51 [inline]
do_exit+0x2e6/0x2490 kernel/exit.c:780
do_group_exit+0x168/0x2d0 kernel/exit.c:922
get_signal+0x16b5/0x2030 kernel/signal.c:2770
arch_do_signal_or_restart+0x8e/0x6a0 arch/x86/kernel/signal.c:811
handle_signal_work kernel/entry/common.c:147 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x48/0x190 kernel/entry/common.c:302
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Rewrite io_uring_cancel_files() to mimic __io_uring_task_cancel()'s
counting scheme, so it does all the heavy work before setting
TASK_UNINTERRUPTIBLE.

Cc: [email protected] # 5.9+
Reported-by: [email protected]
Signed-off-by: Pavel Begunkov <[email protected]>
[axboe: fix inverted task check]
Signed-off-by: Jens Axboe <[email protected]>

io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE

If the tctx inflight number haven't changed because of cancellation,
__io_uring_task_cancel() will continue leaving the task in
TASK_UNINTERRUPTIBLE state, that's not expected by
__io_uring_files_cancel(). Ensure we always call finish_wait() before
retrying.

Cc: [email protected] # 5.9+
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

usb: xhci-mtk: fix unreleased bandwidth data

xhci-mtk needs XHCI_MTK_HOST quirk functions in add_endpoint() and
drop_endpoint() to handle its own sw bandwidth management.

It stores bandwidth data into an internal table every time
add_endpoint() is called, and drops those in drop_endpoint().
But when bandwidth allocation fails at one endpoint, all earlier
allocation from the same interface could still remain at the table.

This patch moves bandwidth management codes to check_bandwidth() and
reset_bandwidth() path. To do so, this patch also adds those functions
to xhci_driver_overrides and lets mtk-xhci to release all failed
endpoints in reset_bandwidth() path.

Fixes: 08e469de87a2 ("usb: xhci-mtk: supports bandwidth scheduling with multi-TT")
Signed-off-by: Ikjoon Jang <[email protected]>
Link: https://lore.kernel.org/r/20210113180444.v6.1.Id0d31b5f3ddf5e734d2ab11161ac5821921b1e1e@changeid
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: aspeed: add missing of_node_put

Breaking out of for_each_child_of_node requires a put on the
child value.

Generated by: scripts/coccinelle/iterators/for_each_child.cocci

Fixes: 82c2d81361ec ("coccinelle: iterators: Add for_each_child.cocci script")
CC: Sumera Priyadarsini <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: kernel test robot <[email protected]>
Signed-off-by: Julia Lawall <[email protected]>
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2101211907060.14700@hadrien
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: usblp: don't call usb_set_interface if there's a single alt

Some devices, such as the Winbond Electronics Corp. Virtual Com Port
(Vendor=0416, ProdId=5011), lockup when usb_set_interface() or
usb_clear_halt() are called. This device has only a single
altsetting, so it should not be necessary to call usb_set_interface().

Acked-by: Pete Zaitcev <[email protected]>
Signed-off-by: Jeremy Figgins <[email protected]>
Link: https://lore.kernel.org/r/YAy9kJhM/rG8EQXC@watson
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

futex: Handle faults correctly for PI futexes

fixup_pi_state_owner() tries to ensure that the state of the rtmutex,
pi_state and the user space value related to the PI futex are consistent
before returning to user space. In case that the user space value update
faults and the fault cannot be resolved by faulting the page in via
fault_in_user_writeable() the function returns with -EFAULT and leaves
the rtmutex and pi_state owner state inconsistent.

A subsequent futex_unlock_pi() operates on the inconsistent pi_state and
releases the rtmutex despite not owning it which can corrupt the RB tree of
the rtmutex and cause a subsequent kernel stack use after free.

It was suggested to loop forever in fixup_pi_state_owner() if the fault
cannot be resolved, but that results in runaway tasks which is especially
undesired when the problem happens due to a programming error and not due
to malice.

As the user space value cannot be fixed up, the proper solution is to make
the rtmutex and the pi_state consistent so both have the same owner. This
leaves the user space value out of sync. Any subsequent operation on the
futex will fail because the 10th rule of PI futexes (pi_state owner and
user space value are consistent) has been violated.

As a consequence this removes the inept attempts of 'fixing' the situation
in case that the current task owns the rtmutex when returning with an
unresolvable fault by unlocking the rtmutex which left pi_state::owner and
rtmutex::owner out of sync in a different and only slightly less dangerous
way.

Fixes: 1b7558e457ed ("futexes: fix fault handling in futex_lock_pi")
Reported-by: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Simplify fixup_pi_state_owner()

Too many gotos already and an upcoming fix would make it even more
unreadable.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Use pi_state_update_owner() in put_pi_state()

No point in open coding it. This way it gains the extra sanity checks.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

rtmutex: Remove unused argument from rt_mutex_proxy_unlock()

Nothing uses the argument. Remove it as preparation to use
pi_state_update_owner().

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Provide and use pi_state_update_owner()

Updating pi_state::owner is done at several places with the same
code. Provide a function for it and use that at the obvious places.

This is also a preparation for a bug fix to avoid yet another copy of the
same code or alternatively introducing a completely unpenetratable mess of
gotos.

Originally-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Replace pointless printk in fixup_owner()

If that unexpected case of inconsistent arguments ever happens then the
futex state is left completely inconsistent and the printk is not really
helpful. Replace it with a warning and make the state consistent.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Ensure the correct return value from futex_lock_pi()

In case that futex_lock_pi() was aborted by a signal or a timeout and the
task returned without acquiring the rtmutex, but is the designated owner of
the futex due to a concurrent futex_unlock_pi() fixup_owner() is invoked to
establish consistent state. In that case it invokes fixup_pi_state_owner()
which in turn tries to acquire the rtmutex again. If that succeeds then it
does not propagate this success to fixup_owner() and futex_lock_pi()
returns -EINTR or -ETIMEOUT despite having the futex locked.

Return success from fixup_pi_state_owner() in all cases where the current
task owns the rtmutex and therefore the futex and propagate it correctly
through fixup_owner(). Fixup the other callsite which does not expect a
positive return value.

Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

drm/i915/gt: Always try to reserve GGTT address 0x0

Since writing to address 0 is a very common mistake, let's try to avoid
putting anything sensitive there.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/2989
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Cc: [email protected]
(cherry picked from commit 56b429cc584c6ed8b895d8d8540959655db1ff73)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Always flush the active worker before returning from the wait

The first thing the active retirement worker does is decrement the
i915_active count.

The first thing we do during i915_active_wait is try to increment the
i915_active count, but only if already active [non-zero].

The wait may see that the retirement is already started and so marked the
i915_active as idle, and skip waiting for the retirement handler.
However, the caller of i915_active_wait may immediately free the
i915_active upon returning (e.g. i915_vma_destroy) so we must not return
before the concurrent access from the worker is completed. We must
always flush the worker.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2473
Fixes: 274cbf20fd10 ("drm/i915: Push the i915_active.retire into a worker")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: <[email protected]> # v5.5+
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 977a372e972cb42799746c284035a33c64ebace9)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/selftest: Fix potential memory leak

Object out is not released on path that no VMA instance found. The root
cause is jumping to an unexpected label on the error path.

Fixes: a47e788c2310 ("drm/i915/selftests: Exercise CS TLB invalidation")
Signed-off-by: Pan Bian <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 2b015017d5cb01477a79ca184ac25c247d664568)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Check for all subplatform bits

Current code is checking only 2 bits in the subplatform, but actually 3
bits are allocated for the field. Check all 3 bits.

Fixes: 805446c8347c ("drm/i915: Introduce concept of a sub-platform")
Cc: Tvrtko Ursulin <[email protected]>
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 27b695ee1af9bb36605e67055874ec081306ac28)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Fix ICL MG PHY vswing handling

The MH PHY vswing table does have all the entries these days. Get
rid of the old hacks in the code which claim otherwise.

This hack was totally bogus anyway. The correct way to handle the
lack of those two entries would have been to declare our max
vswing and pre-emph to both be level 2.

Cc: José Roberto de Souza <[email protected]>
Cc: Clinton Taylor <[email protected]>
Fixes: 9f7ffa297978 ("drm/i915/tc/icl: Update TC vswing tables")
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Imre Deak <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
(cherry picked from commit 5ec346476e795089b7dac8ab9dcee30c8d80ad84)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/gt: Clear CACHE_MODE prior to clearing residuals

Since we do a bare context switch with no restore, the clear residual
kernel runs on dirty state, and we must be careful to avoid executing
with bad state from context registers inherited from a malicious client.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2955
Fixes: 09aa9e45863e ("drm/i915/gt: Restore clear-residual mitigations for Ivybridge, Baytrail")
Testcase: igt/gem_ctx_isolation # ivb,vlv
Signed-off-by: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Akeem G Abodunrin <[email protected]>
Reviewed-by: Akeem G Abodunrin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit ace44e13e577c2ae59980e9a6ff5ca253b1cf831)
Signed-off-by: Jani Nikula <[email protected]>

Merge tag 'asoc-fix-v5.11-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.11

More fixes for v5.11, almost all driver specific issues including new
device IDs - there's one error handling fix for the topology stuff too.

staging: rtl8723bs: fix wireless regulatory API misuse

This code ends up calling wiphy_apply_custom_regulatory(), for which
we document that it should be called before wiphy_register(). This
driver doesn't do that, but calls it from ndo_open() with the RTNL
held, which caused deadlocks.

Since the driver just registers static regdomain data and then the
notifier applies the channel changes if any, there's no reason for
it to call this in ndo_open(), move it earlier to fix the deadlock.

Reported-and-tested-by: Hans de Goede <[email protected]>
Fixes: 51d62f2f2c50 ("cfg80211: Save the regulatory domain with a lock")
Link: https://lore.kernel.org/r/20210126115409.d5fd6f8fe042.Ib5823a6feb2e2aa01ca1a565d2505367f38ad246@changeid
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

mac80211: pause TX while changing interface type

syzbot reported a crash that happened when changing the interface
type around a lot, and while it might have been easy to fix just
the symptom there, a little deeper investigation found that really
the reason is that we allowed packets to be transmitted while in
the middle of changing the interface type.

Disallow TX by stopping the queues while changing the type.

Fixes: 34d4bc4d41d2 ("mac80211: support runtime interface type changes")
Reported-by: [email protected]
Link: https://lore.kernel.org/r/20210122171115.b321f98f4d4f.I6997841933c17b093535c31d29355be3c0c39628@changeid
Signed-off-by: Johannes Berg <[email protected]>

wext: fix NULL-ptr-dereference with cfg80211's lack of commit()

Since cfg80211 doesn't implement commit, we never really cared about
that code there (and it's configured out w/o CONFIG_WIRELESS_EXT).
After all, since it has no commit, it shouldn't return -EIWCOMMIT to
indicate commit is needed.

However, EIWCOMMIT is actually an alias for EINPROGRESS, which _can_
happen if e.g. we try to change the frequency but we're already in
the process of connecting to some network, and drivers could return
that value (or even cfg80211 itself might).

This then causes us to crash because dev->wireless_handlers is NULL
but we try to check dev->wireless_handlers->standard[0].

Fix this by also checking dev->wireless_handlers. Also simplify the
code a little bit.

Cc: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Link: https://lore.kernel.org/r/20210121171621.2076e4a37d5a.I5d9c72220fe7bb133fb718751da0180a57ecba4e@changeid
Signed-off-by: Johannes Berg <[email protected]>

HID: wacom: Correct NULL dereference on AES pen proximity

The recent commit to fix a memory leak introduced an inadvertant NULL
pointer dereference. The `wacom_wac->pen_fifo` variable was never
intialized, resuling in a crash whenever functions tried to use it.
Since the FIFO is only used by AES pens (to buffer events from pen
proximity until the hardware reports the pen serial number) this would
have been easily overlooked without testing an AES device.

This patch converts `wacom_wac->pen_fifo` over to a pointer (since the
call to `devres_alloc` allocates memory for us) and ensures that we assign
it to point to the allocated and initalized `pen_fifo` before the function
returns.

Link: https://github.com/linuxwacom/input-wacom/issues/230
Fixes: 37309f47e2f5 ("HID: wacom: Fix memory leakage caused by kfifo_alloc")
CC: [email protected] # v4.19+
Signed-off-by: Jason Gerecke <[email protected]>
Tested-by: Ping Cheng <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

xen-blkfront: allow discard-* nodes to be optional

This is inline with the specification described in blkif.h:

* discard-granularity: should be set to the physical block size if
node is not present.
* discard-alignment, discard-secure: should be set to 0 if node not
present.

This was detected as QEMU would only create the discard-granularity
node but not discard-alignment, and thus the setup done in
blkfront_setup_discard would fail.

Fix blkfront_setup_discard to not fail on missing nodes, and also fix
blkif_set_queue_limits to set the discard granularity to the physical
block size if none is specified in xenbus.

Fixes: ed30bf317c5ce ('xen-blkfront: Handle discard requests.')
Reported-by: Arthur Borsboom <[email protected]>
Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Tested-By: Arthur Borsboom <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

ecryptfs: fix uid translation for setxattr on security.capability

Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
vfs_setxattr()") the translation of nscap->rootid did not take stacked
filesystems (overlayfs and ecryptfs) into account.

That patch fixed the overlay case, but made the ecryptfs case worse.

Restore old the behavior for ecryptfs that existed before the overlayfs
fix. This does not fix ecryptfs's handling of complex user namespace
setups, but it does make sure existing setups don't regress.

Reported-by: Eric W. Biederman <[email protected]>
Cc: Tyler Hicks <[email protected]>
Fixes: 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into vfs_setxattr()")
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Tyler Hicks <[email protected]>

KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX

VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
which may need to be loaded outside guest mode. Therefore we cannot
WARN in that case.

However, that part of nested_get_vmcs12_pages is _not_ needed at
vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
so that both vmentry and migration (and in the latter case, independent
of is_guest_mode) do the parts that are needed.

Cc: <[email protected]> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
Cc: <[email protected]> # 5.10.x
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"

Revert the dirty/available tracking of GPRs now that KVM copies the GPRs
to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty. Per
commit de3cd117ed2f ("KVM: x86: Omit caching logic for always-available
GPRs"), tracking for GPRs noticeably impacts KVM's code footprint.

This reverts commit 1c04d8c986567c27c56c05205dceadc92efb14ff.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210122235049.3107620 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest

Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the
GRPs' dirty bits are set from time zero and never cleared, i.e. will
always be seen as dirty.  The obvious alternative would be to clear
the dirty bits when appropriate, but removing the dirty checks is
desirable as it allows reverting GPR dirty+available tracking, which
adds overhead to all flavors of x86 VMs.

Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
by the GHCB spec, which allows the hypervisor (or guest) to provide
unnecessary info; it's the guest's responsibility to consume only what
it needs (the hypervisor is untrusted after all).

  The guest and hypervisor can supply additional state if desired but
  must not rely on that additional state being provided.

Cc: Brijesh Singh <[email protected]>
Cc: Tom Lendacky <[email protected]>
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210122235049.3107620 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration

Even when we are outside the nested guest, some vmcs02 fields
may not be in sync vs vmcs12. This is intentional, even across
nested VM-exit, because the sync can be delayed until the nested
hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those
rarely accessed fields.

However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to
be able to restore it. To fix that, call copy_vmcs02_to_vmcs12_rare()
before the vmcs12 contents are copied to userspace.

Fixes: 7952d769c29ca ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <20210114205449 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: tracing: Fix unmatched kvm_entry and kvm_exit events

On VMX, if we exit and then re-enter immediately without leaving
the vmx_vcpu_run() function, the kvm_entry event is not logged.
That means we will see one (or more) kvm_exit, without its (their)
corresponding kvm_entry, as shown here:

CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
CPU-1979 [002] 89.871218: kvm_exit:  reason MSR_WRITE
CPU-1979 [002] 89.871259: kvm_exit:  reason MSR_WRITE

It also seems possible for a kvm_entry event to be logged, but then
we leave vmx_vcpu_run() right away (if vmx->emulation_required is
true). In this case, we will have a spurious kvm_entry event in the
trace.

Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
(where trace_kvm_exit() already is).

A trace obtained with this patch applied looks like this:

CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
CPU-14295 [000] 8388.395392: kvm_exit:  reason MSR_WRITE
CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
CPU-14295 [000] 8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT

Of course, not calling trace_kvm_entry() in common x86 code any
longer means that we need to adjust the SVM side of things too.

Signed-off-by: Lorenzo Brescia <[email protected]>
Signed-off-by: Dario Faggioli <[email protected]>
Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG

Update various words, including the wrong parameter name and the vague
description of the usage of "slot" field.

Signed-off-by: Zenghui Yu <[email protected]>
Message-Id: <20201208043439 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: get smi pending status correctly

The injection process of smi has two steps:

    Qemu                        KVM
Step1:
    cpu->interrupt_request &= \
        ~CPU_INTERRUPT_SMI;
    kvm_vcpu_ioctl(cpu, KVM_SMI)

                                call kvm_vcpu_ioctl_smi() and
                                kvm_make_request(KVM_REQ_SMI, vcpu);

Step2:
    kvm_vcpu_ioctl(cpu, KVM_RUN, 0)

                                call process_smi() if
                                kvm_check_request(KVM_REQ_SMI, vcpu) is
                                true, mark vcpu->arch.smi_pending = true;

The vcpu->arch.smi_pending will be set true in step2, unfortunately if
vcpu paused between step1 and step2, the kvm_run->immediate_exit will be
set and vcpu has to exit to Qemu immediately during step2 before mark
vcpu->arch.smi_pending true.
During VM migration, Qemu will get the smi pending status from KVM using
KVM_GET_VCPU_EVENTS ioctl at the downtime, then the smi pending status
will be lost.

Signed-off-by: Jay Zhou <[email protected]>
Signed-off-by: Shengen Zhuang <[email protected]>
Message-Id: <20210118084720 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]

The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as
0x0300 in the intel_perfmon_event_map[]. Correct its usage.

Fixes: 62079d8a4312 ("KVM: PMU: add proper support for fixed counter 2")
Signed-off-by: Like Xu <[email protected]>
Message-Id: <20201230081916 [email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()

Since we know vPMU will not work properly when (1) the guest bit_width(s)
of the [gp|fixed] counters are greater than the host ones, or (2) guest
requested architectural events exceeds the range supported by the host, so
we can setup a smaller left shift value and refresh the guest cpuid entry,
thus fixing the following UBSAN shift-out-of-bounds warning:

shift exponent 197 is too large for 64-bit type 'long long unsigned int'

Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
vfs_ioctl fs/ioctl.c:48 [inline]
__do_sys_ioctl fs/ioctl.c:753 [inline]
__se_sys_ioctl fs/ioctl.c:739 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: [email protected]
Signed-off-by: Like Xu <[email protected]>
Message-Id: <20210118025800 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Add more protection against undefined behavior in rsvd_bits()

Add compile-time asserts in rsvd_bits() to guard against KVM passing in
garbage hardcoded values, and cap the upper bound at '63' for dynamic
values to prevent generating a mask that would overflow a u64.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210113204515.3473079 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM

The documentation classifies KVM_ENABLE_CAP with KVM_CAP_ENABLE_CAP_VM
as a vcpu ioctl, which is incorrect. Fix it by specifying it as a VM
ioctl.

Fixes: e5d83c74a580 ("kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic")
Signed-off-by: Quentin Perret <[email protected]>
Message-Id: <20210108165349 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'kvmarm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.11, take #2

- Don't allow tagged pointers to point to memslots
- Filter out ARMv8.1+ PMU events on v8.0 hardware
- Hide PMU registers from userspace when no PMU is configured
- More PMU cleanups
- Don't try to handle broken PSCI firmware
- More sys_reg() to reg_to_encoding() conversions

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"Fix a regression in the cesa driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvel/cesa - Fix tdma descriptor on 64-bit

uapi: fix big endian definition of ipv6_rpl_sr_hdr

Following RFC 6554 [1], the current order of fields is wrong for big
endian definition. Indeed, here is how the header looks like:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |  Hdr Ext Len  | Routing Type  | Segments Left |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CmprI | CmprE |  Pad  |               Reserved                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This patch reorders fields so that big endian definition is now correct.

  [1] https://tools.ietf.org/html/rfc6554#section-3

Fixes: cfa933d938d8 ("include: uapi: linux: add rpl sr header definition")
Signed-off-by: Justin Iurman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: usb: j721e: add ranges and dma-coherent props

Add missed 'ranges' and 'dma-coherent' properties as cdns-usb DT nodes has
child node and DMA IO is coherent on TI K3 J721E/J7200 SoCs.

This also fixes dtbs_check warning:
cdns-usb@4104000: 'dma-coherent', 'ranges' do not match any of the regexes: '^usb@', 'pinctrl-[0-9]+'

Signed-off-by: Grygorii Strashko <[email protected]>
Acked-by: Aswath Govindraju <[email protected]>
Reviewed-by: Aswath Govindraju <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

arm64: dts: rockchip: Disable display for NanoPi R2S

NanoPi R2S is headless, so rightly does not enable any of the display
interface hardware, which currently provokes an obnoxious error in the
boot log from the fake DRM device failing to find anything to bind to.
It probably isn't *too* hard to obviate the fake device shenanigans
entirely with a bit of driver reshuffling, but for now let's just
disable it here to shut up the spurious error.

Signed-off-by: Robin Murphy <[email protected]>
Link: https://lore.kernel.org/r/c4553dfad1ad6792c4f22454c135ff55de77e2d6.1611186099.git.robin.murphy@arm.com
Signed-off-by: Heiko Stuebner <[email protected]>

doc: gcc-plugins: update gcc-plugins.rst

This document was written a long time ago. Update it.

[1] Drop the version information

The range of the supported GCC versions are always changing. The
current minimal GCC version is 4.9, and commit 1e860048c53e
("gcc-plugins: simplify GCC plugin-dev capability test") removed the
old code accordingly.

We do not need to mention specific version ranges like "all gcc versions
from 4.5 to 6.0" since we forget to update the documentation when we
raise the minimal compiler version.

[2] Drop the C compiler statements

Since commit 77342a02ff6e ("gcc-plugins: drop support for GCC <= 4.7")
the GCC plugin infrastructure only supports g++.

[3] Drop supported architectures

As of v5.11-rc4, the infrastructure supports more architectures;
arm, arm64, mips, powerpc, riscv, s390, um, and x86. (just grep
"select HAVE_GCC_PLUGINS") Again, we miss to update this document when a
new architecture is supported. Let's just say "only some architectures".

[4] Update the apt-get example

We are now discussing to bump the minimal version to GCC 5. The GCC 4.9
support will be removed sooner or later. Change the package example to
gcc-10-plugin-dev while we are here.

[5] Update the build target

Since commit ce2fd53a10c7 ("kbuild: descend into scripts/gcc-plugins/
via scripts/Makefile"), "make gcc-plugins" is not supported.
"make scripts" builds all the enabled plugins, including some other
tools.

[6] Update the steps for adding a new plugin

At first, all CONFIG options for GCC plugins were located in arch/Kconfig.
After commit 45332b1bdfdc ("gcc-plugins: split out Kconfig entries to
scripts/gcc-plugins/Kconfig"), scripts/gcc-plugins/Kconfig became the
central place to collect plugin CONFIG options. In my understanding,
this requirement no longer exists because commit 9f671e58159a ("security:
Create "kernel hardening" config area") moved some of plugin CONFIG
options to another file. Find an appropriate place to add the new CONFIG.

The sub-directory support was never used by anyone, and removed by
commit c17d6179ad5a ("gcc-plugins: remove unused GCC_PLUGIN_SUBDIR").

Remove the useless $(src)/ prefix.

Signed-off-by: Masahiro Yamada <[email protected]>

platform/x86: hp-wmi: Disable tablet-mode reporting by default

Recently userspace has started making more use of SW_TABLET_MODE
(when an input-dev reports this).

Specifically recent GNOME3 versions will:

1.  When SW_TABLET_MODE is reported and is reporting 0:
1.1 Disable accelerometer-based screen auto-rotation
1.2 Disable automatically showing the on-screen keyboard when a
    text-input field is focussed

2.  When SW_TABLET_MODE is reported and is reporting 1:
2.1 Ignore input-events from the builtin keyboard and touchpad
    (this is for 360° hinges style 2-in-1s where the keyboard and
     touchpads are accessible on the back of the tablet when folded
     into tablet-mode)

This means that claiming to support SW_TABLET_MODE when it does not
actually work / reports correct values has bad side-effects.

The check in the hp-wmi code which is used to decide if the input-dev
should claim SW_TABLET_MODE support, only checks if the
HPWMI_HARDWARE_QUERY is supported. It does *not* check if the hardware
actually is capable of reporting SW_TABLET_MODE.

This leads to the hp-wmi input-dev claiming SW_TABLET_MODE support,
while in reality it will always report 0 as SW_TABLET_MODE value.
This has been seen on a "HP ENVY x360 Convertible 15-cp0xxx" and
this likely is the case on a whole lot of other HP models.

This problem causes both auto-rotation and on-screen keyboard
support to not work on affected x360 models.

There is no easy fix for this, but since userspace expects
SW_TABLET_MODE reporting to be reliable when advertised it is
better to not claim/report SW_TABLET_MODE support at all, then
to claim to support it while it does not work.

To avoid the mentioned problems, add a new enable_tablet_mode_sw
module-parameter which defaults to false.

Note I've made this an int using the standard -1=auto, 0=off, 1=on
triplett, with the hope that in the future we can come up with a
better way to detect SW_TABLET_MODE support. ATM the default
auto option just does the same as off.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1918255
Cc: Stefan Brüns <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Mark Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

SUNRPC: Handle 0 length opaque XDR object data properly

When handling an auth_gss downcall, it's possible to get 0-length
opaque object for the acceptor.  In the case of a 0-length XDR
object, make sure simple_get_netobj() fills in dest->data = NULL,
and does not continue to kmemdup() which will set
dest->data = ZERO_SIZE_PTR for the acceptor.

The trace event code can handle NULL but not ZERO_SIZE_PTR for a
string, and so without this patch the rpcgss_context trace event
will crash the kernel as follows:

[  162.887992] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  162.898693] #PF: supervisor read access in kernel mode
[  162.900830] #PF: error_code(0x0000) - not-present page
[  162.902940] PGD 0 P4D 0
[  162.904027] Oops: 0000 [#1] SMP PTI
[  162.905493] CPU: 4 PID: 4321 Comm: rpc.gssd Kdump: loaded Not tainted 5.10.0 #133
[  162.908548] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  162.910978] RIP: 0010:strlen+0x0/0x20
[  162.912505] Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[  162.920101] RSP: 0018:ffffaec900c77d90 EFLAGS: 00010202
[  162.922263] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fffde697
[  162.925158] RDX: 000000000000002f RSI: 0000000000000080 RDI: 0000000000000010
[  162.928073] RBP: 0000000000000010 R08: 0000000000000e10 R09: 0000000000000000
[  162.930976] R10: ffff8e698a590cb8 R11: 0000000000000001 R12: 0000000000000e10
[  162.933883] R13: 00000000fffde697 R14: 000000010034d517 R15: 0000000000070028
[  162.936777] FS:  00007f1e1eb93700(0000) GS:ffff8e6ab7d00000(0000) knlGS:0000000000000000
[  162.940067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  162.942417] CR2: 0000000000000010 CR3: 0000000104eba000 CR4: 00000000000406e0
[  162.945300] Call Trace:
[  162.946428]  trace_event_raw_event_rpcgss_context+0x84/0x140 [auth_rpcgss]
[  162.949308]  ? __kmalloc_track_caller+0x35/0x5a0
[  162.951224]  ? gss_pipe_downcall+0x3a3/0x6a0 [auth_rpcgss]
[  162.953484]  gss_pipe_downcall+0x585/0x6a0 [auth_rpcgss]
[  162.955953]  rpc_pipe_write+0x58/0x70 [sunrpc]
[  162.957849]  vfs_write+0xcb/0x2c0
[  162.959264]  ksys_write+0x68/0xe0
[  162.960706]  do_syscall_64+0x33/0x40
[  162.962238]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  162.964346] RIP: 0033:0x7f1e1f1e57df

Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

SUNRPC: Move simple_get_bytes and simple_get_netobj into private header

Remove duplicated helper functions to parse opaque XDR objects
and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h.
In the new file carry the license and copyright from the source file
net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside
include/linux/sunrpc/xdr.h since lockd is not the only user of
struct xdr_netobj.

Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

fs/pipe: allow sendfile() to pipe again

After commit 36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops") sendfile() could no longer send data
from a real file to a pipe, breaking for example certain cgit
setups (e.g. when running behind fcgiwrap), because in this
case cgit will try to do exactly this: sendfile() to a pipe.

Fix this by using iter_file_splice_write for the splice_write
method of pipes, as suggested by Christoph.

Cc: [email protected]
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Suggested-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Johannes Berg <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>