Git Repo - linux.git/log

s390: uv: Fix sysfs max number of VCPUs reporting

The number reported by the query is N-1 and I think people reading the
sysfs file would expect N instead. For users creating VMs there's no
actual difference because KVM's limit is currently below the UV's
limit.

Signed-off-by: Janosch Frank <[email protected]>
Fixes: a0f60f8431999 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information")
Cc: [email protected]
Reviewed-by: Claudio Imbrenda <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/vfio-ap: No need to disable IRQ after queue reset

The queues assigned to a matrix mediated device are currently reset when:

* The VFIO_DEVICE_RESET ioctl is invoked
* The mdev fd is closed by userspace (QEMU)
* The mdev is removed from sysfs.

Immediately after the reset of a queue, a call is made to disable
interrupts for the queue. This is entirely unnecessary because the reset of
a queue disables interrupts, so this will be removed.

Furthermore, vfio_ap_irq_disable() does an unconditional PQAP/AQIC which
can result in a specification exception (when the corresponding facility
is not available), so this is actually a bugfix.

Signed-off-by: Tony Krowiak <[email protected]>
[[email protected]: minor rework before merging]
Signed-off-by: Halil Pasic <[email protected]>
Fixes: ec89b55e3bce ("s390: ap: implement PAPQ AQIC interception in kernel")
Cc: <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

The vfio_ap device driver registers a group notifier with VFIO when the
file descriptor for a VFIO mediated device for a KVM guest is opened to
receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
event). When the KVM pointer is set, the vfio_ap driver takes the
following actions:
1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
   of the mediated device.
2. Calls the kvm_get_kvm() function to increment its reference counter.
3. Sets the function pointer to the function that handles interception of
   the instruction that enables/disables interrupt processing.
4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
   the guest.

In order to avoid memory leaks, when the notifier is called to receive
notification that the KVM pointer has been set to NULL, the vfio_ap device
driver should reverse the actions taken when the KVM pointer was set.

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Signed-off-by: Tony Krowiak <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

can: dev: prevent potential information leak in can_fill_info()

The "bec" struct isn't necessarily always initialized. For example, the
mcp251xfd_get_berr_counter() function doesn't initialize anything if the
interface is down.

Fixes: 52c793f24054 ("can: netlink support for bus-error reporting and counters")
Link: https://lore.kernel.org/r/YAkaRdRJncsJO8Ve@mwanda
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

rtc: mc146818: Detect and handle broken RTCs

The recent fix for handling the UIP bit unearthed another issue in the RTC
code. If the RTC is advertised but the readout is straight 0xFF because
it's not available, the old code just proceeded with crappy values, but the
new code hangs because it waits for the UIP bit to become low.

Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID
register (Register D) which should have bit 0-6 cleared. If that's not the
case then fail to register the CMOS.

Add the same check to mc146818_get_time(), warn once when the condition
is true and invalidate the rtc_time data.

Reported-by: Mickaël Salaün <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Mickaël Salaün <[email protected]>
Acked-by: Alexandre Belloni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

xen: Fix XenStore initialisation for XS_LOCAL

In commit 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI")
I reworked the triggering of xenbus_probe().

I tried to simplify things by taking out the workqueue based startup
triggered from wake_waiting(); the somewhat poorly named xenbus IRQ
handler.

I missed the fact that in the XS_LOCAL case (Dom0 starting its own
xenstored or xenstore-stubdom, which happens after the kernel is booted
completely), that IRQ-based trigger is still actually needed.

So... put it back, except more cleanly. By just spawning a xenbus_probe
thread which waits on xb_waitq and runs the probe the first time it
gets woken, just as the workqueue-based hack did.

This is actually a nicer approach for *all* the back ends with different
interrupt methods, and we can switch them all over to that without the
complex conditions for when to trigger it. But not in -rc6. This is
the minimal fix for the regression, although it's a step in the right
direction instead of doing a partial revert and actually putting the
workqueue back. It's also simpler than the workqueue.

Fixes: 3499ba8198ca ("xen: Fix event channel callback via INTX/GSI")
Reported-by: Juergen Gross <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

io_uring: fix wqe->lock/completion_lock deadlock

Joseph reports following deadlock:

CPU0:
...
io_kill_linked_timeout  // &ctx->completion_lock
io_commit_cqring
__io_queue_deferred
__io_queue_async_work
io_wq_enqueue
io_wqe_enqueue  // &wqe->lock

CPU1:
...
__io_uring_files_cancel
io_wq_cancel_cb
io_wqe_cancel_pending_work  // &wqe->lock
io_cancel_task_cb  // &ctx->completion_lock

Only __io_queue_deferred() calls queue_async_work() while holding
ctx->completion_lock, enqueue drained requests via io_req_task_queue()
instead.

Cc: [email protected] # 5.9+
Reported-by: Joseph Qi <[email protected]>
Tested-by: Joseph Qi <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'net-fec-fix-temporary-rmii-clock-reset-on-link-up'

Laurent Badel says:

====================
net: fec: Fix temporary RMII clock reset on link up

v2: fixed a compilation warning

The FEC drivers performs a "hardware reset" of the MAC module when the
link is reported to be up. This causes a short glitch in the RMII clock
due to the hardware reset clearing the receive control register which
controls the MII mode. It seems that some link partners do not tolerate
this glitch, and invalidate the link, which leads to a never-ending loop
of negotiation-link up-link down events.

This was observed with the iMX28 Soc and LAN8720/LAN8742 PHYs, with two
Intel adapters I218-LM and X722-DA2 as link partners, though a number of
other link partners do not seem to mind the clock glitch. Changing the
hardware reset to a software reset (clearing bit 1 of the ECR register)
cured the issue.

Attempts to optimize fec_restart() in order to minimize the duration of
the glitch were unsuccessful. Furthermore manually producing the glitch by
setting MII mode and then back to RMII in two consecutive instructions,
resulting in a clock glitch <10us in duration, was enough to cause the
partner to invalidate the link. This strongly suggests that the root cause
of the link being dropped is indeed the change in clock frequency.

In an effort to minimize changes to driver, the patch proposes to use
soft reset only for tested SoCs (iMX28) and only if the link is up. This
preserves hardware reset in other situations, which might be required for
proper setup of the MAC.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fec: Fix temporary RMII clock reset on link up

fec_restart() does a hard reset of the MAC module when the link status
changes to up. This temporarily resets the R_CNTRL register which controls
the MII mode of the ENET_OUT clock. In the case of RMII, the clock
frequency momentarily drops from 50MHz to 25MHz until the register is
reconfigured. Some link partners do not tolerate this glitch and
invalidate the link causing failure to establish a stable link when using
PHY polling mode. Since as per IEEE802.3 the criteria for link validity
are PHY-specific, what the partner should tolerate cannot be assumed, so
avoid resetting the MII clock by using software reset instead of hardware
reset when the link is up. This is generally relevant only if the SoC
provides the clock to an external PHY and the PHY is configured for RMII.

Signed-off-by: Laurent Badel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: lapb: Add locking to the lapb module

In the lapb module, the timers may run concurrently with other code in
this module, and there is currently no locking to prevent the code from
racing on "struct lapb_cb". This patch adds locking to prevent racing.

1. Add "spinlock_t lock" to "struct lapb_cb"; Add "spin_lock_bh" and
"spin_unlock_bh" to APIs, timer functions and notifier functions.

2. Add "bool t1timer_stop, t2timer_stop" to "struct lapb_cb" to make us
able to ask running timers to abort; Modify "lapb_stop_t1timer" and
"lapb_stop_t2timer" to make them able to abort running timers;
Modify "lapb_t2timer_expiry" and "lapb_t1timer_expiry" to make them
abort after they are stopped by "lapb_stop_t1timer", "lapb_stop_t2timer",
and "lapb_start_t1timer", "lapb_start_t2timer".

3. Let lapb_unregister wait for other API functions and running timers
to stop.

4. The lapb_device_event function calls lapb_disconnect_request. In
order to avoid trying to hold the lock twice, add a new function named
"__lapb_disconnect_request" which assumes the lock is held, and make
it called by lapb_disconnect_request and lapb_device_event.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: Martin Schiller <[email protected]>
Signed-off-by: Xie He <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ARM: zImage: atags_to_fdt: Fix node names on added root nodes

Commit 7536c7e03e74 ("of/fdt: Remove redundant kbasename function
call") exposed a bug creating DT nodes in the ATAGS to DT fixup code.
Non-existent nodes would mistaken get created with a leading '/'. The
problem was fdt_path_offset() takes a full path while creating a node
with fdt_add_subnode() takes just the basename.

Since this we only add root child nodes, we can just skip over the '/'.

Fixes: 7536c7e03e74 ("of/fdt: Remove redundant kbasename function call")
Reported-by: Chris Packham <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: Russell King <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Tested-by: Chris Packham <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

team: protect features update by RCU to avoid deadlock

Function __team_compute_features() is protected by team->lock
mutex when it is called from team_compute_features() used when
features of an underlying device is changed. This causes
a deadlock when NETDEV_FEAT_CHANGE notifier for underlying device
is fired due to change propagated from team driver (e.g. MTU
change). It's because callbacks like team_change_mtu() or
team_vlan_rx_{add,del}_vid() protect their port list traversal
by team->lock mutex.

Example (r8169 case where this driver disables TSO for certain MTU
values):
...
[ 6391.348202]  __mutex_lock.isra.6+0x2d0/0x4a0
[ 6391.358602]  team_device_event+0x9d/0x160 [team]
[ 6391.363756]  notifier_call_chain+0x47/0x70
[ 6391.368329]  netdev_update_features+0x56/0x60
[ 6391.373207]  rtl8169_change_mtu+0x14/0x50 [r8169]
[ 6391.378457]  dev_set_mtu_ext+0xe1/0x1d0
[ 6391.387022]  dev_set_mtu+0x52/0x90
[ 6391.390820]  team_change_mtu+0x64/0xf0 [team]
[ 6391.395683]  dev_set_mtu_ext+0xe1/0x1d0
[ 6391.399963]  do_setlink+0x231/0xf50
...

In fact team_compute_features() called from team_device_event()
does not need to be protected by team->lock mutex and rcu_read_lock()
is sufficient there for port list traversal.

Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Cc: Saeed Mahameed <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Cong Wang <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: add David Ahern to IPv4/IPv6 maintainers

David has been the de-facto maintainer for much of the IP code
for the last couple of years, let's make it official.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable

If a non nat tuple entry is inserted just to the regular tuples
rhashtable (ct_tuples_ht) and not to natted tuples rhashtable
(ct_nat_tuples_ht). Commit bc562be9674b ("net/mlx5e: CT: Save ct entries
tuples in hashtables") mixed up the return labels and names sot that on
cleanup or failure we still try to remove for the natted tuples rhashtable.

Fix that by correctly checking if a natted tuples insertion
before removing it. While here make it more readable.

Fixes: bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables")
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset

Sometimes, channel params are changed without recreating the channels.
It happens in two basic cases: when the channels are closed, and when
the parameter being changed doesn't affect how channels are configured.
Such changes invoke a hardware command that might fail. The whole
operation should be reverted in such cases, but the code that restores
the parameters' values in the driver was missing. This commit adds this
handling.

Fixes: 2e20a151205b ("net/mlx5e: Fail safe mtu and lro setting")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Revert parameters on errors when changing trust state without reset

Trust state may be changed without recreating the channels. It happens
when the channels are closed, and when channel parameters (min inline
mode) stay the same after changing the trust state. Changing the trust
state is a hardware command that may fail. The current code didn't
restore the channel parameters to their old values if an error happened
and the channels were closed. This commit adds handling for this case.

Fixes: 6e0504c69811 ("net/mlx5e: Change inline mode correctly when changing trust state")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Correctly handle changing the number of queues when the interface is down

This commit addresses two issues related to changing the number of
queues when the channels are closed:

1. Missing call to mlx5e_num_channels_changed to update
real_num_tx_queues when the number of TCs is changed.

2. When mlx5e_num_channels_changed returns an error, the channel
parameters must be reverted.

Two Fixes: tags correspond to the first commits where these two issues
were introduced.

Fixes: 3909a12e7913 ("net/mlx5e: Fix configuration of XPS cpumasks and netdev queues in corner cases")
Fixes: fa3748775b92 ("net/mlx5e: Handle errors from netif_set_real_num_{tx,rx}_queues")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Fix CT rule + encap slow path offload and deletion

Currently, if a neighbour isn't valid when offloading tunnel encap rules,
we offload the original match and replace the original action with
"goto slow path" action. For this we use a temporary flow attribute based
on the original flow attribute and then change the action. Flow flags,
which among those is the CT flag, are still shared for the slow path rule
offload, so we end up parsing this flow as a CT + goto slow path rule.

Besides being unnecessary, CT action offload saves extra information in
the passed flow attribute, such as created ct_flow and mod_hdr, which
is lost onces the temporary flow attribute is freed.

When a neigh is updated and is valid, we offload the original CT rule
with original CT action, which again creates a ct_flow and mod_hdr
and saves it in the flow's original attribute. Then we delete the slow
path rule with a temporary flow attribute based on original updated
flow attribute, and we free the relevant ct_flow and mod_hdr.

Then when tc deletes this flow, we try to free the ct_flow and mod_hdr
on the flow's attribute again.

To fix the issue, skip all furture proccesing (CT/Sample/Split rules)
in offload/unoffload of slow path rules.

Call trace:
[  758.850525] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000218
[  758.952987] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  758.964170] Modules linked in: act_csum(E) act_pedit(E) act_tunnel_key(E) act_ct(E) nf_flow_table(E) xt_nat(E) ip6table_filter(E) ip6table_nat(E) xt_comment(E) ip6_tables(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) bpfilter(E) br_netfilter(E) bridge(E) stp(E) llc(E) xfrm_user(E) overlay(E) act_mirred(E) act_skbedit(E) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) esp6_offload(E) esp6(E) esp4_offload(E) esp4(E) xfrm_algo(E) mlx5_ib(OE) ib_uverbs(OE) geneve(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink_cttimeout(E) nfnetlink(E) mlx5_core(OE) act_gact(E) cls_flower(E) sch_ingress(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat(E) mlxfw(OE) psample(E) nf_conntrack(E) nf_defrag_ipv4(E) vfio_mdev(E) mdev(E) ib_core(OE) mlx_compat(OE) crct10dif_ce(E) uio_pdrv_genirq(E) uio(E) i2c_mlx(E) mlxbf_pmc(E) sbsa_gwdt(E) mlxbf_gige(E) gpio_mlxbf2(E) mlxbf_pka(E) mlx_trio(E) mlx_bootctl(E) bluefield_edac(E) knem(O)
[  758.964225]  ip_tables(E) mlxbf_tmfifo(E) ipv6(E) crc_ccitt(E) nf_defrag_ipv6(E)
[  759.154186] CPU: 5 PID: 122 Comm: kworker/u16:1 Tainted: G           OE     5.4.60-mlnx.52.gde81e85 #1
[  759.172870] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.5.0-2-gc1b5d64 Jan  4 2021
[  759.195466] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
[  759.207344] pstate: a0000005 (NzCv daif -PAN -UAO)
[  759.217003] pc : mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[  759.228229] lr : mlx5_del_flow_rules+0x34/0x160 [mlx5_core]
[  759.405858] Call trace:
[  759.410804]  mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[  759.421337]  __mlx5_eswitch_del_rule.isra.43+0x5c/0x1c8 [mlx5_core]
[  759.433963]  mlx5_eswitch_del_offloaded_rule_ct+0x34/0x40 [mlx5_core]
[  759.446942]  mlx5_tc_rule_delete_ct+0x68/0x74 [mlx5_core]
[  759.457821]  mlx5_tc_ct_delete_flow+0x160/0x21c [mlx5_core]
[  759.469051]  mlx5e_tc_unoffload_fdb_rules+0x158/0x168 [mlx5_core]
[  759.481325]  mlx5e_tc_encap_flows_del+0x140/0x26c [mlx5_core]
[  759.492901]  mlx5e_rep_update_flows+0x11c/0x1ec [mlx5_core]
[  759.504127]  mlx5e_rep_neigh_update+0x160/0x200 [mlx5_core]
[  759.515314]  process_one_work+0x178/0x400
[  759.523350]  worker_thread+0x58/0x3e8
[  759.530685]  kthread+0x100/0x12c
[  759.537152]  ret_from_fork+0x10/0x18
[  759.544320] Code: 97ffef55 51000673 3100067f 54ffff41 (b9421ab3)
[  759.556548] ---[ end trace fab818bb1085832d ]---

Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Paul Blakey <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled

The cited commit introduce new CONFIG_MLX5_CLS_ACT kconfig variable
to control compilation of TC hardware offloads implementation.
When this configuration is disabled the driver is still wrongly
reports in ethtool that hw-tc-offload is supported.

Fixed by reporting hw-tc-offload is supported only when
CONFIG_MLX5_CLS_ACT is enabled.

Fixes: d956873f908c ("net/mlx5e: Introduce kconfig var for TC support")
Signed-off-by: Maor Dickman <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Maintain separate page trees for ECPF and PF functions

Pages for the host PF and ECPF were stored in the same tree, so the ECPF
pages were being freed along with the host PF's when the host driver
unloaded.

Combine the function ID and ECPF flag to use as an index into the
x-array containing the trees to get a different tree for the host PF and
ECPF.

Fixes: c6168161f693 ("net/mlx5: Add support for release all pages event")
Signed-off-by: Daniel Jurgens <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Fix IPSEC stats

When IPSEC offload isn't active, the number of stats is not zero, but
the strings are not filled, leading to exposing stats with empty names.
Fix this by using the same condition for NUM_STATS and FILL_STRS.

Fixes: 0aab3e1b04ae ("net/mlx5e: IPSec, Expose IPsec HW stat only for supporting HW")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Raed Salem <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Reduce tc unsupported key print level

"Unsupported key used:" appears in kernel log when flows with
unsupported key are used, arp fields for example.

OpenVSwitch was changed to match on arp fields by default that
caused this warning to appear in kernel log for every arp rule, which
can be a lot.

Fix by lowering print level from warning to debug.

Fixes: e3a2b7ed018e ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Maor Dickman <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: free page before return

Instead of directly return, goto the error handling label to free
allocated page.

Fixes: 5f29458b77d5 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Pan Bian <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: E-switch, Fix rate calculation for overflow

rate_bytes_ps is a 64-bit field. It passed as 32-bit field to
apply_police_params(). Due to this when police rate is higher
than 4Gbps, 32-bit calculation ignores the carry. This results
in incorrect rate configurationn the device.

Fix it by performing 64-bit calculation.

Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Eli Cohen <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Fix memory leak on flow table creation error flow

When we create the ft object we also init rhltable in ft->fgs_hash.
So in error flow before kfree of ft we need to destroy that rhltable.

Fixes: 693c6883bbc4 ("net/mlx5: Add hash table for flow groups in flow table")
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

Merge tag 'mac80211-for-net-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A couple of fixes:
* fix 160 MHz channel switch in mac80211
* fix a staging driver to not deadlock due to some
   recent cfg80211 changes
* fix NULL-ptr deref if cfg80211 returns -EINPROGRESS
   to wext (syzbot)
* pause TX in mac80211 in type change to prevent crashes
   (syzbot)

* tag 'mac80211-for-net-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
  staging: rtl8723bs: fix wireless regulatory API misuse
  mac80211: pause TX while changing interface type
  wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
  mac80211: 160MHz with extended NSS BW in CSA
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'wireless-drivers-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.11

Second set of fixes for v5.11. Like in last time we again have more
fixes than usual Actually a bit too much for my liking in this state
of the cycle, but due to unrelated challenges I was only able to
submit them now.

We have few important crash fixes, iwlwifi modifying read-only data
being the most reported issue, and also smaller fixes to iwlwifi.

mt76
* fix a clang warning about enum usage
* fix rx buffer refcounting crash

mt7601u
* fix rx buffer refcounting crash
* fix crash when unbplugging the device

iwlwifi
* fix a crash where we were modifying read-only firmware data
* lots of smaller fixes all over the driver

* tag 'wireless-drivers-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers: (24 commits)
  mt7601u: fix kernel crash unplugging the device
  iwlwifi: queue: bail out on invalid freeing
  iwlwifi: mvm: guard against device removal in reprobe
  iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
  iwlwifi: mvm: clear IN_D3 after wowlan status cmd
  iwlwifi: pcie: add rules to match Qu with Hr2
  iwlwifi: mvm: invalidate IDs of internal stations at mvm start
  iwlwifi: mvm: fix the return type for DSM functions 1 and 2
  iwlwifi: pcie: reschedule in long-running memory reads
  iwlwifi: pcie: use jiffies for memory read spin time limit
  iwlwifi: pcie: fix context info memory leak
  iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap
  iwlwifi: pcie: set LTR on more devices
  iwlwifi: queue: don't crash if txq->entries is NULL
  iwlwifi: fix the NMI flow for old devices
  iwlwifi: pnvm: don't try to load after failures
  iwlwifi: pnvm: don't skip everything when not reloading
  iwlwifi: pcie: avoid potential PNVM leaks
  iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time()
  iwlwifi: mvm: skip power command when unbinding vif during CSA
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

iwlwifi: provide gso_type to GSO packets

net/core/tso.c got recent support for USO, and this broke iwlfifi
because the driver implemented a limited form of GSO.

Providing ->gso_type allows for skb_is_gso_tcp() to provide
a correct result.

Fixes: 3d5b459ba0e3 ("net: tso: add UDP segmentation support")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Ben Greear <[email protected]>
Tested-by: Ben Greear <[email protected]>
Cc: Luca Coelho <[email protected]>
Cc: Johannes Berg <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209913
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

igc: fix link speed advertising

Link speed advertising in igc has two problems:

- When setting the advertisement via ethtool, the link speed is converted
  to the legacy 32 bit representation for the intel PHY code.
  This inadvertently drops ETHTOOL_LINK_MODE_2500baseT_Full_BIT (being
  beyond bit 31).  As a result, any call to `ethtool -s ...' drops the
  2500Mbit/s link speed from the PHY settings.  Only reloading the driver
  alleviates that problem.

  Fix this by converting the ETHTOOL_LINK_MODE_2500baseT_Full_BIT to the
  Intel PHY ADVERTISE_2500_FULL bit explicitly.

- Rather than checking the actual PHY setting, the .get_link_ksettings
  function always fills link_modes.advertising with all link speeds
  the device is capable of.

  Fix this by checking the PHY autoneg_advertised settings and report
  only the actually advertised speeds up to ethtool.

Fixes: 8c5ad0dae93c ("igc: Add ethtool support")
Signed-off-by: Corinna Vinschen <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

mailmap: remove the "repo-abbrev" comment

Remove the magical "repo-abbrev" comment added when this file was
introduced in e0ab1ec9fcd3 ([PATCH] add .mailmap for proper
git-shortlog output, 2007-02-14).

It's been an undocumented feature of git-shortlog(1), originally added
to git for Linus's use. Since then he's no longer using it[1], and
I've removed the feature in git.git's 4e168333a87 (shortlog: remove
unused(?) "repo-abbrev" feature, 2021-01-12). It's on the "master"
branch, but not yet in a release version.

Let's also remove it from linux.git, both as a heads-up to any
potential users of it in linux.git whose use would be broken sooner
than later by git itself, and because it'll eventually be entirely
redundant.

1. https://lore.kernel.org/git/CAHk-=wixHyBKZVUcxq+NCWMbkrX0xnppb7UCopRWw1+oExYpYw@mail.gmail.com/

Acked-by: Linus Torvalds <[email protected]>
Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

parisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES

When building a kernel without module support, the CONFIG_MLONGCALL option
needs to be enabled in order to reach symbols which are outside of a 22-bit
branch.

This patch changes the autodetection in the Kconfig script to always enable
CONFIG_MLONGCALL when modules are disabled and uses a far call to
preempt_schedule_irq() in intr_do_preempt() to reach the symbol in all cases.

Signed-off-by: Helge Deller <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: [email protected] # v5.6+

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

-  x86 bugfixes

- Documentation fixes

- Avoid performance regression due to SEV-ES patches

- ARM:
     - Don't allow tagged pointers to point to memslots
     - Filter out ARMv8.1+ PMU events on v8.0 hardware
     - Hide PMU registers from userspace when no PMU is configured
     - More PMU cleanups
     - Don't try to handle broken PSCI firmware
     - More sys_reg() to reg_to_encoding() conversions

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX
  KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"
  KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest
  KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration
  kvm: tracing: Fix unmatched kvm_entry and kvm_exit events
  KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG
  KVM: x86: get smi pending status correctly
  KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]
  KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()
  KVM: x86: Add more protection against undefined behavior in rsvd_bits()
  KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM
  KVM: Forbid the use of tagged userspace addresses for memslots
  KVM: arm64: Filter out v8.1+ events on v8.0 HW
  KVM: arm64: Compute TPIDR_EL2 ignoring MTE tag
  KVM: arm64: Use the reg_to_encoding() macro instead of sys_reg()
  KVM: arm64: Allow PSCI SYSTEM_OFF/RESET to return
  KVM: arm64: Simplify handling of absent PMU system registers
  KVM: arm64: Hide PMU registers from userspace when not available

Merge tag 'spi-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"One new device ID here, plus an error handling fix - nothing
  remarkable in either"

* tag 'spi-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spidev: Add cisco device compatible
  spi: altera: Fix memory leak on error path

Merge tag 'regulator-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"The main thing here is a change to make sure that we don't try to
  double resolve the supply of a regulator if we have two probes going
  on simultaneously, plus an incremental fix on top of that to resolve a
  lockdep issue it introduced.

  There's also a patch from Dmitry Osipenko adding stubs for some
  functions to avoid build issues in consumers in some configurations"

* tag 'regulator-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Fix lockdep warning resolving supplies
  regulator: consumer: Add missing stubs to regulator/consumer.h
  regulator: core: avoid regulator_resolve_supply() race condition

parisc: Remove leftover reference to the power_tasklet

This was removed long ago, back in:

6e16d9409e1 ([PARISC] Convert soft power switch driver to kthread)

Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

i40e: acquire VSI pointer only after VF is initialized

This change simplifies the VF initialization check and also minimizes
the delay between acquiring the VSI pointer and using it. As known by
the commit being fixed, there is a risk of the VSI pointer getting
changed. Therefore minimize the delay between getting and using the
pointer.

Fixes: 9889707b06ac ("i40e: Fix crash caused by stress setting of VF MAC addresses")
Signed-off-by: Stefan Assmann <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Fix MSI-X vector fallback logic

The current MSI-X enablement logic tries to enable best-case MSI-X
vectors and if that fails we only support a bare-minimum set. This
includes a single MSI-X for 1 Tx and 1 Rx queue and a single MSI-X
for the OICR interrupt. Unfortunately, the driver fails to load when we
don't get as many MSI-X as requested for a couple reasons.

First, the code to allocate MSI-X in the driver tries to allocate
num_online_cpus() MSI-X for LAN traffic without caring about the number
of MSI-X actually enabled/requested from the kernel for LAN traffic.
So, when calling ice_get_res() for the PF VSI, it returns failure
because the number of available vectors is less than requested. Fix
this by not allowing the PF VSI to allocation more than
pf->num_lan_msix MSI-X vectors and pf->num_lan_msix Rx/Tx queues.
Limiting the number of queues is done because we don't want more than
1 Tx/Rx queue per interrupt due to performance conerns.

Second, the driver assigns pf->num_lan_msix = 2, to account for LAN
traffic and the OICR. However, pf->num_lan_msix is only meant for LAN
MSI-X. This is causing a failure when the PF VSI tries to
allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has
already been reserved, so there may not be enough MSI-X vectors left.
Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the
ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for
the failure case.

Update the related defines used in ice_ena_msix_range() to align with
the above behavior and remove the unused RDMA defines because RDMA is
currently not supported. Also, remove the now incorrect comment.

Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Don't allow more channels than LAN MSI-X available

Currently users could create more channels than LAN MSI-X available.
This is happening because there is no check against pf->num_lan_msix
when checking the max allowed channels and will cause performance issues
if multiple Tx and Rx queues are tied to a single MSI-X. Fix this by not
allowing more channels than LAN MSI-X available in pf->num_lan_msix.

Fixes: 87324e747fde ("ice: Implement ethtool ops for channels")
Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: update dev_addr in ice_set_mac_address even if HW filter exists

Fix the driver to copy the MAC address configured in ndo_set_mac_address
into dev_addr, even if the MAC filter already exists in HW. In some
situations (e.g. bonding) the netdev's dev_addr could have been modified
outside of the driver, with no change to the HW filter, so the driver
cannot assume that they match.

Fixes: 757976ab16be ("ice: Fix check for removing/adding mac filters")
Signed-off-by: Nick Nunley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Implement flow for IPv6 next header (extension header)

This patch is based on a similar change to i40e by Slawomir Laba:
"i40e: Implement flow for IPv6 next header (extension header)".

When a packet contains an IPv6 header with next header which is
an extension header and not a protocol one, the kernel function
skb_transport_header called with such sk_buff will return a
pointer to the extension header and not to the TCP one.

The above explained call caused a problem with packet processing
for skb with encapsulation for tunnel with ICE_TX_CTX_EIPT_IPV6.
The extension header was not skipped at all.

The ipv6_skip_exthdr function does check if next header of the IPV6
header is an extension header and doesn't modify the l4_proto pointer
if it points to a protocol header value so its safe to omit the
comparison of exthdr and l4.hdr pointers. The ipv6_skip_exthdr can
return value -1. This means that the skipping process failed
and there is something wrong with the packet so it will be dropped.

Fixes: a4e82a81f573 ("ice: Add support for tunnel offloads")
Signed-off-by: Nick Nunley <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: fix FDir IPv6 flexbyte

The packet classifier would occasionally misrecognize an IPv6 training
packet when the next protocol field was 0. The correct value for
unspecified protocol is IPPROTO_NONE.

Fixes: 165d80d6adab ("ice: Support IPv6 Flow Director filters")
Signed-off-by: Henry Tieman <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

Revert "mm: fix initialization of struct page for holes in memory layout"

This reverts commit d3921cb8be29ce5668c64e23ffdaeec5f8c69399.

Chris Wilson reports that it causes boot problems:

"We have half a dozen or so different machines in CI that are silently
failing to boot, that we believe is bisected to this patch"

and the CI team confirmed that a revert fixed the issues.

The cause is unknown for now, so let's revert it.

Link: https://lore.kernel.org/lkml/[email protected]/
Reported-and-tested-by: Chris Wilson <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

media: hantro: Fix reset_raw_fmt initialization

raw_fmt->height in never initialized. But width in initialized twice.

Fixes: 88d06362d1d05 ("media: hantro: Refactor for V4L2 API spec compliancy")
Signed-off-by: Ricardo Ribalda <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Cc: <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: cec: add stm32 driver

Missing stm32 directory to Makefile.

Signed-off-by: Yannick Fertre <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Fixes: 4be5e8648b0c ("media: move CEC platform drivers to a separate directory")
Cc: <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: cedrus: Fix H264 decoding

During H264 API overhaul subtle bug was introduced Cedrus driver.
Progressive references have both, top and bottom reference flags set.
Cedrus reference list expects only bottom reference flag and only when
interlaced frames are decoded. However, due to a bug in Cedrus check,
exclusivity is not tested and that flag is set also for progressive
references. That causes "jumpy" background with many videos.

Fix that by checking that only bottom reference flag is set in control
and nothing else.

Tested-by: Andre Heider <[email protected]>
Fixes: cfc8c3ed533e ("media: cedrus: h264: Properly configure reference field")
Signed-off-by: Jernej Skrabec <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
Cc: <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: v4l2-subdev.h: BIT() is not available in userspace

The BIT macro is not available in userspace, so replace BIT(0) by
0x00000001.

Signed-off-by: Hans Verkuil <[email protected]>
Fixes: 6446ec6cbf46 ("media: v4l2-subdev: add VIDIOC_SUBDEV_QUERYCAP ioctl")
Cc: <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

arm64: Fix kernel address detection of __is_lm_address()

Currently, the __is_lm_address() check just masks out the top 12 bits
of the address, but if they are 0, it still yields a true result.
This has as a side effect that virt_addr_valid() returns true even for
invalid virtual addresses (e.g. 0x0).

Fix the detection checking that it's actually a kernel address starting
at PAGE_OFFSET.

Fixes: 68dd8ef32162 ("arm64: memory: Fix virt_addr_valid() using __is_lm_address()")
Cc: <[email protected]> # 5.4.x
Cc: Will Deacon <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

ALSA: hda/via: Apply the workaround generically for Clevo machines

We've got another report indicating a similar problem wrt the
power-saving behavior with VIA codec on Clevo machines. Let's apply
the existing workaround generically to all Clevo devices with VIA
codecs to cover all in once.

BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1181330
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE

do not call blocking ops when !TASK_RUNNING; state=2 set at
[<00000000ced9dbfc>] prepare_to_wait+0x1f4/0x3b0
kernel/sched/wait.c:262
WARNING: CPU: 1 PID: 19888 at kernel/sched/core.c:7853
__might_sleep+0xed/0x100 kernel/sched/core.c:7848
RIP: 0010:__might_sleep+0xed/0x100 kernel/sched/core.c:7848
Call Trace:
__mutex_lock_common+0xc4/0x2ef0 kernel/locking/mutex.c:935
__mutex_lock kernel/locking/mutex.c:1103 [inline]
mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:1118
io_wq_submit_work+0x39a/0x720 fs/io_uring.c:6411
io_run_cancel fs/io-wq.c:856 [inline]
io_wqe_cancel_pending_work fs/io-wq.c:990 [inline]
io_wq_cancel_cb+0x614/0xcb0 fs/io-wq.c:1027
io_uring_cancel_files fs/io_uring.c:8874 [inline]
io_uring_cancel_task_requests fs/io_uring.c:8952 [inline]
__io_uring_files_cancel+0x115d/0x19e0 fs/io_uring.c:9038
io_uring_files_cancel include/linux/io_uring.h:51 [inline]
do_exit+0x2e6/0x2490 kernel/exit.c:780
do_group_exit+0x168/0x2d0 kernel/exit.c:922
get_signal+0x16b5/0x2030 kernel/signal.c:2770
arch_do_signal_or_restart+0x8e/0x6a0 arch/x86/kernel/signal.c:811
handle_signal_work kernel/entry/common.c:147 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x48/0x190 kernel/entry/common.c:302
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Rewrite io_uring_cancel_files() to mimic __io_uring_task_cancel()'s
counting scheme, so it does all the heavy work before setting
TASK_UNINTERRUPTIBLE.

Cc: [email protected] # 5.9+
Reported-by: [email protected]
Signed-off-by: Pavel Begunkov <[email protected]>
[axboe: fix inverted task check]
Signed-off-by: Jens Axboe <[email protected]>

io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE

If the tctx inflight number haven't changed because of cancellation,
__io_uring_task_cancel() will continue leaving the task in
TASK_UNINTERRUPTIBLE state, that's not expected by
__io_uring_files_cancel(). Ensure we always call finish_wait() before
retrying.

Cc: [email protected] # 5.9+
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

futex: Handle faults correctly for PI futexes

fixup_pi_state_owner() tries to ensure that the state of the rtmutex,
pi_state and the user space value related to the PI futex are consistent
before returning to user space. In case that the user space value update
faults and the fault cannot be resolved by faulting the page in via
fault_in_user_writeable() the function returns with -EFAULT and leaves
the rtmutex and pi_state owner state inconsistent.

A subsequent futex_unlock_pi() operates on the inconsistent pi_state and
releases the rtmutex despite not owning it which can corrupt the RB tree of
the rtmutex and cause a subsequent kernel stack use after free.

It was suggested to loop forever in fixup_pi_state_owner() if the fault
cannot be resolved, but that results in runaway tasks which is especially
undesired when the problem happens due to a programming error and not due
to malice.

As the user space value cannot be fixed up, the proper solution is to make
the rtmutex and the pi_state consistent so both have the same owner. This
leaves the user space value out of sync. Any subsequent operation on the
futex will fail because the 10th rule of PI futexes (pi_state owner and
user space value are consistent) has been violated.

As a consequence this removes the inept attempts of 'fixing' the situation
in case that the current task owns the rtmutex when returning with an
unresolvable fault by unlocking the rtmutex which left pi_state::owner and
rtmutex::owner out of sync in a different and only slightly less dangerous
way.

Fixes: 1b7558e457ed ("futexes: fix fault handling in futex_lock_pi")
Reported-by: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Simplify fixup_pi_state_owner()

Too many gotos already and an upcoming fix would make it even more
unreadable.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Use pi_state_update_owner() in put_pi_state()

No point in open coding it. This way it gains the extra sanity checks.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

rtmutex: Remove unused argument from rt_mutex_proxy_unlock()

Nothing uses the argument. Remove it as preparation to use
pi_state_update_owner().

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Provide and use pi_state_update_owner()

Updating pi_state::owner is done at several places with the same
code. Provide a function for it and use that at the obvious places.

This is also a preparation for a bug fix to avoid yet another copy of the
same code or alternatively introducing a completely unpenetratable mess of
gotos.

Originally-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Replace pointless printk in fixup_owner()

If that unexpected case of inconsistent arguments ever happens then the
futex state is left completely inconsistent and the printk is not really
helpful. Replace it with a warning and make the state consistent.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

futex: Ensure the correct return value from futex_lock_pi()

In case that futex_lock_pi() was aborted by a signal or a timeout and the
task returned without acquiring the rtmutex, but is the designated owner of
the futex due to a concurrent futex_unlock_pi() fixup_owner() is invoked to
establish consistent state. In that case it invokes fixup_pi_state_owner()
which in turn tries to acquire the rtmutex again. If that succeeds then it
does not propagate this success to fixup_owner() and futex_lock_pi()
returns -EINTR or -ETIMEOUT despite having the futex locked.

Return success from fixup_pi_state_owner() in all cases where the current
task owns the rtmutex and therefore the futex and propagate it correctly
through fixup_owner(). Fixup the other callsite which does not expect a
positive return value.

Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]

drm/i915/gt: Always try to reserve GGTT address 0x0

Since writing to address 0 is a very common mistake, let's try to avoid
putting anything sensitive there.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/2989
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Cc: [email protected]
(cherry picked from commit 56b429cc584c6ed8b895d8d8540959655db1ff73)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Always flush the active worker before returning from the wait

The first thing the active retirement worker does is decrement the
i915_active count.

The first thing we do during i915_active_wait is try to increment the
i915_active count, but only if already active [non-zero].

The wait may see that the retirement is already started and so marked the
i915_active as idle, and skip waiting for the retirement handler.
However, the caller of i915_active_wait may immediately free the
i915_active upon returning (e.g. i915_vma_destroy) so we must not return
before the concurrent access from the worker is completed. We must
always flush the worker.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2473
Fixes: 274cbf20fd10 ("drm/i915: Push the i915_active.retire into a worker")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: <[email protected]> # v5.5+
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 977a372e972cb42799746c284035a33c64ebace9)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/selftest: Fix potential memory leak

Object out is not released on path that no VMA instance found. The root
cause is jumping to an unexpected label on the error path.

Fixes: a47e788c2310 ("drm/i915/selftests: Exercise CS TLB invalidation")
Signed-off-by: Pan Bian <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 2b015017d5cb01477a79ca184ac25c247d664568)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Check for all subplatform bits

Current code is checking only 2 bits in the subplatform, but actually 3
bits are allocated for the field. Check all 3 bits.

Fixes: 805446c8347c ("drm/i915: Introduce concept of a sub-platform")
Cc: Tvrtko Ursulin <[email protected]>
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 27b695ee1af9bb36605e67055874ec081306ac28)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Fix ICL MG PHY vswing handling

The MH PHY vswing table does have all the entries these days. Get
rid of the old hacks in the code which claim otherwise.

This hack was totally bogus anyway. The correct way to handle the
lack of those two entries would have been to declare our max
vswing and pre-emph to both be level 2.

Cc: José Roberto de Souza <[email protected]>
Cc: Clinton Taylor <[email protected]>
Fixes: 9f7ffa297978 ("drm/i915/tc/icl: Update TC vswing tables")
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Imre Deak <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
(cherry picked from commit 5ec346476e795089b7dac8ab9dcee30c8d80ad84)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/gt: Clear CACHE_MODE prior to clearing residuals

Since we do a bare context switch with no restore, the clear residual
kernel runs on dirty state, and we must be careful to avoid executing
with bad state from context registers inherited from a malicious client.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2955
Fixes: 09aa9e45863e ("drm/i915/gt: Restore clear-residual mitigations for Ivybridge, Baytrail")
Testcase: igt/gem_ctx_isolation # ivb,vlv
Signed-off-by: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Akeem G Abodunrin <[email protected]>
Reviewed-by: Akeem G Abodunrin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit ace44e13e577c2ae59980e9a6ff5ca253b1cf831)
Signed-off-by: Jani Nikula <[email protected]>

Merge tag 'asoc-fix-v5.11-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.11

More fixes for v5.11, almost all driver specific issues including new
device IDs - there's one error handling fix for the topology stuff too.

staging: rtl8723bs: fix wireless regulatory API misuse

This code ends up calling wiphy_apply_custom_regulatory(), for which
we document that it should be called before wiphy_register(). This
driver doesn't do that, but calls it from ndo_open() with the RTNL
held, which caused deadlocks.

Since the driver just registers static regdomain data and then the
notifier applies the channel changes if any, there's no reason for
it to call this in ndo_open(), move it earlier to fix the deadlock.

Reported-and-tested-by: Hans de Goede <[email protected]>
Fixes: 51d62f2f2c50 ("cfg80211: Save the regulatory domain with a lock")
Link: https://lore.kernel.org/r/20210126115409.d5fd6f8fe042.Ib5823a6feb2e2aa01ca1a565d2505367f38ad246@changeid
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

mac80211: pause TX while changing interface type

syzbot reported a crash that happened when changing the interface
type around a lot, and while it might have been easy to fix just
the symptom there, a little deeper investigation found that really
the reason is that we allowed packets to be transmitted while in
the middle of changing the interface type.

Disallow TX by stopping the queues while changing the type.

Fixes: 34d4bc4d41d2 ("mac80211: support runtime interface type changes")
Reported-by: [email protected]
Link: https://lore.kernel.org/r/20210122171115.b321f98f4d4f.I6997841933c17b093535c31d29355be3c0c39628@changeid
Signed-off-by: Johannes Berg <[email protected]>

wext: fix NULL-ptr-dereference with cfg80211's lack of commit()

Since cfg80211 doesn't implement commit, we never really cared about
that code there (and it's configured out w/o CONFIG_WIRELESS_EXT).
After all, since it has no commit, it shouldn't return -EIWCOMMIT to
indicate commit is needed.

However, EIWCOMMIT is actually an alias for EINPROGRESS, which _can_
happen if e.g. we try to change the frequency but we're already in
the process of connecting to some network, and drivers could return
that value (or even cfg80211 itself might).

This then causes us to crash because dev->wireless_handlers is NULL
but we try to check dev->wireless_handlers->standard[0].

Fix this by also checking dev->wireless_handlers. Also simplify the
code a little bit.

Cc: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Link: https://lore.kernel.org/r/20210121171621.2076e4a37d5a.I5d9c72220fe7bb133fb718751da0180a57ecba4e@changeid
Signed-off-by: Johannes Berg <[email protected]>

HID: wacom: Correct NULL dereference on AES pen proximity

The recent commit to fix a memory leak introduced an inadvertant NULL
pointer dereference. The `wacom_wac->pen_fifo` variable was never
intialized, resuling in a crash whenever functions tried to use it.
Since the FIFO is only used by AES pens (to buffer events from pen
proximity until the hardware reports the pen serial number) this would
have been easily overlooked without testing an AES device.

This patch converts `wacom_wac->pen_fifo` over to a pointer (since the
call to `devres_alloc` allocates memory for us) and ensures that we assign
it to point to the allocated and initalized `pen_fifo` before the function
returns.

Link: https://github.com/linuxwacom/input-wacom/issues/230
Fixes: 37309f47e2f5 ("HID: wacom: Fix memory leakage caused by kfifo_alloc")
CC: [email protected] # v4.19+
Signed-off-by: Jason Gerecke <[email protected]>
Tested-by: Ping Cheng <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

xen-blkfront: allow discard-* nodes to be optional

This is inline with the specification described in blkif.h:

* discard-granularity: should be set to the physical block size if
node is not present.
* discard-alignment, discard-secure: should be set to 0 if node not
present.

This was detected as QEMU would only create the discard-granularity
node but not discard-alignment, and thus the setup done in
blkfront_setup_discard would fail.

Fix blkfront_setup_discard to not fail on missing nodes, and also fix
blkif_set_queue_limits to set the discard granularity to the physical
block size if none is specified in xenbus.

Fixes: ed30bf317c5ce ('xen-blkfront: Handle discard requests.')
Reported-by: Arthur Borsboom <[email protected]>
Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Tested-By: Arthur Borsboom <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

ecryptfs: fix uid translation for setxattr on security.capability

Prior to commit 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into
vfs_setxattr()") the translation of nscap->rootid did not take stacked
filesystems (overlayfs and ecryptfs) into account.

That patch fixed the overlay case, but made the ecryptfs case worse.

Restore old the behavior for ecryptfs that existed before the overlayfs
fix. This does not fix ecryptfs's handling of complex user namespace
setups, but it does make sure existing setups don't regress.

Reported-by: Eric W. Biederman <[email protected]>
Cc: Tyler Hicks <[email protected]>
Fixes: 7c03e2cda4a5 ("vfs: move cap_convert_nscap() call into vfs_setxattr()")
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Tyler Hicks <[email protected]>

KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX

VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
which may need to be loaded outside guest mode. Therefore we cannot
WARN in that case.

However, that part of nested_get_vmcs12_pages is _not_ needed at
vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
so that both vmentry and migration (and in the latter case, independent
of is_guest_mode) do the parts that are needed.

Cc: <[email protected]> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
Cc: <[email protected]> # 5.10.x
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"

Revert the dirty/available tracking of GPRs now that KVM copies the GPRs
to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty. Per
commit de3cd117ed2f ("KVM: x86: Omit caching logic for always-available
GPRs"), tracking for GPRs noticeably impacts KVM's code footprint.

This reverts commit 1c04d8c986567c27c56c05205dceadc92efb14ff.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210122235049.3107620 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest

Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the
GRPs' dirty bits are set from time zero and never cleared, i.e. will
always be seen as dirty.  The obvious alternative would be to clear
the dirty bits when appropriate, but removing the dirty checks is
desirable as it allows reverting GPR dirty+available tracking, which
adds overhead to all flavors of x86 VMs.

Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
by the GHCB spec, which allows the hypervisor (or guest) to provide
unnecessary info; it's the guest's responsibility to consume only what
it needs (the hypervisor is untrusted after all).

  The guest and hypervisor can supply additional state if desired but
  must not rely on that additional state being provided.

Cc: Brijesh Singh <[email protected]>
Cc: Tom Lendacky <[email protected]>
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210122235049.3107620 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration

Even when we are outside the nested guest, some vmcs02 fields
may not be in sync vs vmcs12. This is intentional, even across
nested VM-exit, because the sync can be delayed until the nested
hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those
rarely accessed fields.

However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to
be able to restore it. To fix that, call copy_vmcs02_to_vmcs12_rare()
before the vmcs12 contents are copied to userspace.

Fixes: 7952d769c29ca ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <20210114205449 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: tracing: Fix unmatched kvm_entry and kvm_exit events

On VMX, if we exit and then re-enter immediately without leaving
the vmx_vcpu_run() function, the kvm_entry event is not logged.
That means we will see one (or more) kvm_exit, without its (their)
corresponding kvm_entry, as shown here:

CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
CPU-1979 [002] 89.871218: kvm_exit:  reason MSR_WRITE
CPU-1979 [002] 89.871259: kvm_exit:  reason MSR_WRITE

It also seems possible for a kvm_entry event to be logged, but then
we leave vmx_vcpu_run() right away (if vmx->emulation_required is
true). In this case, we will have a spurious kvm_entry event in the
trace.

Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
(where trace_kvm_exit() already is).

A trace obtained with this patch applied looks like this:

CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
CPU-14295 [000] 8388.395392: kvm_exit:  reason MSR_WRITE
CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
CPU-14295 [000] 8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT

Of course, not calling trace_kvm_entry() in common x86 code any
longer means that we need to adjust the SVM side of things too.

Signed-off-by: Lorenzo Brescia <[email protected]>
Signed-off-by: Dario Faggioli <[email protected]>
Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG

Update various words, including the wrong parameter name and the vague
description of the usage of "slot" field.

Signed-off-by: Zenghui Yu <[email protected]>
Message-Id: <20201208043439 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: get smi pending status correctly

The injection process of smi has two steps:

    Qemu                        KVM
Step1:
    cpu->interrupt_request &= \
        ~CPU_INTERRUPT_SMI;
    kvm_vcpu_ioctl(cpu, KVM_SMI)

                                call kvm_vcpu_ioctl_smi() and
                                kvm_make_request(KVM_REQ_SMI, vcpu);

Step2:
    kvm_vcpu_ioctl(cpu, KVM_RUN, 0)

                                call process_smi() if
                                kvm_check_request(KVM_REQ_SMI, vcpu) is
                                true, mark vcpu->arch.smi_pending = true;

The vcpu->arch.smi_pending will be set true in step2, unfortunately if
vcpu paused between step1 and step2, the kvm_run->immediate_exit will be
set and vcpu has to exit to Qemu immediately during step2 before mark
vcpu->arch.smi_pending true.
During VM migration, Qemu will get the smi pending status from KVM using
KVM_GET_VCPU_EVENTS ioctl at the downtime, then the smi pending status
will be lost.

Signed-off-by: Jay Zhou <[email protected]>
Signed-off-by: Shengen Zhuang <[email protected]>
Message-Id: <20210118084720 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]

The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as
0x0300 in the intel_perfmon_event_map[]. Correct its usage.

Fixes: 62079d8a4312 ("KVM: PMU: add proper support for fixed counter 2")
Signed-off-by: Like Xu <[email protected]>
Message-Id: <20201230081916 [email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()

Since we know vPMU will not work properly when (1) the guest bit_width(s)
of the [gp|fixed] counters are greater than the host ones, or (2) guest
requested architectural events exceeds the range supported by the host, so
we can setup a smaller left shift value and refresh the guest cpuid entry,
thus fixing the following UBSAN shift-out-of-bounds warning:

shift exponent 197 is too large for 64-bit type 'long long unsigned int'

Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
vfs_ioctl fs/ioctl.c:48 [inline]
__do_sys_ioctl fs/ioctl.c:753 [inline]
__se_sys_ioctl fs/ioctl.c:739 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: [email protected]
Signed-off-by: Like Xu <[email protected]>
Message-Id: <20210118025800 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Add more protection against undefined behavior in rsvd_bits()

Add compile-time asserts in rsvd_bits() to guard against KVM passing in
garbage hardcoded values, and cap the upper bound at '63' for dynamic
values to prevent generating a mask that would overflow a u64.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20210113204515.3473079 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM

The documentation classifies KVM_ENABLE_CAP with KVM_CAP_ENABLE_CAP_VM
as a vcpu ioctl, which is incorrect. Fix it by specifying it as a VM
ioctl.

Fixes: e5d83c74a580 ("kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic")
Signed-off-by: Quentin Perret <[email protected]>
Message-Id: <20210108165349 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'kvmarm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.11, take #2

- Don't allow tagged pointers to point to memslots
- Filter out ARMv8.1+ PMU events on v8.0 hardware
- Hide PMU registers from userspace when no PMU is configured
- More PMU cleanups
- Don't try to handle broken PSCI firmware
- More sys_reg() to reg_to_encoding() conversions

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"Fix a regression in the cesa driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvel/cesa - Fix tdma descriptor on 64-bit

uapi: fix big endian definition of ipv6_rpl_sr_hdr

Following RFC 6554 [1], the current order of fields is wrong for big
endian definition. Indeed, here is how the header looks like:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |  Hdr Ext Len  | Routing Type  | Segments Left |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CmprI | CmprE |  Pad  |               Reserved                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This patch reorders fields so that big endian definition is now correct.

  [1] https://tools.ietf.org/html/rfc6554#section-3

Fixes: cfa933d938d8 ("include: uapi: linux: add rpl sr header definition")
Signed-off-by: Justin Iurman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: usb: j721e: add ranges and dma-coherent props

Add missed 'ranges' and 'dma-coherent' properties as cdns-usb DT nodes has
child node and DMA IO is coherent on TI K3 J721E/J7200 SoCs.

This also fixes dtbs_check warning:
cdns-usb@4104000: 'dma-coherent', 'ranges' do not match any of the regexes: '^usb@', 'pinctrl-[0-9]+'

Signed-off-by: Grygorii Strashko <[email protected]>
Acked-by: Aswath Govindraju <[email protected]>
Reviewed-by: Aswath Govindraju <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

SUNRPC: Handle 0 length opaque XDR object data properly

When handling an auth_gss downcall, it's possible to get 0-length
opaque object for the acceptor.  In the case of a 0-length XDR
object, make sure simple_get_netobj() fills in dest->data = NULL,
and does not continue to kmemdup() which will set
dest->data = ZERO_SIZE_PTR for the acceptor.

The trace event code can handle NULL but not ZERO_SIZE_PTR for a
string, and so without this patch the rpcgss_context trace event
will crash the kernel as follows:

[  162.887992] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  162.898693] #PF: supervisor read access in kernel mode
[  162.900830] #PF: error_code(0x0000) - not-present page
[  162.902940] PGD 0 P4D 0
[  162.904027] Oops: 0000 [#1] SMP PTI
[  162.905493] CPU: 4 PID: 4321 Comm: rpc.gssd Kdump: loaded Not tainted 5.10.0 #133
[  162.908548] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  162.910978] RIP: 0010:strlen+0x0/0x20
[  162.912505] Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[  162.920101] RSP: 0018:ffffaec900c77d90 EFLAGS: 00010202
[  162.922263] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fffde697
[  162.925158] RDX: 000000000000002f RSI: 0000000000000080 RDI: 0000000000000010
[  162.928073] RBP: 0000000000000010 R08: 0000000000000e10 R09: 0000000000000000
[  162.930976] R10: ffff8e698a590cb8 R11: 0000000000000001 R12: 0000000000000e10
[  162.933883] R13: 00000000fffde697 R14: 000000010034d517 R15: 0000000000070028
[  162.936777] FS:  00007f1e1eb93700(0000) GS:ffff8e6ab7d00000(0000) knlGS:0000000000000000
[  162.940067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  162.942417] CR2: 0000000000000010 CR3: 0000000104eba000 CR4: 00000000000406e0
[  162.945300] Call Trace:
[  162.946428]  trace_event_raw_event_rpcgss_context+0x84/0x140 [auth_rpcgss]
[  162.949308]  ? __kmalloc_track_caller+0x35/0x5a0
[  162.951224]  ? gss_pipe_downcall+0x3a3/0x6a0 [auth_rpcgss]
[  162.953484]  gss_pipe_downcall+0x585/0x6a0 [auth_rpcgss]
[  162.955953]  rpc_pipe_write+0x58/0x70 [sunrpc]
[  162.957849]  vfs_write+0xcb/0x2c0
[  162.959264]  ksys_write+0x68/0xe0
[  162.960706]  do_syscall_64+0x33/0x40
[  162.962238]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  162.964346] RIP: 0033:0x7f1e1f1e57df

Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

SUNRPC: Move simple_get_bytes and simple_get_netobj into private header

Remove duplicated helper functions to parse opaque XDR objects
and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h.
In the new file carry the license and copyright from the source file
net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside
include/linux/sunrpc/xdr.h since lockd is not the only user of
struct xdr_netobj.

Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

fs/pipe: allow sendfile() to pipe again

After commit 36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops") sendfile() could no longer send data
from a real file to a pipe, breaking for example certain cgit
setups (e.g. when running behind fcgiwrap), because in this
case cgit will try to do exactly this: sendfile() to a pipe.

Fix this by using iter_file_splice_write for the splice_write
method of pipes, as suggested by Christoph.

Cc: [email protected]
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Suggested-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Johannes Berg <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Commit 9bb48c82aced ("tty: implement write_iter") converted the tty
layer to use write_iter. Fix the redirected_tty_write declaration
also in n_tty and change the comparisons to use write_iter instead of
write.

[ Also moved the declaration of redirected_tty_write() to the proper
  location in a header file. The reason for the bug was the bogus extern
  declaration in n_tty.c silently not matching the changed definition in
  tty_io.c, and because it wasn't in a shared header file, there was no
  cross-checking of the declaration.

  Sami noticed because Clang's Control Flow Integrity checking ended up
  incidentally noticing the inconsistent declaration.    - Linus ]

Fixes: 9bb48c82aced ("tty: implement write_iter")
Signed-off-by: Sami Tolvanen <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dt-bindings:iio:adc: update adc.yaml reference

Changeset b70d154d6558 ("dt-bindings:iio:adc: convert adc.txt to yaml")
renamed: Documentation/devicetree/bindings/iio/adc/adc.txt
to: Documentation/devicetree/bindings/iio/adc/adc.yaml.

Update its cross-reference accordingly.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/8e37dba8ae9099acd649bab8a1cf718caa4f3e6a.1610535350.git.mchehab+huawei@kernel.org
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: memory: mediatek: update mediatek,smi-larb.yaml references

Changeset 27bb0e42855a ("dt-bindings: memory: mediatek: Convert SMI to DT schema")
renamed: Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt
to: Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml.

Update its cross-references accordingly.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Link: https://lore.kernel.org/r/c70bd79b311a65babe7374eaf81974563400a943.1610535350.git.mchehab+huawei@kernel.org
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: display: mediatek: update mediatek,dpi.yaml reference

Changeset 9273cf7d3942 ("dt-bindings: display: mediatek: convert the dpi bindings to yaml")
renamed: Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.txt
to: Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml.

Update its cross-reference accordingly.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Link: https://lore.kernel.org/r/3bf906f39b797d18800abd387187cce71296e5eb.1610535350.git.mchehab+huawei@kernel.org
Signed-off-by: Rob Herring <[email protected]>

ASoC: audio-graph-card: update audio-graph-card.yaml reference

Changeset 97198614f6c3 ("ASoC: audio-graph-card: switch to yaml base Documentation")
renamed: Documentation/devicetree/bindings/sound/audio-graph-card.txt
to: Documentation/devicetree/bindings/sound/audio-graph-card.yaml.

Update its cross-reference accordingly.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Link: https://lore.kernel.org/r/8a779e6b9644d19c5d77b382059f6ccf9781434d.1610535350.git.mchehab+huawei@kernel.org
Signed-off-by: Rob Herring <[email protected]>

Merge tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fix from Petr Mladek:
"The fix of a potential buffer overflow in 5.11-rc5 introduced another
  one. The trailing '\0' might be written up to the message "len" past
  the buffer. Fortunately, it is not that easy to hit.

  Most readers use 1kB buffers for a single message. Typical messages
  fit into the temporary buffer with enough reserve.

  Also readers do not rely on the '\0'. It is related to the previous
  fix. Some readers required the space for the trailing '\0'. We decided
  to write it there to avoid such regressions in the future.

  The most realistic victims are dumpers using kmsg_dump_get_buffer().
  They are filling the entire buffer with as many messages as possible.
  They are typically used when handling panic()"

* tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: fix string termination for record_print_text()

nbd: freeze the queue while we're adding connections

When setting up a device, we can krealloc the config->socks array to add
new sockets to the configuration. However if we happen to get a IO
request in at this point even though we aren't setup we could hit a UAF,
as we deref config->socks without any locking, assuming that the
configuration was setup already and that ->socks is safe to access it as
we have a reference on the configuration.

But there's nothing really preventing IO from occurring at this point of
the device setup, we don't want to incur the overhead of a lock to
access ->socks when it will never change while the device is running.
To fix this UAF scenario simply freeze the queue if we are adding
sockets. This will protect us from this particular case without adding
any additional overhead for the normal running case.

Cc: [email protected]
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

PM: hibernate: flush swap writer after marking

Flush the swap writer after, not before, marking the files, to ensure the
signature is properly written.

Fixes: 6f612af57821 ("PM / Hibernate: Group swap ops")
Signed-off-by: Laurent Badel <[email protected]>
Cc: All applicable <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

btrfs: fix log replay failure due to race with space cache rebuild

After a sudden power failure we may end up with a space cache on disk that
is not valid and needs to be rebuilt from scratch.

If that happens, during log replay when we attempt to pin an extent buffer
from a log tree, at btrfs_pin_extent_for_log_replay(), we do not wait for
the space cache to be rebuilt through the call to:

    btrfs_cache_block_group(cache, 1);

That is because that only waits for the task (work queue job) that loads
the space cache to change the cache state from BTRFS_CACHE_FAST to any
other value. That is ok when the space cache on disk exists and is valid,
but when the cache is not valid and needs to be rebuilt, it ends up
returning as soon as the cache state changes to BTRFS_CACHE_STARTED (done
at caching_thread()).

So this means that we can end up trying to unpin a range which is not yet
marked as free in the block group. This results in the call to
btrfs_remove_free_space() to return -EINVAL to
btrfs_pin_extent_for_log_replay(), which in turn makes the log replay fail
as well as mounting the filesystem. More specifically the -EINVAL comes
from free_space_cache.c:remove_from_bitmap(), because the requested range
is not marked as free space (ones in the bitmap), we have the following
condition triggered:

static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl,
(...)
       if (ret < 0 || search_start != *offset)
            return -EINVAL;
(...)

It's the "search_start != *offset" that results in the condition being
evaluated to true.

When this happens we got the following in dmesg/syslog:

[72383.415114] BTRFS: device fsid 32b95b69-0ea9-496a-9f02-3f5a56dc9322 devid 1 transid 1432 /dev/sdb scanned by mount (3816007)
[72383.417837] BTRFS info (device sdb): disk space caching is enabled
[72383.418536] BTRFS info (device sdb): has skinny extents
[72383.423846] BTRFS info (device sdb): start tree-log replay
[72383.426416] BTRFS warning (device sdb): block group 30408704 has wrong amount of free space
[72383.427686] BTRFS warning (device sdb): failed to load free space cache for block group 30408704, rebuilding it now
[72383.454291] BTRFS: error (device sdb) in btrfs_recover_log_trees:6203: errno=-22 unknown (Failed to pin buffers while recovering log root tree.)
[72383.456725] BTRFS: error (device sdb) in btrfs_replay_log:2253: errno=-22 unknown (Failed to recover log tree)
[72383.460241] BTRFS error (device sdb): open_ctree failed

We also mark the range for the extent buffer in the excluded extents io
tree. That is fine when the space cache is valid on disk and we can load
it, in which case it causes no problems.

However, for the case where we need to rebuild the space cache, because it
is either invalid or it is missing, having the extent buffer range marked
in the excluded extents io tree leads to a -EINVAL failure from the call
to btrfs_remove_free_space(), resulting in the log replay and mount to
fail. This is because by having the range marked in the excluded extents
io tree, the caching thread ends up never adding the range of the extent
buffer as free space in the block group since the calls to
add_new_free_space(), called from load_extent_tree_free(), filter out any
ranges that are marked as excluded extents.

So fix this by making sure that during log replay we wait for the caching
task to finish completely when we need to rebuild a space cache, and also
drop the need to mark the extent buffer range in the excluded extents io
tree, as well as clearing ranges from that tree at
btrfs_finish_extent_commit().

This started to happen with some frequency on large filesystems having
block groups with a lot of fragmentation since the recent commit
e747853cae3ae3 ("btrfs: load free space cache asynchronously"), but in
fact the issue has been there for years, it was just much less likely
to happen.

Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch

This effectively reverts commit d5c8238849e7 ("btrfs: convert
data_seqcount to seqcount_mutex_t").

While running fstests on 32 bits test box, many tests failed because of
warnings in dmesg. One of those warnings (btrfs/003):

  [66.441317] WARNING: CPU: 6 PID: 9251 at include/linux/seqlock.h:279 btrfs_remove_chunk+0x58b/0x7b0 [btrfs]
  [66.441446] CPU: 6 PID: 9251 Comm: btrfs Tainted: G           O      5.11.0-rc4-custom+ #5
  [66.441449] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
  [66.441451] EIP: btrfs_remove_chunk+0x58b/0x7b0 [btrfs]
  [66.441472] EAX: 00000000 EBX: 00000001 ECX: c576070c EDX: c6b15803
  [66.441475] ESI: 10000000 EDI: 00000000 EBP: c56fbcfc ESP: c56fbc70
  [66.441477] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
  [66.441481] CR0: 80050033 CR2: 05c8da20 CR3: 04b20000 CR4: 00350ed0
  [66.441485] Call Trace:
  [66.441510]  btrfs_relocate_chunk+0xb1/0x100 [btrfs]
  [66.441529]  ? btrfs_lookup_block_group+0x17/0x20 [btrfs]
  [66.441562]  btrfs_balance+0x8ed/0x13b0 [btrfs]
  [66.441586]  ? btrfs_ioctl_balance+0x333/0x3c0 [btrfs]
  [66.441619]  ? __this_cpu_preempt_check+0xf/0x11
  [66.441643]  btrfs_ioctl_balance+0x333/0x3c0 [btrfs]
  [66.441664]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [66.441683]  btrfs_ioctl+0x414/0x2ae0 [btrfs]
  [66.441700]  ? __lock_acquire+0x35f/0x2650
  [66.441717]  ? lockdep_hardirqs_on+0x87/0x120
  [66.441720]  ? lockdep_hardirqs_on_prepare+0xd0/0x1e0
  [66.441724]  ? call_rcu+0x2d3/0x530
  [66.441731]  ? __might_fault+0x41/0x90
  [66.441736]  ? kvm_sched_clock_read+0x15/0x50
  [66.441740]  ? sched_clock+0x8/0x10
  [66.441745]  ? sched_clock_cpu+0x13/0x180
  [66.441750]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [66.441750]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [66.441768]  __ia32_sys_ioctl+0x165/0x8a0
  [66.441773]  ? __this_cpu_preempt_check+0xf/0x11
  [66.441785]  ? __might_fault+0x89/0x90
  [66.441791]  __do_fast_syscall_32+0x54/0x80
  [66.441796]  do_fast_syscall_32+0x32/0x70
  [66.441801]  do_SYSENTER_32+0x15/0x20
  [66.441805]  entry_SYSENTER_32+0x9f/0xf2
  [66.441808] EIP: 0xab7b5549
  [66.441814] EAX: ffffffda EBX: 00000003 ECX: c4009420 EDX: bfa91f5c
  [66.441816] ESI: 00000003 EDI: 00000001 EBP: 00000000 ESP: bfa91e98
  [66.441818] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000292
  [66.441833] irq event stamp: 42579
  [66.441835] hardirqs last  enabled at (42585): [<c60eb065>] console_unlock+0x495/0x590
  [66.441838] hardirqs last disabled at (42590): [<c60eafd5>] console_unlock+0x405/0x590
  [66.441840] softirqs last  enabled at (41698): [<c601b76c>] call_on_stack+0x1c/0x60
  [66.441843] softirqs last disabled at (41681): [<c601b76c>] call_on_stack+0x1c/0x60

  ========================================================================
  btrfs_remove_chunk+0x58b/0x7b0:
  __seqprop_mutex_assert at linux/./include/linux/seqlock.h:279
  (inlined by) btrfs_device_set_bytes_used at linux/fs/btrfs/volumes.h:212
  (inlined by) btrfs_remove_chunk at linux/fs/btrfs/volumes.c:2994
  ========================================================================

The warning is produced by lockdep_assert_held() in
__seqprop_mutex_assert() if CONFIG_LOCKDEP is enabled.
And "olumes.c:2994 is btrfs_device_set_bytes_used() with mutex lock
fs_info->chunk_mutex held already.

After adding some debug prints, the cause was found that many
__alloc_device() are called with NULL @fs_info (during scanning ioctl).
Inside the function, btrfs_device_data_ordered_init() is expanded to
seqcount_mutex_init().  In this scenario, its second
parameter info->chunk_mutex  is &NULL->chunk_mutex which equals
to offsetof(struct btrfs_fs_info, chunk_mutex) unexpectedly. Thus,
seqcount_mutex_init() is called in wrong way. And later
btrfs_device_get/set helpers trigger lockdep warnings.

The device and filesystem object lifetimes are different and we'd have
to synchronize initialization of the btrfs_device::data_seqcount with
the fs_info, possibly using some additional synchronization. It would
still not prevent concurrent access to the seqcount lock when it's used
for read and initialization.

Commit d5c8238849e7 ("btrfs: convert data_seqcount to seqcount_mutex_t")
does not mention a particular problem being fixed so revert should not
cause any harm and we'll get the lockdep warning fixed.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210139
Reported-by: Erhard F <[email protected]>
Fixes: d5c8238849e7 ("btrfs: convert data_seqcount to seqcount_mutex_t")
CC: [email protected] # 5.10
CC: Davidlohr Bueso <[email protected]>
Signed-off-by: Su Yue <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: fix possible free space tree corruption with online conversion

While running btrfs/011 in a loop I would often ASSERT() while trying to
add a new free space entry that already existed, or get an EEXIST while
adding a new block to the extent tree, which is another indication of
double allocation.

This occurs because when we do the free space tree population, we create
the new root and then populate the tree and commit the transaction.
The problem is when you create a new root, the root node and commit root
node are the same.  During this initial transaction commit we will run
all of the delayed refs that were paused during the free space tree
generation, and thus begin to cache block groups.  While caching block
groups the caching thread will be reading from the main root for the
free space tree, so as we make allocations we'll be changing the free
space tree, which can cause us to add the same range twice which results
in either the ASSERT(ret != -EEXIST); in __btrfs_add_free_space, or in a
variety of different errors when running delayed refs because of a
double allocation.

Fix this by marking the fs_info as unsafe to load the free space tree,
and fall back on the old slow method.  We could be smarter than this,
for example caching the block group while we're populating the free
space tree, but since this is a serious problem I've opted for the
simplest solution.

CC: [email protected] # 4.9+
Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree")
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>

kernel: kexec: remove the lock operation of system_transition_mutex

Function kernel_kexec() is called with lock system_transition_mutex
held in reboot system call. While inside kernel_kexec(), it will
acquire system_transition_mutex agin. This will lead to dead lock.

The dead lock should be easily triggered, it hasn't caused any
failure report just because the feature 'kexec jump' is almost not
used by anyone as far as I know. An inquiry can be made about who
is using 'kexec jump' and where it's used. Before that, let's simply
remove the lock operation inside CONFIG_KEXEC_JUMP ifdeffery scope.

Fixes: 55f2503c3b69 ("PM / reboot: Eliminate race between reboot and suspend")
Signed-off-by: Baoquan He <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Reviewed-by: Pingfan Liu <[email protected]>
Cc: 4.19+ <[email protected]> # 4.19+
Signed-off-by: Rafael J. Wysocki <[email protected]>