Git Repo - linux.git/log

s390/uaccess: use symbolic names for inline assembler operands

Make code easier to read by using symbolic names.

Signed-off-by: Heiko Carstens <[email protected]>

s390/mcck: isolate SIE instruction when setting CIF_MCCK_GUEST flag

Commit d768bd892fc8 ("s390: add options to change branch prediction
behaviour for the kernel") introduced .Lsie_exit label - supposedly
to fence off SIE instruction. However, the corresponding address
range length .Lsie_crit_mcck_length was not updated, which led to
BPON code potentionally marked with CIF_MCCK_GUEST flag.

Both .Lsie_exit and .Lsie_crit_mcck_length were removed with commit
0b0ed657fe00 ("s390: remove critical section cleanup from entry.S"),
but the issue persisted - currently BPOFF and BPENTER macros might
get wrongly considered by the machine check handler as a guest.

Fixes: d768bd892fc8 ("s390: add options to change branch prediction behaviour for the kernel")
Reviewed-by: Sven Schnelle <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

s390/mm: use non-quiescing sske for KVM switch to keyed guest

The switch to a keyed guest does not require a classic sske as the other
guest CPUs are not accessing the key before the switch is complete.
By using the NQ SSKE things are faster especially with multiple guests.

Signed-off-by: Christian Borntraeger <[email protected]>
Suggested-by: Janis Schoetterl-Glausch <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

s390/gmap: voluntarily schedule during key setting

With large and many guest with storage keys it is possible to create
large latencies or stalls during initial key setting:

rcu: INFO: rcu_sched self-detected stall on CPU
rcu:   18-....: (2099 ticks this GP) idle=54e/1/0x4000000000000002 softirq=35598716/35598716 fqs=998
       (t=2100 jiffies g=155867385 q=20879)
Task dump for CPU 18:
CPU 1/KVM       R  running task        0 1030947 256019 0x06000004
Call Trace:
sched_show_task
rcu_dump_cpu_stacks
rcu_sched_clock_irq
update_process_times
tick_sched_handle
tick_sched_timer
__hrtimer_run_queues
hrtimer_interrupt
do_IRQ
ext_int_handler
ptep_zap_key

The mmap lock is held during the page walking but since this is a
semaphore scheduling is still possible. Same for the kvm srcu.
To minimize overhead do this on every segment table entry or large page.

Signed-off-by: Christian Borntraeger <[email protected]>
Reviewed-by: Alexander Gordeev <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

MAINTAINERS: Update s390 virtio-ccw

Add myself to the kernel side of virtio-ccw

Signed-off-by: Eric Farman <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Acked-by: Halil Pasic <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

s390/kexec: add __GFP_NORETRY to KEXEC_CONTROL_MEMORY_GFP

Avoid invoking the OOM-killer when allocating the control page. This
is the s390 variant of commit dc5cccacf427 ("kexec: don't invoke
OOM-killer for control page allocation").

Signed-off-by: Heiko Carstens <[email protected]>

s390/Kconfig.debug: fix indentation

The convention for indentation seems to be a single tab. Help text is
further indented by an additional two whitespaces. Fix the lines that
violate these rules.

Signed-off-by: Juerg Haefliger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>

s390/Kconfig: fix indentation

The convention for indentation seems to be a single tab. Help text is
further indented by an additional two whitespaces. Fix the lines that
violate these rules.

Signed-off-by: Juerg Haefliger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>

macsec: fix UAF bug for real_dev

Create a new macsec device but not get reference to real_dev. That can
not ensure that real_dev is freed after macsec. That will trigger the
UAF bug for real_dev as following:

==================================================================
BUG: KASAN: use-after-free in macsec_get_iflink+0x5f/0x70 drivers/net/macsec.c:3662
Call Trace:
...
macsec_get_iflink+0x5f/0x70 drivers/net/macsec.c:3662
dev_get_iflink+0x73/0xe0 net/core/dev.c:637
default_operstate net/core/link_watch.c:42 [inline]
rfc2863_policy+0x233/0x2d0 net/core/link_watch.c:54
linkwatch_do_dev+0x2a/0x150 net/core/link_watch.c:161

Allocated by task 22209:
...
alloc_netdev_mqs+0x98/0x1100 net/core/dev.c:10549
rtnl_create_link+0x9d7/0xc00 net/core/rtnetlink.c:3235
veth_newlink+0x20e/0xa90 drivers/net/veth.c:1748

Freed by task 8:
...
kfree+0xd6/0x4d0 mm/slub.c:4552
kvfree+0x42/0x50 mm/util.c:615
device_release+0x9f/0x240 drivers/base/core.c:2229
kobject_cleanup lib/kobject.c:673 [inline]
kobject_release lib/kobject.c:704 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x1c8/0x540 lib/kobject.c:721
netdev_run_todo+0x72e/0x10b0 net/core/dev.c:10327

After commit faab39f63c1f ("net: allow out-of-order netdev unregistration")
and commit e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev"), we
can add dev_hold_track() in macsec_dev_init() and dev_put_track() in
macsec_free_netdev() to fix the problem.

Fixes: 2bce1ebed17d ("macsec: fix refcnt leak in module exit routine")
Reported-by: [email protected]
Signed-off-by: Ziyang Xuan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

gpu: host1x: Add context bus

The context bus is a "dummy" bus that contains struct devices that
correspond to IOMMU contexts assigned through Host1x to processes.

Even when host1x itself is built as a module, the bus is registered
in built-in code so that the built-in ARM SMMU driver is able to
reference it.

Signed-off-by: Mikko Perttunen <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>

octeontx2-af: fix error code in is_valid_offset()

The is_valid_offset() function returns success/true if the call to
validate_and_get_cpt_blkaddr() fails.

Fixes: ecad2ce8c48f ("octeontx2-af: cn10k: Add mailbox to configure reassembly timeout")
Signed-off-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/YpXDrTPb8qV01JSP@kili
Signed-off-by: Paolo Abeni <[email protected]>

wifi: mac80211: fix use-after-free in chanctx code

In ieee80211_vif_use_reserved_context(), when we have an
old context and the new context's replace_state is set to
IEEE80211_CHANCTX_REPLACE_NONE, we free the old context
in ieee80211_vif_use_reserved_reassign(). Therefore, we
cannot check the old_ctx anymore, so we should set it to
NULL after this point.

However, since the new_ctx replace state is clearly not
IEEE80211_CHANCTX_REPLACES_OTHER, we're not going to do
anything else in this function and can just return to
avoid accessing the freed old_ctx.

Cc: [email protected]
Fixes: 5bcae31d9cb1 ("mac80211: implement multi-vif in-place reservations")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/20220601091926.df419d91b165.I17a9b3894ff0b8323ce2afdb153b101124c821e5@changeid

bonding: guard ns_targets by CONFIG_IPV6

Guard ns_targets in struct bond_params by CONFIG_IPV6, which could save
256 bytes if IPv6 not configed. Also add this protection for function
bond_is_ip6_target_ok() and bond_get_targets_ip6().

Remove the IS_ENABLED() check for bond_opts[] as this will make
BOND_OPT_NS_TARGETS uninitialized if CONFIG_IPV6 not enabled. Add
a dummy bond_option_ns_ip6_targets_set() for this situation.

Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Jonathan Toppins <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'asoc-fix-v5.19-rc0' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.19

A few more fixes that came in during the merge window - nothing
huge here, there is one core fix for DPCM from Pierre but mostly
driver changes.

vdpa: ifcvf: set pci driver data in probe

We should set the pci driver data in probe instead of the vdpa device
adding callback. Otherwise if no vDPA device is created we will lose
the pointer to the management device.

Fixes: 6b5df347c6482 ("vDPA/ifcvf: implement management netlink framework for ifcvf")
Tested-by: Zheyu Ma <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220524055557 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vdpa/mlx5: Add RX MAC VLAN filter support

Support HW offloaded filtering of MAC/VLAN packets.
To allow that, we add a handler to handle VLAN configurations coming
through the control VQ. Two operations are supported.

1. Adding VLAN - in this case, an entry will be added to the RX flow
   table that will allow the combination of the MAC/VLAN to be
   forwarded to the TIR.
2. Removing VLAN - will remove the entry from the flow table,
   effectively blocking such packets from going through.

Currently the control VQ does not propagate changes to the MAC of the
VLAN device so we always use the MAC of the parent device.

Examples:
1. Create vlan device:
$ ip link add link ens1 name ens1.8 type vlan id 8

Signed-off-by: Eli Cohen <[email protected]>
Message-Id: <20220411122942 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>

vdpa/mlx5: Remove flow counter from steering

The flow counter has been introduced in early versions of the driver to
aid in debugging. It is no longer needed and can harm performance.

Remove it.

Signed-off-by: Eli Cohen <[email protected]>
Message-Id: <20220411122942 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

xen: replace xen_remap() with memremap()

xen_remap() is used to establish mappings for frames not under direct
control of the kernel: for Xenstore and console ring pages, and for
grant pages of non-PV guests.

Today xen_remap() is defined to use ioremap() on x86 (doing uncached
mappings), and ioremap_cache() on Arm (doing cached mappings).

Uncached mappings for those use cases are bad for performance, so they
should be avoided if possible. As all use cases of xen_remap() don't
require uncached mappings (the mapped area is always physical RAM),
a mapping using the standard WB cache mode is fine.

As sparse is flagging some of the xen_remap() use cases to be not
appropriate for iomem(), as the result is not annotated with the
__iomem modifier, eliminate xen_remap() completely and replace all
use cases with memremap() specifying the MEMREMAP_WB caching mode.

xen_unmap() can be replaced with memunmap().

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

cifs: fix potential deadlock in direct reclaim

The srv_mutex is used during writeback so cifs should ensure that
allocations done when that mutex is held are done with GFP_NOFS, to
avoid having direct reclaim ending up waiting for the same mutex and
causing a deadlock.  This is detected by lockdep with the splat below:

======================================================
WARNING: possible circular locking dependency detected
5.18.0 #70 Not tainted
------------------------------------------------------
kswapd0/49 is trying to acquire lock:
ffff8880195782e0 (&tcp_ses->srv_mutex){+.+.}-{3:3}, at: compound_send_recv

but task is already holding lock:
ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}-{0:0}:
        fs_reclaim_acquire
        kmem_cache_alloc_trace
        __request_module
        crypto_alg_mod_lookup
        crypto_alloc_tfm_node
        crypto_alloc_shash
        cifs_alloc_hash
        smb311_crypto_shash_allocate
        smb311_update_preauth_hash
        compound_send_recv
        cifs_send_recv
        SMB2_negotiate
        smb2_negotiate
        cifs_negotiate_protocol
        cifs_get_smb_ses
        cifs_mount
        cifs_smb3_do_mount
        smb3_get_tree
        vfs_get_tree
        path_mount
        __x64_sys_mount
        do_syscall_64
        entry_SYSCALL_64_after_hwframe

-> #0 (&tcp_ses->srv_mutex){+.+.}-{3:3}:
        __lock_acquire
        lock_acquire
        __mutex_lock
        mutex_lock_nested
        compound_send_recv
        cifs_send_recv
        SMB2_write
        smb2_sync_write
        cifs_write
        cifs_writepage_locked
        cifs_writepage
        shrink_page_list
        shrink_lruvec
        shrink_node
        balance_pgdat
        kswapd
        kthread
        ret_from_fork

other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(fs_reclaim);
                                lock(&tcp_ses->srv_mutex);
                                lock(fs_reclaim);
   lock(&tcp_ses->srv_mutex);

  *** DEADLOCK ***

1 lock held by kswapd0/49:
  #0: ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat

stack backtrace:
CPU: 2 PID: 49 Comm: kswapd0 Not tainted 5.18.0 #70
Call Trace:
  <TASK>
  dump_stack_lvl
  dump_stack
  print_circular_bug.cold
  check_noncircular
  __lock_acquire
  lock_acquire
  __mutex_lock
  mutex_lock_nested
  compound_send_recv
  cifs_send_recv
  SMB2_write
  smb2_sync_write
  cifs_write
  cifs_writepage_locked
  cifs_writepage
  shrink_page_list
  shrink_lruvec
  shrink_node
  balance_pgdat
  kswapd
  kthread
  ret_from_fork
  </TASK>

Fix this by using the memalloc_nofs_save/restore APIs around the places
where the srv_mutex is held.  Do this in a wrapper function for the
lock/unlock of the srv_mutex, and rename the srv_mutex to avoid missing
call sites in the conversion.

Note that there is another lockdep warning involving internal crypto
locks, which was masked by this problem and is visible after this fix,
see the discussion in this thread:

https://lore.kernel.org/all/20220523123755 [email protected]/

Link: https://lore.kernel.org/r/CANT5p=rqcYfYMVHirqvdnnca4Mo+JQSw5Qu12v=kPfpk5yhhmg@mail.gmail.com/
Reported-by: Shyam Prasad N <[email protected]>
Suggested-by: Lars Persson <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Reviewed-by: Enzo Matsumiya <[email protected]>
Signed-off-by: Vincent Whitchurch <[email protected]>
Signed-off-by: Steve French <[email protected]>

tcp: tcp_rtx_synack() can be called from process context

Laurent reported the enclosed report [1]

This bug triggers with following coditions:

0) Kernel built with CONFIG_DEBUG_PREEMPT=y

1) A new passive FastOpen TCP socket is created.
   This FO socket waits for an ACK coming from client to be a complete
   ESTABLISHED one.
2) A socket operation on this socket goes through lock_sock()
   release_sock() dance.
3) While the socket is owned by the user in step 2),
   a retransmit of the SYN is received and stored in socket backlog.
4) At release_sock() time, the socket backlog is processed while
   in process context.
5) A SYNACK packet is cooked in response of the SYN retransmit.
6) -> tcp_rtx_synack() is called in process context.

Before blamed commit, tcp_rtx_synack() was always called from BH handler,
from a timer handler.

Fix this by using TCP_INC_STATS() & NET_INC_STATS()
which do not assume caller is in non preemptible context.

[1]
BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180
caller is tcp_rtx_synack.part.0+0x36/0xc0
CPU: 10 PID: 2180 Comm: epollpep Tainted: G           OE     5.16.0-0.bpo.4-amd64 #1  Debian 5.16.12-1~bpo11+1
Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021
Call Trace:
<TASK>
dump_stack_lvl+0x48/0x5e
check_preemption_disabled+0xde/0xe0
tcp_rtx_synack.part.0+0x36/0xc0
tcp_rtx_synack+0x8d/0xa0
? kmem_cache_alloc+0x2e0/0x3e0
? apparmor_file_alloc_security+0x3b/0x1f0
inet_rtx_syn_ack+0x16/0x30
tcp_check_req+0x367/0x610
tcp_rcv_state_process+0x91/0xf60
? get_nohz_timer_target+0x18/0x1a0
? lock_timer_base+0x61/0x80
? preempt_count_add+0x68/0xa0
tcp_v4_do_rcv+0xbd/0x270
__release_sock+0x6d/0xb0
release_sock+0x2b/0x90
sock_setsockopt+0x138/0x1140
? __sys_getsockname+0x7e/0xc0
? aa_sk_perm+0x3e/0x1a0
__sys_setsockopt+0x198/0x1e0
__x64_sys_setsockopt+0x21/0x30
do_syscall_64+0x38/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Laurent Fasnacht <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Missing proper sanitization for nft_set_desc_concat_parse().

2) Missing mutex in nf_tables pre_exit path.

3) Possible double hook unregistration from clean_net path.

4) Missing FLOWI_FLAG_ANYSRC flag in flowtable route lookup.
   Fix incorrect source and destination address in case of NAT.
   Patch from wenxu.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: flowtable: fix nft_flow_route source address for nat case
  netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag
  netfilter: nf_tables: double hook unregistration in netns path
  netfilter: nf_tables: hold mutex on netns pre_exit path
  netfilter: nf_tables: sanitize nft_set_desc_concat_parse()
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: sched: add barrier to fix packet stuck problem for lockless qdisc

In qdisc_run_end(), the spin_unlock() only has store-release semantic,
which guarantees all earlier memory access are visible before it. But
the subsequent test_bit() has no barrier semantics so may be reordered
ahead of the spin_unlock(). The store-load reordering may cause a packet
stuck problem.

The concurrent operations can be described as below,
         CPU 0                      |          CPU 1
   qdisc_run_end()                  |     qdisc_run_begin()
          .                         |           .
----> /* may be reorderd here */   |           .
|         .                         |           .
|     spin_unlock()                 |         set_bit()
|         .                         |         smp_mb__after_atomic()
---- test_bit()                    |         spin_trylock()
          .                         |          .

Consider the following sequence of events:
    CPU 0 reorder test_bit() ahead and see MISSED = 0
    CPU 1 calls set_bit()
    CPU 1 calls spin_trylock() and return fail
    CPU 0 executes spin_unlock()

At the end of the sequence, CPU 0 calls spin_unlock() and does nothing
because it see MISSED = 0. The skb on CPU 1 has beed enqueued but no one
take it, until the next cpu pushing to the qdisc (if ever ...) will
notice and dequeue it.

This patch fix this by adding one explicit barrier. As spin_unlock() and
test_bit() ordering is a store-load ordering, a full memory barrier
smp_mb() is needed here.

Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc")
Signed-off-by: Guoju Fang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: Fix unevaluatedProperties warnings in examples

The 'unevaluatedProperties' schema checks is not fully working and doesn't
catch some cases where there's a $ref to another schema. A fix is pending,
but results in new warnings in examples. Fix the warnings by removing
spurious properties or adding missing properties to the schema.

Signed-off-by: Rob Herring <[email protected]>
Acked-by: Nicolas Ferre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: PCI: socionext,uniphier-pcie: Add missing child interrupt controller

The Socionext interrupt controller internal to the the PCI block isn't
documented which causes warnings when unevaluatedProperties check is
also fixed. Add the 'interrupt-controller' child node and properties and
fixup the example so that interrupt properties can be parsed.

Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Kunihiko Hayashi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: usb: snps,dwc3: Add missing 'dma-coherent' property

The 'unevaluatedProperties' schema checks is not fully working and doesn't
catch some cases where there's a $ref to another schema. A fix is pending,
but results in new warnings in examples.

Some DWC3 implementations such as Xilinx are hooked up coherently and need
to set the 'dma-coherent' property.

Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: soc: imx8mp-media-blk-ctrl: Fix DT example

The DT example incorrectly names the ISP power domain "isp1" instead of
"isp". This causes a validation failure:

Documentation/devicetree/bindings/soc/imx/fsl,imx8mp-media-blk-ctrl.example.dtb: blk-ctl@32ec0000: power-domain-names:7: 'isp' was expected
From schema: Documentation/devicetree/bindings/soc/imx/fsl,imx8mp-media-blk-ctrl.yaml

Fix it.

Fixes: 8b3dd27bfe47 ("dt-bindings: soc: Add i.MX8MP media block control DT bindings")
Reported-by: Rob Herring <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Reviewed-by: Paul Elder <[email protected]>
Acked-by: Rob Herring <[email protected]>
Reviewed-by: Peng Fan <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'amd-drm-next-5.19-2022-05-26-2' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-5.19-2022-05-26-2:

amdgpu:
- Update fdinfo to the common drm format

UAPI:
- Add VM_NOALLOC GPUVM attribute to prevent buffers for going into the MALL
Add AMDGPU_GEM_CREATE_DISCARDABLE flag to create buffers that can be discarded on eviction
Mesa code which uses these: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16466

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'amd-drm-next-5.19-2022-05-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-5.19-2022-05-26:

amdgpu:
- Link training fixes
- DPIA fixes
- Misc code cleanups
- Aux fixes
- Hotplug fixes
- More FP clean up
- Misc GFX9/10 fixes
- Fix a possible memory leak in SMU shutdown
- SMU 13 updates
- RAS fixes
- TMZ fixes
- GC 11 updates
- SMU 11 metrics fixes
- Fix coverage blend mode for overlay plane
- Note DDR vs LPDDR memory
- Fuzz fix for CS IOCTL
- Add new PCI DID

amdkfd:
- Clean up hive setup
- Misc fixes

radeon:
- Fix a possible NULL pointer dereference

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
"New Features:
   - Add support for 'dacl' and 'sacl' attributes

  Bugfixes and Cleanups:
   - Fixes for reporting mapping errors
   - Fixes for memory allocation errors
   - Improve warning message when locks are lost
   - Update documentation for the nfs4_unique_id parameter
   - Add an explanation of NFSv4 client identifiers
   - Ensure the i_size attribute is written to the fscache storage
   - Fix freeing uninitialized nfs4_labels
   - Better handling when xprtrdma bc_serv is NULL
   - Mark qualified async operations as MOVEABLE tasks"

* tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFSv4.1 mark qualified async operations as MOVEABLE tasks
  xprtrdma: treat all calls not a bcall when bc_serv is NULL
  NFSv4: Fix free of uninitialized nfs4_label on referral lookup.
  NFS: Pass i_size to fscache_unuse_cookie() when a file is released
  Documentation: Add an explanation of NFSv4 client identifiers
  NFS: update documentation for the nfs4_unique_id parameter
  NFS: Improve warning message when locks are lost.
  NFSv4.1: Enable access to the NFSv4.1 'dacl' and 'sacl' attributes
  NFSv4: Add encoders/decoders for the NFSv4.1 dacl and sacl attributes
  NFSv4: Specify the type of ACL to cache
  NFSv4: Don't hold the layoutget locks across multiple RPC calls
  pNFS/files: Fall back to I/O through the MDS on non-fatal layout errors
  NFS: Further fixes to the writeback error handling
  NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout
  NFS: Memory allocation failures are not server fatal errors
  NFS: Don't report errors from nfs_pageio_complete() more than once
  NFS: Do not report flush errors in nfs_write_end()
  NFS: Don't report ENOSPC write errors twice
  NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYS
  NFS: Do not report EINTR/ERESTARTSYS as mapping errors

Merge tag 'f2fs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"In this round, we've refactored the existing atomic write support
  implemented by in-memory operations to have storing data in disk
  temporarily, which can give us a benefit to accept more atomic writes.

  At the same time, we removed the existing volatile write support.

  We've also revisited the file pinning and GC flows and found some
  corner cases which contributeed abnormal system behaviours.

  As usual, there're several minor code refactoring for readability,
  sanity check, and clean ups.

  Enhancements:

   - allow compression for mmap files in compress_mode=user

   - kill volatile write support

   - change the current atomic write way

   - give priority to select unpinned section for foreground GC

   - introduce data read/write showing path info

   - remove unnecessary f2fs_lock_op in f2fs_new_inode

  Bug fixes:

   - fix the file pinning flow during checkpoint=disable and GCs

   - fix foreground and background GCs to select the right victims and
     get free sections on time

   - fix GC flags on defragmenting pages

   - avoid an infinite loop to flush node pages

   - fix fallocate to use file_modified to update permissions
     consistently"

* tag 'f2fs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: fix to tag gcing flag on page during file defragment
  f2fs: replace F2FS_I(inode) and sbi by the local variable
  f2fs: add f2fs_init_write_merge_io function
  f2fs: avoid unneeded error handling for revoke_entry_slab allocation
  f2fs: allow compression for mmap files in compress_mode=user
  f2fs: fix typo in comment
  f2fs: make f2fs_read_inline_data() more readable
  f2fs: fix to do sanity check for inline inode
  f2fs: fix fallocate to use file_modified to update permissions consistently
  f2fs: don't use casefolded comparison for "." and ".."
  f2fs: do not stop GC when requiring a free section
  f2fs: keep wait_ms if EAGAIN happens
  f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters
  f2fs: reject test_dummy_encryption when !CONFIG_FS_ENCRYPTION
  f2fs: kill volatile write support
  f2fs: change the current atomic write way
  f2fs: don't need inode lock for system hidden quota
  f2fs: stop allocating pinned sections if EAGAIN happens
  f2fs: skip GC if possible when checkpoint disabling
  f2fs: give priority to select unpinned section for foreground GC
  ...

cifs: when extending a file with falloc we should make files not-sparse

as this is the only way to make sure the region is allocated.
Fix the conditional that was wrong and only tried to make already
non-sparse files non-sparse.

Cc: [email protected]
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge tag 'leds-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds

Pull LED updates from Pavel Machek:
"Most significant here is the driver for Qualcomm LPG. Apparently it
  drives backlight on some boards, so it is quite important for some
  people"

* tag 'leds-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds:
  leds: qcom-lpg: Require pattern to follow documentation
  leds: lp50xx: Remove duplicated error reporting in .remove()
  leds: qcom-lpg: add missing PWM dependency
  leds: ktd2692: Make aux-gpios optional
  dt-bindings: leds: convert ktd2692 bindings to yaml
  leds: ktd2692: Avoid duplicate error messages on probe deferral
  leds: is31fl32xx: Improve error reporting in .remove()
  leds: Move pwm-multicolor driver into rgb directory
  leds: Add PWM multicolor driver
  dt-bindings: leds: Add multicolor PWM LED bindings
  dt-bindings: leds: Optional multi-led unit address
  leds: regulator: Make probeable from device tree
  leds: regulator: Add dev helper variable
  dt-bindings: leds: Add regulator-led binding
  leds: pca9532: Make pca9532_destroy_devices() return void
  leds: Add pm8350c support to Qualcomm LPG driver
  dt-bindings: leds: Add pm8350c pmic support
  leds: Add driver for Qualcomm LPG
  dt-bindings: leds: Add Qualcomm Light Pulse Generator binding

Merge tag 'i2c-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:
"Only driver updates for 5.19.

  Bigger changes are for Meson, NPCM, and R-Car, but there are also
  changes all over the place"

* tag 'i2c-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (34 commits)
  i2c: meson: fix typo in comment
  i2c: rcar: use flags instead of atomic_xfer
  i2c: rcar: REP_AFTER_RD is not a persistent flag
  i2c: rcar: use BIT macro consistently
  i2c: qcom-geni: remove unnecessary conditions
  i2c: mt7621: Use devm_platform_get_and_ioremap_resource()
  i2c: rcar: refactor handling of first message
  i2c: rcar: avoid race condition with SMIs
  i2c: xiic: Correct the datatype for rx_watermark
  i2c: rcar: fix PM ref counts in probe error paths
  i2c: npcm: Handle spurious interrupts
  i2c: npcm: Correct register access width
  i2c: npcm: Add tx complete counter
  i2c: npcm: Fix timeout calculation
  i2c: npcm: Remove unused variable clk_regmap
  i2c: npcm: Change the way of getting GCR regmap
  i2c: xiic: Fix Tx Interrupt path for grouped messages
  i2c: xiic: Fix coding style issues
  i2c: xiic: return value of xiic_reinit
  i2c: cadence: Increase timeout per message if necessary
  ...

netfilter: flowtable: fix nft_flow_route source address for nat case

For snat and dnat cases, the saddr should be taken from reverse tuple.

Fixes: 3412e1641828 (netfilter: flowtable: nft_flow_route use more data for reverse route)
Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag

The nf_flow_table gets route through ip_route_output_key. If the saddr
is not local one, then FLOWI_FLAG_ANYSRC flag should be set. Without
this flag, the route lookup for other_dst will fail.

Fixes: 3412e1641828 (netfilter: flowtable: nft_flow_route use more data for reverse route)
Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables: double hook unregistration in netns path

__nft_release_hooks() is called from pre_netns exit path which
unregisters the hooks, then the NETDEV_UNREGISTER event is triggered
which unregisters the hooks again.

[  565.221461] WARNING: CPU: 18 PID: 193 at net/netfilter/core.c:495 __nf_unregister_net_hook+0x247/0x270
[...]
[  565.246890] CPU: 18 PID: 193 Comm: kworker/u64:1 Tainted: G            E     5.18.0-rc7+ #27
[  565.253682] Workqueue: netns cleanup_net
[  565.257059] RIP: 0010:__nf_unregister_net_hook+0x247/0x270
[...]
[  565.297120] Call Trace:
[  565.300900]  <TASK>
[  565.304683]  nf_tables_flowtable_event+0x16a/0x220 [nf_tables]
[  565.308518]  raw_notifier_call_chain+0x63/0x80
[  565.312386]  unregister_netdevice_many+0x54f/0xb50

Unregister and destroy netdev hook from netns pre_exit via kfree_rcu
so the NETDEV_UNREGISTER path see unregistered hooks.

Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables: hold mutex on netns pre_exit path

clean_net() runs in workqueue while walking over the lists, grab mutex.

Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables: sanitize nft_set_desc_concat_parse()

Add several sanity checks for nft_set_desc_concat_parse():

- validate desc->field_count not larger than desc->field_len array.
- field length cannot be larger than desc->field_len (ie. U8_MAX)
- total length of the concatenation cannot be larger than register array.

Joint work with Florian Westphal.

Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields")
Reported-by: <[email protected]>
Reviewed-by: Stefano Brivio <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

- Support for the Svpbmt extension, which allows memory attributes to
   be encoded in pages

- Support for the Allwinner D1's implementation of page-based memory
   attributes

- Support for running rv32 binaries on rv64 systems, via the compat
   subsystem

- Support for kexec_file()

- Support for the new generic ticket-based spinlocks, which allows us
   to also move to qrwlock. These should have already gone in through
   the asm-geneic tree as well

- A handful of cleanups and fixes, include some larger ones around
   atomics and XIP

* tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
  RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add]
  riscv: compat: Using seperated vdso_maps for compat_vdso_info
  RISC-V: Fix the XIP build
  RISC-V: Split out the XIP fixups into their own file
  RISC-V: ignore xipImage
  RISC-V: Avoid empty create_*_mapping definitions
  riscv: Don't output a bogus mmu-type on a no MMU kernel
  riscv: atomic: Add custom conditional atomic operation implementation
  riscv: atomic: Optimize dec_if_positive functions
  riscv: atomic: Cleanup unnecessary definition
  RISC-V: Load purgatory in kexec_file
  RISC-V: Add purgatory
  RISC-V: Support for kexec_file on panic
  RISC-V: Add kexec_file support
  RISC-V: use memcpy for kexec_file mode
  kexec_file: Fix kexec_file.c build error for riscv platform
  riscv: compat: Add COMPAT Kbuild skeletal support
  riscv: compat: ptrace: Add compat_arch_ptrace implement
  riscv: compat: signal: Add rt_frame implementation
  riscv: add memory-type errata for T-Head
  ...

NFSv4.1 mark qualified async operations as MOVEABLE tasks

Mark async operations such as RENAME, REMOVE, COMMIT MOVEABLE
for the nfsv4.1+ sessions.

Fixes: 85e39feead948 ("NFSv4.1 identify and mark RPC tasks that can move between transports")
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

xprtrdma: treat all calls not a bcall when bc_serv is NULL

When a rdma server returns a fault format reply, nfs v3 client may
treats it as a bcall when bc service is not exist.

The debug message at rpcrdma_bc_receive_call are,

[56579.837169] RPC:       rpcrdma_bc_receive_call: callback XID
00000001, length=20
[56579.837174] RPC:       rpcrdma_bc_receive_call: 00 00 00 01 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 04

After that, rpcrdma_bc_receive_call will meets NULL pointer as,

[  226.057890] BUG: unable to handle kernel NULL pointer dereference at
00000000000000c8
...
[  226.058704] RIP: 0010:_raw_spin_lock+0xc/0x20
...
[  226.059732] Call Trace:
[  226.059878]  rpcrdma_bc_receive_call+0x138/0x327 [rpcrdma]
[  226.060011]  __ib_process_cq+0x89/0x170 [ib_core]
[  226.060092]  ib_cq_poll_work+0x26/0x80 [ib_core]
[  226.060257]  process_one_work+0x1a7/0x360
[  226.060367]  ? create_worker+0x1a0/0x1a0
[  226.060440]  worker_thread+0x30/0x390
[  226.060500]  ? create_worker+0x1a0/0x1a0
[  226.060574]  kthread+0x116/0x130
[  226.060661]  ? kthread_flush_work_fn+0x10/0x10
[  226.060724]  ret_from_fork+0x35/0x40
...

Signed-off-by: Kinglong Mee <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

NFSv4: Fix free of uninitialized nfs4_label on referral lookup.

Send along the already-allocated fattr along with nfs4_fs_locations, and
drop the memcpy of fattr. We end up growing two more allocations, but this
fixes up a crash as:

PID: 790 TASK: ffff88811b43c000 CPU: 0 COMMAND: "ls"
#0 [ffffc90000857920] panic at ffffffff81b9bfde
#1 [ffffc900008579c0] do_trap at ffffffff81023a9b
#2 [ffffc90000857a10] do_error_trap at ffffffff81023b78
#3 [ffffc90000857a58] exc_stack_segment at ffffffff81be1f45
#4 [ffffc90000857a80] asm_exc_stack_segment at ffffffff81c009de
#5 [ffffc90000857b08] nfs_lookup at ffffffffa0302322 [nfs]
#6 [ffffc90000857b70] __lookup_slow at ffffffff813a4a5f
#7 [ffffc90000857c60] walk_component at ffffffff813a86c4
#8 [ffffc90000857cb8] path_lookupat at ffffffff813a9553
#9 [ffffc90000857cf0] filename_lookup at ffffffff813ab86b

Suggested-by: Trond Myklebust <[email protected]>
Fixes: 9558a007dbc3 ("NFS: Remove the label from the nfs4_lookup_res struct")
Signed-off-by: Benjamin Coddington <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

net/mlx5: Fix mlx5_get_next_dev() peer device matching

In some use-cases, mlx5 instances will need to search for their peer
device (the other port on the same HCA). For that, mlx5 device matching
mechanism relied on auxiliary_find_device() to search, and used a bad matching
callback function.

This approach has two issues:

1) next_phys_dev() the matching function, assumed all devices are
   of the type mlx5_adev (mlx5 auxiliary device) which is wrong and
   could lead to crashes, this worked for a while, since only lately
   other drivers started registering auxiliary devices.

2) using the auxiliary class bus (auxiliary_find_device) to search for
   mlx5_core_dev devices, who are actually PCIe device instances, is wrong.
   This works since mlx5_core always has at least one mlx5_adev instance
   hanging around in the aux bus.

As suggested by others we can fix 1. by comparing device names prefixes
if they have the string "mlx5_core" in them, which is not a best practice !
but even with that fixed, still 2. needs fixing, we are trying to
match pcie device peers so we should look in the right bus (pci bus),
hence this fix.

The fix:
1) search the pci bus for mlx5 peer devices, instead of the aux bus
2) to validated devices are the same type "mlx5_core_dev" compare if
   they have the same driver, which is bulletproof.

   This wouldn't have worked with the aux bus since the various mlx5 aux
   device types don't share the same driver, even if they share the same device
   wrapper struct (mlx5_adev) "which helped to find the parent device"

Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Reported-by: Alexander Lobakin <[email protected]>
Reported-by: Maher Sanalla <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Reviewed-by: Maher Sanalla <[email protected]>

net/mlx5e: Update netdev features after changing XDP state

Some features (LRO, HW GRO) conflict with XDP. If there is an attempt to
enable such features while XDP is active, they will be set to `off
[requested on]`. In order to activate these features after XDP is turned
off, the driver needs to call netdev_update_features(). This commit adds
this missing call after XDP state changes.

Fixes: cf6e34c8c22f ("net/mlx5e: Properly block LRO when XDP is enabled")
Fixes: b0617e7b3500 ("net/mlx5e: Properly block HW GRO when XDP is enabled")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: correct ECE offset in query qp output

ECE field should be after opt_param_mask in query qp output.

Fixes: 6b646a7e4af6 ("net/mlx5: Add ability to read and write ECE options")
Signed-off-by: Changcheng Liu <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Disable softirq in mlx5e_activate_rq to avoid race condition

When the driver activates the channels, it assumes NAPI isn't running
yet. mlx5e_activate_rq posts a NOP WQE to ICOSQ to trigger a hardware
interrupt and start NAPI, which will run mlx5e_alloc_rx_mpwqe and post
UMR WQEs to ICOSQ to be able to receive packets with striding RQ.

Unfortunately, a race condition is possible if NAPI is triggered by
something else (for example, TX) at a bad timing, before
mlx5e_activate_rq finishes. In this case, mlx5e_alloc_rx_mpwqe may post
UMR WQEs to ICOSQ, and with the bad timing, the wqe_info of the first
UMR may be overwritten by the wqe_info of the NOP posted by
mlx5e_activate_rq.

The consequence is that icosq->db.wqe_info[0].num_wqebbs will be changed
from MLX5E_UMR_WQEBBS to 1, disrupting the integrity of the array-based
linked list in wqe_info[]. mlx5e_poll_ico_cq will hang in an infinite
loop after processing wqe_info[0], because after the corruption, the
next item to be processed will be wqe_info[1], which is filled with
zeros, and `sqcc += wi->num_wqebbs` will never move further.

This commit fixes this race condition by using async_icosq to post the
NOP and trigger the interrupt. async_icosq is always protected with a
spinlock, eliminating the race condition.

Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reported-by: Karsten Nielsen <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: CT: Fix header-rewrite re-use for tupels

Tuple entries that don't have nat configured for them
which are added to the ct nat table will always create
a new modify header, as we don't check for possible
re-use on them. The same for tuples that have nat configured
for them but are added to ct table.

Fix the above by only avoiding wasteful re-use lookup
for actually natted entries in ct nat table.

Fixes: 7fac5c2eced3 ("net/mlx5: CT: Avoid reusing modify header context for natted entries")
Signed-off-by: Paul Blakey <[email protected]>
Reviewed-by: Ariel Levkovich <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: TC NIC mode, fix tc chains miss table

The cited commit changed promisc table to be created on demand with the
highest priority in the NIC table replacing the vlan table, this caused
tc NIC tables miss flow to skip the prmoisc table because it use vlan
table as miss table.

OVS offload in NIC mode use promisc by default so any unicast packet
which will be handled by tc NIC tables miss flow will skip the promisc
rule and will be dropped.

Fix this by adding new empty table in new tc level with low priority and
point the nic tc chain miss to it, the new table is managed so it will
point to vlan table if promisc is disabled and to promisc table if enabled.

Fixes: 1c46d7409f30 ("net/mlx5e: Optimize promiscuous mode")
Signed-off-by: Maor Dickman <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Reviewed-by: Ariel Levkovich <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Don't use already freed action pointer

The call to mlx5dr_action_destroy() releases "action" memory. That
pointer is set to miss_action later and generates the following smatch
error:

drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c:53 set_miss_action()
warn: 'action' was already freed.

Make sure that the pointer is always valid by setting NULL after destroy.

Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

dm verity: set DM_TARGET_IMMUTABLE feature flag

The device-mapper framework provides a mechanism to mark targets as
immutable (and hence fail table reloads that try to change the target
type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's
feature flags to prevent switching the verity target with a different
target type.

Fixes: a4ffc152198e ("dm: add verity target")
Cc: [email protected]
Signed-off-by: Sarthak Kukreti <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>

cifs: remove repeated debug message on cifs_put_smb_ses()

Similar message is printed a few lines later in the same function

Signed-off-by: Enzo Matsumiya <[email protected]>
Signed-off-by: Steve French <[email protected]>

MAINTAINERS: Update Lorenzo Pieralisi's email address

I will soon lose my @arm.com email address, so to prevent any possible
issue let's update all kernel references (inclusive of .mailmap) to my
@kernel.org alias ahead of time.

My @arm.com address is still working and will likely resume to work at some
point in the future; nonetheless, it is safer to switch to the @kernel.org
alias from now onwards so that continuity is guaranteed.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>

PCI/PM: Fix bridge_d3_blacklist[] Elo i2 overwrite of Gigabyte X299

92597f97a40b ("PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold") omitted
braces around the new Elo i2 entry, so it overwrote the existing Gigabyte
X299 entry.  Add the appropriate braces.

Found by:

  $ make W=1 drivers/pci/pci.o
    CC      drivers/pci/pci.o
  drivers/pci/pci.c:2974:12: error: initialized field overwritten [-Werror=override-init]
   2974 |   .ident = "Elo i2",
        |            ^~~~~~~~

Link: https://lore.kernel.org/r/20220526221258.GA409855@bhelgaas
Fixes: 92597f97a40b ("PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold")
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: [email protected] # v5.15+

Revert "PCI: brcmstb: Split brcm_pcie_setup() into two funcs"

This reverts commit 830aa6f29f07a4e2f1a947dfa72b3ccddb46dd21.

This is part of a revert of the following commits:

  11ed8b8624b8 ("PCI: brcmstb: Do not turn off WOL regulators on suspend")
  93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
  67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev regulators")
  830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")

Cyril reported that 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup()
into two funcs"), which appeared in v5.17-rc1, broke booting on the
Raspberry Pi Compute Module 4.  Apparently 830aa6f29f07 panics with an
Asynchronous SError Interrupt, and after further commits here is a black
screen on HDMI and no output on the serial console.

This does not seem to affect the Raspberry Pi 4 B.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Cyril Brulebois <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

Revert "PCI: brcmstb: Add mechanism to turn on subdev regulators"

This reverts commit 67211aadcb4b968d0fdc57bc27240fa71500c2d4.

This is part of a revert of the following commits:

  11ed8b8624b8 ("PCI: brcmstb: Do not turn off WOL regulators on suspend")
  93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
  67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev regulators")
  830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")

Cyril reported that 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup()
into two funcs"), which appeared in v5.17-rc1, broke booting on the
Raspberry Pi Compute Module 4.  Apparently 830aa6f29f07 panics with an
Asynchronous SError Interrupt, and after further commits here is a black
screen on HDMI and no output on the serial console.

This does not seem to affect the Raspberry Pi 4 B.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Cyril Brulebois <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

Revert "PCI: brcmstb: Add control of subdevice voltage regulators"

This reverts commit 93e41f3fca3d4a0f927b784012338c37f80a8a80.

This is part of a revert of the following commits:

  11ed8b8624b8 ("PCI: brcmstb: Do not turn off WOL regulators on suspend")
  93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
  67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev regulators")
  830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")

Cyril reported that 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup()
into two funcs"), which appeared in v5.17-rc1, broke booting on the
Raspberry Pi Compute Module 4.  Apparently 830aa6f29f07 panics with an
Asynchronous SError Interrupt, and after further commits here is a black
screen on HDMI and no output on the serial console.

This does not seem to affect the Raspberry Pi 4 B.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Cyril Brulebois <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

Revert "PCI: brcmstb: Do not turn off WOL regulators on suspend"

This reverts commit 11ed8b8624b8085f706864b4addcd304b1e4fc38.

This is part of a revert of the following commits:

  11ed8b8624b8 ("PCI: brcmstb: Do not turn off WOL regulators on suspend")
  93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
  67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev regulators")
  830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")

Cyril reported that 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup()
into two funcs"), which appeared in v5.17-rc1, broke booting on the
Raspberry Pi Compute Module 4.  Apparently 830aa6f29f07 panics with an
Asynchronous SError Interrupt, and after further commits here is a black
screen on HDMI and no output on the serial console.

This does not seem to affect the Raspberry Pi 4 B.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Cyril Brulebois <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

dm table: fix dm_table_supports_poll to return false if no data devices

It was reported that the "generic/250" test in xfstests (which uses
the dm-error target) demonstrates a regression where the kernel
crashes in bioset_exit().

Since commit cfc97abcbe0b ("dm: conditionally enable
BIOSET_PERCPU_CACHE for dm_io bioset") the bioset_init() for the dm_io
bioset will setup the bioset's per-cpu alloc cache if all devices have
QUEUE_FLAG_POLL set.

But there was an bug where a target that doesn't have any data devices
(and that doesn't even set the .iterate_devices dm target callback)
will incorrectly return true from dm_table_supports_poll().

Fix this by updating dm_table_supports_poll() to follow dm-table.c's
well-worn pattern for testing that _all_ targets in a DM table do in
fact have underlying devices that set QUEUE_FLAG_POLL.

NOTE: An additional block fix is still needed so that
bio_alloc_cache_destroy() clears the bioset's ->cache member.
Otherwise, a DM device's table reload that transitions the DM device's
bioset from using a per-cpu alloc cache to _not_ using one will result
in bioset_exit() crashing in bio_alloc_cache_destroy() because dm's
dm_io bioset ("io_bs") was left with a stale ->cache member.

Fixes: cfc97abcbe0b ("dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset")
Reported-by: Matthew Wilcox <[email protected]>
Reported-by: Dave Chinner <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>

Merge tag 'iommu-updates-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

- Intel VT-d driver updates:
     - Domain force snooping improvement.
     - Cleanups, no intentional functional changes.

- ARM SMMU driver updates:
     - Add new Qualcomm device-tree compatible strings
     - Add new Nvidia device-tree compatible string for Tegra234
     - Fix UAF in SMMUv3 shared virtual addressing code
     - Force identity-mapped domains for users of ye olde SMMU legacy
       binding
     - Minor cleanups

- Fix a BUG_ON in the vfio_iommu_group_notifier:
     - Groundwork for upcoming iommufd framework
     - Introduction of DMA ownership so that an entire IOMMU group is
       either controlled by the kernel or by user-space

- MT8195 and MT8186 support in the Mediatek IOMMU driver

- Make forcing of cache-coherent DMA more coherent between IOMMU
   drivers

- Fixes for thunderbolt device DMA protection

- Various smaller fixes and cleanups

* tag 'iommu-updates-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (88 commits)
  iommu/amd: Increase timeout waiting for GA log enablement
  iommu/s390: Tolerate repeat attach_dev calls
  iommu/vt-d: Remove hard coding PGSNP bit in PASID entries
  iommu/vt-d: Remove domain_update_iommu_snooping()
  iommu/vt-d: Check domain force_snooping against attached devices
  iommu/vt-d: Block force-snoop domain attaching if no SC support
  iommu/vt-d: Size Page Request Queue to avoid overflow condition
  iommu/vt-d: Fold dmar_insert_one_dev_info() into its caller
  iommu/vt-d: Change return type of dmar_insert_one_dev_info()
  iommu/vt-d: Remove unneeded validity check on dev
  iommu/dma: Explicitly sort PCI DMA windows
  iommu/dma: Fix iova map result check bug
  iommu/mediatek: Fix NULL pointer dereference when printing dev_name
  iommu: iommu_group_claim_dma_owner() must always assign a domain
  iommu/arm-smmu: Force identity domains for legacy binding
  iommu/arm-smmu: Support Tegra234 SMMU
  dt-bindings: arm-smmu: Add compatible for Tegra234 SOC
  dt-bindings: arm-smmu: Document nvidia,memory-controller property
  iommu/arm-smmu-qcom: Add SC8280XP support
  dt-bindings: arm-smmu: Add compatible for Qualcomm SC8280XP
  ...

Merge tag 'microblaze-v5.19' of git://git.monstr.eu/linux-2.6-microblaze

Pull microblaze updates from Michal Simek:

- Fix issues with freestanding

- Wire memblock_dump_all()

- Add support for memory reservation from DT

* tag 'microblaze-v5.19' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: fix typos in comments
  microblaze: Add support for reserved memory defined by DT
  microblaze: Wire memblock_dump_all()
  microblaze: Use simple memmove/memcpy implementation from lib/string.c
  microblaze: Do loop unrolling for optimized memset implementation
  microblaze: Use simple memset implementation from lib/string.c

vhost: rename vhost_work_dev_flush

This patch renames vhost_work_dev_flush to just vhost_dev_flush to
relfect that it flushes everything on the device and that drivers
don't know/care that polls are based on vhost_works. Drivers just
flush the entire device and polls, and works for vhost-scsi
management TMFs and IO net virtqueues, etc all are flushed.

Signed-off-by: Mike Christie <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Message-Id: <20220517180850 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-test: drop flush after vhost_dev_cleanup

The flush after vhost_dev_cleanup is not needed because:

1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread
so the flush call will just return since the worker has not device.

2. It's not needed. The comment about jobs re-queueing themselves does
not look correct because handle_vq does not requeue work.

Signed-off-by: Mike Christie <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <20220517180850 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-scsi: drop flush after vhost_dev_cleanup

The flush after vhost_dev_cleanup is not needed because:

1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread
so the flush call will just return since the worker has not device.

2. It's not needed for the re-queue case. vhost_scsi_evt_handle_kick grabs
the mutex and if the backend is NULL will return without queueing a work.
vhost_scsi_clear_endpoint will set the backend to NULL under the vq->mutex
then drops the mutex and does a flush. So we know when
vhost_scsi_clear_endpoint has dropped the mutex after clearing the backend
no evt related work will be able to requeue. The flush would then make sure
any queued evts are run and return.

Signed-off-by: Mike Christie <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <20220517180850 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost_vsock: simplify vhost_vsock_flush()

vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev)
before vhost_work_dev_flush(&vsock->dev). This seems pointless
as vsock->vqs[i].poll.dev is the same as &vsock->dev and several flushes
in a row doesn't do anything useful, one is just enough.

Signed-off-by: Andrey Ryabinin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Mike Christie <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <20220517180850 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost_test: remove vhost_test_flush_vq()

vhost_test_flush_vq() just a simple wrapper around vhost_work_dev_flush()
which seems have no value. It's just easier to call vhost_work_dev_flush()
directly. Besides there is no point in obtaining vhost_dev pointer
via 'n->vqs[index].poll.dev' while we can just use &n->dev.
It's the same pointers, see vhost_test_open()/vhost_dev_init().

Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Mike Christie <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <20220517180850 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>

vhost_net: get rid of vhost_net_flush_vq() and extra flush calls

vhost_net_flush_vq() calls vhost_work_dev_flush() twice passing
vhost_dev pointer obtained via 'n->poll[index].dev' and
'n->vqs[index].vq.poll.dev'. This is actually the same pointer,
initialized in vhost_net_open()/vhost_dev_init()/vhost_poll_init()

Remove vhost_net_flush_vq() and call vhost_work_dev_flush() directly.
Do the flushes only once instead of several flush calls in a row
which seems rather useless.

Signed-off-by: Andrey Ryabinin <[email protected]>
[drop vhost_dev forward declaration in vhost.h]
Signed-off-by: Mike Christie <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Message-Id: <20220517180850 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: flush dev once during vhost_dev_stop

When vhost_work_dev_flush returns all work queued at that time will have
completed. There is then no need to flush after every vhost_poll_stop
call, and we can move the flush call to after the loop that stops the
pollers.

Signed-off-by: Mike Christie <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Message-Id: <20220517180850 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: get rid of vhost_poll_flush() wrapper

vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
It gives wrong impression that we are doing some work over vhost_poll,
while in fact it flushes vhost_poll->dev.
It only complicate understanding of the code and leads to mistakes
like flushing the same vhost_dev several times in a row.

Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.

Signed-off-by: Andrey Ryabinin <[email protected]>
[merge vhost_poll_flush removal from Stefano Garzarella]
Signed-off-by: Mike Christie <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Message-Id: <20220517180850 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-vdpa: return -EFAULT on copy_to_user() failure

The copy_to_user() function returns the number of bytes remaining to be
copied. However, we need to return a negative error code, -EFAULT, to
the user.

Fixes: 87f4c217413a ("vhost-vdpa: introduce uAPI to get the number of virtqueue groups")
Fixes: e96ef636f154 ("vhost-vdpa: introduce uAPI to get the number of address spaces")
Signed-off-by: Dan Carpenter <[email protected]>
Message-Id: <YotG1vXKXXSayr63@kili>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>

vdpasim: Off by one in vdpasim_set_group_asid()

The > comparison needs to be >= to prevent an out of bounds access
of the vdpasim->iommu[] array. The vdpasim->iommu[] is allocated in
vdpasim_create() and it has vdpasim->dev_attr.nas elements.

Fixes: 87e5afeac247 ("vdpasim: control virtqueue support")
Signed-off-by: Dan Carpenter <[email protected]>
Message-Id: <YotGQU1q224RKZR8@kili>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>

virtio: Directly use ida_alloc()/free()

Use ida_alloc()/ida_free() instead of deprecated
ida_simple_get()/ida_simple_remove() .

Signed-off-by: keliu <[email protected]>
Message-Id: <20220527073302.2474073 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio: use WARN_ON() to warning illegal status value

We used to use BUG_ON() in virtio_device_ready() to detect illegal
status value, this seems sub-optimal since the value is under the
control of the device. Switch to use WARN_ON() instead.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Vineeth Vijayan <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: [email protected]
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220527060120 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>

virtio: harden vring IRQ

This is a rework on the previous IRQ hardening that is done for
virtio-pci where several drawbacks were found and were reverted:

1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
that is used by some device such as virtio-blk
2) done only for PCI transport

The vq->broken is re-used in this patch for implementing the IRQ
hardening. The vq->broken is set to true during both initialization
and reset. And the vq->broken is set to false in
virtio_device_ready(). Then vring_interrupt() can check and return
when vq->broken is true. And in this case, switch to return IRQ_NONE
to let the interrupt core aware of such invalid interrupt to prevent
IRQ storm.

The reason of using a per queue variable instead of a per device one
is that we may need it for per queue reset hardening in the future.

Note that the hardening is only done for vring interrupt since the
config interrupt hardening is already done in commit 22b7050a024d7
("virtio: defer config changed notifications"). But the method that is
used by config interrupt can't be reused by the vring interrupt
handler because it uses spinlock to do the synchronization which is
expensive.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Vineeth Vijayan <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: [email protected]
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220527060120 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>

virtio: allow to unbreak virtqueue

This patch allows the new introduced __virtio_break_device() to
unbreak the virtqueue.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Vineeth Vijayan <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: [email protected]
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220527060120 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>

virtio-ccw: implement synchronize_cbs()

This patch tries to implement the synchronize_cbs() for ccw. For the
vring_interrupt() that is called via virtio_airq_handler(), the
synchronization is simply done via the airq_info's lock. For the
vring_interrupt() that is called via virtio_ccw_int_handler(), a per
device rwlock is introduced and used in the synchronization method.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Vineeth Vijayan <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: [email protected]
Reviewed-by: Halil Pasic <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220527060120 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-mmio: implement synchronize_cbs()

Simply synchronize the platform irq that is used by us.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Vineeth Vijayan <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: [email protected]
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220527060120 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>

virtio-pci: implement synchronize_cbs()

We can simply reuse vp_synchronize_vectors() for .synchronize_cbs().

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Vineeth Vijayan <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: [email protected]
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220527060120 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>

virtio: introduce config op to synchronize vring callbacks

This patch introduces new virtio config op to vring
callbacks. Transport specific method is required to make sure the
write before this function is visible to the vring_interrupt() that is
called after the return of this function. For the transport that
doesn't provide synchronize_vqs(), use synchornize_rcu() which
synchronize with IRQ implicitly as a fallback.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Vineeth Vijayan <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: [email protected]
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220527060120 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>

virtio: use virtio_reset_device() when possible

This allows us to do common extension without duplicating code.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Vineeth Vijayan <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: [email protected]
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220527060120 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>
Reviewed-by: Eugenio Pérez <[email protected]>

virtio: use virtio_device_ready() in virtio_device_restore()

It will allow us to do extension on virtio_device_ready() without
duplicating code.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Halil Pasic <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Vineeth Vijayan <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: [email protected]
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Message-Id: <20220527060120 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>

vdpasim: allow to enable a vq repeatedly

Code must be resilient to enable a queue many times.

At the moment the queue is resetting so it's definitely not the expected
behavior.

v2: set vq->ready = 0 at disable.

Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator")
Cc: [email protected]
Signed-off-by: Eugenio Pérez <[email protected]>
Message-Id: <20220519145919 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>

vDPA/ifcvf: fix uninitialized config_vector warning

Static checkers are not informed that config_vector is controlled
by vf->msix_vector_status, which can only be
MSIX_VECTOR_SHARED_VQ_AND_CONFIG, MSIX_VECTOR_SHARED_VQ_AND_CONFIG
and MSIX_VECTOR_DEV_SHARED.

This commit uses an "if...elseif...else" code block to tell the
checkers that it is a complete set, and config_vector can be
initialized anyway

Signed-off-by: Zhu Lingshan <[email protected]>
Reviewed-by: Dan Carpenter <[email protected]>
Message-Id: <20220424072806.1083189 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>

vdpa/vp_vdpa : add vdpa tool support in vp_vdpa

this patch is to add the support for vdpa tool in vp_vdpa
here is the example steps

modprobe vp_vdpa
modprobe vhost_vdpa
echo 0000:00:06.0>/sys/bus/pci/drivers/virtio-pci/unbind
echo 1af4 1041 > /sys/bus/pci/drivers/vp-vdpa/new_id

vdpa dev add name vdpa1 mgmtdev pci/0000:00:06.0

Signed-off-by: Cindy Lu <[email protected]>
Message-Id: <20220429091030 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>

virtio: Replace long long int with long long

This patch addresses the checkpatch.pl warning that long long is
preferred over long long int.

Signed-off-by: Solomon Tan <[email protected]>
Message-Id: <YlzTUQa06sP94zxB@ArchDesktop>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio: Replace unsigned with unsigned int

This patch addresses the checkpatch.pl warning where unsigned int is
preferred over unsigned.

Signed-off-by: Solomon Tan <[email protected]>
Message-Id: <YlzS49Wo8JMDhKOt@ArchDesktop>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-crypto: enable retry for virtio-crypto-dev

Enable retry for virtio-crypto-dev, so that crypto-engine
can process cipher-requests parallelly.

Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Gonglei <[email protected]>
Reviewed-by: Gonglei <[email protected]>
Signed-off-by: lei he <[email protected]>
Signed-off-by: zhenwei pi <[email protected]>
Message-Id: <20220506131627 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-crypto: adjust dst_len at ops callback

For some akcipher operations(eg, decryption of pkcs1pad(rsa)),
the length of returned result maybe less than akcipher_req->dst_len,
we need to recalculate the actual dst_len through the virt-queue
protocol.

Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Gonglei <[email protected]>
Reviewed-by: Gonglei <[email protected]>
Signed-off-by: lei he <[email protected]>
Signed-off-by: zhenwei pi <[email protected]>
Message-Id: <20220506131627 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-crypto: wait ctrl queue instead of busy polling

Originally, after submitting request into virtio crypto control
queue, the guest side polls the result from the virt queue. This
works like following:
    CPU0   CPU1               ...             CPUx  CPUy
     |      |                                  |     |
     \      \                                  /     /
      \--------spin_lock(&vcrypto->ctrl_lock)-------/
                           |
                 virtqueue add & kick
                           |
                  busy poll virtqueue
                           |
              spin_unlock(&vcrypto->ctrl_lock)
                          ...

There are two problems:
1, The queue depth is always 1, the performance of a virtio crypto
   device gets limited. Multi user processes share a single control
   queue, and hit spin lock race from control queue. Test on Intel
   Platinum 8260, a single worker gets ~35K/s create/close session
   operations, and 8 workers get ~40K/s operations with 800% CPU
   utilization.
2, The control request is supposed to get handled immediately, but
   in the current implementation of QEMU(v6.2), the vCPU thread kicks
   another thread to do this work, the latency also gets unstable.
   Tracking latency of virtio_crypto_alg_akcipher_close_session in 5s:
        usecs               : count     distribution
         0 -> 1          : 0        |                        |
         2 -> 3          : 7        |                        |
         4 -> 7          : 72       |                        |
         8 -> 15         : 186485   |************************|
        16 -> 31         : 687      |                        |
        32 -> 63         : 5        |                        |
        64 -> 127        : 3        |                        |
       128 -> 255        : 1        |                        |
       256 -> 511        : 0        |                        |
       512 -> 1023       : 0        |                        |
      1024 -> 2047       : 0        |                        |
      2048 -> 4095       : 0        |                        |
      4096 -> 8191       : 0        |                        |
      8192 -> 16383      : 2        |                        |
This means that a CPU may hold vcrypto->ctrl_lock as long as 8192~16383us.

To improve the performance of control queue, a request on control queue
waits completion instead of busy polling to reduce lock racing, and gets
completed by control queue callback.
    CPU0   CPU1               ...             CPUx  CPUy
     |      |                                  |     |
     \      \                                  /     /
      \--------spin_lock(&vcrypto->ctrl_lock)-------/
                           |
                 virtqueue add & kick
                           |
      ---------spin_unlock(&vcrypto->ctrl_lock)------
     /      /                                  \     \
     |      |                                  |     |
    wait   wait                               wait  wait

Test this patch, the guest side get ~200K/s operations with 300% CPU
utilization.

Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Gonglei <[email protected]>
Reviewed-by: Gonglei <[email protected]>
Signed-off-by: zhenwei pi <[email protected]>
Message-Id: <20220506131627 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-crypto: use private buffer for control request

Originally, all of the control requests share a single buffer(
ctrl & input & ctrl_status fields in struct virtio_crypto), this
allows queue depth 1 only, the performance of control queue gets
limited by this design.

In this patch, each request allocates request buffer dynamically, and
free buffer after request, so the scope protected by ctrl_lock also
get optimized here.
It's possible to optimize control queue depth in the next step.

A necessary comment is already in code, still describe it again:
/*
* Note: there are padding fields in request, clear them to zero before
* sending to host to avoid to divulge any information.
* Ex, virtio_crypto_ctrl_request::ctrl::u::destroy_session::padding[48]
*/
So use kzalloc to allocate buffer of struct virtio_crypto_ctrl_request.

Potentially dereferencing uninitialized variables:
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Gonglei <[email protected]>
Reviewed-by: Gonglei <[email protected]>
Signed-off-by: zhenwei pi <[email protected]>
Message-Id: <20220506131627 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-crypto: change code style

Use temporary variable to make code easy to read and maintain.
/* Pad cipher's parameters */
        vcrypto->ctrl.u.sym_create_session.op_type =
                cpu_to_le32(VIRTIO_CRYPTO_SYM_OP_CIPHER);
        vcrypto->ctrl.u.sym_create_session.u.cipher.para.algo =
                vcrypto->ctrl.header.algo;
        vcrypto->ctrl.u.sym_create_session.u.cipher.para.keylen =
                cpu_to_le32(keylen);
        vcrypto->ctrl.u.sym_create_session.u.cipher.para.op =
                cpu_to_le32(op);
-->
sym_create_session = &ctrl->u.sym_create_session;
sym_create_session->op_type = cpu_to_le32(VIRTIO_CRYPTO_SYM_OP_CIPHER);
sym_create_session->u.cipher.para.algo = ctrl->header.algo;
sym_create_session->u.cipher.para.keylen = cpu_to_le32(keylen);
sym_create_session->u.cipher.para.op = cpu_to_le32(op);

The new style shows more obviously:
- the variable we want to operate.
- an assignment statement in a single line.

Cc: Michael S. Tsirkin <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Gonglei <[email protected]>
Reviewed-by: Gonglei <[email protected]>
Signed-off-by: zhenwei pi <[email protected]>
Message-Id: <20220506131627 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-pci: Remove wrong address verification in vp_del_vqs()

GCC 12 enhanced -Waddress when comparing array address to null [0],
which warns:

    drivers/virtio/virtio_pci_common.c: In function ‘vp_del_vqs’:
    drivers/virtio/virtio_pci_common.c:257:29: warning: the comparison will always evaluate as ‘true’ for the pointer operand in ‘vp_dev->msix_affinity_masks + (sizetype)((long unsigned int)i * 256)’ must not be NULL [-Waddress]
      257 |                         if (vp_dev->msix_affinity_masks[i])
          |                             ^~~~~~

In fact, the verification is comparing the result of a pointer
arithmetic, the address "msix_affinity_masks + i", which will always
evaluate to true.

Under the hood, free_cpumask_var() calls kfree(), which is safe to pass
NULL, not requiring non-null verification.  So remove the verification
to make compiler happy (happy compiler, happy life).

[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102103

Signed-off-by: Murilo Opsfelder Araujo <[email protected]>
Message-Id: <20220415023002 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Christophe de Dinechin <[email protected]>

virtio: pci: Fix an error handling path in vp_modern_probe()

If an error occurs after a successful pci_request_selected_regions() call,
it should be undone by a corresponding pci_release_selected_regions() call,
as already done in vp_modern_remove().

Fixes: fd502729fbbf ("virtio-pci: introduce modern device module")
Signed-off-by: Christophe JAILLET <[email protected]>
Message-Id: <237109725aad2c3c03d14549f777b1927c84b045.1648977064 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vdpasim: control virtqueue support

This patch introduces the control virtqueue support for vDPA
simulator. This is a requirement for supporting advanced features like
multiqueue.

A requirement for control virtqueue is to isolate its memory access
from the rx/tx virtqueues. This is because when using vDPA device
for VM, the control virqueue is not directly assigned to VM. Userspace
(Qemu) will present a shadow control virtqueue to control for
recording the device states.

The isolation is done via the virtqueue groups and ASID support in
vDPA through vhost-vdpa. The simulator is extended to have:

1) three virtqueues: RXVQ, TXVQ and CVQ (control virtqueue)
2) two virtqueue groups: group 0 contains RXVQ and TXVQ; group 1
   contains CVQ
3) two address spaces and the simulator simply implements the address
   spaces by mapping it 1:1 to IOTLB.

For the VM use cases, userspace(Qemu) may set AS 0 to group 0 and AS 1
to group 1. So we have:

1) The IOTLB for virtqueue group 0 contains the mappings of guest, so
   RX and TX can be assigned to guest directly.
2) The IOTLB for virtqueue group 1 contains the mappings of CVQ which
   is the buffers that allocated and managed by VMM only. So CVQ of
   vhost-vdpa is visible to VMM only. And Guest can not access the CVQ
   of vhost-vdpa.

For the other use cases, since AS 0 is associated to all virtqueue
groups by default. All virtqueues share the same mapping by default.

To demonstrate the function, VIRITO_NET_F_CTRL_MACADDR is
implemented in the simulator for the driver to set mac address.

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Gautam Dawar <[email protected]>
Message-Id: <20220330180436 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vdpa_sim: filter destination mac address

This patch implements a simple unicast filter for vDPA simulator.

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Gautam Dawar <[email protected]>
Message-Id: <20220330180436 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vdpa_sim: factor out buffer completion logic

Wrap up common buffer completion logic in to vdpasim_net_complete

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Gautam Dawar <[email protected]>
Message-Id: <20220330180436 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vdpa_sim: advertise VIRTIO_NET_F_MTU

We've already reported maximum mtu via config space, so let's
advertise the feature.

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Gautam Dawar <[email protected]>
Message-Id: <20220330180436 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-vdpa: support ASID based IOTLB API

This patch extends the vhost-vdpa to support ASID based IOTLB API. The
vhost-vdpa device will allocated multiple IOTLBs for vDPA device that
supports multiple address spaces. The IOTLBs and vDPA device memory
mappings is determined and maintained through ASID.

Note that we still don't support vDPA device with more than one
address spaces that depends on platform IOMMU. This work will be done
by moving the IOMMU logic from vhost-vDPA to vDPA device driver.

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Gautam Dawar <[email protected]>
Message-Id: <20220330180436 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Includes fixup:

vhost-vdpa: Fix some error handling path in vhost_vdpa_process_iotlb_msg()

In the error paths introduced by the original patch, a mutex may be left locked.
Add the correct goto instead of a direct return.

Signed-off-by: Christophe JAILLET <[email protected]>
Message-Id: <89ef0ae4c26ac3cfa440c71e97e392dcb328ac1b.1653227924 [email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-vdpa: introduce uAPI to set group ASID

Follows the vDPA support for associating ASID to a specific virtqueue
group. This patch adds a uAPI to support setting them from userspace.

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Gautam Dawar <[email protected]>
Message-Id: <20220330180436 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-vdpa: uAPI to get virtqueue group id

Follows the support for virtqueue group in vDPA. This patches
introduces uAPI to get the virtqueue group ID for a specific virtqueue
in vhost-vdpa.

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Gautam Dawar <[email protected]>
Message-Id: <20220330180436 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost-vdpa: introduce uAPI to get the number of address spaces

This patch introduces the uAPI for getting the number of address
spaces supported by this vDPA device.

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Gautam Dawar <[email protected]>
Message-Id: <20220330180436 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>