Git Repo - linux.git/log

ARM: 9208/1: entry: add .ltorg directive to keep literals in range

LKP reports a build issue on Clang, related to a literal load of
__current issued through the ldr_va macro. This turns out to be due to
the fact that group relocations are disabled when CONFIG_COMPILE_TEST=y,
which means that the ldr_va macro resolves to a pair of LDR
instructions, the first one being a literal load issued too far from its
literal pool.

Due to the introduction of a couple of new uses of this macro in commit
508074607c7b95b2 ("ARM: 9195/1: entry: avoid explicit literal loads"),
the literal pools end up getting rearranged in a way that causes the
literal for __current to go out of range. Let's fix this up by putting a
.ltorg directive in a suitable place in the code.

Link: https://lore.kernel.org/all/[email protected]/
Fixes: 508074607c7b95b2 ("ARM: 9195/1: entry: avoid explicit literal loads")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>

ARM: 9207/1: amba: fix refcount underflow if amba_device_add() fails

"ARM: 9192/1: amba: fix memory leak in amba_device_try_add()" leads
to a refcount underflow if amba_device_add() fails, which called by
of_amba_device_create(), the of_amba_device_create() already exists
the error handling, so amba_put_device() only need to be added into
amba_deferred_retry().

Fixes: 7719a68b2fa4 ("ARM: 9192/1: amba: fix memory leak in amba_device_try_add()")
Reported-by: Guenter Roeck <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Kefeng Wang <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>

Documentation: kvm: clarify histogram units

In the case of histogram statistics, the values are always sample
counts; the unit instead applies to the bucket range. For example,
halt_poll_success_hist is a nanosecond statistic because the buckets are
for 0ns, 1ns, 2-3ns, 4-7ns etc. There isn't really any other sensible
interpretation, but clarify this anyway in the Documentation.

Signed-off-by: Paolo Bonzini <[email protected]>

kvm: stats: tell userspace which values are boolean

Some of the statistics values exported by KVM are always only 0 or 1.
It can be useful to export this fact to userspace so that it can track
them specially (for example by polling the value every now and then to
compute a % of time spent in a specific state).

Therefore, add "boolean value" as a new "unit". While it is not exactly
a unit, it walks and quacks like one. In particular, using the type
would be wrong because boolean values could be instantaneous or peak
values (e.g. "is the rmap allocated?") or even two-bucket histograms
(e.g. "number of posted vs. non-posted interrupt injections").

Suggested-by: Amneesh Singh <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

x86/kvm: fix FASTOP_SIZE when return thunks are enabled

The return thunk call makes the fastop functions larger, just like IBT
does. Consider a 16-byte FASTOP_SIZE when CONFIG_RETHUNK is enabled.

Otherwise, functions will be incorrectly aligned and when computing their
position for differently sized operators, they will executed in the middle
or end of a function, which may as well be an int3, leading to a crash
like:

[   36.091116] int3: 0000 [#1] SMP NOPTI
[   36.091119] CPU: 3 PID: 1371 Comm: qemu-system-x86 Not tainted 5.15.0-41-generic #44
[   36.091120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[   36.091121] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm]
[   36.091185] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc
[   36.091186] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202
[   36.091188] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[   36.091188] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200
[   36.091189] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002
[   36.091190] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70
[   36.091190] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000
[   36.091191] FS:  00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000
[   36.091192] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.091192] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0
[   36.091195] PKRU: 55555554
[   36.091195] Call Trace:
[   36.091197]  <TASK>
[   36.091198]  ? fastop+0x5a/0xa0 [kvm]
[   36.091222]  x86_emulate_insn+0x7b8/0xe90 [kvm]
[   36.091244]  x86_emulate_instruction+0x2f4/0x630 [kvm]
[   36.091263]  ? kvm_arch_vcpu_load+0x7c/0x230 [kvm]
[   36.091283]  ? vmx_prepare_switch_to_host+0xf7/0x190 [kvm_intel]
[   36.091290]  complete_emulated_mmio+0x297/0x320 [kvm]
[   36.091310]  kvm_arch_vcpu_ioctl_run+0x32f/0x550 [kvm]
[   36.091330]  kvm_vcpu_ioctl+0x29e/0x6d0 [kvm]
[   36.091344]  ? kvm_vcpu_ioctl+0x120/0x6d0 [kvm]
[   36.091357]  ? __fget_files+0x86/0xc0
[   36.091362]  ? __fget_files+0x86/0xc0
[   36.091363]  __x64_sys_ioctl+0x92/0xd0
[   36.091366]  do_syscall_64+0x59/0xc0
[   36.091369]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091370]  ? do_syscall_64+0x69/0xc0
[   36.091371]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091372]  ? __x64_sys_writev+0x1c/0x30
[   36.091374]  ? do_syscall_64+0x69/0xc0
[   36.091374]  ? exit_to_user_mode_prepare+0x37/0xb0
[   36.091378]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091379]  ? do_syscall_64+0x69/0xc0
[   36.091379]  ? do_syscall_64+0x69/0xc0
[   36.091380]  ? do_syscall_64+0x69/0xc0
[   36.091381]  ? do_syscall_64+0x69/0xc0
[   36.091381]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   36.091384] RIP: 0033:0x7efdfe6d1aff
[   36.091390] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[   36.091391] RSP: 002b:00007efdfce8c460 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   36.091393] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007efdfe6d1aff
[   36.091393] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000c
[   36.091394] RBP: 0000558f1609e220 R08: 0000558f13fb8190 R09: 00000000ffffffff
[   36.091394] R10: 0000558f16b5e950 R11: 0000000000000246 R12: 0000000000000000
[   36.091394] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[   36.091396]  </TASK>
[   36.091397] Modules linked in: isofs nls_iso8859_1 kvm_intel joydev kvm input_leds serio_raw sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler drm msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net net_failover crypto_simd ahci xhci_pci cryptd psmouse virtio_blk libahci xhci_pci_renesas failover
[   36.123271] ---[ end trace db3c0ab5a48fabcc ]---
[   36.123272] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm]
[   36.123319] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc
[   36.123320] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202
[   36.123321] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[   36.123321] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200
[   36.123322] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002
[   36.123322] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70
[   36.123323] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000
[   36.123323] FS:  00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000
[   36.123324] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.123325] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0
[   36.123327] PKRU: 55555554
[   36.123328] Kernel panic - not syncing: Fatal exception in interrupt
[   36.123410] Kernel Offset: 0x1400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   36.135305] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: aa3d480315ba ("x86: Use return-thunk in asm code")
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Co-developed-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Reported-by: Linux Kernel Functional Testing <[email protected]>
Message-Id: <20220713171241 [email protected]>
Tested-by: Jack Wang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1

Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to
hang upon boot or shortly after when a non-default TSC frequency was
set for L1. The issue is observed on a host where TSC scaling is
supported. The problem appears to be that Windows doesn't use TSC
frequency for its guests even when the feature is advertised and KVM
filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from
L1's. This leads to L2 running with the default frequency (matching
host's) while L1 is running with an altered one.

Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when
it was set for L1. TSC_MULTIPLIER is already correctly computed and
written by prepare_vmcs02().

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Message-Id: <20220712135009 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

um: Add missing apply_returns()

Implement apply_returns() stub for UM, just like all the other patching
routines.

Fixes: 15e67227c49a ("x86: Undo return-thunk damage")
Reported-by: Randy Dunlap <[email protected])
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/Ys%2Ft45l%[email protected]

net: atlantic: remove aq_nic_deinit() when resume

aq_nic_deinit() has been called while suspending, so we don't have to call
it again on resume.
Actually, call it again leads to another hang issue when resuming from
S3.

Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992345] Call Trace:
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992346] <TASK>
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992348] aq_nic_deinit+0xb4/0xd0 [atlantic]
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992356] aq_pm_thaw+0x7f/0x100 [atlantic]
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992362] pci_pm_resume+0x5c/0x90
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992366] ? pci_pm_thaw+0x80/0x80
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992368] dpm_run_callback+0x4e/0x120
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992371] device_resume+0xad/0x200
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992373] async_resume+0x1e/0x40
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992374] async_run_entry_fn+0x33/0x120
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992377] process_one_work+0x220/0x3c0
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992380] worker_thread+0x4d/0x3f0
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992382] ? process_one_work+0x3c0/0x3c0
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992384] kthread+0x12a/0x150
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992386] ? set_kthread_struct+0x40/0x40
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992387] ret_from_fork+0x22/0x30
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992391] </TASK>
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992392] ---[ end trace 1ec8c79604ed5e0d ]---
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992394] PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992397] atlantic 0000:02:00.0: PM: failed to resume async: error -110

Fixes: 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
Signed-off-by: Chia-Lin Kao (AceLan) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: atlantic: remove deep parameter on suspend/resume functions

Below commit claims that atlantic NIC requires to reset the device on pm
op, and had set the deep to true for all suspend/resume functions.
commit 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
So, we could remove deep parameter on suspend/resume functions without
any functional change.

Fixes: 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression")
Signed-off-by: Chia-Lin Kao (AceLan) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

sfc: fix kernel panic when creating VF

When creating VFs a kernel panic can happen when calling to
efx_ef10_try_update_nic_stats_vf.

When releasing a DMA coherent buffer, sometimes, I don't know in what
specific circumstances, it has to unmap memory with vunmap. It is
disallowed to do that in IRQ context or with BH disabled. Otherwise, we
hit this line in vunmap, causing the crash:
  BUG_ON(in_interrupt());

This patch reenables BH to release the buffer.

Log messages when the bug is hit:
kernel BUG at mm/vmalloc.c:2727!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 6 PID: 1462 Comm: NetworkManager Kdump: loaded Tainted: G          I      --------- ---  5.14.0-119.el9.x86_64 #1
Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020
RIP: 0010:vunmap+0x2e/0x30
...skip...
Call Trace:
  __iommu_dma_free+0x96/0x100
  efx_nic_free_buffer+0x2b/0x40 [sfc]
  efx_ef10_try_update_nic_stats_vf+0x14a/0x1c0 [sfc]
  efx_ef10_update_stats_vf+0x18/0x40 [sfc]
  efx_start_all+0x15e/0x1d0 [sfc]
  efx_net_open+0x5a/0xe0 [sfc]
  __dev_open+0xe7/0x1a0
  __dev_change_flags+0x1d7/0x240
  dev_change_flags+0x21/0x60
  ...skip...

Fixes: d778819609a2 ("sfc: DMA the VF stats only when requested")
Reported-by: Ma Yuying <[email protected]>
Signed-off-by: Íñigo Huguet <[email protected]>
Acked-by: Edward Cree <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'kvm-riscv-fixes-5.19-2' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv fixes for 5.19, take #2

- Fix missing PAGE_PFN_MASK

- Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()

Merge branch 'seg6-fix-skb-checksum-for-srh-encapsulation-insertion'

Andrea Mayer says:

====================
seg6: fix skb checksum for SRH encapsulation/insertion

The Linux kernel supports Segment Routing Header (SRH)
encapsulation/insertion operations by providing the capability to: i)
encapsulate a packet in an outer IPv6 header with a specified SRH; ii)
insert a specified SRH directly after the IPv6 header of the packet.
Note that the insertion operation is also referred to as 'injection'.

The two operations are respectively supported by seg6_do_srh_encap() and
seg6_do_srh_inline(), which operate on the skb associated to the packet as
needed (e.g. adding the necessary headers and initializing them, while
taking care to recalculate the skb checksum).

seg6_do_srh_encap() and seg6_do_srh_inline() do not initialize the payload
length of the IPv6 header, which is carried out by the caller functions.
However, this approach causes the corruption of the skb checksum which
needs to be updated only after initialization of headers is completed
(thanks to Paolo Abeni for detecting this issue).

The patchset fixes the skb checksum corruption by moving the IPv6 header
payload length initialization from the callers of seg6_do_srh_encap() and
seg6_do_srh_inline() directly into these functions.

This patchset is organized as follows:
- patch 1/3, seg6: fix skb checksum evaluation in SRH
   encapsulation/insertion;
    (* SRH encapsulation/insertion available since v4.10)

- patch 2/3, seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps
   behaviors;
    (* SRv6 End.B6 and End.B6.Encaps behaviors available since v4.14)

- patch 3/3, seg6: bpf: fix skb checksum in bpf_push_seg6_encap();
    (* bpf IPv6 Segment Routing helpers available since v4.18)

====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

seg6: bpf: fix skb checksum in bpf_push_seg6_encap()

Both helper functions bpf_lwt_seg6_action() and bpf_lwt_push_encap() use
the bpf_push_seg6_encap() to encapsulate the packet in an IPv6 with Segment
Routing Header (SRH) or insert an SRH between the IPv6 header and the
payload.
To achieve this result, such helper functions rely on bpf_push_seg6_encap()
which, in turn, leverages seg6_do_srh_{encap,inline}() to perform the
required operation (i.e. encap/inline).

This patch removes the initialization of the IPv6 header payload length
from bpf_push_seg6_encap(), as it is now handled properly by
seg6_do_srh_{encap,inline}() to prevent corruption of the skb checksum.

Fixes: fe94cc290f53 ("bpf: Add IPv6 Segment Routing helpers")
Signed-off-by: Andrea Mayer <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors

The SRv6 End.B6 and End.B6.Encaps behaviors rely on functions
seg6_do_srh_{encap,inline}() to, respectively: i) encapsulate the
packet within an outer IPv6 header with the specified Segment Routing
Header (SRH); ii) insert the specified SRH directly after the IPv6
header of the packet.

This patch removes the initialization of the IPv6 header payload length
from the input_action_end_b6{_encap}() functions, as it is now handled
properly by seg6_do_srh_{encap,inline}() to avoid corruption of the skb
checksum.

Fixes: 140f04c33bbc ("ipv6: sr: implement several seg6local actions")
Signed-off-by: Andrea Mayer <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

seg6: fix skb checksum evaluation in SRH encapsulation/insertion

Support for SRH encapsulation and insertion was introduced with
commit 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and
injection with lwtunnels"), through the seg6_do_srh_encap() and
seg6_do_srh_inline() functions, respectively.
The former encapsulates the packet in an outer IPv6 header along with
the SRH, while the latter inserts the SRH between the IPv6 header and
the payload. Then, the headers are initialized/updated according to the
operating mode (i.e., encap/inline).
Finally, the skb checksum is calculated to reflect the changes applied
to the headers.

The IPv6 payload length ('payload_len') is not initialized
within seg6_do_srh_{inline,encap}() but is deferred in seg6_do_srh(), i.e.
the caller of seg6_do_srh_{inline,encap}().
However, this operation invalidates the skb checksum, since the
'payload_len' is updated only after the checksum is evaluated.

To solve this issue, the initialization of the IPv6 payload length is
moved from seg6_do_srh() directly into the seg6_do_srh_{inline,encap}()
functions and before the skb checksum update takes place.

Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Reported-by: Paolo Abeni <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Andrea Mayer <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netfs: do not unlock and put the folio twice

check_write_begin() will unlock and put the folio when return
non-zero.  So we should avoid unlocking and putting it twice in
netfs layer.

Change the way ->check_write_begin() works in the following two ways:

(1) Pass it a pointer to the folio pointer, allowing it to unlock and put
     the folio prior to doing the stuff it wants to do, provided it clears
     the folio pointer.

(2) Change the return values such that 0 with folio pointer set means
     continue, 0 with folio pointer cleared means re-get and all error
     codes indicating an error (no special treatment for -EAGAIN).

[ bagasdotme: use Sphinx code text syntax for *foliop pointer ]

Cc: [email protected]
Link: https://tracker.ceph.com/issues/56423
Link: https://lore.kernel.org/r/[email protected]
Co-developed-by: David Howells <[email protected]>
Signed-off-by: Xiubo Li <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Bagas Sanjaya <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt

UNTRAIN_RET is not needed in native_irq_return_ldt because RET
untraining has already been done at this point.

In addition, when the RETBleed mitigation is IBPB, UNTRAIN_RET clobbers
several registers (AX, CX, DX) so here it trashes user values which are
in these registers.

Signed-off-by: Alexandre Chartre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/bugs: Mark retbleed_strings static

This symbol is not used outside of bugs.c, so mark it static.

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

scsi: pm80xx: Set stopped phy's linkrate to Disabled

Negotiated link rate needs to be updated to 'Disabled' when phy is stopped.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Igor Pylypiv <[email protected]>
Signed-off-by: Changyuan Lyu <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: pm80xx: Fix 'Unknown' max/min linkrate

Currently, the data flow of the max/min linkrate in the driver is

* in pm8001_get_lrate_mode():
   hardcoded value ==> struct sas_phy

* in pm8001_bytes_dmaed():
   struct pm8001_phy ==> struct sas_phy

* in pm8001_phy_control():
   libsas data ==> struct pm8001_phy

Since pm8001_bytes_dmaed() follows pm8001_get_lrate_mode(), and the fields
in struct pm8001_phy are not initialized, sysfs
`/sys/class/sas_phy/phy-*/maximum_linkrate` always shows `Unknown`.

To fix the issue, change the dataflow to the following:

* in pm8001_phy_init():
   initial value ==> struct pm8001_phy

* in pm8001_get_lrate_mode():
   struct pm8001_phy ==> struct sas_phy

* in pm8001_phy_control():
   libsas data ==> struct pm8001_phy

For negotiated linkrate, the current dataflow is:

* in pm8001_get_lrate_mode():
   iomb data ==> struct asd_sas_phy ==> struct sas_phy

* in pm8001_bytes_dmaed():
   struct asd_sas_phy ==> struct sas_phy

Since pm8001_bytes_dmaed() follows pm8001_get_lrate_mode(), the assignment
statements in pm8001_bytes_dmaed() are unnecessary and cleaned up.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Igor Pylypiv <[email protected]>
Acked-by: Jack Wang <[email protected]>
Signed-off-by: Changyuan Lyu <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ufs: core: Fix missing clk change notification on host reset

In ufshcd_host_reset_and_restore(), ufshcd_set_clk_freq() is called to
scale clock rate. However, this did not call vops->clk_scale_notify() to
inform platform driver of clock change.

Call ufshcd_scale_clks() instead so that clock change can be properly
handled.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Stanley Chu <[email protected]>
Signed-off-by: Po-Wen Kao <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: ufs: core: Drop loglevel of WriteBoost message

Commit '3b5f3c0d0548 ("scsi: ufs: core: Tidy up WB configuration code")'
changed the log level of the write boost enable/disable notification from
debug to info. This results in a lot of noise in the kernel log during
normal operation.

Drop it back to debug level to avoid this.

Link: https://lore.kernel.org/r/[email protected]
Fixes: 3b5f3c0d0548 ("scsi: ufs: core: Tidy up WB configuration code")
Reviewed-by: Alim Akhtar <[email protected]>
Acked-by: Bean Huo <[email protected]>
Signed-off-by: Bjorn Andersson <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-07-12

This series contains updates to ice driver only.

Paul fixes detection of E822 devices for firmware update and changes NVM
read for snapshot creation to be done in chunks as some systems cannot
read the entire NVM in the allotted time.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

scsi: megaraid: Clear READ queue map's nr_queues

The megaraid SCSI driver sets set->nr_maps as 3 if poll_queues is > 0, and
blk-mq actually initializes each map's nr_queues as nr_hw_queues.
Consequently the driver has to clear READ queue map's nr_queues, otherwise
the queue map becomes broken if poll_queues is set as non-zero.

Link: https://lore.kernel.org/r/[email protected]
Fixes: 9e4bec5b2a23 ("scsi: megaraid_sas: mq_poll support")
Cc: Kashyap Desai <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Hannes Reinecke <[email protected]>
Reported-by: Guangwu Zhang <[email protected]>
Tested-by: Guangwu Zhang <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

sfc: fix use after free when disabling sriov

Use after free is detected by kfence when disabling sriov. What was read
after being freed was vf->pci_dev: it was freed from pci_disable_sriov
and later read in efx_ef10_sriov_free_vf_vports, called from
efx_ef10_sriov_free_vf_vswitching.

Set the pointer to NULL at release time to not trying to read it later.

Reproducer and dmesg log (note that kfence doesn't detect it every time):
$ echo 1 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
$ echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs

BUG: KFENCE: use-after-free read in efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]

Use-after-free read at 0x00000000ff3c1ba5 (in kfence-#224):
  efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc]
  efx_ef10_pci_sriov_disable+0x38/0x70 [sfc]
  efx_pci_sriov_configure+0x24/0x40 [sfc]
  sriov_numvfs_store+0xfe/0x140
  kernfs_fop_write_iter+0x11c/0x1b0
  new_sync_write+0x11f/0x1b0
  vfs_write+0x1eb/0x280
  ksys_write+0x5f/0xe0
  do_syscall_64+0x5c/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

kfence-#224: 0x00000000edb8ef95-0x00000000671f5ce1, size=2792, cache=kmalloc-4k

allocated by task 6771 on cpu 10 at 3137.860196s:
  pci_alloc_dev+0x21/0x60
  pci_iov_add_virtfn+0x2a2/0x320
  sriov_enable+0x212/0x3e0
  efx_ef10_sriov_configure+0x67/0x80 [sfc]
  efx_pci_sriov_configure+0x24/0x40 [sfc]
  sriov_numvfs_store+0xba/0x140
  kernfs_fop_write_iter+0x11c/0x1b0
  new_sync_write+0x11f/0x1b0
  vfs_write+0x1eb/0x280
  ksys_write+0x5f/0xe0
  do_syscall_64+0x5c/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

freed by task 6771 on cpu 12 at 3170.991309s:
  device_release+0x34/0x90
  kobject_cleanup+0x3a/0x130
  pci_iov_remove_virtfn+0xd9/0x120
  sriov_disable+0x30/0xe0
  efx_ef10_pci_sriov_disable+0x57/0x70 [sfc]
  efx_pci_sriov_configure+0x24/0x40 [sfc]
  sriov_numvfs_store+0xfe/0x140
  kernfs_fop_write_iter+0x11c/0x1b0
  new_sync_write+0x11f/0x1b0
  vfs_write+0x1eb/0x280
  ksys_write+0x5f/0xe0
  do_syscall_64+0x5c/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 3c5eb87605e85 ("sfc: create vports for VFs and assign random MAC addresses")
Reported-by: Yanghang Liu <[email protected]>
Signed-off-by: Íñigo Huguet <[email protected]>
Acked-by: Martin Habets <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

smb3: workaround negprot bug in some Samba servers

Mount can now fail to older Samba servers due to a server
bug handling padding at the end of the last negotiate
context (negotiate contexts typically are rounded up to 8
bytes by adding padding if needed). This server bug can
be avoided by switching the order of negotiate contexts,
placing a negotiate context at the end that does not
require padding (prior to the recent netname context fix
this was the case on the client).

Fixes: 73130a7b1ac9 ("smb3: fix empty netname context on secondary channels")
Reported-by: Julian Sikorski <[email protected]>
Tested-by: Julian Sikorski <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>

ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory

When commit 72f2ecb7ece7 ("ACPI: bus: Set CPPC _OSC bits for all
and when CPPC_LIB is supported") was introduced, we found collateral
damage that a number of AMD systems that supported CPPC but
didn't advertise support in _OSC stopped having a functional
amd-pstate driver. The _OSC was only enforced on Intel systems at that
time.

This was fixed for the MSR based designs by commit 8b356e536e69f
("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported")
but some shared memory based designs also support CPPC but haven't
advertised support in the _OSC. Add support for those designs as well by
hardcoding the list of systems.

Fixes: 72f2ecb7ece7 ("ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported")
Fixes: 8b356e536e69f ("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported")
Link: https://lore.kernel.org/all/[email protected]/
Cc: 5.18+ <[email protected]> # 5.18+
Reported-and-tested-by: Oleksandr Natalenko <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

vf/remap: return the amount of bytes actually deduplicated

When using the FIDEDUPRANGE ioctl, in case of success the requested size
is returned. In some cases this might not be the actual amount of bytes
deduplicated.

This change modifies vfs_dedupe_file_range() to report the actual amount
of bytes deduplicated, instead of the requested amount.

Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
Reported-by: Ansgar Lößer <[email protected]>
Reported-by: Max Schlecht <[email protected]>
Reported-by: Björn Scheuermann <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Darrick J Wong <[email protected]>
Signed-off-by: Ansgar Lößer <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:
"Fix an old and subtle bug in the migration path.

  css_sets are used to track tasks and migrations are tasks moving from
  a group of css_sets to another group of css_sets. The migration path
  pins all source and destination css_sets in the prep stage.

  Unfortunately, it was overloading the same list_head entry to track
  sources and destinations, which got confused for migrations which are
  partially identity leading to use-after-frees.

  Fixed by using dedicated list_heads for tracking sources and
  destinations"

* tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Use separate src/dst nodes when preloading css_sets for migration

Merge tag 'dt-fixes-for-palmer-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git into fixes

Microchip RISC-V devicetree fixes for 5.19-rc6

A single fix for mpfs.dtsi:
- The l2 cache controller was never hooked up in the dt, so userspace
  is presented with the wrong topology information, so it has been
  hooked up.

* tag 'dt-fixes-for-palmer-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git:
  riscv: dts: microchip: hook up the mpfs' l2cache

fs/remap: constrain dedupe of EOF blocks

If dedupe of an EOF block is not constrainted to match against only
other EOF blocks with the same EOF offset into the block, it can
match against any other block that has the same matching initial
bytes in it, even if the bytes beyond EOF in the source file do
not match.

Fix this by constraining the EOF block matching to only match
against other EOF blocks that have identical EOF offsets and data.
This allows "whole file dedupe" to continue to work without allowing
eof blocks to randomly match against partial full blocks with the
same data.

Reported-by: Ansgar Lößer <[email protected]>
Fixes: 1383a7ed6749 ("vfs: check file ranges before cloning files")
Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drm/amd/display: Ensure valid event timestamp for cursor-only commits

Requires enabling the vblank machinery for them.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2030
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/display: correct check of coverage blend mode

Check the value of per_pixel_alpha to decide whether the Coverage pixel
blend mode is applicable or not.

Fixes: 76818cdd11a2 ("drm/amd/display: add Coverage blend mode for overlay plane")
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: Prevent divide by zero

divide error: 0000 [#1] SMP PTI
CPU: 3 PID: 78925 Comm: tee Not tainted 5.15.50-1-lts #1
Hardware name: MSI MS-7A59/Z270 SLI PLUS (MS-7A59), BIOS 1.90 01/30/2018
RIP: 0010:smu_v11_0_set_fan_speed_rpm+0x11/0x110 [amdgpu]

Speed is user-configurable through a file.
I accidentally set it to zero, and the driver crashed.

Reviewed-by: Evan Quan <[email protected]>
Reviewed-by: André Almeida <[email protected]>
Signed-off-by: Yefim Barashkin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/display: Only use depth 36 bpp linebuffers on DCN display engines.

Various DCE versions had trouble with 36 bpp lb depth, requiring fixes,
last time in commit 353ca0fa5630 ("drm/amd/display: Fix 10bit 4K display
on CIK GPUs") for DCE-8. So far >= DCE-11.2 was considered ok, but now I
found out that on DCE-11.2 it causes dithering when there shouldn't be
any, so identity pixel passthrough with identity gamma LUTs doesn't work
when it should. This breaks various important neuroscience applications,
as reported to me by scientific users of Polaris cards under Ubuntu 22.04
with Linux 5.15, and confirmed by testing it myself on DCE-11.2.

Lets only use depth 36 for DCN engines, where my testing showed that it
is both necessary for high color precision output, e.g., RGBA16 fb's,
and not harmful, as far as more than one year in real-world use showed.

DCE engines seem to work fine for high precision output at 30 bpp, so
this ("famous last words") depth 30 should hopefully fix all known problems
without introducing new ones.

Successfully retested on DCE-11.2 Polaris and DCN-1.0 Raven Ridge on
top of Linux 5.19.0-rc2 + drm-next.

Fixes: 353ca0fa5630 ("drm/amd/display: Fix 10bit 4K display on CIK GPUs")
Signed-off-by: Mario Kleiner <[email protected]>
Tested-by: Mario Kleiner <[email protected]>
Cc: [email protected] # 5.14.0
Cc: Alex Deucher <[email protected]>
Cc: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: correct the MEC atomic support firmware checking for GC 10.3.7

On the GC 10.3.7 platform the initial MEC release version #3 can support
atomic operation,so need correct and set its MEC atomic support version to #3.

Signed-off-by: Prike Liang <[email protected]>
Reviewed-by: Aaron Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 5.18.x

drm/amd/display: Ignore First MST Sideband Message Return Error

[why]
First MST sideband message returns AUX_RET_ERROR_HPD_DISCON
on certain intel platform. Aux transaction considered failure
if HPD unexpected pulled low. The actual aux transaction success
in such case, hence do not return error.

[how]
Not returning error when AUX_RET_ERROR_HPD_DISCON detected
on the first sideband message.

v2: squash in additional DMI entries
v3: squash in static fix

Signed-off-by: Fangzhi Zuo <[email protected]>
Acked-by: Solomon Chiu <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

ima: Fix potential memory leak in ima_init_crypto()

On failure to allocate the SHA1 tfm, IMA fails to initialize and exits
without freeing the ima_algo_array. Add the missing kfree() for
ima_algo_array to avoid the potential memory leak.

Signed-off-by: Jianglei Nie <[email protected]>
Fixes: 6d94809af6b0 ("ima: Allocate and initialize tfm for each PCR bank")
Signed-off-by: Mimi Zohar <[email protected]>

ima: force signature verification when CONFIG_KEXEC_SIG is configured

Currently, an unsigned kernel could be kexec'ed when IMA arch specific
policy is configured unless lockdown is enabled. Enforce kernel
signature verification check in the kexec_file_load syscall when IMA
arch specific policy is configured.

Fixes: 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE")
Reported-and-suggested-by: Mimi Zohar <[email protected]>
Signed-off-by: Coiby Xu <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>

net: sunhme: output link status with a single print.

This driver currently prints the link status using four separate
printk calls, which these days gets presented to the user as four
distinct messages, not exactly ideal:

  [   32.582778] eth0: Link is up using
  [   32.582828] internal
  [   32.582837] transceiver at
  [   32.582888] 100Mb/s, Full Duplex.

Restructure the display_link_mode function to use a single netdev_info
call to present all this information as a single message, which is much
nicer:

  [   33.640143] hme 0000:00:01.1 eth0: Link is up using internal transceiver at 100Mb/s, Full Duplex.

The display_forced_link_mode function has a similar structure, so adjust
it in a similar fashion.

Signed-off-by: Nick Bowler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8152: fix accessing unset transport header

A warning is triggered by commit 66e4c8d95008 ("net: warn if transport
header was not set"). The warning is harmless, because the value from
skb_transport_offset() is only used for skb_is_gso() is true or the
skb->ip_summed is equal to CHECKSUM_PARTIAL.

Fixes: 66e4c8d95008 ("net: warn if transport header was not set")
Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'wireless-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
A small set of fixes for
* queue selection in mesh/ocb
* queue handling on interface stop
* hwsim virtio device vs. some other virtio changes
* dt-bindings email addresses
* color collision memory allocation
* a const variable in rtw88
* shared SKB transmit in the ethernet format path
* P2P client port authorization
====================

Signed-off-by: David S. Miller <[email protected]>

s390/nospec: remove unneeded header includes

Commit 4efd417f298b ("s390: raise minimum supported machine generation
to z10") removed the usage of alternatives and lowcore in expolines
macros. Remove unneeded header includes as well.

With that, expoline.S doesn't require asm-offsets.h and
expoline_prepare target dependency could be removed.

Tested-by: Joe Lawrence <[email protected]>
Link: https://lore.kernel.org/r/patch-2.thread-d13b6c.git-d13b6c96fb5f.your-ad-here.call-01656331067-ext-4899@work.hours
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>

s390/nospec: build expoline.o for modules_prepare target

When CONFIG_EXPOLINE_EXTERN is used expoline thunks are generated
from arch/s390/lib/expoline.S and postlinked into every module.
This is also true for external modules. Add expoline.o build to
the modules_prepare target.

Fixes: 1d2ad084800e ("s390/nospec: add an option to use thunk-extern")
Reported-by: Joe Lawrence <[email protected]>
Tested-by: Sumanth Korikkar <[email protected]>
Acked-by: Sumanth Korikkar <[email protected]>
Tested-by: C. Erastus Toe <[email protected]>
Tested-by: Joe Lawrence <[email protected]>
Link: https://lore.kernel.org/r/patch-1.thread-d13b6c.git-a2387a74dc49.your-ad-here.call-01656331067-ext-4899@work.hours
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>

net: stmmac: fix leaks in probe

These two error paths should clean up before returning.

Fixes: 2bb4b98b60d7 ("net: stmmac: Add Ingenic SoCs MAC support.")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ftgmac100: Hold reference returned by of_get_child_by_name()

In ftgmac100_probe(), we should hold the refernece returned by
of_get_child_by_name() and use it to call of_node_put() for
reference balance.

Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
Signed-off-by: Liang He <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-sysctl-races'

Kuniyuki Iwashima says:

====================
sysctl: Fix data-races around ipv4_net_table (Roun).

This series fixes data-races around the first 13 knobs and
nexthop_compat_mode in ipv4_net_table.

I will post another patch for three early_demux knobs later,
so the next round will start from ip_default_ttl.
====================

Signed-off-by: David S. Miller <[email protected]>

nexthop: Fix data-races around nexthop_compat_mode.

While reading nexthop_compat_mode, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 4f80116d3df3 ("net: ipv4: add sysctl for nexthop api compatibility mode")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv4: Fix data-races around sysctl_ip_dynaddr.

While reading sysctl_ip_dynaddr, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: Fix a data-race around sysctl_tcp_ecn_fallback.

While reading sysctl_tcp_ecn_fallback, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 492135557dc0 ("tcp: add rfc3168, section 6.1.1.1. fallback")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: Fix data-races around sysctl_tcp_ecn.

While reading sysctl_tcp_ecn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

raw: Fix a data-race around sysctl_raw_l3mdev_accept.

While reading sysctl_raw_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 6897445fb194 ("net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

icmp: Fix a data-race around sysctl_icmp_ratemask.

While reading sysctl_icmp_ratemask, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

icmp: Fix a data-race around sysctl_icmp_ratelimit.

While reading sysctl_icmp_ratelimit, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

icmp: Fix a data-race around sysctl_icmp_errors_use_inbound_ifaddr.

While reading sysctl_icmp_errors_use_inbound_ifaddr, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 1c2fb7f93cb2 ("[IPV4]: Sysctl configurable icmp error source address.")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

icmp: Fix a data-race around sysctl_icmp_ignore_bogus_error_responses.

While reading sysctl_icmp_ignore_bogus_error_responses, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

icmp: Fix a data-race around sysctl_icmp_echo_ignore_broadcasts.

While reading sysctl_icmp_echo_ignore_broadcasts, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

icmp: Fix data-races around sysctl_icmp_echo_enable_probe.

While reading sysctl_icmp_echo_enable_probe, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages")
Fixes: 1fd07f33c3ea ("ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

icmp: Fix a data-race around sysctl_icmp_echo_ignore_all.

While reading sysctl_icmp_echo_ignore_all, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: Fix a data-race around sysctl_max_tw_buckets.

While reading sysctl_max_tw_buckets, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sysctl: Fix data-races in proc_dointvec_ms_jiffies().

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_dointvec_ms_jiffies() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_dointvec_ms_jiffies() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sysctl: Fix data-races in proc_dou8vec_minmax().

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_dou8vec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_dou8vec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: cb9444130662 ("sysctl: add proc_dou8vec_minmax()")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cpufreq: mediatek: Handle sram regulator probe deferral

If the regulator_get_optional() call for the SRAM regulator returns
a probe deferral, we must bail out and retry probing later: failing
to do this will produce unstabilities on platforms requiring the
handling for this regulator.

Fixes: ffa7bdf7f344 ("cpufreq: mediatek: Make sram regulator optional")
Signed-off-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>

x86/pat: Fix x86_has_pat_wp()

x86_has_pat_wp() is using a wrong test, as it relies on the normal
PAT configuration used by the kernel. In case the PAT MSR has been
setup by another entity (e.g. Xen hypervisor) it might return false
even if the PAT configuration is allowing WP mappings. This due to the
fact that when running as Xen PV guest the PAT MSR is setup by the
hypervisor and cannot be changed by the guest. This results in the WP
related entry to be at a different position when running as Xen PV
guest compared to the bare metal or fully virtualized case.

The correct way to test for WP support is:

1. Get the PTE protection bits needed to select WP mode by reading
   __cachemode2pte_tbl[_PAGE_CACHE_MODE_WP] (depending on the PAT MSR
   setting this might return protection bits for a stronger mode, e.g.
   UC-)
2. Translate those bits back into the real cache mode selected by those
   PTE bits by reading __pte2cachemode_tbl[__pte2cm_idx(prot)]
3. Test for the cache mode to be _PAGE_CACHE_MODE_WP

Fixes: f88a68facd9a ("x86/mm: Extend early_memremap() support with additional attrs")
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]> # 4.14
Link: https://lore.kernel.org/r/[email protected]

x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit

The build on x86_32 currently fails after commit

9bb2ec608a20 (objtool: Update Retpoline validation)

with:

arch/x86/kernel/../../x86/xen/xen-head.S:35: Error: no such instruction: `annotate_unret_safe'

ANNOTATE_UNRET_SAFE is defined in nospec-branch.h. And head_32.S is
missing this include. Fix this.

Fixes: 9bb2ec608a20 ("objtool: Update Retpoline validation")
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop

The headset on this machine is not defined, after applying the quirk
ALC256_FIXUP_ASUS_HEADSET_MIC, the headset-mic works well

Signed-off-by: Meng Tang <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

mmc: sdhci-omap: Fix a lockdep warning for PM runtime init

We need runtime PM enabled early in probe before sdhci_setup_host() for
sdhci_omap_set_capabilities(). But on the first runtime resume we must
not call sdhci_runtime_resume_host() as sdhci_setup_host() has not been
called yet. Let's check for an initialized controller like we already do
for context restore to fix a lockdep warning.

Fixes: f433e8aac6b9 ("mmc: sdhci-omap: Implement PM runtime functions")
Reported-by: Yegor Yefremov <[email protected]>
Suggested-by: Arnd Bergmann <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()

Yang Jihing reported a race between perf_event_set_output() and
perf_mmap_close():

CPU1 CPU2

perf_mmap_close(e2)
  if (atomic_dec_and_test(&e2->rb->mmap_count)) // 1 - > 0
    detach_rest = true

ioctl(e1, IOC_SET_OUTPUT, e2)
  perf_event_set_output(e1, e2)

  ...
  list_for_each_entry_rcu(e, &e2->rb->event_list, rb_entry)
    ring_buffer_attach(e, NULL);
    // e1 isn't yet added and
    // therefore not detached

    ring_buffer_attach(e1, e2->rb)
      list_add_rcu(&e1->rb_entry,
   &e2->rb->event_list)

After this; e1 is attached to an unmapped rb and a subsequent
perf_mmap() will loop forever more:

again:
mutex_lock(&e->mmap_mutex);
if (event->rb) {
...
if (!atomic_inc_not_zero(&e->rb->mmap_count)) {
...
mutex_unlock(&e->mmap_mutex);
goto again;
}
}

The loop in perf_mmap_close() holds e2->mmap_mutex, while the attach
in perf_event_set_output() holds e1->mmap_mutex. As such there is no
serialization to avoid this race.

Change perf_event_set_output() to take both e1->mmap_mutex and
e2->mmap_mutex to alleviate that problem. Additionally, have the loop
in perf_mmap() detach the rb directly, this avoids having to wait for
the concurrent perf_mmap_close() to get around to doing it to make
progress.

Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole")
Reported-by: Yang Jihong <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Yang Jihong <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

Merge tag 'qcom-dts-fixes-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

Qualcomm DTS fixe for 5.19

The pinctrl state was lost in the recent refactoring of the MSM8974
Devicetree, this contains a fix for this.

* tag 'qcom-dts-fixes-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
ARM: dts: qcom: msm8974: re-add missing pinctrl

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'arm-soc/for-5.19/devicetree-arm64-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM64-based SoCs Device Tree fixes
for 5.19, please pull the following:

- William corrects the BCM4906 DTS to have an armv8-timer Device Tree,
  necessary for booting, and fixes the BCM4908 DTS to have a proper
  'enable-method' and 'cpu-release-addr' properties for the kernel to boot
  when using u-boot

* tag 'arm-soc/for-5.19/devicetree-arm64-fixes' of https://github.com/Broadcom/stblinux:
  arm64: dts: broadcom: bcm4908: Fix cpu node for smp boot
  arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoC

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221

On a HP 288 Pro G2 MT (X9W02AV), the front mic could not be detected.
In order to get it working, the pin configuration needs to be set
correctly, and the ALC221_FIXUP_HP_288PRO_MIC_NO_PRESENCE fixup needs
to be applied.

Signed-off-by: Meng Tang <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda/realtek: fix mute/micmute LEDs for HP machines

The HP ProBook 440/450 G9 and EliteBook 640/650 G9 have multiple
motherboard design and they are using different subsystem ID of audio
codec. Add the same quirk for other MBs.

Signed-off-by: Jeremy Szu <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

USB: serial: ftdi_sio: add Belimo device ids

Those two product ids are known.

Signed-off-by: Lucien Buchmann <[email protected]>
Cc: [email protected]
Signed-off-by: Johan Hovold <[email protected]>

Merge branch 'bnxt_en-5-bug-fixes'

Michael Chan says:

====================
bnxt_en: 5 Bug fixes

This patchset fixes various issues, including SRIOV error unwinding,
one error recovery path, live patch reporting, XDP transmit path,
and PHC clock reading.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Fix bnxt_refclk_read()

The upper 32-bit PHC register is not latched when reading the lower
32-bit PHC register. Current code leaves a small window where we may
not read correct higher order bits if the lower order bits are just about
to wrap around.

This patch fixes this by reading higher order bits twice and makes
sure that final value is correctly paired with its lower 32 bits.

Fixes: 30e96f487f64 ("bnxt_en: Do not read the PTP PHC during chip reset")
Cc: Richard Cochran <[email protected]>
Signed-off-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Fix and simplify XDP transmit path

Fix the missing length hint in the TX BD for the XDP transmit path.  The
length hint is required on legacy chips.

Also, simplify the code by eliminating the first_buf local variable.
tx_buf contains the same value.  The opaque value only needs to be set
on the first BD.  Fix this also for correctness.

Fixes: a7559bc8c17c ("bnxt: support transmit and free of aggregation buffers")
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: fix livepatch query

In the livepatch query fw_target BNXT_FW_SRT_PATCH is
applicable for P5 chips only.

Fixes: 3c4153394e2c ("bnxt_en: implement firmware live patching")
Reviewed-by: Saravanan Vajravel <[email protected]>
Reviewed-by: Somnath Kotur <[email protected]>
Signed-off-by: Vikas Gupta <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: Fix bnxt_reinit_after_abort() code path

bnxt_reinit_after_abort() is called during ifup when a previous
FW reset sequence has aborted or a previous ifup has failed after
detecting FW reset.  In all cases, it is safe to assume that a
previous FW reset has completed and the driver may not have fully
reinitialized.

Prior to this patch, it is assumed that the
FUNC_DRV_IF_CHANGE_RESP_FLAGS_HOT_FW_RESET_DONE flag will always be
set by the firmware in bnxt_hwrm_if_change().  This may not be true if
the driver has already attempted to register with the firmware.  The
firmware may not set the RESET_DONE flag again after the driver has
registered, assuming that the driver has seen the flag already.

Fix it to always go through the FW reset initialization path if
the BNXT_STATE_FW_RESET_DET flag is set.  This flag is always set
by the driver after successfully going through bnxt_reinit_after_abort().

Fixes: 6882c36cf82e ("bnxt_en: attempt to reinitialize after aborted reset")
Reviewed-by: Pavan Chebbi <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

bnxt_en: reclaim max resources if sriov enable fails

If bnxt_sriov_enable() fails after some resources have been reserved
for the VFs, the current code is not unwinding properly and the
reserved resources become unavailable afterwards. Fix it by
properly unwinding with a call to bnxt_hwrm_func_qcaps() to
reset all maximum resources.

Also, add the missing bnxt_ulp_sriov_cfg() call to let the RDMA
driver know to abort.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'trace-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Fixes and minor clean ups for tracing:

   - Fix memory leak by reverting what was thought to be a double free.

     A static tool had gave a false positive that a double free was
     possible in the error path, but it was actually a different
     location that confused the static analyzer (and those of us that
     reviewed it).

   - Move use of static buffers by ftrace_dump() to a location that can
     be used by kgdb's ftdump(), as it needs it for the same reasons.

   - Clarify in the Kconfig description that function tracing has
     negligible impact on x86, but may have a bit bigger impact on other
     architectures.

   - Remove unnecessary extra semicolon in trace event.

   - Make a local variable static that is used in the fprobes sample

   - Use KSYM_NAME_LEN for length of function in kprobe sample and get
     rid of unneeded macro for the same purpose"

* tag 'trace-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  samples: Use KSYM_NAME_LEN for kprobes
  fprobe/samples: Make sample_probe static
  blk-iocost: tracing: atomic64_read(&ioc->vtime_rate) is assigned an extra semicolon
  ftrace: Be more specific about arch impact when function tracer is enabled
  tracing: Fix sleeping while atomic in kdb ftdump
  tracing/histograms: Fix memory leak problem

drm/i915/selftests: fix subtraction overflow bug

On some machines hole_end can be small enough to cause subtraction
overflow. On the other side (addr + 2 * min_alignment) can overflow
in case of mock tests. This patch should handle both cases.

Fixes: e1c5f754067b59 ("drm/i915: Avoid overflow in computing pot_hole loop termination")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3674
Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Signed-off-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit ab3edc679c552a466e4bf0b11af3666008bd65a2)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gem: Look for waitboosting across the whole object prior to individual waits

We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.

On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.

Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that belied the fragility of
waitboosting.

Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
stated the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.

Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.

Reported-by: Thomas Voegtle <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Signed-off-by: Karolina Drobnik <[email protected]>
Tested-by: Thomas Voegtle <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
(cherry picked from commit 394e2b57a989113de494c52d4683444bcb02d4e1)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gt: Serialize TLB invalidates with GT resets

Avoid trying to invalidate the TLB in the middle of performing an
engine reset, as this may result in the reset timing out. Currently,
the TLB invalidate is only serialised by its own mutex, forgoing the
uncore lock, but we can take the uncore->lock as well to serialise
the mmio access, thereby serialising with the GDRST.

Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
i915 selftest/hangcheck.

Cc: [email protected] # v4.4 and upper
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Reported-by: Mauro Carvalho Chehab <[email protected]>
Tested-by: Mauro Carvalho Chehab <[email protected]>
Reviewed-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Acked-by: Thomas Hellström <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1e59a7c45dd919a530256b9ac721ac6ea86c0677.1657639152.git.mchehab@kernel.org
(cherry picked from commit 33da97894758737895e90c909f16786052680ef4)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gt: Serialize GRDOM access between multiple engine resets

Don't allow two engines to be reset in parallel, as they would both
try to select a reset bit (and send requests to common registers)
and wait on that register, at the same time. Serialize control of
the reset requests/acks using the uncore->lock, which will also ensure
that no other GT state changes at the same time as the actual reset.

Cc: [email protected] # v4.4 and upper
Reported-by: Mika Kuoppala <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Acked-by: Mika Kuoppala <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Acked-by: Thomas Hellström <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/e0a2d894e77aed7c2e36b0d1abdc7dbac3011729.1657639152.git.mchehab@kernel.org
(cherry picked from commit 336561a914fc0c6f1218228718f633b31b7af1c3)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/ttm: fix sg_table construction

If we encounter some monster sized local-memory page that exceeds the
maximum sg length (UINT32_MAX), ensure that don't end up with some
misaligned address in the entry that follows, leading to fireworks
later. Also ensure we have some coverage of this in the selftests.

v2(Chris):
- Use round_down consistently to avoid udiv errors
v3(Nirmoy):
- Also update the max_segment in the selftest

Fixes: f701b16d4cc5 ("drm/i915/ttm: add i915_sg_from_buddy_resource")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6379
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Nirmoy Das <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit bc99f1209f19fefa3ee11e77464ccfae541f4291)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/selftests: fix a couple IS_ERR() vs NULL tests

The shmem_pin_map() function doesn't return error pointers, it returns
NULL.

Fixes: be1cb55a07bf ("drm/i915/gt: Keep a no-frills swappable copy of the default context state")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Signed-off-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20220708094104.GL2316@kadam
(cherry picked from commit d50f5a109cf4ed50c5b575c1bb5fc3bd17b23308)
Signed-off-by: Rodrigo Vivi <[email protected]>

Merge tag 'gvt-fixes-2022-07-11' of https://github.com/intel/gvt-linux into drm-intel-fixes

gvt-fixes-2022-07-11

- Fix return value for shmem_pin_map()

Signed-off-by: Rodrigo Vivi <[email protected]>
From: Zhenyu Wang <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915: Fix vm use-after-free in vma destruction

In vma destruction, the following race may occur:

Thread 1:        Thread 2:
i915_vma_destroy();

  ...
  list_del_init(vma->vm_link);
  ...
  mutex_unlock(vma->vm->mutex);
  __i915_vm_release();
release_references();

And in release_reference() we dereference vma->vm to get to the
vm gt pointer, leading to a use-after free.

However, __i915_vm_release() grabs the vm->mutex so the vm won't be
destroyed before vma->vm->mutex is released, so extract the gt pointer
under the vm->mutex to avoid the vma->vm dereference in
release_references().

v2: Fix a typo in the commit message (Andi Shyti)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5944
Fixes: e1a7ab4fca0c ("drm/i915: Remove the vm open count")
Cc: Niranjana Vishwanathapura <[email protected]>
Cc: Matthew Auld <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Acked-by: Nirmoy Das <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 1926a6b75954fc1a8b44d10bd0c67db957b78cf7)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/guc: ADL-N should use the same GuC FW as ADL-S

The only difference between the ADL S and P GuC FWs is the HWConfig
support. ADL-N does not support HWConfig, so we should use the same
binary as ADL-S, otherwise the GuC might attempt to fetch a config
table that does not exist. ADL-N is internally identified as an ADL-P,
so we need to special-case it in the FW selection code.

Fixes: 7e28d0b26759 ("drm/i915/adl-n: Enable ADL-N platform")
Cc: John Harrison <[email protected]>
Cc: Tejas Upadhyay <[email protected]>
Cc: Anusha Srivatsa <[email protected]>
Cc: Jani Nikula <[email protected]>
Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 971e4a9781742aaad1587e25fd5582b2dd595ef8)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector()

If drm_connector_init fails, intel_connector_free will be called to take
care of proper free. So it is necessary to drop the refcount of port
before intel_connector_free.

Fixes: 091a4f91942a ("drm/i915: Handle drm-layer errors in intel_dp_add_mst_connector")
Signed-off-by: Hangyu Hua <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: José Roberto de Souza <[email protected]>
(cherry picked from commit cea9ed611e85d36a05db52b6457bf584b7d969e2)
Signed-off-by: Rodrigo Vivi <[email protected]>

samples: Use KSYM_NAME_LEN for kprobes

It is better and enough to use KSYM_NAME_LEN for kprobes
in samples, no need to define and use the other values.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tiezhu Yang <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

fprobe/samples: Make sample_probe static

This symbol is not used outside of fprobe_example.c, so marks it static.

Fixes the following warning:

sparse warnings: (new ones prefixed by >>)
>> samples/fprobe/fprobe_example.c:23:15: sparse: sparse: symbol 'sample_probe'
was not declared. Should it be static?

Link: https://lkml.kernel.org/r/[email protected]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: sunliming <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

blk-iocost: tracing: atomic64_read(&ioc->vtime_rate) is assigned an extra semicolon

Remove extra semicolon.

Link: https://lkml.kernel.org/r/[email protected]
Cc: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Li kunyu <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

ftrace: Be more specific about arch impact when function tracer is enabled

It was brought up that on ARMv7, that because the FUNCTION_TRACER does not
use nops to keep function tracing disabled because of the use of a link
register, it does have some performance impact.

The start of functions when -pg is used to compile the kernel is:

push    {lr}
bl      8010e7c0 <__gnu_mcount_nc>

When function tracing is tuned off, it becomes:

push    {lr}
add   sp, sp, #4

Which just puts the stack back to its normal location. But these two
instructions at the start of every function does incur some overhead.

Be more honest in the Kconfig FUNCTION_TRACER description and specify that
the overhead being in the noise was x86 specific, but other architectures
may vary.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Reported-by: Sascha Hauer <[email protected]>
Acked-by: Sascha Hauer <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Fix sleeping while atomic in kdb ftdump

If you drop into kdb and type "ftdump" you'll get a sleeping while
atomic warning from memory allocation in trace_find_next_entry().

This appears to have been caused by commit ff895103a84a ("tracing:
Save off entry when peeking at next entry"), which added the
allocation in that path. The problematic commit was already fixed by
commit 8e99cf91b99b ("tracing: Do not allocate buffer in
trace_find_next_entry() in atomic") but that fix missed the kdb case.

The fix here is easy: just move the assignment of the static buffer to
the place where it should have been to begin with:
trace_init_global_iter(). That function is called in two places, once
is right before the assignment of the static buffer added by the
previous fix and once is in kdb.

Note that it appears that there's a second static buffer that we need
to assign that was added in commit efbbdaa22bb7 ("tracing: Show real
address for trace event arguments"), so we'll move that too.

Link: https://lkml.kernel.org/r/20220708170919.1.I75844e5038d9425add2ad853a608cb44bb39df40@changeid
Fixes: ff895103a84a ("tracing: Save off entry when peeking at next entry")
Fixes: efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
Signed-off-by: Douglas Anderson <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing/histograms: Fix memory leak problem

This reverts commit 46bbe5c671e06f070428b9be142cc4ee5cedebac.

As commit 46bbe5c671e0 ("tracing: fix double free") said, the
"double free" problem reported by clang static analyzer is:
  > In parse_var_defs() if there is a problem allocating
  > var_defs.expr, the earlier var_defs.name is freed.
  > This free is duplicated by free_var_defs() which frees
  > the rest of the list.

However, if there is a problem allocating N-th var_defs.expr:
  + in parse_var_defs(), the freed 'earlier var_defs.name' is
    actually the N-th var_defs.name;
  + then in free_var_defs(), the names from 0th to (N-1)-th are freed;

                        IF ALLOCATING PROBLEM HAPPENED HERE!!! -+
                                                                 \
                                                                  |
          0th           1th                 (N-1)-th      N-th    V
          +-------------+-------------+-----+-------------+-----------
var_defs: | name | expr | name | expr | ... | name | expr | name | ///
          +-------------+-------------+-----+-------------+-----------

These two frees don't act on same name, so there was no "double free"
problem before. Conversely, after that commit, we get a "memory leak"
problem because the above "N-th var_defs.name" is not freed.

If enable CONFIG_DEBUG_KMEMLEAK and inject a fault at where the N-th
var_defs.expr allocated, then execute on shell like:
  $ echo 'hist:key=call_site:val=$v1,$v2:v1=bytes_req,v2=bytes_alloc' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

Then kmemleak reports:
  unreferenced object 0xffff8fb100ef3518 (size 8):
    comm "bash", pid 196, jiffies 4295681690 (age 28.538s)
    hex dump (first 8 bytes):
      76 31 00 00 b1 8f ff ff                          v1......
    backtrace:
      [<0000000038fe4895>] kstrdup+0x2d/0x60
      [<00000000c99c049a>] event_hist_trigger_parse+0x206f/0x20e0
      [<00000000ae70d2cc>] trigger_process_regex+0xc0/0x110
      [<0000000066737a4c>] event_trigger_write+0x75/0xd0
      [<000000007341e40c>] vfs_write+0xbb/0x2a0
      [<0000000087fde4c2>] ksys_write+0x59/0xd0
      [<00000000581e9cdf>] do_syscall_64+0x3a/0x80
      [<00000000cf3b065c>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 46bbe5c671e0 ("tracing: fix double free")
Reported-by: Hulk Robot <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Reviewed-by: Tom Zanussi <[email protected]>
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

ice: change devlink code to read NVM in blocks

When creating a snapshot of the NVM the driver needs to read the entire
contents from the NVM and store it. The NVM reads are protected by a lock
that is shared between the driver and the firmware.

If the driver takes too long to read the entire NVM (which can happen on
some systems) then the firmware could reclaim the lock and cause subsequent
reads from the driver to fail.

We could fix this by increasing the timeout that we pass to the firmware,
but we could end up in the same situation again if the system is slow.
Instead have the driver break the reading of the NVM into blocks that are
small enough that we have confidence that the read will complete within the
timeout time, but large enough not to cause significant AQ overhead.

Fixes: dce730f17825 ("ice: add a devlink region for dumping NVM contents")
Signed-off-by: Paul M Stillwell Jr <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

ice: handle E822 generic device ID in PLDM header

The driver currently presumes that the record data in the PLDM header
of the firmware image will match the device ID of the running device.
This is true for E810 devices. It appears that for E822 devices that
this is not guaranteed to be true.

Fix this by adding a check for the generic E822 device.

Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink")
Signed-off-by: Paul M Stillwell Jr <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>

Merge tag 'ovl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fix from Miklos Szeredi:
"Add a temporary fix for posix acls on idmapped mounts introduced in
this cycle. A proper fix will be added in the next cycle"

* tag 'ovl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: turn off SB_POSIXACL with idmapped layers temporarily

Merge tag 'drm-fixes-2022-07-12' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"I see you picked up one of the fbdev fixes, this is the other stuff
  that was queued up last week.

  A bit of a scattering of fixes, three for i915, one amdgpu, and a
  couple of panfrost, rockchip, panel and bridge ones.

  amdgpu:
   - Hibernation fix

  dma-buf:
   - fix use after free of fence

  i915:
   - Fix a possible refcount leak in DP MST connector (Hangyu)
   - Fix on loading guc on ADL-N (Daniele)
   - Fix vm use-after-free in vma destruction (Thomas)

  bridge:
   - fsl-ldb : 3 LVDS modesetting fixes

  rockchip:
   - iommu domain fix

  panfrost:
   - fix memory corruption
   - error path fix

  panel:
   - orientation quirk fix for Yoga tablet 2

  ssd130x:
   - fix pre-charge period setting"

* tag 'drm-fixes-2022-07-12' of git://anongit.freedesktop.org/drm/drm:
  drm/ssd130x: Fix pre-charge period setting
  dma-buf: Fix one use-after-free of fence
  drm/i915: Fix vm use-after-free in vma destruction
  drm/i915/guc: ADL-N should use the same GuC FW as ADL-S
  drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector()
  drm/amdgpu/display: disable prefer_shadow for generic fb helpers
  drm/amdgpu: keep fbdev buffers pinned during suspend
  drm/panfrost: Fix shrinker list corruption by madvise IOCTL
  drm/panfrost: Put mapping instead of shmem obj on panfrost_mmu_map_fault_addr() error
  drm/rockchip: Detach from ARM DMA domain in attach_device
  drm/bridge: fsl-ldb: Drop DE signal polarity inversion
  drm/bridge: fsl-ldb: Enable split mode for LVDS dual link
  drm/bridge: fsl-ldb: Fix mode clock rate validation
  drm/aperture: Run fbdev removal before internal helpers
  drm: panel-orientation-quirks: Add quirk for the Lenovo Yoga Tablet 2 830