Git Repo - linux.git/log

gfs2: check context in gfs2_glock_put

Add a might_sleep call into gfs2_glock_put which can sleep in DLM when
the last reference is released. This will show problems earlier, and
not only when the last reference is put.

Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Fix glock_hash_walk bugs

So far, glock_hash_walk took a reference on each glock it iterated over, and it
was the examiner's responsibility to drop those references.  Dropping the final
reference to a glock can sleep and the examiners are called in a RCU critical
section with spin locks held, so examiners that didn't need the extra reference
had to drop it asynchronously via gfs2_glock_queue_put or similar.  This wasn't
done correctly in thaw_glock which did call gfs2_glock_put, and not at all in
dump_glock_func.

Change glock_hash_walk to not take glock references at all.  That way, the
examiners that don't need them won't have to bother with slow asynchronous
puts, and the examiners that do need references can take them themselves.

Reported-by: Alexander Aring <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Cancel remote delete work asynchronously

In gfs2_inode_lookup and gfs2_create_inode, we're calling
gfs2_cancel_delete_work which currently cancels any remote delete work
(delete_work_func) synchronously.  This means that if the work is
currently running, it will wait for it to finish.  We're doing this to
pevent a previous instance of an inode from having any influence on the
next instance.

However, delete_work_func uses gfs2_inode_lookup internally, and we can
end up in a deadlock when delete_work_func gets interrupted at the wrong
time.  For example,

  (1) An inode's iopen glock has delete work queued, but the inode
      itself has been evicted from the inode cache.

  (2) The delete work is preempted before reaching gfs2_inode_lookup.

  (3) Another process recreates the inode (gfs2_create_inode).  It tries
      to cancel any outstanding delete work, which blocks waiting for
      the ongoing delete work to finish.

  (4) The delete work calls gfs2_inode_lookup, which blocks waiting for
      gfs2_create_inode to instantiate and unlock the new inode =>
      deadlock.

It turns out that when the delete work notices that its inode has been
re-instantiated, it will do nothing.  This means that it's safe to
cancel the delete work asynchronously.  This prevents the kind of
deadlock described above.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Bob Peterson <[email protected]>

gfs2: set glock object after nq

Before this patch, function gfs2_create_inode called glock_set_object to
set the gl_object for inode and iopen glocks before the glock was locked.
That's wrong because other competing processes like evict may be
blocked waiting for the glock and still have gl_object set before the
actual eviction can take place.

This patch moves the call to glock_set_object until after the glock is
acquire in function gfs2_create_inode, so it waits for possibly
competing evicts to finish their processing first.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: remove RDF_UPTODATE flag

The new GLF_INSTANTIATE_NEEDED flag obsoletes the old rgrp flag
GFS2_RDF_UPTODATE, so this patch replaces it like we did with inodes.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Eliminate GIF_INVALID flag

With the addition of the new GLF_INSTANTIATE_NEEDED flag, the
GIF_INVALID flag is now redundant. This patch removes it.
Since inode_instantiate is only called when instantiation is needed,
the check in inode_instantiate is removed too.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: fix GL_SKIP node_scope problems

Before this patch, when a glock was locked, the very first holder on the
queue would unlock the lockref and call the go_instantiate glops function
(if one existed), unless GL_SKIP was specified. When we introduced the new
node-scope concept, we allowed multiple holders to lock glocks in EX mode
and share the lock.

But node-scope introduced a new problem: if the first holder has GL_SKIP
and the next one does NOT, since it is not the first holder on the queue,
the go_instantiate op was not called. Eventually the GL_SKIP holder may
call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was
still a window of time in which another non-GL_SKIP holder assumes the
instantiate function had been called by the first holder. In the case of
rgrp glocks, this led to a NULL pointer dereference on the buffer_heads.

This patch tries to fix the problem by introducing two new glock flags:

GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function
needs to be called to "fill in" or "read in" the object before it is
referenced.

GLF_INSTANTIATE_IN_PROG which is used to determine when a process is
in the process of reading in the object. Whenever a function needs to
reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if
set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate"
function.

As before, the gl_lockref spin_lock is unlocked during the IO operation,
which may take a relatively long amount of time to complete. While
unlocked, if another process determines go_instantiate is still needed,
it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate
glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared,
it needs to check GLF_INSTANTIATE_NEEDED again because the other process's
go_instantiate operation may not have been successful.

Functions that previously called the instantiate sub-functions now call
directly into gfs2_instantiate so the new bits are managed properly.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: split glock instantiation off from do_promote

Before this patch, function do_promote had a section of code that did
the actual instantiation. This patch splits that off into its own
function, gfs2_instantiate, which prepares us for the next patch that
will use that function.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: further simplify do_promote

This patch further simplifies function do_promote by eliminating some
redundant code in favor of using a lock_released flag. This is just
prep work for a future patch.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: re-factor function do_promote

This patch simply re-factors function do_promote to reduce the indents.
The logic should be unchanged. This makes future patches more readable.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Remove 'first' trace_gfs2_promote argument

Remove the 'first' argument of trace_gfs2_promote: with GL_SKIP, the
'first' holder isn't the one that instantiates the glock
(gl_instantiate), which is what the 'first' flag was apparently supposed
to indicate.

Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: change go_lock to go_instantiate

Before this patch, the go_lock glock operations (glops) did not do
any actual locking. They were used to instantiate objects, like reading
in dinodes and rgrps from the media.

This patch renames the functions to go_instantiate for clarity.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: dump glocks from gfs2_consist_OBJ_i

Before this patch, failed consistency checks printed out the object
that failed, but not the object's glock. This patch makes it also
print out the object glock so we can see the glock's holders and flags
to aid with debugging.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: dequeue iopen holder in gfs2_inode_lookup error

Before this patch, if function gfs2_inode_lookup encountered an error
after it had locked the iopen glock, it never unlocked it, relying on
the evict code to do the cleanup.  The evict code then took the
inode glock while holding the iopen glock, which violates the locking
order.  For example,

(1) node A does a gfs2_inode_lookup that fails, leaving the iopen glock
     locked.

(2) node B calls delete_work_func -> gfs2_lookup_by_inum ->
     gfs2_inode_lookup.  It locks the inode glock and blocks trying to
     lock the iopen glock, which is held by node A.

(3) node A eventually calls gfs2_evict_inode -> evict_should_delete.
     It blocks trying to lock the inode glock, which is now held by
     node B.

This patch introduces error handling to function gfs2_inode_lookup
so it properly dequeues held iopen glocks on errors.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Save ip from gfs2_glock_nq_init

Before this patch, when a glock was locked by function gfs2_glock_nq_init,
it initialized the holder gh_ip (return address) as gfs2_glock_nq_init.
That made it extremely difficult to track down problems because many
functions call gfs2_glock_nq_init. This patch changes the function so
that it saves gh_ip from the caller of gfs2_glock_nq_init, which makes
it easy to backtrack which holder took the lock.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Bob Peterson <[email protected]>

gfs2: Allow append and immutable bits to coexist

Before this patch, function do_gfs2_set_flags checked if the append
and immutable flags were being set while already set. If so, error -EPERM
was given. There's no reason why these two flags should be mutually
exclusive, and if you set them separately, you will, in essence, set
one while it is already set. For example:

chattr +a /mnt/gfs2/file1
chattr +i /mnt/gfs2/file1

The first command sets the append-only flag. Since they are additive,
the second command sets the immutable flag AND append-only flag,
since they both coexist in i_diskflags. So the second command should
not return an error. This bug caused xfstests generic/545 to fail.

This patch simply removes the invalid checks.
I also eliminated an unused parm from do_gfs2_set_flags.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debug

In rgrp.c, there are several places where it does BUG_ON. This tells us
the call stack but nothing more, which is not very helpful.
This patch switches them to GLOCK_BUG_ON which also prints the glock,
its holders, and many of the rgrp values, which will help us debug
problems in the future.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: move GL_SKIP check from glops to do_promote

Before this patch, each individual "go_lock" glock operation (glop)
checked the GL_SKIP flag, and if set, would skip further processing.

This patch changes the logic so the go_lock caller, function go_promote,
checks the GL_SKIP flag before calling the go_lock op in the first place.
This avoids having to unnecessarily unlock gl_lockref.lock only to
re-lock it again.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Add GL_SKIP holder flag to dump_holder

Somehow, the GL_SKIP flag was missed when dumping glock holders.
This patch adds it to function hflags2str. I added it at the end because
I wanted Holder and Skip flags together to read "Hs" rather than "sH"
to avoid confusion with "Shared" ("SH") holder state.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: remove redundant check in gfs2_rgrp_go_lock

Before this patch, function gfs2_rgrp_go_lock checked if GL_SKIP and
ar_rgrplvb were both true. However, GL_SKIP is only set for rgrps if
ar_rgrplvb is true (see gfs2_inplace_reserve). This patch simply removes
the redundant check.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Fix mmap + page fault deadlocks for direct I/O

Also disable page faults during direct I/O requests and implement a
similar kind of retry logic as in the buffered I/O case.

The retry logic in the direct I/O case differs from the buffered I/O
case in the following way: direct I/O doesn't provide the kinds of
consistency guarantees between concurrent reads and writes that buffered
I/O provides, so once we lose the inode glock while faulting in user
pages, we always resume the operation.  We never need to return a
partial read or write.

This locking problem was originally reported by Jan Kara.  Linus came up
with the idea of disabling page faults.  Many thanks to Al Viro and
Matthew Wilcox for their feedback.

Signed-off-by: Andreas Gruenbacher <[email protected]>

iov_iter: Introduce nofault flag to disable page faults

Introduce a new nofault flag to indicate to iov_iter_get_pages not to
fault in user pages.

This is implemented by passing the FOLL_NOFAULT flag to get_user_pages,
which causes get_user_pages to fail when it would otherwise fault in a
page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting
in pages when page faults are not allowed.

Signed-off-by: Andreas Gruenbacher <[email protected]>

gup: Introduce FOLL_NOFAULT flag to disable page faults

Introduce a new FOLL_NOFAULT flag that causes get_user_pages to return
-EFAULT when it would otherwise trigger a page fault. This is roughly
similar to FOLL_FAST_ONLY but available on all architectures, and less
fragile.

Signed-off-by: Andreas Gruenbacher <[email protected]>

iomap: Add done_before argument to iomap_dio_rw

Add a done_before argument to iomap_dio_rw that indicates how much of
the request has already been transferred.  When the request succeeds, we
report that done_before additional bytes were tranferred.  This is
useful for finishing a request asynchronously when part of the request
has already been completed synchronously.

We'll use that to allow iomap_dio_rw to be used with page faults
disabled: when a page fault occurs while submitting a request, we
synchronously complete the part of the request that has already been
submitted.  The caller can then take care of the page fault and call
iomap_dio_rw again for the rest of the request, passing in the number of
bytes already tranferred.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>

iomap: Support partial direct I/O on user copy failures

In iomap_dio_rw, when iomap_apply returns an -EFAULT error and the
IOMAP_DIO_PARTIAL flag is set, complete the request synchronously and
return a partial result. This allows the caller to deal with the page
fault and retry the remainder of the request.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>

iomap: Fix iomap_dio_rw return value for user copies

When a user copy fails in one of the helpers of iomap_dio_rw, fail with
-EFAULT instead of returning 0. This matches what iomap_dio_bio_actor
returns when it gets an -EFAULT from bio_iov_iter_get_pages. With these
changes, iomap_dio_actor now consistently fails with -EFAULT when a user
page cannot be faulted in.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

gfs2: Fix mmap + page fault deadlocks for buffered I/O

In the .read_iter and .write_iter file operations, we're accessing
user-space memory while holding the inode glock.  There is a possibility
that the memory is mapped to the same file, in which case we'd recurse
on the same glock.

We could detect and work around this simple case of recursive locking,
but more complex scenarios exist that involve multiple glocks,
processes, and cluster nodes, and working around all of those cases
isn't practical or even possible.

Avoid these kinds of problems by disabling page faults while holding the
inode glock.  If a page fault would occur, we either end up with a
partial read or write or with -EFAULT if nothing could be read or
written.  In either case, we know that we're not done with the
operation, so we indicate that we're willing to give up the inode glock
and then we fault in the missing pages.  If that made us lose the inode
glock, we return a partial read or write.  Otherwise, we resume the
operation.

This locking problem was originally reported by Jan Kara.  Linus came up
with the idea of disabling page faults.  Many thanks to Al Viro and
Matthew Wilcox for their feedback.

Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Eliminate ip->i_gh

Now that gfs2_file_buffered_write is the only remaining user of
ip->i_gh, we can move the glock holder to the stack (or rather, use the
one we already have on the stack); there is no need for keeping the
holder in the inode anymore.

This is slightly complicated by the fact that we're using ip->i_gh for
the statfs inode in gfs2_file_buffered_write as well. Writing to the
statfs inode isn't very common, so allocate the statfs holder
dynamically when needed.

Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Move the inode glock locking to gfs2_file_buffered_write

So far, for buffered writes, we were taking the inode glock in
gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of
not holding the inode glock while iomap_write_actor faults in user
pages. It turns out that iomap_write_actor is called inside iomap_begin
... iomap_end, so the user pages were still faulted in while holding the
inode glock and the locking code in iomap_begin / iomap_end was
completely pointless.

Move the locking into gfs2_file_buffered_write instead. We'll take care
of the potential deadlocks due to faulting in user pages while holding a
glock in a subsequent patch.

Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Introduce flag for glock holder auto-demotion

This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that
will allow glocks to be demoted automatically on locking conflicts.
When a locking request comes in that isn't compatible with the locking
state of an active holder and that holder has the HIF_MAY_DEMOTE flag
set, the holder will be demoted before the incoming locking request is
granted.

Note that this mechanism demotes active holders (with the HIF_HOLDER
flag set), while before we were only demoting glocks without any active
holders. This allows processes to keep hold of locks that may form a
cyclic locking dependency; the core glock logic will then break those
dependencies in case a conflicting locking request occurs. We'll use
this to avoid giving up the inode glock proactively before faulting in
pages.

Processes that allow a glock holder to be taken away indicate this by
calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag.
Later, they call gfs2_holder_disallow_demote() to clear the flag again,
and then they check if their holder is still queued: if it is, they are
still holding the glock; if it isn't, they can re-acquire the glock (or
abort).

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Clean up function may_grant

Pass the first current glock holder into function may_grant and
deobfuscate the logic there.

While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant. To make
that build cleanly, de-constify the may_grant arguments.

We're now using function find_first_holder in do_promote, so move the
function's definition above do_promote.

Signed-off-by: Andreas Gruenbacher <[email protected]>

gfs2: Add wrapper for iomap_file_buffered_write

Add a wrapper around iomap_file_buffered_write. We'll add code for when
the operation needs to be retried here later.

Signed-off-by: Andreas Gruenbacher <[email protected]>

iov_iter: Introduce fault_in_iov_iter_writeable

Introduce a new fault_in_iov_iter_writeable helper for safely faulting
in an iterator for writing. Uses get_user_pages() to fault in the pages
without actually writing to them, which would be destructive.

We'll use fault_in_iov_iter_writeable in gfs2 once we've determined that
the iterator passed to .read_iter isn't in memory.

Signed-off-by: Andreas Gruenbacher <[email protected]>

iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable

Turn iov_iter_fault_in_readable into a function that returns the number
of bytes not faulted in, similar to copy_to_user, instead of returning a
non-zero value when any of the requested pages couldn't be faulted in.
This supports the existing users that require all pages to be faulted in
as well as new users that are happy if any pages can be faulted in.

Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make
sure this change doesn't silently break things.

Signed-off-by: Andreas Gruenbacher <[email protected]>

gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable}

Turn fault_in_pages_{readable,writeable} into versions that return the
number of bytes not faulted in, similar to copy_to_user, instead of
returning a non-zero value when any of the requested pages couldn't be
faulted in. This supports the existing users that require all pages to
be faulted in as well as new users that are happy if any pages can be
faulted in.

Rename the functions to fault_in_{readable,writeable} to make sure
this change doesn't silently break things.

Neither of these functions is entirely trivial and it doesn't seem
useful to inline them, so move them to mm/gup.c.

Signed-off-by: Andreas Gruenbacher <[email protected]>

powerpc/kvm: Fix kvm_use_magic_page

When switching from __get_user to fault_in_pages_readable, commit
9f9eae5ce717 broke kvm_use_magic_page: like __get_user,
fault_in_pages_readable returns 0 on success.

Fixes: 9f9eae5ce717 ("powerpc/kvm: Prefer fault_in_pages_readable function")
Cc: [email protected] # v4.18+
Signed-off-by: Andreas Gruenbacher <[email protected]>

iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value

Both iov_iter_get_pages and iov_iter_get_pages_alloc return the number
of bytes of the iovec they could get the pages for. When they cannot
get any pages, they're supposed to return 0, but when the start of the
iovec isn't page aligned, the calculation goes wrong and they return a
negative value. Fix both functions.

In addition, change iov_iter_get_pages_alloc to return NULL in that case
to prevent resource leaks.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

Linux 5.15-rc5

Merge tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"A bit of a big batch, partly because I didn't send any last week, and
  also just because the BPF fixes happened to land this week.

  Summary:

   - Fix a regression hit by the IPR SCSI driver, introduced by the
     recent addition of MSI domains on pseries.

   - A big series including 8 BPF fixes, some with potential security
     impact and the rest various code generation issues.

   - Fix our program check assembler entry path, which was accidentally
     jumping into a gas macro and generating strange stack frames, which
     could confuse find_bug().

   - A couple of fixes, and related changes, to fix corner cases in our
     machine check handling.

   - Fix our DMA IOMMU ops, which were not always returning the optimal
     DMA mask, leading to at least one device falling back to 32-bit DMA
     when it shouldn't.

   - A fix for KUAP handling on 32-bit Book3S.

   - Fix crashes seen when kdumping on some pseries systems.

  Thanks to Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric
  Le Goater, Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem,
  Christoph Hellwig, Johan Almbladh, Stan Johnson"

* tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init
  powerpc/32s: Fix kuap_kernel_restore()
  powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler
  powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
  powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
  powerpc/64: warn if local irqs are enabled in NMI or hardirq context
  powerpc/traps: do not enable irqs in _exception
  powerpc/64s: fix program check interrupt emergency stack path
  powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000
  powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END
  powerpc/bpf ppc32: Fix JMP32_JSET_K
  powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation
  powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
  powerpc/security: Add a helper to query stf_barrier type
  powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
  powerpc/bpf: Fix BPF_MOD when imm == 1
  powerpc/bpf: Validate branch ranges
  powerpc/lib: Add helper to check if offset is within conditional branch range
  powerpc/iommu: Report the correct most efficient DMA mask for PCI devices

Merge tag 'objtool_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:

- Remove an extra section.len member in favour of section.sh_size

- Align .altinstructions section creation with the kernel's by creating
   them with entry size of 0

- Fix objtool to convert a reloc symbol to a section offset and not to
   not warn about not knowing how

* tag 'objtool_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Remove redundant 'len' field from struct section
  objtool: Make .altinstructions section entry size consistent
  objtool: Remove reloc symbol type checks in get_alt_entry()

Merge tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- A FPU fix to properly handle invalid MXCSR values: 32-bit masks them
   out due to historical reasons and 64-bit kernels reject them

- A fix to clear X86_FEATURE_SMAP when support for is not
   config-enabled

- Three fixes correcting misspelled Kconfig symbols used in code

- Two resctrl object cleanup fixes

- Yet another attempt at fixing the neverending saga of botched x86
   timers, this time because some incredibly smart hardware decides to
   turn off the HPET timer in a low power state - who cares if the OS is
   relying on it...

- Check the full return value range of an SEV VMGEXIT call to determine
   whether it returned an error

* tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Restore the masking out of reserved MXCSR bits
  x86/Kconfig: Correct reference to MWINCHIP3D
  x86/platform/olpc: Correct ifdef symbol to intended CONFIG_OLPC_XO15_SCI
  x86/entry: Clear X86_FEATURE_SMAP when CONFIG_X86_SMAP=n
  x86/entry: Correct reference to intended CONFIG_64_BIT
  x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()
  x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails
  x86/hpet: Use another crystalball to evaluate HPET usability
  x86/sev: Return an error on a returned non-zero SW_EXITINFO1[31:0]

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Three driver bugfixes and one leak fix for the core"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: mlxcpld: Modify register setting for 400KHz frequency
  i2c: mlxcpld: Fix criteria for frequency setting
  i2c: mediatek: Add OFFSET_EXT_CONF setting back
  i2c: acpi: fix resource leak in reconfiguration device addition

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Five fixes, all in drivers.

  The big change is the UFS task management rework, with lpfc next and
  the rest being fairly minor and obvious fixes"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: iscsi: Fix iscsi_task use after free
  scsi: lpfc: Fix memory overwrite during FC-GS I/O abort handling
  scsi: elx: efct: Delete stray unlock statement
  scsi: ufs: core: Fix task management completion
  scsi: acornscsi: Remove scsi_cmd_to_tag() reference

Merge tag 'block-5.15-2021-10-09' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Two small fixes for this release:

   - Add missing QUEUE_FLAG_HCTX_ACTIVE in the debugfs handling
     (Johannes)

   - Fix double free / UAF issue in __alloc_disk_node (Tetsuo)"

* tag 'block-5.15-2021-10-09' of git://git.kernel.dk/linux-block:
  block: decode QUEUE_FLAG_HCTX_ACTIVE in debugfs output
  block: genhd: fix double kfree() in __alloc_disk_node()

Merge tag '5.15-rc4-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd fixes from Steve French:
"Six fixes for the ksmbd kernel server, including two additional
  overflow checks, a fix for oops, and some cleanup (e.g. remove dead
  code for less secure dialects that has been removed)"

* tag '5.15-rc4-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix oops from fuse driver
  ksmbd: fix version mismatch with out of tree
  ksmbd: use buf_data_size instead of recalculation in smb3_decrypt_req()
  ksmbd: remove the leftover of smb2.0 dialect support
  ksmbd: check strictly data area in ksmbd_smb2_check_message()
  ksmbd: add the check to vaildate if stream protocol length exceeds maximum value

Merge tag 'riscv-for-linus-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- A pair of fixes (along with the necessory cleanup) to our VDSO, to
   avoid a locking during OOM and to prevent the text from overflowing
   into the data page

- A fix to checksyscalls to teach it about our rv32 UABI

- A fix to add clone3() to the rv32 UABI, which was pointed out by
   checksyscalls

- A fix to properly flush the icache on the local CPU in addition to
   the remote CPUs

* tag 'riscv-for-linus-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  checksyscalls: Unconditionally ignore fstat{,at}64
  riscv: Flush current cpu icache before other cpus
  RISC-V: Include clone3() on rv32
  riscv/vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
  riscv/vdso: Move vdso data page up front
  riscv/vdso: Refactor asm/vdso.h

Merge tag 's390-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

- Fix potential memory leak on a error path in eBPF

- Fix handling of zpci device on reserve

* tag 's390-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: fix zpci_zdev_put() on reserve
bpf, s390: Fix potential memory leak about jit_data

Merge tag 'xtensa-20211008' of git://github.com/jcmvbkbc/linux-xtensa

Pull xtensa fixes from Max Filippov:

- fix build/boot issues caused by CONFIG_OF vs CONFIC_USE_OF usage

- fix reset handler for xtfpga boards

* tag 'xtensa-20211008' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: xtfpga: Try software restart before simulating CPU reset
  xtensa: xtfpga: use CONFIG_USE_OF instead of CONFIG_OF
  xtensa: call irqchip_init only when CONFIG_USE_OF is selected
  xtensa: use CONFIG_USE_OF instead of CONFIG_OF

Merge tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

- fix two minor issues in the Xen privcmd driver plus a cleanup patch
   for that driver

- fix multiple issues related to running as PVH guest and some related
   earlyprintk fixes for other Xen guest types

- fix an issue introduced in 5.15 the Xen balloon driver

* tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: fix cancelled balloon action
  xen/x86: adjust data placement
  x86/PVH: adjust function/data placement
  xen/x86: hook up xen_banner() also for PVH
  xen/x86: generalize preferred console model from PV to PVH Dom0
  xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU
  xen/x86: allow "earlyprintk=xen" to work for PV Dom0
  xen/x86: make "earlyprintk=xen" work better for PVH Dom0
  xen/x86: allow PVH Dom0 without XEN_PV=y
  xen/x86: prevent PVH type from getting clobbered
  xen/privcmd: drop "pages" parameter from xen_remap_pfn()
  xen/privcmd: fix error handling in mmap-resource processing
  xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages

Merge tag 'asm-generic-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic fixes from Arnd Bergmann:
"There is one build fix for Arm platforms that ended up impacting most
  architectures because of the way the drivers/firmware Kconfig file is
  wired up:

  The CONFIG_QCOM_SCM dependency have caused a number of randconfig
  regressions over time, and some still remain in v5.15-rc4. The fix we
  agreed on in the end is to make this symbol selected by any driver
  using it, and then building it even for non-Arm platforms with
  CONFIG_COMPILE_TEST.

  To make this work on all architectures, the drivers/firmware/Kconfig
  file needs to be included for all architectures to make the symbol
  itself visible.

  In a separate discussion, we found that a sound driver patch that is
  pending for v5.16 needs the same change to include this Kconfig file,
  so the easiest solution seems to have my Kconfig rework included in
  v5.15.

  Finally, the branch also includes a small unrelated build fix for
  NOMMU architectures"

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
* tag 'asm-generic-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic/io.h: give stub iounmap() on !MMU same prototype as elsewhere
  qcom_scm: hide Kconfig symbol
  firmware: include drivers/firmware/Kconfig unconditionally

Merge tag 'acpi-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"Fix a recent ACPI-related regression in the PCI subsystem that
  introduced a NULL pointer dereference possible to trigger from
  user space via sysfs on some systems"

* tag 'acpi-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PCI: ACPI: Check parent pointer in acpi_pci_find_companion()

Merge tag 'usb-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.15-rc5 that resolve a number of
  reported issues:

   - gadget driver fixes

   - xhci build warning fixes

   - build configuration fix

   - cdc-acm tty handling fixes

   - cdc-wdm fix

   - typec fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: cdc-acm: fix break reporting
  USB: cdc-acm: fix racy tty buffer accesses
  usb: gadget: f_uac2: fixed EP-IN wMaxPacketSize
  usb: cdc-wdm: Fix check for WWAN
  usb: chipidea: ci_hdrc_imx: Also search for 'phys' phandle
  usb: typec: tcpm: handle SRC_STARTUP state if cc changes
  usb: typec: tcpci: don't handle vSafe0V event if it's not enabled
  usb: typec: tipd: Remove dependency on "connector" child fwnode
  Partially revert "usb: Kconfig: using select for USB_COMMON dependency"
  usb: dwc3: gadget: Revert "set gadgets parent to the right controller"
  usb: xhci: tegra: mark PM functions as __maybe_unused

Merge tag 'mmc-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"A couple of MMC host fixes:

   - meson-gx: Fix read/write access for dram-access-quirk

   - sdhci-of-at91: Fix calibration sequence"

* tag 'mmc-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: meson-gx: do not use memcpy_to/fromio for dram-access-quirk
  mmc: sdhci-of-at91: replace while loop with read_poll_timeout
  mmc: sdhci-of-at91: wait for calibration done before proceed

Merge tag 'drm-fixes-2021-10-08' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"I've returned from my tropical island retreat, even managed to bring
  one of my kids on a dive with some turtles. Thanks to Daniel for doing
  last week's work.

  Otherwise this is the weekly fixes pull, it's a bit bigger because the
  vc4 reverts in your tree caused some problems with fixes in the
  drm-misc tree so it got left out last week, so this week has the misc
  fixes rebased without the vc4 pieces.

  Otherwise it's i915, amdgpu with the usual fixes and a scattering over
  other drivers.

  I expect things should calm down a bit more next week.

  core:
   - Kconfig fix for fb_simple vs simpledrm.

  i915:
   - Fix RKL HDMI audio
   - Fix runtime pm imbalance on i915_gem_shrink() error path
   - Fix Type-C port access before hw/sw state sync
   - Fix VBT backlight struct version/size check
   - Fix VT-d async flip on SKL/BXT with plane stretch workaround

  amdgpu:
   - DCN 3.1 DP alt mode fixes
   - S0ix gfxoff fix
   - Fix DRM_AMD_DC_SI dependencies
   - PCIe DPC handling fix
   - DCN 3.1 scaling fix
   - Documentation fix

  amdkfd:
   - Fix potential memory leak
   - IOMMUv2 init fixes

  vc4 (there were some hdmi fixes but things got reverted, sort it out
       later):
   - compiler fix

  nouveau:
   - Cursor fix
   - Fix ttm buffer moves for ampere gpu's by adding minimal
     acceleration support.
   - memory leak fixes

  rockchip:
   - crtc/clk fixup

  panel:
   - ili9341 Fix DT bindings indent
   - y030xx067a - yellow tint init seq fix

  gbefb:
   - Fix gbefb when built with COMPILE_TEST"

* tag 'drm-fixes-2021-10-08' of git://anongit.freedesktop.org/drm/drm: (33 commits)
  drm/amd/display: Fix detection of 4 lane for DPALT
  drm/amd/display: Limit display scaling to up to 4k for DCN 3.1
  drm/amd/display: Skip override for preferred link settings during link training
  drm/nouveau/debugfs: fix file release memory leak
  drm/nouveau/kms/nv50-: fix file release memory leak
  drm/nouveau: avoid a use-after-free when BO init fails
  DRM: delete DRM IRQ legacy midlayer docs
  video: fbdev: gbefb: Only instantiate device when built for IP32
  fbdev: simplefb: fix Kconfig dependencies
  drm/panel: abt-y030xx067a: yellow tint fix
  dt-bindings: panel: ili9341: correct indentation
  drm/nouveau/fifo/ga102: initialise chid on return from channel creation
  drm/rockchip: Update crtc fixup to account for fractional clk change
  drm/nouveau/ga102-: support ttm buffer moves via copy engine
  drm/nouveau/kms/tu102-: delay enabling cursor until after assign_windows
  drm/sun4i: dw-hdmi: Fix HDMI PHY clock setup
  drm/vc4: hdmi: Remove unused struct
  drm/kmb: Enable alpha blended second plane
  drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume
  drm/amdgpu: init iommu after amdkfd device init
  ...

asm-generic/io.h: give stub iounmap() on !MMU same prototype as elsewhere

It made -Werror sad.

Signed-off-by: Adam Borowski <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

x86/fpu: Restore the masking out of reserved MXCSR bits

Ser Olmy reported a boot failure:

  init[1] bad frame in sigreturn frame:(ptrval) ip:b7c9fbe6 sp:bf933310 orax:ffffffff \
  in libc-2.33.so[b7bed000+156000]
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  CPU: 0 PID: 1 Comm: init Tainted: G        W         5.14.9 #1
  Hardware name: Hewlett-Packard HP PC/HP Board, BIOS  JD.00.06 12/06/2001
  Call Trace:
   dump_stack_lvl
   dump_stack
   panic
   do_exit.cold
   do_group_exit
   get_signal
   arch_do_signal_or_restart
   ? force_sig_info_to_task
   ? force_sig
   exit_to_user_mode_prepare
   syscall_exit_to_user_mode
   do_int80_syscall_32
   entry_INT80_32

on an old 32-bit Intel CPU:

  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 6
  model name      : Celeron (Mendocino)
  stepping        : 5
  microcode       : 0x3

Ser bisected the problem to the commit in Fixes.

tglx suggested reverting the rejection of invalid MXCSR values which
this commit introduced and replacing it with what the old code did -
simply masking them out to zero.

Further debugging confirmed his suggestion:

  fpu->state.fxsave.mxcsr: 0xb7be13b4, mxcsr_feature_mask: 0xffbf
  WARNING: CPU: 0 PID: 1 at arch/x86/kernel/fpu/signal.c:384 __fpu_restore_sig+0x51f/0x540

so restore the original behavior only for 32-bit kernels where you have
ancient machines with buggy hardware. For 32-bit programs on 64-bit
kernels, user space which supplies wrong MXCSR values is considered
malicious so fail the sigframe restoration there.

Fixes: 6f9866a166cd ("x86/fpu/signal: Let xrstor handle the features to init")
Reported-by: Ser Olmy <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Ser Olmy <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

Merge tag 'amd-drm-fixes-5.15-2021-10-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.15-2021-10-06:

amdgpu:
- DCN 3.1 DP alt mode fixes
- S0ix gfxoff fix
- Fix DRM_AMD_DC_SI dependencies
- PCIe DPC handling fix
- DCN 3.1 scaling fix
- Documentation fix

amdkfd:
- Fix potential memory leak
- IOMMUv2 init fixes

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-misc-fixes-2021-10-06' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Rebased drm-misc-fixes for v5.15-rc5:
- Dropped vc4 patches.
- Compiler fix for vc4.
- Cursor fix for nouveau.
- Fix ttm buffer moves for ampere gpu's by adding minimal acceleration support.
- Small rockchip fixes.
- Fix DT bindings indent for ili9341.
- Fix y030xx067a init sequence to not get a yellow tint.
- Kconfig fix for fb_simple vs simpledrm.
- Assorted nouvaeu memory leaks.
- Fix gbefb when built with COMPILE_TEST.

Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-intel-fixes-2021-10-07' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v5.15-rc5:
- Fix RKL HDMI audio
- Fix runtime pm imbalance on i915_gem_shrink() error path
- Fix Type-C port access before hw/sw state sync
- Fix VBT backlight struct version/size check
- Fix VT-d async flip on SKL/BXT with plane stretch workaround

Signed-off-by: Dave Airlie <[email protected]>
From: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

checksyscalls: Unconditionally ignore fstat{,at}64

These can be replaced by statx(). Since rv32 has a 64-bit time_t we
just never ended up with them in the first place. This is now an error
due to -Werror.

Suggested-by: Arnd Bergmann <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag 'nfsd-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
"Bug fixes for NFSD error handling paths"

* tag 'nfsd-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Keep existing listeners on portlist error
  SUNRPC: fix sign error causing rpcsec_gss drops
  nfsd: Fix a warning for nfsd_file_close_inode
  nfsd4: Handle the NFSv4 READDIR 'dircount' hint being zero
  nfsd: fix error handling of register_pernet_subsys() in init_nfsd()

Merge tag 'armsoc-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"This is a larger than normal update for Arm SoC specific code, most of
  it in device trees, but also drivers and the omap and at91/sama7
  platforms:

   - There are four new entries to the MAINTAINERS file: Sven Peter and
     Alyssa Rosenzweig for Apple M1, Romain Perier for Mstar/sigmastar,
     and Vignesh Raghavendra for TI K3

   - Build fixes to address randconfig warnings in sharpsl, dove, omap1,
     and qcom platforms as well as the scmi and op-tee subsystems

   - Regression fixes for missing CONFIG_FB and other options for
     several defconfigs

   - Several bug fixes for the newly added Microchip SAMA7 platform,
     mostly regarding power management

   - Missing SMP barriers to protect accesses to SCMI virtio device

   - Regression fixes for TI OMAP, including a boot-time hang on am335x.

   - Lots of bug fixes for NXP i.MX, mostly addressing incorrect
     settings in devicetree files, and one revert for broken suspend.

   - Fixes for ARM Juno/Vexpress devicetree files, addressing a couple
     of schema warnings.

   - Regression fixes for qualcomm SoC specific drivers and devicetree
     files, reverting an mdt_loader change and at least pastially
     reverting some of the 5.15 DTS changes, plus some minor bugfixes"

* tag 'armsoc-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (64 commits)
  MAINTAINERS: Add Sven Peter as ARM/APPLE MACHINE maintainer
  MAINTAINERS: Add Alyssa Rosenzweig as M1 reviewer
  firmware: arm_scmi: Add proper barriers to scmi virtio device
  firmware: arm_scmi: Simplify spinlocks in virtio transport
  ARM: dts: omap3430-sdp: Fix NAND device node
  bus: ti-sysc: Use CLKDM_NOAUTO for dra7 dcan1 for errata i893
  ARM: sharpsl_param: work around -Wstringop-overread warning
  ARM: defconfig: gemini: Restore framebuffer
  ARM: dove: mark 'putc' as inline
  ARM: omap1: move omap15xx local bus handling to usb.c
  MAINTAINERS: Add Vignesh to TI K3 platform maintainership
  arm64: dts: imx8m*-venice-gw7902: fix M2_RST# gpio
  ARM: imx6: disable the GIC CPU interface before calling stby-poweroff sequence
  arm64: dts: ls1028a: fix eSDHC2 node
  arm64: dts: imx8mm-kontron-n801x-som: do not allow to switch off buck2
  ARM: dts: at91: sama7g5ek: to not touch slew-rate for SDMMC pins
  ARM: dts: at91: sama7g5ek: use proper slew-rate settings for GMACs
  ARM: at91: pm: preload base address of controllers in tlb
  ARM: at91: pm: group constants and addresses loading
  ARM: dts: at91: sama7g5ek: add suspend voltage for ddr3l rail
  ...

Merge tag 'asahi-soc-fixes-5.15' of https://github.com/AsahiLinux/linux into arm/fixes

Apple SoC fixes for 5.15; just two MAINTAINERS updates.

- MAINTAINERS: Add Sven Peter as ARM/APPLE MACHINE maintainer
- MAINTAINERS: Add Alyssa Rosenzweig as M1 reviewer

* tag 'asahi-soc-fixes-5.15' of https://github.com/AsahiLinux/linux:
MAINTAINERS: Add Sven Peter as ARM/APPLE MACHINE maintainer
MAINTAINERS: Add Alyssa Rosenzweig as M1 reviewer

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'scmi-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes

SCMI fixes for v5.15

A few fixes addressing:
- Kconfig dependency between VIRTIO and ARM_SCMI_PROTOCOL
- Link-time error with __exit annotation for virtio_scmi_exit
- Unnecessary nested irqsave/irqrestore spinlocks in virtio transport
- Missing SMP barriers to protect accesses to SCMI virtio device

* tag 'scmi-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  firmware: arm_scmi: Add proper barriers to scmi virtio device
  firmware: arm_scmi: Simplify spinlocks in virtio transport
  firmware: arm_scmi: Remove __exit annotation
  firmware: arm_scmi: Fix virtio transport Kconfig dependency

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'omap-for-v5.15/fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes

Fixes for omaps for v5.15

Few regression fixes for omaps for the v5.15-rc cycle. There is a fix
for boot time hangs that can happen on some am335x devices that started
when the pruss devicetree nodes were added. The other fixes are less
critical:

- Fix compiler warning for sysc_init_soc() that got recently introduced

- Fix external abort for am335x pruss as otherwise some am335x will hang

- Use CLKDM_NOAUTO quirk also for dra7 dcan1

- Fix older NAND device node regression for omap3-sdp

* tag 'omap-for-v5.15/fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: omap3430-sdp: Fix NAND device node
  bus: ti-sysc: Use CLKDM_NOAUTO for dra7 dcan1 for errata i893
  soc: ti: omap-prm: Fix external abort for am335x pruss
  bus: ti-sysc: Add break in switch statement in sysc_init_soc()

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'misc-fixes-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull netfslib, cachefiles and afs fixes from David Howells:

- Fix another couple of oopses in cachefiles tracing stemming from the
   possibility of passing in a NULL object pointer

- Fix netfs_clear_unread() to set READ on the iov_iter so that source
   it is passed to doesn't do the wrong thing (some drivers look at the
   flag on iov_iter rather than other available information to determine
   the direction)

- Fix afs_launder_page() to write back at the correct file position on
   the server so as not to corrupt data

* tag 'misc-fixes-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix afs_launder_page() to set correct start file position
  netfs: Fix READ/WRITE confusion when calling iov_iter_xarray()
  cachefiles: Fix oops with cachefiles_cull() due to NULL object

Merge tag 'perf-tools-fixes-for-v5.15-2021-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fix plugin static linking with libopencsd on ARM and ARM64

- Add missing -lstdc++ when linking with libopencsd

- Add missing topdown metrics events to 'perf test attr'

- Plug leak sys_event_tables list after processing JSON vendor events
   entries

- Sync sound/asound.h copy with the kernel sources

* tag 'perf-tools-fixes-for-v5.15-2021-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tests attr: Add missing topdown metrics events
  tools include UAPI: Sync sound/asound.h copy with the kernel sources
  perf build: Fix plugin static linking with libopencsd on ARM and ARM64
  perf build: Add missing -lstdc++ when linking with libopencsd
  perf jevents: Free the sys_event_tables list after processing entries

Merge tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from xfrm, bpf, netfilter, and wireless.

  Current release - regressions:

   - xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting a new
     value in the middle of an enum

   - unix: fix an issue in unix_shutdown causing the other end
     read/write failures

   - phy: mdio: fix memory leak

  Current release - new code bugs:

   - mlx5e: improve MQPRIO resiliency against bad configs

  Previous releases - regressions:

   - bpf: fix integer overflow leading to OOB access in map element
     pre-allocation

   - stmmac: dwmac-rk: fix ethernet on rk3399 based devices

   - netfilter: conntrack: fix boot failure with
     nf_conntrack.enable_hooks=1

   - brcmfmac: revert using ISO3166 country code and 0 rev as fallback

   - i40e: fix freeing of uninitialized misc IRQ vector

   - iavf: fix double unlock of crit_lock

  Previous releases - always broken:

   - bpf, arm: fix register clobbering in div/mod implementation

   - netfilter: nf_tables: correct issues in netlink rule change event
     notifications

   - dsa: tag_dsa: fix mask for trunked packets

   - usb: r8152: don't resubmit rx immediately to avoid soft lockup on
     device unplug

   - i40e: fix endless loop under rtnl if FW fails to correctly respond
     to capability query

   - mlx5e: fix rx checksum offload coexistence with ipsec offload

   - mlx5: force round second at 1PPS out start time and allow it only
     in supported clock modes

   - phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable
     sequence

  Misc:

   - xfrm: slightly rejig the new policy uAPI to make it less cryptic"

* tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
  net: prefer socket bound to interface when not in VRF
  iavf: fix double unlock of crit_lock
  i40e: Fix freeing of uninitialized misc IRQ vector
  i40e: fix endless loop under rtnl
  dt-bindings: net: dsa: marvell: fix compatible in example
  ionic: move filter sync_needed bit set
  gve: report 64bit tx_bytes counter from gve_handle_report_stats()
  gve: fix gve_get_stats()
  rtnetlink: fix if_nlmsg_stats_size() under estimation
  gve: Properly handle errors in gve_assign_qpl
  gve: Avoid freeing NULL pointer
  gve: Correct available tx qpl check
  unix: Fix an issue in unix_shutdown causing the other end read/write failures
  net: stmmac: trigger PCS EEE to turn off on link down
  net: pcs: xpcs: fix incorrect steps on disable EEE
  netlink: annotate data races around nlk->bound
  net: pcs: xpcs: fix incorrect CL37 AN sequence
  net: sfp: Fix typo in state machine debug string
  net/sched: sch_taprio: properly cancel timer from taprio_destroy()
  net: bridge: fix under estimation in br_get_linkxstats_size()
  ...

Merge tag 'hyperv-fixes-signed-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

- Replace uuid.h with types.h in a header (Andy Shevchenko)

- Avoid sleeping in atomic context in PCI driver (Long Li)

- Avoid sending IPI to self when it shouldn't (Vitaly Kuznetsov)

* tag 'hyperv-fixes-signed-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Avoid erroneously sending IPI to 'self'
  hyper-v: Replace uuid.h with types.h
  PCI: hv: Fix sleep while in non-sleep context when removing child devices from the bus

PCI: ACPI: Check parent pointer in acpi_pci_find_companion()

If acpi_pci_find_companion() is called for a device whose parent
pointer is NULL, it will crash when attempting to get the ACPI
companion of the parent due to a NULL pointer dereference in
the ACPI_COMPANION() macro.

This was not a problem before commit 375553a93201 ("PCI: Setup ACPI
fwnode early and at the same time with OF") that made pci_setup_device()
call pci_set_acpi_fwnode() and so it allowed devices with NULL parent
pointers to be passed to acpi_pci_find_companion() which is the case
in pci_iov_add_virtfn(), for instance.

Fix this issue by making acpi_pci_find_companion() check the device's
parent pointer upfront and bail out if it is NULL.

While pci_iov_add_virtfn() can be changed to set the device's parent
pointer before calling pci_setup_device() for it, checking pointers
against NULL before dereferencing them is prudent anyway and looking
for ACPI companions of virtual functions isn't really useful.

Fixes: 375553a93201 ("PCI: Setup ACPI fwnode early and at the same time with OF")
Link: https://lore.kernel.org/linux-acpi/[email protected]/
Reported-by: Niklas Schnelle <[email protected]>
Tested-by: Niklas Schnelle <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>

MAINTAINERS: Add Sven Peter as ARM/APPLE MACHINE maintainer

Hector suggested I should add myself to help him maintain the
platform.

Acked-by: Hector Martin <[email protected]>
Signed-off-by: Sven Peter <[email protected]>

MAINTAINERS: Add Alyssa Rosenzweig as M1 reviewer

Add myself as a reviewer for Asahi Linux (Apple M1) patches.

I would like to be CC'ed on Asahi Linux patches for review and testing.
I am also collecting Asahi Linux patches downstream, rebasing on
linux-next periodically, and would like to be notified of what to
cherry-pick from lists.

Cc: Hector Martin <[email protected]>
Cc: Sven Peter <[email protected]>
Acked-by: Hector Martin <[email protected]>
Acked-by: Sven Peter <[email protected]>
Signed-off-by: Alyssa Rosenzweig <[email protected]>

ksmbd: fix oops from fuse driver

Marios reported kernel oops from fuse driver when ksmbd call
mark_inode_dirty(). This patch directly update ->i_ctime after removing
mark_inode_ditry() and notify_change will put inode to dirty list.

Cc: Tom Talpey <[email protected]>
Cc: Ronnie Sahlberg <[email protected]>
Cc: Ralph Böhme <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Reported-by: Marios Makassikis <[email protected]>
Tested-by: Marios Makassikis <[email protected]>
Acked-by: Hyunchul Lee <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: fix version mismatch with out of tree

Fix version mismatch with out of tree, This updated version will be
matched with ksmbd-tools.

Cc: Tom Talpey <[email protected]>
Cc: Ronnie Sahlberg <[email protected]>
Cc: Ralph Böhme <[email protected]>
Cc: Steve French <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: use buf_data_size instead of recalculation in smb3_decrypt_req()

Tom suggested to use buf_data_size that is already calculated, to verify
these offsets.

Cc: Tom Talpey <[email protected]>
Cc: Ronnie Sahlberg <[email protected]>
Cc: Ralph Böhme <[email protected]>
Suggested-by: Tom Talpey <[email protected]>
Acked-by: Hyunchul Lee <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: remove the leftover of smb2.0 dialect support

Although ksmbd doesn't send SMB2.0 support in supported dialect list of smb
negotiate response, There is the leftover of smb2.0 dialect.
This patch remove it not to support SMB2.0 in ksmbd.

Cc: Tom Talpey <[email protected]>
Cc: Ronnie Sahlberg <[email protected]>
Cc: Ralph Böhme <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: check strictly data area in ksmbd_smb2_check_message()

When invalid data offset and data length in request,
ksmbd_smb2_check_message check strictly and doesn't allow to process such
requests.

Cc: Tom Talpey <[email protected]>
Cc: Ronnie Sahlberg <[email protected]>
Cc: Ralph Böhme <[email protected]>
Acked-by: Hyunchul Lee <[email protected]>
Reviewed-by: Ralph Boehme <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

qcom_scm: hide Kconfig symbol

Now that SCM can be a loadable module, we have to add another
dependency to avoid link failures when ipa or adreno-gpu are
built-in:

aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe':
ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available'

ld.lld: error: undefined symbol: qcom_scm_is_available
>>> referenced by adreno_gpu.c
>>>               gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in archive drivers/built-in.a

This can happen when CONFIG_ARCH_QCOM is disabled and we don't select
QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd
use a similar dependency here to what we have for QCOM_RPROC_COMMON,
but that causes dependency loops from other things selecting QCOM_SCM.

This appears to be an endless problem, so try something different this
time:

- CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on'
   but that is simply selected by all of its users

- All the stubs in include/linux/qcom_scm.h can go away

- arm-smccc.h needs to provide a stub for __arm_smccc_smc() to
   allow compile-testing QCOM_SCM on all architectures.

- To avoid a circular dependency chain involving RESET_CONTROLLER
   and PINCTRL_SUNXI, drop the 'select RESET_CONTROLLER' statement.
   According to my testing this still builds fine, and the QCOM
   platform selects this symbol already.

Acked-by: Kalle Valo <[email protected]>
Acked-by: Alex Elder <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

firmware: include drivers/firmware/Kconfig unconditionally

Compile-testing drivers that require access to a firmware layer
fails when that firmware symbol is unavailable. This happened
twice this week:

- My proposed to change to rework the QCOM_SCM firmware symbol
   broke on ppc64 and others.

- The cs_dsp firmware patch added device specific firmware loader
   into drivers/firmware, which broke on the same set of
   architectures.

We should probably do the same thing for other subsystems as well,
but fix this one first as this is a dependency for other patches
getting merged.

Reviewed-by: Bjorn Andersson <[email protected]>
Reviewed-by: Charles Keepax <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Bjorn Andersson <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Liam Girdwood <[email protected]>
Cc: Charles Keepax <[email protected]>
Cc: Simon Trimmer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

net: prefer socket bound to interface when not in VRF

The commit 6da5b0f027a8 ("net: ensure unbound datagram socket to be
chosen when not in a VRF") modified compute_score() so that a device
match is always made, not just in the case of an l3mdev skb, then
increments the score also for unbound sockets. This ensures that
sockets bound to an l3mdev are never selected when not in a VRF.
But as unbound and bound sockets are now scored equally, this results
in the last opened socket being selected if there are matches in the
default VRF for an unbound socket and a socket bound to a dev that is
not an l3mdev. However, handling prior to this commit was to always
select the bound socket in this case. Reinstate this handling by
incrementing the score only for bound sockets. The required isolation
due to choosing between an unbound socket and a socket bound to an
l3mdev remains in place due to the device match always being made.
The same approach is taken for compute_score() for stream sockets.

Fixes: 6da5b0f027a8 ("net: ensure unbound datagram socket to be chosen when not in a VRF")
Fixes: e78190581aff ("net: ensure unbound stream socket to be chosen when not in a VRF")
Signed-off-by: Mike Manning <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2021-10-07

We've added 7 non-merge commits during the last 8 day(s) which contain
a total of 8 files changed, 38 insertions(+), 21 deletions(-).

The main changes are:

1) Fix ARM BPF JIT to preserve caller-saved regs for DIV/MOD JIT-internal
   helper call, from Johan Almbladh.

2) Fix integer overflow in BPF stack map element size calculation when
   used with preallocation, from Tatsuhiko Yasumatsu.

3) Fix an AF_UNIX regression due to added BPF sockmap support related
   to shutdown handling, from Jiang Wang.

4) Fix a segfault in libbpf when generating light skeletons from objects
   without BTF, from Kumar Kartikeya Dwivedi.

5) Fix a libbpf memory leak in strset to free the actual struct strset
   itself, from Andrii Nakryiko.

6) Dual-license bpf_insn.h similarly as we did for libbpf and bpftool,
   with ACKs from all contributors, from Luca Boccassi.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init

On pseries LPAR when an empty slot is assigned to partition OR in single
LPAR mode, kdump kernel crashes during issuing PHB reset.

In the kdump scenario, we traverse all PHBs and issue reset using the
pe_config_addr of the first child device present under each PHB. However
the code assumes that none of the PHB slots can be empty and uses
list_first_entry() to get the first child device under the PHB. Since
list_first_entry() expects the list to be non-empty, it returns an
invalid pci_dn entry and ends up accessing NULL phb pointer under
pci_dn->phb causing kdump kernel crash.

This patch fixes the below kdump kernel crash by skipping empty slots:

  audit: initializing netlink subsys (disabled)
  thermal_sys: Registered thermal governor 'fair_share'
  thermal_sys: Registered thermal governor 'step_wise'
  cpuidle: using governor menu
  pstore: Registered nvram as persistent store backend
  Issue PHB reset ...
  audit: type=2000 audit(1631267818.000:1): state=initialized audit_enabled=0 res=1
  BUG: Kernel NULL pointer dereference on read at 0x00000268
  Faulting instruction address: 0xc000000008101fb0
  Oops: Kernel access of bad area, sig: 7 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 7 PID: 1 Comm: swapper/7 Not tainted 5.14.0 #1
  NIP:  c000000008101fb0 LR: c000000009284ccc CTR: c000000008029d70
  REGS: c00000001161b840 TRAP: 0300   Not tainted  (5.14.0)
  MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000224  XER: 20040002
  CFAR: c000000008101f0c DAR: 0000000000000268 DSISR: 00080000 IRQMASK: 0
  ...
  NIP pseries_eeh_get_pe_config_addr+0x100/0x1b0
  LR  __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
  Call Trace:
    0xc00000001161bb80 (unreliable)
    __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
    do_one_initcall+0x60/0x2d0
    kernel_init_freeable+0x350/0x3f8
    kernel_init+0x3c/0x17c
    ret_from_kernel_thread+0x5c/0x64

Fixes: 5a090f7c363fd ("powerpc/pseries: PCIE PHB reset")
Signed-off-by: Mahesh Salgaonkar <[email protected]>
[mpe: Tweak wording and trim oops]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/163215558252.413351.8600189949820258982.stgit@jupiter

powerpc/32s: Fix kuap_kernel_restore()

At interrupt exit, kuap_kernel_restore() calls kuap_unlock() with the
value contained in regs->kuap. However, when regs->kuap contains
0xffffffff it means that KUAP was not unlocked so calling kuap_unlock()
is unrelevant and results in jeopardising the contents of kernel space
segment registers.

So check that regs->kuap doesn't contain KUAP_NONE before calling
kuap_unlock(). In the meantime it also means that if KUAP has not
been correcly locked back at interrupt exit, it must be locked
before continuing. This is done by checking the content of
current->thread.kuap which was returned by kuap_get_and_assert_locked()

Fixes: 16132529cee5 ("powerpc/32s: Rework Kernel Userspace Access Protection")
Reported-by: Stan Johnson <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/0d0c4d0f050a637052287c09ba521bad960a2790.1631715131.git.christophe.leroy@csgroup.eu

powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler

The IPR drivers tests for MSI support at probe time with MSI vector 0
and when done, frees the IRQ with free_irq(). This test was introduced
by 95fecd90397e ("ipr: add test for MSI interrupt support") as an
improvement of commit 5a9ef25b14d3 ("[SCSI] ipr: add MSI support")
because a boot failure was reported on a Bimini PowerPC system:

https://lore.kernel.org/r/1242926159 [email protected]

It was finally decided to remove MSI support on Bimini systems in
6eb0ac03899a ("powerpc/maple: Add a quirk to disable MSI for IPR on
Bimini").

Linux 5.15-rc1 added MSI domain support to the pseries machine and
when free_irq is called() in the driver, msi_domain_deactivate() also
is. This resets the MSI table entry of the associate vector by calling
__pci_write_msi_msg() with an empty message and breaks any further
activation of the same vector. In the case of the IPR driver, it
breaks the initialization sequence of the IOA.

Introduce an empty irq_write_msi_msg() handler in the MSI domain of
the pseries machine to avoid clearing the MSI vector entry. Updating
the entry is not strictly necessary since it is initialized by the
underlying hypervisor, PowerVM or QEMU/KVM.

Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains")
Signed-off-by: Cédric Le Goater <[email protected]>
Reported-by: Abdul Haleem <[email protected]>
Tested-by: Mahesh Salgaonkar <[email protected]>
[mpe: Tweak comment wording and formatting slightly]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/
ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2021-10-07

1) Fix a sysbot reported shift-out-of-bounds in xfrm_get_default.
   From Pavel Skripkin.

2) Fix XFRM_MSG_MAPPING ABI breakage. The new XFRM_MSG_MAPPING
   messages were accidentally not paced at the end.
   Fix by Eugene Syromiatnikov.

3) Fix the uapi for the default policy, use explicit field and macros
   and make it accessible to userland.
   From Nicolas Dichtel.

4) Fix a missing rcu lock in xfrm_notify_userpolicy().
   From Nicolas Dichtel.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-
queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-10-06

This series contains updates to i40e and iavf drivers.

Jiri Benc expands an error check to prevent infinite loop for i40e.

Sylwester prevents freeing of uninitialized IRQ vector to resolve a
kernel oops for i40e.

Stefan Assmann fixes a double mutex unlock for iavf.
====================

Signed-off-by: David S. Miller <[email protected]>

powerpc/64s: Fix unrecoverable MCE calling async handler from NMI

The machine check handler is not considered NMI on 64s. The early
handler is the true NMI handler, and then it schedules the
machine_check_exception handler to run when interrupts are enabled.

This works fine except the case of an unrecoverable MCE, where the true
NMI is taken when MSR[RI] is clear, it can not recover, so it calls
machine_check_exception directly so something might be done about it.

Calling an async handler from NMI context can result in irq state and
other things getting corrupted. This can also trigger the BUG at
arch/powerpc/include/asm/interrupt.h:168
BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));

Fix this by making an _async version of the handler which is called
in the normal case, and a NMI version that is called for unrecoverable
interrupts.

Fixes: 2b43dd7653cc ("powerpc/64: enable MSR[EE] in irq replay pt_regs")
Signed-off-by: Nicholas Piggin <[email protected]>
Tested-by: Cédric Le Goater <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG

If a NMI hits early in an interrupt handler before the irq soft-mask
state is reconciled, that can cause a false-positive BUG with a
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG assertion.

Remove that assertion and instead check the case that if regs->msr has
EE clear, then regs->softe should be marked as disabled so the irq state
looks correct to NMI handlers, the same as how it's fixed up in the
case it was implicit soft-masked.

This doesn't fix a known problem -- the change that was fixed by commit
4ec5feec1ad02 ("powerpc/64s: Make NMI record implicitly soft-masked code
as irqs disabled") was the addition of a warning in the soft-nmi
watchdog interrupt which can never actually fire when MSR[EE]=0. However
it may be important if NMI handlers grow more code, and it's less
surprising to anything using 'regs' - (I tripped over this when working
in the area).

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/64: warn if local irqs are enabled in NMI or hardirq context

This can help catch bugs such as the one fixed by the previous change
to prevent _exception() from enabling irqs.

ppc32 could have a similar warning but it has no good config option to
debug this stuff (the test may be overkill to add for production
kernels).

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/traps: do not enable irqs in _exception

_exception can be called by machine check handlers when the MCE hits
user code (e.g., pseries and powernv). This will enable local irqs
because, which is a dicey thing to do in NMI or hard irq context.

This seemed to worked out okay because a userspace MCE can basically be
treated like a synchronous interrupt (after async / imprecise MCEs are
filtered out). Since NMI and hard irq handlers have started growing
nmi_enter / irq_enter, and more irq state sanity checks, this has
started to cause problems (or at least trigger warnings).

The Fixes tag to the commit which introduced this rather than try to
work out exactly which commit was the first that could possibly cause a
problem because that may be difficult to prove.

Fixes: 9f2f79e3a3c1 ("powerpc: Disable interrupts in 64-bit kernel FP and vector faults")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/64s: fix program check interrupt emergency stack path

Emergency stack path was jumping into a 3: label inside the
__GEN_COMMON_BODY macro for the normal path after it had finished,
rather than jumping over it. By a small miracle this is the correct
place to build up a new interrupt frame with the existing stack
pointer, so things basically worked okay with an added weird looking
700 trap frame on top (which had the wrong ->nip so it didn't decode
bug messages either).

Fix this by avoiding using numeric labels when jumping over non-trivial
macros.

Before:

LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 88 Comm: sh Not tainted 5.15.0-rc2-00034-ge057cdade6e5 #2637
NIP:  7265677368657265 LR: c00000000006c0c8 CTR: c0000000000097f0
REGS: c0000000fffb3a50 TRAP: 0700   Not tainted
MSR:  9000000000021031 <SF,HV,ME,IR,DR,LE>  CR: 00000700  XER: 20040000
CFAR: c0000000000098b0 IRQMASK: 0
GPR00: c00000000006c964 c0000000fffb3cf0 c000000001513800 0000000000000000
GPR04: 0000000048ab0778 0000000042000000 0000000000000000 0000000000001299
GPR08: 000001e447c718ec 0000000022424282 0000000000002710 c00000000006bee8
GPR12: 9000000000009033 c0000000016b0000 00000000000000b0 0000000000000001
GPR16: 0000000000000000 0000000000000002 0000000000000000 0000000000000ff8
GPR20: 0000000000001fff 0000000000000007 0000000000000080 00007fff89d90158
GPR24: 0000000002000000 0000000002000000 0000000000000255 0000000000000300
GPR28: c000000001270000 0000000042000000 0000000048ab0778 c000000080647e80
NIP [7265677368657265] 0x7265677368657265
LR [c00000000006c0c8] ___do_page_fault+0x3f8/0xb10
Call Trace:
[c0000000fffb3cf0] [c00000000000bdac] soft_nmi_common+0x13c/0x1d0 (unreliable)
--- interrupt: 700 at decrementer_common_virt+0xb8/0x230
NIP:  c0000000000098b8 LR: c00000000006c0c8 CTR: c0000000000097f0
REGS: c0000000fffb3d60 TRAP: 0700   Not tainted
MSR:  9000000000021031 <SF,HV,ME,IR,DR,LE>  CR: 22424282  XER: 20040000
CFAR: c0000000000098b0 IRQMASK: 0
GPR00: c00000000006c964 0000000000002400 c000000001513800 0000000000000000
GPR04: 0000000048ab0778 0000000042000000 0000000000000000 0000000000001299
GPR08: 000001e447c718ec 0000000022424282 0000000000002710 c00000000006bee8
GPR12: 9000000000009033 c0000000016b0000 00000000000000b0 0000000000000001
GPR16: 0000000000000000 0000000000000002 0000000000000000 0000000000000ff8
GPR20: 0000000000001fff 0000000000000007 0000000000000080 00007fff89d90158
GPR24: 0000000002000000 0000000002000000 0000000000000255 0000000000000300
GPR28: c000000001270000 0000000042000000 0000000048ab0778 c000000080647e80
NIP [c0000000000098b8] decrementer_common_virt+0xb8/0x230
LR [c00000000006c0c8] ___do_page_fault+0x3f8/0xb10
--- interrupt: 700
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 6d28218e0cc3c949 ]---

After:

------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/exceptions-64s.S:491!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 88 Comm: login Not tainted 5.15.0-rc2-00034-ge057cdade6e5-dirty #2638
NIP:  c0000000000098b8 LR: c00000000006bf04 CTR: c0000000000097f0
REGS: c0000000fffb3d60 TRAP: 0700   Not tainted
MSR:  9000000000021031 <SF,HV,ME,IR,DR,LE>  CR: 24482227  XER: 00040000
CFAR: c0000000000098b0 IRQMASK: 0
GPR00: c00000000006bf04 0000000000002400 c000000001513800 c000000001271868
GPR04: 00000000100f0d29 0000000042000000 0000000000000007 0000000000000009
GPR08: 00000000100f0d29 0000000024482227 0000000000002710 c000000000181b3c
GPR12: 9000000000009033 c0000000016b0000 00000000100f0d29 c000000005b22f00
GPR16: 00000000ffff0000 0000000000000001 0000000000000009 00000000100eed90
GPR20: 00000000100eed90 0000000010000000 000000001000a49c 00000000100f1430
GPR24: c000000001271868 0000000002000000 0000000000000215 0000000000000300
GPR28: c000000001271800 0000000042000000 00000000100f0d29 c000000080647860
NIP [c0000000000098b8] decrementer_common_virt+0xb8/0x230
LR [c00000000006bf04] ___do_page_fault+0x234/0xb10
Call Trace:
Instruction dump:
4182000c 39400001 48000008 894d0932 714a0001 39400008 408225fc 718a4000
7c2a0b78 3821fcf0 41c20008 e82d0910 <0981fcf0> f92101a0 f9610170 f9810178
---[ end trace a5dbd1f5ea4ccc51 ]---

Fixes: 0a882e28468f4 ("powerpc/64s/exception: remove bad stack branch")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000

Special case handling of the smallest 32-bit negative number for BPF_SUB.

Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/7135360a0cdf70adedbccf9863128b8daef18764.1633464148.git.naveen.n.rao@linux.vnet.ibm.com

powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END

Suppress emitting zero extend instruction for 64-bit BPF_END_FROM_[L|B]E
operation.

Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/b4e3c3546121315a8e2059b19a1bda84971816e4.1633464148.git.naveen.n.rao@linux.vnet.ibm.com

powerpc/bpf ppc32: Fix JMP32_JSET_K

'andi' only takes an unsigned 16-bit value. Correct the imm range used
when emitting andi.

Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/b94489f52831305ec15aca4dd04a3527236be7e8.1633464148.git.naveen.n.rao@linux.vnet.ibm.com

powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation

Correct the destination register used for ALU32 BPF_ARSH operation.

Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32")
Signed-off-by: Naveen N. Rao <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/6d24c1f9e79b6f61f5135eaf2ea1e8bcd4dac87b.1633464148.git.naveen.n.rao@linux.vnet.ibm.com

powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC

Emit similar instruction sequences to commit a048a07d7f4535
("powerpc/64s: Add support for a store forwarding barrier at kernel
entry/exit") when encountering BPF_NOSPEC.

Mitigations are enabled depending on what the firmware advertises. In
particular, we do not gate these mitigations based on current settings,
just like in x86. Due to this, we don't need to take any action if
mitigations are enabled or disabled at runtime.

Signed-off-by: Naveen N. Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/956570cbc191cd41f8274bed48ee757a86dac62a.1633464148.git.naveen.n.rao@linux.vnet.ibm.com

powerpc/security: Add a helper to query stf_barrier type

Add a helper to return the stf_barrier type for the current processor.

Signed-off-by: Naveen N. Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com

powerpc/bpf: Fix BPF_SUB when imm == 0x80000000

We aren't handling subtraction involving an immediate value of
0x80000000 properly. Fix the same.

Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Naveen N. Rao <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
[mpe: Fold in fix from Naveen to use imm <= 32768]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/fc4b1276eb10761fd7ce0814c8dd089da2815251.1633464148.git.naveen.n.rao@linux.vnet.ibm.com

powerpc/bpf: Fix BPF_MOD when imm == 1

Only ignore the operation if dividing by 1.

Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Naveen N. Rao <[email protected]>
Tested-by: Johan Almbladh <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Johan Almbladh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/c674ca18c3046885602caebb326213731c675d06.1633464148.git.naveen.n.rao@linux.vnet.ibm.com

powerpc/bpf: Validate branch ranges

Add checks to ensure that we never emit branch instructions with
truncated branch offsets.

Suggested-by: Michael Ellerman <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
Tested-by: Johan Almbladh <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Johan Almbladh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/71d33a6b7603ec1013c9734dd8bdd4ff5e929142.1633464148.git.naveen.n.rao@linux.vnet.ibm.com