Git Repo - linux.git/log

Merge tag 'nfs-for-5.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
"Highlights include:

  Stable fixes:

   - SUNRPC: Ensure bvecs are re-synced when we re-encode the RPC
     request

   - Fix an Oops in ff_layout_track_ds_error due to a PTR_ERR()
     dereference

   - Revert buggy NFS readdirplus optimisation

   - NFSv4: Handle the special Linux file open access mode

   - pnfs: Fix a problem where we gratuitously start doing I/O through
     the MDS

  Features:

   - Allow NFS client to set up multiple TCP connections to the server
     using a new 'nconnect=X' mount option. Queue length is used to
     balance load.

   - Enhance statistics reporting to report on all transports when using
     multiple connections.

   - Speed up SUNRPC by removing bh-safe spinlocks

   - Add a mechanism to allow NFSv4 to request that containers set a
     unique per-host identifier for when the hostname is not set.

   - Ensure NFSv4 updates the lease_time after a clientid update

  Bugfixes and cleanup:

   - Fix use-after-free in rpcrdma_post_recvs

   - Fix a memory leak when nfs_match_client() is interrupted

   - Fix buggy file access checking in NFSv4 open for execute

   - disable unsupported client side deduplication

   - Fix spurious client disconnections

   - Fix occasional RDMA transport deadlock

   - Various RDMA cleanups

   - Various tracepoint fixes

   - Fix the TCP callback channel to guarantee the server can actually
     send the number of callback requests that was negotiated at mount
     time"

* tag 'nfs-for-5.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
  pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS
  pnfs: Fix a problem where we gratuitously start doing I/O through the MDS
  SUNRPC: Optimise transport balancing code
  SUNRPC: Ensure the bvecs are reset when we re-encode the RPC request
  pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error
  NFSv4: Don't use the zero stateid with layoutget
  SUNRPC: Fix up backchannel slot table accounting
  SUNRPC: Fix initialisation of struct rpc_xprt_switch
  SUNRPC: Skip zero-refcount transports
  SUNRPC: Replace division by multiplication in calculation of queue length
  NFSv4: Validate the stateid before applying it to state recovery
  nfs4.0: Refetch lease_time after clientid update
  nfs4: Rename nfs41_setup_state_renewal
  nfs4: Make nfs4_proc_get_lease_time available for nfs4.0
  nfs: Fix copy-and-paste error in debug message
  NFS: Replace 16 seq_printf() calls by seq_puts()
  NFS: Use seq_putc() in nfs_show_stats()
  Revert "NFS: readdirplus optimization by cache mechanism" (memleak)
  SUNRPC: Fix transport accounting when caller specifies an rpc_xprt
  NFS: Record task, client ID, and XID in xdr_status trace points
  ...

sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT

Add a new entry to the preemption menu which enables the real-time support
for the kernel. The choice is only enabled when an architecture supports
it.

It selects PREEMPT as the RT features depend on it. To achieve that the
existing PREEMPT choice is renamed to PREEMPT_LL which select PREEMPT as
well.

No functional change.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Clark Williams <[email protected]>
Acked-by: Daniel Bristot de Oliveira <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Acked-by: Daniel Wagner <[email protected]>
Acked-by: Luis Claudio R. Goncalves <[email protected]>
Acked-by: Julia Cartwright <[email protected]>
Acked-by: Tom Zanussi <[email protected]>
Acked-by: Gratian Crisan <[email protected]>
Acked-by: Sebastian Siewior <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2019-07-18

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) verifier precision propagation fix, from Andrii.

2) BTF size fix for typedefs, from Andrii.

3) a bunch of big endian fixes, from Ilya.

4) wide load from bpf_sock_addr fixes, from Stanislav.

5) a bunch of misc fixes from a number of developers.
====================

Signed-off-by: David S. Miller <[email protected]>

selftests/bpf: fix test_xdp_noinline on s390

test_xdp_noinline fails on s390 due to a handful of endianness issues.
Use ntohs for parsing eth_proto.
Replace bswaps with ntohs/htons.

Signed-off-by: Ilya Leoshkevich <[email protected]>
Acked-by: Vasily Gorbik <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: fix "valid read map access into a read-only array 1" on s390

This test looks up a 32-bit map element and then loads it using a 64-bit
load. This does not work on s390, which is a big-endian machine.

Since the point of this test doesn't seem to be loading a smaller value
using a larger load, simply use a 32-bit load.

Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS

Add tracepoints to allow debugging of the event chain leading to
a pnfs fallback to doing I/O through the MDS.

Signed-off-by: Trond Myklebust <[email protected]>

x86, boot: Remove multiple copy of static function sanitize_boot_params()

Kernel build warns:
'sanitize_boot_params' defined but not used [-Wunused-function]

at below files:
  arch/x86/boot/compressed/cmdline.c
  arch/x86/boot/compressed/error.c
  arch/x86/boot/compressed/early_serial_console.c
  arch/x86/boot/compressed/acpi.c

That's becausethey each include misc.h which includes a definition of
sanitize_boot_params() via bootparam_utils.h.

Remove the inclusion from misc.h and have the c file including
bootparam_utils.h directly.

Signed-off-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

x86/boot/compressed/64: Remove unused variable

Fix gcc warning:

arch/x86/boot/compressed/pgtable_64.c: In function 'find_trampoline_placement':
arch/x86/boot/compressed/pgtable_64.c:43:16: warning: unused variable 'trampoline_start' [-Wunused-variable]
unsigned long trampoline_start;
^

Signed-off-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

x86/boot/efi: Remove unused variables

Fix gcc warnings:

arch/x86/boot/compressed/eboot.c: In function 'make_boot_params':
arch/x86/boot/compressed/eboot.c:394:6: warning: unused variable 'i' [-Wunused-variable]
  int i;
      ^
arch/x86/boot/compressed/eboot.c:393:6: warning: unused variable 's1' [-Wunused-variable]
  u8 *s1;
      ^
arch/x86/boot/compressed/eboot.c:392:7: warning: unused variable 's2' [-Wunused-variable]
  u16 *s2;
       ^
arch/x86/boot/compressed/eboot.c:387:8: warning: unused variable 'options' [-Wunused-variable]
  void *options, *handle;
        ^
arch/x86/boot/compressed/eboot.c: In function 'add_e820ext':
arch/x86/boot/compressed/eboot.c:498:16: warning: unused variable 'size' [-Wunused-variable]
  unsigned long size;
                ^
arch/x86/boot/compressed/eboot.c:497:15: warning: unused variable 'status' [-Wunused-variable]
  efi_status_t status;
               ^
arch/x86/boot/compressed/eboot.c: In function 'exit_boot_func':
arch/x86/boot/compressed/eboot.c:681:15: warning: unused variable 'status' [-Wunused-variable]
  efi_status_t status;
               ^
arch/x86/boot/compressed/eboot.c:680:8: warning: unused variable 'nr_desc' [-Wunused-variable]
  __u32 nr_desc;
        ^
arch/x86/boot/compressed/eboot.c: In function 'efi_main':
arch/x86/boot/compressed/eboot.c:750:22: warning: unused variable 'image' [-Wunused-variable]
  efi_loaded_image_t *image;
                      ^

Signed-off-by: Zhenzhong Duan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

pnfs: Fix a problem where we gratuitously start doing I/O through the MDS

If the client has to stop in pnfs_update_layout() to wait for another
layoutget to complete, it currently exits and defaults to I/O through
the MDS if the layoutget was successful.

Fixes: d03360aaf5cc ("pNFS: Ensure we return the error if someone kills...")
Signed-off-by: Trond Myklebust <[email protected]>
Cc: [email protected] # v4.20+

Merge tag 'riscv/for-v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Paul Walmsley:

- Hugepage support

- "Image" header support for RISC-V kernel binaries, compatible with
   the current ARM64 "Image" header

- Initial page table setup now split into two stages

- CONFIG_SOC support (starting with SiFive SoCs)

- Avoid reserving memory between RAM start and the kernel in
   setup_bootmem()

- Enable high-res timers and dynamic tick in the RV64 defconfig

- Remove long-deprecated gate area stubs

- MAINTAINERS updates to switch to the newly-created shared RISC-V git
   tree, and to fix a get_maintainers.pl issue for patches involving
   SiFive E-mail addresses

Also, one integration fix to resolve a build problem introduced during
in the v5.3-rc1 merge window:

- Fix build break after macro-to-function conversion in
   asm-generic/cacheflush.h

* tag 'riscv/for-v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: fix build break after macro-to-function conversion in generic cacheflush.h
  RISC-V: Add an Image header that boot loader can parse.
  RISC-V: Setup initial page tables in two stages
  riscv: remove free_initrd_mem
  riscv: ccache: Remove unused variable
  riscv: Introduce huge page support for 32/64bit kernel
  x86, arm64: Move ARCH_WANT_HUGE_PMD_SHARE config in arch/Kconfig
  RISC-V: Fix memory reservation in setup_bootmem()
  riscv: defconfig: enable SOC_SIFIVE
  riscv: select SiFive platform drivers with SOC_SIFIVE
  arch: riscv: add config option for building SiFive's SoC resource
  riscv: Remove gate area stubs
  MAINTAINERS: change the arch/riscv git tree to the new shared tree
  MAINTAINERS: don't automatically patches involving SiFive to the linux-riscv list
  RISC-V: defconfig: Enable NO_HZ_IDLE and HIGH_RES_TIMERS

Merge branch 'parisc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:

- Prevent kernel panics by adding proper checking of register values
   injected via the ptrace interface

- Wire up the new clone3 syscall

* 'parisc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Wire up clone3 syscall
  parisc: Avoid kernel panic triggered by invalid kprobe
  parisc: Ensure userspace privilege for ptraced processes in regset functions
  parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1

drm/amd/display: init res_pool dccg_ref, dchub_ref with xtalin_freq

[WHY] dc sw clock implementation of navi10 and raven are not exact the
same. dcccg, dchub reference clock initialization is done after dc calls
vbios dispcontroller_init table. for raven family, before
dispcontroller_init is called by dc, the ref clk values are referred
by sw clock implementation and program asic register using wrong
values. this causes dchub pstate error. This need provide valid ref
clk values. for navi10, since dispcontroller_init is not called,
dchubbub_global_timer_enable = 0, hubbub2_get_dchub_ref_freq will
hit aeert. this need remove hubbub2_get_dchub_ref_freq from this
location and move to dcn20_init_hw.

[HOW] for all asic, initialize dccg, dchub ref clk with data from
vbios firmware table by default. for raven asic family, use these data
from vbios, for asic which support sw dccg component, like navi10,
read ref clk by sw dccg functions and update the ref clk.

Signed-off-by: hersen wu <[email protected]>
Reviewed-by: Jun Lei <[email protected]>
Acked-by: Leo Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/pm: remove check for pp funcs in freq sysfs handlers

The dpm sensor function already does this for us. This fixes
the freq*_input files with the new SMU implementation.

Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Force uclk to max for every state

Workaround for now to avoid underflow.

The uclk switch time should really be bumped up to 404, but doing so
would expose p-state hang issues for higher bandwidth display
configurations.

Signed-off-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Leo Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

net/mlx5: Replace kfree with kvfree

Variable allocated by kvmalloc should not be freed by kfree.
Because it may be allocated by vmalloc.
So replace kfree with kvfree here.

Fixes: 9b1f298236057 ("net/mlx5: Add support for FW fatal reporter dump")
Signed-off-by: Chuhong Yuan <[email protected]>
Acked-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'modules-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
"Summary of modules changes for the 5.3 merge window:

   - Code fixes and cleanups

   - Fix bug where set_memory_x() wasn't being called when rodata=n

   - Fix bug where -EEXIST was being returned for going modules

   - Allow arches to override module_exit_section()"

* tag 'modules-for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  modules: fix compile error if don't have strict module rwx
  ARM: module: recognize unwind exit sections
  module: allow arch overrides for .exit section names
  modules: fix BUG when load module with rodata=n
  kernel/module: Fix mem leak in module_add_modinfo_attrs
  kernel: module: Use struct_size() helper
  kernel/module.c: Only return -EEXIST for modules that have finished loading

MAINTAINERS: update netsec driver

Add myself to maintainers since i provided the XDP and page_pool
implementation

Signed-off-by: Ilias Apalodimas <[email protected]>
Acked-by: Jassi Brar <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cifs: copy_file_range needs to strip setuid bits and update timestamps

cifs has both source and destination inodes locked throughout the copy.
Like ->write_iter(), we update mtime and strip setuid bits of destination
file before copy and like ->read_iter(), we update atime of source file
after copy.

Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Steve French <[email protected]>

ipv6: Unlink sibling route in case of failure

When a route needs to be appended to an existing multipath route,
fib6_add_rt2node() first appends it to the siblings list and increments
the number of sibling routes on each sibling.

Later, the function notifies the route via call_fib6_entry_notifiers().
In case the notification is vetoed, the route is not unlinked from the
siblings list, which can result in a use-after-free.

Fix this by unlinking the route from the siblings list before returning
an error.

Audited the rest of the call sites from which the FIB notification chain
is called and could not find more problems.

Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-by: Alexander Petrovskiy <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

objtool: Support conditional retpolines

A Clang-built kernel is showing the following warning:

  arch/x86/kernel/platform-quirks.o: warning: objtool: x86_early_init_platform_quirks()+0x84: unreachable instruction

That corresponds to this code:

  7e:   0f 85 00 00 00 00       jne    84 <x86_early_init_platform_quirks+0x84>
                        80: R_X86_64_PC32       __x86_indirect_thunk_r11-0x4
  84:   c3                      retq

This is a conditional retpoline sibling call, which is now possible
thanks to retpolines.  Objtool hasn't seen that before.  It's
incorrectly interpreting the conditional jump as an unconditional
dynamic jump.

Reported-by: Nick Desaulniers <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/30d4c758b267ef487fb97e6ecb2f148ad007b554.1563413318.git.jpoimboe@redhat.com

objtool: Convert insn type to enum

This makes it easier to add new instruction types. Also it's hopefully
more robust since the compiler should warn about out-of-range enums.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/0740e96af0d40e54cfd6a07bf09db0fbd10793cd.1563413318.git.jpoimboe@redhat.com

objtool: Fix seg fault on bad switch table entry

In one rare case, Clang generated the following code:

5ca:       83 e0 21                and    $0x21,%eax
5cd:       b9 04 00 00 00          mov    $0x4,%ecx
5d2:       ff 24 c5 00 00 00 00    jmpq   *0x0(,%rax,8)
                    5d5: R_X86_64_32S       .rodata+0x38

which uses the corresponding jump table relocations:

  000000000038  000200000001 R_X86_64_64       0000000000000000 .text + 834
  000000000040  000200000001 R_X86_64_64       0000000000000000 .text + 5d9
  000000000048  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000050  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000058  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000060  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000068  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000070  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000078  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000080  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000088  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000090  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000098  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000a0  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000a8  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000b0  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000b8  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000c0  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000c8  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000d0  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000d8  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000e0  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000e8  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000f0  000200000001 R_X86_64_64       0000000000000000 .text + b96
  0000000000f8  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000100  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000108  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000110  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000118  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000120  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000128  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000130  000200000001 R_X86_64_64       0000000000000000 .text + b96
  000000000138  000200000001 R_X86_64_64       0000000000000000 .text + 82f
  000000000140  000200000001 R_X86_64_64       0000000000000000 .text + 828

Since %eax was masked with 0x21, only the first two and the last two
entries are possible.

Objtool doesn't actually emulate all the code, so it isn't smart enough
to know that all the middle entries aren't reachable.  They point to the
NOP padding area after the end of the function, so objtool seg faulted
when it tried to dereference a NULL insn->func.

After this fix, objtool still gives an "unreachable" error because it
stops reading the jump table when it encounters the bad addresses:

  /home/jpoimboe/objtool-tests/adm1275.o: warning: objtool: adm1275_probe()+0x828: unreachable instruction

While the above code is technically correct, it's very wasteful of
memory -- it uses 34 jump table entries when only 4 are needed.  It's
also not possible for objtool to validate this type of switch table
because the unused entries point outside the function and objtool has no
way of determining if that's intentional.  Hopefully the Clang folks can
fix it.

Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/a9db88eec4f1ca089e040989846961748238b6d8.1563413318.git.jpoimboe@redhat.com

objtool: Support repeated uses of the same C jump table

This fixes objtool for both a GCC issue and a Clang issue:

1) GCC issue:

   kernel/bpf/core.o: warning: objtool: ___bpf_prog_run()+0x8d5: sibling call from callable instruction with modified stack frame

   With CONFIG_RETPOLINE=n, GCC is doing the following optimization in
   ___bpf_prog_run().

   Before:

           select_insn:
                   jmp *jumptable(,%rax,8)
                   ...
           ALU64_ADD_X:
                   ...
                   jmp select_insn
           ALU_ADD_X:
                   ...
                   jmp select_insn

   After:

           select_insn:
                   jmp *jumptable(, %rax, 8)
                   ...
           ALU64_ADD_X:
                   ...
                   jmp *jumptable(, %rax, 8)
           ALU_ADD_X:
                   ...
                   jmp *jumptable(, %rax, 8)

   This confuses objtool.  It has never seen multiple indirect jump
   sites which use the same jump table.

   For GCC switch tables, the only way of detecting the size of a table
   is by continuing to scan for more tables.  The size of the previous
   table can only be determined after another switch table is found, or
   when the scan reaches the end of the function.

   That logic was reused for C jump tables, and was based on the
   assumption that each jump table only has a single jump site.  The
   above optimization breaks that assumption.

2) Clang issue:

   drivers/usb/misc/sisusbvga/sisusb.o: warning: objtool: sisusb_write_mem_bulk()+0x588: can't find switch jump table

   With clang 9, code can be generated where a function contains two
   indirect jump instructions which use the same switch table.

The fix is the same for both issues: split the jump table parsing into
two passes.

In the first pass, locate the heads of all switch tables for the
function and mark their locations.

In the second pass, parse the switch tables and add them.

Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
Reported-by: Randy Dunlap <[email protected]>
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/e995befaada9d4d8b2cf788ff3f566ba900d2b4d.1563413318.git.jpoimboe@redhat.com
Co-developed-by: Josh Poimboeuf <[email protected]>

objtool: Refactor jump table code

Now that C jump tables are supported, call them "jump tables" instead of
"switch tables". Also rename some other variables, add comments, and
simplify the code flow a bit.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/cf951b0c0641628e0b9b81f7ceccd9bcabcb4bd8.1563413318.git.jpoimboe@redhat.com

objtool: Refactor sibling call detection logic

Simplify the sibling call detection logic a bit.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/8357dbef9e7f5512e76bf83a76c81722fc09eb5e.1563413318.git.jpoimboe@redhat.com

objtool: Do frame pointer check before dead end check

Even calls to __noreturn functions need the frame pointer setup first.
Such functions often dump the stack.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/aed62fbd60e239280218be623f751a433658e896.1563413318.git.jpoimboe@redhat.com

objtool: Change dead_end_function() to return boolean

dead_end_function() can no longer return an error. Simplify its
interface by making it return boolean.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/9e6679610768fb6e6c51dca23f7d4d0c03b0c910.1563413318.git.jpoimboe@redhat.com

objtool: Warn on zero-length functions

All callable functions should have an ELF size.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/03d429c4fa87829c61c5dc0e89652f4d9efb62f1.1563413318.git.jpoimboe@redhat.com

objtool: Refactor function alias logic

- Add an alias check in validate_functions().  With this change, aliases
  no longer need uaccess_safe set.

- Add an alias check in decode_instructions().  With this change, the
  "if (!insn->func)" check is no longer needed.

- Don't create aliases for zero-length functions, as it can have
  unexpected results.  The next patch will spit out a warning for
  zero-length functions anyway.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/26a99c31426540f19c9a58b9e10727c385a147bc.1563413318.git.jpoimboe@redhat.com

objtool: Track original function across branches

If 'insn->func' is NULL, objtool skips some important checks, including
sibling call validation. So if some .fixup code does an invalid sibling
call, objtool ignores it.

Treat all code branches (including alts) as part of the original
function by keeping track of the original func value from
validate_functions().

This improves the usefulness of some clang function fallthrough
warnings, and exposes some additional kernel bugs in the process.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/505df630f33c9717e1ccde6e4b64c5303135c25f.1563413318.git.jpoimboe@redhat.com

objtool: Add mcsafe_handle_tail() to the uaccess safe list

After an objtool improvement, it's reporting that __memcpy_mcsafe() is
calling mcsafe_handle_tail() with AC=1:

  arch/x86/lib/memcpy_64.o: warning: objtool: .fixup+0x13: call to mcsafe_handle_tail() with UACCESS enabled
  arch/x86/lib/memcpy_64.o: warning: objtool:   __memcpy_mcsafe()+0x34: (alt)
  arch/x86/lib/memcpy_64.o: warning: objtool:   __memcpy_mcsafe()+0xb: (branch)
  arch/x86/lib/memcpy_64.o: warning: objtool:   __memcpy_mcsafe()+0x0: <=== (func)

mcsafe_handle_tail() is basically an extension of __memcpy_mcsafe(), so
AC=1 is supposed to be set.  Add mcsafe_handle_tail() to the uaccess
safe list.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/035c38f7eac845281d3c3d36749144982e06e58c.1563413318.git.jpoimboe@redhat.com

bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()

On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
elimination" optimization results in ___bpf_prog_run()'s jumptable code
changing from this:

select_insn:
jmp *jumptable(, %rax, 8)
...
ALU64_ADD_X:
...
jmp *jumptable(, %rax, 8)
ALU_ADD_X:
...
jmp *jumptable(, %rax, 8)

to this:

select_insn:
mov jumptable, %r12
jmp *(%r12, %rax, 8)
...
ALU64_ADD_X:
...
jmp *(%r12, %rax, 8)
ALU_ADD_X:
...
jmp *(%r12, %rax, 8)

The jumptable address is placed in a register once, at the beginning of
the function.  The function execution can then go through multiple
indirect jumps which rely on that same register value.  This has a few
issues:

1) Objtool isn't smart enough to be able to track such a register value
   across multiple recursive indirect jumps through the jump table.

2) With CONFIG_RETPOLINE enabled, this optimization actually results in
   a small slowdown.  I measured a ~4.7% slowdown in the test_bpf
   "tcpdump port 22" selftest.

   This slowdown is actually predicted by the GCC manual:

     Note: When compiling a program using computed gotos, a GCC
     extension, you may get better run-time performance if you
     disable the global common subexpression elimination pass by
     adding -fno-gcse to the command line.

So just disable the optimization for this function.

Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/30c3ca29ba037afcbd860a8672eef0021addf9fe.1563413318.git.jpoimboe@redhat.com

x86/uaccess: Remove redundant CLACs in getuser/putuser error paths

The same getuser/putuser error paths are used regardless of whether AC
is set.  In non-exception failure cases, this results in an unnecessary
CLAC.

Fixes the following warnings:

  arch/x86/lib/getuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable
  arch/x86/lib/putuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/bc14ded2755ae75bd9010c446079e113dbddb74b.1563413318.git.jpoimboe@redhat.com

x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()

After adding mcsafe_handle_tail() to the objtool uaccess safe list,
objtool reports:

arch/x86/lib/usercopy_64.o: warning: objtool: mcsafe_handle_tail()+0x0: call to __fentry__() with UACCESS enabled

With SMAP, this function is called with AC=1, so it needs to be careful
about which functions it calls. Disable the ftrace entry hook, which
can potentially pull in a lot of extra code.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/8e13d6f0da1c8a3f7603903da6cbf6d582bbfe10.1563413318.git.jpoimboe@redhat.com

x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()

After an objtool improvement, it's complaining about the CLAC in
copy_user_handle_tail():

  arch/x86/lib/copy_user_64.o: warning: objtool: .altinstr_replacement+0x12: redundant UACCESS disable
  arch/x86/lib/copy_user_64.o: warning: objtool:   copy_user_handle_tail()+0x6: (alt)
  arch/x86/lib/copy_user_64.o: warning: objtool:   copy_user_handle_tail()+0x2: (alt)
  arch/x86/lib/copy_user_64.o: warning: objtool:   copy_user_handle_tail()+0x0: <=== (func)

copy_user_handle_tail() is incorrectly marked as a callable function, so
objtool is rightfully concerned about the CLAC with no corresponding
STAC.

Remove the ELF function annotation.  The copy_user_handle_tail() code
path is already verified by objtool because it's jumped to by other
callable asm code (which does the corresponding STAC).

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/6b6e436774678b4b9873811ff023bd29935bee5b.1563413318.git.jpoimboe@redhat.com

x86/head/64: Annotate start_cpu0() as non-callable

After an objtool improvement, it complains about the fact that
start_cpu0() jumps to code which has an LRET instruction.

arch/x86/kernel/head_64.o: warning: objtool: .head.text+0xe4: unsupported instruction in callable function

Technically, start_cpu0() is callable, but it acts nothing like a
callable function. Prevent objtool from treating it like one by
removing its ELF function annotation.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/6b1b4505fcb90571a55fa1b52d71fb458ca24454.1563413318.git.jpoimboe@redhat.com

x86/entry: Fix thunk function ELF sizes

Fix the following warnings:

  arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_on_thunk() is missing an ELF size annotation
  arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_off_thunk() is missing an ELF size annotation
  arch/x86/entry/thunk_64.o: warning: objtool: lockdep_sys_exit_thunk() is missing an ELF size annotation

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/89c97adc9f6cc44a0f5d03cde6d0357662938909.1563413318.git.jpoimboe@redhat.com

x86/kvm: Don't call kvm_spurious_fault() from .fixup

After making a change to improve objtool's sibling call detection, it
started showing the following warning:

  arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame

The problem is the ____kvm_handle_fault_on_reboot() macro.  It does a
fake call by pushing a fake RIP and doing a jump.  That tricks the
unwinder into printing the function which triggered the exception,
rather than the .fixup code.

Instead of the hack to make it look like the original function made the
call, just change the macro so that the original function actually does
make the call.  This allows removal of the hack, and also makes objtool
happy.

I triggered a vmx instruction exception and verified that the stack
trace is still sane:

  kernel BUG at arch/x86/kvm/x86.c:358!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16
  Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017
  RIP: 0010:kvm_spurious_fault+0x5/0x10
  Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
  RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246
  RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000
  RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0
  RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000
  R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0
  R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000
  FS:  00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   loaded_vmcs_init+0x4f/0xe0
   alloc_loaded_vmcs+0x38/0xd0
   vmx_create_vcpu+0xf7/0x600
   kvm_vm_ioctl+0x5e9/0x980
   ? __switch_to_asm+0x40/0x70
   ? __switch_to_asm+0x34/0x70
   ? __switch_to_asm+0x40/0x70
   ? __switch_to_asm+0x34/0x70
   ? free_one_page+0x13f/0x4e0
   do_vfs_ioctl+0xa4/0x630
   ksys_ioctl+0x60/0x90
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x55/0x1c0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7fa349b1ee5b

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.com

x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2

Objtool reports the following:

arch/x86/kvm/vmx/vmenter.o: warning: objtool: vmx_vmenter()+0x14: call without frame pointer save/setup

But frame pointers are necessarily broken anyway, because
__vmx_vcpu_run() clobbers RBP with the guest's value before calling
vmx_vmenter(). So calling without a frame pointer doesn't make things
any worse.

Make objtool happy by changing the call to a UD2.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Link: https://lkml.kernel.org/r/9fc2216c9dc972f95bb65ce2966a682c6bda1cb0.1563413318.git.jpoimboe@redhat.com

x86/kvm: Fix fastop function ELF metadata

Some of the fastop functions, e.g. em_setcc(), are actually just used as
global labels which point to blocks of functions.  The global labels are
incorrectly annotated as functions.  Also the functions themselves don't
have size annotations.

Fixes a bunch of warnings like the following:

  arch/x86/kvm/emulate.o: warning: objtool: seto() is missing an ELF size annotation
  arch/x86/kvm/emulate.o: warning: objtool: em_setcc() is missing an ELF size annotation
  arch/x86/kvm/emulate.o: warning: objtool: setno() is missing an ELF size annotation
  arch/x86/kvm/emulate.o: warning: objtool: setc() is missing an ELF size annotation

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/c8cc9be60ebbceb3092aa5dd91916039a1f88275.1563413318.git.jpoimboe@redhat.com

x86/paravirt: Fix callee-saved function ELF sizes

The __raw_callee_save_*() functions have an ELF symbol size of zero,
which confuses objtool and other tools.

Fixes a bunch of warnings like the following:

  arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation
  arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation
  arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation
  arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.com

Merge tag 'wireless-drivers-for-davem-2019-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 5.3

First set of fixes for 5.3.

iwlwifi

* add new cards for 9000 and 20000 series and qu c-step devices

ath10k

* workaround an uninitialised variable warning

rt2x00

* fix rx queue hand on USB
====================

Signed-off-by: David S. Miller <[email protected]>

liquidio: Replace vmalloc + memset with vzalloc

Use vzalloc and vzalloc_node instead of using vmalloc and
vmalloc_node and then zeroing the allocated memory by
memset 0.
This simplifies the code.

Signed-off-by: Chuhong Yuan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

CIFS: fix deadlock in cached root handling

Prevent deadlock between open_shroot() and
cifs_mark_open_files_invalid() by releasing the lock before entering
SMB2_open, taking it again after and checking if we still need to use
the result.

Link: https://lore.kernel.org/linux-cifs/[email protected]/T/#u
Fixes: 3d4ef9a15343 ("smb3: fix redundant opens on root")
Signed-off-by: Aurelien Aptel <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
Signed-off-by: Steve French <[email protected]>
CC: Stable <[email protected]>

Merge tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
"The main changes in this release include:

   - Add user space specific memory reading for kprobes

   - Allow kprobes to be executed earlier in boot

  The rest are mostly just various clean ups and small fixes"

* tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  tracing: Make trace_get_fields() global
  tracing: Let filter_assign_type() detect FILTER_PTR_STRING
  tracing: Pass type into tracing_generic_entry_update()
  ftrace/selftest: Test if set_event/ftrace_pid exists before writing
  ftrace/selftests: Return the skip code when tracing directory not configured in kernel
  tracing/kprobe: Check registered state using kprobe
  tracing/probe: Add trace_event_call accesses APIs
  tracing/probe: Add probe event name and group name accesses APIs
  tracing/probe: Add trace flag access APIs for trace_probe
  tracing/probe: Add trace_event_file access APIs for trace_probe
  tracing/probe: Add trace_event_call register API for trace_probe
  tracing/probe: Add trace_probe init and free functions
  tracing/uprobe: Set print format when parsing command
  tracing/kprobe: Set print format right after parsed command
  kprobes: Fix to init kprobes in subsys_initcall
  tracepoint: Use struct_size() in kmalloc()
  ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS
  ftrace: Enable trampoline when rec count returns back to one
  tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline
  tracing: Make a separate config for trace event self tests
  ...

Merge branch 'x86/debug' into core/urgent

Pick up the two pending objtool patches as the next round of objtool fixes
depend on them.

udp: Fix typo in net/ipv4/udp.c

Signed-off-by: Su Yanjun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-linus-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb

Pull swiotlb updates from Konrad Rzeszutek Wilk:
"One compiler fix, and a bug-fix in swiotlb_nr_tbl() and
  swiotlb_max_segment() to check also for no_iotlb_memory"

* 'for-linus-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: fix phys_addr_t overflow warning
  swiotlb: Return consistent SWIOTLB segments/nr_tbl
  swiotlb: Group identical cleanup in swiotlb_cleanup()

net: bcmgenet: use promisc for unsupported filters

Currently we silently ignore filters if we cannot meet the filter
requirements. This will lead to the MAC dropping packets that are
expected to pass. A better solution would be to set the NIC to promisc
mode when the required filters cannot be met.

Also correct the number of MDF filters supported. It should be 17,
not 16.

Signed-off-by: Justin Chen <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

SUNRPC: Optimise transport balancing code

Moves the balancing code to avoid doing cursor changes on every search
iteration.

Signed-off-by: Trond Myklebust <[email protected]>

SUNRPC: Ensure the bvecs are reset when we re-encode the RPC request

The bvec tracks the list of pages, so if the number of pages changes
due to a re-encode, we need to reset the bvec as well.

Fixes: 277e4ab7d530 ("SUNRPC: Simplify TCP receive code by switching...")
Signed-off-by: Trond Myklebust <[email protected]>
Cc: [email protected] # v4.20+

pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error

mirror->mirror_ds can be NULL if uninitialised, but can contain
a PTR_ERR() if call to GETDEVICEINFO failed.

Fixes: 65990d1afbd2 ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
Signed-off-by: Trond Myklebust <[email protected]>
Cc: [email protected] # 4.10+

NFSv4: Don't use the zero stateid with layoutget

The NFSv4.1 protocol explicitly forbids us from using the zero stateid
together with layoutget, so when we see that nfs4_select_rw_stateid()
is unable to return a valid delegation, lock or open stateid, then
we should initiate recovery and retry.

Signed-off-by: Trond Myklebust <[email protected]>

Merge tag 'xfs-5.3-merge-13' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs cleanups from Darrick Wong:
"We had a few more lateish cleanup patches come in for 5.3 -- a couple
  of syncups with the userspace libxfs code and a conversion of the XFS
  administrator's guide to ReST format.

  Summary:

   - Bring fs/xfs/libxfs/xfs_trans_inode.c in sync with userspace
     libxfs.

   - Convert the xfs administrator guide to rst and move it into the
     official admin guide under Documentation"

* tag 'xfs-5.3-merge-13' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  Documentation: filesystem: Convert xfs.txt to ReST
  xfs: sync up xfs_trans_inode with userspace
  xfs: move xfs_trans_inode.c to libxfs/

Merge tag '4.3-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs updates from Steve French:
"Fixes (three for stable) and improvements including much faster
  encryption (SMB3.1.1 GCM)"

* tag '4.3-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (27 commits)
  smb3: smbdirect no longer experimental
  cifs: fix crash in smb2_compound_op()/smb2_set_next_command()
  cifs: fix crash in cifs_dfs_do_automount
  cifs: fix parsing of symbolic link error response
  cifs: refactor and clean up arguments in the reparse point parsing
  SMB3: query inode number on open via create context
  smb3: Send netname context during negotiate protocol
  smb3: do not send compression info by default
  smb3: add new mount option to retrieve mode from special ACE
  smb3: Allow query of symlinks stored as reparse points
  cifs: Fix a race condition with cifs_echo_request
  cifs: always add credits back for unsolicited PDUs
  fs: cifs: cifsssmb: Change return type of convert_ace_to_cifs_ace
  add some missing definitions
  cifs: fix typo in debug message with struct field ia_valid
  smb3: minor cleanup of compound_send_recv
  CIFS: Fix module dependency
  cifs: simplify code by removing CONFIG_CIFS_ACL ifdef
  cifs: Fix check for matching with existing mount
  cifs: Properly handle auto disabling of serverino option
  ...

Merge tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
"Lots of exciting things this time!

   - support for rbd object-map and fast-diff features (myself). This
     will speed up reads, discards and things like snap diffs on sparse
     images.

   - ceph.snap.btime vxattr to expose snapshot creation time (David
     Disseldorp). This will be used to integrate with "Restore Previous
     Versions" feature added in Windows 7 for folks who reexport ceph
     through SMB.

   - security xattrs for ceph (Zheng Yan). Only selinux is supported for
     now due to the limitations of ->dentry_init_security().

   - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff
     Layton). This is actually a single feature bit which was missing
     because of the filesystem pieces. With this in, the kernel client
     will finally be reported as "luminous" by "ceph features" -- it is
     still being reported as "jewel" even though all required Luminous
     features were implemented in 4.13.

   - stop NULL-terminating ceph vxattrs (Jeff Layton). The convention
     with xattrs is to not terminate and this was causing
     inconsistencies with ceph-fuse.

   - change filesystem time granularity from 1 us to 1 ns, again fixing
     an inconsistency with ceph-fuse (Luis Henriques).

  On top of this there are some additional dentry name handling and cap
  flushing fixes from Zheng. Finally, Jeff is formally taking over for
  Zheng as the filesystem maintainer"

* tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits)
  ceph: fix end offset in truncate_inode_pages_range call
  ceph: use generic_delete_inode() for ->drop_inode
  ceph: use ceph_evict_inode to cleanup inode's resource
  ceph: initialize superblock s_time_gran to 1
  MAINTAINERS: take over for Zheng as CephFS kernel client maintainer
  rbd: setallochint only if object doesn't exist
  rbd: support for object-map and fast-diff
  rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
  libceph: export osd_req_op_data() macro
  libceph: change ceph_osdc_call() to take page vector for response
  libceph: bump CEPH_MSG_MAX_DATA_LEN (again)
  rbd: new exclusive lock wait/wake code
  rbd: quiescing lock should wait for image requests
  rbd: lock should be quiesced on reacquire
  rbd: introduce copyup state machine
  rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
  rbd: move OSD request allocation into object request state machines
  rbd: factor out __rbd_osd_setup_discard_ops()
  rbd: factor out rbd_osd_setup_copyup()
  rbd: introduce obj_req->osd_reqs list
  ...

Merge tag 'dax-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull dax updates from Dan Williams:
"The fruits of a bug hunt in the fsdax implementation with Willy and a
  small feature update for device-dax:

   - Fix a hang condition that started triggering after the Xarray
     conversion of fsdax in the v4.20 kernel.

   - Add a 'resource' (root-only physical base address) sysfs attribute
     to device-dax instances to correlate memory-blocks onlined via the
     kmem driver with a given device instance"

* tag 'dax-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix missed wakeup with PMD faults
  device-dax: Add a 'resource' attribute

Merge tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
"Primarily just the virtio_pmem driver:

   - virtio_pmem

     The new virtio_pmem facility introduces a paravirtualized
     persistent memory device that allows a guest VM to use DAX
     mechanisms to access a host-file with host-page-cache. It arranges
     for MAP_SYNC to be disabled and instead triggers a host fsync()
     when a 'write-cache flush' command is sent to the virtual disk
     device.

   - Miscellaneous small fixups"

* tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  virtio_pmem: fix sparse warning
  xfs: disable map_sync for async flush
  ext4: disable map_sync for async flush
  dax: check synchronous mapping is supported
  dm: enable synchronous dax
  libnvdimm: add dax_dev sync flag
  virtio-pmem: Add virtio pmem driver
  libnvdimm: nd_region flush callback support
  libnvdimm, namespace: Drop uuid_t implementation detail

Merge tag 'linux-watchdog-5.3-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

- add Allwinner H6 watchdog

- drop warning after registering device patches

- hpwdt improvements

- gpio: add support for nowayout option

- introduce CONFIG_WATCHDOG_OPEN_TIMEOUT

- convert remaining drivers to use SPDX license identifier

- Fixes and improvements on several watchdog device drivers

* tag 'linux-watchdog-5.3-rc1' of git://www.linux-watchdog.org/linux-watchdog: (74 commits)
  watchdog: digicolor_wdt: Remove unused variable in dc_wdt_probe
  watchdog: ie6xx_wdt: Use spinlock_t instead of struct spinlock
  watchdog: atmel: atmel-sama5d4-wdt: Disable watchdog on system suspend
  watchdog: convert remaining drivers to use SPDX license identifier
  dt-bindings: watchdog: Rename bindings documentation file
  watchdog: mei_wdt: no need to check return value of debugfs_create functions
  watchdog: bcm_kona_wdt: no need to check return value of debugfs_create functions
  docs: watchdog: Fix build error.
  docs: watchdog: convert docs to ReST and rename to *.rst
  watchdog: make the device time out at open_deadline when open_timeout is used
  watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT
  watchdog: introduce watchdog.open_timeout commandline parameter
  dt-bindings: watchdog: move i.MX system controller watchdog binding to SCU
  watchdog: imx_sc: Add pretimeout support
  watchdog: renesas_wdt: Add a few cycles delay
  watchdog: gpio: add support for nowayout option
  watchdog: renesas_wdt: Use 'dev' instead of dereferencing it repeatedly
  dt-bindings: watchdog: add Allwinner H6 watchdog
  watchdog: jz4740: Avoid starting watchdog in set_timeout
  watchdog: jz4740: Use register names from <linux/mfd/ingenic-tcu.h>
  ...

Merge tag 'sound-fix-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes.

   - The optimization of PM resume with HD-audio HDMI codecs, which
     eventually work around weird issues

   - A correction of Intel Icelake HDMI audio code

   - Quirks for Dell machines with Realtek HD-audio codecs

   - The fix for too long sequencer write stall that was spotted by
     syzkaller

   - A few trivial cleanups reported by coccinelle"

* tag 'sound-fix-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Don't resume forcibly i915 HDMI/DP codec
  ALSA: hda/hdmi - Fix i915 reverse port/pin mapping
  ALSA: hda/hdmi - Remove duplicated define
  ALSA: seq: Break too long mutex context in the write loop
  ALSA: hda/realtek: apply ALC891 headset fixup to one Dell machine
  ALSA: rme9652: Unneeded variable: "result".
  ALSA: emu10k1: Remove unneeded variable "change"
  ALSA: au88x0: Remove unneeded variable: "changed"
  ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell platform
  ALSA: ps3: Remove Unneeded variable: "ret"
  ALSA: lx6464es: Remove unneeded variable err

Merge tag 'pm-5.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
"These modify the Intel RAPL driver to allow it to use an MMIO
  interface to the hardware, make the int340X thermal driver provide
  such an interface for it, add Intel Ice Lake CPU IDs to the RAPL
  driver (these changes depend on the previously merged x86 arch
  changes), update cpufreq to use the PM QoS framework for managing the
  min and max frequency limits, and add update the imx-cpufreq-dt
  cpufreq driver to support i.MX8MN.

  Specifics:

   - Add MMIO interface support to the Intel RAPL power capping driver
     and update the int340X thermal driver to provide a RAPL MMIO
     interface (Zhang Rui, Stephen Rothwell).

   - Add Intel Ice Lake CPU IDs to the RAPL driver (Zhang Rui, Rajneesh
     Bhardwaj).

   - Make cpufreq use the PM QoS framework (instead of notifiers) for
     managing the min and max frequency constraints (Viresh Kumar).

   - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
     Huang)"

* tag 'pm-5.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits)
  cpufreq: Make cpufreq_generic_init() return void
  intel_rapl: need linux/cpuhotplug.h for enum cpuhp_state
  powercap/rapl: Add Ice Lake NNPI support to RAPL driver
  powercap/intel_rapl: add support for ICX-D
  powercap/intel_rapl: add support for ICX
  powercap/intel_rapl: add support for IceLake desktop
  intel_rapl: Fix module autoloading issue
  int340X/processor_thermal_device: add support for MMIO RAPL
  intel_rapl: support two power limits for every RAPL domain
  intel_rapl: support 64 bit register
  intel_rapl: abstract RAPL common code
  intel_rapl: cleanup hardcoded MSR access
  intel_rapl: cleanup some functions
  intel_rapl: abstract register access operations
  intel_rapl: abstract register address
  intel_rapl: introduce struct rapl_if_private
  intel_rapl: introduce intel_rapl.h
  intel_rapl: remove hardcoded register index
  intel_rapl: use reg instead of msr
  cpufreq: imx-cpufreq-dt: Add i.MX8MN support
  ...

Merge tag 'acpi-5.3-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
"These get rid of two clang warnings, add a new quirk mechanism to the
  ACPI backlight driver (and apply it to one machine) and update the
  table load object initialization in ACPICA (this is a replacement for
  a previously reverted ACPICA commit).

  Specifics:

   - Make ACPI table loading work more consistently regardless of the
     exact mechanism used for loading a table (Erik Schmauss).

   - Get rid of two clang warnings (Arnd Bergmann).

   - Add new quirk mechanism to the ACPI backlight driver and use it to
     add a quirk for PB Easynote MZ35 (Hans de Goede)"

* tag 'acpi-5.3-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: video: Add new hw_changes_brightness quirk, set it on PB Easynote MZ35
  ACPI: fix false-positive -Wuninitialized warning
  ACPI: blacklist: fix clang warning for unused DMI table
  ACPICA: Update table load object initialization

Merge branch 'floppy'

Merge floppy ioctl verification fixes from Denis Efremov.

This also marks the floppy driver as orphaned - it turns out that Jiri
no longer has working hardware.

Actual working physical floppy hardware is getting hard to find, and
while Willy was able to test this, I think the driver can be considered
pretty much dead from an actual hardware standpoint.  The hardware that
is still sold seems to be mainly USB-based, which doesn't use this
legacy driver at all.

The old floppy disk controller is still emulated in various VM
environments, so the driver isn't going away, but let's see if anybody
is interested to step up to maintain it.

The lack of hardware also likely means that the ioctl range verification
fixes are probably mostly relevant to anybody using floppies in a
virtual environment.  Which is probably also going away in favor of USB
storage emulation, but who knows.

Will Decon reviewed the patches but I'm not rebasing them just for that,
so I'll add a

Reviewed-by: Will Deacon <[email protected]>
here instead.

* floppy:
  MAINTAINERS: mark floppy.c orphaned
  floppy: fix out-of-bounds read in copy_buffer
  floppy: fix invalid pointer dereference in drive_name
  floppy: fix out-of-bounds read in next_valid_format
  floppy: fix div-by-zero in setup_format_params

MAINTAINERS: mark floppy.c orphaned

I volunteered myself to maintain it quite some time ago back when I
fixed the concurrency issues which exhibited itself only with
VM-emulated devices, and at the same time I still had the physical 3.5"
reader to test all the changes.

The reader doesn't work any more though, so I guess it's time to step
down from this super-prestigious role :p and mark floppy.c as Orphaned.

Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

riscv: fix build break after macro-to-function conversion in generic cacheflush.h

Commit c296d4dc13ae ("asm-generic: fix a compilation warning")
converted the various flush_*cache_* macros in
asm-generic/cacheflush.h to static inline functions. This breaks
RISC-V builds, since RISC-V's cacheflush.h includes the generic
cacheflush.h and then undefines the macros to be overridden.

Fix by copying the subset of the no-op functions that are reused from
the generic cacheflush.h into the RISC-V cacheflush.h, and dropping
the include of the generic cacheflush.h.

Fixes: c296d4dc13ae ("asm-generic: fix a compilation warning")
Signed-off-by: Paul Walmsley <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>

stacktrace: Force USER_DS for stack_trace_save_user()

When walking userspace stacks, USER_DS needs to be set, otherwise
access_ok() will not function as expected.

Reported-by: Vegard Nossum <[email protected]>
Reported-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Vegard Nossum <[email protected]>
Reviewed-by: Joel Fernandes (Google) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1

Commit 7b9584fa1c0b ("staging: line6: Move altsetting to properties")
set a wrong altsetting for LINE6_PODHD500_1 during refactoring.

Set the correct altsetting number to fix the issue.

BugLink: https://bugs.launchpad.net/bugs/1790595
Fixes: 7b9584fa1c0b ("staging: line6: Move altsetting to properties")
Signed-off-by: Kai-Heng Feng <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda - Optimize resume for codecs without jack detection

The codecs without jack detection also don't have to be resumed
forcibly because, obviously, they have no jack. Skip the forced
resume in such a case as optimization as well.

Reviewed-by: Kai Vehmanen <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

Merge branches 'acpi-misc' and 'acpi-video'

* acpi-misc:
  ACPI: fix false-positive -Wuninitialized warning
  ACPI: blacklist: fix clang warning for unused DMI table

* acpi-video:
  ACPI: video: Add new hw_changes_brightness quirk, set it on PB Easynote MZ35

Merge branch 'pm-cpufreq'

* pm-cpufreq:
  cpufreq: Make cpufreq_generic_init() return void
  cpufreq: imx-cpufreq-dt: Add i.MX8MN support
  cpufreq: Add QoS requests for userspace constraints
  cpufreq: intel_pstate: Reuse refresh_frequency_limits()
  cpufreq: Register notifiers with the PM QoS framework
  PM / QoS: Add support for MIN/MAX frequency constraints
  PM / QOS: Pass request type to dev_pm_qos_read_value()
  PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value()
  PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier()

padata: use smp_mb in padata_reorder to avoid orphaned padata jobs

Testing padata with the tcrypt module on a 5.2 kernel...

    # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
    # modprobe tcrypt mode=211 sec=1

...produces this splat:

    INFO: task modprobe:10075 blocked for more than 120 seconds.
          Not tainted 5.2.0-base+ #16
    modprobe        D    0 10075  10064 0x80004080
    Call Trace:
     ? __schedule+0x4dd/0x610
     ? ring_buffer_unlock_commit+0x23/0x100
     schedule+0x6c/0x90
     schedule_timeout+0x3b/0x320
     ? trace_buffer_unlock_commit_regs+0x4f/0x1f0
     wait_for_common+0x160/0x1a0
     ? wake_up_q+0x80/0x80
     { crypto_wait_req }             # entries in braces added by hand
     { do_one_aead_op }
     { test_aead_jiffies }
     test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
     do_test+0x4053/0x6a2b [tcrypt]
     ? 0xffffffffa00f4000
     tcrypt_mod_init+0x50/0x1000 [tcrypt]
     ...

The second modprobe command never finishes because in padata_reorder,
CPU0's load of reorder_objects is executed before the unlocking store in
spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  LOAD reorder_objects  // 0
                                       INC reorder_objects  // 1
                                       padata_reorder
                                         TRYLOCK pd->lock   // failed
  UNLOCK pd->lock

CPU0 deletes the timer before returning from padata_reorder and since no
other job is submitted to padata, modprobe waits indefinitely.

Add a pair of full barriers to guarantee proper ordering:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  UNLOCK pd->lock
  smp_mb()
  LOAD reorder_objects
                                       INC reorder_objects
                                       smp_mb__after_atomic()
                                       padata_reorder
                                         TRYLOCK pd->lock

smp_mb__after_atomic is needed so the read part of the trylock operation
comes after the INC, as Andrea points out.   Thanks also to Andrea for
help with writing a litmus test.

Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface")
Signed-off-by: Daniel Jordan <[email protected]>
Cc: <[email protected]>
Cc: Andrea Parri <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Herbert Xu <[email protected]>

crypto: ccp - Fix SEV_VERSION_GREATER_OR_EQUAL

SEV_VERSION_GREATER_OR_EQUAL() will fail if upgrading from 2.2 to 3.1, for
example, because the minor version is not equal to or greater than the
major.

Fix this and move to a static inline function for appropriate type
checking.

Fixes: edd303ff0e9e ("crypto: ccp - Add DOWNLOAD_FIRMWARE SEV command")
Reported-by: Cfir Cohen <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Acked-by: Tom Lendacky <[email protected]>
Acked-by: Gary R Hook <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

crypto: ccp/gcm - use const time tag comparison.

Avoid leaking GCM tag through timing side channel.

Fixes: 36cf515b9bbe ("crypto: ccp - Enable support for AES GCM on v5 CCPs")
Cc: <[email protected]> # v4.12+
Signed-off-by: Cfir Cohen <[email protected]>
Acked-by: Gary R Hook <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

SUNRPC: Fix up backchannel slot table accounting

Add a per-transport maximum limit in the socket case, and add
helpers to allow the NFSv4 code to discover that limit.

Signed-off-by: Trond Myklebust <[email protected]>

SUNRPC: Fix initialisation of struct rpc_xprt_switch

Ensure that we do initialise the fields xps_nactive, xps_queuelen
and xps_net.

Signed-off-by: Trond Myklebust <[email protected]>

Merge tag 'drm-misc-next-fixes-2019-07-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

Pull request for drm-misc-fixes-next for v5.3:
- Revert properties exposed in komeda that need improvement before they become ABI.
- Only add modes from the cmdline if they are valid.
- Add orientation quirk for GPD MicroPC.
- Reduce stack usage in drm selftests.
- Fix bochs framebuffer setup.

Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

xen: let alloc_xenballooned_pages() fail if not enough memory free

Instead of trying to allocate pages with GFP_USER in
add_ballooned_pages() check the available free memory via
si_mem_available(). GFP_USER is far less limiting memory exhaustion
than the test via si_mem_available().

This will avoid dom0 running out of memory due to excessive foreign
page mappings especially on ARM and on x86 in PVH mode, as those don't
have a pre-ballooned area which can be used for foreign mappings.

As the normal ballooning suffers from the same problem don't balloon
down more than si_mem_available() pages in one iteration. At the same
time limit the default maximum number of retries.

This is part of XSA-300.

Signed-off-by: Juergen Gross <[email protected]>

objtool: Rename elf_open() to prevent conflict with libelf from elftoolchain

The elftoolchain version of libelf has a function named elf_open().

The function name isn't quite accurate anyway, since it also reads all
the ELF data. Rename it to elf_read(), which is more accurate.

[ jpoimboe: rename to elf_read(); write commit description ]

Signed-off-by: Michael Forney <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/7ce2d1b35665edf19fd0eb6fbc0b17b81a48e62f.1562793604.git.jpoimboe@redhat.com

objtool: Use Elf_Scn typedef instead of assuming struct name

The libelf implementation might use a different struct name, and the
Elf_Scn typedef is already used throughout the rest of objtool.

Signed-off-by: Michael Forney <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/d270e1be2835fc2a10acf67535ff2ebd2145bf43.1562793448.git.jpoimboe@redhat.com

Merge tag 'perf-core-for-mingo-5.3-20190715' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf db-export:
  Adrian Hunter:
  - Improvements in how COMM details are exported to databases for
    post processing and use in the sql-viewer.py UI.

  - Export switch events to the database.

BPF:
  Arnaldo Carvalho de Melo:
  - Bump rlimit(MEMLOCK) for 'perf test bpf' and 'perf trace', just like
    selftests/bpf/bpf_rlimit.h do, which makes errors due to exhaustion of
    this limit, which are kinda cryptic (EPERM sometimes) less frequent.

perf version:
  Ravi Bangoria:
  - Fix segfault due to missing OPT_END(), noticed on PowerPC.

perf vendor events:
  Thomas Richter:
  - Add JSON files for IBM s/390 machine type 8561.

perf cs-etm (ARM):
  YueHaibing:
  - Fix two cases of error returns not bing done properly: Invalid ERR_PTR() use
    and loss of propagation error codes.

ipv6: rt6_check should return NULL if 'from' is NULL

Paul reported that l2tp sessions were broken after the commit referenced
in the Fixes tag. Prior to this commit rt6_check returned NULL if the
rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB
entry. Restore that behavior.

Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Paul Donohue <[email protected]>
Tested-by: Paul Donohue <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: initialize 'validated' field of received packets

The tipc_msg_validate() function leaves a boolean flag 'validated' in
the validated buffer's control block, to avoid performing this action
more than once. However, at reception of new packets, the position of
this field may already have been set by lower layer protocols, so
that the packet is erroneously perceived as already validated by TIPC.

We fix this by initializing the said field to 'false' before performing
the initial validation.

Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ipv4-relax-source-validation-check-for-loopback-packets'

Cong Wang says:

====================
ipv4: relax source validation check for loopback packets

This patchset fixes a corner case when loopback packets get dropped
by rp_filter when we route them from veth to lo. Patch 1 is the fix
and patch 2 provides a simplified test case for this scenario.
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: add a test case for rp_filter

Add a test case to simulate the loopback packet case fixed
in the previous patch.

This test gets passed after the fix:

IPv4 rp_filter tests
TEST: rp_filter passes local packets [ OK ]
TEST: rp_filter passes loopback packets [ OK ]

Cc: David Ahern <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib: relax source validation check for loopback packets

In a rare case where we redirect local packets from veth to lo,
these packets fail to pass the source validation when rp_filter
is turned on, as the tracing shows:

<...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0
<...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130

So, the fib table lookup returns eth0 as the nexthop even though
the packets are local and should be routed to loopback nonetheless,
but they can't pass the dev match check in fib_info_nh_uses_dev()
without this patch.

It should be safe to relax this check for this special case, as
normally packets coming out of loopback device still have skb_dst
so they won't even hit this slow path.

Cc: Julian Anastasov <[email protected]>
Cc: David Ahern <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlxsw-Two-fixes'

Ido Schimmel says:

====================
mlxsw: Two fixes

This patchset contains two fixes for mlxsw.

Patch #1 from Petr fixes an issue in which DSCP rewrite can occur even
if the egress port was switched to Trust L2 mode where priority mapping
is based on PCP.

Patch #2 fixes a problem where packets can be learned on a non-existing
FID if a tc filter with a redirect action is configured on a bridged
port. The problem and fix are explained in detail in the commit message.

Please consider both patches for 5.2.y
====================

Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum: Do not process learned records with a dummy FID

The switch periodically sends notifications about learned FDB entries.
Among other things, the notification includes the FID (Filtering
Identifier) and the port on which the MAC was learned.

In case the driver does not have the FID defined on the relevant port,
the following error will be periodically generated:

mlxsw_spectrum2 0000:06:00.0 swp32: Failed to find a matching {Port, VID} following FDB notification

This is not supposed to happen under normal conditions, but can happen
if an ingress tc filter with a redirect action is installed on a bridged
port. The redirect action will cause the packet's FID to be changed to
the dummy FID and a learning notification will be emitted with this FID
- which is not defined on the bridged port.

Fix this by having the driver ignore learning notifications generated
with the dummy FID and delete them from the device.

Another option is to chain an ignore action after the redirect action
which will cause the device to disable learning, but this means that we
need to consume another action whenever a redirect action is used. In
addition, the scenario described above is merely a corner case.

Fixes: cedbb8b25948 ("mlxsw: spectrum_flower: Set dummy FID before forward action")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-by: Alex Kushnarov <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Tested-by: Alex Kushnarov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed

Spectrum systems use DSCP rewrite map to update DSCP field in egressing
packets to correspond to priority that the packet has. Whether rewriting
will take place is determined at the point when the packet ingresses the
switch: if the port is in Trust L3 mode, packet priority is determined from
the DSCP map at the port, and DSCP rewrite will happen. If the port is in
Trust L2 mode, 802.1p is used for packet prioritization, and no DSCP
rewrite will happen.

The driver determines the port trust mode based on whether any DSCP
prioritization rules are in effect at given port. If there are any, trust
level is L3, otherwise it's L2. When the last DSCP rule is removed, the
port is switched to trust L2. Under that scenario, if DSCP of a packet
should be rewritten, it should be rewritten to 0.

However, when switching to Trust L2, the driver neglects to also update the
DSCP rewrite map. The last DSCP rule thus remains in effect, and packets
egressing through this port, if they have the right priority, will have
their DSCP set according to this rule.

Fix by first configuring the rewrite map, and only then switching to trust
L2 and bailing out.

Fixes: b2b1dab6884e ("mlxsw: spectrum: Support ieee_setapp, ieee_delapp")
Signed-off-by: Petr Machata <[email protected]>
Reported-by: Alex Veber <[email protected]>
Tested-by: Alex Veber <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ag71xx: Add missing header

ag71xx uses devm_ioremap_nocache. This fixes usage of an implicit function

Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
Signed-off-by: Rosen Penev <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

floppy: fix out-of-bounds read in copy_buffer

This fixes a global out-of-bounds read access in the copy_buffer
function of the floppy driver.

The FDDEFPRM ioctl allows one to set the geometry of a disk.  The sect
and head fields (unsigned int) of the floppy_drive structure are used to
compute the max_sector (int) in the make_raw_rw_request function.  It is
possible to overflow the max_sector.  Next, max_sector is passed to the
copy_buffer function and used in one of the memcpy calls.

An unprivileged user could trigger the bug if the device is accessible,
but requires a floppy disk to be inserted.

The patch adds the check for the .sect * .head multiplication for not
overflowing in the set_geometry function.

The bug was found by syzkaller.

Signed-off-by: Denis Efremov <[email protected]>
Tested-by: Willy Tarreau <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

floppy: fix invalid pointer dereference in drive_name

This fixes the invalid pointer dereference in the drive_name function of
the floppy driver.

The native_format field of the struct floppy_drive_params is used as
floppy_type array index in the drive_name function.  Thus, the field
should be checked the same way as the autodetect field.

To trigger the bug, one could use a value out of range and set the drive
parameters with the FDSETDRVPRM ioctl.  Next, FDGETDRVTYP ioctl should
be used to call the drive_name.  A floppy disk is not required to be
inserted.

CAP_SYS_ADMIN is required to call FDSETDRVPRM.

The patch adds the check for a value of the native_format field to be in
the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array
indices.

The bug was found by syzkaller.

Signed-off-by: Denis Efremov <[email protected]>
Tested-by: Willy Tarreau <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

floppy: fix out-of-bounds read in next_valid_format

This fixes a global out-of-bounds read access in the next_valid_format
function of the floppy driver.

The values from autodetect field of the struct floppy_drive_params are
used as indices for the floppy_type array in the next_valid_format
function 'floppy_type[DP->autodetect[probed_format]].sect'.

To trigger the bug, one could use a value out of range and set the drive
parameters with the FDSETDRVPRM ioctl. A floppy disk is not required to
be inserted.

CAP_SYS_ADMIN is required to call FDSETDRVPRM.

The patch adds the check for values of the autodetect field to be in the
'0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices.

The bug was found by syzkaller.

Signed-off-by: Denis Efremov <[email protected]>
Tested-by: Willy Tarreau <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

floppy: fix div-by-zero in setup_format_params

This fixes a divide by zero error in the setup_format_params function of
the floppy driver.

Two consecutive ioctls can trigger the bug: The first one should set the
drive geometry with such .sect and .rate values for the F_SECT_PER_TRACK
to become zero.  Next, the floppy format operation should be called.

A floppy disk is not required to be inserted.  An unprivileged user
could trigger the bug if the device is accessible.

The patch checks F_SECT_PER_TRACK for a non-zero value in the
set_geometry function.  The proper check should involve a reasonable
upper limit for the .sect and .rate fields, but it could change the
UAPI.

The patch also checks F_SECT_PER_TRACK in the setup_format_params, and
cancels the formatting operation in case of zero.

The bug was found by syzkaller.

Signed-off-by: Denis Efremov <[email protected]>
Tested-by: Willy Tarreau <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86/mm, tracing: Fix CR2 corruption

Despite the current efforts to read CR2 before tracing happens there still
exist a number of possible holes:

  idtentry page_fault             do_page_fault           has_error_code=1
    call error_entry
      TRACE_IRQS_OFF
        call trace_hardirqs_off*
          #PF // modifies CR2

      CALL_enter_from_user_mode
        __context_tracking_exit()
          trace_user_exit(0)
            #PF // modifies CR2

    call do_page_fault
      address = read_cr2(); /* whoopsie */

And similar for i386.

Fix it by pulling the CR2 read into the entry code, before any of that
stuff gets a chance to run and ruin things.

Reported-by: He Zhe <[email protected]>
Reported-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Debugged-by: Steven Rostedt <[email protected]>

x86/entry/64: Update comments and sanity tests for create_gap

Commit 2700fefdb2d9 ("x86_64: Add gap to int3 to allow for call
emulation") forgot to update the comment, do so now.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

x86/entry/64: Simplify idtentry a little

There's a bunch of duplication in idtentry, namely the
.Lfrom_usermode_switch_stack is a paranoid=0 copy of the normal flow.

Make this explicit by creating a idtentry_part helper macro.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

x86/entry/32: Simplify common_exception

Adding one more option to SAVE_ALL can be used in common_exception to
simplify things. This also saves duplication later where page_fault will no
longer use common_exception.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

x86/paravirt: Make read_cr2() CALLEE_SAVE

The one paravirt read_cr2() implementation (Xen) is actually quite trivial
and doesn't need to clobber anything other than the return register.

Making read_cr2() CALLEE_SAVE avoids all the PUSH/POP nonsense and allows
more convenient use from assembly.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

parisc: Wire up clone3 syscall

Signed-off-by: Helge Deller <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Acked-by: Christian Brauner <[email protected]>