Git Repo - linux.git/log

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"A handful of fixes. I've been queuing them up a bit too long so the
  list is longer than it otherwise would have been spread out across a
  few -rcs.

  In general, it's a scattering of fixes across several platforms,
  nothing truly serious enough to point out.

  There's a slightly larger batch of them for the Davinci platforms due
  to work to bring them back to life after some time, so there's a
  handful of regressions, some of them going back very far, others more
  recent.

  There's also a few patches fixing DT on Renesas platforms since they
  changed some bindings without remaining backwards compatible,
  splitting up describing LVDS as a proper bridge instead of having it
  as part of the display unit.

  We could push for them to be backwards compatible with old device
  trees, but it's likely to regress eventually if nobody's actually
  using said compatibility"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (36 commits)
  ARM: davinci: board-dm646x-evm: set VPIF capture card name
  ARM: davinci: board-dm646x-evm: pass correct I2C adapter id for VPIF
  ARM: davinci: dm646x: fix timer interrupt generation
  ARM: keystone: fix platform_domain_notifier array overrun
  arm64: dts: exynos: Fix interrupt type for I2S1 device on Exynos5433
  ARM: dts: imx51-zii-rdu1: fix touchscreen bindings
  firmware: arm_scmi: Use after free in scmi_create_protocol_device()
  ARM: dts: cygnus: fix irq type for arm global timer
  Revert "ARM: dts: logicpd-som-lv: Fix pinmux controller references"
  tee: check shm references are consistent in offset/size
  tee: shm: fix use-after-free via temporarily dropped reference
  ARM: dts: imx7s: Pass the 'fsl,sec-era' property
  ARM: dts: tegra20: Revert "Fix ULPI regression on Tegra20"
  ARM: dts: correct missing "compatible" entry for ti81xx SoCs
  ARM: OMAP1: ams-delta: fix deferred_fiq handler
  arm64: tegra: Make BCM89610 PHY interrupt as active low
  ARM: davinci: fix GPIO lookup for I2C
  ARM: dts: logicpd-som-lv: Fix pinmux controller references
  ARM: dts: logicpd-som-lv: Fix Audio Mute
  ARM: dts: logicpd-som-lv: Fix WL127x Startup Issues
  ...

Merge tag 'tegra-for-4.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into fixes

arm64: tegra: Device tree fixes for v4.17

This contains a one-line update to the device tree of the Tegra186 P3310
processor module, fixing the polarity of the PHY interrupt. Originally,
this was queued to go into v4.18, but the PHY ID matching patch has now
found its way into v4.17-rc5, which means that the PHY driver will know
how to identify the PHY on this board and try to use the interrupt. This
will unfortunately cause networking to break on P3310, hence why I think
this should go into v4.17.

* tag 'tegra-for-4.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: Make BCM89610 PHY interrupt as active low

Signed-off-by: Olof Johansson <[email protected]>

bpf: Prevent memory disambiguation attack

Detect code patterns where malicious 'speculative store bypass' can be used
and sanitize such patterns.

39: (bf) r3 = r10
40: (07) r3 += -216
41: (79) r8 = *(u64 *)(r7 +0)   // slow read
42: (7a) *(u64 *)(r10 -72) = 0  // verifier inserts this instruction
43: (7b) *(u64 *)(r8 +0) = r3   // this store becomes slow due to r8
44: (79) r1 = *(u64 *)(r6 +0)   // cpu speculatively executes this load
45: (71) r2 = *(u8 *)(r1 +0)    // speculatively arbitrary 'load byte'
                                 // is now sanitized

Above code after x86 JIT becomes:
e5: mov    %rbp,%rdx
e8: add    $0xffffffffffffff28,%rdx
ef: mov    0x0(%r13),%r14
f3: movq   $0x0,-0x48(%rbp)
fb: mov    %rdx,0x0(%r14)
ff: mov    0x0(%rbx),%rdi
103: movzbq 0x0(%rdi),%rsi

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

ARM: fix kill( ,SIGFPE) breakage

Commit 7771c6645700 ("signal/arm: Document conflicts with SI_USER and
SIGFPE") broke the siginfo structure for userspace triggered signals,
causing the strace testsuite to regress. Fix this by eliminating
the FPE_FIXME definition (which is at the root of the breakage) and
use FPE_FLTINV instead for the case where the hardware appears to be
reporting nonsense.

Fixes: 7771c6645700 ("signal/arm: Document conflicts with SI_USER and SIGFPE")
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Russell King <[email protected]>

Merge tag 'dmaengine-fix-4.17-rc6' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fix from Vinod Koul:

- qcom bam runtime_pm fix

- email update for Vinod

* tag 'dmaengine-fix-4.17-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: qcom: bam_dma: check if the runtime pm enabled
dmaengine: Update email address for Vinod

mmap: relax file size limit for regular files

Commit be83bbf80682 ("mmap: introduce sane default mmap limits") was
introduced to catch problems in various ad-hoc character device drivers
doing mmap and getting the size limits wrong.  In the process, it used
"known good" limits for the normal cases of mapping regular files and
block device drivers.

It turns out that the "s_maxbytes" limit was less "known good" than I
thought.  In particular, /proc doesn't set it, but exposes one regular
file to mmap: /proc/vmcore.  As a result, that file got limited to the
default MAX_INT s_maxbytes value.

This went unnoticed for a while, because apparently the only thing that
needs it is the s390 kernel zfcpdump, but there might be other tools
that use this too.

Vasily suggested just changing s_maxbytes for all of /proc, which isn't
wrong, but makes me nervous at this stage.  So instead, just make the
new mmap limit always be MAX_LFS_FILESIZE for regular files, which won't
affect anything else.  It wasn't the regular file case I was worried
about.

I'd really prefer for maxsize to have been per-inode, but that is not
how things are today.

Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits")
Reported-by: Vasily Gorbik <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86/MCE/AMD: Cache SMCA MISC block addresses

... into a global, two-dimensional array and service subsequent reads from
that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs
disabled).

In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage
of the bank->blocks pointer.

Fixes: 27bd59502702 ("x86/mce/AMD: Get address from already initialized block")
Reported-by: Johannes Hirte <[email protected]>
Tested-by: Johannes Hirte <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Yazen Ghannam <[email protected]>
Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook

ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions

Since do_undefinstr() uses get_user to get the undefined
instruction, it can be called before kprobes processes
recursive check. This can cause an infinit recursive
exception.
Prohibit probing on get_user functions.

Fixes: 24ba613c9d6c ("ARM kprobes: core code")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Signed-off-by: Russell King <[email protected]>

ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr

Prohibit kprobes on do_undefinstr because kprobes on
arm is implemented by undefined instruction. This means
if we probe do_undefinstr(), it can cause infinit
recursive exception.

Fixes: 24ba613c9d6c ("ARM kprobes: core code")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Signed-off-by: Russell King <[email protected]>

ARM: 8770/1: kprobes: Prohibit probing on optimized_callback

Prohibit probing on optimized_callback() because
it is called from kprobes itself. If we put a kprobes
on it, that will cause a recursive call loop.
Mark it NOKPROBE_SYMBOL.

Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Signed-off-by: Russell King <[email protected]>

ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed

Since get_kprobe_ctlblk() uses smp_processor_id() to access
per-cpu variable, it hits smp_processor_id sanity check as below.

[    7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[    7.007859] caller is debug_smp_processor_id+0x20/0x24
[    7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1
[    7.008890] Hardware name: Generic DT based system
[    7.009917] [<c0313f0c>] (unwind_backtrace) from [<c030e6d8>] (show_stack+0x20/0x24)
[    7.010473] [<c030e6d8>] (show_stack) from [<c0c64694>] (dump_stack+0x84/0x98)
[    7.010990] [<c0c64694>] (dump_stack) from [<c071ca5c>] (check_preemption_disabled+0x138/0x13c)
[    7.011592] [<c071ca5c>] (check_preemption_disabled) from [<c071ca80>] (debug_smp_processor_id+0x20/0x24)
[    7.012214] [<c071ca80>] (debug_smp_processor_id) from [<c03335e0>] (optimized_callback+0x2c/0xe4)
[    7.013077] [<c03335e0>] (optimized_callback) from [<bf0021b0>] (0xbf0021b0)

To fix this issue, call get_kprobe_ctlblk() right after
irq-disabled since that disables preemption.

Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Signed-off-by: Russell King <[email protected]>

ARM: replace unnecessary perl with sed and the shell $(( )) operator

You can build a kernel in a cross compiling environment that doesn't
have perl in the $PATH. Commit 429f7a062e3b broke that for 32 bit
ARM. Fix it.

As reported by Stephen Rothwell, it appears that the symbols can be
either part of the BSS section or absolute symbols depending on the
binutils version. When they're an absolute symbol, the $(( ))
operator errors out and the build fails. Fix this as well.

Fixes: 429f7a062e3b ("ARM: decompressor: fix BSS size calculation")
Reported-by: Rob Landley <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Acked-by: Rob Landley <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: kexec: record parent context registers for non-crash CPUs

How we got to machine_crash_nonpanic_core() (iow, from an IPI, etc) is
not interesting for debugging a crash. The more interesting context
is the parent context prior to the IPI being received.

Record the parent context register state rather than the register state
in machine_crash_nonpanic_core(), which is more relevant to the failing
condition.

Signed-off-by: Russell King <[email protected]>

ARM: kexec: fix kdump register saving on panic()

When a panic() occurs, the kexec code uses smp_send_stop() to stop
the other CPUs, but this results in the CPU register state not being
saved, and gdb is unable to inspect the state of other CPUs.

Commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path") addressed the issue on x86, but
ignored other architectures. Address the issue on ARM by splitting
out the crash stop implementation to crash_smp_send_stop() and
adding the necessary protection.

Signed-off-by: Russell King <[email protected]>

ARM: 8758/1: decompressor: restore r1 and r2 just before jumping to the kernel

The hypervisor setup before __enter_kernel destroys the value
sotred in r1. The value needs to be restored just before the jump.

Fixes: 6b52f7bdb888 ("ARM: hyp-stub: Use r1 for the soft-restart address")
Signed-off-by: Łukasz Stelmach <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 8753/1: decompressor: add a missing parameter to the addruart macro

In commit 639da5ee374b ("ARM: add an extra temp register to the low
level debugging addruart macro") an additional temporary register was
added to the addruart macro, but the decompressor code wasn't updated.

Fixes: 639da5ee374b ("ARM: add an extra temp register to the low level debugging addruart macro")
Signed-off-by: Łukasz Stelmach <[email protected]>
Signed-off-by: Russell King <[email protected]>

x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64bit sycalls it select mm->mmap_base and
for 32bit mm->mmap_compat_base.

exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:

  ia32    child->thread.status & TS_COMPAT
   x32    child->pt_regs.orig_ax & __X32_SYSCALL_BIT
  ia64    !ia32 && !x32

__set_personality_x32() was dropping TS_COMPAT flag, but
set_personality_64bit() has kept compat syscall flag making
in_compat_syscall() return true during the first exec() syscall.

Which in result has user-visible effects, mentioned by Alexey:
1) It breaks ASAN
$ gcc -fsanitize=address wrap.c -o wrap-asan
$ ./wrap32 ./wrap-asan true
==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==1217==Process memory map follows:
        0x000000400000-0x000000401000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000600000-0x000000601000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000601000-0x000000602000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x0000f7dbd000-0x0000f7de2000   /lib64/ld-2.27.so
        0x0000f7fe2000-0x0000f7fe3000   /lib64/ld-2.27.so
        0x0000f7fe3000-0x0000f7fe4000   /lib64/ld-2.27.so
        0x0000f7fe4000-0x0000f7fe5000
        0x7fed9abff000-0x7fed9af54000
        0x7fed9af54000-0x7fed9af6b000   /lib64/libgcc_s.so.1
[snip]

2) It doesn't seem to be great for security if an attacker always knows
that ld.so is going to be mapped into the first 4GB in this case
(the same thing happens for PIEs as well).

The testcase:
$ cat wrap.c

int main(int argc, char *argv[]) {
  execvp(argv[1], &argv[1]);
  return 127;
}

$ gcc wrap.c -o wrap
$ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
AT_BASE:         0x7f63b8309000
AT_BASE:         0x7faec143c000
AT_BASE:         0x7fbdb25fa000

$ gcc -m32 wrap.c -o wrap32
$ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
AT_BASE:         0xf7eff000
AT_BASE:         0xf7cee000
AT_BASE:         0x7f8b9774e000

Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Fixes: ada26481dfe6 ("x86/mm: Make in_compat_syscall() work during exec")
Reported-by: Alexey Izbyshev <[email protected]>
Bisected-by: Alexander Monakov <[email protected]>
Investigated-by: Andy Lutomirski <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Alexander Monakov <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

objtool: Detect RIP-relative switch table references, part 2

With the following commit:

  fd35c88b7417 ("objtool: Support GCC 8 switch tables")

I added a "can't find switch jump table" warning, to stop covering up
silent failures if add_switch_table() can't find anything.

That warning found yet another bug in the objtool switch table detection
logic.  For cases 1 and 2 (as described in the comments of
find_switch_table()), the find_symbol_containing() check doesn't adjust
the offset for RIP-relative switch jumps.

Incidentally, this bug was already fixed for case 3 with:

  6f5ec2993b1f ("objtool: Detect RIP-relative switch table references")

However, that commit missed the fix for cases 1 and 2.

The different cases are now starting to look more and more alike.  So
fix the bug by consolidating them into a single case, by checking the
original dynamic jump instruction in the case 3 loop.

This also simplifies the code and makes it more robust against future
switch table detection issues -- of which I'm sure there will be many...

Switch table detection has been the most fragile area of objtool, by
far.  I long for the day when we'll have a GCC plugin for annotating
switch tables.  Linus asked me to delay such a plugin due to the
flakiness of the plugin infrastructure in older versions of GCC, so this
rickety code is what we're stuck with for now.  At least the code is now
a little simpler than it was.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/f400541613d45689086329432f3095119ffbc328.1526674218.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>

efi/libstub/arm64: Handle randomized TEXT_OFFSET

When CONFIG_RANDOMIZE_TEXT_OFFSET=y, TEXT_OFFSET is an arbitrary
multiple of PAGE_SIZE in the interval [0, 2MB).

The EFI stub does not account for the potential misalignment of
TEXT_OFFSET relative to EFI_KIMG_ALIGN, and produces a randomized
physical offset which is always a round multiple of EFI_KIMG_ALIGN.
This may result in statically allocated objects whose alignment exceeds
PAGE_SIZE to appear misaligned in memory. This has been observed to
result in spurious stack overflow reports and failure to make use of
the IRQ stacks, and theoretically could result in a number of other
issues.

We can OR in the low bits of TEXT_OFFSET to ensure that we have the
necessary offset (and hence preserve the misalignment of TEXT_OFFSET
relative to EFI_KIMG_ALIGN), so let's do that.

Reported-by: Kim Phillips <[email protected]>
Tested-by: Kim Phillips <[email protected]>
[ardb: clarify comment and commit log, drop unneeded parens]
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: 6f26b3671184c36d ("arm64: kaslr: increase randomization granularity")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"10 fixes"

* emailed patches from Andrew Morton <[email protected]>:
  hfsplus: stop workqueue when fill_super() failed
  mm: don't allow deferred pages with NEED_PER_CPU_KM
  MAINTAINERS: add Q: entry to kselftest for patchwork project
  radix tree: fix multi-order iteration race
  radix tree test suite: multi-order iteration race
  radix tree test suite: add item_delete_rcu()
  radix tree test suite: fix compilation issue
  radix tree test suite: fix mapshift build target
  include/linux/mm.h: add new inline function vmf_error()
  lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly

Merge tag 'platform-drivers-x86-v4.17-3' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fix from Darren Hart:
"Remove the last of the "select DELL_SMBIOS" references in the Kconfig"

* tag 'platform-drivers-x86-v4.17-3' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: DELL_WMI use depends on instead of select for DELL_SMBIOS

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:

- a modified revert of a patch that made new choices come out for a
   couple stm32 clk drivers that really always need to be there when
   that particular machine is compiled in

- boot fix on i.MX for Stefan who noticed odd behavior from the
   critical flag patch that came in during the merge window

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: stm32: fix: stm32 clock drivers are not compiled by default
  clk: imx6ull: use OSC clock during AXI rate change

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"A bunch of driver bugfixes and a MAINTAINERS addition"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: add entry for STM32 I2C driver
  i2c: viperboard: return message count on master_xfer success
  i2c: pmcmsp: fix error return from master_xfer
  i2c: pmcmsp: return message count on master_xfer success
  i2c: designware: fix poll-after-enable regression
  eeprom: at24: fix retrieving the at24_chip_data structure
  i2c: core: ACPI: Log device not acking errors at dbg loglevel
  i2c: core: ACPI: Improve OpRegion read errors

hfsplus: stop workqueue when fill_super() failed

syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1]. This
is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().

As far as I can see, it is hfsplus_mark_mdb_dirty() from
hfsplus_new_inode() in hfsplus_fill_super() that calls
queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does
not fail if queue_delayed_work() was called, and the out_put_hidden_dir
label is the appropriate location to call cancel_delayed_work_sync().

[1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tetsuo Handa <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Al Viro <[email protected]>
Cc: David Howells <[email protected]>
Cc: Ernesto A. Fernandez <[email protected]>
Cc: Vyacheslav Dubeyko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: don't allow deferred pages with NEED_PER_CPU_KM

It is unsafe to do virtual to physical translations before mm_init() is
called if struct page is needed in order to determine the memory section
number (see SECTION_IN_PAGE_FLAGS).  This is because only in mm_init()
we initialize struct pages for all the allocated memory when deferred
struct pages are used.

My recent fix in commit c9e97a1997 ("mm: initialize pages on demand
during boot") exposed this problem, because it greatly reduced number of
pages that are initialized before mm_init(), but the problem existed
even before my fix, as Fengguang Wu found.

Below is a more detailed explanation of the problem.

We initialize struct pages in four places:

1. Early in boot a small set of struct pages is initialized to fill the
   first section, and lower zones.

2. During mm_init() we initialize "struct pages" for all the memory that
   is allocated, i.e reserved in memblock.

3. Using on-demand logic when pages are allocated after mm_init call
   (when memblock is finished)

4. After smp_init() when the rest free deferred pages are initialized.

The problem occurs if we try to do va to phys translation of a memory
between steps 1 and 2.  Because we have not yet initialized struct pages
for all the reserved pages, it is inherently unsafe to do va to phys if
the translation itself requires access of "struct page" as in case of
this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP

The following path exposes the problem:

  start_kernel()
   trap_init()
    setup_cpu_entry_areas()
     setup_cpu_entry_area(cpu)
      get_cpu_gdt_paddr(cpu)
       per_cpu_ptr_to_phys(addr)
        pcpu_addr_to_page(addr)
         virt_to_page(addr)
          pfn_to_page(__pa(addr) >> PAGE_SHIFT)

We disable this path by not allowing NEED_PER_CPU_KM with deferred
struct pages feature.

The problems are discussed in these threads:
  http://lkml.kernel.org/r/20180418135300 [email protected]
  http://lkml.kernel.org/r/20180419013128 [email protected]
  http://lkml.kernel.org/r/20180426202619 [email protected]

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Pavel Tatashin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: add Q: entry to kselftest for patchwork project

A new patchwork project is created to track kselftest patches. Update
the kselftest entry in the MAINTAINERS file adding 'Q:' entry:

https://patchwork.kernel.org/project/linux-kselftest/list/

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

radix tree: fix multi-order iteration race

Fix a race in the multi-order iteration code which causes the kernel to
hit a GP fault.  This was first seen with a production v4.15 based
kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
order 9 PMD DAX entries.

The race has to do with how we tear down multi-order sibling entries
when we are removing an item from the tree.  Remember for example that
an order 2 entry looks like this:

  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]

where 'entry' is in some slot in the struct radix_tree_node, and the
three slots following 'entry' contain sibling pointers which point back
to 'entry.'

When we delete 'entry' from the tree, we call :

  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()

replace_slot() first removes the siblings in order from the first to the
last, then at then replaces 'entry' with NULL.  This means that for a
brief period of time we end up with one or more of the siblings removed,
so:

  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]

This causes an issue if you have a reader iterating over the slots in
the tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
mm/filemap.c.

The issue is that when __radix_tree_next_slot() => skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.
Normally this works:

                                      V preceding slot
  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                              ^ current slot

This lets you find the first sibling, and you skip them all in order.

But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:

                                             V preceding slot
  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                    ^ current slot

This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers.  This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.

In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.

We fix this race by fixing the way that skip_siblings() detects sibling
nodes.  Instead of testing against the preceding slot we instead look
for siblings via is_sibling_entry() which compares against the position
of the struct radix_tree_node.slots[] array.  This ensures that sibling
entries are properly identified, even if they are no longer contiguous
with the 'entry' they point to.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 148deab223b2 ("radix-tree: improve multiorder iterators")
Signed-off-by: Ross Zwisler <[email protected]>
Reported-by: CR, Sapthagirish <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

radix tree test suite: multi-order iteration race

Add a test which shows a race in the multi-order iteration code.  This
test reliably hits the race in under a second on my machine, and is the
result of a real bug report against kernel a production v4.15 based
kernel (4.15.6-300.fc27.x86_64).  With a real kernel this issue is hit
when using order 9 PMD DAX radix tree entries.

The race has to do with how we tear down multi-order sibling entries
when we are removing an item from the tree.  Remember that an order 2
entry looks like this:

  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]

where 'entry' is in some slot in the struct radix_tree_node, and the
three slots following 'entry' contain sibling pointers which point back
to 'entry.'

When we delete 'entry' from the tree, we call :

  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()

replace_slot() first removes the siblings in order from the first to the
last, then at then replaces 'entry' with NULL.  This means that for a
brief period of time we end up with one or more of the siblings removed,
so:

  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]

This causes an issue if you have a reader iterating over the slots in
the tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
mm/filemap.c.

The issue is that when __radix_tree_next_slot() => skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.
Normally this works:

                                      V preceding slot
  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                              ^ current slot

This lets you find the first sibling, and you skip them all in order.

But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:

                                             V preceding slot
  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                    ^ current slot

This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers.  This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.

In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.

In the radix tree test suite this will be caught by the address
sanitizer:

  ==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address
  0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0
  READ of size 8 at 0x60c0008ae400 thread T3
      #0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660
      #1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567
      #2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655
      #3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a)
      #4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e)

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: CR, Sapthagirish <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

radix tree test suite: add item_delete_rcu()

Currently the lifetime of "struct item" entries in the radix tree are
not controlled by RCU, but are instead deleted inline as they are
removed from the tree.

In the following patches we add a test which has threads iterating over
items pulled from the tree and verifying them in an
rcu_read_lock()/rcu_read_unlock() section. This means that though an
item has been removed from the tree it could still be being worked on by
other threads until the RCU grace period expires. So, we need to
actually free the "struct item" structures at the end of the grace
period, just as we do with "struct radix_tree_node" items.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: CR, Sapthagirish <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

radix tree test suite: fix compilation issue

Pulled from a patch from Matthew Wilcox entitled "xarray: Add definition
of struct xarray":

> From: Matthew Wilcox <[email protected]>
> Signed-off-by: Matthew Wilcox <[email protected]>

  https://patchwork.kernel.org/patch/10341249/

These defines fix this compilation error:

  In file included from ./linux/radix-tree.h:6:0,
                   from ./linux/../../../../include/linux/idr.h:15,
                   from ./linux/idr.h:1,
                   from idr.c:4:
  ./linux/../../../../include/linux/idr.h: In function `idr_init_base':
  ./linux/../../../../include/linux/radix-tree.h:129:2: warning: implicit declaration of function `spin_lock_init'; did you mean `spinlock_t'? [-Wimplicit-function-declaration]
    spin_lock_init(&(root)->xa_lock);    \
    ^
  ./linux/../../../../include/linux/idr.h:126:2: note: in expansion of macro `INIT_RADIX_TREE'
    INIT_RADIX_TREE(&idr->idr_rt, IDR_RT_MARKER);
    ^~~~~~~~~~~~~~~

by providing a spin_lock_init() wrapper for the v4.17-rc* version of the
radix tree test suite.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: CR, Sapthagirish <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

radix tree test suite: fix mapshift build target

Commit c6ce3e2fe3da ("radix tree test suite: Add config option for map
shift") introduced a phony makefile target called 'mapshift' that ends
up generating the file generated/map-shift.h.  This phony target was
then added as a dependency of the top level 'targets' build target,
which is what is run when you go to tools/testing/radix-tree and just
type 'make'.

Unfortunately, this phony target doesn't actually work as a dependency,
so you end up getting:

  $ make
  make: *** No rule to make target 'generated/map-shift.h', needed by 'main.o'.  Stop.
  make: *** Waiting for unfinished jobs....

Fix this by making the file generated/map-shift.h our real makefile
target, and add this a dependency of the top level build target.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: CR, Sapthagirish <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include/linux/mm.h: add new inline function vmf_error()

Many places in drivers/ file systems, error was handled in a common way
like below:

ret = (ret == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;

vmf_error() will replace this and return vm_fault_t type err.

A lot of drivers and filesystems currently have a rather complex mapping
of errno-to-VM_FAULT code. We have been able to eliminate a lot of it
by just returning VM_FAULT codes directly from functions which are
called exclusively from the fault handling path.

Some functions can be called both from the fault handler and other
context which are expecting an errno, so they have to continue to return
an errno. Some users still need to choose different behaviour for
different errnos, but vmf_error() captures the essential error
translation that's common to all users, and those that need to handle
additional errors can handle them first.

Link: http://lkml.kernel.org/r/20180510174826.GA14268@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <[email protected]>
Reviewed-by: Matthew Wilcox <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly

I had neglected to increment the error counter when the tests failed,
which made the tests noisy when they fail, but not actually return an
error code.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 3cc78125a081 ("lib/test_bitmap.c: add optimisation tests")
Signed-off-by: Matthew Wilcox <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Reported-by: Michael Ellerman <[email protected]>
Tested-by: Michael Ellerman <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Yury Norov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: <[email protected]> [4.13+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

platform/x86: DELL_WMI use depends on instead of select for DELL_SMBIOS

If DELL_WMI "select"s DELL_SMBIOS, the DELL_SMBIOS dependencies are
ignored and it is still possible to end up with unmet direct
dependencies.

Change the select to a depends on.

Tested-by: Randy Dunlap <[email protected]>
Signed-off-by: Darren Hart (VMware) <[email protected]>

selftests: bpf: config: enable NET_SCH_INGRESS for xdp_meta.sh

When running bpf's selftest test_xdp_meta.sh it fails:
./test_xdp_meta.sh
Error: Specified qdisc not found.
selftests: test_xdp_meta [FAILED]

Need to enable CONFIG_NET_SCH_INGRESS and CONFIG_NET_CLS_ACT to get the
test to pass.

Fixes: 22c8852624fc ("bpf: improve selftests and add tests for meta pointer")
Signed-off-by: Anders Roxell <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent

When run raidconfig from Dom0 we found that the Xen DMA heap is reduced,
but Dom Heap is increased by the same size. Tracing raidconfig we found
that the related ioctl() in megaraid_sas will call dma_alloc_coherent()
to apply memory. If the memory allocated by Dom0 is not in the DMA area,
it will exchange memory with Xen to meet the requiment. Later drivers
call dma_free_coherent() to free the memory, on xen_swiotlb_free_coherent()
the check condition (dev_addr + size - 1 <= dma_mask) is always false,
it prevents calling xen_destroy_contiguous_region() to return the memory
to the Xen DMA heap.

This issue introduced by commit 6810df88dcfc2 "xen-swiotlb: When doing
coherent alloc/dealloc check before swizzling the MFNs.".

Signed-off-by: Joe Jin <[email protected]>
Tested-by: John Sobecki <[email protected]>
Reviewed-by: Rzeszutek Wilk <[email protected]>
Cc: [email protected]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

cxgb4: fix offset in collecting TX rate limit info

Correct the indirect register offsets in collecting TX rate limit info
in UP CIM logs.

Also, T5 doesn't support these indirect register offsets, so remove
them from collection logic.

Fixes: be6e36d916b1 ("cxgb4: collect TX rate limit info in UP CIM logs")
Signed-off-by: Rahul Lakkireddy <[email protected]>
Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: red: avoid hashing NULL child

Hangbin reported an Oops triggered by the syzkaller qdisc rules:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
Modules linked in: sch_red
CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:qdisc_hash_add+0x26/0xa0
RSP: 0018:ffff8800589cf470 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff824ad971
RDX: 0000000000000007 RSI: ffffc9000ce9f000 RDI: 000000000000003c
RBP: 0000000000000001 R08: ffffed000b139ea2 R09: ffff8800589cf4f0
R10: ffff8800589cf50f R11: ffffed000b139ea2 R12: ffff880054019fc0
R13: ffff880054019fb4 R14: ffff88005c0af600 R15: ffff880054019fb0
FS:  00007fa6edcb1700(0000) GS:ffff88005ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000740 CR3: 000000000fc16000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  red_change+0x2d2/0xed0 [sch_red]
  qdisc_create+0x57e/0xef0
  tc_modify_qdisc+0x47f/0x14e0
  rtnetlink_rcv_msg+0x6a8/0x920
  netlink_rcv_skb+0x2a2/0x3c0
  netlink_unicast+0x511/0x740
  netlink_sendmsg+0x825/0xc30
  sock_sendmsg+0xc5/0x100
  ___sys_sendmsg+0x778/0x8e0
  __sys_sendmsg+0xf5/0x1b0
  do_syscall_64+0xbd/0x3b0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x450869
RSP: 002b:00007fa6edcb0c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fa6edcb16b4 RCX: 0000000000450869
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000008778 R14: 0000000000702838 R15: 00007fa6edcb1700
Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51
RIP: qdisc_hash_add+0x26/0xa0 RSP: ffff8800589cf470

When a red qdisc is updated with a 0 limit, the child qdisc is left
unmodified, no additional scheduler is created in red_change(),
the 'child' local variable is rightfully NULL and must not add it
to the hash table.

This change addresses the above issue moving qdisc_hash_add() right
after the child qdisc creation. It additionally removes unneeded checks
for noop_qdisc.

Reported-by: Hangbin Liu <[email protected]>
Fixes: 49b499718fa1 ("net: sched: make default fifo qdiscs appear in the dump")
Signed-off-by: Paolo Abeni <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sock_diag: fix use-after-free read in __sk_free

We must not call sock_diag_has_destroy_listeners(sk) on a socket
that has no reference on net structure.

BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609
Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
__sk_free+0x329/0x340 net/core/sock.c:1609
sk_free+0x42/0x50 net/core/sock.c:1623
sock_put include/net/sock.h:1664 [inline]
reqsk_free include/net/request_sock.h:116 [inline]
reqsk_put include/net/request_sock.h:124 [inline]
inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline]
reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739
call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x79e/0xc50 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1d1/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:525 [inline]
smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
</IRQ>
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000
RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680
RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000
arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
cpuidle_idle_call kernel/sched/idle.c:153 [inline]
do_idle+0x395/0x560 kernel/sched/idle.c:262
cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368
start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269
secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242

Allocated by task 4557:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
kmem_cache_zalloc include/linux/slab.h:691 [inline]
net_alloc net/core/net_namespace.c:383 [inline]
copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423
create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206
ksys_unshare+0x708/0xf90 kernel/fork.c:2408
__do_sys_unshare kernel/fork.c:2476 [inline]
__se_sys_unshare kernel/fork.c:2474 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2474
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 69:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
net_free net/core/net_namespace.c:399 [inline]
net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406
net_drop_ns net/core/net_namespace.c:405 [inline]
cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541
process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

The buggy address belongs to the object at ffff88018a02c140
which belongs to the cache net_namespace of size 8832
The buggy address is located 8800 bytes inside of
8832-byte region [ffff88018a02c140, ffff88018a02e3c0)
The buggy address belongs to the page:
page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001
raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000
page dumped because: kasan: bad access detected

Fixes: b922622ec6ef ("sock_diag: don't broadcast kernel sockets")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Craig Gallek <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sh_eth: Change platform check to CONFIG_ARCH_RENESAS

Since commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS")
is CONFIG_ARCH_RENESAS a more appropriate platform check than the legacy
CONFIG_ARCH_SHMOBILE, hence use the former.

Renesas SuperH SH-Mobile SoCs are still covered by the CONFIG_CPU_SH4
check.

This will allow to drop ARCH_SHMOBILE on ARM and ARM64 in the near
future.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Sergei Shtylyov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'powerpc-4.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"Just three commits.

  The two cxl ones are not fixes per se, but they modify code that was
  added this cycle so that it will work with a recent firmware change.

  And then a fix for a recent commit that added sleeps in the NVRAM
  code, which needs to be more careful and not sleep if eg. we're called
  in the panic() path.

  Thanks to Nicholas Piggin, Philippe Bergheaud, Christophe Lombard"

* tag 'powerpc-4.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Fix NVRAM sleep in invalid context when crashing
  cxl: Report the tunneled operations status
  cxl: Set the PBCQ Tunnel BAR register when enabling capi mode

Merge tag 'acpi-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"Fix an ACPICA regression introduced in this cycle and related to the
  handling of package objects loaded by the Load and loadTable AML
  operators that are not initialized properly after recent changes (Bob
  Moore)"

* tag 'acpi-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: Add deferred package support for the Load and loadTable operators

Merge tag 'pm-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Fix Kconfig dependencies of the armada-37xx cpufreq driver (Miquel
Raynal)"

* tag 'pm-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: armada-37xx: driver relies on cpufreq-dt

Merge tag 'usb-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some USB driver fixes fro 4.17-rc6.

  They resolve some reported bugs in the musb driver, the xhci driver,
  and a number of small fixes for the usbip driver.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usbip: usbip_host: fix bad unlock balance during stub_probe()
  usbip: usbip_host: fix NULL-ptr deref and use-after-free errors
  usbip: usbip_host: run rebind from exit when module is removed
  usbip: usbip_host: delete device from busid_table after rebind
  usbip: usbip_host: refine probe and disconnect debug msgs to be useful
  usb: musb: fix remote wakeup racing with suspend
  xhci: Fix USB3 NULL pointer dereference at logical disconnect.

Merge tag 'for-linus-20180518' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
"Single fix this time, from Coly, fixing a failure case when
CONFIG_DEBUGFS isn't enabled"

* tag 'for-linus-20180518' of git://git.kernel.dk/linux-block:
bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n

Merge tag 'spi-fix-v4.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A small collection of fixes accumilated since the merge window, all
  fairly small and driver specific"

* tag 'spi-fix-v4.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: bcm2835aux: ensure interrupts are enabled for shared handler
  spi: bcm-qspi: Always read and set BSPI_MAST_N_BOOT_CTRL
  spi: bcm-qspi: Avoid setting MSPI_CDRAM_PCS for spi-nor master
  spi: pxa2xx: Allow 64-bit DMA
  spi: cadence: Add usleep_range() for cdns_spi_fill_tx_fifo()
  spi: sh-msiof: Fix bit field overflow writes to TSCR/RSCR
  spi: imx: Update MODULE_DESCRIPTION to "SPI Controller driver"

Merge tag 'mtd/fixes-for-4.17-rc6' of git://git.infradead.org/linux-mtd

Pull mtd fixes from Boris Brezillon:
"NAND fixes:
   - Fix read path of the Marvell NAND driver
   - Make sure we don't pass a u64 to ndelay()

  CFI fix:
   - Fix the map_word_andequal() implementation"

* tag 'mtd/fixes-for-4.17-rc6' of git://git.infradead.org/linux-mtd:
  mtd: rawnand: Fix return type of __DIVIDE() when called with 32-bit
  mtd: rawnand: marvell: Fix read logic for layouts with ->nchunks > 2
  mtd: Fix comparison in map_word_andequal()

Merge tag 'drm-fixes-for-v4.17-rc6' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Pretty quiet week again: one vmwgfx regression fix, one core buffer
  overflow fix, one vc4 leak fix and three i915 fixes"

* tag 'drm-fixes-for-v4.17-rc6' of git://people.freedesktop.org/~airlied/linux:
  drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl()
  drm/i915/gen9: Add WaClearHIZ_WM_CHICKEN3 for bxt and glk
  drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful
  drm/vc4: Fix leak of the file_priv that stored the perfmon.
  drm/i915/execlists: Use rmb() to order CSB reads
  drm/i915/userptr: reject zero user_size
  drm: Match sysfs name in link removal to link creation

net: dsa: Do not register devlink for unused ports

Even if commit 1d27732f411d ("net: dsa: setup and teardown ports") indicated
that registering a devlink instance for unused ports is not a problem, and this
is true, this can be confusing nonetheless, so let's not do it.

Fixes: 1d27732f411d ("net: dsa: setup and teardown ports")
Reported-by: Jiri Pirko <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Fix a bug in removing queues from XPS map

While removing queues from the XPS map, the individual CPU ID
alone was used to index the CPUs map, this should be changed to also
factor in the traffic class mapping for the CPU-to-queue lookup.

Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Amritha Nambiar <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()

This shall help avoid copying uninitialized memory to the userspace when
calling ioctl(fd, SG_IO) with an empty command.

Reported-by: [email protected]
Cc: [email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Douglas Gilbert <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

parisc: Move ccio_cujo20_fixup() into init section

ccio_cujo20_fixup() is called by dino_probe() only, which is in init
section already.

Signed-off-by: Helge Deller <[email protected]>

parisc: Move setup_profiling_timer() out of init section

No other architecture has setup_profiling_timer() in the init section,
thus on parisc we face this section mismatch warning:
Reference from the function devm_device_add_group() to the function .init.text:setup_profiling_timer()

Signed-off-by: Helge Deller <[email protected]>

parisc: Move find_pa_parent_type() out of init section

The 0-DAY kernel test infrastructure reported that inet_put_port() may
reference the find_pa_parent_type() function, so it can't be moved into the
init section.

Fixes: b86db40e1ecc ("parisc: Move various functions and strings to init section")
Signed-off-by: Helge Deller <[email protected]>

x86/bugs: Rename SSBD_NO to SSB_NO

The "336996 Speculative Execution Side Channel Mitigations" from
May defines this as SSB_NO, hence lets sync-up.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

mac80211: mesh: fix premature update of rc stats

The mesh_neighbour_update() function, queued via beacon rx, can race with
userspace creating the same station. If the station already exists by the
time mesh_neighbour_update() is called, the function wrongly assumes rate
control has been initialized and calls rate_control_rate_update(), which
in turn calls into the driver.

Updating the rate control before it has been initialized can cause a
crash in some drivers, for example this firmware crash in ath10k due
to sta->rx_nss being 0:

[ 3078.088247] mesh0: Inserted STA 5c:e2:8c:f1:ab:ba
[ 3078.258407] ath10k_pci 0000:0d:00.0: firmware crashed! (uuid d6ed5961-93cc-4d61-803f-5eda55bb8643)
[ 3078.258421] ath10k_pci 0000:0d:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
[ 3078.258426] ath10k_pci 0000:0d:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 0 testmode 0
[ 3078.258608] ath10k_pci 0000:0d:00.0: firmware ver 10.2.4.70.59-2 api 5 features no-p2p,raw-mode,mfp crc32 4159f498
[ 3078.258613] ath10k_pci 0000:0d:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[ 3078.258617] ath10k_pci 0000:0d:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
[ 3078.260627] ath10k_pci 0000:0d:00.0: firmware register dump:
[ 3078.260640] ath10k_pci 0000:0d:00.0: [00]: 0x4100016C 0x000015B3 0x009A31BB 0x00955B31
[ 3078.260647] ath10k_pci 0000:0d:00.0: [04]: 0x009A31BB 0x00060130 0x00000008 0x00000007
[ 3078.260652] ath10k_pci 0000:0d:00.0: [08]: 0x00000000 0x00955B31 0x00000000 0x0040F89E
[ 3078.260656] ath10k_pci 0000:0d:00.0: [12]: 0x00000009 0xFFFFFFFF 0x009580F5 0x00958117
[ 3078.260660] ath10k_pci 0000:0d:00.0: [16]: 0x00958080 0x0094085D 0x00000000 0x00000000
[ 3078.260664] ath10k_pci 0000:0d:00.0: [20]: 0x409A31BB 0x0040AA84 0x00000002 0x00000001
[ 3078.260669] ath10k_pci 0000:0d:00.0: [24]: 0x809A2B8D 0x0040AAE4 0x00000088 0xC09A31BB
[ 3078.260673] ath10k_pci 0000:0d:00.0: [28]: 0x809898C8 0x0040AB04 0x0043F91C 0x009C6458
[ 3078.260677] ath10k_pci 0000:0d:00.0: [32]: 0x809B66AC 0x0040AB34 0x009C6458 0x0043F91C
[ 3078.260686] ath10k_pci 0000:0d:00.0: [36]: 0x809B2824 0x0040ADA4 0x00400000 0x00416EB4
[ 3078.260692] ath10k_pci 0000:0d:00.0: [40]: 0x809C07D9 0x0040ADE4 0x0040AE08 0x00412028
[ 3078.260696] ath10k_pci 0000:0d:00.0: [44]: 0x809486FA 0x0040AE04 0x00000001 0x00000000
[ 3078.260700] ath10k_pci 0000:0d:00.0: [48]: 0x80948E2C 0x0040AEA4 0x0041F4F0 0x00412634
[ 3078.260704] ath10k_pci 0000:0d:00.0: [52]: 0x809BFC39 0x0040AEC4 0x0041F4F0 0x00000001
[ 3078.260709] ath10k_pci 0000:0d:00.0: [56]: 0x80940F18 0x0040AF14 0x00000010 0x00403AC0
[ 3078.284130] ath10k_pci 0000:0d:00.0: failed to to request monitor vdev 1 stop: -108

Fix this by checking whether the sta has already initialized rate control
using the flag for that purpose. We can also drop the unnecessary insert
parameter here.

Signed-off-by: Bob Copeland <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

nl80211: fix nlmsg allocation in cfg80211_ft_event

Allocation size of nlmsg in cfg80211_ft_event is based on ric_ies_len
and doesn't take into account ies_len. This leads to
NL80211_CMD_FT_EVENT message construction failure in case ft_event
contains large enough ies buffer.
Add ies_len to the nlmsg allocation size.

Signed-off-by: Dedy Lansky <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

cfg80211: further limit wiphy names to 64 bytes

wiphy names were recently limited to 128 bytes by commit a7cfebcb7594
("cfg80211: limit wiphy names to 128 bytes").  As it turns out though,
this isn't sufficient because dev_vprintk_emit() needs the syslog header
string "SUBSYSTEM=ieee80211\0DEVICE=+ieee80211:$devname" to fit into 128
bytes.  This triggered the "device/subsystem name too long" WARN when
the device name was >= 90 bytes.  As before, this was reproduced by
syzbot by sending an HWSIM_CMD_NEW_RADIO command to the MAC80211_HWSIM
generic netlink family.

Fix it by further limiting wiphy names to 64 bytes.

Reported-by: [email protected]
Fixes: a7cfebcb7594 ("cfg80211: limit wiphy names to 128 bytes")
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

sched/fair: Fix documentation file path

The 'tip' prefix probably referred to the -tip tree and is not required,
remove it.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

sched/deadline: Make the grub_reclaim() function static

Since the grub_reclaim() function can be made static, make it so.

Silences the following GCC warning (W=1):

kernel/sched/deadline.c:1120:5: warning: no previous prototype for ‘grub_reclaim’ [-Wmissing-prototypes]

Signed-off-by: Mathieu Malaterre <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

sched/debug: Move the print_rt_rq() and print_dl_rq() declarations to kernel/sched/sched.h

In the following commit:

  6b55c9654fcc ("sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h")

the print_cfs_rq() prototype was added to <kernel/sched/sched.h>,
right next to the prototypes for print_cfs_stats(), print_rt_stats()
and print_dl_stats().

Finish this previous commit and also move related prototypes for
print_rt_rq() and print_dl_rq().

Remove existing extern declarations now that they not needed anymore.

Silences the following GCC warning, triggered by W=1:

  kernel/sched/debug.c:573:6: warning: no previous prototype for ‘print_rt_rq’ [-Wmissing-prototypes]
  kernel/sched/debug.c:603:6: warning: no previous prototype for ‘print_dl_rq’ [-Wmissing-prototypes]

Signed-off-by: Mathieu Malaterre <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

ALSA: timer: Fix pause event notification

Commit f65e0d299807 ("ALSA: timer: Call notifier in the same spinlock")
combined the start/continue and stop/pause functions, and in doing so
changed the event code for the pause case to SNDRV_TIMER_EVENT_CONTINUE.
Change it back to SNDRV_TIMER_EVENT_PAUSE.

Fixes: f65e0d299807 ("ALSA: timer: Call notifier in the same spinlock")
Signed-off-by: Ben Hutchings <[email protected]>
Cc: [email protected]
Signed-off-by: Takashi Iwai <[email protected]>

powerpc/64s: Clear PCR on boot

Clear the PCR (Processor Compatibility Register) on boot to ensure we
are not running in a compatibility mode.

We've seen this cause problems when a crash (and kdump) occurs while
running compat mode guests. The kdump kernel then runs with the PCR
set and causes problems. The symptom in the kdump kernel (also seen in
petitboot after fast-reboot) is early userspace programs taking
sigills on newer instructions (seen in libc).

Signed-off-by: Michael Neuling <[email protected]>
Cc: [email protected]
Signed-off-by: Michael Ellerman <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-05-18

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix two bugs in sockmap, a use after free in sockmap's error path
   from sock_map_ctx_update_elem() where we mistakenly drop a reference
   we didn't take prior to that, and in the same function fix a race
   in bpf_prog_inc_not_zero() where we didn't use the progs from prior
   READ_ONCE(), from John.

2) Reject program expansions once we figure out that their jump target
   which crosses patchlet boundaries could otherwise get truncated in
   insn->off space, from Daniel.

3) Check the return value of fopen() in BPF selftest's test_verifier
   where we determine whether unpriv BPF is disabled, and iff we do
   fail there then just assume it is disabled. This fixes a segfault
   when used with older kernels, from Jesper.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'drm-intel-fixes-2018-05-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Userptr IOCTL zero size check (Matt)
- Two hardware quirk fixes (Michel & Chris)

* tag 'drm-intel-fixes-2018-05-17' of git://anongit.freedesktop.org/drm/drm-intel:
  drm/i915/gen9: Add WaClearHIZ_WM_CHICKEN3 for bxt and glk
  drm/i915/execlists: Use rmb() to order CSB reads
  drm/i915/userptr: reject zero user_size

bpf: fix truncated jump targets on heavy expansions

Recently during testing, I ran into the following panic:

  [  207.892422] Internal error: Accessing user space memory outside uaccess.h routines: 96000004 [#1] SMP
  [  207.901637] Modules linked in: binfmt_misc [...]
  [  207.966530] CPU: 45 PID: 2256 Comm: test_verifier Tainted: G        W         4.17.0-rc3+ #7
  [  207.974956] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017
  [  207.982428] pstate: 60400005 (nZCv daif +PAN -UAO)
  [  207.987214] pc : bpf_skb_load_helper_8_no_cache+0x34/0xc0
  [  207.992603] lr : 0xffff000000bdb754
  [  207.996080] sp : ffff000013703ca0
  [  207.999384] x29: ffff000013703ca0 x28: 0000000000000001
  [  208.004688] x27: 0000000000000001 x26: 0000000000000000
  [  208.009992] x25: ffff000013703ce0 x24: ffff800fb4afcb00
  [  208.015295] x23: ffff00007d2f5038 x22: ffff00007d2f5000
  [  208.020599] x21: fffffffffeff2a6f x20: 000000000000000a
  [  208.025903] x19: ffff000009578000 x18: 0000000000000a03
  [  208.031206] x17: 0000000000000000 x16: 0000000000000000
  [  208.036510] x15: 0000ffff9de83000 x14: 0000000000000000
  [  208.041813] x13: 0000000000000000 x12: 0000000000000000
  [  208.047116] x11: 0000000000000001 x10: ffff0000089e7f18
  [  208.052419] x9 : fffffffffeff2a6f x8 : 0000000000000000
  [  208.057723] x7 : 000000000000000a x6 : 00280c6160000000
  [  208.063026] x5 : 0000000000000018 x4 : 0000000000007db6
  [  208.068329] x3 : 000000000008647a x2 : 19868179b1484500
  [  208.073632] x1 : 0000000000000000 x0 : ffff000009578c08
  [  208.078938] Process test_verifier (pid: 2256, stack limit = 0x0000000049ca7974)
  [  208.086235] Call trace:
  [  208.088672]  bpf_skb_load_helper_8_no_cache+0x34/0xc0
  [  208.093713]  0xffff000000bdb754
  [  208.096845]  bpf_test_run+0x78/0xf8
  [  208.100324]  bpf_prog_test_run_skb+0x148/0x230
  [  208.104758]  sys_bpf+0x314/0x1198
  [  208.108064]  el0_svc_naked+0x30/0x34
  [  208.111632] Code: 91302260 f9400001 f9001fa1 d2800001 (29500680)
  [  208.117717] ---[ end trace 263cb8a59b5bf29f ]---

The program itself which caused this had a long jump over the whole
instruction sequence where all of the inner instructions required
heavy expansions into multiple BPF instructions. Additionally, I also
had BPF hardening enabled which requires once more rewrites of all
constant values in order to blind them. Each time we rewrite insns,
bpf_adj_branches() would need to potentially adjust branch targets
which cross the patchlet boundary to accommodate for the additional
delta. Eventually that lead to the case where the target offset could
not fit into insn->off's upper 0x7fff limit anymore where then offset
wraps around becoming negative (in s16 universe), or vice versa
depending on the jump direction.

Therefore it becomes necessary to detect and reject any such occasions
in a generic way for native eBPF and cBPF to eBPF migrations. For
the latter we can simply check bounds in the bpf_convert_filter()'s
BPF_EMIT_JMP helper macro and bail out once we surpass limits. The
bpf_patch_insn_single() for native eBPF (and cBPF to eBPF in case
of subsequent hardening) is a bit more complex in that we need to
detect such truncations before hitting the bpf_prog_realloc(). Thus
the latter is split into an extra pass to probe problematic offsets
on the original program in order to fail early. With that in place
and carefully tested I no longer hit the panic and the rewrites are
rejected properly. The above example panic I've seen on bpf-next,
though the issue itself is generic in that a guard against this issue
in bpf seems more appropriate in this case.

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
"Two k10temp fixes:

   - fix race condition when accessing System Management Network
     registers

   - fix reading critical temperatures on F15h M60h and M70h

  Also add PCI ID's for the AMD Raven Ridge root bridge"

* tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (k10temp) Use API function to access System Management Network
  x86/amd_nb: Add support for Raven Ridge CPUs
  hwmon: (k10temp) Fix reading critical temperature register

bpf: parse and verdict prog attach may race with bpf map update

In the sockmap design BPF programs (SK_SKB_STREAM_PARSER,
SK_SKB_STREAM_VERDICT and SK_MSG_VERDICT) are attached to the sockmap
map type and when a sock is added to the map the programs are used by
the socket. However, sockmap updates from both userspace and BPF
programs can happen concurrently with the attach and detach of these
programs.

To resolve this we use the bpf_prog_inc_not_zero and a READ_ONCE()
primitive to ensure the program pointer is not refeched and
possibly NULL'd before the refcnt increment. This happens inside
a RCU critical section so although the pointer reference in the map
object may be NULL (by a concurrent detach operation) the reference
from READ_ONCE will not be free'd until after grace period. This
ensures the object returned by READ_ONCE() is valid through the
RCU criticl section and safe to use as long as we "know" it may
be free'd shortly.

Daniel spotted a case in the sock update API where instead of using
the READ_ONCE() program reference we used the pointer from the
original map, stab->bpf_{verdict|parse|txmsg}. The problem with this
is the logic checks the object returned from the READ_ONCE() is not
NULL and then tries to reference the object again but using the
above map pointer, which may have already been NULL'd by a parallel
detach operation. If this happened bpf_porg_inc_not_zero could
dereference a NULL pointer.

Fix this by using variable returned by READ_ONCE() that is checked
for NULL.

Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
Reported-by: Daniel Borkmann <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap update rollback on error can incorrectly dec prog refcnt

If the user were to only attach one of the parse or verdict programs
then it is possible a subsequent sockmap update could incorrectly
decrement the refcnt on the program. This happens because in the
rollback logic, after an error, we have to decrement the program
reference count when its been incremented. However, we only increment
the program reference count if the user has both a verdict and a
parse program. The reason for this is because, at least at the
moment, both are required for any one to be meaningful. The problem
fixed here is in the rollback path we decrement the program refcnt
even if only one existing. But we never incremented the refcnt in
the first place creating an imbalance.

This patch fixes the error path to handle this case.

Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
Reported-by: Daniel Borkmann <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

net: test tailroom before appending to linear skb

Device features may change during transmission. In particular with
corking, a device may toggle scatter-gather in between allocating
and writing to an skb.

Do not unconditionally assume that !NETIF_F_SG at write time implies
that the same held at alloc time and thus the skb has sufficient
tailroom.

This issue predates git history.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ip6_gre-Fixes-in-headroom-handling'

Petr Machata says:

====================
net: ip6_gre: Fixes in headroom handling

This series mends some problems in headroom management in ip6_gre
module. The current code base has the following three closely-related
problems:

- ip6gretap tunnels neglect to ensure there's enough writable headroom
  before pushing GRE headers.

- ip6erspan does this, but assumes that dev->needed_headroom is primed.
  But that doesn't happen until ip6_tnl_xmit() is called later. Thus for
  the first packet, ip6erspan actually behaves like ip6gretap above.

- ip6erspan shares some of the code with ip6gretap, including
  calculations of needed header length. While there is custom
  ERSPAN-specific code for calculating the headroom, the computed
  values are overwritten by the ip6gretap code.

The first two issues lead to a kernel panic in situations where a packet
is mirrored from a veth device to the device in question. They are
fixed, respectively, in patches #1 and #2, which include the full panic
trace and a reproducer.

The rest of the patchset deals with the last issue. In patches #3 to #6,
several functions are split up into reusable parts. Finally in patch #7
these blocks are used to compose ERSPAN-specific callbacks where
necessary to fix the hlen calculation.
====================

Signed-off-by: David S. Miller <[email protected]>

net: ip6_gre: Fix ip6erspan hlen calculation

Even though ip6erspan_tap_init() sets up hlen and tun_hlen according to
what ERSPAN needs, it goes ahead to call ip6gre_tnl_link_config() which
overwrites these settings with GRE-specific ones.

Similarly for changelink callbacks, which are handled by
ip6gre_changelink() calls ip6gre_tnl_change() calls
ip6gre_tnl_link_config() as well.

The difference ends up being 12 vs. 20 bytes, and this is generally not
a problem, because a 12-byte request likely ends up allocating more and
the extra 8 bytes are thus available. However correct it is not.

So replace the newlink and changelink callbacks with an ERSPAN-specific
ones, reusing the newly-introduced _common() functions.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <[email protected]>
Acked-by: William Tu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ip6_gre: Split up ip6gre_changelink()

Extract from ip6gre_changelink() a reusable function
ip6gre_changelink_common(). This will allow introduction of
ERSPAN-specific _changelink() function with not a lot of code
duplication.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <[email protected]>
Acked-by: William Tu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ip6_gre: Split up ip6gre_newlink()

Extract from ip6gre_newlink() a reusable function
ip6gre_newlink_common(). The ip6gre_tnl_link_config() call needs to be
made customizable for ERSPAN, thus reorder it with calls to
ip6_tnl_change_mtu() and dev_hold(), and extract the whole tail to the
caller, ip6gre_newlink(). Thus enable an ERSPAN-specific _newlink()
function without a lot of duplicity.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <[email protected]>
Acked-by: William Tu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ip6_gre: Split up ip6gre_tnl_change()

Split a reusable function ip6gre_tnl_copy_tnl_parm() from
ip6gre_tnl_change(). This will allow ERSPAN-specific code to
reuse the common parts while customizing the behavior for ERSPAN.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <[email protected]>
Acked-by: William Tu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ip6_gre: Split up ip6gre_tnl_link_config()

The function ip6gre_tnl_link_config() is used for setting up
configuration of both ip6gretap and ip6erspan tunnels. Split the
function into the common part and the route-lookup part. The latter then
takes the calculated header length as an argument. This split will allow
the patches down the line to sneak in a custom header length computation
for the ERSPAN tunnel.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <[email protected]>
Acked-by: William Tu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()

dev->needed_headroom is not primed until ip6_tnl_xmit(), so it starts
out zero. Thus the call to skb_cow_head() fails to actually make sure
there's enough headroom to push the ERSPAN headers to. That can lead to
the panic cited below. (Reproducer below that).

Fix by requesting either needed_headroom if already primed, or just the
bare minimum needed for the header otherwise.

[  190.703567] kernel BUG at net/core/skbuff.c:104!
[  190.708384] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  190.714007] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
[  190.728975] CPU: 1 PID: 959 Comm: kworker/1:2 Not tainted 4.17.0-rc4-net_master-custom-139 #10
[  190.737647] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
[  190.747006] Workqueue: ipv6_addrconf addrconf_dad_work
[  190.752222] RIP: 0010:skb_panic+0xc3/0x100
[  190.756358] RSP: 0018:ffff8801d54072f0 EFLAGS: 00010282
[  190.761629] RAX: 0000000000000085 RBX: ffff8801c1a8ecc0 RCX: 0000000000000000
[  190.768830] RDX: 0000000000000085 RSI: dffffc0000000000 RDI: ffffed003aa80e54
[  190.776025] RBP: ffff8801bd1ec5a0 R08: ffffed003aabce19 R09: ffffed003aabce19
[  190.783226] R10: 0000000000000001 R11: ffffed003aabce18 R12: ffff8801bf695dbe
[  190.790418] R13: 0000000000000084 R14: 00000000000006c0 R15: ffff8801bf695dc8
[  190.797621] FS:  0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
[  190.805786] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  190.811582] CR2: 000055fa929aced0 CR3: 0000000003228004 CR4: 00000000001606e0
[  190.818790] Call Trace:
[  190.821264]  <IRQ>
[  190.823314]  ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
[  190.828940]  ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
[  190.834562]  skb_push+0x78/0x90
[  190.837749]  ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
[  190.843219]  ? ip6gre_tunnel_ioctl+0xd90/0xd90 [ip6_gre]
[  190.848577]  ? debug_check_no_locks_freed+0x210/0x210
[  190.853679]  ? debug_check_no_locks_freed+0x210/0x210
[  190.858783]  ? print_irqtrace_events+0x120/0x120
[  190.863451]  ? sched_clock_cpu+0x18/0x210
[  190.867496]  ? cyc2ns_read_end+0x10/0x10
[  190.871474]  ? skb_network_protocol+0x76/0x200
[  190.875977]  dev_hard_start_xmit+0x137/0x770
[  190.880317]  ? do_raw_spin_trylock+0x6d/0xa0
[  190.884624]  sch_direct_xmit+0x2ef/0x5d0
[  190.888589]  ? pfifo_fast_dequeue+0x3fa/0x670
[  190.892994]  ? pfifo_fast_change_tx_queue_len+0x810/0x810
[  190.898455]  ? __lock_is_held+0xa0/0x160
[  190.902422]  __qdisc_run+0x39e/0xfc0
[  190.906041]  ? _raw_spin_unlock+0x29/0x40
[  190.910090]  ? pfifo_fast_enqueue+0x24b/0x3e0
[  190.914501]  ? sch_direct_xmit+0x5d0/0x5d0
[  190.918658]  ? pfifo_fast_dequeue+0x670/0x670
[  190.923047]  ? __dev_queue_xmit+0x172/0x1770
[  190.927365]  ? preempt_count_sub+0xf/0xd0
[  190.931421]  __dev_queue_xmit+0x410/0x1770
[  190.935553]  ? ___slab_alloc+0x605/0x930
[  190.939524]  ? print_irqtrace_events+0x120/0x120
[  190.944186]  ? memcpy+0x34/0x50
[  190.947364]  ? netdev_pick_tx+0x1c0/0x1c0
[  190.951428]  ? __skb_clone+0x2fd/0x3d0
[  190.955218]  ? __copy_skb_header+0x270/0x270
[  190.959537]  ? rcu_read_lock_sched_held+0x93/0xa0
[  190.964282]  ? kmem_cache_alloc+0x344/0x4d0
[  190.968520]  ? cyc2ns_read_end+0x10/0x10
[  190.972495]  ? skb_clone+0x123/0x230
[  190.976112]  ? skb_split+0x820/0x820
[  190.979747]  ? tcf_mirred+0x554/0x930 [act_mirred]
[  190.984582]  tcf_mirred+0x554/0x930 [act_mirred]
[  190.989252]  ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
[  190.996109]  ? __lock_acquire+0x706/0x26e0
[  191.000239]  ? sched_clock_cpu+0x18/0x210
[  191.004294]  tcf_action_exec+0xcf/0x2a0
[  191.008179]  tcf_classify+0xfa/0x340
[  191.011794]  __netif_receive_skb_core+0x8e1/0x1c60
[  191.016630]  ? debug_check_no_locks_freed+0x210/0x210
[  191.021732]  ? nf_ingress+0x500/0x500
[  191.025458]  ? process_backlog+0x347/0x4b0
[  191.029619]  ? print_irqtrace_events+0x120/0x120
[  191.034302]  ? lock_acquire+0xd8/0x320
[  191.038089]  ? process_backlog+0x1b6/0x4b0
[  191.042246]  ? process_backlog+0xc2/0x4b0
[  191.046303]  process_backlog+0xc2/0x4b0
[  191.050189]  net_rx_action+0x5cc/0x980
[  191.053991]  ? napi_complete_done+0x2c0/0x2c0
[  191.058386]  ? mark_lock+0x13d/0xb40
[  191.062001]  ? clockevents_program_event+0x6b/0x1d0
[  191.066922]  ? print_irqtrace_events+0x120/0x120
[  191.071593]  ? __lock_is_held+0xa0/0x160
[  191.075566]  __do_softirq+0x1d4/0x9d2
[  191.079282]  ? ip6_finish_output2+0x524/0x1460
[  191.083771]  do_softirq_own_stack+0x2a/0x40
[  191.087994]  </IRQ>
[  191.090130]  do_softirq.part.13+0x38/0x40
[  191.094178]  __local_bh_enable_ip+0x135/0x190
[  191.098591]  ip6_finish_output2+0x54d/0x1460
[  191.102916]  ? ip6_forward_finish+0x2f0/0x2f0
[  191.107314]  ? ip6_mtu+0x3c/0x2c0
[  191.110674]  ? ip6_finish_output+0x2f8/0x650
[  191.114992]  ? ip6_output+0x12a/0x500
[  191.118696]  ip6_output+0x12a/0x500
[  191.122223]  ? ip6_route_dev_notify+0x5b0/0x5b0
[  191.126807]  ? ip6_finish_output+0x650/0x650
[  191.131120]  ? ip6_fragment+0x1a60/0x1a60
[  191.135182]  ? icmp6_dst_alloc+0x26e/0x470
[  191.139317]  mld_sendpack+0x672/0x830
[  191.143021]  ? igmp6_mcf_seq_next+0x2f0/0x2f0
[  191.147429]  ? __local_bh_enable_ip+0x77/0x190
[  191.151913]  ipv6_mc_dad_complete+0x47/0x90
[  191.156144]  addrconf_dad_completed+0x561/0x720
[  191.160731]  ? addrconf_rs_timer+0x3a0/0x3a0
[  191.165036]  ? mark_held_locks+0xc9/0x140
[  191.169095]  ? __local_bh_enable_ip+0x77/0x190
[  191.173570]  ? addrconf_dad_work+0x50d/0xa20
[  191.177886]  ? addrconf_dad_work+0x529/0xa20
[  191.182194]  addrconf_dad_work+0x529/0xa20
[  191.186342]  ? addrconf_dad_completed+0x720/0x720
[  191.191088]  ? __lock_is_held+0xa0/0x160
[  191.195059]  ? process_one_work+0x45d/0xe20
[  191.199302]  ? process_one_work+0x51e/0xe20
[  191.203531]  ? rcu_read_lock_sched_held+0x93/0xa0
[  191.208279]  process_one_work+0x51e/0xe20
[  191.212340]  ? pwq_dec_nr_in_flight+0x200/0x200
[  191.216912]  ? get_lock_stats+0x4b/0xf0
[  191.220788]  ? preempt_count_sub+0xf/0xd0
[  191.224844]  ? worker_thread+0x219/0x860
[  191.228823]  ? do_raw_spin_trylock+0x6d/0xa0
[  191.233142]  worker_thread+0xeb/0x860
[  191.236848]  ? process_one_work+0xe20/0xe20
[  191.241095]  kthread+0x206/0x300
[  191.244352]  ? process_one_work+0xe20/0xe20
[  191.248587]  ? kthread_stop+0x570/0x570
[  191.252459]  ret_from_fork+0x3a/0x50
[  191.256082] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
[  191.275327] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d54072f0
[  191.281024] ---[ end trace 7ea51094e099e006 ]---
[  191.285724] Kernel panic - not syncing: Fatal exception in interrupt
[  191.292168] Kernel Offset: disabled
[  191.295697] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Reproducer:

ip link add h1 type veth peer name swp1
ip link add h3 type veth peer name swp3

ip link set dev h1 up
ip address add 192.0.2.1/28 dev h1

ip link add dev vh3 type vrf table 20
ip link set dev h3 master vh3
ip link set dev vh3 up
ip link set dev h3 up

ip link set dev swp3 up
ip address add dev swp3 2001:db8:2::1/64

ip link set dev swp1 up
tc qdisc add dev swp1 clsact

ip link add name gt6 type ip6erspan \
local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123
ip link set dev gt6 up

sleep 1

tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
action mirred egress mirror dev gt6
ping -I h1 192.0.2.2

Fixes: e41c7c68ea77 ("ip6erspan: make sure enough headroom at xmit.")
Signed-off-by: Petr Machata <[email protected]>
Acked-by: William Tu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ip6_gre: Request headroom in __gre6_xmit()

__gre6_xmit() pushes GRE headers before handing over to ip6_tnl_xmit()
for generic IP-in-IP processing. However it doesn't make sure that there
is enough headroom to push the header to. That can lead to the panic
cited below. (Reproducer below that).

Fix by requesting either needed_headroom if already primed, or just the
bare minimum needed for the header otherwise.

[  158.576725] kernel BUG at net/core/skbuff.c:104!
[  158.581510] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  158.587174] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
[  158.602268] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 4.17.0-rc4-net_master-custom-139 #10
[  158.610938] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
[  158.620426] RIP: 0010:skb_panic+0xc3/0x100
[  158.624586] RSP: 0018:ffff8801d3f27110 EFLAGS: 00010286
[  158.629882] RAX: 0000000000000082 RBX: ffff8801c02cc040 RCX: 0000000000000000
[  158.637127] RDX: 0000000000000082 RSI: dffffc0000000000 RDI: ffffed003a7e4e18
[  158.644366] RBP: ffff8801bfec8020 R08: ffffed003aabce19 R09: ffffed003aabce19
[  158.651574] R10: 000000000000000b R11: ffffed003aabce18 R12: ffff8801c364de66
[  158.658786] R13: 000000000000002c R14: 00000000000000c0 R15: ffff8801c364de68
[  158.666007] FS:  0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
[  158.674212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.680036] CR2: 00007f4b3702dcd0 CR3: 0000000003228002 CR4: 00000000001606e0
[  158.687228] Call Trace:
[  158.689752]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
[  158.694475]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
[  158.699141]  skb_push+0x78/0x90
[  158.702344]  __gre6_xmit+0x246/0xd80 [ip6_gre]
[  158.706872]  ip6gre_tunnel_xmit+0x3bc/0x610 [ip6_gre]
[  158.711992]  ? __gre6_xmit+0xd80/0xd80 [ip6_gre]
[  158.716668]  ? debug_check_no_locks_freed+0x210/0x210
[  158.721761]  ? print_irqtrace_events+0x120/0x120
[  158.726461]  ? sched_clock_cpu+0x18/0x210
[  158.730572]  ? sched_clock_cpu+0x18/0x210
[  158.734692]  ? cyc2ns_read_end+0x10/0x10
[  158.738705]  ? skb_network_protocol+0x76/0x200
[  158.743216]  ? netif_skb_features+0x1b2/0x550
[  158.747648]  dev_hard_start_xmit+0x137/0x770
[  158.752010]  sch_direct_xmit+0x2ef/0x5d0
[  158.755992]  ? pfifo_fast_dequeue+0x3fa/0x670
[  158.760460]  ? pfifo_fast_change_tx_queue_len+0x810/0x810
[  158.765975]  ? __lock_is_held+0xa0/0x160
[  158.770002]  __qdisc_run+0x39e/0xfc0
[  158.773673]  ? _raw_spin_unlock+0x29/0x40
[  158.777781]  ? pfifo_fast_enqueue+0x24b/0x3e0
[  158.782191]  ? sch_direct_xmit+0x5d0/0x5d0
[  158.786372]  ? pfifo_fast_dequeue+0x670/0x670
[  158.790818]  ? __dev_queue_xmit+0x172/0x1770
[  158.795195]  ? preempt_count_sub+0xf/0xd0
[  158.799313]  __dev_queue_xmit+0x410/0x1770
[  158.803512]  ? ___slab_alloc+0x605/0x930
[  158.807525]  ? ___slab_alloc+0x605/0x930
[  158.811540]  ? memcpy+0x34/0x50
[  158.814768]  ? netdev_pick_tx+0x1c0/0x1c0
[  158.818895]  ? __skb_clone+0x2fd/0x3d0
[  158.822712]  ? __copy_skb_header+0x270/0x270
[  158.827079]  ? rcu_read_lock_sched_held+0x93/0xa0
[  158.831903]  ? kmem_cache_alloc+0x344/0x4d0
[  158.836199]  ? skb_clone+0x123/0x230
[  158.839869]  ? skb_split+0x820/0x820
[  158.843521]  ? tcf_mirred+0x554/0x930 [act_mirred]
[  158.848407]  tcf_mirred+0x554/0x930 [act_mirred]
[  158.853104]  ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
[  158.860005]  ? __lock_acquire+0x706/0x26e0
[  158.864162]  ? mark_lock+0x13d/0xb40
[  158.867832]  tcf_action_exec+0xcf/0x2a0
[  158.871736]  tcf_classify+0xfa/0x340
[  158.875402]  __netif_receive_skb_core+0x8e1/0x1c60
[  158.880334]  ? nf_ingress+0x500/0x500
[  158.884059]  ? process_backlog+0x347/0x4b0
[  158.888241]  ? lock_acquire+0xd8/0x320
[  158.892050]  ? process_backlog+0x1b6/0x4b0
[  158.896228]  ? process_backlog+0xc2/0x4b0
[  158.900291]  process_backlog+0xc2/0x4b0
[  158.904210]  net_rx_action+0x5cc/0x980
[  158.908047]  ? napi_complete_done+0x2c0/0x2c0
[  158.912525]  ? rcu_read_unlock+0x80/0x80
[  158.916534]  ? __lock_is_held+0x34/0x160
[  158.920541]  __do_softirq+0x1d4/0x9d2
[  158.924308]  ? trace_event_raw_event_irq_handler_exit+0x140/0x140
[  158.930515]  run_ksoftirqd+0x1d/0x40
[  158.934152]  smpboot_thread_fn+0x32b/0x690
[  158.938299]  ? sort_range+0x20/0x20
[  158.941842]  ? preempt_count_sub+0xf/0xd0
[  158.945940]  ? schedule+0x5b/0x140
[  158.949412]  kthread+0x206/0x300
[  158.952689]  ? sort_range+0x20/0x20
[  158.956249]  ? kthread_stop+0x570/0x570
[  158.960164]  ret_from_fork+0x3a/0x50
[  158.963823] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
[  158.983235] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d3f27110
[  158.988935] ---[ end trace 5af56ee845aa6cc8 ]---
[  158.993641] Kernel panic - not syncing: Fatal exception in interrupt
[  159.000176] Kernel Offset: disabled
[  159.003767] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Reproducer:

ip link add h1 type veth peer name swp1
ip link add h3 type veth peer name swp3

ip link set dev h1 up
ip address add 192.0.2.1/28 dev h1

ip link add dev vh3 type vrf table 20
ip link set dev h3 master vh3
ip link set dev vh3 up
ip link set dev h3 up

ip link set dev swp3 up
ip address add dev swp3 2001:db8:2::1/64

ip link set dev swp1 up
tc qdisc add dev swp1 clsact

ip link add name gt6 type ip6gretap \
local 2001:db8:2::1 remote 2001:db8:2::2
ip link set dev gt6 up

sleep 1

tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
action mirred egress mirror dev gt6
ping -I h1 192.0.2.2

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Petr Machata <[email protected]>
Acked-by: William Tu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests/bpf: check return value of fopen in test_verifier.c

Commit 0a6748740368 ("selftests/bpf: Only run tests if !bpf_disabled")
forgot to check return value of fopen.

This caused some confusion, when running test_verifier (from
tools/testing/selftests/bpf/) on an older kernel (< v4.4) as it will
simply seqfault.

This fix avoids the segfault and prints an error, but allow program to
continue. Given the sysctl was introduced in 1be7f75d1668 ("bpf:
enable non-root eBPF programs"), we know that the running kernel
cannot support unpriv, thus continue with unpriv_disabled = true.

Fixes: 0a6748740368 ("selftests/bpf: Only run tests if !bpf_disabled")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

erspan: fix invalid erspan version.

ERSPAN only support version 1 and 2. When packets send to an
erspan device which does not have proper version number set,
drop the packet. In real case, we observe multicast packets
sent to the erspan pernet device, erspan0, which does not have
erspan version configured.

Reported-by: Greg Rose <[email protected]>
Signed-off-by: William Tu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

x86/apic/x2apic: Initialize cluster ID properly

Rick bisected a regression on large systems which use the x2apic cluster
mode for interrupt delivery to the commit wich reworked the cluster
management.

The problem is caused by a missing initialization of the clusterid field
in the shared cluster data structures. So all structures end up with
cluster ID 0 which only allows sharing between all CPUs which belong to
cluster 0. All other CPUs with a cluster ID > 0 cannot share the data
structure because they cannot find existing data with their cluster
ID. This causes malfunction with IPIs because IPIs are sent to the wrong
cluster and the caller waits for ever that the target CPU handles the IPI.

Add the missing initialization when a upcoming CPU is the first in a
cluster so that the later booting CPUs can find the data and share it for
proper operation.

Fixes: 023a611748fd ("x86/apic/x2apic: Simplify cluster management")
Reported-by: Rick Warner <[email protected]>
Bisected-by: Rick Warner <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Rick Warner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

Merge branch 'ibmvnic-Fix-bugs-and-memory-leaks'

Thomas Falcon says:

====================
ibmvnic: Fix bugs and memory leaks

This is a small patch series fixing up some bugs and memory leaks
in the ibmvnic driver. The first fix frees up previously allocated
memory that should be freed in case of an error. The second fixes
a reset case that was failing due to TX/RX queue IRQ's being
erroneously disabled without being enabled again. The final patch
fixes incorrect reallocated of statistics buffers during a device
reset, resulting in loss of statistics information and a memory leak.
====================

Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Fix statistics buffers memory leak

Move initialization of statistics buffers from ibmvnic_init function
into ibmvnic_probe. In the current state, ibmvnic_init will be called
again during a device reset, resulting in the allocation of new
buffers without freeing the old ones.

Signed-off-by: Thomas Falcon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Fix non-fatal firmware error reset

It is not necessary to disable interrupt lines here during a reset
to handle a non-fatal firmware error. Move that call within the code
block that handles the other cases that do require interrupts to be
disabled and re-enabled.

Signed-off-by: Thomas Falcon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Free coherent DMA memory if FW map failed

If the firmware map fails for whatever reason, remember to free
up the memory after.

Signed-off-by: Thomas Falcon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv4: Initialize proto and ports in flow struct

Updating the FIB tracepoint for the recent change to allow rules using
the protocol and ports exposed a few places where the entries in the flow
struct are not initialized.

For __fib_validate_source add the call to fib4_rules_early_flow_dissect
since it is invoked for the input path. For netfilter, add the memset on
the flow struct to avoid future problems like this. In ip_route_input_slow
need to set the fields if the skb dissection does not happen.

Fixes: bfff4862653b ("net: fib_rules: support for match on ip_proto, sport and dport")
Signed-off-by: David Ahern <[email protected]>
Acked-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: don't use stack memory in a scatterlist

scatterlist code expects virt_to_page() to work, which fails with
CONFIG_VMAP_STACK=y.

Fixes: c46234ebb4d1e ("tls: RX path for ktls")
Signed-off-by: Matt Mullins <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

- ARM/ARM64 locking fixes

- x86 fixes: PCID, UMIP, locking

- improved support for recent Windows version that have a 2048 Hz APIC
   timer

- rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME

- better behaved selftests

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
  KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls
  KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock
  KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity
  KVM: arm/arm64: Properly protect VGIC locks from IRQs
  KVM: X86: Lower the default timer frequency limit to 200us
  KVM: vmx: update sec exec controls for UMIP iff emulating UMIP
  kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled
  KVM: selftests: exit with 0 status code when tests cannot be run
  KVM: hyperv: idr_find needs RCU protection
  x86: Delay skip of emulated hypercall instruction
  KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs

Merge tag 'kvm-s390-master-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

KVM: s390: Fix vsie handling for transactional diagnostic block

vsie (nested KVM) might reject a valid input. Fix it.

Merge tag 'sound-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"We have a core fix in the compat code for covering a potential race
  (double references), but it's a very minor change.

  The rest are all small device-specific quirks, as well as a correction
  of the new UAC3 support code"

* tag 'sound-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Use Class Specific EP for UAC3 devices.
  ALSA: hda/realtek - Clevo P950ER ALC1220 Fixup
  ALSA: usb: mixer: volume quirk for CM102-A+/102S+
  ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
  ALSA: control: fix a redundant-copy issue

kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME

KVM_HINTS_DEDICATED seems to be somewhat confusing:

Guest doesn't really care whether it's the only task running on a host
CPU as long as it's not preempted.

And there are more reasons for Guest to be preempted than host CPU
sharing, for example, with memory overcommit it can get preempted on a
memory access, post copy migration can cause preemption, etc.

Let's call it KVM_HINTS_REALTIME which seems to better
match what guests expect.

Also, the flag most be set on all vCPUs - current guests assume this.
Note so in the documentation.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:

- a fix for the vfio ccw translation code

- update an incorrect email address in the MAINTAINERS file

- fix a division by zero oops in the cpum_sf code found by trinity

- two fixes for the error handling of the qdio code

- several spectre related patches to convert all left-over indirect
   branches in the kernel to expoline branches

- update defconfigs to avoid warnings due to the netfilter Kconfig
   changes

- avoid several compiler warnings in the kexec_file code for s390

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/qdio: don't release memory in qdio_setup_irq()
  s390/qdio: fix access to uninitialized qdio_q fields
  s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero
  s390: use expoline thunks in the BPF JIT
  s390: extend expoline to BC instructions
  s390: remove indirect branch from do_softirq_own_stack
  s390: move spectre sysfs attribute code
  s390/kernel: use expoline for indirect branches
  s390/ftrace: use expoline for indirect branches
  s390/lib: use expoline for indirect branches
  s390/crc32-vx: use expoline for indirect branches
  s390: move expoline assembler macros to a header
  vfio: ccw: fix cleanup if cp_prefetch fails
  s390/kexec_file: add declaration of purgatory related globals
  s390: update defconfigs
  MAINTAINERS: update s390 zcrypt maintainers email address

Merge tag 'selinux-pr-20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fixes from Paul Moore:
"A small pull request to fix a few regressions in the SELinux/SCTP code
  with applications that call bind() with AF_UNSPEC/INADDR_ANY.

  The individual commit descriptions have more information, but the
  commits themselves should be self explanatory"

* tag 'selinux-pr-20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: correctly handle sa_family cases in selinux_sctp_bind_connect()
  selinux: fix address family in bind() and connect() to match address/port
  selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()

proc: do not access cmdline nor environ from file-backed areas

proc_pid_cmdline_read() and environ_read() directly access the target
process' VM to retrieve the command line and environment. If this
process remaps these areas onto a file via mmap(), the requesting
process may experience various issues such as extra delays if the
underlying device is slow to respond.

Let's simply refuse to access file-backed areas in these functions.
For this we add a new FOLL_ANON gup flag that is passed to all calls
to access_remote_vm(). The code already takes care of such failures
(including unmapped areas). Accesses via /proc/pid/mem were not
changed though.

This was assigned CVE-2018-1120.

Note for stable backports: the patch may apply to kernels prior to 4.11
but silently miss one location; it must be checked that no call to
access_remote_vm() keeps zero as the last argument.

Reported-by: Qualys Security Advisory <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: [email protected]
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n

Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
returns the return value of debugfs_create_dir() to bcache_init(). When
CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
bcache_init() failedi.

This patch makes bch_debug_init() always returns 0 if CONFIG_DEBUG_FS=n,
so bcache can continue to work for the kernels which don't have debugfs
enanbled.

Changelog:
v4: Add Acked-by from Kent Overstreet.
v3: Use IS_ENABLED(CONFIG_DEBUG_FS) to replace #ifdef DEBUG_FS.
v2: Remove a warning information
v1: Initial version.

Fixes: Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
Cc: [email protected]
Signed-off-by: Coly Li <[email protected]>
Reported-by: Massimo B. <[email protected]>
Reported-by: Kai Krakow <[email protected]>
Tested-by: Kai Krakow <[email protected]>
Acked-by: Kent Overstreet <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD

Expose the new virtualized architectural mechanism, VIRT_SSBD, for using
speculative store bypass disable (SSBD) under SVM. This will allow guests
to use SSBD on hardware that uses non-architectural mechanisms for enabling
SSBD.

[ tglx: Folded the migration fixup from Paolo Bonzini ]

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG

Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to
x86_virt_spec_ctrl(). If either X86_FEATURE_LS_CFG_SSBD or
X86_FEATURE_VIRT_SPEC_CTRL is set then use the new guest_virt_spec_ctrl
argument to check whether the state must be modified on the host. The
update reuses speculative_store_bypass_update() so the ZEN-specific sibling
coordination can be reused.

Signed-off-by: Thomas Gleixner <[email protected]>

x86/bugs: Rework spec_ctrl base and mask logic

x86_spec_ctrL_mask is intended to mask out bits from a MSR_SPEC_CTRL value
which are not to be modified. However the implementation is not really used
and the bitmask was inverted to make a check easier, which was removed in
"x86/bugs: Remove x86_spec_ctrl_set()"

Aside of that it is missing the STIBP bit if it is supported by the
platform, so if the mask would be used in x86_virt_spec_ctrl() then it
would prevent a guest from setting STIBP.

Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to
sanitize the value which is supplied by the guest.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>

x86/bugs: Remove x86_spec_ctrl_set()

x86_spec_ctrl_set() is only used in bugs.c and the extra mask checks there
provide no real value as both call sites can just write x86_spec_ctrl_base
to MSR_SPEC_CTRL. x86_spec_ctrl_base is valid and does not need any extra
masking or checking.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

x86/bugs: Expose x86_spec_ctrl_base directly

x86_spec_ctrl_base is the system wide default value for the SPEC_CTRL MSR.
x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to
prevent modification to that variable. Though the variable is read only
after init and globaly visible already.

Remove the function and export the variable instead.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>