Git Repo - linux.git/log

perf evlist: Remove stale mmap read for backward

perf_evlist__mmap_read_catchup() and perf_evlist__mmap_read_backward()
are only for overwrite mode.

But they read the evlist->mmap buffer which is for non-overwrite mode.

It did not bring any serious problem yet, because there is no one use
it.

Remove the unused interfaces.

Signed-off-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Wang Nan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf vendor events aarch64: Add JSON metrics for ARM Cortex-A53 Processor

Add JSON metrics for ARM Cortex-A53 Processor.

Unlike the Intel processors there isn't a script that automatically
generated these files. The patch was manually generated from the
documentation and the previous oprofile ARM Cortex ac53 event file patch
I made.

The relevant documentation is in the "12.9 Events" section of the ARM
Cortex A53 MPCore Processor Revision: r0p4 Technical Reference Manual.

The ARM Cortex A53 manual is available at:

http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500g/DDI0500G_cortex_a53_trm.pdf

Use that to look for additional information about the events.

Signed-off-by: William Cohen <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Added references provided by William Cohen ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Merge branches 'acpi-ec', 'acpi-tables' and 'acpi-doc'

* acpi-ec:
  ACPI / EC: Restore polling during noirq suspend/resume phases

* acpi-tables:
  ACPI: SPCR: Mark expected switch fall-through in acpi_parse_spcr

* acpi-doc:
  ACPI: dock: document sysfs interface
  ACPI / DPTF: Document dptf_power sysfs atttributes

Merge branches 'pm-cpuidle' and 'pm-opp'

* pm-cpuidle:
  PM: cpuidle: Fix cpuidle_poll_state_init() prototype
  Documentation/ABI: update cpuidle sysfs documentation

* pm-opp:
  opp: cpu: Replace GFP_ATOMIC with GFP_KERNEL in dev_pm_opp_init_cpufreq_table

platform/x86: dell-laptop: Removed duplicates in DMI whitelist

Fixed a mistake in which several entries were duplicated in the DMI list
from the below commit
fe486138 platform/x86: dell-laptop: Add 2-in-1 devices to the DMI whitelist

Signed-off-by: Alexander Abrosimov <[email protected]>
Reviewed-by: Pali Rohár <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

platform/x86: dell-laptop: fix kbd_get_state's request value

Commit 9862b43624a5 ("platform/x86: dell-laptop: Allocate buffer on heap
rather than globally")
broke one request, changed it back to the original value.

Tested on a Dell E6540, backlight came back.

Fixes: 9862b43624a5 ("platform/x86: dell-laptop: Allocate buffer on heap rather than globally")
Signed-off-by: Laszlo Toth <[email protected]>
Reviewed-by: Pali Rohár <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

platform/x86: ideapad-laptop: Increase timeout to wait for EC answer

Lenovo E41-20 needs more time than 100ms to read VPC,
the funtion keys always failed responding.
Increase timeout to get the value from VPC, then
the funtion keys like mic mute key work well.

Signed-off-by: Aaron Ma <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

platform/x86: wmi: fix off-by-one write in wmi_dev_probe()

wmi_dev_probe() allocates one byte less than necessary, thus
subsequent sprintf() call writes trailing zero past the end
of the 'buf':

    BUG: KASAN: slab-out-of-bounds in vsnprintf+0xda4/0x1240
    Write of size 1 at addr ffff880423529caf by task kworker/1:1/32

    Call Trace:
     dump_stack+0xb3/0x14d
     print_address_description+0xd7/0x380
     kasan_report+0x166/0x2b0
     vsnprintf+0xda4/0x1240
     sprintf+0x9b/0xd0
     wmi_dev_probe+0x1c3/0x400
     driver_probe_device+0x5d1/0x990
     bus_for_each_drv+0x109/0x190
     __device_attach+0x217/0x360
     bus_probe_device+0x1ad/0x260
     deferred_probe_work_func+0x10f/0x5d0
     process_one_work+0xa8b/0x1dc0
     worker_thread+0x20d/0x17d0
     kthread+0x311/0x3d0
     ret_from_fork+0x3a/0x50

    Allocated by task 32:
     kasan_kmalloc+0xa0/0xd0
     __kmalloc+0x14f/0x3e0
     wmi_dev_probe+0x182/0x400
     driver_probe_device+0x5d1/0x990
     bus_for_each_drv+0x109/0x190
     __device_attach+0x217/0x360
     bus_probe_device+0x1ad/0x260
     deferred_probe_work_func+0x10f/0x5d0
     process_one_work+0xa8b/0x1dc0
     worker_thread+0x20d/0x17d0
     kthread+0x311/0x3d0
     ret_from_fork+0x3a/0x50

Increment allocation size to fix this.

Fixes: 44b6b7661132 ("platform/x86: wmi: create userspace interface for drivers")
Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

Merge branch 'nvme-4.16-rc' of git://git.infradead.org/nvme into for-linus

Pull NVMe fixes from Keith:

"After syncing with Christoph and Sagi, we feel this is a good time to
send our latest fixes across most of the nvme components for 4.16"

* 'nvme-4.16-rc' of git://git.infradead.org/nvme:
  nvme-rdma: fix sysfs invoked reset_ctrl error flow
  nvmet: Change return code of discard command if not supported
  nvme-pci: Fix timeouts in connecting state
  nvme-pci: Remap CMB SQ entries on every controller reset
  nvme: fix the deadlock in nvme_update_formats
  nvme: Don't use a stack buffer for keep-alive command
  nvme_fc: cleanup io completion
  nvme_fc: correct abort race condition on resets
  nvme: Fix discard buffer overrun
  nvme: delete NVME_CTRL_LIVE --> NVME_CTRL_CONNECTING transition
  nvme-rdma: use NVME_CTRL_CONNECTING state to mark init process
  nvme: rename NVME_CTRL_RECONNECTING state to NVME_CTRL_CONNECTING

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Misc fixes all across the map:

   - /proc/kcore vsyscall related fixes
   - LTO fix
   - build warning fix
   - CPU hotplug fix
   - Kconfig NR_CPUS cleanups
   - cpu_has() cleanups/robustification
   - .gitignore fix
   - memory-failure unmapping fix
   - UV platform fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
  x86/error_inject: Make just_return_func() globally visible
  x86/platform/UV: Fix GAM Range Table entries less than 1GB
  x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
  x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
  x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
  vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
  x86/Kconfig: Further simplify the NR_CPUS config
  x86/Kconfig: Simplify NR_CPUS config
  x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
  x86/cpufeature: Update _static_cpu_has() to use all named variables
  x86/cpufeature: Reindent _static_cpu_has()

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:

  Spectre:
   - Add entry code register clearing to reduce the Spectre attack
     surface
   - Update the Spectre microcode blacklist
   - Inline the KVM Spectre helpers to get close to v4.14 performance
     again.
   - Fix indirect_branch_prediction_barrier()
   - Fix/improve Spectre related kernel messages
   - Fix array_index_nospec_mask() asm constraint
   - KVM: fix two MSR handling bugs

  PTI:
   - Fix a paranoid entry PTI CR3 handling bug
   - Fix comments

  objtool:
   - Fix paranoid_entry() frame pointer warning
   - Annotate WARN()-related UD2 as reachable
   - Various fixes
   - Add Add Peter Zijlstra as objtool co-maintainer

  Misc:
   - Various x86 entry code self-test fixes
   - Improve/simplify entry code stack frame generation and handling
     after recent heavy-handed PTI and Spectre changes. (There's two
     more WIP improvements expected here.)
   - Type fix for cache entries

  There's also some low risk non-fix changes I've included in this
  branch to reduce backporting conflicts:

   - rename a confusing x86_cpu field name
   - de-obfuscate the naming of single-TLB flushing primitives"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  x86/entry/64: Fix CR3 restore in paranoid_exit()
  x86/cpu: Change type of x86_cache_size variable to unsigned int
  x86/spectre: Fix an error message
  x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
  selftests/x86/mpx: Fix incorrect bounds with old _sigfault
  x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
  x86/speculation: Add <asm/msr-index.h> dependency
  nospec: Move array_index_nospec() parameter checking into separate macro
  x86/speculation: Fix up array_index_nospec_mask() asm constraint
  x86/debug: Use UD2 for WARN()
  x86/debug, objtool: Annotate WARN()-related UD2 as reachable
  objtool: Fix segfault in ignore_unreachable_insn()
  selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
  selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
  selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
  selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
  selftests/x86/pkeys: Remove unused functions
  selftests/x86: Clean up and document sscanf() usage
  selftests/x86: Fix vDSO selftest segfault for vsyscall=none
  x86/entry/64: Remove the unused 'icebp' macro
  ...

x86/entry/64: Fix CR3 restore in paranoid_exit()

Josh Poimboeuf noticed the following bug:

"The paranoid exit code only restores the saved CR3 when it switches back
  to the user GS.  However, even in the kernel GS case, it's possible that
  it needs to restore a user CR3, if for example, the paranoid exception
  occurred in the syscall exit path between SWITCH_TO_USER_CR3_STACK and
  SWAPGS."

Josh also confirmed via targeted testing that it's possible to hit this bug.

Fix the bug by also restoring CR3 in the paranoid_exit_no_swapgs branch.

The reason we haven't seen this bug reported by users yet is probably because
"paranoid" entry points are limited to the following cases:

idtentry double_fault       do_double_fault  has_error_code=1  paranoid=2
idtentry debug              do_debug         has_error_code=0  paranoid=1 shift_ist=DEBUG_STACK
idtentry int3               do_int3          has_error_code=0  paranoid=1 shift_ist=DEBUG_STACK
idtentry machine_check      do_mce           has_error_code=0  paranoid=1

Amongst those entry points only machine_check is one that will interrupt an
IRQS-off critical section asynchronously - and machine check events are rare.

The other main asynchronous entries are NMI entries, which can be very high-freq
with perf profiling, but they are special: they don't use the 'idtentry' macro but
are open coded and restore user CR3 unconditionally so don't have this bug.

Reported-and-tested-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/cpu: Change type of x86_cache_size variable to unsigned int

Currently, x86_cache_size is of type int, which makes no sense as we
will never have a valid cache size equal or less than 0. So instead of
initializing this variable to -1, it can perfectly be initialized to 0
and use it as an unsigned variable instead.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Addresses-Coverity-ID: 1464429
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/spectre: Fix an error message

If i == ARRAY_SIZE(mitigation_options) then we accidentally print
garbage from one space beyond the end of the mitigation_options[] array.

Signed-off-by: Dan Carpenter <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: KarimAllah Ahmed <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: 9005c6834c0f ("x86/spectre: Simplify spectre_v2 command line parsing")
Link: http://lkml.kernel.org/r/20180214071416.GA26677@mwanda
Signed-off-by: Ingo Molnar <[email protected]>

x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping

x86_mask is a confusing name which is hard to associate with the
processor's stepping.

Additionally, correct an indent issue in lib/cpu.c.

Signed-off-by: Jia Zhang <[email protected]>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

selftests/x86/mpx: Fix incorrect bounds with old _sigfault

For distributions with old userspace header files, the _sigfault
structure is different. mpx-mini-test fails with the following
error:

  [root@Purley]# mpx-mini-test_64 tabletest
  XSAVE is supported by HW & OS
  XSAVE processor supported state mask: 0x2ff
  XSAVE OS supported state mask: 0x2ff
   BNDREGS: size: 64 user: 1 supervisor: 0 aligned: 0
    BNDCSR: size: 64 user: 1 supervisor: 0 aligned: 0
  starting mpx bounds table test
  ERROR: siginfo bounds do not match shadow bounds for register 0

Fix it by using the correct offset of _lower/_upper in _sigfault.
RHEL needs this patch to work.

Signed-off-by: Rui Wang <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: e754aedc26ef ("x86/mpx, selftests: Add MPX self test")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()

flush_tlb_single() and flush_tlb_one() sound almost identical, but
they really mean "flush one user translation" and "flush one kernel
translation".  Rename them to flush_tlb_one_user() and
flush_tlb_one_kernel() to make the semantics more obvious.

[ I was looking at some PTI-related code, and the flush-one-address code
  is unnecessarily hard to understand because the names of the helpers are
  uninformative.  This came up during PTI review, but no one got around to
  doing it. ]

Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linux-MM <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Link: http://lkml.kernel.org/r/3303b02e3c3d049dc5235d5651e0ae6d29a34354.1517414378.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>

x86/speculation: Add <asm/msr-index.h> dependency

Joe Konno reported a compile failure resulting from using an MSR
without inclusion of <asm/msr-index.h>, and while the current code builds
fine (by accident) this needs fixing for future patches.

Reported-by: Joe Konno <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 20ffa1caecca ("x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

nospec: Move array_index_nospec() parameter checking into separate macro

For architectures providing their own implementation of
array_index_mask_nospec() in asm/barrier.h, attempting to use WARN_ONCE() to
complain about out-of-range parameters using WARN_ON() results in a mess
of mutually-dependent include files.

Rather than unpick the dependencies, simply have the core code in nospec.h
perform the checking for us.

Signed-off-by: Will Deacon <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/speculation: Fix up array_index_nospec_mask() asm constraint

Allow the compiler to handle @size as an immediate value or memory
directly rather than allocating a register.

Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/151797010204.1289.1510000292250184993.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <[email protected]>

x86/debug: Use UD2 for WARN()

Since the Intel SDM added an ModR/M byte to UD0 and binutils followed
that specification, we now cannot disassemble our kernel anymore.

This now means Intel and AMD disagree on the encoding of UD0. And instead
of playing games with additional bytes that are valid ModR/M and single
byte instructions (0xd6 for instance), simply use UD2 for both WARN() and
BUG().

Requested-by: Linus Torvalds <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/debug, objtool: Annotate WARN()-related UD2 as reachable

By default, objtool assumes that a UD2 is a dead end. This is mainly
because GCC 7+ sometimes inserts a UD2 when it detects a divide-by-zero
condition.

Now that WARN() is moving back to UD2, annotate the code after it as
reachable so objtool can follow the code flow.

Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: kbuild test robot <[email protected]>
Link: http://lkml.kernel.org/r/0e483379275a42626ba8898117f918e1bf661e40.1518130694.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>

objtool: Fix segfault in ignore_unreachable_insn()

Peter Zijlstra's patch for converting WARN() to use UD2 triggered a
bunch of false "unreachable instruction" warnings, which then triggered
a seg fault in ignore_unreachable_insn().

The seg fault happened when it tried to dereference a NULL 'insn->func'
pointer. Thanks to static_cpu_has(), some functions can jump to a
non-function area in the .altinstr_aux section. That breaks
ignore_unreachable_insn()'s assumption that it's always inside the
original function.

Make sure ignore_unreachable_insn() only follows jumps within the
current function.

Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: kbuild test robot <[email protected]>
Link: http://lkml.kernel.org/r/bace77a60d5af9b45eddb8f8fb9c776c8de657ef.1518130694.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>

selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems

The ldt_gdt and ptrace_syscall selftests, even in their 64-bit variant, use
hard-coded 32-bit syscall numbers and call "int $0x80".

This will fail on 64-bit systems with CONFIG_IA32_EMULATION=y disabled.

Therefore, do not build these tests if we cannot build 32-bit binaries
(which should be a good approximation for CONFIG_IA32_EMULATION=y being enabled).

Signed-off-by: Dominik Brodowski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c

On 64-bit builds, we should not rely on "int $0x80" working (it only does if
CONFIG_IA32_EMULATION=y is enabled). To keep the "Set TF and check int80"
test running on 64-bit installs with CONFIG_IA32_EMULATION=y enabled, build
this test only if we can also build 32-bit binaries (which should be a
good approximation for that).

Signed-off-by: Dominik Brodowski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n

When CONFIG_NUMA is not set, the build fails with:

arch/powerpc/platforms/pseries/hotplug-cpu.c:335:4:
error: déclaration implicite de la fonction « update_numa_cpu_lookup_table »

So we have to add update_numa_cpu_lookup_table() as an empty function
when CONFIG_NUMA is not set.

Fixes: 1d9a090783be ("powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove")
Signed-off-by: Corentin Labbe <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/powernv: IMC fix out of bounds memory access at shutdown

The OPAL IMC driver's shutdown handler disables nest PMU counters by
walking nodes and taking the first CPU out of their cpumask, which is
used to index into the paca (get_hard_smp_processor_id()). This does
not always do the right thing, and in particular for CPU-less nodes it
returns NR_CPUS and that overruns the paca and dereferences random
memory.

Fix it by being more careful about checking returned CPU, and only
using online CPUs. It's not clear this shutdown code makes sense after
commit 885dcd709b ("powerpc/perf: Add nest IMC PMU support"), but this
should not make things worse

Currently the bug causes us to call OPAL with a junk CPU number. A
separate patch in development to change the way pacas are allocated
escalates this bug into a crash:

  Unable to handle kernel paging request for data at address 0x2a21af1eeb000076
  Faulting instruction address: 0xc0000000000a5468
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP opal_imc_counters_shutdown+0x148/0x1d0
  LR  opal_imc_counters_shutdown+0x134/0x1d0
  Call Trace:
   opal_imc_counters_shutdown+0x134/0x1d0 (unreliable)
   platform_drv_shutdown+0x44/0x60
   device_shutdown+0x1f8/0x350
   kernel_restart_prepare+0x54/0x70
   kernel_restart+0x28/0xc0
   SyS_reboot+0x1d0/0x2c0
   system_call+0x58/0x6c

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/xive: Use hw CPU ids when configuring the CPU queues

The CPU event notification queues on sPAPR should be configured using
a hardware CPU identifier.

The problem did not show up on the Power Hypervisor because pHyp
supports 8 threads per core which keeps CPU number contiguous. This is
not the case on all sPAPR virtual machines, some use SMT=1.

Also improve error logging by adding the CPU number.

Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Cc: [email protected] # v4.14+
Signed-off-by: Cédric Le Goater <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Expose TSCR via sysfs only on powernv

The TSCR can only be accessed in hypervisor mode.

Fixes: 88b5e12eeb11 ("powerpc: Expose TSCR via sysfs")
Signed-off-by: Cyril Bur <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

tls: getsockopt return record sequence number

Return the TLS record sequence number in getsockopt.

Signed-off-by: Boris Pismenny <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: reset the crypto info if copy_from_user fails

copy_from_user could copy some partial information, as a result
TLS_CRYPTO_INFO_READY(crypto_info) could be true while crypto_info is
using uninitialzed data.

This patch resets crypto_info when copy_from_user fails.

fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Boris Pismenny <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: retrun the correct IV in getsockopt

Current code returns four bytes of salt followed by four bytes of IV.
This patch returns all eight bytes of IV.

fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Boris Pismenny <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-segmentation-offload-doc-fixes'

Daniel Axtens says:

====================
Updates to segmentation-offloads.txt

I've been trying to wrap my head around GSO for a while now. This is a
set of small changes to the docs that would probably have been helpful
when I was starting out.

I realise that GSO_DODGY is still a notable omission - I'm hesitant to
write too much on it just yet as I don't understand it well and I
think it's in the process of changing.
====================

Signed-off-by: David S. Miller <[email protected]>

docs: segmentation-offloads.txt: add SCTP info

Most of this is extracted from 90017accff61 ("sctp: Add GSO support"),
with some extra text about GSO_BY_FRAGS and the need to check for it.

Cc: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: Daniel Axtens <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

docs: segmentation-offloads.txt: Fix ref to SKB_GSO_TUNNEL_REMCSUM

The doc originally called it SKB_GSO_REMCSUM. Fix it.

Fixes: f7a6272bf3cb ("Documentation: Add documentation for TSO and GSO features")
Signed-off-by: Daniel Axtens <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

docs: segmentation-offloads.txt: update for UFO depreciation

UFO is deprecated except for tuntap and packet per 0c19f846d582,
("net: accept UFO datagrams from tuntap and packet"). Update UFO
docs to reflect this.

Signed-off-by: Daniel Axtens <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'tipc-locking-fixes'

Ying Xue says:

====================
tipc: Fix missing RTNL lock protection during setting link properties

At present it's unsafe to configure link properties through netlink
as the entire setting process is not under RTNL lock protection. Now
TIPC supports two different sets of netlink APIs at the same time, and
they share the same set of backend functions to configure bearer,
media and net properties. In order to solve the missing RTNL issue,
we have to make the whole __tipc_nl_compat_doit() protected by RTNL,
which means any function called within it cannot take RTNL any more.
So in the series we first introduce the following new functions which
doesn't hold RTNl lock:

- __tipc_nl_bearer_disable()
- __tipc_nl_bearer_enable()
- __tipc_nl_bearer_set()
- __tipc_nl_media_set()
- __tipc_nl_net_set()

Meanwhile, __tipc_nl_compat_doit() has been reconstructed to minimize
the time of holding RTNL lock.

Changes in v4:
- Per suggestion of Kirill Tkhai, divided original big one patch into
   seven small ones so that they can be easily reviewed.

Changes in v3:
- Optimized return method of __tipc_nl_bearer_enable() regarding
   the comments from David M and Kirill Tkhai
- Moved the allocations of memory in __tipc_nl_compat_doit() out
   of RTNL lock to minimize the time of holding RTNL lock according
   to the suggestion of Kirill Tkhai.

Changes in v2:
- The whole operation of setting bearer/media properties has been
   protected under RTNL, as per feedback from David M.
====================

Signed-off-by: David S. Miller <[email protected]>

tipc: Fix missing RTNL lock protection during setting link properties

Currently when user changes link properties, TIPC first checks if
user's command message contains media name or bearer name through
tipc_media_find() or tipc_bearer_find() which is protected by RTNL
lock. But when tipc_nl_compat_link_set() conducts the checking with
the two functions, it doesn't hold RTNL lock at all, as a result,
the following complaints were reported:

audit: type=1400 audit(1514679888.244:9): avc:  denied  { write } for
pid=3194 comm="syzkaller021477" path="socket:[11143]" dev="sockfs"
ino=11143 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
tclass=netlink_generic_socket permissive=1
Reviewed-by: Kirill Tkhai <[email protected]>
=============================
WARNING: suspicious RCU usage
4.15.0-rc5+ #152 Not tainted
-----------------------------
net/tipc/bearer.c:177 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syzkaller021477/3194:
  #0:  (cb_lock){++++}, at: [<00000000d20133ea>] genl_rcv+0x19/0x40
net/netlink/genetlink.c:634
  #1:  (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_lock
net/netlink/genetlink.c:33 [inline]
  #1:  (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_rcv_msg+0x115/0x140
net/netlink/genetlink.c:622

stack backtrace:
CPU: 1 PID: 3194 Comm: syzkaller021477 Not tainted 4.15.0-rc5+ #152
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
  tipc_bearer_find+0x2b4/0x3b0 net/tipc/bearer.c:177
  tipc_nl_compat_link_set+0x329/0x9f0 net/tipc/netlink_compat.c:729
  __tipc_nl_compat_doit net/tipc/netlink_compat.c:288 [inline]
  tipc_nl_compat_doit+0x15b/0x660 net/tipc/netlink_compat.c:335
  tipc_nl_compat_handle net/tipc/netlink_compat.c:1119 [inline]
  tipc_nl_compat_recv+0x112f/0x18f0 net/tipc/netlink_compat.c:1201
  genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:599
  genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:624
  netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2408
  genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
  netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline]
  netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1301
  netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864
  sock_sendmsg_nosec net/socket.c:636 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:646
  sock_write_iter+0x31a/0x5d0 net/socket.c:915
  call_write_iter include/linux/fs.h:1772 [inline]
  new_sync_write fs/read_write.c:469 [inline]
  __vfs_write+0x684/0x970 fs/read_write.c:482
  vfs_write+0x189/0x510 fs/read_write.c:544
  SYSC_write fs/read_write.c:589 [inline]
  SyS_write+0xef/0x220 fs/read_write.c:581
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129

In order to correct the mistake, __tipc_nl_compat_doit() has been
protected by RTNL lock, which means the whole operation of setting
bearer/media properties is under RTNL protection.

Signed-off-by: Ying Xue <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: Introduce __tipc_nl_net_set

Introduce __tipc_nl_net_set() which doesn't hold RTNL lock.

Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: Introduce __tipc_nl_media_set

Introduce __tipc_nl_media_set() which doesn't hold RTNL lock.

Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: Introduce __tipc_nl_bearer_set

Introduce __tipc_nl_bearer_set() which doesn't holding RTNL lock.

Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: Introduce __tipc_nl_bearer_enable

Introduce __tipc_nl_bearer_enable() which doesn't hold RTNL lock.

Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: Introduce __tipc_nl_bearer_disable

Introduce __tipc_nl_bearer_disable() which doesn't hold RTNL lock.

Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: Refactor __tipc_nl_compat_doit

As preparation for adding RTNL to make (*cmd->transcode)() and
(*cmd->transcode)() constantly protected by RTNL lock, we move out of
memory allocations existing between them as many as possible so that
the time of holding RTNL can be minimized in __tipc_nl_compat_doit().

Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/i915: Fix DSI panels with v1 MIPI sequences without a DEASSERT sequence v3

So far models of the Dell Venue 8 Pro, with a panel with MIPI panel
index = 3, one of which has been kindly provided to me by Jan Brummer,
where not working with the i915 driver, giving a black screen on the
first modeset.

The problem with at least these Dells is that their VBT defines a MIPI
ASSERT sequence, but not a DEASSERT sequence. Instead they DEASSERT the
reset in their INIT_OTP sequence, but the deassert must be done before
calling intel_dsi_device_ready(), so that is too late.

Simply doing the INIT_OTP sequence earlier is not enough to fix this,
because the INIT_OTP sequence also sends various MIPI packets to the
panel, which can only happen after calling intel_dsi_device_ready().

This commit fixes this by splitting the INIT_OTP sequence into everything
before the first DSI packet and everything else, including the first DSI
packet. The first part (everything before the first DSI packet) is then
used as deassert sequence.

Changed in v2:
-Split the init OTP sequence into a deassert reset and the actual init
OTP sequence, instead of calling it earlier and then having the first
mipi_exec_send_packet() call call intel_dsi_device_ready().

Changes in v3:
-Move the whole shebang to intel_bios.c

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82880
References: https://bugs.freedesktop.org/show_bug.cgi?id=101205
Cc: Jan-Michael Brummer <[email protected]>
Reported-by: Jan-Michael Brummer <[email protected]>
Tested-by: Hans de Goede <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Acked-by: Jani Nikula <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit fb38e7ade9af4f3e96f5916c3f6f19bfc7d5f961)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915: Free memdup-ed DSI VBT data structures on driver_unload

Make intel_bios_cleanup function free the DSI VBT data structures which
are memdup-ed by parse_mipi_config() and parse_mipi_sequence().

Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit e1b86c85f6c2029c31dba99823b6f3d9e15eaacd)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915: Add intel_bios_cleanup() function

Add an intel_bios_cleanup() function to act as counterpart of
intel_bios_init() and move the cleanup of vbt related resources there,
putting it in the same file as the allocation.

Changed in v2:
-While touching the code anyways, remove the unnecessary:
if (dev_priv->vbt.child_dev) done before kfree(dev_priv->vbt.child_dev)

Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 785f076b3ba781804f2b22b347b4431e3efb0ab3)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/vlv: Add cdclk workaround for DSI

At least on the Chuwi Vi8 (non pro/plus) the LCD panel will show an image
shifted aprox. 20% to the left (with wraparound) and sometimes also wrong
colors, showing that the panel controller is starting with sampling the
datastream somewhere mid-line. This happens after the first blanking and
re-init of the panel.

After looking at drm.debug output I noticed that initially we inherit the
cdclk of 333333 KHz set by the GOP, but after the re-init we picked 266667
KHz, which turns out to be the cause of this problem, a quick hack to hard
code the cdclk to 333333 KHz makes the problem go away.

I've tested this on various Bay Trail devices, to make sure this not does
cause regressions on other devices and the higher cdclk does not cause
any problems on the following devices:
-GP-electronic T701      1024x600   333333 KHz cdclk after this patch
-PEAQ C1010              1920x1200  333333 KHz cdclk after this patch
-PoV mobii-wintab-800w    800x1280  333333 KHz cdclk after this patch
-Asus Transformer-T100TA 1368x768   320000 KHz cdclk after this patch

Also interesting wrt this is the comment in vlv_calc_cdclk about the
existing workaround to avoid 200 Mhz as clock because that causes issues
in some cases.

This commit extends the "do not use 200 Mhz" workaround with an extra
check to require atleast 320000 KHz (avoiding 266667 KHz) when a DSI
panel is active.

Changes in v2:
-Change the commit message and the code comment to not treat the GOP as
a reference, the GOP should not be treated as a reference

Acked-by: Ville Syrjälä <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit c8dae55a8ced625038d52d26e48273707fab2688)
Signed-off-by: Rodrigo Vivi <[email protected]>

Merge branch 'ibmvnic-leaks'

Thomas Falcon says:

====================
ibmvnic: Fix memory leaks in the driver

This patch set is pretty self-explanatory. It includes
a number of patches that fix memory leaks found with
kmemleak in the ibmvnic driver.
====================

Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Clean RX pool buffers during device close

During device close or reset, there were some cases of outstanding
RX socket buffers not being freed. Include a function similar to the
one that already exists to clean TX socket buffers in this case.

Signed-off-by: Thomas Falcon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Free RX socket buffer in case of adapter error

If a RX buffer is returned to the client driver with an error, free the
corresponding socket buffer before continuing.

Signed-off-by: Thomas Falcon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Fix NAPI structures memory leak

This memory is allocated during initialization but never freed,
so do that now.

Signed-off-by: Thomas Falcon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Fix login buffer memory leaks

During device bringup, the driver exchanges login buffers with
firmware. These buffers contain information such number of TX
and RX queues alloted to the device, RX buffer size, etc. These
buffers weren't being properly freed on device reset or close.

We can free the buffer we send to firmware as soon as we get
a response. There is information in the response buffer that
the driver needs for normal operation so retain it until the
next reset or removal.

Signed-off-by: Thomas Falcon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Wait until reset is complete to set carrier on

Pushes back setting the carrier on until the end of the reset
code. This resolves a bug where a watchdog timer was detecting
that a TX queue had stalled before the adapter reset was complete.

Signed-off-by: Thomas Falcon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "net: thunderx: Add support for xdp redirect"

This reverts commit aa136d0c82fcd6af14535853c30e219e02b2692d.

As I previously[1] pointed out this implementation of XDP_REDIRECT is
wrong. XDP_REDIRECT is a facility that must work between different
NIC drivers. Another NIC driver can call ndo_xdp_xmit/nicvf_xdp_xmit,
but your driver patch assumes payload data (at top of page) will
contain a queue index and a DMA addr, this is not true and worse will
likely contain garbage.

Given you have not fixed this in due time (just reached v4.16-rc1),
the only option I see is a revert.

[1] http://lkml.kernel.org/r/20171211130902.482513d3@redhat.com

Cc: Sunil Goutham <[email protected]>
Cc: Christina Jacob <[email protected]>
Cc: Aleksey Makarov <[email protected]>
Fixes: aa136d0c82fc ("net: thunderx: Add support for xdp redirect")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'gvt-fixes-2018-02-14' of https://github.com/intel/gvt-linux into drm-intel-fixes

gvt-fixes-2018-02-14

- gtt mmio 8b access fix (Tina)
- one KBL required mmio reg for switch (Weinan)
- one trace log typo fix (Weinan)

Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

sctp: fix some copy-paste errors for file comments

This patch is to fix the file comments in stream.c and
stream_interleave.c

v1->v2:
rephrase the comment for stream.c according to Neil's suggestion.

Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf")
Fixes: 0c3f6f655487 ("sctp: implement make_datafrag for sctp_stream_interleave")
Signed-off-by: Xin Long <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fix race on decreasing number of TX queues

netif_set_real_num_tx_queues() can be called when netdev is up.
That usually happens when user requests change of number of
channels/rings with ethtool -L. The procedure for changing
the number of queues involves resetting the qdiscs and setting
dev->num_tx_queues to the new value. When the new value is
lower than the old one, extra care has to be taken to ensure
ordering of accesses to the number of queues vs qdisc reset.

Currently the queues are reset before new dev->num_tx_queues
is assigned, leaving a window of time where packets can be
enqueued onto the queues going down, leading to a likely
crash in the drivers, since most drivers don't check if TX
skbs are assigned to an active queue.

Fixes: e6484930d7c7 ("net: allocate tx queues in register_netdevice")
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

arm64: proc: Set PTE_NG for table entries to avoid traversing them twice

When KASAN is enabled, the swapper page table contains many identical
mappings of the zero page, which can lead to a stall during boot whilst
the G -> nG code continually walks the same page table entries looking
for global mappings.

This patch sets the nG bit (bit 11, which is IGNORED) in table entries
after processing the subtree so we can easily skip them if we see them
a second time.

Tested-by: Mark Rutland <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

Merge tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Bob Peterson:
"Fix regressions in the gfs2 iomap for block_map implementation we
recently discovered in commit 3974320ca6"

* tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fixes to "Implement iomap for block_map"

Merge tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"A larger batch of fixes than we'd like. Roughly 1/3 fixes for new
  code, 1/3 fixes for stable and 1/3 minor things.

  There's four commits fixing bugs when using 16GB huge pages on hash,
  caused by some of the preparatory changes for pkeys.

  Two fixes for bugs in the enhanced IRQ soft masking for local_t, one
  of which broke KVM in some circumstances.

  Four fixes for Power9. The most bizarre being a bug where futexes
  stopped working because a NULL pointer dereference didn't trap during
  early boot (it aliased the kernel mapping). A fix for memory hotplug
  when using the Radix MMU, and a fix for live migration of guests using
  the Radix MMU.

  Two fixes for hotplug on pseries machines. One where we weren't
  correctly updating NUMA info when CPUs are added and removed. And the
  other fixes crashes/hangs seen when doing memory hot remove during
  boot, which is apparently a thing people do.

  Finally a handful of build fixes for obscure configs and other minor
  fixes.

  Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Balbir Singh, Colin
  Ian King, Daniel Henrique Barboza, Florian Weimer, Guenter Roeck,
  Harish, Laurent Vivier, Madhavan Srinivasan, Mauricio Faria de
  Oliveira, Nathan Fontenot, Nicholas Piggin, Sam Bobroff"

* tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix to use ucontext_t instead of struct ucontext
  powerpc/kdump: Fix powernv build break when KEXEC_CORE=n
  powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug
  powerpc/mm/hash64: Zero PGD pages on allocation
  powerpc/mm/hash64: Store the slot information at the right offset for hugetlb
  powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled
  powerpc/mm: Fix crashes with 16G huge pages
  powerpc/mm: Flush radix process translations when setting MMU type
  powerpc/vas: Don't set uses_vas for kernel windows
  powerpc/pseries: Enable RAS hotplug events later
  powerpc/mm/radix: Split linear mapping on hot-unplug
  powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID
  ocxl: fix signed comparison with less than zero
  powerpc/64s: Fix may_hard_irq_enable() for PMI soft masking
  powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro
  powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove

nvme-rdma: fix sysfs invoked reset_ctrl error flow

When reset_controller that is invoked by sysfs fails,
it enters an error flow which practically removes the
nvme ctrl entirely (similar to delete_ctrl flow). It
causes the system to hang, since a sysfs attribute cannot
be unregistered by one of its own methods.

This can be fixed by calling delete_ctrl as a work rather
than sequential code. In addition, it should give the ctrl
a chance to recover using reconnection mechanism (consistant
with FC reset_ctrl error flow). Also, while we're here, return
suitable errno in case the reset ended with non live ctrl.

Signed-off-by: Nitzan Carmi <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>

nvmet: Change return code of discard command if not supported

Execute discard command on block device that doesn't support it
should return success.
Returning internal error while using multi-path fails the path.

Reviewed-by: Max Gurtovoy <[email protected]>
Signed-off-by: Israel Rukshin <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>

virtio/s390: implement PM operations for virtio_ccw

Suspend/Resume to/from disk currently fails. Let us wire
up the necessary callbacks. This is mostly just forwarding
the requests to the virtio drivers. The only thing that
has to be done in virtio_ccw itself is to re-set the
virtio revision.

Suggested-by: Thomas Huth <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Message-Id: <20171207141102 [email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
[CH: merged <20171218083706 [email protected]> to fix
!CONFIG_PM configs]
Signed-off-by: Cornelia Huck <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

ALSA: hda/realtek: PCI quirk for Fujitsu U7x7

These laptops have a combined jack to attach headsets, the U727 on
the left, the U757 on the right, but a headsets microphone doesn't
work. Using hdajacksensetest I found that pin 0x19 changed the
present state when plugging the headset, in addition to 0x21, but
didn't have the correct configuration (shown as "Not connected").

So this sets the configuration to the same values as the headphone
pin 0x21 except for the device type microphone, which makes it
work correctly. With the patch the configured pins for U727 are

Pin 0x12 (Internal Mic, Mobile-In): present = No
Pin 0x14 (Internal Speaker): present = No
Pin 0x19 (Black Mic, Left side): present = No
Pin 0x1d (Internal Aux): present = No
Pin 0x21 (Black Headphone, Left side): present = No

Signed-off-by: Jan-Marek Glogowski <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

mmc: bcm2835: Don't overwrite max frequency unconditionally

The optional DT parameter max-frequency could init the max bus frequency.
So take care of this, before setting the max bus frequency.

Fixes: 660fc733bd74 ("mmc: bcm2835: Add new driver for the sdhost controller.")
Signed-off-by: Phil Elwell <[email protected]>
Signed-off-by: Stefan Wahren <[email protected]>
Cc: <[email protected]> # 4.12+
Signed-off-by: Ulf Hansson <[email protected]>

Revert "mmc: meson-gx: include tx phase in the tuning process"

This reverts commit 0a44697627d17a66d7dc98f17aeca07ca79c5c20.

This commit was initially intended to fix problems with hs200 and hs400
on some boards, mainly the odroid-c2. The OC2 (Rev 0.2) I have performs
well in this modes, so I could not confirm these issues.

We've had several reports about the issues being still present on (some)
OC2, so apparently, this change does not do what it was supposed to do.
Maybe the eMMC signal quality is on the edge on the board. This may
explain the variability we see in term of stability, but this is just a
guess. Lowering the max_frequency to 100Mhz seems to do trick for those
affected by the issue

Worse, the commit created new issues (CRC errors and hangs) on other
boards, such as the kvim 1 and 2, the p200 or the libretech-cc.

According to amlogic, the Tx phase should not be tuned and left in its
default configuration, so it is best to just revert the commit.

Fixes: 0a44697627d1 ("mmc: meson-gx: include tx phase in the tuning process")
Cc: <[email protected]> # 4.14+
Signed-off-by: Jerome Brunet <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>

ALSA: seq: Fix racy pool initializations

ALSA sequencer core initializes the event pool on demand by invoking
snd_seq_pool_init() when the first write happens and the pool is
empty. Meanwhile user can reset the pool size manually via ioctl
concurrently, and this may lead to UAF or out-of-bound accesses since
the function tries to vmalloc / vfree the buffer.

A simple fix is to just wrap the snd_seq_pool_init() call with the
recently introduced client->ioctl_mutex; as the calls for
snd_seq_pool_init() from other side are always protected with this
mutex, we can avoid the race.

Reported-by: 范龙飞 <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

drm/i915/gvt: fix one typo of render_mmio trace

Fix one typo of render_mmio trace, exchange the mmio value of old and new.

Signed-off-by: Weinan Li <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

drm/i915/gvt: Support BAR0 8-byte reads/writes

GGTT is in BAR0 with 8 bytes aligned. With a qemu patch (commit:
38d49e8c1523d97d2191190d3f7b4ce7a0ab5aa3), VFIO can use 8-byte reads/
writes to access it.

This patch is to support the 8-byte GGTT reads/writes.

Ideally, we would like to support 8-byte reads/writes for the total BAR0.
But it needs more work for handling 8-byte MMIO reads/writes.

This patch can fix the issue caused by partial updating GGTT entry, during
guest booting up.

v3:
- Use intel_vgpu_get_bar_gpa() stead. (Zhenyu)
- Include all the GGTT checking logic in gtt_entry(). (Zhenyu)

v2:
- Limit to GGTT entry. (Zhenyu)

Signed-off-by: Tina Zhang <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

drm/i915/gvt: add 0xe4f0 into gen9 render list

Guest may set this register on KBL platform, it can impact hardware
behavior, so add it into the gen9 render list. Otherwise gpu hang issue may
happen during different vgpu switch.

v2: separate it from patch set.

Cc: Zhi Wang <[email protected]>
Cc: Zhenyu Wang <[email protected]>
Signed-off-by: Weinan Li <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

drm/i915/pmu: Fix building without CONFIG_PM

As we peek inside struct device to query members guarded by CONFIG_PM,
so must be the code.

Reported-by: kbuild test robot <[email protected]>
Fixes: 1fe699e30113 ("drm/i915/pmu: Fix sleep under atomic in RC6 readout")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 05273c950a3c93c5f96be8807eaf24f2cc9f1c1e)
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915/pmu: Fix sleep under atomic in RC6 readout

We are not allowed to call intel_runtime_pm_get from the PMU counter read
callback since the former can sleep, and the latter is running under IRQ
context.

To workaround this, we record the last known RC6 and while runtime
suspended estimate its increase by querying the runtime PM core
timestamps.

Downside of this approach is that we can temporarily lose a chunk of RC6
time, from the last PMU read-out to runtime suspend entry, but that will
eventually catch up, once device comes back online and in the presence of
PMU queries.

Also, we have to be careful not to overshoot the RC6 estimate, so once
resumed after a period of approximation, we only update the counter once
it catches up. With the observation that RC6 is increasing while the
device is suspended, this should not pose a problem and can only cause
slight inaccuracies due clock base differences.

v2: Simplify by estimating on top of PM core counters. (Imre)

Signed-off-by: Tvrtko Ursulin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104943
Fixes: 6060b6aec03c ("drm/i915/pmu: Add RC6 residency metrics")
Testcase: igt/perf_pmu/rc6-runtime-pm
Cc: Tvrtko Ursulin <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Imre Deak <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: David Airlie <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 1fe699e30113ed6f6e853ff44710d256072ea627)
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915/pmu: Fix PMU enable vs execlists tasklet race

Commit 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking
inside for busy-stats") added a tasklet_disable call in busy stats
enabling, but we failed to understand that the PMU enable callback runs
as an hard IRQ (IPI).

Consequence of this is that the PMU enable callback can interrupt the
execlists tasklet, and will then deadlock when it calls
intel_engine_stats_enable->tasklet_disable.

To fix this, I realized it is possible to move the engine stats enablement
and disablement to PMU event init and destroy hooks. This allows for much
simpler implementation since those hooks run in normal context (can
sleep).

v2: Extract engine_event_destroy. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: 99e48bf98dd0 ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats")
Testcase: igt/perf_pmu/enable-race-*
Cc: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: [email protected]
Reviewed-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit b2f78cda260bc6a1a2d382b1d85a29e69b5b3724)
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915: Lock out execlist tasklet while peeking inside for busy-stats

In order to prevent a race condition where we may end up overaccounting
the active state and leaving the busy-stats believing the GPU is 100%
busy, lock out the tasklet while we reconstruct the busy state. There is
no direct spinlock guard for the execlists->port[], so we need to
utilise tasklet_disable() as a synchronous barrier to prevent it, the
only writer to execlists->port[], from running at the same time as the
enable.

Fixes: 4900727d35bb ("drm/i915/pmu: Reconstruct active state on starting busy-stats")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Tvrtko Ursulin <[email protected]>
(cherry picked from commit 99e48bf98dd036090b480a12c39e8b971731247e)
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915/breadcrumbs: Ignore unsubmitted signalers

When a request is preempted, it is unsubmitted from the HW queue and
removed from the active list of breadcrumbs. In the process, this
however triggers the signaler and it may see the clear rbtree with the
old, and still valid, seqno, or it may match the cleared seqno with the
now zero rq->global_seqno. This confuses the signaler into action and
signaling the fence.

Fixes: d6a2289d9d6b ("drm/i915: Remove the preempted request from the execution queue")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: <[email protected]> # v4.12+
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit fd10e2ce9905030d922e179a8047a4d50daffd8e)
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

nvme-pci: Fix timeouts in connecting state

We need to halt the controller immediately if we haven't completed
initialization as indicated by the new "connecting" state.

Fixes: ad70062cdb ("nvme-pci: introduce RECONNECTING state to mark initializing procedure")
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

nvme-pci: Remap CMB SQ entries on every controller reset

The controller memory buffer is remapped into a kernel address on each
reset, but the driver was setting the submission queue base address
only on the very first queue creation. The remapped address is likely to
change after a reset, so accessing the old address will hit a kernel bug.

This patch fixes that by setting the queue's CMB base address each time
the queue is created.

Fixes: f63572dff1421 ("nvme: unmap CMB and remove sysfs file in reset path")
Reported-by: Christian Black <[email protected]>
Cc: Jon Derrick <[email protected]>
Cc: <[email protected]> # 4.9+
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

nvme: fix the deadlock in nvme_update_formats

nvme_update_formats will invoke nvme_ns_remove under namespaces_mutext.
The will cause deadlock because nvme_ns_remove will also require
the namespaces_mutext. Fix it by getting the ns entries which should
be removed under namespaces_mutext and invoke nvme_ns_remove out of
namespaces_mutext.

Signed-off-by: Jianchao Wang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>

gfs2: Fixes to "Implement iomap for block_map"

It turns out that commit 3974320ca6 "Implement iomap for block_map"
introduced a few bugs that trigger occasional failures with xfstest
generic/476:

In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
beyond the end of the allocated metadata (height > ip->i_height).
There, we can end up calling hole_size with a metapath that doesn't
match the current metadata tree, which doesn't make sense. After
untangling the code at do_alloc, fix this by checking if the block we
are looking for is within the range of allocated metadata.

In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
for reading stuffed files: this is handled separately. Make sure we
don't truncate iomap->length for reads beyond the end of the file; in
that case, the entire range counts as a hole.

Finally, revert to taking a bitmap write lock when doing allocations.
It's unclear why that change didn't lead to any failures during testing.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Bob Peterson <[email protected]>

rds: do not call ->conn_alloc with GFP_KERNEL

Commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management")
adds an rcu read critical section to __rd_conn_create. The
memory allocations in that critcal section need to use
GFP_ATOMIC to avoid sleeping.

This patch was verified with syzkaller reproducer.

Reported-by: [email protected]
Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management")
Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mips_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips

Pull MIPS fix from James Hogan:
"A single change (and associated DT binding update) to allow the
  address of the MIPS Cluster Power Controller (CPC) to be chosen by DT,
  which allows SMP to work on generic MIPS kernels where the bootloader
  hasn't configured the CPC address (i.e. the new Ranchu platform)"

* tag 'mips_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
  MIPS: CPC: Map registers using DT in mips_cpc_default_phys_base()
  dt-bindings: Document mti,mips-cpc binding

Merge branch 'net-sched-couple-of-fixes'

Jiri Pirko says:

====================
net: sched: couple of fixes

This patchset contains couple of fixes following-up the shared block
patchsets.
====================

Signed-off-by: David S. Miller <[email protected]>

net: sched: fix tc_u_common lookup

The offending commit wrongly assumes 1:1 mapping between block and q.
However, there are multiple blocks for a single q for classful qdiscs.
Since the obscure tc_u_common sharing mechanism expects it to be shared
among a qdisc, fix it by storing q pointer in case the block is not
shared.

Reported-by: Paweł Staszewski <[email protected]>
Reported-by: Cong Wang <[email protected]>
Fixes: 7fa9d974f3c2 ("net: sched: cls_u32: use block instead of q in tc_u_common")
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: don't set q pointer for shared blocks

It is pointless to set block->q for block which are shared among
multiple qdiscs. So remove the assignment in that case. Do a bit of code
reshuffle to make block->index initialized at that point so we can use
tcf_block_shared() helper.

Reported-by: Cong Wang <[email protected]>
Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure")
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_router: Fix error path in mlxsw_sp_vr_create

Since mlxsw_sp_fib_create() and mlxsw_sp_mr_table_create()
use ERR_PTR macro to propagate int err through return of a pointer,
the return value is not NULL in case of failure. So if one
of the calls fails, one of vr->fib4, vr->fib6 or vr->mr4_table
is not NULL and mlxsw_sp_vr_is_used wrongly assumes
that vr is in use which leads to crash like following one:

[ 1293.949291] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c9
[ 1293.952729] IP: mlxsw_sp_mr_table_flush+0x15/0x70 [mlxsw_spectrum]

Fix this by using local variables to hold the pointers and set vr->*
only in case everything went fine.

Fixes: 76610ebbde18 ("mlxsw: spectrum_router: Refactor virtual router handling")
Fixes: a3d9bc506d64 ("mlxsw: spectrum_router: Extend virtual routers with IPv6 support")
Fixes: d42b0965b1d4 ("mlxsw: spectrum_router: Add multicast routes notification handling functionality")
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: af_unix: fix typo in UNIX_SKB_FRAGS_SZ comment

Change "minimun" to "minimum".

Signed-off-by: Tobias Klauser <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

powerpc/macio: set a proper dma_coherent_mask

We have expected busses to set up a coherent mask to properly use the
common dma mapping code for a long time, and now that I've added a warning
macio turned out to not set one up yet. This sets it to the same value
as the dma_mask, which seems to be what the drivers expect.

Reported-by: Mathieu Malaterre <[email protected]>
Tested-by: Mathieu Malaterre <[email protected]>
Reported-by: Meelis Roos <[email protected]>
Tested-by: Meelis Roos <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

uapi/if_ether.h: move __UAPI_DEF_ETHHDR libc define

This fixes a compile problem of some user space applications by not
including linux/libc-compat.h in uapi/if_ether.h.

linux/libc-compat.h checks which "features" the header files, included
from the libc, provide to make the Linux kernel uapi header files only
provide no conflicting structures and enums. If a user application mixes
kernel headers and libc headers it could happen that linux/libc-compat.h
gets included too early where not all other libc headers are included
yet. Then the linux/libc-compat.h would not prevent all the
redefinitions and we run into compile problems.
This patch removes the include of linux/libc-compat.h from
uapi/if_ether.h to fix the recently introduced case, but not all as this
is more or less impossible.

It is no problem to do the check directly in the if_ether.h file and not
in libc-compat.h as this does not need any fancy glibc header detection
as glibc never provided struct ethhdr and should define
__UAPI_DEF_ETHHDR by them self when they will provide this.

The following test program did not compile correctly any more:

#include <linux/if_ether.h>
#include <netinet/in.h>
#include <linux/in.h>

int main(void)
{
return 0;
}

Fixes: 6926e041a892 ("uapi/if_ether.h: prevent redefinition of struct ethhdr")
Reported-by: Guillaume Nault <[email protected]>
Cc: <[email protected]> # 4.15
Signed-off-by: Hauke Mehrtens <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

blk: optimization for classic polling

This removes the dependency on interrupts to wake up task. Set task
state as TASK_RUNNING, if need_resched() returns true,
while polling for IO completion.
Earlier, polling task used to sleep, relying on interrupt to wake it up.
This made some IO take very long when interrupt-coalescing is enabled in
NVMe.

Reference:
http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html

Changes since v2->v3:
-using __set_current_state() instead of set_current_state()

Changes since v1->v2:
-setting task state once in blk_poll, instead of multiple
callers.

Signed-off-by: Nitesh Shetty <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages

In the following commit:

ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")

... we added code to memory_failure() to unmap the page from the
kernel 1:1 virtual address space to avoid speculative access to the
page logging additional errors.

But memory_failure() may not always succeed in taking the page offline,
especially if the page belongs to the kernel. This can happen if
there are too many corrected errors on a page and either mcelog(8)
or drivers/ras/cec.c asks to take a page offline.

Since we remove the 1:1 mapping early in memory_failure(), we can
end up with the page unmapped, but still in use. On the next access
the kernel crashes :-(

There are also various debug paths that call memory_failure() to simulate
occurrence of an error. Since there is no actual error in memory, we
don't need to map out the page for those cases.

Revert most of the previous attempt and keep the solution local to
arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:

1) there is a real error
2) memory_failure() succeeds.

All of this only applies to 64-bit systems. 32-bit kernel doesn't map
all of memory into kernel space. It isn't worth adding the code to unmap
the piece that is mapped because nobody would run a 32-bit kernel on a
machine that has recoverable machine checks.

Signed-off-by: Tony Luck <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert (Persistent Memory) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected] #v4.14
Fixes: ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
Signed-off-by: Ingo Molnar <[email protected]>

locking/semaphore: Update the file path in documentation

While reading this header I noticed that the locking stuff has moved to
kernel/locking/*, so update the path in semaphore.h to point to that.

Signed-off-by: Tycho Andersen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()

A test_and_{}_bit() operation fails if the value of the bit is such that
the modification does not take place. For example, if test_and_set_bit()
returns 1. In these cases, follow the behaviour of cmpxchg and allow the
operation to be unordered. This also applies to test_and_set_bit_lock()
if the lock is found to be be taken already.

Signed-off-by: Will Deacon <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/qspinlock: Ensure node->count is updated before initialising node

When queuing on the qspinlock, the count field for the current CPU's head
node is incremented. This needn't be atomic because locking in e.g. IRQ
context is balanced and so an IRQ will return with node->count as it
found it.

However, the compiler could in theory reorder the initialisation of
node[idx] before the increment of the head node->count, causing an
IRQ to overwrite the initialised node and potentially corrupt the lock
state.

Avoid the potential for this harmful compiler reordering by placing a
barrier() between the increment of the head node->count and the subsequent
node initialisation.

Signed-off-by: Will Deacon <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/qspinlock: Ensure node is initialised before updating prev->next

When a locker ends up queuing on the qspinlock locking slowpath, we
initialise the relevant mcs node and publish it indirectly by updating
the tail portion of the lock word using xchg_tail. If we find that there
was a pre-existing locker in the queue, we subsequently update their
->next field to point at our node so that we are notified when it's our
turn to take the lock.

This can be roughly illustrated as follows:

  /* Initialise the fields in node and encode a pointer to node in tail */
  tail = initialise_node(node);

  /*
   * Exchange tail into the lockword using an atomic read-modify-write
   * operation with release semantics
   */
  old = xchg_tail(lock, tail);

  /* If there was a pre-existing waiter ... */
  if (old & _Q_TAIL_MASK) {
prev = decode_tail(old);
smp_read_barrier_depends();

/* ... then update their ->next field to point to node.
WRITE_ONCE(prev->next, node);
  }

The conditional update of prev->next therefore relies on the address
dependency from the result of xchg_tail ensuring order against the
prior initialisation of node. However, since the release semantics of
the xchg_tail operation apply only to the write portion of the RmW,
then this ordering is not guaranteed and it is possible for the CPU
to return old before the writes to node have been published, consequently
allowing us to point prev->next to an uninitialised node.

This patch fixes the problem by making the update of prev->next a RELEASE
operation, which also removes the reliance on dependency ordering.

Signed-off-by: Will Deacon <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/error_inject: Make just_return_func() globally visible

With link time optimizations enabled, I get a link failure:

./ccLbOEHX.ltrans19.ltrans.o: In function `override_function_with_return':
<artificial>:(.text+0x7f3): undefined reference to `just_return_func'

Marking the symbol .globl makes it work as expected.

Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nicolas Pitre <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 540adea3809f ("error-injection: Separate error-injection from kprobe")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/platform/UV: Fix GAM Range Table entries less than 1GB

The latest UV platforms include the new ApachePass NVDIMMs into the
UV address space. This has introduced address ranges in the Global
Address Map Table that are less than the previous lowest range, which
was 2GB. Fix the address calculation so it accommodates address ranges
from bytes to exabytes.

Signed-off-by: Mike Travis <[email protected]>
Reviewed-by: Andrew Banman <[email protected]>
Reviewed-by: Dimitri Sivanich <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

MIPS: Fix incorrect mem=X@Y handling

Commit 73fbc1eba7ff ("MIPS: fix mem=X@Y commandline processing") added a
fix to ensure that the memory range between PHYS_OFFSET and low memory
address specified by mem= cmdline argument is not later processed by
free_all_bootmem. This change was incorrect for systems where the
commandline specifies more than 1 mem argument, as it will cause all
memory between PHYS_OFFSET and each of the memory offsets to be marked
as reserved, which results in parts of the RAM marked as reserved
(Creator CI20's u-boot has a default commandline argument 'mem=256M@0x0
mem=768M@0x30000000').

Change the behaviour to ensure that only the range between PHYS_OFFSET
and the lowest start address of the memories is marked as protected.

This change also ensures that the range is marked protected even if it's
only defined through the devicetree and not only via commandline
arguments.

Reported-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Marcin Nowakowski <[email protected]>
Fixes: 73fbc1eba7ff ("MIPS: fix mem=X@Y commandline processing")
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # v4.11+
Tested-by: Mathieu Malaterre <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/18562/
Signed-off-by: James Hogan <[email protected]>

x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore

The file was generated by make command and should not be in the source tree.

Signed-off-by: Progyan Bhattacharya <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>

sched/cpufreq: Remove unused SUGOV_KTHREAD_PRIORITY macro

Since schedutil kernel thread directly set priority to 0, the macro
SUGOV_KTHREAD_PRIORITY is not used. So remove it.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J . Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vikram Mulukutla <[email protected]>
Cc: Vincent Guittot <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>